PDF → Word — Images: extraction and anchoring
A PDF holds an image as an object: bytes, metadata, a transformation matrix that places it on the page. Word holds an image as either an inline run inside a paragraph or a floating object anchored to a paragraph with text-wrap rules. The converter has to choose how to anchor each picture, and the choice determines whether the document stays editable or scatters into disconnected fragments the moment anyone touches it.
What a PDF image is
The page content stream contains an instruction like /Image1 Do, which draws an XObject named Image1 declared in the page’s resources. The object itself holds:
- The image bytes, encoded as JPEG, PNG-style Flate (deflate) data, JBIG2 (for bitonal scans), JPEG 2000, or an uncompressed raw bitmap.
- Metadata — pixel dimensions, color space (DeviceRGB, DeviceCMYK, DeviceGray, Indexed), bits per component, optional ICC profile.
- Optionally, a mask for transparency — stencil or soft mask.
Immediately before the Do command, the content stream sets a CTM (Current Transformation Matrix) that defines the rectangle the image will be drawn into. The CTM carries width, height, rotation, and offset.
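As a minimal sketch of how those four quantities can be read out of a matrix, the function below takes the six CTM numbers in PDF order (a b c d e f) and assumes default user space, where one unit is one point. The names Placement and decompose_ctm are illustrative, not taken from any particular converter, and skewed placements are not handled.

```python
import math
from typing import NamedTuple


class Placement(NamedTuple):
    width_pt: float      # length of the image's horizontal edge on the page
    height_pt: float     # length of the image's vertical edge on the page
    rotation_deg: float  # counter-clockwise rotation of the horizontal edge
    x_pt: float          # where the image's (0, 0) corner lands
    y_pt: float


def decompose_ctm(a: float, b: float, c: float, d: float,
                  e: float, f: float) -> Placement:
    """Split a PDF matrix [a b c d e f] into size, rotation, and offset.

    A PDF image occupies the unit square, so the matrix maps
    (1, 0) to the end of the horizontal edge and (0, 1) to the end
    of the vertical edge. Assumes no skew.
    """
    width = math.hypot(a, b)
    height = math.hypot(c, d)
    rotation = math.degrees(math.atan2(b, a))
    return Placement(width, height, rotation, e, f)
```

For an unrotated placement such as 200 0 0 100 72 500, this reports a 200 × 100 pt image whose origin corner sits at (72, 500).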
What the converter must do
- Extract every image XObject as a separate file.
- Decode when re-scaling or format conversion is needed.
- Read the CTM at each placement to find where the image lands on the page.
- Bind to a paragraph — choose which paragraph the image attaches to.
- Convert to a Word-supported format (PNG, JPEG, sometimes EMF for vectors).
- Pack the file into /word/media/ inside the .docx ZIP and reference it from the host paragraph.
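The packing step can be sketched with Python’s standard zipfile module. The helper name add_media and the relationship id passed to it are hypothetical; the sketch only writes the media part and returns the relationship entry that would go into word/_rels/document.xml.rels. A complete package also needs a matching content-type entry and the drawing XML in the host paragraph, which are omitted here.

```python
import io
import zipfile

# Relationship type used by OOXML packages for embedded images.
IMAGE_REL_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
)


def add_media(docx: zipfile.ZipFile, filename: str, data: bytes,
              rel_id: str) -> str:
    """Store image bytes under word/media/ and return the relationship XML.

    The returned element is relative to word/, which is why the target
    is 'media/<filename>' rather than 'word/media/<filename>'.
    """
    docx.writestr(f"word/media/{filename}", data)
    return (
        f'<Relationship Id="{rel_id}" Type="{IMAGE_REL_TYPE}" '
        f'Target="media/{filename}"/>'
    )
```

The host paragraph then refers to the picture through the relationship id, not through the file path.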
Format translation
Word accepts PNG, JPEG, GIF, TIFF, BMP, EMF, WMF, and (starting with Office 2016 / Version 1611, December 2016 and Microsoft 365) static SVG (animation and embedded scripts get stripped on save). The translation table:
- JPEG → JPEG. Byte copy, lossless.
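- PNG → PNG. PDF stores PNG-style data as a Flate (deflate) stream, not as a .png file, so the samples are rewrapped in a PNG container. Lossless.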
- JBIG2 → PNG. JBIG2 (ISO/IEC 14492 for 1-bit images, supported in PDF since 1.4) cannot be opened by Word. Decode and re-encode as PNG. The output file is somewhat larger but readable everywhere. JBIG2’s lossy mode has a notorious history of corrupting digits in scanned documents — the Xerox 2013 incident — so a JBIG2-decoded scan is not necessarily a bit-faithful reproduction of what was originally on paper. OCR run against such an image inherits the substitution.
- JPEG 2000 → JPEG. Word does not support JPEG 2000. Decode and re-encode as ordinary JPEG. Some quality loss; JPEG and JPEG 2000 do not share a pixel representation.
- Raw bitmap → PNG. Wrap the bitmap in a PNG container.
If the source image carries a transparency mask, transparency survives only through PNG, because JPEG has no alpha channel.
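The translation table and the transparency rule can be written as a small lookup. This is a sketch: it keys on the PDF stream filter names (DCTDecode for JPEG, FlateDecode for the PNG-style data, JBIG2Decode, JPXDecode for JPEG 2000, and None for a raw bitmap), and the function name target_format is an assumption made for illustration.

```python
from typing import Optional

# Output format per PDF stream filter; None stands for uncompressed samples.
TARGET_BY_FILTER = {
    "DCTDecode": "jpeg",    # JPEG stays JPEG
    "FlateDecode": "png",   # PNG-style deflate data becomes PNG
    "JBIG2Decode": "png",   # Word cannot open JBIG2
    "JPXDecode": "jpeg",    # Word cannot open JPEG 2000
    None: "png",            # raw bitmap wrapped in PNG
}


def target_format(pdf_filter: Optional[str], has_mask: bool) -> str:
    """Pick the Word-side format for one image.

    A transparency mask forces PNG, since that is the only target
    format in the table that can carry it.
    """
    if has_mask:
        return "png"
    if pdf_filter not in TARGET_BY_FILTER:
        raise ValueError(f"no translation rule for filter {pdf_filter!r}")
    return TARGET_BY_FILTER[pdf_filter]
```

Note the cost of the mask rule: a masked JPEG can no longer be byte-copied, because it has to be decoded and re-encoded as PNG.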
EMU and the size calculation
Word measures everything in EMU (English Metric Units): 914,400 EMU = 1 inch. The number was chosen so that an inch (914,400), a centimeter (360,000), and a typographic point (12,700) all come out to integers — no rounding when converting between units.
For images:
- The PDF places the image in points via the CTM. 1 pt = 12,700 EMU.
- A pixel size at 96 DPI is 9,525 EMU; scale linearly for other DPIs.
The converter reads the CTM, computes the on-page size in points, multiplies by 12,700, and writes the EMU value into the Word XML.
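The arithmetic is short enough to show in full. A minimal sketch, with function names chosen for illustration:

```python
EMU_PER_INCH = 914_400
EMU_PER_POINT = 12_700   # 914,400 / 72
EMU_PER_CM = 360_000


def points_to_emu(points: float) -> int:
    """Convert an on-page size in PDF points to whole EMU."""
    return round(points * EMU_PER_POINT)


def pixels_to_emu(pixels: int, dpi: float = 96) -> int:
    """Convert a pixel count to EMU at the given resolution."""
    return round(pixels * EMU_PER_INCH / dpi)
```

A 200 × 100 pt placement becomes 2,540,000 × 1,270,000 EMU. In the drawing XML these values typically appear as the cx and cy attributes of the wp:extent element.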
DPI and downsampling
Most converters expose a target DPI for image output to keep file sizes manageable. The option goes by various names — “Image quality”, “Image resolution”, “Downsample images”. Common targets:
- 600 DPI — commercial print.
- 300 DPI — standard office quality.
- 150 DPI — screen-quality, view-only.
- 96 DPI — low-quality, small files.
The converter computes the image’s actual DPI (pixel size divided by on-page size) and downsamples to the target if the actual is higher. If the actual is lower, the image passes through unchanged.
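A sketch of that calculation, assuming the on-page size has already been read from the CTM in points. The function names are illustrative, and a real converter would hand the resulting dimensions to an image library for the actual resampling.

```python
from typing import Tuple


def effective_dpi(pixels: int, size_pt: float) -> float:
    """Pixels per inch for one axis: pixel count over on-page inches."""
    return pixels / (size_pt / 72.0)


def downsampled_size(px_w: int, px_h: int, w_pt: float, h_pt: float,
                     target_dpi: float) -> Tuple[int, int]:
    """Return the pixel size to store for a given target DPI.

    Images at or below the target pass through unchanged; images
    above it are scaled down uniformly.
    """
    actual = max(effective_dpi(px_w, w_pt), effective_dpi(px_h, h_pt))
    if actual <= target_dpi:
        return px_w, px_h
    scale = target_dpi / actual
    return max(1, round(px_w * scale)), max(1, round(px_h * scale))
```

An 8000×6000 image drawn into a 2 × 1.5 inch rectangle (144 × 108 pt) has an actual resolution of 4000 DPI; at a 300 DPI target it shrinks to 600×450 pixels.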
Inline versus floating
Inline images sit inside a paragraph as oversized characters. Text wraps above and below but not on the sides. Editing the paragraph moves the image with it. The XML is <w:p>...<w:r><w:drawing>...inline image...</w:drawing></w:r>...</w:p>. Simplest, most editable.
Floating images live outside the text flow, anchored to a paragraph. Text wraps them on the left, right, top, or bottom depending on settings. The XML is <w:drawing><wp:anchor>...</wp:anchor></w:drawing> with wrap parameters. More complex but reproduces complex layouts more faithfully.
The choice rule:
- Image takes the full text width, or close to it → inline.
- Image is column-wide with text above and below → inline.
- Text wraps the image on one side → floating with wrap-square or wrap-tight.
- Image is a background with text on top → floating, behind text.
- Image sits in a corner with no clear text binding → floating, anchored to the nearest paragraph.
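The choice rule above can be sketched as a short decision function. The layout facts it consumes (whether text overlaps the image, whether text sits beside it) are assumed to come from an earlier layout-analysis pass. The 0.9 threshold for "close to full width", the field names, and the returned labels are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ImageContext:
    width_pt: float         # on-page image width
    column_width_pt: float  # width of the text area or column it sits in
    text_on_top: bool       # text is drawn over the image
    text_beside: bool       # text shares the image's vertical band on one side


def choose_anchoring(ctx: ImageContext, full_width_ratio: float = 0.9) -> str:
    """Map layout facts to one of the anchoring modes in the list above."""
    if ctx.text_on_top:
        return "floating-behind-text"
    if ctx.text_beside:
        return "floating-wrap-square"
    if ctx.width_pt >= full_width_ratio * ctx.column_width_pt:
        return "inline"
    # Narrow image with no clear text binding.
    return "floating-nearest-paragraph"
```

The order of the checks matters: a full-width background image must be classified as behind-text before the width test has a chance to call it inline.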
A subtle CTM trap: image coordinates come from the transformation matrix, and the Y component can be negative to compensate for the image’s own flipped coordinate system. A converter that ignores the sign emits an upside-down image. This is a common cause of accidentally flipped illustrations in converted output.
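One way to avoid the trap is to concatenate every matrix in effect before looking at signs, since a negative Y scale on the image often cancels a flip applied earlier to the whole page. The sketch below follows the PDF convention that a new matrix is premultiplied onto the current one. It handles only axis-aligned placements, and the function names are assumptions.

```python
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]  # a b c d e f


def concat(m: Matrix, ctm: Matrix) -> Matrix:
    """Return m x ctm, the effect of a 'cm' operator on the current CTM."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def flips(ctm: Matrix) -> Tuple[bool, bool]:
    """Report (horizontal, vertical) mirroring for an axis-aligned CTM."""
    a, b, c, d, _, _ = ctm
    if b != 0 or c != 0:
        raise ValueError("rotated or skewed placement: handle rotation first")
    return a < 0, d < 0
```

For example, a page-level flip 1 0 0 -1 0 792 followed by an image matrix 200 0 0 -100 72 692 concatenates to 200 0 0 100 72 100, which is upright. Inspecting the image matrix alone would wrongly report a vertical flip.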
Paragraph anchoring
A floating image needs a host paragraph; when the host moves, the image moves with it. The selection rule is: attach to the nearest paragraph that starts at or above the image’s top edge. A mid-page image therefore attaches to the paragraph that starts roughly at its top edge.
A page with no paragraphs, such as a full-page image, gets an empty placeholder paragraph, emitted by the converter, to hold the anchor.
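A minimal sketch of the host selection, reading the rule as "the nearest paragraph that starts at or above the image’s top edge". It assumes positions are measured downward from the top of the page, so the PDF’s bottom-up Y values have already been converted. The function name is hypothetical.

```python
from typing import Optional, Sequence


def pick_host_paragraph(paragraph_tops: Sequence[float],
                        image_top: float) -> Optional[int]:
    """Return the index of the host paragraph, or None if there is none.

    paragraph_tops and image_top are distances from the top of the page.
    None signals that the caller should emit an empty placeholder
    paragraph to carry the anchor.
    """
    best: Optional[int] = None
    for i, top in enumerate(paragraph_tops):
        starts_at_or_above = top <= image_top
        if starts_at_or_above and (best is None or top > paragraph_tops[best]):
            best = i
    return best
```

With paragraphs starting at 72, 200, and 400 pt from the top, an image whose top edge is at 210 pt attaches to the second paragraph, while an image at 50 pt has no host and falls back to the placeholder.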
Vector graphics
PDF supports a rich vector model with paths, gradients, and stroke/fill combinations. Word supported vectors via EMF/WMF historically and added SVG in Office 2016/365. PDF→Word tools rarely emit SVG. Realistic options:
- Convert to EMF. Hard, because not every PDF primitive has an EMF equivalent and detail is lost in the gaps.
- Rasterize to PNG. Safe and universal, but the image is now pixel-bound — enlarging it in Word produces blur.
Most converters take the second option: rasterize at 300 DPI and insert as PNG. Acceptable for viewing, but the vectors are no longer editable in Word.
Where images break
- Overlapping images. PDF stacks images freely — black background with a white logo on top. Word does not preserve the relationship reliably and may insert only one of the layers.
- Soft masks and alpha transparency. Word supports PNG alpha, but converting an SMask to a clean PNG alpha channel takes care; the result is usually acceptable but rarely exact. The PDF compression series covers the underlying mechanics.
- Oversized source images. An 8000×6000-pixel image rendered into a tiny rectangle bloats the Word file by megabytes. Without downsampling, the converter inserts the full pixel grid. The fix is to shrink to the displayed size plus a margin for resizing.
- Many small images packed together. A row of 16×16 icons drawn side by side is N separate XObjects in the PDF. The converter inserts each one as a separate inline image, and Word stacks them vertically rather than rebuilding the horizontal arrangement automatically.
What gets dropped
- PDF layers (OCG). Word has no layer concept.
- 3D models, video annotations, audio. Ignored entirely.
- Images inside form XObjects. Usually extracted, but the surrounding form context is lost.
Images appear in the output at roughly correct dimensions in roughly correct positions. The risks are file size (without downsampling the document balloons), placement in complex layouts (wrap and overlap rarely survive intact), and vector editability (lost whenever graphics are rasterized). Standard reports convert cleanly; creative layouts usually need cleanup.