Every PDF editor — Acrobat, Smallpdf, Foxit, Sejda, all of them — breaks your document layout the moment you do anything more than fix a typo. It looks like incompetence, like a missing feature, like something a few years of engineering would surely fix. It’s none of those. The cause is the file format itself.
What a PDF actually contains
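Open a PDF in a plain text editor: not a viewer, but an editor like VS Code or Notepad++. Most page content is Flate-compressed and shows up as binary noise, but in an uncompressed file (or one expanded with a tool such as qpdf --qdf) you’ll find streams of operators that look like: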
BT /F2 14 Tf 108 742 Td (Service Agreement) Tj ET
That line means: begin text, use font F2 at 14pt, move the cursor to position (108, 742), draw the string “Service Agreement”, end text. That’s how PDF stores text. Every word on every page is a positioned drawing command. The 108 is the x-coordinate in points (one point is 1/72 of an inch), and the 742 is the y-coordinate, measured up from the bottom of the page.
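To make that concrete, here is a minimal Python sketch that tokenizes the operator line above and records where each string lands. It understands only the five operators in the example (BT, Tf, Td, Tj, ET), simple numbers, and literal strings without escaped parentheses; a real content-stream parser handles far more. The 792pt page height is an assumption (US Letter).

```python
import re

# One token at a time: a literal string "(...)", a name "/F2", a number, or an operator.
# Simplification: strings may not contain escaped or nested parentheses.
TOKEN = re.compile(
    r"\((?P<str>[^()]*)\)|(?P<name>/[^\s/()\[\]<>]+)|(?P<num>-?\d+(?:\.\d+)?)|(?P<op>[A-Za-z]+)"
)

def run_text_ops(stream: str) -> list[dict]:
    """Interpret BT/Tf/Td/Tj/ET and return each drawn string with its position."""
    operands: list = []
    drawn: list[dict] = []
    font, size, x, y = None, None, 0.0, 0.0
    for m in TOKEN.finditer(stream):
        if m.group("str") is not None:
            operands.append(m.group("str"))
        elif m.group("name") is not None:
            operands.append(m.group("name"))
        elif m.group("num") is not None:
            operands.append(float(m.group("num")))
        else:
            op = m.group("op")
            if op == "BT":    # begin text object: text position starts at the origin
                x, y = 0.0, 0.0
            elif op == "Tf":  # operands: font resource name, size
                font, size = operands[-2], operands[-1]
            elif op == "Td":  # offset from the start of the current line
                x, y = x + operands[-2], y + operands[-1]
            elif op == "Tj":  # draw the string at the current text position
                drawn.append({"text": operands[-1], "font": font, "size": size, "x": x, "y": y})
            operands.clear()  # every operator consumes the operands before it
    return drawn

page_height = 792.0  # assumption: US Letter
for item in run_text_ops("BT /F2 14 Tf 108 742 Td (Service Agreement) Tj ET"):
    inches_from_left = item["x"] / 72
    inches_from_top = (page_height - item["y"]) / 72
    print(item["text"], inches_from_left, inches_from_top)
```

The arithmetic is the whole story: 108 / 72 puts the title 1.5 inches from the left edge, and on a 792pt-tall page, y = 742 sits 50pt (about 0.69 inch) below the top. Nothing in the stream says what comes after it.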
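There’s no “paragraph” object. No “table cell.” No “column.” The PDF specification, all 756 pages of it, defines operators for placing text and drawing shapes. It does not define a layout engine. It does not define reflow. Tagged PDF can optionally label content as paragraphs or table cells for accessibility, but those tags annotate content that is already positioned; they never decide where anything is drawn.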
This wasn’t an oversight. John Warnock, Adobe’s co-founder, wrote in the original Camelot paper that PDF’s goal was “to communicate documents that look the same on every platform.” The format was designed for one job, pixel-identical rendering across every viewer, and it does that job brilliantly. Reflow was never part of the design.
What “no layout engine” means in practice
Type a paragraph in Google Docs and press Enter in the middle of a sentence. The text after the cursor moves to the next line. The paragraph below shifts down. If there’s a table beneath, the table shifts. If the page fills, content flows to the next page. Headers update. This works because Word and Google Docs maintain a model: paragraph A is above paragraph B, table C is below them, footer D is at the bottom. Change one thing and the engine recalculates everything.
PDF maintains nothing. Element A is at (108, 742). Element B is at (108, 710). There is no “above-below” relationship — they happen to have y-coordinates that visually order them, and that’s all. They don’t reference each other.
When you edit text in Acrobat, you’re changing the string at (108, 742). Acrobat doesn’t move element B because it has no concept of “down from A.” Element B sits at (108, 710) regardless. Make the text long enough to wrap onto more lines? It runs past y = 710 and visually overlaps B. Shorter? A gap opens. Either way, nothing else moves.
This isn’t an Acrobat limitation. It’s a PDF limitation. Every tool that edits PDF natively hits the same wall: Foxit, Sejda, Smallpdf, all of them. The deeper consequence is the overlay editing pattern that breaks tables and paragraphs. The only thing you can do with positioned drawing commands is overlay more positioned drawing commands.
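A toy model shows the failure mode. This Python sketch stores blocks the way PDF does, as independent absolute positions, then lets element A grow. The names, the top-edge convention, and the 14pt line height are assumptions for illustration, not any editor’s real data structures.

```python
from dataclasses import dataclass

@dataclass
class Placed:
    """A text block pinned to absolute coordinates, PDF-style.

    Simplification: `top` is the block's top edge in points, measured up from
    the bottom of the page. No block refers to any other block.
    """
    name: str
    top: float
    lines: int
    leading: float = 14.0  # assumed line height in points

    @property
    def bottom(self) -> float:
        return self.top - self.lines * self.leading

def collisions(blocks: list[Placed]) -> list[tuple[str, str]]:
    """Report every pair of blocks whose vertical extents intersect."""
    hits = []
    for i, upper in enumerate(blocks):
        for lower in blocks[i + 1:]:
            # Extents are [bottom, top]; they intersect if each starts below the other's top.
            if upper.bottom < lower.top and lower.bottom < upper.top:
                hits.append((upper.name, lower.name))
    return hits

page = [Placed("A", top=742, lines=2), Placed("B", top=710, lines=3)]
print(collisions(page))  # []  (A ends at 742 - 2*14 = 714, above B's top at 710)

page[0].lines = 4        # the "edit": A's new text wraps to four lines
print(collisions(page))  # [('A', 'B')]  (A now ends at 686, and B never moved)
```

The edit changes A and only A. Nothing tells B to move, so the only way to resolve the collision is to reposition B by hand, or to draw something else over it.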
Why browsers don’t have this problem
HTML solved structure description in the 1990s. Tags like <table>, <p>, and <h1> each carry semantic meaning, and CSS describes how to lay them out: this paragraph has a 20px margin-bottom, this table takes 100% width, these cells distribute evenly. The browser’s layout engine takes the structure and the rules and computes positions. When content changes, the engine recomputes.
It’s so fundamental to the web that no one thinks about it. Of course adding a paragraph pushes content down. Of course tables grow. It would be bizarre if they didn’t.
In PDF land, that “of course” doesn’t exist. The format predates CSS. It was designed for rendering, not authoring.
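Here is the browser’s “of course” in miniature. This Python sketch is far simpler than a real CSS block formatting context (no margin collapsing, floats, or line breaking, and an assumed uniform 14pt line height), but it shows the key property: positions are never stored, only derived from content on every pass, so growing one paragraph pushes everything after it down.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    lines: int
    margin_bottom: float = 20.0  # like CSS margin-bottom: 20px (no collapsing here)

LINE_HEIGHT = 14.0  # assumed uniform line height

def layout(blocks: list[Block], top: float = 0.0) -> dict[str, float]:
    """Return each block's top offset, measured downward as in CSS.

    Every position is computed from the blocks before it, so nothing is pinned.
    """
    positions, cursor = {}, top
    for block in blocks:
        positions[block.name] = cursor
        cursor += block.lines * LINE_HEIGHT + block.margin_bottom
    return positions

doc = [Block("intro", 2), Block("terms", 5), Block("table", 8)]
print(layout(doc))  # {'intro': 0.0, 'terms': 48.0, 'table': 138.0}
doc[0].lines = 3    # the edit: one more line in the first paragraph
print(layout(doc))  # {'intro': 0.0, 'terms': 62.0, 'table': 152.0}
```

Compare this with the positioned-block model: the same edit that left B stranded here moves both following blocks down by exactly one line height.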
The translation approach
Once you accept that PDF can’t reflow, the solution is to stop trying. Edit something that already does reflow — HTML — and render it back to PDF when you’re done.
That’s the architecture underneath ReflowPDF. The pipeline runs in three stages.
First, an AI model reads the PDF’s positioned text and drawn lines and reconstructs document structure. Twenty text fragments and fifteen lines that visually look like a table become a real <table> with <tr> rows and <td> cells. The signals it weighs, and where it occasionally gets things wrong, are covered in a separate post, “How AI PDF-to-HTML conversion actually works.”
Second, you edit the HTML in a browser-native visual editor. The browser’s layout engine handles every reflow naturally; reflow is what browser engines have been doing for thirty years.
Third, the edited HTML gets rendered back to PDF by FlexPDF, the layout engine I built for this. PDF in, PDF out, but the editing happens in a format that was designed for it.
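The first stage uses an AI model whose internals aren’t described here, so as a stand-in, this Python sketch shows the simplest deterministic version of the same idea: group text fragments into rows by shared baseline, order each row by x, and emit a table. It is not ReflowPDF’s method. It assumes one fragment per cell and ignores ruling lines, spans, and multi-line cells, which is exactly where real documents get hard; the 2pt tolerance is an arbitrary assumption.

```python
from html import escape

def fragments_to_table(frags: list[tuple[float, float, str]], y_tol: float = 2.0) -> str:
    """Group (x, y, text) fragments into rows by y, order cells by x, and emit HTML.

    y_tol is an assumed tolerance: fragments whose y differs by less than this
    are treated as sitting on the same row.
    """
    rows: list[list[tuple[float, float, str]]] = []
    # PDF y grows upward, so walk fragments top-to-bottom by descending y.
    for frag in sorted(frags, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0][1] - frag[1]) < y_tol:
            rows[-1].append(frag)   # close enough to the current row's baseline
        else:
            rows.append([frag])     # start a new row
    html = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{escape(text)}</td>" for _, _, text in sorted(row))
        html.append(f"  <tr>{cells}</tr>")
    html.append("</table>")
    return "\n".join(html)

frags = [(300, 700, "Qty"), (108, 700, "Item"), (108, 684, "Widget"), (300, 684.5, "3")]
print(fragments_to_table(frags))
```

Note the half-point jitter on the “3”: even in this tiny example, fragments that belong on one row don’t share an exact y, which is why any reconstruction needs tolerances, and why a learned model can do better than fixed thresholds.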
The hardest problem: pagination that knows about layout
Most HTML-to-PDF rendering libraries treat pagination as a post-processing step. Run layout once across an infinite canvas, then walk down and split into pages. This works fine for pure text. It breaks the moment a flex container or a multi-row table crosses a page boundary.
The reason is feedback. Splitting a flex container across two pages changes the available height on the first page, which changes how flex items distribute, which moves where the next break should happen. Splitting a table mid-row changes header repetition, which changes the height available for content. Pagination and layout aren’t two separate problems — they’re the same problem with mutual dependencies.
Every existing library I studied solves them as separate passes. That’s why they all fail at multi-page tables and at flex containers crossing pages. Building FlexPDF meant collapsing the two passes into one — same engine resolves layout and page breaks together, with the splitter able to ask the layout to re-resolve when the available space shifts.
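The table case can be shown with a small Python sketch. It compares a splitter that ignores the repeated header (as a split performed after layout would) with one that reserves header space on every continuation page. Rows are treated as unsplittable, the row, header, and page heights are made-up numbers, and this is an illustration of the interaction, not FlexPDF’s algorithm.

```python
def paginate(rows: list[float], header: float, page: float,
             repeat_header: bool = True) -> list[list[int]]:
    """Greedy pagination of unsplittable table rows; returns row indices per page.

    The first page always starts with the header. With repeat_header=False the
    splitter doesn't reserve header space on continuation pages.
    """
    pages: list[list[int]] = []
    current: list[int] = []
    used = header
    for i, h in enumerate(rows):
        if current and used + h > page:   # row doesn't fit: start a new page
            pages.append(current)
            current = []
            used = header if repeat_header else 0.0
        current.append(i)                 # a row taller than a page still lands somewhere
        used += h
    if current:
        pages.append(current)
    return pages

def overflow(pages: list[list[int]], rows: list[float], header: float, page: float) -> list[float]:
    """How far each page runs past `page` once the header is drawn on every page."""
    return [max(0.0, header + sum(rows[i] for i in p) - page) for p in pages]

rows = [100.0] * 9
naive = paginate(rows, header=60.0, page=500.0, repeat_header=False)
aware = paginate(rows, header=60.0, page=500.0)
print(naive, overflow(naive, rows, 60.0, 500.0))  # page 2 overflows by 60pt
print(aware, overflow(aware, rows, 60.0, 500.0))  # one extra page, no overflow
```

The naive split puts five rows on page 2; once the renderer repeats the 60pt header there, the last row no longer fits. Knowing about the header moves the break, and the moved break creates a third page, so the header decision and the break decision have to be made together.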
The implementation went through four rewrites before the edge cases stopped showing up. Tables with rowspan that span three pages, flex containers nested inside flex containers, page-break-inside: avoid honored against best-fit splits, header rows that have to repeat at the right column widths after pagination has shifted them — those are the cases that broke library after library. They had to work here, and getting them to work meant abandoning the standard architecture.
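One of the edge cases listed above, page-break-inside: avoid, reduces to a small decision once the remaining space is known. The hypothetical helper below (Python, not FlexPDF code) keeps a block whole if it fits, pushes it to a fresh page if it fits there, and splits only when the block is taller than a whole page, because no amount of moving can honor “avoid” then. It assumes a block can be cut at any height; real engines cut between lines or rows.

```python
def place_block(height: float, remaining: float, page: float, avoid: bool) -> list[float]:
    """Return the slice heights a block occupies, one entry per page it touches.

    remaining: space left on the current page; page: usable height of a fresh page.
    A leading 0.0 means the block skipped the rest of the current page.
    """
    if height <= remaining:
        return [height]                 # fits where it is
    if avoid and height <= page:
        return [0.0, height]            # honor avoid: start on the next page
    # Split: fill the current page, then as many full pages as needed.
    slices, left, space = [], height, remaining
    while left > space:
        slices.append(space)
        left -= space
        space = page
    slices.append(left)
    return slices

print(place_block(300.0, 100.0, 500.0, avoid=True))    # [0.0, 300.0]
print(place_block(1200.0, 100.0, 500.0, avoid=True))   # [100.0, 500.0, 500.0, 100.0]
```

Splitting in place is one policy among several; a best-fit engine might instead start an oversized block on a fresh page to reduce the number of fragments, which is exactly the kind of trade-off that has to be weighed against the surrounding layout.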
The layout engine underneath
Most HTML-to-PDF tools take the lazy route and shell out to a browser engine. The wkhtmltopdf tool is built on a patched Qt WebKit. Browsershot wraps Puppeteer driving headless Chromium. They work, but they’re heavy: 300-500MB of RAM per process, multi-second startup, brittle to deploy on a server.
FlexPDF is the alternative: pure PHP, zero runtime dependencies, implementing the CSS specs directly. That covers Flexbox §9 (main- and cross-axis sizing, grow/shrink distribution, frozen-item iteration, multi-line wrapping); CSS Grid; table layout with column spanning, row spanning, and header repetition; and block and inline formatting. A typical document renders in under 200ms.
Two consequences. The editor can show accurate previews without firing up a browser process. And batch generation — monthly invoice runs, scheduled reports, document automation — stays fast and cheap to run in production.
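The frozen-item iteration mentioned above is the core of the Flexbox spec’s “Resolving Flexible Lengths” step (§9.7). Here is a condensed Python sketch of it for a single flex line, assuming flex base sizes and min/max constraints are already resolved and ignoring margins, gaps, borders, and padding. FlexPDF is PHP; this illustrates the spec algorithm, not FlexPDF’s code.

```python
from dataclasses import dataclass

@dataclass
class FlexItem:
    base: float                       # flex base size, assumed already resolved
    grow: float = 0.0
    shrink: float = 1.0
    min_size: float = 0.0
    max_size: float = float("inf")

def resolve_flexible_lengths(items: list[FlexItem], container: float) -> list[float]:
    """Condensed CSS Flexbox 'Resolve Flexible Lengths' for one flex line."""
    n = len(items)
    hypo = [min(max(it.base, it.min_size), it.max_size) for it in items]
    growing = sum(hypo) < container                     # pick grow or shrink
    factor = [it.grow if growing else it.shrink for it in items]
    target = [float(it.base) for it in items]
    frozen = [False] * n
    for i, it in enumerate(items):                      # freeze inflexible items
        if factor[i] == 0 or (growing and it.base > hypo[i]) or (not growing and it.base < hypo[i]):
            frozen[i], target[i] = True, float(hypo[i])

    def free_space() -> float:
        # Frozen items count at their target size, unfrozen ones at their base size.
        return container - sum(target[i] if frozen[i] else items[i].base for i in range(n))

    initial_free = free_space()
    while not all(frozen):
        unfrozen = [i for i in range(n) if not frozen[i]]
        remaining = free_space()
        factor_sum = sum(factor[i] for i in unfrozen)
        if factor_sum < 1 and abs(initial_free * factor_sum) < abs(remaining):
            remaining = initial_free * factor_sum       # fractional factors take less space
        if remaining != 0:                              # distribute in proportion to factors
            if growing:
                for i in unfrozen:
                    target[i] = items[i].base + remaining * factor[i] / factor_sum
            else:                                       # shrink is weighted by base size
                scaled = {i: factor[i] * items[i].base for i in unfrozen}
                scaled_sum = sum(scaled.values())
                for i in unfrozen:
                    share = scaled[i] / scaled_sum if scaled_sum else 0.0
                    target[i] = items[i].base - abs(remaining) * share
        adjust = {}
        for i in unfrozen:                              # clamp to min/max, floor at zero
            clamped = float(max(min(target[i], items[i].max_size), items[i].min_size, 0.0))
            adjust[i] = clamped - target[i]
            target[i] = clamped
        total = sum(adjust.values())
        for i in unfrozen:                              # freeze the over-flexed items
            if total == 0 or (total > 0 and adjust[i] > 0) or (total < 0 and adjust[i] < 0):
                frozen[i] = True
    return target

items = [FlexItem(100, grow=1), FlexItem(100, grow=1), FlexItem(100, grow=2, max_size=200)]
print(resolve_flexible_lengths(items, 600))  # [200.0, 200.0, 200.0]
```

In the usage line, the first pass gives the third item 250, which violates its max; it freezes at 200 and the loop redistributes the leftover 50 between the other two. Each pass freezes at least one item, so the loop always terminates. A page break that changes the container size forces this whole computation to rerun, which is the feedback the pagination section describes.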
Round-trip via embedded source
There’s one remaining problem with the conversion approach: AI is deterministic for a given model version, but model updates can shift output. A column might come out 1pt narrower next year. A heading classified differently. Re-converting an old PDF gives slightly different HTML than last time.
The fix is to skip re-conversion entirely on PDFs that came from this pipeline in the first place. Every PDF exported from ReflowPDF carries an embedded source stream: zlib-compressed, AES-256-GCM-encrypted HTML stored in a custom metadata stream inside the PDF. When the file is reopened in ReflowPDF, the source is decrypted and loaded directly. No AI runs. The document opens at exactly the state it was exported in, instantly.
The PDF stays a normal PDF for every other reader. The embedded source is invisible until ReflowPDF detects it.
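The round trip can be sketched in Python with only the standard library, under two loud assumptions. First, the /HypotheticalEmbeddedSource marker and the object layout are invented for illustration; the post doesn’t specify ReflowPDF’s format. Second, the AES-256-GCM step is left out because Python’s standard library has no AES; a real exporter would encrypt the compressed bytes before writing them. What the sketch does show is the shape of the logic: compress, embed as a stream object, and on reopen prefer the embedded source over re-running conversion.

```python
import re
import zlib
from typing import Callable, Optional

# Hypothetical marker invented for this sketch; not ReflowPDF's actual key.
SOURCE_TYPE = b"/HypotheticalEmbeddedSource"

def source_stream_object(obj_num: int, html: str) -> bytes:
    """Serialize the HTML source as a PDF stream object holding zlib-compressed bytes.

    A real exporter would also encrypt `data` (AES-256-GCM in the post) and link the
    object from the document catalog; both steps are omitted here.
    """
    data = zlib.compress(html.encode("utf-8"))
    head = b"%d 0 obj\n<< /Type %s /Length %d >>\nstream\n" % (obj_num, SOURCE_TYPE, len(data))
    return head + data + b"\nendstream\nendobj\n"

def find_embedded_source(pdf: bytes) -> Optional[str]:
    """Return the embedded HTML if the marker stream is present, else None."""
    m = re.search(rb"<< /Type " + re.escape(SOURCE_TYPE) + rb" /Length (\d+) >>\nstream\n", pdf)
    if m is None:
        return None
    data = pdf[m.end():m.end() + int(m.group(1))]  # /Length marks where the bytes end
    return zlib.decompress(data).decode("utf-8")

def open_document(pdf: bytes, convert: Callable[[bytes], str]) -> str:
    """Prefer the embedded source; fall back to conversion only when it is absent."""
    html = find_embedded_source(pdf)
    return html if html is not None else convert(pdf)

pdf = b"%PDF-1.7\n" + source_stream_object(12, "<h1>Service Agreement</h1>") + b"%%EOF\n"
print(open_document(pdf, convert=lambda b: "<p>re-converted</p>"))  # embedded source wins
```

Because the stream is never referenced by page content, a viewer simply ignores it, which is why the file still reads as an ordinary PDF everywhere else.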
What changes for you
For routine PDF work — adjusting invoices, updating quarterly reports, customizing proposals — the practical difference is the cleanup tax.
In overlay editors, every edit is followed by manual repositioning. Add a line, drag everything below. Change a number, fix the column alignment. Insert a section, walk through the next ten pages adjusting positions. The edit takes ten seconds; the cleanup takes twenty minutes; the result still looks edited.
In a structural editor, the edit is the edit. Add a line, content shifts. Change a number, the column adjusts. Insert a section, page breaks recompute. Cleanup time goes to zero because the layout engine is doing the work that used to be manual.
PDF can’t reflow because it was designed not to. Translate to HTML, edit with reflow, translate back — that’s the whole loop, and once you have it, the document adapts instead of breaking.