PDF to plain .txt

Extract text from a PDF

Drop a PDF below and folio reads every page's selectable text using pdf.js, then offers it as a plain `.txt` file. Everything runs in your browser — your document never leaves your device.

One PDF up to 100 MB. Selectable text only — no OCR.

Every operation runs entirely in your browser. Your files never leave your device.

About Extract text

What kind of text can folio extract?

folio captures the selectable text stored inside the PDF — the same content you'd get by highlighting and copying inside a PDF viewer. Page breaks are marked with `--- Page N ---` headers so you can stitch context back together.

Why is the output empty?

The most common cause is a scanned PDF. Scans are typically just images of text: the document has no embedded text layer, so there is nothing to extract. Recognising the words would require OCR (optical character recognition), which folio doesn't ship yet.
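If you process the exported file with a script, a quick check along these lines can flag a likely image-only PDF. This is only a sketch: the function name `looksImageOnly` is hypothetical, and it assumes each page header sits on its own line in the `--- Page N ---` form described above.

```typescript
// Returns true when the export contains nothing except page headers and whitespace.
// Assumption: page headers look exactly like "--- Page 12 ---" on a line of their own.
function looksImageOnly(txt: string): boolean {
  const body = txt
    .split(/\r?\n/)
    .filter((line) => !/^--- Page \d+ ---$/.test(line)) // drop header lines
    .join("");
  return body.trim().length === 0;
}
```

An empty result does not prove the PDF is a scan; it only tells you that no selectable text made it into the export.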

Does folio preserve formatting?

No. The output is plain text, optimised for searching, indexing, or feeding into another tool. Headings, columns, tables and inline styles are flattened — that's by design for a `.txt` export.

About this operation

PDF to TXT

What it does

folio walks every page with pdf.js and writes the selectable text into a single `.txt` file. Page breaks are marked with `--- Page N ---` headers so you can stitch context back together. The result is the same content you'd get by highlighting and copying inside a PDF viewer, which makes it useful for search indexing, programmatic processing, or pasting into another tool. Scanned PDFs (image-only, with no text layer) return empty output because there is nothing to extract; recognising that text needs OCR, which folio does not ship.
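The page walk described above can be sketched roughly as follows. This is not folio's actual code. The interfaces `TextItemLike`, `PageLike` and `DocLike` and the three function names are hypothetical; they mirror only the pdf.js members the sketch relies on (`numPages`, `getPage`, `getTextContent`, and the `str` / `hasEOL` fields on text items). The blank line between pages is also an assumption about the output layout.

```typescript
// Minimal structural types covering only the pdf.js members used here (hypothetical names).
interface TextItemLike { str?: string; hasEOL?: boolean }
interface PageLike { getTextContent(): Promise<{ items: TextItemLike[] }> }
interface DocLike { numPages: number; getPage(pageNumber: number): Promise<PageLike> }

// Join one page's text items, adding a newline where pdf.js flags an end of line.
function pageItemsToText(items: TextItemLike[]): string {
  let out = "";
  for (const item of items) {
    if (typeof item.str !== "string") continue; // skip entries that carry no text
    out += item.str;
    if (item.hasEOL) out += "\n";
  }
  return out.trim();
}

// Prefix each page with a "--- Page N ---" header; pages are separated by a blank line (assumed).
function formatPages(pageTexts: string[]): string {
  return pageTexts
    .map((text, i) => `--- Page ${i + 1} ---\n${text}`)
    .join("\n\n");
}

// Walk every page in order and build the single plain-text result.
async function extractText(doc: DocLike): Promise<string> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) { // pdf.js page numbers start at 1
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    pages.push(pageItemsToText(content.items));
  }
  return formatPages(pages);
}
```

With pdfjs-dist, the document resolved from `getDocument(...).promise` exposes these members, so it could be passed to `extractText`, though a type cast may be needed depending on the library's typings.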

When to use it

  • Feed a PDF's words into a search index
  • Quickly grep a long document for a phrase
  • Paste a transcript's content into a doc editor
  • Process PDF content with a script
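For the scripting and grep-style use cases above, a small sketch like the following splits an exported `.txt` back into pages using the `--- Page N ---` headers and reports which pages mention a phrase. The names `PageText`, `splitPages` and `findPhrase` are hypothetical, and the sketch assumes each header sits alone on its own line.

```typescript
interface PageText { page: number; text: string }

// Split the export into pages using the "--- Page N ---" header lines.
function splitPages(txt: string): PageText[] {
  const header = /^--- Page (\d+) ---$/;
  const pages: PageText[] = [];
  let current: PageText | null = null;
  let buffer: string[] = [];
  for (const line of txt.split(/\r?\n/)) {
    const m = header.exec(line);
    if (m) {
      // A new header closes the previous page, if there was one.
      if (current) {
        current.text = buffer.join("\n").trim();
        pages.push(current);
      }
      current = { page: Number(m[1]), text: "" };
      buffer = [];
    } else if (current) {
      buffer.push(line); // lines before the first header are ignored
    }
  }
  if (current) {
    current.text = buffer.join("\n").trim();
    pages.push(current);
  }
  return pages;
}

// Case-insensitive search that returns the page numbers containing the phrase.
function findPhrase(txt: string, phrase: string): number[] {
  const needle = phrase.toLowerCase();
  return splitPages(txt)
    .filter((p) => p.text.toLowerCase().includes(needle))
    .map((p) => p.page);
}
```

One caveat: if a line inside the PDF's own text happens to match the header pattern, this sketch will treat it as a page break.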

Limitations — what it doesn't do

  • Text layer only — does not OCR scanned-image PDFs
  • Does not preserve formatting, tables or column structure
  • Does not extract images or vector graphics
  • Cannot extract text from password-protected PDFs
  • Output is one big .txt per input PDF — no per-page splitting (use Split first)

Your PDFs never leave your device

folio is a static page. Every operation runs inside your browser via pdf-lib (editing) and pdf.js (rendering). There is no server-side processing, no upload, no temporary file, no cache. When you close this tab, every file is gone.

  • No account required.
  • No server processing. Your PDFs stay on your device.
  • No caching, no Service Worker, no IndexedDB persistence.
  • pdfjs-dist (the pdf.js package, lazy-loaded for rendering) is fetched from the same origin as this page; nothing else is sent.