Extract text from PDF online — free
Pull plain text out of any PDF in seconds. Output is UTF-8, so accented letters, non-Latin scripts, and symbols come through intact. For scanned PDFs, run OCR first.
Need to copy the text out of a PDF for editing, search, or analysis? We use poppler's pdftotext to extract plain UTF-8 text: multi-column layouts come out in reading order, and Unicode characters round-trip cleanly.
For scanned PDFs (pages stored as images), run OCR PDF first to add a text layer, then extract.
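If you'd rather run the same kind of extraction locally, here is a minimal Python sketch that calls poppler's pdftotext command-line tool. It assumes pdftotext is installed and on your PATH; the function names are hypothetical and not part of this service.

```python
import subprocess


def pdftotext_command(pdf_path: str) -> list[str]:
    # "-enc UTF-8" sets the output encoding; "-" as the output file
    # makes pdftotext write to standard output instead of a .txt file.
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path: str) -> str:
    # Assumes poppler's pdftotext is installed and on PATH.
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        encoding="utf-8",  # decode stdout as UTF-8 text
        check=True,        # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For example, extract_text("paper.pdf") returns the document's text as a Python string. A scanned PDF without a text layer will return little or nothing.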
Frequently asked questions
Does it preserve formatting?
It preserves text content and reading order. Formatting (bold, italic, font sizes) is not preserved, because plain text has no way to represent it. For rich-format conversion, use PDF to Word or PDF to Markdown.
Why is the output empty or weird?
Most likely your PDF is a scan with no text layer. Run OCR PDF first, then come back here.
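To check for this yourself, a quick heuristic is to count how many pages came back blank. The sketch below assumes text produced by pdftotext with its default settings, which end each page with a form-feed character (\f). The 0.9 threshold and the function names are hypothetical choices, not values this tool uses.

```python
def blank_page_ratio(text: str) -> float:
    # pdftotext ends every page with "\f" by default, so splitting on it
    # yields one chunk per page plus an empty chunk after the last page.
    pages = text.split("\f")
    if pages and pages[-1] == "":
        pages = pages[:-1]
    if not pages:
        return 1.0  # nothing extracted at all
    blank = sum(1 for page in pages if not page.strip())
    return blank / len(pages)


def probably_needs_ocr(text: str, threshold: float = 0.9) -> bool:
    # Hypothetical rule: if almost every page is blank, the PDF is
    # most likely a scan with no text layer.
    return blank_page_ratio(text) >= threshold
```

A mostly blank result suggests running OCR PDF first; a PDF where only a few pages are blank is more likely a text PDF with some full-page images.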
Multi-column papers?
Yes. Poppler's text extraction handles standard two-column academic layouts well: by default it outputs each column in turn, in reading order, rather than interleaving lines from neighboring columns.
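If you run pdftotext yourself, the ordering is controlled by command-line flags. This sketch (hypothetical helper name, assuming a local pdftotext install) builds the command for either the default reading-order mode or -layout, which keeps the physical page layout and therefore places two columns side by side on each output line.

```python
def pdftotext_args(pdf_path: str, keep_layout: bool = False) -> list[str]:
    # Default mode: pdftotext "undoes" the physical layout and emits
    # text in reading order, so columns come out one after another.
    # -layout: keeps the physical layout, so columns share output lines.
    flags = ["-layout"] if keep_layout else []
    return ["pdftotext", *flags, "-enc", "UTF-8", pdf_path, "-"]
```

For reading a paper top to bottom, the default mode is usually what you want; -layout is more useful for tables or forms where horizontal position matters.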