Question 1

How does this compare to using PyMuPDF or similar?

Accepted Answer

PyMuPDF gives you raw text with positions, and turning that into structured output is up to you. We give you Markdown with the structure already detected. If you have the engineering capacity, PyMuPDF is more flexible. If you want a one-shot conversion, this is faster.

Question 2

What's the chunking strategy?

Accepted Answer

The output Markdown has explicit heading levels, so you can chunk on H1/H2/H3 boundaries and get chunks that follow the document's own sections. Page breaks are also marked with horizontal rules.
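
As a rough illustration of heading-based chunking, here is a minimal Python sketch using only the standard library. It is not part of the tool. It assumes headings are written in ATX style (`#`, `##`, `###`) and that a page break appears as a line containing only `---`, `***`, or `___`; check both against your actual output. The function name `chunk_markdown` and the chunk fields are hypothetical.

```python
import re
from typing import Dict, List

# Assumption: headings use ATX syntax and only levels 1-3 start a new chunk.
HEADING_RE = re.compile(r"^(#{1,3})\s+(.*\S)\s*$")
# Assumption: a page break is a line made up only of ---, ***, or ___.
PAGE_BREAK_RE = re.compile(r"^\s*(-{3,}|\*{3,}|_{3,})\s*$")
FENCE = "`" * 3  # code-fence marker; headings inside fences are ignored


def chunk_markdown(markdown: str) -> List[Dict]:
    """Split Markdown into one chunk per H1/H2/H3 section.

    Each chunk records its heading level, heading text, the page it
    starts on (counted from page-break rules), and its body text.
    """
    chunks: List[Dict] = []
    page = 1
    in_fence = False
    current = {"level": 0, "heading": "", "page": page, "lines": []}

    def flush() -> None:
        text = "\n".join(current["lines"]).strip()
        # Skip the empty preamble that exists before the first heading.
        if current["heading"] or text:
            chunks.append({
                "level": current["level"],
                "heading": current["heading"],
                "page": current["page"],
                "text": text,
            })

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if not in_fence:
            if PAGE_BREAK_RE.match(line):
                page += 1  # the rule is a marker, not content
                continue
            match = HEADING_RE.match(line)
            if match:
                flush()
                current = {
                    "level": len(match.group(1)),
                    "heading": match.group(2),
                    "page": page,
                    "lines": [],
                }
                continue
        current["lines"].append(line)

    flush()
    return chunks
```

Calling `chunk_markdown(text)` on the converted output returns a list of dictionaries you can embed directly or split further if a section is too long for your model. Keeping the heading and starting page on each chunk makes it easier to cite the source passage in retrieved answers.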

PDF to Markdown for RAG pipelines

Frequently asked questions