PDF → CSV — Table-Focused Text Extraction

Text-based PDF → CSV. Uses PDF.js text positions to infer rows and columns. Works best on selectable text and grid-like tables; misaligned columns, skewed text, and pages that mix text with images are handled only approximately.

All parsing and export run locally; files are not uploaded.

PDF to CSV infers a table layout from the positions of text in a PDF and exports it as CSV. Professionals, students, and general users commonly use it to convert PDFs to CSV and to extract tables from PDFs.

How PDF to CSV Works

  1. **Choose a text-based PDF**: The file should contain real text objects (you can select and copy text in a PDF reader). Pure image scans are not supported here.
  2. **Page range**: After loading, you’ll see the total page count and can set **Start page** and **End page** (inclusive) so only that slice is converted.
  3. **Extract**: PDF.js reads each selected page and collects every text fragment with its coordinates. Fragments that share a baseline are grouped into rows, and each fragment is then aligned to an inferred column.
  4. **Download CSV**: The output is UTF-8, comma-separated values. A **Page** column is added only when the extracted rows come from **more than one PDF page** within your selected range, so a wide page range whose text all sits on a single page does not get a redundant page-number column.
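
The steps above can be sketched in TypeScript. This is a minimal illustration, not the tool's actual code: the `Fragment` and `Row` shapes, every function name, and the `yTol` and `xTol` tolerances are assumptions made for this example. In PDF.js, the text and position would come from the items returned by `page.getTextContent()`, where each item has a `str` and a `transform` matrix whose last two entries are the x and y origin. The sketch emits no header row, since the page does not say how the tool labels its columns.

```typescript
// Hypothetical fragment shape. With PDF.js, `str` and the x/y origin would come
// from each item returned by page.getTextContent() (transform[4], transform[5]).
interface Fragment { page: number; str: string; x: number; y: number; }
interface Row { page: number; cells: Fragment[]; }

// Step 2: keep only fragments inside the inclusive page range.
function selectPages(frags: Fragment[], start: number, end: number): Fragment[] {
  return frags.filter(f => f.page >= start && f.page <= end);
}

// Step 3a: a fragment joins the current row when it is on the same page and its
// baseline is within yTol of the row's first fragment.
function groupRows(frags: Fragment[], yTol: number = 2): Row[] {
  // Page order, then top to bottom (PDF y grows upward), then left to right.
  const sorted = frags.slice().sort((a, b) => a.page - b.page || b.y - a.y || a.x - b.x);
  const rows: Row[] = [];
  let current: Row | null = null;
  let baseline = 0;
  for (const f of sorted) {
    if (current !== null && current.page === f.page && Math.abs(baseline - f.y) <= yTol) {
      current.cells.push(f);
    } else {
      current = { page: f.page, cells: [f] };
      baseline = f.y;
      rows.push(current);
    }
  }
  for (const r of rows) r.cells.sort((a, b) => a.x - b.x);
  return rows;
}

// Step 3b: cluster fragment start positions; a gap wider than xTol between
// consecutive x values starts a new column. Each anchor is a column's leftmost x.
function inferColumns(rows: Row[], xTol: number = 8): number[] {
  const xs: number[] = [];
  for (const r of rows) for (const f of r.cells) xs.push(f.x);
  xs.sort((a, b) => a - b);
  const anchors: number[] = [];
  let prev = -Infinity;
  for (const x of xs) {
    if (x - prev > xTol) anchors.push(x);
    prev = x;
  }
  return anchors;
}

// Step 3c: put each fragment in the column with the nearest anchor. Fragments
// that land in the same cell are joined with a space.
function placeCells(row: Row, anchors: number[]): string[] {
  const cells: string[] = anchors.map(() => "");
  for (const f of row.cells) {
    let best = 0;
    let bestDist = Infinity;
    anchors.forEach((a, i) => {
      const d = Math.abs(a - f.x);
      if (d < bestDist) { bestDist = d; best = i; }
    });
    const existing = cells[best];
    cells[best] = existing ? existing + " " + f.str : f.str;
  }
  return cells;
}

// Quote a field when it contains a comma, a quote, or a line break.
function csvField(v: string): string {
  return /[",\r\n]/.test(v) ? '"' + v.replace(/"/g, '""') + '"' : v;
}

// Step 4: serialize rows. The page number is prepended only when the rows
// span more than one page.
function toCsv(rows: Row[], xTol: number = 8): string {
  const anchors = inferColumns(rows, xTol);
  const pages: number[] = [];
  for (const r of rows) if (pages.indexOf(r.page) === -1) pages.push(r.page);
  const withPage = pages.length > 1;
  const lines = rows.map(r => {
    const cells = placeCells(r, anchors);
    if (withPage) cells.unshift(String(r.page));
    return cells.map(csvField).join(",");
  });
  return lines.join("\r\n");
}
```

Calling `toCsv(groupRows(selectPages(fragments, 1, 3)))` would convert pages 1 to 3. The tolerances matter: too small a `yTol` splits one visual row into several, and too large an `xTol` merges neighboring columns, which is one way the extra columns and merged cells described under Limits & Accuracy can arise.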

Limits & Accuracy

  • **Heuristics, not OCR** — This is not a substitute for OCR on scanned documents.
  • **Images and graphics** are ignored; only text operators contribute to the grid.
  • **Complex layouts** (multi-column articles, footnotes, rotated text) may produce extra columns or merged cells; review the CSV in a spreadsheet app.
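
Because only text contributes to the grid, a scanned page typically yields no fragments at all. The sketch below shows one way a caller might surface that to the user. The `PageTextCount` shape, both function names, and the warning wording are hypothetical and are not taken from the tool; a scan that carries an embedded OCR text layer would still produce fragments and would not be flagged.

```typescript
// Hypothetical per-page summary: how many text fragments extraction found.
interface PageTextCount { page: number; fragments: number; }

// Pages that yielded no text fragments. These are likely scans or pure
// graphics, which position-based extraction cannot read.
function pagesWithoutText(counts: PageTextCount[]): number[] {
  return counts.filter(c => c.fragments === 0).map(c => c.page);
}

// Build a short warning, or return null when every page has some text.
function textCoverageWarning(counts: PageTextCount[]): string | null {
  const empty = pagesWithoutText(counts);
  if (empty.length === 0) return null;
  if (empty.length === counts.length) {
    return "No selectable text found; this PDF may be a scan and would need OCR first.";
  }
  return "No selectable text on page(s) " + empty.join(", ") + "; those pages are skipped.";
}
```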

Privacy

All processing happens in your browser; your PDF is not uploaded.

Frequently Asked Questions