📚 spacy-layout v0.0.12 (Mar 8, 2024)
Support processing PDFs with context, add document index tables and more docs
Best Way to OCR a PDF in Python (Python Tutorials for Digital Humanities)
Tutorial by WJB Mattingly on how to use the new spaCy Layout package and Docling to convert PDFs to text.
From PDFs to AI-ready structured data: a deep dive
This blog post presents a new modular workflow for converting PDFs and similar documents to structured data and shows you how to build end-to-end document understanding and information extraction pipelines for industry use cases.
Prodigy-PDF for PDF annotation and OCR
Want to annotate PDF files? Our new Prodigy plugin can help with that! To explain how to use PDF segmentation and OCR, Vincent made a small demo video.