Task: Object Detection · Explosion · Developer tools and consulting for AI, Machine Learning and NLP

Explosion builds developer tools for AI, Machine Learning and Natural Language Processing.

Building Multimodal Corpora Using Microtask Pipelines and Local Annotators · Hotti, Vázquez, Jokipohja, Kalliokoski, Paakki, Suviranta, Hiippala (2026)

To create the infrastructure needed for supporting this effort, we repurpose an existing commercial annotation tool, Prodigy, which we then enhance with additional components for combining the annotation tasks into pipelines, cross-validating the annotations and supporting annotator access to tasks.

📚 spacy-layout v0.0.12 · Mar 8, 2025

Support processing PDFs with context, add document index tables and more docs

📚 spacy-layout v0.0.1 · Nov 18, 2024

Process PDFs, Word documents and more with spaCy

Describing Images Fast and Slow: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes · Takmaz, Pezzelle, Fernández (2024)

We use the spaCy library for tokenization, part-of-speech tagging, and lemmatization of the words in the descriptions.

Conquering PDFs: document understanding beyond plain text · PyData London

In this talk, Ines presents a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem.

📚 spacy-layout v0.0.6 · Nov 24, 2024

Add support for tables and convert tabular data to a pandas.DataFrame

Microsoft Presidio v2.2.352

Context-aware, pluggable and customizable PII de-identification and anonymization service for text and images, featuring a spaCy back-end.

🔌 prodigy-segment v0.1.0 · Dec 13, 2023

Select pixels in Prodigy via Meta’s “Segment Anything” model

Conquering PDFs: document understanding beyond plain text · PyCon DE & PyData

In this talk, Ines presents a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem.

🔌 prodigy-pdf v0.4.0 · Nov 25, 2024

Add text-based span annotation for PDFs

Prodigy-ANN for Image Retrieval via CLIP

Dealing with a huge bucket of images that you want to annotate? The new image retrieval features in Prodigy-ANN (approximate nearest neighbors) might help!

🔌 prodigy-pdf v0.1.0 · Oct 5, 2023

Annotate and segment PDF files and perform OCR

From PDFs to AI-ready structured data: a deep dive

This blog post presents a new modular workflow for converting PDFs and similar documents to structured data and shows you how to build end-to-end document understanding and information extraction pipelines for industry use cases.

🔌 prodigy-pdf v0.3.0 · Nov 18, 2024

Support multi-page PDFs in a single view

Prodigy-PDF for PDF annotation and OCR

Want to annotate PDF files? Our new Prodigy plugin can help with that! Vincent made a short demo video explaining how to use PDF segmentation and OCR.