Topic: Annotation · Explosion · Developer tools and consulting for AI, Machine Learning and NLP

Explosion builds developer tools for AI, Machine Learning and Natural Language Processing.

Filtered by topic: Annotation

Conquering PDFs: document understanding beyond plain text · PyData London

In this talk, Ines presents a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem.

✨ prodigy v1.18.0 · Feb 24, 2025

Text editing during NER and span annotation, custom translations and more JavaScript features

From PDFs to AI-ready structured data: a deep dive

This blog post presents a new modular workflow for converting PDFs and similar documents to structured data and shows you how to build end-to-end document understanding and information extraction pipelines for industry use cases.

🔌 prodigy-pdf v0.3.0 · Nov 18, 2024

Support multi-page PDFs in a single view

A practical guide to human-in-the-loop distillation

This blog post presents practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

Towards Structured Data: LLMs from Prototype to Production · U.S. Census Bureau: Center for Optimization and Data Science Seminar

This talk presents pragmatic and practical approaches for how to use LLMs beyond just chat bots, how to ship more successful NLP projects from prototype to production and how to use the latest state-of-the-art models in real-world applications.

🔌 prodigy-segment v0.1.0 · Dec 13, 2023

Select pixels in Prodigy via Meta’s “Segment Anything” model

Prodigy in 2023: LLMs, task routers, QA and plugins

We shipped a ton of updates to Prodigy this year with the v1.12, v1.13 and v1.14 releases, so we decided to write a post about them.

🔌 prodigy-hf v0.1.0 · Oct 23, 2023

Train Hugging Face models with Prodigy annotations

🔌 prodigy-pdf v0.1.0 · Oct 5, 2023

Annotate and segment PDF files and perform OCR

Models as annotators in Prodigy

How to use models and LLMs as annotators to find disagreements and prioritize examples to annotate first.

✨ prodigy v1.13.0 · Aug 15, 2023

LLM support for NER, text classification and span categorization

How Good is the Model in Model-in-the-loop Event Coreference Resolution Annotation? · Ahmed, Nath, Regan, Pollins, Krishnaswamy, Martin (2023)

Figure 6 illustrates the interface design of the annotation methodology on the popular model-in-the-loop annotation tool - Prodigy. We use this tool for the simplicity it offers in plugging in the various ranking methods we explained.

Finetuning and Bulk Labelling Images with Prodigy

In this video, we’ll show how you might be able to improve the annotation experience by using bulk labelling for image classification.

Finding Bad Labels for Text Classification with Jupyter and Prodigy

In this video, we’ll show you how to set up Prodigy to find bad labels in text classification tasks. While many of the techniques are applied to text classification, they can also be used for classification tasks in general.

skweak v0.3.1

Weak supervision with flexible labelling functions and aggregation, integrated with spaCy.

Image Captioning with Prodigy & PyTorch

In this video, we’ll show you how you can use Prodigy to script fully custom annotation workflows in Python, how to plug in your own machine learning models and how to mix and match different interfaces for your specific use case.

✨ prodigy v1.9.0 · Dec 18, 2019

Custom UI blocks, text input UI, better training and data conversion

Training a new entity type with Prodigy – annotation powered by active learning

In this video, we’ll show you how to use Prodigy to train a phrase recognition system for a new concept. Specifically, we’ll train a model to detect references to drugs, using text from Reddit.

Supervised learning is great — it's data collection that's broken

Short of Artificial General Intelligence, we'll always need some way of specifying what we're trying to compute. Labelled examples are a great way to do that, but the process is often tedious. However, the dissatisfaction with supervised learning is misplaced. Instead of waiting for the unsupervised messiah to arrive, we need to fix the way we're collecting and reusing human knowledge.

Conquering PDFs: document understanding beyond plain text · PyCon DE & PyData

In this talk, Ines presents a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem.

Prodigy Dashboard Plugin

The new dashboard plugin adds a web application for managing annotations, data analytics and annotation progress, and is now available for early beta testing.

Serverless custom NLP with LLMs, Modal and Prodigy

In this blog post, we’ll show you how you can go from an idea and little data to a fully custom information extraction model using Prodigy and Modal, no infrastructure or GPU setup required.

✨ prodigy v1.16.0 · Oct 22, 2024

Modal plugin for on-demand deployment, cross-platform wheels and UI fixes

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

A case study on S&P Global’s efficient information extraction pipelines for real-time commodities trading insights in a high-security environment.

ZenML v0.58.0

New out-of-the-box Prodigy integration in ZenML for LLMs and beyond, to make data development and annotation a core part of your MLOps lifecycle.

✨ prodigy v1.14.12 · Dec 13, 2023

Audio UI improvements, resetQueue callback, prodigy-segment plugin

Who said what: using machine learning to correctly attribute quotes · The Guardian Engineering Blog

How the Guardian uses spaCy and Prodigy to train a custom coreference resolution model.

✨ prodigy v1.14.3 · Oct 6, 2023

Inter-annotator agreement for document-level and token-level annotations, new plugins

✨ prodigy v1.14.1 · Sep 29, 2023

Custom event hooks for custom UI interactivity

Task Routers in Prodigy

How to use the new task routers to customize how examples are assigned in multi-annotator workflows.

✨ prodigy v1.12.0 · Jul 5, 2023

LLM-assisted workflows for annotation and prompt engineering, task routing for multi-annotator setups

Rulers, NER, and data iteration

About the power of Rules + ML and the importance of iteration on your pipeline and your data.

How the Guardian approaches quote extraction with NLP

A case study of the Guardian's spaCy-Prodigy workflow to modularize quote extraction for content creation. This study includes iterative annotation guidelines and custom interface functionality.

Diary of a spaCy project: Predicting GitHub Tags

Many people assume that working on an NLP project involves a lot of machine learning. Our experience is that it's much less about flowing tensors, and more about making a tailored solution. This blog post demonstrates how a typical spaCy project could be initiated, implemented and executed towards a custom solution.

Talking sense: using machine learning to understand quotes · The Guardian Blog

How the Guardian uses spaCy and Prodigy to train a machine learning model that helps extract quotes from news articles and match them to the correct source.

Training a Named Entity Recognition Model with Prodigy and Transfer Learning

In this video, we’ll show you how to use Prodigy to train a named entity recognition model from scratch, by taking advantage of semi-automatic annotation and modern transfer learning techniques.

✨ prodigy v1.8.0 · May 20, 2019

Support for spaCy v2.1, basic auth, multi-user sessions, review workflow & more

Training an insults classifier with Prodigy in ~1 hour

In this video, we’ll show you how to use Prodigy to train a classifier to detect disparaging or insulting comments. Prodigy makes text classification particularly powerful, because you can try out new ideas very quickly.

How to advocate for modular NLP in the age of Generative AI

With all the hype around Generative AI, many are led to believe it’s the solution to everything. So how can you, as a developer, communicate the nuances and advocate for new and modular solutions that are better, easier and cheaper?

🔌 prodigy-pdf v0.4.0 · Nov 25, 2024

Add text-based span annotation for PDFs

Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation · InfoQ Dev Summit

LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, Ines shows some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

How to uncover and avoid structural biases in evaluating your Machine Learning/NLP projects · PyData London

This talk highlights common pitfalls that occur when evaluating ML and NLP approaches. It provides comprehensive advice on how to set up a solid evaluation procedure in general, and dives into a few specific use cases to demonstrate artificial bias that can creep in unnoticed.

🔌 prodigy-evaluate v0.1.0 · Mar 26, 2024

Evaluate spaCy pipelines, print confusion matrices and more

How Nesta uses NLP to process 7m job ads and shed light on the UK’s labor market

A case study on Nesta’s workflow for processing 7 million job ads to better understand UK skill demand, using a custom mapping step to match skills to any government taxonomy.

🔌 prodigy-whisper v0.1.0 · Nov 12, 2023

Audio transcription with OpenAI’s Whisper model in the loop

Prodigy-ANN for Image Retrieval via CLIP

Dealing with a huge bucket of images that you want to annotate? The new image retrieval features in Prodigy-ANN (approximate nearest neighbors) might help!

🔌 prodigy-lunr v0.1.0 · Oct 5, 2023

Document search via LUNR to fetch relevant data subsets to label

✨ prodigy v1.13.2 · Sep 7, 2023

New LLM recipes for terms generation and prompt engineering

ACL LAW Workshop Poster · ACL 2023

Training spaCy NER Models with Prodigy

This handy flowchart contains our most common tips, tricks, and best practices for training and updating spaCy named entity recognition models with Prodigy.

Finding Video Games with Sense2Vec

In this video, we’ll show how you can improve the annotation experience by leveraging sense2vec to pre-fill named entities.

Finding Bad Image Data using UMAP and Prodigy

In this video, we’ll show you how to use Prodigy to find bad examples in the Google QuickDraw dataset. We will be leveraging a technique that involves UMAP to find strange images semi-automatically.

✨ prodigy v1.11.0 · Aug 12, 2020

spaCy v3 support, annotation for overlapping and nested spans, better installation & more

Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence · Halterman, Keith, Sarwar, O’Connor (2021), ACL 2021

Figure A2 shows a stylized version of the custom interface we built using the Prodigy annotation tool. Annotators are presented with an entire document, with sentences sequentially highlighted.

FAQ #1: Tips & tricks for NLP, annotation & training with Prodigy and spaCy

In this video, Ines talks about a few frequently asked questions and shares some general tips and tricks for how to structure your NLP annotation projects, how to design your label schemes and how to solve common problems.

Building Prodigy: Our new tool for efficient machine teaching · ines.io

The philosophy behind Prodigy’s features and its cloud-free design.

How Love Without Sound helps the music industry recover millions in revenue for artists with NLP, spaCy and Prodigy

A case study on Love Without Sound’s innovative AI-powered tools for the music industry and law firms specializing in royalty negotiations.

✨ prodigy v1.17.0 · Nov 18, 2024

Pages UI for multi-page tasks like longer documents, PDFs or collections of images

Practical Tips for Bootstrapping Information Extraction Pipelines · DataHack Summit

This talk presents approaches for bootstrapping NLP pipelines and retrieval via information extraction, including tips for training, modelling and data annotation.

Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation · PyData London

LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, Ines shows some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

✨ prodigy v1.15.0 · Feb 15, 2024

New company plugins and support for SSO

Prodigy-Segment for Pixel Segmentation

Use Meta’s “Segment Anything” model in Prodigy to help you select the right pixels in images.

✨ prodigy v1.14.5 · Oct 24, 2023

Toggle for character vs. token highlighting, CSS and JS from local and remote paths

Prodigy-PDF for PDF annotation and OCR

Want to annotate PDF files? Our new Prodigy plugin can help with that! To explain how to use PDF segmentation and OCR, Vincent made a small demo video.

🔌 prodigy-ann v0.1.0 · Oct 5, 2023

Use ANN techniques to fetch relevant data subsets to label

✨ prodigy v1.13.1 · Aug 23, 2023

Use models and LLMs as annotators to find disagreements

Prodigy v1.12: OpenAI integration, prompt engineering, task routers, deployment docs and more

Custom Interfaces with blocks

You can create custom annotation layouts in Prodigy by combining its built-in annotation widgets with the blocks feature. This video explains how to use this feature by building a custom interface for manually annotating and transcribing audio.

Bulk Labelling and Prodigy

In this video, we’ll show a bulk labelling technique that can help you prepare data for Prodigy.

Finding Duplicates in Tabular Data with Jupyter and Prodigy

In this video, we’ll show you how to use Prodigy to train a named entity recognition model from scratch, by taking advantage of semi-automatic annotation and modern transfer learning techniques.

Prodigy v1.10: Dependencies, relations, audio, video & more

Version 1.10 of Prodigy adds tons of new features, including manual dependency and relation annotation, audio and video annotation, a new and improved image UI, new recipe callbacks, more settings for manual NER, plus various new config options and settings.

✨ prodigy v1.10.0 · Jun 16, 2020

Dependency and relation annotation, audio, video, character-based NER & more

Rapid NLP annotation · Data Science Summit

This talk presents a fast, flexible and even somewhat fun approach to named entity annotation. Using our approach, a model can be trained for a new entity type in only a few hours, starting from only a feed of unannotated text and a handful of seed terms.

Prodigy: A new tool for radically efficient machine teaching

Machine learning systems are built from both code and data. It's easy to reuse the code but hard to reuse the data, so building AI mostly means doing annotation. This is good, because the examples are how you program the behaviour – the learner itself is really just a compiler. What's not good is the current technology for creating the examples. That's why we're pleased to introduce Prodigy, a downloadable tool for radically efficient machine teaching.