Project: Prodigy · Explosion · Developer tools and consulting for AI, Machine Learning and NLP

Filtered by project: Prodigy

The ultimate guide to optimizing annotation workflows

This blog post collects tips and advice for how to build efficient human-in-the-loop data development workflows, break down business problems into actionable annotation steps and make the most of automation and model assistance.

Conquering PDFs: document understanding beyond plain text

PyCon DE & PyData

In this talk, Ines presents a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem.

✨ prodigy v1.18.0 · Feb 24, 2025

Text editing during NER and span annotation, custom translations and more JavaScript features

From PDFs to AI-ready structured data: a deep dive

This blog post presents a new modular workflow for converting PDFs and similar documents to structured data and shows you how to build end-to-end document understanding and information extraction pipelines for industry use cases.

🔌 prodigy-pdf v0.3.0 · Nov 18, 2024

Support multi-page PDFs in a single view

Applied NLP in the Age of Generative AI

PyData Amsterdam Keynote

In this talk, Ines shares the most important lessons we’ve learned from solving real-world information extraction problems in industry, and shows you a new approach and mindset for designing robust and modular NLP pipelines in the age of Generative AI.

Practical Tips for Bootstrapping Information Extraction Pipelines

DataHack Summit

This talk presents approaches for bootstrapping NLP pipelines and retrieval via information extraction, including tips for training, modelling and data annotation.

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

A case study on S&P Global’s efficient information extraction pipelines for real-time commodities trading insights in a high-security environment.

Prodigy-Segment for Pixel Segmentation

Use Meta’s “Segment Anything” model in Prodigy to help you select the right pixels in images.

🔌 prodigy-segment v0.1.0 · Dec 13, 2023

Select pixels in Prodigy via Meta’s “Segment Anything” model

Who said what: using machine learning to correctly attribute quotes

The Guardian Engineering Blog

How the Guardian uses spaCy and Prodigy to train a custom coreference resolution model.

Half hour of labeling power: Can we beat GPT?

PyData NYC

Large Language Models (LLMs) offer a lot of value for modern NLP and can typically achieve surprisingly good accuracy on predictive NLP tasks. But can we do even better than that? In this workshop we show how to use LLMs at development time to create high-quality datasets and train specific, smaller, private and more accurate models for your business problems.

🔌 prodigy-hf v0.1.0 · Oct 23, 2023

Train Hugging Face models with Prodigy annotations

Identifying Signs and Symptoms of Urinary Tract Infection from Emergency Department Clinical Notes Using Large Language Models

Iscoe, Socrates, Gilson, Chi, Li, Huang, Kearns, Perkins, Khandjian, Taylor (2023)

For annotation we employed Prodigy, a scriptable annotation tool designed to maximize efficiency, enabling data scientists to perform the annotation tasks themselves and facilitating rapid iterative development in natural language processing (NLP) projects.

🔌 prodigy-pdf v0.1.0 · Oct 5, 2023

Annotate and segment PDF files and perform OCR

✨ prodigy v1.13.0 · Aug 15, 2023

LLM support for NER, text classification and span categorization

Task Routers in Prodigy

How to use the new task routers to customize how examples are assigned in multi-annotator workflows.

Inter-rater agreement for the annotation of neurologic signs and symptoms in electronic health records

Oommen, Howlett-Prieto, Carrithers, Hier (2023)

Prodigy was used to annotate neurologic concepts in the EHR physician notes.

Large Disagreement Modelling

“In this blogpost I’d like to talk about large language models. There’s a bunch of hype, sure, but there’s also an opportunity to revisit one of my favourite machine learning techniques: disagreement.”

Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks

Halterman, Schrodt, Beger, Bagozzi, Scarborough (2023)

While in the past the process of generating training case has been quite time consuming and tedious, newer approaches such as those incorporated into the web-based Prodigy annotation system allow this to be done much more quickly.

Deploying a Prodigy cloud service for Posh’s financial chatbots

A Prodigy case study of Posh AI's production-ready annotation platform and custom chatbot annotation tasks for banking customers.

Extracting Structured Information from Greek Legislation Data

Alexios (2023)

Worth noting is the existence of an application, called Prodigy, which takes advantage of an active learning framework and provides users with an interactive interface for data annotation.

Finetuning and Bulk Labelling Images with Prodigy

In this video, we’ll show how you might be able to improve the annotation experience by using bulk labelling for image classification.

Bulk Labelling and Prodigy

In this video, we’ll show a bulk labelling technique that can help you prepare data for Prodigy.

Finding Bad Image Data using UMAP and Prodigy

In this video, we’ll show you how to use Prodigy to find bad examples in the Google QuickDraw dataset. We will be leveraging a technique that involves UMAP to find strange images semi-automatically.

Healthsea: an end-to-end spaCy pipeline for exploring health supplement effects

Create better access to health with machine learning and natural language processing. Read about our journey of developing Healthsea, an end-to-end spaCy pipeline for analyzing user reviews of supplement products and extracting potential effects on health.

How We Analyzed Google’s Search Results

The Markup

Using the Prodigy annotation tool, we created a user interface and a coder manual for two annotators to spot-check 741 stained images randomly sampled from our dataset.

Training a Named Entity Recognition Model with Prodigy and Transfer Learning

In this video, we’ll show you how to use Prodigy to train a named entity recognition model from scratch, by taking advantage of semi-automatic annotation and modern transfer learning techniques.

✨ prodigy v1.9.0 · Dec 18, 2019

Custom UI blocks, text input UI, better training and data conversion

FAQ #1: Tips & tricks for NLP, annotation & training with Prodigy and spaCy

In this video, Ines talks about a few frequently asked questions and shares some general tips and tricks for how to structure your NLP annotation projects, how to design your label schemes and how to solve common problems.

Explosion in 2017: Our Year in Review

We founded Explosion in October 2016, so this was our first full calendar year in operation. We set ourselves ambitious goals this year, and we're very happy with how we achieved them. Here's what we got done.

Prodigy: A new tool for radically efficient machine teaching

Machine learning systems are built from both code and data. It's easy to reuse the code but hard to reuse the data, so building AI mostly means doing annotation. This is good, because the examples are how you program the behaviour – the learner itself is really just a compiler. What's not good is the current technology for creating the examples. That's why we're pleased to introduce Prodigy, a downloadable tool for radically efficient machine teaching.

RiCoRecA: rich cooking recipe annotation schema

Ventirozos, Jacob-Romero, Alrdahi, Clinch, Batista-Navarro (2026)

The annotation process consists of two sections. Firstly, the annotator utilized a customized Prodigy interface to complete the NER and RC annotation tasks.

How Love Without Sound helps the music industry recover millions in revenue for artists with NLP, spaCy and Prodigy

A case study on Love Without Sound’s innovative AI-powered tools for the music industry and law firms specializing in royalty negotiations.

Prodigy Dashboard Plugin

The new dashboard plugin adds a web application for managing annotations, data analytics and annotation progress, and is now available for early beta testing.

Serverless custom NLP with LLMs, Modal and Prodigy

In this blog post, we’ll show you how you can go from an idea and little data to a fully custom information extraction model using Prodigy and Modal, no infrastructure or GPU setup required.

✨ prodigy v1.16.0 · Oct 22, 2024

Modal plugin for on-demand deployment, cross-platform wheels and UI fixes

How GitLab uses spaCy to analyze support tickets and empower their community

A case study on GitLab’s large-scale NLP pipelines for extracting actionable insights from support tickets and usage questions.

Back to our roots: Company update and future plans

We’re back to running Explosion as a smaller, independent-minded and self-sufficient company. spaCy and Prodigy will stay stable and sustainable, maintained by their original authors. We’ll keep updating our stack with the latest technologies, without changing its core identity or purpose.

Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation

PyData London

LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, Ines shows some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

✨ prodigy v1.14.12 · Dec 13, 2023

Audio UI improvements, resetQueue callback, prodigy-segment plugin

On the Creation of Classifiers to Support Assessment of E-Portfolios

Gantikow, Isking, Libbrecht, Müller, Rebholz (2023)

In this workflow, Prodigy selects and presents text examples that were classified with a very low degree of certainty. The annotator reviews the proposed classifications and corrects them, if necessary.
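The selection step described in this quote resembles uncertainty sampling. The following is a minimal sketch of that general idea in plain Python, not Prodigy's actual implementation: the helper names `uncertainty` and `select_uncertain`, the example texts and the scores are all hypothetical, and the sketch assumes a binary classifier whose score lies between 0 and 1.

```python
from typing import List, Tuple

Example = Tuple[str, float]  # (text, classifier score between 0.0 and 1.0)


def uncertainty(score: float) -> float:
    """Return 1.0 for a score of exactly 0.5 and 0.0 for a score of 0.0 or 1.0."""
    return 1.0 - abs(score - 0.5) * 2.0


def select_uncertain(examples: List[Example], k: int) -> List[Example]:
    """Return the k examples whose scores are closest to the 0.5 boundary."""
    ranked = sorted(examples, key=lambda ex: uncertainty(ex[1]), reverse=True)
    return ranked[:k]


# Hypothetical scored examples, made up for illustration only.
examples = [
    ("Great reflection on teamwork", 0.97),
    ("Met the mentor twice", 0.52),
    ("See appendix", 0.08),
    ("Learned a lot, I think", 0.41),
]

# The two least certain examples go to the annotator first.
queue = select_uncertain(examples, k=2)
for text, score in queue:
    print(f"{score:.2f}  {text}")
```

An annotator would then review and correct the proposed labels for the examples in `queue`, which is the review step the quote mentions.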

Impoliteness and morality as instruments of destructive informal social control in online harassment targeting Swedish journalists

Björkenfeldt, Gustafsson (2023)

In the annotation tool Prodigy used for this process, the tweets directed towards journalists were displayed alongside the initial tweet that initiated the conversation thread and the subsequent reply from the journalist.

Prodigy-ANN for Image Retrieval via CLIP

Dealing with a huge bucket of images that you want to annotate? The new image retrieval features in Prodigy-ANN (approximate nearest neighbors) might help!

✨ prodigy v1.14.3 · Oct 6, 2023

Inter-annotator agreement for document-level and token-level annotations, new plugins

✨ prodigy v1.14.1 · Sep 29, 2023

Custom event hooks for custom UI interactivity

Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts

Dannenfelser, Zhong, Zhang, Yao (2023), NeurIPS

Tissue, cell type, tool, and method were annotated using the Prodigy software tool developed by Explosion AI for easy tracking of token-level tags.

ACL LAW Workshop Poster

ACL 2023

✨ prodigy v1.12.0 · Jul 5, 2023

LLM-assisted workflows for annotation and prompt engineering, task routing for multi-annotator setups

How Good is the Model in Model-in-the-loop Event Coreference Resolution Annotation?

Ahmed, Nath, Regan, Pollins, Krishnaswamy, Martin (2023)

Figure 6 illustrates the interface design of the annotation methodology on the popular model-in-the-loop annotation tool - Prodigy. We use this tool for the simplicity it offers in plugging in the various ranking methods we explained.

You are what you read: Building a personal internet front-page with spaCy and Prodigy

PyCon DE & PyData Berlin

Rulers, NER, and data iteration

About the power of Rules + ML and the importance of iteration on your pipeline and your data.

Explosion in 2022: Our Year in Review

It's been another exciting year at Explosion! We've developed a new end-to-end neural coref component for spaCy, improved the speed of our CNN pipelines up to 60%, and published new pre-trained pipelines for Finnish, Korean, Swedish and Croatian. We've also released several updates to Prodigy and introduced new recipes to kickstart annotation with zero- or few-shot learning.

Setting your ML project up for success

“What can you do to maximize probability of success for your Machine Learning solution? Throughout my 15 years as data scientist in academia, big pharma and through consulting, one common theme has emerged: the most reliable predictor of success for any NLP or ML-based solution is whether or not you involve the data science team early on.”

How the Guardian approaches quote extraction with NLP

A case study of the Guardian's spaCy-Prodigy workflow to modularize quote extraction for content creation. This study includes iterative annotation guidelines and custom interface functionality.

Introducing Span Categorization in Prodigy and spaCy

In this video, we’ll show you how to use Prodigy for spaCy’s Span Categorizer. We’ll be annotating food recipes and looking into ways to help with consistent annotations and speed up the process with patterns and temporary models.

Automated Identification of Clinical Procedures in Free-Text Electronic Clinical Records with a Low-Code Named Entity Recognition Workflow

Macri, Teoh, Bacchi, Sun, Selva, Casson, Chan (2022), Methods of Information in Medicine

The use of a low-code annotation software tool [Prodigy] allows the rapid creation of a custom annotation dataset to train a NER model to identify clinical procedures stored in free-text electronic clinical notes.

Talking sense: using machine learning to understand quotes

The Guardian Blog

How the Guardian uses spaCy and Prodigy to train a machine learning model that helps extract quotes from news articles and match them to the correct source.

Prodigy v1.10: Dependencies, relations, audio, video & more

Version 1.10 of Prodigy includes tons of new features, including manual dependency and relation annotation, audio and video annotation, a new and improved image UI, new recipe callbacks, more settings for manual NER, plus various new config options and settings.

Explosion in 2019: Our Year in Review

As 2019 draws to a close and we step into the 2020s, we thought we’d take a look back at the year and all we’ve accomplished. And we realized we had so much that we could give you a month-by-month rundown of everything that happened.

✨ prodigy v1.8.0 · May 20, 2019

Support for spaCy v2.1, basic auth, multi-user sessions, review workflow & more

Practical transfer learning for NLP with spaCy and Prodigy

Applied Machine Learning Days

Training a new entity type with Prodigy – annotation powered by active learning

In this video, we’ll show you how to use Prodigy to train a phrase recognition system for a new concept. Specifically, we’ll train a model to detect references to drugs, using text from Reddit.

Engineering a human-aligned LLM evaluation workflow with Prodigy and DSPy

This post demonstrates a human-in-the-loop workflow for developing and evaluating LLMs, using Prodigy and DSPy to create task-specific, human-aligned metrics that guide model optimization beyond generic evaluation measures.

Using natural language processing to identify emergency department patients with incidental lung nodules requiring follow-up

Moore, Socrates, Hesami, Denkewicz, Cavallo, Venkatesh, Taylor (2025)

CT reports were annotated by MD raters using Prodigy software to develop a stepwise NLP “pipeline” that first excluded prior or known malignancy, determined the presence of a lung nodule, and then categorized any recommended follow-up. NLP was developed using a RoBERTa large language model on the spaCy platform.

🔌 prodigy-pdf v0.4.0 · Nov 25, 2024

Add text-based span annotation for PDFs

Applied NLP with LLMs: Beyond Black-Box Monoliths

PyBerlin

In this talk, Ines shows some practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components.

10 Years of Open Source: Navigating the Next AI Revolution

EuroSciPy Keynote

In this talk, Ines shares the most important lessons we’ve learned in 10 years of working on open-source software, our core philosophies that helped us adapt to an ever-changing AI landscape and why open source and interoperability still wins over black-box, proprietary APIs.

Building the Future of NLP: Insights on spaCy, Prodigy and Generative AI

Leading With Data Podcast

ZenML v0.58.0

New out-of-the-box Prodigy integration in ZenML for LLMs and beyond, to make data development and annotation a core part of your MLOps lifecycle.

🔌 prodigy-evaluate v0.1.0 · Mar 26, 2024

Evaluate spaCy pipelines, print confusion matrices and more

Prodigy in 2023: LLMs, task routers, QA and plugins

We made a ton of updates to Prodigy this year with the v1.12, v1.13 and v1.14 releases, so we decided to write a post about them.

Developing a Named Entity Recognition Dataset for Tagalog

Miranda (2023), IJCNLP-AACL 2023

We used Prodigy as our annotation tool. We set up a web server on the Google Cloud Platform and routed the examples through Prodigy’s built-in task router.

🔌 prodigy-whisper v0.1.0 · Nov 12, 2023

Audio transcription with OpenAI’s Whisper model in the loop

Prodigy-PDF for PDF annotation and OCR

Want to annotate PDF files? Our new Prodigy plugin can help with that! To explain how to use PDF segmentation and OCR, Vincent made a small demo video.

🔌 prodigy-lunr v0.1.0 · Oct 5, 2023

Document search via LUNR to fetch relevant data subsets to label

✨ prodigy v1.13.2 · Sep 7, 2023

New LLM recipes for terms generation and prompt engineering

Models as annotators in Prodigy

How to use models and LLMs as annotators to find disagreements and prioritize examples to annotate first.
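The idea of prioritizing by disagreement can be sketched in a few lines of plain Python. This is an illustration only and does not use Prodigy's API: the `disagreement` and `prioritize` helpers, the annotator names and the task data are hypothetical, and the score used here (the share of annotators that deviate from the majority label) is just one simple way to measure disagreement.

```python
from collections import Counter
from typing import Dict, List


def disagreement(labels: Dict[str, str]) -> float:
    """Share of annotators whose label differs from the majority label."""
    counts = Counter(labels.values())
    majority_size = counts.most_common(1)[0][1]
    return 1.0 - majority_size / len(labels)


def prioritize(tasks: List[dict]) -> List[dict]:
    """Sort tasks so that the ones with the most disagreement come first."""
    return sorted(tasks, key=lambda task: disagreement(task["labels"]), reverse=True)


# Hypothetical tasks: each maps an annotator name (human, model or LLM) to a label.
tasks = [
    {"text": "Works as advertised.",
     "labels": {"human": "POS", "model": "POS", "llm": "POS"}},
    {"text": "Not bad, I guess.",
     "labels": {"human": "POS", "model": "NEG", "llm": "POS"}},
    {"text": "Well, that happened.",
     "labels": {"human": "NEU", "model": "NEG", "llm": "POS"}},
]

# Examples where the annotators disagree most are shown first.
for task in prioritize(tasks):
    print(f"{disagreement(task['labels']):.2f}  {task['text']}")
```

Examples on which every annotator agrees end up at the back of the queue, so human attention goes to the contested cases first.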

Prodigy v1.12: OpenAI integration, prompt engineering, task routers, deployment docs and more

Large Language Models: From Prototype to Production

PyData London Keynote

Incorporating LLMs into practical NLP workflows

PyCon DE & PyData Berlin

Slovak Dataset for Multilingual Question Answering

Hládek, Staš, Juhár, Koctúr (2023)

We used the Prodigy annotation tool to annotate the questions and answers. One annotation task corresponds to one web application deployment and different configurations.

Robust solutions with Explosion’s applied NLP philosophy

UNC Charlotte

Custom Interfaces with blocks

You can create custom annotation layouts in Prodigy by combining its built-in annotation widgets with the blocks feature. This video explains how to use this feature by building a custom interface that can manually annotate and transcribe audio.

Finding Video Games with Sense2Vec

In this video, we’ll show how you can improve the annotation experience by leveraging sense2vec to pre-fill named entities.

Finding Bad Labels for Text Classification with Jupyter and Prodigy

In this video, we’ll show you how to set up Prodigy to find bad labels in text classification tasks. While many of the techniques are applied to text classification, they can also be used for classification tasks in general.

Finding Duplicates in Tabular Data with Jupyter and Prodigy

In this video, we’ll show you how to use Jupyter and Prodigy to find duplicates in tabular data.

✨ prodigy v1.11.0 · Aug 12, 2021

spaCy v3 support, annotation for overlapping and nested spans, better installation & more

Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence

Halterman, Keith, Sarwar, O’Connor (2021), ACL 2021

Figure A2 shows a stylized version of the custom interface we built using the Prodigy annotation tool. Annotators are presented with an entire document, with sentences sequentially highlighted.

Identifying Predictors of Suicide in Severe Mental Illness: A Feasibility Study of a Clinical Prediction Rule

Senior, Burghart, Yu, Kormilitzin, Liu, Vaci, Nevado-Holgado, Pandit, Zlodre, Fazel (2020)

The named entity recognition model was developed in two phases: 1) training with “gold-standard” annotations collected with GATE and 2) model fine-tuning with Prodigy, an active learning-based annotation tool.

sense2vec reloaded: contextually-keyed word vectors

In 2016 we trained a sense2vec model on the 2015 portion of the Reddit comments corpus, leading to a useful library and one of our most popular demos. That work is now due for an update. In this post, we present a new version and a demo NER project that we trained to usable accuracy in just a few hours.

Building new NLP solutions with spaCy and Prodigy

PyData Berlin

“Commercial machine learning projects are currently like start-ups: many projects fail, but some are extremely successful, justifying the total investment. While some people will tell you to embrace failure, I say failure sucks — so what can we do to fight it? In this talk, I will discuss how to address some of the most likely causes of failure for new NLP projects.”

Training an insults classifier with Prodigy in ~1 hour

In this video, we’ll show you how to use Prodigy to train a classifier to detect disparaging or insulting comments. Prodigy makes text classification particularly powerful, because you can try out new ideas very quickly.

Conquering PDFs: document understanding beyond plain text

PyData London

In this talk, Ines presents a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem.

Recognising non-named spatial entities in literary texts: a novel spatial entities classifier

Kababgi, Grisot, Pennino, Herrmann (2024)

In this paper, we present a case study on the prediction of what we call ‘non-named spatial entities’ (NNSE) in a historical corpus of Swiss-German novels using a deep learning model in conjunction with BERT and Prodigy.

✨ prodigy v1.17.0 · Nov 18, 2024

Pages UI for multi-page tasks like longer documents, PDFs or collections of images

Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation

InfoQ Dev Summit

LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, Ines shows some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

The NLP and AI Revolution with the spaCy Creators

Vanishing Gradients

In this interview with Hugo Bowne-Anderson, we delve into the forefront of NLP and the future of AI development, covering topics like human-in-the-loop distillation, open-source AI and Explosion’s journey.

A practical guide to human-in-the-loop distillation

This blog post presents practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

✨ prodigy v1.15.0 · Feb 15, 2024

New company plugins and support for SSO

How Nesta uses NLP to process 7m job ads and shed light on the UK’s labor market

A case study on Nesta’s workflow for extracting skills from 7 million job ads to better understand UK skill demand, using a custom mapping step to match skills to any government taxonomy.

Neuradicon: operational representation learning of neuroimaging reports

Watkins, Gray, Julius, Mah, Pinaya, Wright, Jha, Engleitner, Cardoso, Ourselin, Rees, Jaeger, Nachev (2023)

Labelled data for each task was produced using the Prodigy labelling tool. Each report was labelled in a paired-annotation manner. [...] We used the grammatical dependency parse produced by the spaCy parser as input and implemented the patterns using the spaCy dependency matcher.

Introducing Prodigy-HF

Hugging Face Blog

Last week, Explosion introduced Prodigy-HF, a new Prodigy plugin offering code recipes that directly integrate with the Hugging Face stack.

✨ prodigy v1.14.5 · Oct 24, 2023

Toggle for character vs. token highlighting, CSS and JS from local and remote paths

Toward a Critical Toponymy Framework for Named Entity Recognition: A Case Study of Airbnb in New York City

Brunila, LaViolette, CH-Wang, Verma, Féré, McKenzie (2023), EMNLP 2023

All annotation was performed using Prodigy following an initial training session where annotators collaboratively annotated a randomly chosen set of samples.

🔌 prodigy-ann v0.1.0 · Oct 5, 2023

Use ANN techniques to fetch relevant data subsets to label

✨ prodigy v1.13.1 · Aug 23, 2023

Use models and LLMs as annotators to find disagreements

Large Language Models: From Prototype to Production

EuroPython Keynote

Large Language Models (LLMs) have shown some impressive capabilities and their impact is the topic of the moment. In this talk, Ines presents visions for NLP in the age of LLMs and a pragmatic, practical approach for how to use Large Language Models to ship more successful NLP projects from prototype to production today.

Concepts and measures of bureaucratic constraints in European Union laws from hand-coding to machine-learning

Franchino, Migliorati, Pagano, Vignoli (2023)

The models “learn” the relations between the text tokens and the entity categories from two randomly selected samples of sentences that are extracted from a pre-processed corpus and have been manually annotated using the Python-implemented platform “Prodigy”.

SpanCat with spaCy and Prodigy on real data

YouTube series by WJB Mattingly showing an end-to-end project, from cultivating and annotating data to training, testing and visualizing a model.

Predicting relations between SOAP note sections: The value of incorporating a clinical information model

Socrates, Gilson, Lopez, Chi, Taylor, Chartash (2023), Journal of Biomedical Informatics

To support human annotation, we first annotate 100 Assessment and Plan subsections manually using Prodigy, and then use spacy-transformers to fine-tune a general domain RoBERTa-base model pretrained on OntoNotes 5 for both the Assessment and Plan section NER tagging.

Fiscal data in text: Information extraction from audit reports using Natural Language Processing

Beltran (2023), Data & Policy, Cambridge University Press

I relied on the text annotation software Prodigy in Python that offers a friendly user interface where the reviewer can read the text and assign a label to each paragraph.

Training spaCy NER Models with Prodigy

This handy flowchart contains our most common tips, tricks, and best practices for training and updating spaCy named entity recognition models with Prodigy.

The triangulation of ethical leader signals using qualitative, experimental, and data science methods

Banks, Ross, Toth, Tonidandel, Goloujeh, Dou, Wesslen (2022)

This additional text was labeled by the same coding team using Prodigy, [...] a flexible user interface tool built on top of spaCy, a leading open source library in python for natural language processing. We created a spaCy end‐to‐end project workflow including package versioning, data pre‐processing, data ingestion into a database, annotation sessions using Prodigy’s user interface, model training, model evaluation, python packaging, and visual app for testing the model.

Speech acts in the Dutch COVID-19 Press Conferences

Schueler, Marx (2022), Language Resources and Evaluation

We used the annotation tool Prodigy. Prodigy provides a simple interface in which the annotator sees a sentence and selects the applicable speech acts. The use of Prodigy considerably sped up the annotation process, allowing the annotators to annotate around 200 sentences per hour.

Diary of a spaCy project: Predicting GitHub Tags

Many people assume that working on an NLP project involves a lot of machine learning. Our experience is that it's much less about flowing tensors, and more about making a tailored solution. This blog post demonstrates how a typical spaCy project could be initiated, implemented and executed towards a custom solution.

Explosion in 2021: Our Year in Review

The year 2021 is coming to an end, and like the previous year, it was shaped by unique challenges that impacted our work together. For Explosion, it was a very productive year. We found an investor that fits our strategy, the work on Prodigy Teams is in full swing, and the team has grown a lot. So here's our look back at our highlights of the year 2021.

Explosion in 2020: Our Year in Review

While 2020 hasn’t been easy for anyone, at Explosion we’ve considered ourselves relatively fortunate in this most interesting year. We’ve always worked remotely, so we’ve been able to take both pride and comfort in continuing to ship good software. Here’s a look back at what we’ve been up to.

✨ prodigy v1.10.0 · Jun 16, 2020

Dependency and relation annotation, audio, video, character-based NER & more

Image Captioning with Prodigy & PyTorch

In this video, we’ll show you how you can use Prodigy to script fully custom annotation workflows in Python, how to plug in your own machine learning models and how to mix and match different interfaces for your specific use case.

Practical transfer learning for NLP with spaCy and Prodigy

Infoshare

Rapid NLP annotation

Data Science Summit

This talk presents a fast, flexible and even somewhat fun approach to named entity annotation. Using our approach, a model can be trained for a new entity type in only a few hours, starting from only a feed of unannotated text and a handful of seed terms.

Building Prodigy: Our new tool for efficient machine teaching

ines.io

The philosophy behind Prodigy’s features and its cloud-free design.