Task: Text Classification · Explosion · Developer tools and consulting for AI, Machine Learning and NLP

Filtered by task: Text Classification

How Love Without Sound helps the music industry recover millions in revenue for artists with NLP, spaCy and Prodigy

A case study on Love Without Sound’s innovative AI-powered tools for the music industry and law firms specializing in royalty negotiations.

How GitLab uses spaCy to analyze support tickets and empower their community

A case study on GitLab’s large-scale NLP pipelines for extracting actionable insights from support tickets and usage questions.

Towards Structured Data: LLMs from Prototype to Production

U.S. Census Bureau: Center for Optimization and Data Science Seminar

This talk presents pragmatic and practical approaches for how to use LLMs beyond just chat bots, how to ship more successful NLP projects from prototype to production and how to use the latest state-of-the-art models in real-world applications.

Prodigy in 2023: LLMs, task routers, QA and plugins

We have made a ton of updates to Prodigy this year with the v1.12, v1.13 and v1.14 releases, so we decided to write a post about them.

🔌 prodigy-hf v0.1.0 · Oct 23, 2023

Train Hugging Face models with Prodigy annotations

Large Language Models: From Prototype to Production

EuroPython Keynote

Large Language Models (LLMs) have shown some impressive capabilities and their impact is the topic of the moment. In this talk, Ines presents visions for NLP in the age of LLMs and a pragmatic, practical approach for how to use Large Language Models to ship more successful NLP projects from prototype to production today.

🦙 spacy-llm v0.4.0 · Jul 6, 2023

Falcon, sentiment analysis, summarization, backend refactoring

Speech acts in the Dutch COVID-19 Press Conferences

Schueler, Marx (2022), Language Resources and Evaluation

We used the annotation tool Prodigy. Prodigy provides a simple interface in which the annotator sees a sentence and selects the applicable speech acts. The use of Prodigy considerably sped up the annotation process, allowing the annotators to annotate around 200 sentences per hour.

Healthsea: an end-to-end spaCy pipeline for exploring health supplement effects

Create better access to health with machine learning and natural language processing. Read about our journey of developing Healthsea, an end-to-end spaCy pipeline for analyzing user reviews of supplement products and extracting potential effects on health.

Supervised similarity: Learning symmetric relations from duplicate question data

Supervised models for text-pair classification let you create software that assigns a label to two texts, based on some relationship between them. When the relationship is symmetric, it can be useful to incorporate this constraint into the model. This post shows how a siamese convolutional neural network performs on two duplicate question data sets with experimental results.

Mastering spaCy

Déborah Mesquita, Duygu Altinok (Packt Publishing, 2025)

Build structured NLP solutions with custom components and models powered by LLMs. By the end of the book you will be empowered to build robust NLP pipelines and integrate them with web applications to build end-to-end solutions.

A practical guide to human-in-the-loop distillation

This blog post presents practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

spaCy meets LLMs: Using Generative AI for Structured Data

Data+ML Community Meetup

This talk dives deeper into spaCy’s LLM integration, which provides a robust framework for extracting structured information from text, distilling large models into smaller components, and closing the gap between prototype and production.

Impoliteness and morality as instruments of destructive informal social control in online harassment targeting Swedish journalists

Björkenfeldt, Gustafsson (2023)

In the annotation tool Prodigy used for this process, the tweets directed towards journalists were displayed alongside the initial tweet that initiated the conversation thread and the subsequent reply from the journalist.

🔌 prodigy-lunr v0.1.0 · Oct 5, 2023

Document search via LUNR to fetch relevant data subsets to label

Large Language Models: From Prototype to Production

PyData London Keynote

Bulk Labelling and Prodigy

In this video, we’ll show a bulk labelling technique that can help you prepare data for Prodigy.

Mastering spaCy

Duygu Altinok (Packt Publishing, 2021)

An end-to-end practical guide to implementing NLP applications using the Python ecosystem. By the end of this book, you'll be able to confidently use spaCy, including its linguistic features, word vectors, and classifiers, to create your own NLP apps.

Deep text-pair classification with Quora's 2017 question dataset

Quora recently released the first dataset from their platform: a set of 400,000 question pairs, with annotations indicating whether the questions request the same information. This data set is large, real, and relevant — a rare combination. In this post, I'll explain how to solve text-pair tasks with deep learning, using both new and established tips and technologies.

Serverless custom NLP with LLMs, Modal and Prodigy

In this blog post, we’ll show you how you can go from an idea and little data to a fully custom information extraction model using Prodigy and Modal, no infrastructure or GPU setup required.

How to uncover and avoid structural biases in evaluating your Machine Learning/NLP projects

PyData London

This talk highlights common pitfalls that occur when evaluating ML and NLP approaches. It provides comprehensive advice on how to set up a solid evaluation procedure in general, and dives into a few specific use-cases to demonstrate artificial bias that can creep in unnoticed.

The AI Revolution Will Not Be Monopolized: Behind the scenes

Open Source ML Mixer

A more in-depth look at the concepts and ideas, academic literature, related experiments and preliminary results for distilled task-specific models.

How many Labelled Examples do you need for a BERT-sized Model to Beat GPT-4 on Predictive Tasks?

Generative AI Summit

How does in-context learning compare to supervised approaches on predictive tasks? How many labelled examples do you need on different problems before a BERT-sized model can beat GPT-4 in accuracy? The answer might surprise you: models with fewer than 1b parameters are actually very good at classic predictive NLP, while in-context learning struggles on many problem shapes.

🔌 prodigy-ann v0.1.0 · Oct 5, 2023

Use ANN techniques to fetch relevant data subsets to label

You are what you read: Building a personal internet front-page with spaCy and Prodigy

PyCon DE & PyData Berlin

Finding Bad Labels for Text Classification with Jupyter and Prodigy

In this video, we’ll show you how to set up Prodigy to find bad labels in text classification tasks. While many of the techniques are applied to text classification, they can also be used for classification tasks in general.

Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence

Halterman, Keith, Sarwar, O’Connor (2021), ACL 2021

Figure A2 shows a stylized version of the custom interface we built using the Prodigy annotation tool. Annotators are presented with an entire document, with sentences sequentially highlighted.

Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation

InfoQ Dev Summit

LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, Ines shows some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation

PyData London

LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, Ines shows some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

On the Creation of Classifiers to Support Assessment of E-Portfolios

Gantikow, Isking, Libbrecht, Müller, Rebholz (2023)

In this workflow, Prodigy selects and presents text examples that were classified with a very low degree of certainty. The annotator reviews the proposed classifications and corrects them, if necessary.

DaCy v2.7.2

State-of-the-Art Danish NLP pipelines for spaCy

✨ prodigy v1.13.0 · Aug 15, 2023

LLM support for NER, text classification and span categorization

Deploying a Prodigy cloud service for Posh’s financial chatbots

A Prodigy case study of Posh AI's production-ready annotation platform and custom chatbot annotation tasks for banking customers.

Diary of a spaCy project: Predicting GitHub Tags

Many people assume that working on an NLP project involves a lot of machine learning. Our experience is that it's much less about flowing tensors, and more about making a tailored solution. This blog post demonstrates how a typical spaCy project could be initiated, implemented and executed towards a custom solution.

Training an insults classifier with Prodigy in ~1 hour

In this video, we’ll show you how to use Prodigy to train a classifier to detect disparaging or insulting comments. Prodigy makes text classification particularly powerful, because you can try out new ideas very quickly.