Page: 2 · Explosion · Developer tools and consulting for AI, Machine Learning and NLP

Explosion builds developer tools for AI, Machine Learning and Natural Language Processing.

Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation

PyData London

LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, Ines shows some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

The AI Revolution Won’t Be Monopolized

TalkPython Podcast

There hasn’t been a boom like the AI boom since the .com days. And it may look like a space destined to be controlled by a couple of tech giants. But Ines Montani thinks open source will play an important role in the future of AI.

The application of natural language processing for the extraction of mechanistic information in toxicology

Conradi, Luechtefeld, de Haan, Pieters, Freedman, Vanhaecke, Vinken, Teunis (2024)

All steps were conducted using the open-source Python package spaCy. Specifically, the NER model was trained using scispaCy en-core-sci-lg (Neumann et al., 2019) as a starting point, which allowed for a vocabulary (word vectors) and grammar trained on scientific literature.

The AI Revolution Will Not Be Monopolized: Behind the scenes

Open Source ML Mixer

A more in-depth look at the concepts and ideas, academic literature, related experiments and preliminary results for distilled task-specific models.

🔮 thinc v9.0.0 · Apr 19, 2024

Better learning rate schedules and integration of thinc-apple-ops

Designing for tomorrow’s programming workflows

PyCon Lithuania

Modern editors and AI-powered tools like GitHub Copilot and ChatGPT are changing how people program and are transforming our workflows and developer productivity. But what does this mean for how we should be writing and designing our APIs and libraries?

✨ prodigy v1.15.0 · Feb 15, 2024

New company plugins and support for SSO

How Nesta uses NLP to process 7m job ads and shed light on the UK’s labor market

A case study on Nesta’s workflow for processing 7 million job ads to better understand UK skill demand, using a custom mapping step to match skills to any government taxonomy.

Microsoft Presidio v2.2.352

Context aware, pluggable and customizable PII de-identification and anonymization service for text and images, featuring a spaCy back-end.

DeepZensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility

Landes, Di Eugenio, Caragea (2023)

A linguistic feature mapper that translates spaCy to wordpieces, which are token sub-units with associated vectors, is also accessible as an easy to configure module.

Who said what: using machine learning to correctly attribute quotes

The Guardian Engineering Blog

How the Guardian uses spaCy and Prodigy to train a custom coreference resolution model.

Developing a Named Entity Recognition Dataset for Tagalog

Miranda (2023), IJCNLP-AACL 2023

We used Prodigy as our annotation tool. We set up a web server on the Google Cloud Platform and routed the examples through Prodigy’s built-in task router.

🔌 prodigy-whisper v0.1.0 · Nov 12, 2023

Audio transcription with OpenAI’s Whisper model in the loop

GERNERMED++: Semantic annotation in German medical NLP through transfer-learning, translation and word alignment

Frei, Frei-Stuber, Kramer (2023), Journal of Biomedical Informatics

The training of our entity recognition model employs the entity recognition parser from the spaCy library which follows a transducer-based parsing approach with a BILOU scheme instead of a state-agnostic token tagging approach.

Prodigy-PDF for PDF annotation and OCR

Want to annotate PDF files? Our new Prodigy plugin can help with that! To explain how to use PDF segmentation and OCR, Vincent made a small demo video.

Natural Language Processing and Python

The Python Show

🦙 spacy-llm v0.6.0 · Oct 5, 2023

PaLM, Azure OpenAI, Mistral & fixed OS model responses

Simply Simplify Language

Interactive app by the Canton of Zurich, Switzerland, using LLMs and spaCy to analyze and simplify institutional communication and make bureaucratic German more inclusive.

KI – Die künstlerische Intelligenz?

Immergut Festival (German)

Panelists discuss the latest developments in generative AI, hype vs. reality, and what these new technologies mean for people, businesses, art, creativity and the music industry.

Economies of Scale Can’t Monopolise the AI Revolution

InfoQ Magazine

During her presentation at QCon London, Ines Montani stated that economies of scale are not enough to create monopolies in the AI space and that open-source techniques and models will allow everybody to keep up with the “Gen AI revolution”.

🤖 curated-transformers v2.0.0 · Apr 17, 2024

Model registry, in-place loading, better HF Hub integration

Ines Montani on Natural Language Processing

Software Engineering Radio

Ines speaks with host Jeremy Jung about solving problems using natural language processing. They cover generative vs. predictive tasks, creating a pipeline and breaking down problems, labeling examples for training, fine-tuning models, using LLMs to label data and build prototypes, and the spaCy NLP library.

Zero-Shot NER with GliNER and spaCy

Python Tutorials for Digital Humanities

Tutorial by WJB Mattingly on how to integrate the generalist GLiNER model for Named Entity Recognition with spaCy's versatile NLP environment.

Describing Images Fast and Slow: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes

Takmaz, Pezzelle, Fernández (2024)

We use the spaCy library for tokenization, part-of-speech tagging, and lemmatization of the words in the descriptions.

🦙 spacy-llm v0.7.0 · Jan 19, 2024

Supporting arbitrarily long docs and various new tasks

Muted: Multilingual Targeted Offensive Speech Identification and Visualization

Tillmann, Trivedi, Rosenthal, Borse, Zhang, Sil, Bhattacharjee (2023)

Muted can leverage any transformer-based HAP-classification model [...] to identify toxic spans, without further fine-tuning. In addition, we use the spaCy library to identify the specific targets and arguments for the words predicted by the attention heatmaps.

On the Creation of Classifiers to Support Assessment of E-Portfolios

Gantikow, Isking, Libbrecht, Müller, Rebholz (2023)

In this workflow, Prodigy selects and presents text examples that were classified with a very low degree of certainty. The annotator reviews the proposed classifications and corrects them, if necessary.

Launching the Explosion Merch Store

Spread the love and support us and our open-source work with some of our unique, custom-designed swag. All orders come with free shipping and stickers!

Introducing Prodigy-HF

Hugging Face Blog

Last week, Explosion introduced Prodigy-HF, a new Prodigy plugin offering code recipes that directly integrate with the Hugging Face stack.

Prodigy-ANN for Image Retrieval via CLIP

Dealing with a huge bucket of images that you want to annotate? The new image retrieval features in Prodigy-ANN (approximate nearest neighbors) might help!

✨ prodigy v1.14.5 · Oct 24, 2023

Toggle for character vs. token highlighting, CSS and JS from local and remote paths

Toward a Critical Toponymy Framework for Named Entity Recognition: A Case Study of Airbnb in New York City

Brunila, LaViolette, CH-Wang, Verma, Féré, McKenzie (2023), EMNLP 2023

All annotation was performed using Prodigy following an initial training session where annotators collaboratively annotated a randomly chosen set of samples.

🤖 curated-transformers v1.3.0 · Oct 2, 2023

Custom model repositories, NVTX Ranges, store config in models

Towards Structured Data: LLMs from Prototype to Production

U.S. Census Bureau: Center for Optimization and Data Science Seminar

This talk presents pragmatic and practical approaches for how to use LLMs beyond just chat bots, how to ship more successful NLP projects from prototype to production and how to use the latest state-of-the-art models in real-world applications.

ZenML v0.58.0

New out-of-the-box Prodigy integration in ZenML for LLMs and beyond, to make data development and annotation a core part of your MLOps lifecycle.

spaCyEx v0.0.2

Extension for spaCy’s powerful, linguistically-aware pattern matching that introduces a RegEx-like syntax.

The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs

QCon London

🦦 weasel v0.4.0 · Apr 4, 2024

Allow a git repo file as asset and drop support for Python 3.6

Constructing a knowledge base with spaCy and spacy-llm

MantisNLP Blog

This blog post shows how to use spaCy and LLMs to extract entities and relationships from text and quickly tackle the complex problem of constructing a knowledge base graph from a corpus.

KAZU v1.5

A biomedical NLP framework designed to handle production workloads, built by AstraZeneca and Korea University and using spaCy under the hood.

Prodigy-Segment for Pixel Segmentation

Use Meta’s “Segment Anything” model in Prodigy to help you select the right pixels in images.

🔌 prodigy-segment v0.1.0 · Dec 13, 2023

Select pixels in Prodigy via Meta’s “Segment Anything” model

Prodigy in 2023: LLMs, task routers, QA and plugins

We shipped a lot of updates to Prodigy this year across the v1.12, v1.13 and v1.14 releases, so we wrote a post about them.

Impoliteness and morality as instruments of destructive informal social control in online harassment targeting Swedish journalists

Björkenfeldt, Gustafsson (2023)

In the annotation tool Prodigy used for this process, the tweets directed towards journalists were displayed alongside the initial tweet that initiated the conversation thread and the subsequent reply from the journalist.

State-of-the-Art Transformer Pipelines in spaCy

aiGrunn

In this talk, we will show you how you can use transformer models (from pretrained models such as XLM-RoBERTa to large language models like Llama2) to create state-of-the-art annotation pipelines for text annotation tasks such as named entity recognition.

Explosion, NLP, Generative AI, Entrepreneurship

Learning from Machine Learning

🔌 prodigy-hf v0.1.0 · Oct 23, 2023

Train Hugging Face models with Prodigy annotations

Identifying Signs and Symptoms of Urinary Tract Infection from Emergency Department Clinical Notes Using Large Language Models

Iscoe, Socrates, Gilson, Chi, Li, Huang, Kearns, Perkins, Khandjian, Taylor (2023)

For annotation we employed Prodigy, a scriptable annotation tool designed to maximize efficiency, enabling data scientists to perform the annotation tasks themselves and facilitating rapid iterative development in natural language processing (NLP) projects.

spaCy meets LLMs: Using Generative AI for Structured Data

Data+ML Community Meetup

This talk dives deeper into spaCy’s LLM integration, which provides a robust framework for extracting structured information from text, distilling large models into smaller components, and closing the gap between prototype and production.

Getting Started with NLP and spaCy

TalkPython Course

There is a lot of text data out there, and you may want to get structured data out of it. There are many options, and this course introduces you to the field by focusing on spaCy while also exploring other tools.

The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs

PyCon DE & PyData Berlin

With the latest advancements in NLP and LLMs, and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies?

The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs

PyCon Lithuania Keynote

With the latest advancements in NLP and LLMs, and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies?

🔌 prodigy-evaluate v0.1.0 · Mar 26, 2024

Evaluate spaCy pipelines, print confusion matrices and more

T-RAG: Lessons from the LLM Trenches

Fatehkia, Lucas, Chawla (2024)

An important application area is question answering over private enterprise documents where the main considerations are data security, which necessitates applications that can be deployed on-prem, [and] limited computational resources. [...] In addition to retrieving contextual documents, we use the spaCy library with custom rules to detect named entities from the organization.

spacy-llm: From quick prototyping with LLMs to more reliable and efficient NLP solutions

AstraZeneca NLP Community of Practice

LLMs are paving the way for fast prototyping of NLP applications. Here, Sofie showcases how to build a structured NLP pipeline to mine clinical trials, using spaCy and spacy-llm. Moving beyond a fast prototype, she offers pragmatic solutions to make the pipeline more reliable and cost efficient.

✨ prodigy v1.14.12 · Dec 13, 2023

Audio UI improvements, resetQueue callback, prodigy-segment plugin

Herding LLMs Towards Structured NLP

Global AI Conference

This talk shows how we integrate LLMs into spaCy, leveraging its modular and customizable framework. This allows for cheaper, faster and more robust NLP - driven by cutting-edge LLMs, without compromising on having structured, validated data.

Neuradicon: operational representation learning of neuroimaging reports

Watkins, Gray, Julius, Mah, Pinaya, Wright, Jha, Engleitner, Cardoso, Ourselin, Rees, Jaeger, Nachev (2023)

Labelled data for each task was produced using the Prodigy labelling tool. Each report was labelled in a paired-annotation manner. [...] We used the grammatical dependency parse produced by the spaCy parser as input and implemented the patterns using the spaCy dependency matcher.

calamanCy: A Tagalog Natural Language Processing Toolkit

Miranda (2023), EMNLP 2023

We introduce calamanCy, an open-source toolkit for constructing NLP pipelines for Tagalog. It is built on top of spaCy, enabling easy experimentation and integration with other frameworks.

Half hour of labeling power: Can we beat GPT?

PyData NYC

Large Language Models (LLMs) offer a lot of value for modern NLP and can typically achieve surprisingly good accuracy on predictive NLP tasks. But can we do even better than that? In this workshop we show how to use LLMs at development time to create high-quality datasets and train specific, smaller, private and more accurate models for your business problems.

How many Labelled Examples do you need for a BERT-sized Model to Beat GPT-4 on Predictive Tasks?

Generative AI Summit

How does in-context learning compare to supervised approaches on predictive tasks? How many labelled examples do you need on different problems before a BERT-sized model can beat GPT-4 in accuracy? The answer might surprise you: models with fewer than 1b parameters are actually very good at classic predictive NLP, while in-context learning struggles on many problem shapes.

DaCy v2.7.2

State-of-the-Art Danish NLP pipelines for spaCy

✨ prodigy v1.14.3 · Oct 6, 2023

Inter-annotator agreement for document-level and token-level annotations, new plugins