Topic: LLMs · Explosion · Developer tools and consulting for AI, Machine Learning and NLP

Explosion builds developer tools for AI, Machine Learning and Natural Language Processing. →
Consulting

Project

Topics

Category

Tasks

Authors

Filtered by topic: LLMs

Sovereign AI systems instead of black box solutions

Sovereign AI systems instead of black box solutions it-daily

German article featuring Ines’ take on AI in industry, the role of open source, and using Generative AI to create systems.

Feminist AI LAN Party

Feminist AI LAN Party PyCon DE & PyData

Three days of workshops, hacking, creating, publishing and connecting locally, featuring a data development workshop with Prodigy and a session on hacking LLMs.

Mastering spaCy

Mastering spaCy Déborah Mesquita, Duygu Altinok (Packt Publishing, 2025)

Build structured NLP solutions with custom components and models powered by LLMs. By end of the book you will be empowered to build robust NLP pipelines and integrate them with web applications to build end-to-end solutions.

From PDFs to AI-ready structured data: a deep dive

From PDFs to AI-ready structured data: a deep dive

This blog post presents a new modular workflow for converting PDFs and similar documents to structured data and shows you how to build end-to-end document understanding and information extraction pipelines for industry use cases.

Applied NLP with LLMs: Beyond Black-Box Monoliths

Applied NLP with LLMs: Beyond Black-Box Monoliths PyBerlin

In this talk, Ines shows some practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components.

The NLP and AI Revolution with the spaCy Creators

The NLP and AI Revolution with the spaCy Creators Vanishing Gradients

In this interview with Hugo Bowne-Anderson, we delve into the forefront of NLP and the future of AI development, covering topics like human-in-the-loop distillation, open-source AI and Explosion’s journey.

A practical guide to human-in-the-loop distillation

A practical guide to human-in-the-loop distillation

This blog post presents practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

Simply Simplify Language

Simply Simplify Language

Interactive app by the Canton of Zurich, Switzerland, using LLMs and spaCy to analyze and simplify institutional communication and make bureaucratic German more inclusive.

KI – Die künstlerische Intelligenz?

KI – Die künstlerische Intelligenz?Immergut Festival (German)

Panelists are discussing the latest developments in Generative AI, hype vs. reality and what those new technologies mean for people, businesses, art, creativity and the music industry.

The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs

The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs PyCon DE & PyData Berlin

With the latest advancements in NLP and LLMs, and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies?

Designing for tomorrow’s programming workflows

Designing for tomorrow’s programming workflows PyCon Lithuania

Modern editors and AI-powered tools like GitHub Copilot and ChatGPT are changing how people program and are transforming our workflows and developer productivity. But what does this mean for how we should be writing and designing our APIs and libraries?

spacy-llm: From quick prototyping with LLMs to more reliable and efficient NLP solutions

spacy-llm: From quick prototyping with LLMs to more reliable and efficient NLP solutions AstraZeneca NLP Community of Practice

LLMs are paving the way for fast prototyping of NLP applications. Here, Sofie showcases how to build a structured NLP pipeline to mine clinical trials, using spaCy and spacy-llm. Moving beyond a fast prototype, she offers pragmatic solutions to make the pipeline more reliable and cost efficient.

🦙 spacy-llm v0.7.0Jan 19, 2024

Supporting arbitrarily long docs and various new tasks

Half hour of labeling power: Can we beat GPT?

Half hour of labeling power: Can we beat GPT?PyData NYC

Large Language Models (LLMs) offer a lot of value for modern NLP and can typically achieve surprisingly good accuracy on predictive NLP tasks. But can we do even better than that? In this workshop we show how to use LLMs at development time to create high-quality datasets and train specific, smaller, private and more accurate models for your business problems.

Panel: Large Language Models

Panel: Large Language Models Big PyData BBQ

with Ines, Alejandro Saucedo (Zalando, Institute for Ethical AI & ML), Alina Lehnhard (Cerence), Michael Gerz (Heidelberg University), Alexander CS Hendorf (Königsweg)

✨ prodigy v1.13.2Sep 7, 2023

New LLM recipes for terms generation and prompt engineering

Prodigy v1.12: OpenAI integration, prompt engineering, task routers, deployment docs and more

Prodigy v1.12: OpenAI integration, prompt engineering, task routers, deployment docs and more

✨ prodigy v1.12.0Jul 5, 2023

LLM-assisted workflows for annotation and prompt engineering, task routing for multi-annotator setups

Against LLM maximalism

Against LLM maximalism

LLMs are not a direct solution to most of the NLP use-cases companies have been working on. They are extremely useful, but if you want to deliver reliable software you can improve over time, you can't just write a prompt and call it a day. Once you're past prototyping and want to deliver the best system you can, supervised learning will often give you better efficiency, accuracy and reliability.

Conquering PDFs: document understanding beyond plain text

Conquering PDFs: document understanding beyond plain text PyData London

In this talk, Ines presents a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem.

Conquering PDFs: document understanding beyond plain text

Conquering PDFs: document understanding beyond plain text PyCon DE & PyData

In this talk, Ines presents a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem.

Prozessvisualisierung mit generativer KI im Praxistest

Prozessvisualisierung mit generativer KI im Praxistest iX Magazin / Heise

German article by Nils Durner on visualizing technical processes with Generative AI, featuring spaCy and Presidio for PII anonymization.

Distill Your LLMs and Surpass Their Performance

Distill Your LLMs and Surpass Their Performance InfoQ Magazine

In her presentation at InfoQ Dev Summit, Ines Montani provided the audience with practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components.

Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation

Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation InfoQ Dev Summit

LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, Ines shows some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

Practical Tips for Bootstrapping Information Extraction Pipelines

Practical Tips for Bootstrapping Information Extraction Pipelines DataHack Summit

This talk presents approaches for bootstrapping NLP pipelines and retrieval via information extraction, including tips for training, modelling and data annotation.

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

A case study on S&P Global’s efficient information extraction pipelines for real-time commodities trading insights in a high-security environment.

Towards Structured Data: LLMs from Prototype to Production

Towards Structured Data: LLMs from Prototype to Production U.S. Census Bureau: Center for Optimization and Data Science Seminar

This talk presents pragmatic and practical approaches for how to use LLMs beyond just chat bots, how to ship more successful NLP projects from prototype to production and how to use the latest state-of-the-art models in real-world applications.

ZenML v0.58.0

ZenML v0.58.0

New out-of-the-box Prodigy integration in ZenML for LLMs and beyond, to make data development and annotation a core part of your MLOps lifecycle.

The AI Revolution Will Not Be Monopolized: Behind the scenes

The AI Revolution Will Not Be Monopolized: Behind the scenes Open Source ML Mixer

A more in-depth look at the concepts and ideas, academic literature, related experiments and preliminary results for distilled task-specific models.

Zero-Shot NER with GliNER and spaCy

Zero-Shot NER with GliNER and spaCy Python Tutorials for Digital Humanities

Tutorial by WJB Mattingly on how to integrate the generalist GLiNER model for Named Entity Recognition with spaCy's versatile NLP environment.

Herding LLMs Towards Structured NLP

Herding LLMs Towards Structured NLP Global AI Conference

This talk shows how we integrate LLMs into spaCy, leveraging its modular and customizable framework. This allows for cheaper, faster and more robust NLP - driven by cutting-edge LLMs, without compromising on having structured, validated data.

🔌 prodigy-whisper v0.1.0Nov 12, 2023

Audio transcription with OpenAI’s Whisper model in the loop

How many Labelled Examples do you need for a BERT-sized Model to Beat GPT-4 on Predictive Tasks?

How many Labelled Examples do you need for a BERT-sized Model to Beat GPT-4 on Predictive Tasks?Generative AI Summit

How does in-context learning compare to supervised approaches on predictive tasks? How many labelled examples do you need on different problems before a BERT-sized model can beat GPT-4 in accuracy? The answer might surprise you: models with fewer than 1b parameters are actually very good at classic predictive NLP, while in-context learning struggles on many problem shapes.

Models as annotators in Prodigy

Models as annotators in Prodigy

How to use models and LLMs as annotators to find disagreements and prioritize examples to annotate first.

✨ prodigy v1.13.1Aug 23, 2023

Use models and LLMs as annotators to find disagreements

spaCy: a customizable NLP toolkit designed for developers

spaCy: a customizable NLP toolkit designed for developers ODSC Europe

🦙 spacy-llm v0.3.0Jun 14, 2023

Cohere, Anthropic, OpenLLaMa, StableLM, logging, streamlit demo, lemmatization task

Incorporating LLMs into practical NLP workflows

Incorporating LLMs into practical NLP workflows PyCon DE & PyData Berlin

Applied NLP in the Age of Generative AI: Future-Proof Strategies for Banking and Finance

Applied NLP in the Age of Generative AI: Future-Proof Strategies for Banking and Finance ECONDAT Keynote

A modern approach and mindset for building future-proof NLP pipelines in-house, focusing on use cases from banking, finance and economics.

How to advocate for modular NLP in the age of Generative AI

How to advocate for modular NLP in the age of Generative AI

With all the hype around Generative AI, many are led to believe it’s the solution to everything. So how can you, as a developer, communicate the nuances and advocate for new and modular solutions that are better, easier and cheaper?

What the history of the web can teach us about the future of AI

What the history of the web can teach us about the future of AI

How will AI development look in the future? There is a lot we can learn from another groundbreaking technology: the web. This blog post takes a look at what the history of the web can teach us, and what this means for developers, models, open source and regulation.

Serverless custom NLP with LLMs, Modal and Prodigy

Serverless custom NLP with LLMs, Modal and Prodigy

In this blog post, we’ll show you how you can go from an idea and little data to a fully custom information extraction model using Prodigy and Modal, no infrastructure or GPU setup required.

Applied NLP in the Age of Generative AI

Applied NLP in the Age of Generative AI PyData Amsterdam Keynote

In this talk, Ines shares the most important lessons we’ve learned from solving real-world information extraction problems in industry, and shows you a new approach and mindset for designing robust and modular NLP pipelines in the age of Generative AI.

Building the Future of NLP: Insights on spaCy, Prodigy and Generative AI

Building the Future of NLP: Insights on spaCy, Prodigy and Generative AI Leading With Data Podcast

Exploring the AI nexus with the mind behind spaCy

Exploring the AI nexus with the mind behind spaCy Leading With Data Podcast

In this episode, Matt takes you on a deep dive into the future of data and the challenges facing current Large Language Models (LLMs).

spaCy meets LLMs: Using Generative AI for Structured Data

spaCy meets LLMs: Using Generative AI for Structured Data Data+ML Community Meetup

This talk dives deeper into spaCy’s LLM integration, which provides a robust framework for extracting structured information from text, distilling large models into smaller components, and closing the gap between prototype and production.

Getting Started with NLP and spaCy

Getting Started with NLP and spaCy TalkPython Course

There is a lot of text data out there and maybe you're interested in getting structured data out of it. There are a lot of options out there and this course will introduce you to the field by focussing on spaCy while also exploring other tools.

The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs

The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs QCon London

Constructing a knowledge base with spaCy and spacy-llm

Constructing a knowledge base with spaCy and spacy-llm MantisNLP Blog

This blog post shows how to use spaCy and LLMs to extract entities and relationships from text and quickly tackle the complex problem of constructing a knowledge base graph from a corpus.

Prodigy in 2023: LLMs, task routers, QA and plugins

Prodigy in 2023: LLMs, task routers, QA and plugins

We have made a ton of new updates in Prodigy this year with v1.12, v1.13, and v1.14 releases. So we decided to write a post about them.

Identifying Signs and Symptoms of Urinary Tract Infection from Emergency Department Clinical Notes Using Large Language Models

Identifying Signs and Symptoms of Urinary Tract Infection from Emergency Department Clinical Notes Using Large Language Models Iscoe, Socrates, Gilson, Chi, Li, Huang, Kearns, Perkins, Khandjian, Taylor (2023)

For annotation we employed Prodigy, a scriptable annotation tool designed to maximize efficiency, enabling data scientists to perform the annotation tasks themselves and facilitating rapid iterative development in natural language processing (NLP) projects.

🦙 spacy-llm v0.6.0Oct 5, 2023

PaLM, Azure OpenAI, Mistral & fixed OS model responses

✨ prodigy v1.13.0Aug 15, 2023

LLM support for NER, text classification and span categorization

How to Host Your Own API of Open Language Models For Free

Powered by Explosion’s curated-transformers, FastAPI and ngrok.

Large Language Models: From Prototype to Production

Large Language Models: From Prototype to Production PyData London Keynote

🦙 spacy-llm v0.2.0May 30, 2023

REL and spancat tasks, reading prompt templates from file

E^2GraphRAG: Streamlining Graph-based RAG for High Efficiency and Effectiveness

E^2GraphRAG: Streamlining Graph-based RAG for High Efficiency and Effectiveness Zhao, Zhu, Guo, He, Li (2025)

Instead of using LLMs for entity extraction, we employ the traditional NLP tool spaCy to extract entities, and use their co-occurrence in a chunk as relations.

How Love Without Sound helps the music industry recover millions in revenue for artists with NLP, spaCy and Prodigy

How Love Without Sound helps the music industry recover millions in revenue for artists with NLP, spaCy and Prodigy

A case study on Love Without Sound’s innovative AI-powered tools for the music industry and law firms specializing in royalty negotiations.

What the history of the web can teach us about the future of AI

What the history of the web can teach us about the future of AI PyCon+Web Keynote

In this talk, Ines takes a look at what the history of the web can teach us about the future of AI, and what this means for developers, models, open source and regulation.

Reality is not an End-to-End Prediction Problem: Applied NLP in the Age of Generative AI

Reality is not an End-to-End Prediction Problem: Applied NLP in the Age of Generative AI dotAI

Combining the Best of Two Worlds: From TF-IDF to Llama LLM

Combining the Best of Two Worlds: From TF-IDF to Llama LLM Open Source Summit Europe

Talk by William Arias, Staff Developer Advocate at GitLab, on combining traditional NLP techniques and LLMs to solve hallucination issues and create robust spaCy applications.

The AI Revolution Will Not Be Monopolized

The AI Revolution Will Not Be Monopolized InfoQ

Open-source initiatives are pivotal in democratizing AI technology, offering transparent, extensible tools that empower users. Daniel Dominguez summarizes the key takeaways from Ines’ recent talk for InfoQ.

Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation

Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation PyData London

LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, Ines shows some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

The AI Revolution Won’t Be Monopolized

The AI Revolution Won’t Be Monopolized TalkPython Podcast

There hasn’t been a boom like the AI boom since the .com days. And it may look like a space destined to be controlled by a couple of tech giants. But Ines Montani thinks open source will play an important role in the future of AI.

Economies of Scale Can’t Monopolise the AI Revolution

Economies of Scale Can’t Monopolise the AI Revolution InfoQ Magazine

During her presentation at QCon London, Ines Montani stated that economies of scale are not enough to create monopolies in the AI space and that open-source techniques and models will allow everybody to keep up with the “Gen AI revolution”.

The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs

The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs PyCon Lithuania Keynote

With the latest advancements in NLP and LLMs, and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies?

T-RAG: Lessons from the LLM Trenches

T-RAG: Lessons from the LLM Trenches Fatehkia, Lucas, Chawla (2024)

An important application area is question answering over private enterprise documents where the main considerations are data security, which necessitates applications that can be deployed on-prem, [and] limited computational resources. [...] In addition to retrieving contextual documents, we use the spaCy library with custom rules to detect named entities from the organization.

State-of-the-Art Transformer Pipelines in spaCy

State-of-the-Art Transformer Pipelines in spaCy aiGrunn

In this talk, we will show you how you can use transformer models (from pretrained models such as XLM-RoBERTa to large language models like Llama2) to create state-of-the-art annotation pipelines for text annotation tasks such as named entity recognition.

MP Interests Tracker: Utilising GenAI to uncover insights in the UK Register of Financial Interest

MP Interests Tracker: Utilising GenAI to uncover insights in the UK Register of Financial Interest JournalismAI Blog

Project from teams at The Times and BBC using spacy-llm to make complex financial interests data more accessible.

🦙 spacy-llm v0.5.0Sep 8, 2023

Improved user API and novel Chain-of-Thought prompting for more accurate NER

Large Language Models: From Prototype to Production

Large Language Models: From Prototype to Production EuroPython Keynote

Large Language Models (LLMs) have shown some impressive capabilities and their impact is the topic of the moment. In this talk, Ines presents visions for NLP in the age of LLMs and a pragmatic, practical approach for how to use Large Language Models to ship more successful NLP projects from prototype to production today.

🦙 spacy-llm v0.4.0Jul 6, 2023

Falcon, sentiment analysis, summarization, backend refactoring

Large Disagreement Modelling

Large Disagreement Modelling

“In this blogpost I’d like to talk about large language models. There’s a bunch of hype, sure, but there’s also an opportunity to revisit one of my favourite machine learning techniques: disagreement.”

🦙 spacy-llm v0.1.0May 11, 2023

Integrating LLMs into structured NLP pipelines