Page: 3 · Explosion · Developer tools and consulting for AI, Machine Learning and NLP

Explosion builds developer tools for AI, Machine Learning and Natural Language Processing.

💫 spacy v3.7.0 · Oct 2, 2023

Trained pipelines using Curated Transformers and support for Python 3.12

scispacy v0.5.3

A Python package containing spaCy models for processing biomedical, scientific or clinical text, developed by AI2.

✨ prodigy v1.14.1 · Sep 29, 2023

Custom event hooks for custom UI interactivity

Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts

Dannenfelser, Zhong, Zhang, Yao (2023), NeurIPS

Tissue, cell type, tool, and method were annotated using the Prodigy software tool developed by Explosion AI for easy tracking of token-level tags.

✨ prodigy v1.13.1 · Aug 23, 2023

Use models and LLMs as annotators to find disagreements

🦦 weasel v0.2.0 · Aug 4, 2023

Support for Pydantic v2 and cloudpathlib

Task Routers in Prodigy

How to use the new task routers to customize how examples are assigned in multi-annotator workflows.
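At its core, a task router is a function that decides which annotators receive a given example. The sketch below illustrates that idea in plain Python and does not use Prodigy's API: the function name `route_example`, the `per_task` parameter and the hashing scheme are hypothetical choices made for illustration. Hashing the example text makes the assignment deterministic, so the same example goes to the same annotators across restarts.

```python
import hashlib
from typing import Dict, List


def route_example(example: Dict[str, str], annotators: List[str], per_task: int = 2) -> List[str]:
    """Hypothetical router: assign an example to `per_task` distinct annotators.

    The starting annotator is derived from a hash of the example text, so the
    same example is always routed to the same annotators.
    """
    if not 1 <= per_task <= len(annotators):
        raise ValueError("per_task must be between 1 and the number of annotators")
    digest = hashlib.md5(example["text"].encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(annotators)
    # Walk around the annotator list, beginning at the hashed start position
    return [annotators[(start + offset) % len(annotators)] for offset in range(per_task)]


annotators = ["alex", "bea", "chris"]
stream = [{"text": "Book a table for two."}, {"text": "Cancel my order."}]
for eg in stream:
    print(eg["text"], "->", route_example(eg, annotators))
```

Sending each example to more than one annotator (`per_task=2` here) is what makes it possible to measure agreement later; a production setup would likely also need to handle annotators joining or leaving mid-session.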

🍬 confection v0.1.0 · Jun 29, 2023

Improved JSON parsing, updated utils and warnings

Concepts and measures of bureaucratic constraints in European Union laws from hand-coding to machine-learning

Franchino, Migliorati, Pagano, Vignoli (2023)

The models “learn” the relations between the text tokens and the entity categories from two randomly selected samples of sentences that are extracted from a pre-processed corpus and have been manually annotated using the Python-implemented platform “Prodigy”.

Large Language Models: From Prototype to Production

PyData London Keynote

🦙 spacy-llm v0.1.0 · May 11, 2023

Integrating LLMs into structured NLP pipelines

Efficient Information Extraction From Text With spaCy

JetBrains PyCharm

This webinar takes you through building a spaCy project that uses a named entity recognition (NER) model to extract entities of interest from restaurant reviews, like prices, opening hours and ratings.

Incorporating LLMs into practical NLP workflows

PyCon DE & PyData Berlin

Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks

Halterman, Schrodt, Beger, Bagozzi, Scarborough (2023)

While in the past the process of generating training case has been quite time consuming and tedious, newer approaches such as those incorporated into the web-based Prodigy annotation system allow this to be done much more quickly.

Modular Journalism: The new way to find stories?

The Interhacktives Podcast

Deploying a Prodigy cloud service for Posh’s financial chatbots

A Prodigy case study of Posh AI's production-ready annotation platform and custom chatbot annotation tasks for banking customers.

Calmcode, Explosion, Data Science

Learning from Machine Learning

🔌 prodigy-lunr v0.1.0 · Oct 5, 2023

Document search via LUNR to fetch relevant data subsets to label

MP Interests Tracker: Utilising GenAI to uncover insights in the UK Register of Financial Interest

JournalismAI Blog

A project from teams at The Times and the BBC using spacy-llm to make complex financial interests data more accessible.

🛸 spacy-transformers v1.3.1 · Sep 26, 2023

Support for newer versions of Transformers

Models as annotators in Prodigy

How to use models and LLMs as annotators to find disagreements and prioritize examples to annotate first.
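One simple way to turn disagreement into a priority score is to compare the entity spans predicted by two annotators, for example a trained pipeline and an LLM, and sort examples by how much their span sets differ. The sketch below assumes each prediction is a set of `(start_char, end_char, label)` tuples and scores disagreement as 1 minus the Jaccard similarity; the function names and the scoring choice are illustrative, not Prodigy's implementation.

```python
from typing import List, Set, Tuple

# Assumed span representation: (start_char, end_char, label)
Span = Tuple[int, int, str]


def disagreement(a: Set[Span], b: Set[Span]) -> float:
    """1 - Jaccard similarity of two span sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def prioritize(
    texts: List[str], preds_a: List[Set[Span]], preds_b: List[Set[Span]]
) -> List[Tuple[str, float]]:
    """Sort examples so the ones the two annotators disagree on most come first.

    sorted() is stable (also with reverse=True), so ties keep their input order.
    """
    scored = [(text, disagreement(a, b)) for text, a, b in zip(texts, preds_a, preds_b)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


texts = ["Apple hired Tim.", "Paris is nice.", "No entities here."]
model_preds = [{(0, 5, "ORG"), (12, 15, "PERSON")}, {(0, 5, "GPE")}, set()]
llm_preds = [{(0, 5, "ORG")}, {(0, 5, "GPE")}, set()]
for text, score in prioritize(texts, model_preds, llm_preds):
    print(f"{score:.2f}  {text}")
```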

✨ prodigy v1.13.0 · Aug 15, 2023

LLM support for NER, text classification and span categorization

🤖 curated-transformers v1.0.0 · Aug 3, 2023

Lightweight, composable PyTorch transformers

ACL LAW Workshop Poster

ACL 2023

spaCy: a customizable NLP toolkit designed for developers

ODSC Europe

🦙 spacy-llm v0.3.0 · Jun 14, 2023

Cohere, Anthropic, OpenLLaMa, StableLM, logging, streamlit demo, lemmatization task

SpanCat with spaCy and Prodigy on real data

YouTube series by WJB Mattingly showing an end-to-end project, from cultivating and annotating data to training, testing and visualizing a model.

spaCy Plugin for VSCode

The spaCy VSCode Extension provides additional tooling and features for working with spaCy’s config files. Version 1.0.0 is available as an installable extension and adds hover descriptions for registry functions, variables and section names within the config.

Predicting relations between SOAP note sections: The value of incorporating a clinical information model

Socrates, Gilson, Lopez, Chi, Taylor, Chartash (2023), Journal of Biomedical Informatics

To support human annotation, we first annotate 100 Assessment and Plan subsections manually using Prodigy, and then use spacy-transformers to fine-tune a general domain RoBERTa-base model pretrained on OntoNotes 5 for both the Assessment and Plan section NER tagging.

textaCy v0.13.0

Utility library for NLP tasks before and after spaCy, including preprocessing, normalization and additional information extraction features.

The Nesta Skills Extractor Library

Economic Statistics Centre of Excellence

A new library for extracting skills from job adverts and mapping them to a taxonomy of your choice, built on top of spaCy.

🕊️ radicli v0.0.3 · Feb 9, 2023

Radically lightweight command-line interfaces

Towards a Tagalog NLP pipeline

In this blog post, Lj talks about how he built an NER pipeline for Tagalog, the gold-standard dataset, benchmarking results, and his hopes for the future of Tagalog NLP.

🔌 prodigy-ann v0.1.0 · Oct 5, 2023

Use ANN techniques to fetch relevant data subsets to label
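Approximate nearest neighbour (ANN) indexes speed up a simple idea: embed a query and the candidate texts, then return the candidates whose vectors are most similar to the query. The sketch below shows the exact, brute-force version of that lookup using cosine similarity in plain Python; it assumes the vectors already exist and does not reflect how prodigy-ann builds or queries its index.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[int]:
    """Exact nearest-neighbour search: indices of the k most similar vectors.

    sorted() is stable, so ties keep their original order.
    """
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


# Toy 2-d "embeddings" for three unlabelled documents
docs = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]
print(top_k([1.0, 0.0], docs, k=2))  # [1, 2]
```

This full scan compares the query against every vector; an ANN index typically gives up a little recall to avoid that cost on large datasets.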

Panel: Large Language Models

Big PyData BBQ

with Ines, Alejandro Saucedo (Zalando, Institute for Ethical AI & ML), Alina Lehnhard (Cerence), Michael Gerz (Heidelberg University), Alexander CS Hendorf (Königsweg)

🦙 spacy-llm v0.5.0 · Sep 8, 2023

Improved user API and novel Chain-of-Thought prompting for more accurate NER

🦦 weasel v0.3.0 · Aug 14, 2023

Updates for requirements checks

How to Host Your Own API of Open Language Models For Free

Powered by Explosion’s curated-transformers, FastAPI and ngrok.

Introducing spaCy v3.6

spaCy v3.6 introduces the span finder component and trained pipelines for Slovenian.

🦙 spacy-llm v0.4.0 · Jul 6, 2023

Falcon, sentiment analysis, summarization, backend refactoring

🦦 weasel v0.1.0 · Jun 14, 2023

A small and easy workflow system

Inter-rater agreement for the annotation of neurologic signs and symptoms in electronic health records

Oommen, Howlett-Prieto, Carrithers, Hier (2023)

Prodigy was used to annotate neurologic concepts in the EHR physician notes.

Large Disagreement Modelling

“In this blogpost I’d like to talk about large language models. There’s a bunch of hype, sure, but there’s also an opportunity to revisit one of my favourite machine learning techniques: disagreement.”

Implementing a custom trainable component for relation extraction

Relation extraction refers to the process of predicting and labeling semantic relationships between named entities. In this blog post, we'll go over the process of building a custom relation extraction component using spaCy and Thinc. We'll also add a Hugging Face transformer to improve performance at the end of the post. You'll see how you can utilize Thinc's flexible and customizable system to build an NLP pipeline for biomedical relation extraction.
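A common first step in a relation extraction component is candidate generation: enumerate the pairs of entities that could be related, then let the trained model score each pair for each relation label. The function below is a minimal, hypothetical sketch of that step in plain Python. It assumes entities are `(start_token, end_token, label)` tuples and uses a `max_length` token-distance cutoff, which is an illustrative choice rather than the exact logic of the component in the post.

```python
from itertools import permutations
from typing import List, Tuple

# Hypothetical entity representation: (start_token, end_token, label)
Entity = Tuple[int, int, str]


def candidate_pairs(ents: List[Entity], max_length: int = 20) -> List[Tuple[Entity, Entity]]:
    """Return ordered (head, child) pairs whose start tokens are at most `max_length` apart.

    Pairs are ordered because relations are usually directional:
    (gene, disease) and (disease, gene) are separate candidates.
    """
    return [
        (head, child)
        for head, child in permutations(ents, 2)
        if abs(head[0] - child[0]) <= max_length
    ]


ents = [(0, 1, "GENE"), (3, 4, "DISEASE"), (50, 51, "GENE")]
for head, child in candidate_pairs(ents):
    print(head, "->", child)
```

The distance cutoff keeps the number of candidates from growing quadratically with long documents, at the cost of never proposing relations between far-apart entities.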

Intro to NLP with spaCy for Digital Humanities

Princeton University

Rulers, NER, and data iteration

On the power of combining rules with ML, and the importance of iterating on both your pipeline and your data.

Fiscal data in text: Information extraction from audit reports using Natural Language Processing

Beltran (2023), Data & Policy, Cambridge University Press

I relied on the text annotation software Prodigy in Python that offers a friendly user interface where the reviewer can read the text and assign a label to each paragraph.

AI/ML for the rest of us

GitHub Newsletter

🔌 prodigy-pdf v0.1.0 · Oct 5, 2023

Annotate and segment PDF files and perform OCR

Natural Intelligence is All You Need[tm]

PyData Amsterdam Keynote

✨ prodigy v1.13.2 · Sep 7, 2023

New LLM recipes for terms generation and prompt engineering

🔮 thinc v8.2.0 · Aug 11, 2023

Updates for automatic imports

Large Language Models: From Prototype to Production

EuroPython Keynote

Large Language Models (LLMs) have shown some impressive capabilities and their impact is the topic of the moment. In this talk, Ines presents visions for NLP in the age of LLMs and a pragmatic, practical approach for how to use Large Language Models to ship more successful NLP projects from prototype to production today.

Prodigy v1.12: OpenAI integration, prompt engineering, task routers, deployment docs and more

✨ prodigy v1.12.0 · Jul 5, 2023

LLM-assisted workflows for annotation and prompt engineering, task routing for multi-annotator setups

How Good is the Model in Model-in-the-loop Event Coreference Resolution Annotation?

Ahmed, Nath, Regan, Pollins, Krishnaswamy, Martin (2023)

Figure 6 illustrates the interface design of the annotation methodology on the popular model-in-the-loop annotation tool - Prodigy. We use this tool for the simplicity it offers in plugging in the various ranking methods we explained.

🦙 spacy-llm v0.2.0 · May 30, 2023

REL and spancat tasks, reading prompt templates from file

Against LLM maximalism

LLMs are not a direct solution to most of the NLP use-cases companies have been working on. They are extremely useful, but if you want to deliver reliable software you can improve over time, you can't just write a prompt and call it a day. Once you're past prototyping and want to deliver the best system you can, supervised learning will often give you better efficiency, accuracy and reliability.

You are what you read: Building a personal internet front-page with spaCy and Prodigy

PyCon DE & PyData Berlin

The Tale of Bloom Embeddings and Unseen Entities

The default Bloom embedding layer in spaCy is unconventional, but very powerful and efficient. We wrote about it before and showed the advantages it provides in terms of memory efficiency for our floret embeddings. Now we have released the first technical report by Explosion, where we explain Bloom embeddings in more detail and rigorously compare them to traditional embeddings. In this post we'll highlight some of our results with a special focus on unseen entities.
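The core idea behind Bloom (hash) embeddings is that each token is hashed several times with different seeds into a fixed-size table, and the selected rows are summed into one vector. Because the table size does not depend on the vocabulary, unseen words still receive a vector, and two words that collide on one row are unlikely to collide on all of them. The sketch below is a simplified illustration in plain Python: the class name and parameters are hypothetical, and it uses MD5 with a seed prefix purely for convenience, not the hash function spaCy uses.

```python
import hashlib
import random
from typing import List


class BloomEmbedding:
    """Simplified hash embedding: n_hashes rows per key, summed into one vector."""

    def __init__(self, n_rows: int, width: int, n_hashes: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        # The table has a fixed number of rows, independent of vocabulary size
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(n_rows)]
        self.n_rows = n_rows
        self.width = width
        self.n_hashes = n_hashes

    def rows_for(self, key: str) -> List[int]:
        """Hash the key once per seed and map each hash to a table row."""
        rows = []
        for i in range(self.n_hashes):
            digest = hashlib.md5(f"{i}:{key}".encode("utf-8")).digest()
            rows.append(int.from_bytes(digest[:8], "little") % self.n_rows)
        return rows

    def __call__(self, key: str) -> List[float]:
        """Sum the selected rows to get the key's vector."""
        vector = [0.0] * self.width
        for row in self.rows_for(key):
            for j, value in enumerate(self.table[row]):
                vector[j] += value
        return vector


embed = BloomEmbedding(n_rows=1000, width=8)
print(embed.rows_for("spaCy"))
print(embed("Zyxwvutian")[:3])  # an unseen word still gets a vector
```

With 1,000 rows and 4 hashes per key, the same table can serve a vocabulary far larger than 1,000 words, which is where the memory savings come from.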

Slovak Dataset for Multilingual Question Answering

Hládek, Staš, Juhár, Koctúr (2023)

We used the Prodigy annotation tool to annotate the questions and answers. One annotation task corresponds to one web application deployment and different configurations.

NLP: From Prototype to Production

Outerbounds Fireside Chat

Introducing spaCy v3.5

spaCy v3.5 introduces new CLI commands, fuzzy matching, improvements for entity linking and more.
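Fuzzy matching generally means accepting a token whose edit distance to the pattern stays below some threshold. The sketch below shows that idea in plain Python with Levenshtein distance and a fixed, hypothetical `max_edits` threshold. It is not spaCy's implementation: spaCy's `Matcher` exposes fuzzy matching through its own pattern syntax and default thresholds, which are described in the spaCy API reference.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if characters match
            ))
        prev = cur
    return prev[-1]


def fuzzy_find(tokens: List[str], pattern: str, max_edits: int = 2) -> List[int]:
    """Indices of tokens within `max_edits` edits of the pattern (case-insensitive)."""
    target = pattern.lower()
    return [i for i, tok in enumerate(tokens) if levenshtein(tok.lower(), target) <= max_edits]


tokens = ["I", "definately", "agree"]
print(fuzzy_find(tokens, "definitely"))  # [1]
```

A fixed threshold is the simplest choice; scaling `max_edits` with the pattern length is a common refinement so that short words do not match too loosely.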