Page: 4 · Explosion · Developer tools and consulting for AI, Machine Learning and NLP

Explosion builds developer tools for AI, Machine Learning and Natural Language Processing.

Explosion in 2022: Our Year in Review

It's been another exciting year at Explosion! We've developed a new end-to-end neural coref component for spaCy, improved the speed of our CNN pipelines by up to 60%, and published new pre-trained pipelines for Finnish, Korean, Swedish and Croatian. We've also released several updates to Prodigy and introduced new recipes to kickstart annotation with zero- or few-shot learning.

🛸 spacy-transformers v1.2.0 · Jan 14, 2023

Better alignment for fast tokenizers

Extracting Structured Information from Greek Legislation Data Alexios (2023)

Worth noting is the existence of an application, called Prodigy, which takes advantage of an active learning framework and provides users with an interactive interface for data annotation.

Data is the new coffee NormConf

The triangulation of ethical leader signals using qualitative, experimental, and data science methods Banks, Ross, Toth, Tonidandel, Goloujeh, Dou, Wesslen (2022)

This additional text was labeled by the same coding team using Prodigy, [...] a flexible user interface tool built on top of spaCy, a leading open source library in python for natural language processing. We created a spaCy end‐to‐end project workflow including package versioning, data pre‐processing, data ingestion into a database, annotation sessions using Prodigy’s user interface, model training, model evaluation, python packaging, and visual app for testing the model.

Coreference Resolution in spaCy

In everyday conversation, we use pronouns or other expressions to refer to entities in many different ways, but we effortlessly understand these references. In NLP this is a challenging problem known as Coreference Resolution. In this video, we’ll show how to train spaCy’s new component for Coreference Resolution and how to apply the pipeline to resolve references in a text.

How the Guardian approaches quote extraction with NLP

A case study of the Guardian's spaCy-Prodigy workflow to modularize quote extraction for content creation. This study includes iterative annotation guidelines and custom interface functionality.

floret: lightweight, robust word vectors

An exploration of floret vectors: lightweight vectors for noisy data, novel words, rich morphology and more.

Bulk Labelling and Prodigy

In this video, we’ll show a bulk labelling technique that can help you prepare data for Prodigy.

Finding Bad Labels for Text Classification with Jupyter and Prodigy

In this video, we’ll show you how to set up Prodigy to find bad labels in text classification tasks. While many of the techniques are applied to text classification, they can also be used for classification tasks in general.

How we built a Stack Overflow Community questions analyzer GitLab Blog

How GitLab used spaCy to analyze and better understand Stack Overflow community questions about their tools and products.

Finding Duplicates in Tabular Data with Jupyter and Prodigy

In this video, we’ll show you how to use Prodigy to train a named entity recognition model from scratch, by taking advantage of semi-automatic annotation and modern transfer learning techniques.

🧪 spacy-experimental v0.4.0 · Mar 22, 2022

Added biaffine parser and other fixes for experimental tools

When Women Make Headlines The Pudding

Using spaCy and other packages from the NLP ecosystem for analyzing more than 382,000 headlines to see how women are represented (or misrepresented) in the news.

Universal Dependencies v2.5 Benchmarks for spaCy

We present Universal Dependencies v2.5 benchmarks for spaCy v3.2 that show the competitive performance of spaCy in a direct comparison with Stanza and Trankit using the end-to-end evaluation from the CoNLL 2018 Shared Task.

Introducing spaCy v3.2

spaCy v3.2 features usability improvements for custom training and scoring, improved performance and support for floret, our new fastText word vectors algorithm.

Robust solutions with Explosion’s applied NLP philosophy UNC Charlotte

WW2 spaCy v0.0.9

spaCy pipeline for processing primary and secondary sources for World War 2 texts.

Group-by statements that save the day NormConf

Is it possible to have entities within entities within entities? PyData Global 2022

Named entity recognition models might not be able to handle a wide variety of spans, but Spancat certainly can! Dive into named entity recognition, its limitations, and how we’ve solved them with a solution-focused talk and practical applications.

Finetuning and Bulk Labelling Images with Prodigy

In this video, we’ll show how you might be able to improve the annotation experience by using bulk labelling for image classification.

Finding Video Games with Sense2Vec

In this video, we’ll show how you can improve the annotation experience by leveraging sense2vec to pre-fill named entities.

🧪 spacy-experimental v0.6.0 · Sep 28, 2022

Added Coref components and models

Speech acts in the Dutch COVID-19 Press Conferences Schueler, Marx (2022), Language Resources and Evaluation

We used the annotation tool Prodigy. Prodigy provides a simple interface in which the annotator sees a sentence and selects the applicable speech acts. The use of Prodigy considerably sped up the annotation process, allowing the annotators to annotate around 200 sentences per hour.

Introducing Span Categorization in Prodigy and spaCy

In this video, we’ll show you how to use Prodigy for spaCy’s Span Categorizer. We’ll be annotating food recipes and looking into ways to help with consistent annotations and speed up the process with patterns and temporary models.

Diary of a spaCy project: Predicting GitHub Tags

Many people assume that working on an NLP project involves a lot of machine learning. Our experience is that it's much less about flowing tensors, and more about making a tailored solution. This blog post demonstrates how a typical spaCy project could be initiated, implemented and executed towards a custom solution.

Finding Bad Image Data using UMAP and Prodigy

In this video, we’ll show you how to use Prodigy to find bad examples in the Google QuickDraw dataset. We will be leveraging a technique that involves UMAP to find strange images semi-automatically.

skweak v0.3.1

Weak supervision with flexible labelling functions and aggregation, integrated with spaCy.

Creating Tools that Spark Joy with Ines Montani ZenML Pipeline Conversations

Neural edit-tree lemmatization for spaCy

We are happy to introduce a new, experimental, machine learning-based lemmatizer that posts accuracies above 95% for many languages. This lemmatizer learns to predict lemmatization rules from a corpus of examples and removes the need to write an exhaustive set of per-language lemmatization rules.

🌸 floret v0.10.0 · Oct 27, 2021

fastText + Bloom embeddings for compact, full-coverage vectors with spaCy

Anecdotes from 11 Role Models in Machine Learning Towards Data Science

Training spaCy NER Models with Prodigy

This handy flowchart contains our most common tips, tricks, and best practices for training and updating spaCy named entity recognition models with Prodigy.

Setting your ML project up for success

“What can you do to maximize probability of success for your Machine Learning solution? Throughout my 15 years as data scientist in academia, big pharma and through consulting, one common theme has emerged: the most reliable predictor of success for any NLP or ML-based solution is whether or not you involve the data science team early on.”

How the Guardian uses AI to analyse articles JournalismAI Festival

Fast transformer inference with Metal Performance Shaders

We are happy to introduce support for Metal Performance Shaders in Thinc PyTorch layers. This makes it possible to run spaCy transformer-based pipelines on GPU on Apple Silicon Macs and improves inference speed by up to 4.7 times.

medspacy v1.0

A library of tools for performing clinical NLP and text processing tasks with spaCy.

End-to-end Neural Coreference Resolution in spaCy

Coreference resolution is the problem of linking references such as pronouns to the entities they refer to. Even if you've never heard of it, it's something we all do constantly every day, and it is key to understanding natural language. We recently added an experimental implementation of an end-to-end neural coreference component to spaCy. This post explains the architecture of our model in detail.

🍏 thinc-apple-ops v0.1.0 · Jul 19, 2022

Many performance improvements

Introducing Holmes 4.0

A few weeks ago we released version 4.0 of Holmes, which we are now able to offer under a permissive MIT license. Holmes is a library in the spaCy Universe that runs on top of spaCy and enables information extraction and intelligent search, currently for English and German. Holmes goes beyond simple matching algorithms and allows you to look for a specified idea or ideas in a corpus of documents.

Spancat: a new approach for span labeling

The SpanCategorizer is a spaCy component that answers the NLP community's need to have structured annotation for a wide variety of labeled spans, including long phrases, non-named entities, or overlapping annotations. In this blog post, we're excited to talk more about spancat and showcase new features to help with your span labeling needs!

Solutions for Advanced NLP for Diverse Languages New Languages for NLP Keynote

This talk discusses spaCy’s philosophy for modern NLP, its extensible design and recent features that enable the development of advanced natural language processing pipelines for typologically diverse languages.

Compact word vectors with Bloom embeddings

An introduction to the compact word vectors with Bloom embeddings used in Thinc, spaCy and floret.

Applied Language Technology

Extensive online course on applied language technology with spaCy by Tuomo Hiippala, designed for students new to NLP and programming.

Explosion in 2021: Our Year in Review

The year 2021 is coming to an end, and like the previous year, it was shaped by unique challenges that impacted our work together. For Explosion, it was a very productive year. We found an investor that fits our strategy, the work on Prodigy Teams is in full swing, and the team has grown a lot. So here's our look back at our highlights of the year 2021.

Talking sense: using machine learning to understand quotes The Guardian Blog

How the Guardian uses spaCy and Prodigy to train a machine learning model that helps extract quotes from news articles and match them to the correct source.

🛸 spacy-transformers v1.1.0 · Oct 18, 2021

Better serialization, full ModelOutput, mixed-precision training and more

We’ve sold 5% of Explosion

Since founding Explosion in 2016, we’ve run the company as a profitable business and we decided to only consider external investment if we could find a deal that wouldn’t compromise the direction or stability of the company. We’re pleased to announce that we’ve found an investment that ticks all the boxes.

Reflections on a year of spaCy consulting at Explosion

In this post, Peter shares some lessons learned from chatting with practitioners about their NLP challenges, developing production-ready NLP pipelines for clients, and working with an open-source development team.

Multi hash embeddings in spaCy Miranda, Kádár, Boyd, Van Landeghem, Søgaard, Honnibal (2022)

In this technical report we lay out a bit of history and introduce the embedding methods in spaCy in detail. Second, we critically evaluate the hash embedding architecture with multi-embeddings on Named Entity Recognition datasets from a variety of domains and languages. The experiments validate most key design choices behind spaCy’s embedders, but we also uncover a few surprising results.

Custom Interfaces with blocks

With the blocks feature, you can create custom annotation layouts in Prodigy by combining the annotation widgets that Prodigy provides. This video explains how to use this feature by building a custom interface for manually annotating and transcribing audio.

Tools to Improve Training Data Talking Language AI - Cohere

spaCy Cheat Sheet

Everything you need to know about spaCy as a handy two-page PDF.

spaCy behind the scenes: library patterns & design concepts explained

Developer productivity has been central to our design of spaCy, both in smaller decisions and some of the bigger architectural questions. We believe in embracing the complexities of machine learning, not hiding them away under leaky abstractions, while also maintaining the developer experience. Read on to learn some of the design patterns within the library, how we've implemented them, and most importantly, why.

Introducing spaCy v3.4

spaCy v3.4 brings typing and speed improvements along with new vectors for English CNN pipelines and new trained pipelines for Croatian.

🧪 spacy-experimental v0.5.0 · Jun 11, 2022

Added SpanFinder, Span suggesters and bugfixes

Evolution of spaCy D4 Data

Introducing spaCy v3.3

spaCy v3.3 improves the speed of core pipeline components, adds a new trainable lemmatizer, and introduces trained pipelines for Finnish, Korean and Swedish.

Automated Identification of Clinical Procedures in Free-Text Electronic Clinical Records with a Low-Code Named Entity Recognition Workflow Macri, Teoh, Bacchi, Sun, Selva, Casson, Chan (2022), Methods of Information in Medicine

The use of a low-code annotation software tool [Prodigy] allows the rapid creation of a custom annotation dataset to train a NER model to identify clinical procedures stored in free-text electronic clinical notes.

Introducing spaCy Tailored Pipelines

Explosion is pleased to announce a new development services offering, spaCy Tailored Pipelines. We’ll build you a custom natural language processing pipeline, delivered in a standardized format using spaCy’s projects system.

Healthsea: an end-to-end spaCy pipeline for exploring health supplement effects

Create better access to health with machine learning and natural language processing. Read about our journey of developing Healthsea, an end-to-end spaCy pipeline for analyzing user reviews of supplement products and extracting potential effects on health.

spaCy v3's project and config systems are pretty great

The road to production has become increasingly hard. Machine Learning Engineers who turn prototypes into production-ready software face difficulties with the lack of tooling and best practices. spaCy v3, with its configuration and project system, introduced a way to solve this problem. Here's my take on how it works, and how it can ramp up your team!

Reproducible spaCy NLP Experiments with Weights & Biases Weights & Biases Blog

This tutorial will show how to add Weights & Biases to any spaCy NLP project to track your experiments, save model checkpoints, and version your datasets.