Author: Daniël de Kok · Explosion · Developer tools and consulting for AI, Machine Learning and NLP

Explosion builds developer tools for AI, Machine Learning and Natural Language Processing.

Filtered by author: Daniël de Kok

State-of-the-Art Transformer Pipelines in spaCy (aiGrunn)

In this talk, we show how to use transformer models, from pretrained models such as XLM-RoBERTa to large language models like Llama 2, to build state-of-the-art pipelines for text annotation tasks such as named entity recognition.

Fast transformer inference with Metal Performance Shaders

We are happy to introduce support for Metal Performance Shaders in Thinc PyTorch layers. This makes it possible to run spaCy transformer-based pipelines on the GPU of Apple Silicon Macs and improves inference speed by up to 4.7 times.

Introducing spaCy v3.2

spaCy v3.2 features usability improvements for custom training and scoring, improved performance and support for floret, our new fastText-based algorithm for word vectors.

Introducing spaCy v3.6

spaCy v3.6 introduces the span finder component and trained pipelines for Slovenian.

Introducing spaCy v3.4

spaCy v3.4 brings typing and speed improvements along with new vectors for English CNN pipelines and new trained pipelines for Croatian.

Introducing spaCy v3.5

spaCy v3.5 introduces new CLI commands, fuzzy matching, improvements for entity linking and more.

Introducing spaCy v3.3

spaCy v3.3 improves the speed of core pipeline components, adds a new trainable lemmatizer, and introduces trained pipelines for Finnish, Korean and Swedish.

Explosion in 2022: Our Year in Review

It's been another exciting year at Explosion! We've developed a new end-to-end neural coref component for spaCy, improved the speed of our CNN pipelines by up to 60%, and published new pre-trained pipelines for Finnish, Korean, Swedish and Croatian. We've also released several updates to Prodigy and introduced new recipes to kickstart annotation with zero- or few-shot learning.

Neural edit-tree lemmatization for spaCy

We are happy to introduce a new, experimental, machine learning-based lemmatizer that posts accuracies above 95% for many languages. This lemmatizer learns to predict lemmatization rules from a corpus of examples and removes the need to write an exhaustive set of per-language lemmatization rules.
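
To make the idea of learning lemmatization rules from examples more concrete, here is a minimal pure-Python sketch of edit trees. It is an illustration under stated assumptions, not spaCy's implementation: the function names (`build_tree`, `apply_tree`, `learn_trees`, `lemmatize`) are hypothetical, and where a trained model would predict which tree to apply to a token, this sketch simply tries the most frequent trees first.

```python
from collections import Counter
from difflib import SequenceMatcher


def build_tree(form, lemma):
    """Derive an edit tree that rewrites `form` into `lemma`.

    Interior nodes keep the longest common substring and recurse on the
    text before and after it; leaves replace one string with another.
    """
    matcher = SequenceMatcher(None, form, lemma, autojunk=False)
    m = matcher.find_longest_match(0, len(form), 0, len(lemma))
    if m.size == 0:
        # Nothing in common: replace the whole span.
        return ("replace", form, lemma)
    left = build_tree(form[:m.a], lemma[:m.b])
    right = build_tree(form[m.a + m.size:], lemma[m.b + m.size:])
    suffix_len = len(form) - (m.a + m.size)
    # Store only prefix/suffix lengths so the tree generalizes to new words.
    return ("match", m.a, suffix_len, left, right)


def apply_tree(tree, form):
    """Apply an edit tree to `form`; return None if it does not fit."""
    if tree[0] == "replace":
        _, orig, subst = tree
        return subst if form == orig else None
    _, prefix_len, suffix_len, left, right = tree
    if prefix_len + suffix_len > len(form):
        return None
    split = len(form) - suffix_len
    new_prefix = apply_tree(left, form[:prefix_len])
    new_suffix = apply_tree(right, form[split:])
    if new_prefix is None or new_suffix is None:
        return None
    return new_prefix + form[prefix_len:split] + new_suffix


def learn_trees(pairs):
    """Count the edit trees observed in (form, lemma) training pairs."""
    return Counter(build_tree(form, lemma) for form, lemma in pairs)


def lemmatize(trees, form):
    """Stand-in for the classifier: use the most frequent applicable tree."""
    for tree, _count in trees.most_common():
        lemma = apply_tree(tree, form)
        if lemma is not None:
            return lemma
    return form  # Fall back to the surface form.


# Toy training data, invented for illustration only.
trees = learn_trees([("walked", "walk"), ("jumped", "jump"), ("cats", "cat")])
print(lemmatize(trees, "talked"))
print(lemmatize(trees, "dogs"))
```

Because "walked" and "jumped" yield the same tree (keep the stem, drop a trailing "ed"), the rule carries over to unseen words such as "talked" without anyone writing it by hand.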