Software

We make a suite of AI developer tools that emphasize usability, performance and data privacy. We’re proud to be part of the best-in-class Python data science ecosystem. Most of our software is open-source, and the components that aren’t are just as privacy-conscious and developer-friendly. Unlike most AI companies, we don’t want your data: it never has to leave your servers if you don’t want it to.

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. It’s designed specifically for production use and helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning.

20m+ total downloads
17k+ GitHub stars
400+ contributors



Prodigy is a modern annotation tool for creating training data for machine learning models. It’s so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. Whether you’re working on entity recognition, intent detection or image classification, Prodigy can help you train and evaluate your models faster.

19+ annotation modes
4k+ users
500+ companies

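For entity recognition, the training data an annotator produces is essentially text plus labelled character offsets. Prodigy reads and writes annotation tasks as JSONL; the sketch below assumes the common shape of one JSON object per line with a `text` field and a `spans` list of `start`/`end`/`label` entries (Prodigy may add further fields, such as hashes or the annotator's answer). The `make_ner_task` helper is hypothetical and uses only the standard library.

```python
import json
from typing import Dict, List, Tuple


def make_ner_task(text: str, entities: List[Tuple[str, str]]) -> Dict:
    """Build one annotation task with character-offset spans.

    Hypothetical helper: `entities` holds (substring, label) pairs, and the
    first occurrence of each substring in `text` is used.
    """
    spans = []
    for substring, label in entities:
        start = text.index(substring)  # raises ValueError if the substring is absent
        spans.append({"start": start, "end": start + len(substring), "label": label})
    return {"text": text, "spans": spans}


tasks = [
    make_ner_task(
        "Apple is opening an office in London.",
        [("Apple", "ORG"), ("London", "GPE")],
    ),
]

# One JSON object per line (JSONL)
jsonl = "\n".join(json.dumps(task) for task in tasks)
print(jsonl)
```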


Thinc is a lightweight deep learning library that offers an elegant, type-checked, functional-programming API for composing models, with support for layers defined in other frameworks such as PyTorch, TensorFlow or MXNet. You can use Thinc as an interface layer, a standalone toolkit or a flexible way to develop new models.


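Thinc's layers are built around forward functions that return their output together with a callback for the backward pass, and combinators such as `chain` compose layers into larger models. The pure-Python sketch below imitates that callback-based, functional style on plain lists; `scale`, `relu` and this `chain` are hypothetical stand-ins written for illustration, not Thinc's code or its API.

```python
from typing import Callable, List, Tuple

Array = List[float]
Backprop = Callable[[Array], Array]
# A layer maps an input to (output, backprop), where backprop maps the
# gradient w.r.t. the output to the gradient w.r.t. the input.
Layer = Callable[[Array], Tuple[Array, Backprop]]


def scale(factor: float) -> Layer:
    """Multiply every input by a fixed factor."""
    def forward(X: Array) -> Tuple[Array, Backprop]:
        Y = [factor * x for x in X]

        def backprop(dY: Array) -> Array:
            # d(factor * x)/dx = factor
            return [factor * dy for dy in dY]

        return Y, backprop

    return forward


def relu() -> Layer:
    """Elementwise max(0, x)."""
    def forward(X: Array) -> Tuple[Array, Backprop]:
        Y = [max(0.0, x) for x in X]

        def backprop(dY: Array) -> Array:
            # Gradient passes through only where the input was positive
            return [dy if x > 0 else 0.0 for x, dy in zip(X, dY)]

        return Y, backprop

    return forward


def chain(*layers: Layer) -> Layer:
    """Feed each layer's output into the next; backprop runs in reverse."""
    def forward(X: Array) -> Tuple[Array, Backprop]:
        callbacks = []
        for layer in layers:
            X, bp = layer(X)
            callbacks.append(bp)

        def backprop(dY: Array) -> Array:
            for bp in reversed(callbacks):
                dY = bp(dY)
            return dY

        return X, backprop

    return forward


model = chain(scale(2.0), relu(), scale(0.5))
Y, backprop = model([-1.0, 3.0])
dX = backprop([1.0, 1.0])
print(Y, dX)
```

Because each forward pass hands back its own backward callback, composing models needs no global graph: `chain` only has to remember the callbacks in order.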


Other open-source software

🪐 projects: Example projects for various NLP tasks with datasets, scripts and results
🛸 spacy-transformers: spaCy pipelines for pre-trained BERT and other transformers
👩‍🏫 spacy-course: “Advanced NLP with spaCy”, a free online course
🦆 sense2vec: Contextually-keyed word vectors
👑 spacy-streamlit: spaCy building blocks and visualizers for Streamlit apps
💥 spacy-stanza: Use the latest Stanza (StanfordNLP) research models directly in spaCy
🦉 srsly: Modern high-performance serialization utilities for Python
💥 cython-blis: Fast matrix multiplication as a self-contained Python library – no system dependencies!
🍳 prodigy-recipes: Recipes for the Prodigy annotation tool
🧬 jupyterlab-prodigy: A JupyterLab extension for annotating data with Prodigy
💥 cymem: Cython memory pool for RAII-style memory management
💥 preshed: Cython hash tables that assume keys are pre-hashed
📙 catalogue: Super lightweight function registries for your library
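A function registry lets a library look up functions by string name, for example names read from a config file, instead of importing them directly. catalogue offers decorator-based registries for this; the stdlib-only sketch below shows the underlying pattern with a hypothetical `Registry` class. It is not catalogue's implementation, which also supports nested namespaces and Python entry points.

```python
from typing import Callable, Dict


class Registry:
    """A minimal name -> function registry (hypothetical, for illustration)."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self._funcs: Dict[str, Callable] = {}

    def register(self, name: str) -> Callable[[Callable], Callable]:
        """Return a decorator that stores the decorated function under `name`."""
        def decorator(func: Callable) -> Callable:
            self._funcs[name] = func
            return func  # the function itself is left unchanged

        return decorator

    def get(self, name: str) -> Callable:
        """Look up a registered function, with a readable error if missing."""
        if name not in self._funcs:
            raise KeyError(f"{self.namespace}: no function registered as {name!r}")
        return self._funcs[name]


loaders = Registry("loaders")


@loaders.register("upper")
def load_upper(text: str) -> str:
    return text.upper()


# Elsewhere, e.g. driven by a config value, the function is found by name
print(loaders.get("upper")("hello"))
```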

Open-source from our team

The following projects aren’t official Explosion projects, but they’re developed by members of our team.

🚀 fastapi: Modern framework for building APIs with Python based on type hints (by Sebastián)
typer: Library for building CLI applications based on type hints (by Sebastián)
🍣 wasabi: A lightweight console printing and formatting toolkit (by Ines)
🍇 juniper: Edit and execute code snippets in the browser using Jupyter kernels (by Ines)
🧮 mathy: Solve math problems step-by-step with reinforcement learning (by Justin)
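Both fastapi and typer read ordinary Python type hints to decide how incoming values should be parsed. The stdlib-only sketch below shows that core idea for a command line: it inspects a function's annotations and converts raw string arguments accordingly. `run_with_hints` is a hypothetical helper, not typer's API, and it skips everything typer adds on top, such as options and help text.

```python
import inspect
from typing import Any, Callable, List, get_type_hints


def run_with_hints(func: Callable[..., Any], argv: List[str]) -> Any:
    """Call `func` with command-line strings converted via its type hints.

    Hypothetical helper: only positional parameters whose annotated type can
    be built from a string (str, int, float) are handled. Note that
    bool("False") is True, so booleans would need special handling.
    """
    hints = get_type_hints(func)
    names = list(inspect.signature(func).parameters)
    if len(argv) != len(names):
        raise SystemExit(f"expected {len(names)} arguments, got {len(argv)}")
    # Unannotated parameters fall back to plain strings
    args = [hints.get(name, str)(raw) for name, raw in zip(names, argv)]
    return func(*args)


def repeat(word: str, times: int) -> str:
    """Repeat `word` the given number of times."""
    return " ".join([word] * times)


# Equivalent to running something like: my_cli hi 3
print(run_with_hints(repeat, ["hi", "3"]))
```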

Demos

Demos and visualisations aren’t just eye candy: they’re an essential part of explaining and exploring AI technologies, especially during development. A good visualisation lets you understand your model’s behaviour and catch obvious problems early. Our demos include visualisations for spaCy’s dependency trees, entity recognition and similarity models.



displaCy Dependency Visualizer

Visualize spaCy’s guess at the syntactic structure of a sentence. Arrows point from heads to their children (dependents) and are labelled by their relation type.
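displaCy can also draw parses you supply yourself through its manual mode (`displacy.render(data, style="dep", manual=True)`), which takes a dict of `words` and `arcs`. Each arc connects a token to its head, carries the relation label, and uses `dir` to record whether the child sits left or right of its head. The sketch below assumes that dict shape; `to_displacy_arcs` is a hypothetical pure-Python helper, and the example parse is hand-written, not model output.

```python
from typing import Dict, List


def to_displacy_arcs(words: List[str], heads: List[int], deps: List[str]) -> Dict:
    """Build a displaCy-style manual dependency dict from head indices.

    heads[i] is the index of token i's head; the root points to itself.
    """
    arcs = []
    for i, (head, label) in enumerate(zip(heads, deps)):
        if head == i:  # the root has no incoming arc
            continue
        if i < head:
            # Child is left of its head
            arcs.append({"start": i, "end": head, "label": label, "dir": "left"})
        else:
            # Child is right of its head
            arcs.append({"start": head, "end": i, "label": label, "dir": "right"})
    return {"words": [{"text": w, "tag": ""} for w in words], "arcs": arcs}


parse = to_displacy_arcs(
    ["She", "reads", "books"],
    heads=[1, 1, 1],
    deps=["nsubj", "ROOT", "dobj"],
)
print(parse["arcs"])
```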


displaCy Named Entity Visualizer

Visualize spaCy’s guess at the named entities in the document. You can filter the displayed entity types to show only the annotations you’re interested in.
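In spaCy, the entity visualizer accepts `options={"ents": [...]}` to restrict which labels are shown, and in manual mode it takes dicts with `text`, `ents` (character offsets plus a label) and `title`. The pure-Python sketch below applies the same kind of label filter to such a dict; `filter_ents` and `ent_texts` are hypothetical helpers, and the entities are hand-written for the example.

```python
from typing import Dict, Iterable, List


def filter_ents(doc_data: Dict, labels: Iterable[str]) -> Dict:
    """Return a copy of a displaCy-style entity dict keeping only `labels`."""
    keep = set(labels)
    return {**doc_data, "ents": [ent for ent in doc_data["ents"] if ent["label"] in keep]}


def ent_texts(doc_data: Dict) -> List[str]:
    """Slice each entity's text out of the document using its character offsets."""
    return [doc_data["text"][ent["start"]:ent["end"]] for ent in doc_data["ents"]]


doc_data = {
    "text": "Apple is opening an office in London.",
    "ents": [
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 30, "end": 36, "label": "GPE"},
    ],
    "title": None,
}

print(ent_texts(filter_ents(doc_data, ["GPE"])))
```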


Rule-based Matcher Explorer

Test spaCy’s rule-based Matcher by creating token patterns interactively and running them over your text. Explore how spaCy processes your text – and why your pattern matches, or doesn’t.
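spaCy’s Matcher describes each token in a pattern with a dict of attributes, for example `[{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]`. To show how such a pattern is checked token by token, the pure-Python sketch below uses a simple regex tokenizer and hypothetical `token_attrs` and `match` helpers. It supports only four attributes and no operators such as `"OP"`; spaCy’s own Matcher is far more capable and runs over spaCy’s tokenization.

```python
import re
from typing import Dict, List, Tuple

Pattern = List[Dict[str, object]]


def token_attrs(token: str) -> Dict[str, object]:
    """Compute a few token attributes named like spaCy's pattern keys."""
    return {
        "TEXT": token,
        "LOWER": token.lower(),
        "IS_PUNCT": bool(re.fullmatch(r"[^\w\s]+", token)),
        "IS_DIGIT": token.isdigit(),
    }


def match(tokens: List[str], label: str, pattern: Pattern) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) for every window where all token specs match.

    An attribute the sketch doesn't know is treated as None, so it never matches.
    """
    attrs = [token_attrs(t) for t in tokens]
    n = len(pattern)
    found = []
    for start in range(len(tokens) - n + 1):
        window = attrs[start:start + n]
        if all(
            all(tok.get(key) == value for key, value in spec.items())
            for tok, spec in zip(window, pattern)
        ):
            found.append((label, start, start + n))
    return found


tokens = re.findall(r"\w+|[^\w\s]", "Hello, world! Hello world!")
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
print(match(tokens, "HELLO_WORLD", pattern))
```

Only the first "Hello, world" matches: the second occurrence has no punctuation token between the two words, which is exactly the kind of near-miss the explorer helps you spot.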


sense2vec: Semantic Analysis of the Reddit Hivemind

We parsed every comment posted to Reddit in 2015 and 2019, and trained a separate word2vec model for each year.
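sense2vec keys each vector by a phrase plus a sense label such as a part-of-speech tag, so "duck" as a noun and "duck" as a verb get different vectors. The sketch below assumes the `phrase|SENSE` key format, with underscores joining multi-word phrases, and ranks neighbours by cosine similarity. It uses toy, hand-picked vectors and hypothetical helpers; it does not use the sense2vec library or any vectors trained on Reddit.

```python
import math
from typing import Dict, List, Tuple


def make_key(phrase: str, sense: str) -> str:
    """Join a phrase and its sense tag, e.g. ("duck", "VERB") -> "duck|VERB"."""
    return phrase.replace(" ", "_") + "|" + sense


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(vectors: Dict[str, List[float]], key: str, n: int = 1) -> List[Tuple[str, float]]:
    """Return the n keys closest to `key`, excluding the key itself."""
    query = vectors[key]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != key]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Toy 3-d vectors for illustration only (not trained on any data)
vectors = {
    make_key("duck", "NOUN"): [0.9, 0.1, 0.0],
    make_key("goose", "NOUN"): [0.8, 0.2, 0.1],
    make_key("duck", "VERB"): [0.0, 0.9, 0.3],
    make_key("dodge", "VERB"): [0.1, 0.8, 0.4],
}

print(most_similar(vectors, "duck|VERB")[0][0])
print(most_similar(vectors, "duck|NOUN")[0][0])
```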


GraphQL queries for spaCy

A simple and experimental app that lets you query spaCy’s linguistic annotations using GraphQL, a powerful, strongly typed API query language.


Prodigy Annotation Tool

Whether you’re working on entity recognition, intent detection or image classification, Prodigy can help you train and evaluate your models faster.