We make a suite of AI developer tools that emphasize usability, performance and data privacy. We’re proud to be part of the best-in-class Python data science ecosystem. Most of our software is open-source, and the components that aren’t are just as privacy-conscious and developer-friendly. Unlike most AI companies, we don’t want your data: it never has to leave your servers if you don’t want it to.
spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. It’s designed specifically for production use and helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning.
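As a rough illustration of the information extraction use case, here is a minimal sketch that uses a blank English pipeline with a rule-based entity ruler, so no trained model needs to be downloaded. It assumes spaCy v3 is installed; the function name `extract_entities`, the patterns and the labels are hypothetical and chosen only for this example.

```python
import spacy


def extract_entities(text):
    """Return (text, label) pairs found by a small rule-based pipeline."""
    # A blank pipeline only contains the English tokenizer.
    nlp = spacy.blank("en")
    # The entity ruler matches exact phrases and assigns them a label.
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(
        [
            {"label": "ORG", "pattern": "Explosion"},
            {"label": "PRODUCT", "pattern": "spaCy"},
            {"label": "PRODUCT", "pattern": "Prodigy"},
        ]
    )
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    for ent_text, label in extract_entities("Explosion makes spaCy and Prodigy."):
        print(ent_text, label)
```

A production system would more likely load a trained pipeline with a statistical entity recognizer; the rule-based version is used here only to keep the sketch self-contained.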
Prodigy is a modern annotation tool for creating training data for machine learning models. It’s so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. Whether you’re working on entity recognition, intent detection or image classification, Prodigy can help you train and evaluate your models faster.
Thinc is a lightweight deep learning library that offers an elegant, type-checked, functional-programming API for composing models, with support for layers defined in other frameworks such as PyTorch, TensorFlow or MXNet. You can use Thinc as an interface layer, a standalone toolkit or a flexible way to develop new models.
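To give a sense of what composing models in a functional style can look like, the sketch below chains two built-in layers and lets Thinc infer the missing dimensions from sample data. It assumes Thinc v8 and NumPy are installed; the helper `build_classifier`, the layer sizes and the random input are hypothetical and only serve the illustration.

```python
import numpy
from thinc.api import Relu, Softmax, chain


def build_classifier(n_hidden=8):
    """Compose a hidden layer and an output layer into one model."""
    # Input and output widths are left unset and inferred on initialize().
    return chain(Relu(nO=n_hidden), Softmax())


# Four examples with five features each, and three output classes.
X = numpy.random.uniform(-1, 1, (4, 5)).astype("float32")
Y = numpy.zeros((4, 3), dtype="float32")
Y[:, 0] = 1.0

model = build_classifier()
# Sample data lets Thinc fill in the missing dimensions.
model.initialize(X=X, Y=Y)
probs = model.predict(X)

if __name__ == "__main__":
    print(probs.shape)
```

The same `chain` combinator can also take layers that wrap models from other frameworks, which is not shown here.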
Other open-source software
|Project|Description|
|---|---|
|🪐 projects|Project templates for end-to-end NLP workflows|
|🛸 spacy-transformers|spaCy pipelines for pre-trained BERT and other transformers|
|👩‍🏫 spacy-course|Advanced NLP with spaCy: A free online course|
|🦆 sense2vec|Contextually-keyed word vectors|
|☄️ spacy-ray|Parallel and distributed training with spaCy and Ray|
|👑 spacy-streamlit|spaCy building blocks and visualizers for Streamlit apps|
|💥 spacy-stanza|Use the latest Stanza (StanfordNLP) research models directly in spaCy|
|🦉 srsly|Modern high-performance serialization utilities for Python|
|💥 cython-blis|Fast matrix-multiplication as a self-contained Python library – no system dependencies!|
|🍳 prodigy-recipes|Recipes for the Prodigy annotation tool|
|🧬 jupyterlab-prodigy|A JupyterLab extension for annotating data with Prodigy|
|💥 cymem|Cython memory pool for RAII-style memory management|
|💥 preshed|Cython hash tables that assume keys are pre-hashed|
|📙 catalogue|Super lightweight function registries for your library|
|🍬 confection|The sweetest config system for Python|
|👯 Coreferee|Coreference resolution for English, French, German and Polish|
|🕵️ Holmes|Information extraction from English and German texts|
Demos and visualizations aren’t just eye candy — they’re an essential part of explaining and exploring AI technologies, especially during development. A good visualization lets you understand your model’s behavior and catch obvious problems early. Our demos include visualizations for spaCy’s dependency trees, entity recognition and similarity models.