We make a suite of AI developer tools that emphasize usability, performance and data privacy. We’re proud to be part of the best-in-class Python data science ecosystem. Most of our software is open source, and the components that aren’t are just as privacy-conscious and developer-friendly. Unlike most AI companies, we don’t want your data: it never has to leave your servers.


spaCy

  • 225m+ downloads
  • 29k+ GitHub stars
  • 87k+ GitHub projects
  • 670+ contributors

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. It’s designed specifically for production use and helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning.


Prodigy

  • 10k+ users
  • 900+ companies

Prodigy is a modern annotation tool for creating training data for machine learning models. It’s so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. Whether you’re working on entity recognition, intent detection or image classification, Prodigy can help you train and evaluate your models faster.

Prodigy Teams

  • 650+ beta signups

Prodigy Teams brings collaborative data development into the cloud, built for scale, scriptability and data privacy. Host a data processing cluster under your own control, invite your team, manage annotators, data and tasks, run automated processes like model training, and use large language models to create better data and models faster.

Open Source