Explosion in 2021: Our Year in Review

by Ines Montani, Matthew Honnibal, Sofie Van Landeghem & Philip Vollet

The year 2021 is coming to an end, and like the previous year, it was shaped by unique challenges that impacted our work together. For Explosion, it was a very productive year. We found an investor that fits our strategy, we released spaCy v3, the work on Prodigy Teams is in full swing, and the team has grown a lot. So here’s our look back at our highlights of the year 2021.

January

February

  • 💫 Feb 1: We kicked off February with the big release of spaCy v3.0, which features new transformer-based pipelines that bring spaCy’s accuracy right up to the current state of the art, and a new workflow system to help you take projects from prototype to production. If you’re interested in what spaCy v3 is all about, check out our video, where Ines and Matt guide you through some of the most exciting new features!
  • 🪐 Feb 1: As an add-on to our spaCy v3 release, we published spaCy projects, which let you manage end-to-end spaCy workflows for different use cases and domains.
  • 📺 Feb 1: Along with spaCy v3, Ines published a behind-the-scenes video on the design concepts of spaCy v3.
  • 📺 Feb 1: Sofie celebrated the release of spaCy v3 with her tutorial on implementing a trainable entity relation extraction component in spaCy v3.
  • 📺 Feb 3: At the Contributing.Today meetup with Guido van Rossum, Sofie presented the new spaCy v3 features.

March

  • 💫 Mar 4: We released version 1.0 of our spacy-stanza package, which allows you to use the latest Stanza (formerly StanfordNLP) research models directly in spaCy.
  • 📺 Mar 17: March saw a new episode of Vincent Warmerdam’s “Intro to NLP with spaCy” series. In this episode, Vincent showcased the project system of spaCy v3.
  • 📺 Mar 29: Ines joined the German Python Podcast to talk about Natural Language Processing with spaCy.
  • 🥳 Mar 30: At the end of March, we celebrated that spaCy reached 20k+ stars on GitHub.

April

  • 📺 Apr 22: Ines was invited as a guest on Microsoft’s A bit of AI show to talk about her journey into AI.
  • 📺 Apr 30: Later that month, Ines joined the Snorkel Science Talks. She discussed her path into machine learning, fundamental design decisions behind spaCy, and the importance of bringing together different stakeholders in the machine learning development process.

June


July

  • 💫 Jul 7: We released spaCy v3.1, which allows using predicted annotations during training. In addition, the release includes a SpanCategorizer component for predicting arbitrary and overlapping spans. You can create training data for it using Prodigy’s new annotation UI for overlapping spans.
  • 🤗 Jul 13: Hugging Face welcomed spaCy to their Hub. You can now upload any spaCy pipeline using the spacy-huggingface-hub CLI, with auto-generated READMEs and interactive visualizers to try your pipeline in the browser.
  • 🥳 Jul 14: Sofie became team lead for spaCy.

August

  • Aug 12: We partnered with Weights & Biases, making it even easier to track reproducible spaCy NLP pipelines.
  • Aug 12: We released Prodigy v1.11, which brings a new installation process via pip, new wheels for Python 3.9 and ARM architectures, a new recipe and UI for annotating overlapping and nested spans, new recipes for improving a sentence recognizer model, and further training and data export recipes that integrate seamlessly with spaCy’s config system.
  • 📺 Aug 17: Ines was live on air as a guest on the Byte Into IT show on Melbourne’s Triple R radio station.

September

  • 💥 Sep 2: A big moment for us – we sold 5% of Explosion. Since founding Explosion in 2016, we’ve run the company as a profitable business. Our next step is Prodigy Teams, and doing this project well is much more important to us than doing it cheaply, so we decided to consider an external investment. With SignalFire, we found an investor that fits our strategy.

November

  • 💫 Nov 5: We released spaCy v3.2, which improved performance for spaCy on Apple M1 and Nvidia GPUs, added Doc input for pipelines, and provided registered scoring functions.
  • 🍏 Nov 5: Along with the new spaCy v3.2, we published our thinc-apple-ops package to accelerate spaCy on macOS by calling into Apple’s native “Accelerate” library.
  • 🌸 Nov 5: We also presented Adriane’s recent work on our new floret library, which uses fastText and Bloom embeddings for compact, full-coverage vectors with spaCy.
  • 🌳 Nov 17: In mid-November, Daniël presented our new experimental machine learning-based lemmatizer that posts accuracies above 95% for many languages.
  • ✍️ Nov 17: Our machine learning engineer Lj Miranda published a detailed technical overview on using spaCy’s project config system, traversing our stack in increasing levels of abstraction.
  • 🛡️ Nov 17: The Guardian wrote about how their data science team used spaCy and Prodigy to train a machine learning model that helps extract quotes from news articles and match them to the correct source.
  • 📺 Nov 30: Weights & Biases hosted an AMA with Ines, where she talked about software development, Python, startups and product building.

December

  • ✍️ Dec 8: For KDnuggets, Ines shared her perspective on AI and Machine Learning developments in 2021 and key trends for 2022.
  • 🏫 Dec 9: We’ve updated our interactive NLP course for spaCy v3! The updated course is available in English, Spanish, German, and Japanese. More languages will follow.
  • ✍️ Dec 14: To demonstrate the performance of spaCy v3.2, Adriane compiled a series of UD benchmarks comparable to the Stanza and Trankit evaluations on Universal Dependencies v2.5.
  • 🦑 Dec 15: Our machine learning engineer Edward published his blog post on Healthsea, an end-to-end spaCy pipeline that analyzes user reviews of supplementary products and extracts their potential effects on health.
  • 📺 Dec 17: Ines was invited as a guest on the TalkPython podcast to discuss machine learning ethics and EU laws.

With the community and the team continuing to grow, we look forward to making 2022 even better. Thanks for all your support!

About the authors

  • Ines Montani, CEO, Founder
    • Berlin, Germany
  • Matthew Honnibal, CTO, Founder
    • Berlin, Germany
  • Sofie Van Landeghem, Lead Machine Learning Engineer, spaCy Core Developer
    • Ghent, Belgium
  • Philip Vollet, Community Success
    • Berlin, Germany