Explosion in 2020: Our Year in Review

by Matthew Honnibal, Ines Montani & Walter Henry · ~10 min. read

While 2020 hasn’t been easy for anyone, at Explosion we’ve considered ourselves relatively fortunate in this most interesting year. We’ve always worked remotely, so we’ve been able to take both pride and comfort in continuing to ship good software. Here’s a look back at what we’ve been up to.

January

  • 🔮 Jan 28: 2020 started with a big release: the alpha of Thinc v8.0, a lightweight deep learning library that offers an elegant, type-checked, functional-programming API for composing models, with support for layers defined in other frameworks such as PyTorch, TensorFlow or MXNet. Thinc was re-written from the ground up to support some of the new workflows coming to spaCy v3.0, including a flexible training configuration system and the ability to plug in model implementations written in any framework.

February

  • 🎤 Feb 8: In February, Matt and Ines were invited to PyCon Colombia in Medellín – thanks to the team for organizing such an awesome event! Ines presented a keynote titled “The Future of NLP in Python” about how new Python tooling and advancements in Natural Language Processing help with closing the gap between prototype and production, making it easier to ship powerful natural language understanding pipelines.
  • 📺 Feb 8: At PyCon Colombia, Ines was also interviewed by Karolina Ladino, and they talked about the history of spaCy and how to get into programming, machine learning and NLP.

March

  • 📺 Mar 2: March started with a new episode of Vincent Warmerdam’s popular video series, “Intro to NLP with spaCy”. In this episode, he explored the processing pipeline and trained a simple NER model to detect programming languages.
  • 📺 Mar 16: Ines published an end-to-end video tutorial showing how to use our annotation tool Prodigy to train a named entity recognition model from scratch, by taking advantage of semi-automatic annotation and modern transfer learning techniques.
  • 💻 Mar 20: Sebastián released Typer, a library for building modern CLIs, powered by Python type hints. We’ve been using it extensively in our projects ever since!
  • 📺 Mar 24: In the next Prodigy tutorial video, Ines showed how to build fully custom annotation workflows and UIs for image captioning, and how to plug in a simple PyTorch image captioning model. Also: cats! 😺
  • 📻 Mar 30: Towards the end of the month, Matt joined the Podcast.__init__ podcast again and discussed Explosion’s developer tools stack and what’s next for spaCy, Thinc and Prodigy.

April

  • 🏫 Apr 21: In April, we released the first translation of the free spaCy online course, Modernes NLP mit spaCy, featuring German instructions and text examples.
  • 📻 Apr 26: Ines was also invited as a guest on the Chai Time Data Science podcast and talked about her NLP journey, spaCy and Prodigy, open-source development, and tattoos.

May

  • 🏫 May 6: May started off with a Japanese translation of the free spaCy online course: spaCy を使った先進的な自然言語処理. Special thanks to Yohei Tamura!
  • 📺 May 7: A day later, Sofie released an end-to-end video tutorial showing how to train your own custom Entity Linking model with spaCy to disambiguate different mentions of a person’s name to unique identifiers in a knowledge base, and how to create your own training data from scratch.
  • 🏫 May 11: ¡Hola! The free spaCy online course was released in Spanish, complete with Spanish text examples: NLP avanzado con spaCy. Thanks to Camila Gutierrez!
  • 📺 May 14: May featured even more additions to the free spaCy course: Ines recorded video versions in English and German that you can view as standalone lessons on YouTube, or watch as part of the interactive online course.

June

  • 📺 Jun 13: June saw another new episode of Vincent Warmerdam’s “Intro to NLP with spaCy” series. In this episode, he dug deeper into the performance of the NER model he trained, using a rule-based classifier to probe for errors and improve the training data.
  • 💫 Jun 16: We also released spaCy v2.3, which added trained pipelines for Chinese, Japanese, Danish, Polish and Romanian, updated all 15 model families with word vectors and improved accuracy, while also decreasing model size and loading times for models with vectors.
  • Jun 16: Prodigy got a big upgrade in June with the release of v1.10.0. The version includes a bunch of new features, interfaces and recipes for dependency and relation annotation, audio and video annotation, as well as a new and improved manual image annotation interface with support for editing shapes and bounding boxes.
  • 📺 Jun 16: To show you the new Prodigy features in action, Ines recorded a video walkthrough that includes examples of dependency and relation annotation, coreference resolution, biomedical event extraction, audio and video annotation, NER annotation for fine-tuning transformers and more!
  • 🎤 Jun 18: At Rasa’s Level 3 AI Assistant conference, Ines talked about “Designing Practical NLP Solutions”, how to break down larger business problems into solvable machine learning tasks, and how to make your NLP projects fail less.
  • 💻 Jun 21: spacy-streamlit was released! It’s a Python library containing building blocks and visualizers for integrating spaCy pipelines into Streamlit apps.
  • 📺 Jun 25: Finally, we published a Spanish video version of the free online course, presented by Camila Gutierrez. ¡Practiquemos!

July

October

  • 📻 Oct 4: Sebastián was a guest on the Talk Python podcast to discuss building modern and fast APIs with FastAPI.
  • 📻 Oct 13: On the DevJourney Podcast, Ines shared her personal software development journey, from getting her first computer to becoming a core developer of spaCy and founding Explosion.
  • 💫 Oct 15: In mid-October, we finally published the long-awaited nightly pre-release of spaCy v3.0! spaCy v3.0 features all-new transformer-based pipelines that bring spaCy’s accuracy right up to the current state of the art. You can use any pretrained transformer to train your own pipelines, and even share one transformer between multiple components with multi-task learning. Training is now fully configurable and extensible, and you can define your own custom models using PyTorch, TensorFlow and other frameworks. The new spaCy projects system lets you describe whole end-to-end workflows in a single file, giving you an easy path from prototype to production, and making it easy to clone and adapt best-practice projects for your own use cases.
  • 🎤 Oct 26: In her keynote at Global AI Live, Ines presented the upcoming spaCy v3.0 and how it makes it easier than ever to bring state-of-the-art NLP projects from prototype to production.
  • 🐍 Oct 27: Ines was honored to be recognized as a Python Software Foundation Fellow, due to her work with Explosion on spaCy and other projects.
  • 📻 Oct 29: Wrapping up October, Ines and Sofie joined the Gradient Dissent podcast hosted by Weights & Biases to talk about spaCy v3.0 and the new features, the motivation behind the new release and the various design decisions we made along the way.

November

December

  • 📰 Dec 4: For KDNuggets, Ines shared her perspective on AI and Machine Learning developments in 2020 and key trends for 2021.
  • 💫 Dec 11: In December, GitHub introduced discussion boards, so we officially launched the spaCy discussion board! Come join the community and ask for help with your code, share tips, tricks and best practices, discuss features and project ideas, collaborate on language support, show off what you’ve built and stay up to date with the latest spaCy news!
  • 💘 Dec 14: To celebrate another year (and Ines’ birthday!), we started another round of sending spaCy stickers to the community! This time with new designs, including cool holographic styles. You can still sign up here to receive yours!
  • 📻 Dec 28: Wrapping up 2020, Ines joined the Python Year in Review episode of Talk Python to talk about what 2020 had in store and what to expect for 2021.

With the community and the team continuing to grow, we look forward to making 2021 even better. Thanks for all your support!

  • About the author

    Matthew Honnibal

    Matthew is a leading expert in AI technology. He completed his PhD in 2009, and spent a further 5 years publishing research on state-of-the-art NLP systems. He left academia in 2014 to write spaCy and found Explosion.

  • About the author

    Ines Montani

    Ines is a co-founder of Explosion and a core developer of the spaCy NLP library and the Prodigy annotation tool. She has helped set a new standard for user experience in developer tools for AI engineers and researchers.

  • About the author

    Walter Henry

    Walter is a journalist with a background in project and event management. At Explosion, he's in charge of personnel, communications and operations, including organizing corporate trainings and our spaCy IRL events.