The year 2021 is coming to an end, and like the previous year, it was shaped by unique challenges that impacted our work together. For Explosion, it was a very productive year. We found an investor that fits our strategy, we released spaCy v3, the work on Prodigy Teams is in full swing, and the team has grown a lot. So here’s our look back at our highlights of the year 2021.
- 🏫 Jan 19: The new year started with the Portuguese translation of our free spaCy online course: PLN avançado com spaCy. Special thanks to Cristiana Straccialana Parada.
- 📺 Jan 22: Ines was invited as a guest on the TalkPython podcast and discussed how to build a data science startup.
- 💫 Feb 1: We kicked off February with the big release of spaCy v3.0, which features new transformer-based pipelines that bring spaCy’s accuracy right up to the current state of the art, and a new workflow system to help you take projects from prototype to production. If you’re interested in what spaCy v3 is all about, check out our video, where Ines and Matt guide you through some of the most exciting new features!
- 🪐 Feb 1: As an add-on to our spaCy v3 release, we published spaCy projects, which let you manage end-to-end spaCy workflows for different use cases and domains.
- 📺 Feb 1: Along with spaCy v3, Ines published a behind-the-scenes video on the design concepts of spaCy v3.
- 📺 Feb 1: Sofie celebrated the release of spaCy v3 with her tutorial on implementing a trainable entity relation extraction component in spaCy v3.
- 📺 Feb 3: At the Contributing.Today meetup with Guido van Rossum, Sofie presented the new spaCy v3 features.
- 💫 Mar 4: We released v1.0 of our spaCy + Stanza package, which lets you use the latest Stanza (StanfordNLP) research models directly in spaCy.
- 📺 Mar 17: March saw a new episode of Vincent Warmerdam’s “Intro to NLP with spaCy” series. In this episode, Vincent showcased the project system of spaCy v3.
- 📺 Mar 29: Ines joined the German Python Podcast to talk about Natural Language Processing with spaCy.
- 🥳 Mar 30: At the end of March, we celebrated spaCy reaching 20k+ stars on GitHub.
- 📺 Apr 22: Ines was invited as a guest on Microsoft’s A bit of AI show to talk about her journey into AI.
- 📺 Apr 30: Later that month, Ines joined the Snorkel Science Talks. She discussed her path into machine learning, fundamental design decisions behind spaCy, and the importance of bringing together different stakeholders in the machine learning development process.
- 📺 Jun 4: At the beginning of June, Ines and Matt gave a talk at the Bay Area NLP group, one of the largest NLP communities in the world.
- 📺 Jun 10: Ines gave a keynote at the Teaching NLP Workshop at NAACL-2021. The keynote was followed by a Q&A with Ines and Matt.
- 📺 Jun 14: Ines and Sebastian were interviewed at PyFest and talked about open source projects, working together, spaCy and FastAPI.
- 📺 Jun 17: At Rasa’s Level 3 AI Assistant conference, Ines presented the “Applied NLP Thinking” talk.
- ✍️ Jun 19: Ines published a blog post version of “Applied NLP Thinking” on how to translate complex business problems into machine learning solutions.
- 💫 Jul 7: We released spaCy v3.1, which allows using predicted annotations during training. In addition, the release includes a SpanCategorizer component for predicting arbitrary and overlapping spans. You can create training data for it using Prodigy’s new annotation UI for overlapping spans.
- 🤗 Jul 13: Hugging Face welcomed spaCy to their Hub. You can now upload any spaCy pipeline using the spacy-huggingface-hub CLI, with auto-generated pretty READMEs and interactive visualizers to try your pipeline in the browser.
- 🥳 Jul 14: Sofie became team lead for spaCy.
- ⚙ Aug 12: We partnered with Weights & Biases, making it even easier to track reproducible spaCy NLP pipelines.
- ✨ Aug 12: We released Prodigy v1.11, which brings a bunch of new features: a new installation process via pip, new wheels for Python 3.9 and ARM architectures, a new recipe and UI for annotating overlapping and nested spans, new recipes for improving a sentence recognizer model, and further training and data export recipes that integrate seamlessly with spaCy’s config system.
- 📺 Aug 17: Ines was live on the radio and joined the Byte Into IT show on Melbourne’s Triple R radio station.
- 💥 Sep 2: A big moment for us – we sold 5% of Explosion. Since founding Explosion in 2016, we’ve run the company as a profitable business. Our next step is Prodigy Teams, and doing this project well is much more important to us than doing it cheaply, so we decided to consider an external investment. With SignalFire, we found an investor that fits our strategy.
- 💫 Nov 5: We released spaCy v3.2, which improves performance on Apple M1 and Nvidia GPUs, adds Doc input for pipelines, and introduces registered scoring functions.
- 🍏 Nov 5: Along with spaCy v3.2, we published our thinc-apple-ops package, which accelerates spaCy on macOS by calling into Apple’s native “Accelerate” library.
- 🌸 Nov 5: We also presented Adriane’s recent work on our new floret library, which combines fastText and Bloom embeddings to provide compact, full-coverage vectors for spaCy.
- 🌳 Nov 17: In mid-November, Daniël presented our new experimental machine learning-based lemmatizer that posts accuracies above 95% for many languages.
- ✍️ Nov 17: Our machine learning engineer Lj Miranda published a detailed technical overview of using spaCy’s project config system, traversing our stack at increasing levels of abstraction.
- 🛡️ Nov 17: The Guardian wrote about how their data science team used spaCy and Prodigy to train a machine learning model that helps extract quotes from news articles and match them to the correct source.
- 📺 Nov 30: Weights & Biases hosted an AMA with Ines, where she talked about software development, Python, startups and product building.
- ✍️ Dec 8: For KDnuggets, Ines shared her perspective on AI and machine learning developments in 2021 and key trends for 2022.
- 🏫 Dec 9: We’ve updated our interactive NLP course for spaCy v3! The updated course is available in English, Spanish, German, and Japanese. More languages will follow.
- ✍️ Dec 14: To demonstrate the performance of spaCy v3.2, Adriane compiled a series of UD benchmarks comparable to the Stanza and Trankit evaluations on Universal Dependencies v2.5.
- 🦑 Dec 15: Our machine learning engineer Edward published a blog post on Healthsea, an end-to-end spaCy pipeline that analyzes user reviews of supplement products and extracts their potential effects on health.
- 📺 Dec 17: Ines was invited as a guest on the TalkPython podcast to discuss machine learning ethics and EU laws.