spaCy Natural Language Processing: From Beginner to Advanced | Guan Wang, Xiaoquan Kong (2024) | The first Chinese-language book on spaCy for beginners and experienced practitioners, covering traditional NLP techniques and how to leverage LLMs for various NLP tasks.
spaCy Chunks v0.0.2 | spaCy extension and pipeline component for generating overlapping chunks of sentences or tokens from a document.
Getting Started with NLP and spaCy | TalkPython Course | There is a lot of text data out there and maybe you're interested in getting structured data out of it. There are a lot of options out there and this course will introduce you to the field by focussing on spaCy while also exploring other tools.
Constructing a knowledge base with spaCy and spacy-llm | MantisNLP Blog | This blog post shows how to use spaCy and LLMs to extract entities and relationships from text and quickly tackle the complex problem of constructing a knowledge base graph from a corpus.
Introducing Prodigy-HF | Hugging Face Blog | Last week, Explosion introduced Prodigy-HF, a new Prodigy plugin offering code recipes that directly integrate with the Hugging Face stack.
How to Host Your Own API of Open Language Models For Free | Powered by Explosion’s curated-transformers, FastAPI and ngrok.
Applied Language Technology | Extensive online course on applied language technology with spaCy by Tuomo Hiippala, designed for students new to NLP and programming.
Welcome spaCy to the Hugging Face Hub | Hugging Face Blog | Hugging Face makes it really easy to share your spaCy pipelines with the community! With a single command, you can upload any pipeline package, with a pretty model card and all required metadata auto-generated for you.
Millennials Kill Everything | The Pudding | Analysis of media reporting on millennials using spaCy. From napkins to marriage to Applebees, just looking at headlines you’d guess that for the past decade the millennial generation’s been on a rampage.
Distill Your LLMs and Surpass Their Performance | InfoQ Magazine | In her presentation at InfoQ Dev Summit, Ines Montani provided the audience with practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components.
The AI Revolution Will Not Be Monopolized | InfoQ | Open-source initiatives are pivotal in democratizing AI technology, offering transparent, extensible tools that empower users. Daniel Dominguez summarizes the key takeaways from Ines’ recent talk for InfoQ.
Economies of Scale Can’t Monopolise the AI Revolution | InfoQ Magazine | During her presentation at QCon London, Ines Montani stated that economies of scale are not enough to create monopolies in the AI space and that open-source techniques and models will allow everybody to keep up with the “Gen AI revolution”.
KAZU v1.5 | A biomedical NLP framework designed to handle production workloads, built by AstraZeneca and Korea University and using spaCy under the hood.
SpanCat with spaCy and Prodigy on real data | YouTube series by WJB Mattingly showing an end-to-end project, from cultivating and annotating data to training, testing and visualizing a model.
When Women Make Headlines | The Pudding | Using spaCy and other packages from the NLP ecosystem for analyzing more than 382,000 headlines to see how women are represented (or misrepresented) in the news.
How We Found Pricey Provisions in New Jersey Police Contracts | ProPublica | ProPublica and the Asbury Park Press scoured hundreds of police union agreements for details on publicly funded payouts to cops, using spaCy under the hood.
Combining the Best of Two Worlds: From TF-IDF to Llama LLM | Open Source Summit Europe | Talk by William Arias, Staff Developer Advocate at GitLab, on combining traditional NLP techniques and LLMs to solve hallucination issues and create robust spaCy applications.
Simply Simplify Language | Interactive app by the Canton of Zurich, Switzerland, using LLMs and spaCy to analyze and simplify institutional communication and make bureaucratic German more inclusive.
spaCyEx v0.0.2 | Extension for spaCy’s powerful, linguistically-aware pattern matching that introduces a RegEx-like syntax.
Microsoft Presidio v2.2.352 | Context-aware, pluggable and customizable PII de-identification and anonymization service for text and images, featuring a spaCy back-end.
scispacy v0.5.3 | A Python package containing spaCy models for processing biomedical, scientific or clinical text, developed by AI2.
textaCy v0.13.0 | Utility library for NLP tasks before and after spaCy, including preprocessing, normalization and additional information extraction features.
How we built a Stack Overflow Community questions analyzer | GitLab Blog | How GitLab used spaCy to analyze and better understand Stack Overflow community questions about their tools and products.
Talking sense: using machine learning to understand quotes | The Guardian Blog | How the Guardian uses spaCy and Prodigy to train a machine learning model that helps extract quotes from news articles and match them to the correct source.
How We Analyzed Google’s Search Results | The Markup | Using the Prodigy annotation tool, we created a user interface and a coder manual for two annotators to spot-check 741 stained images randomly sampled from our dataset.
What 1.2 million parliamentary speeches can teach us about gender representation | The Pudding | Analysis of parliamentary speeches using spaCy.
Szczecin stolicą programowania | TVP3 Szczecin | News segment about EuroSciPy 2024 on local Polish television, featuring Ines’ talk and interviews with the organizers.
ZenML v0.58.0 | New out-of-the-box Prodigy integration in ZenML for LLMs and beyond, to make data development and annotation a core part of your MLOps lifecycle.
Zero-Shot NER with GliNER and spaCy | Python Tutorials for Digital Humanities | Tutorial by WJB Mattingly on how to integrate the generalist GLiNER model for Named Entity Recognition with spaCy's versatile NLP environment.
Who said what: using machine learning to correctly attribute quotes | The Guardian Engineering Blog | How the Guardian uses spaCy and Prodigy to train a custom coreference resolution model.
MP Interests Tracker: Utilising GenAI to uncover insights in the UK Register of Financial Interest | JournalismAI Blog | Project from teams at The Times and BBC using spacy-llm to make complex financial interests data more accessible.
The Nesta Skills Extractor Library | Economic Statistics Centre of Excellence | A new library for extracting skills from job adverts and mapping them to a taxonomy of your choice, built on top of spaCy.
Reproducible spaCy NLP Experiments with Weights & Biases | Weights & Biases Blog | This tutorial will show how to add Weights & Biases to any spaCy NLP project to track your experiments, save model checkpoints, and version your datasets.
The Physical Traits that Define Men and Women in Literature | The Pudding | Analysis of physical traits most tied to gender in literature using spaCy.
More than a Million Pro-Repeal Net Neutrality Comments were Likely Faked | Hackernoon | Analysis of net neutrality comments by Jeff Kao using spaCy for word vectors.