Introducing spaCy v3.5

By the spaCy team · ~4 min read

We’re excited to release v3.5 of the spaCy Natural Language Processing library. spaCy v3.5 introduces three new CLI commands, adds fuzzy matching, provides improvements to our entity linking functionality, and includes a range of language updates and bug fixes.

New CLI commands

  • apply applies a pipeline to one or more .txt, .jsonl or .spacy files
  • benchmark speed profiles a pipeline’s speed with a warmup and a confidence interval
  • find-threshold tests a range of threshold values for components such as spancat and textcat_multilabel to identify the optimal one.
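
The idea behind find-threshold can be illustrated with a small standalone sketch: sweep evenly spaced thresholds, score each one, and keep the best. This is not how the command is implemented. The function names, the micro-averaged F1 metric, the 11-step grid over [0, 1] and the toy scores are all assumptions chosen for illustration; the real command works on a trained pipeline and an evaluation dataset, and its exact options are described in the CLI documentation.

```python
from typing import Dict, List, Sequence, Set, Tuple


def f1_at_threshold(
    scores: Sequence[Dict[str, float]],
    gold: Sequence[Set[str]],
    threshold: float,
) -> float:
    """Micro-averaged F1 when a label is predicted iff its score >= threshold."""
    tp = fp = fn = 0
    for doc_scores, doc_gold in zip(scores, gold):
        predicted = {label for label, s in doc_scores.items() if s >= threshold}
        tp += len(predicted & doc_gold)
        fp += len(predicted - doc_gold)
        fn += len(doc_gold - predicted)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def find_threshold(
    scores: Sequence[Dict[str, float]],
    gold: Sequence[Set[str]],
    n_trials: int = 11,
) -> Tuple[float, float]:
    """Try n_trials evenly spaced thresholds in [0, 1] and keep the best F1."""
    best_t, best_f1 = 0.0, -1.0
    for i in range(n_trials):
        t = i / max(n_trials - 1, 1)
        f1 = f1_at_threshold(scores, gold, t)
        if f1 > best_f1:  # strict ">" keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical per-document label scores (e.g. from a multilabel classifier).
scores: List[Dict[str, float]] = [
    {"POS": 0.9, "NEG": 0.2},
    {"POS": 0.4, "NEG": 0.7},
    {"POS": 0.6, "NEG": 0.55},
]
gold: List[Set[str]] = [{"POS"}, {"NEG"}, {"POS"}]

best_t, best_f1 = find_threshold(scores, gold)
print(f"best threshold={best_t:.1f}, F1={best_f1:.3f}")  # best threshold=0.6, F1=1.000
```

On this toy data a low threshold predicts too many labels (F1 is about 0.667 at 0.0), while 0.6 separates every correct label from every wrong one.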

Examples of how to run these commands can be found in our CLI documentation and in our v3.5 usage notes.

Fuzzy matching

The new FUZZY operator allows fuzzy matches based on Levenshtein edit distance:

pattern = [{"LOWER": {"FUZZY": "definitely"}}]
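
To make "Levenshtein edit distance" concrete, here is a minimal pure-Python sketch of the comparison such a pattern performs on the lowercased token text. It is not spaCy's internal code, and the helper names are hypothetical. The default edit budget, max(2, round(0.3 * len(pattern))), is our reading of the Matcher documentation (at least 2 edits and up to 30% of the pattern length) and should be treated as an assumption to check against the Matcher API reference.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions (cost 1 each)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars are equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(text: str, pattern: str, max_edits: int = -1) -> bool:
    """Accept text if it is within max_edits of pattern.

    ASSUMPTION: a negative max_edits falls back to max(2, round(0.3 * len(pattern))),
    our reading of the Matcher's documented default for FUZZY.
    """
    if max_edits < 0:
        max_edits = max(2, round(0.3 * len(pattern)))
    return levenshtein(text, pattern) <= max_edits


# A LOWER pattern compares the lowercased token text.
for token_text in ["Definately", "definitly", "indefinitely", "define"]:
    print(token_text, fuzzy_match(token_text.lower(), "definitely"))
```

Note that "indefinitely" is only two edits away from "definitely", so under this budget it matches too; a smaller edit budget trades recall of misspellings for fewer matches on related words.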

The FUZZY and REGEX operators are now also supported for lists with IN and NOT_IN:

pattern = [{"TEXT": {"REGEX": {"NOT_IN": ["^awe(some)?$", "^wonder(ful)?"]}}}]
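
The semantics of REGEX with NOT_IN can be sketched in plain Python: a token matches when none of the listed expressions match its text. We assume the list form keeps the single REGEX operator's behavior of matching at any position in the string (modeled here with re.search); the helper names are hypothetical.

```python
import re
from typing import Sequence


def regex_in(text: str, patterns: Sequence[str]) -> bool:
    """True if any pattern matches at some position in text (re.search)."""
    return any(re.search(p, text) for p in patterns)


def regex_not_in(text: str, patterns: Sequence[str]) -> bool:
    """True if no pattern matches anywhere in text."""
    return not regex_in(text, patterns)


patterns = ["^awe(some)?$", "^wonder(ful)?"]
for word in ["awesome", "awesomeness", "wonderful", "wonderland", "great"]:
    print(word, regex_not_in(word, patterns))
```

Because the second expression has no trailing $, it also excludes words like "wonderland", whereas the fully anchored first expression lets "awesomeness" through.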

Entity linking

The entity linker’s knowledge base has been refactored for easier customization. KnowledgeBase is now an abstract class and the default implementation is the new class InMemoryLookupKB.
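
The general design (an abstract interface plus a default in-memory implementation) can be sketched with toy pure-Python classes. Everything below is hypothetical: ToyKnowledgeBase, ToyInMemoryLookupKB, Candidate, add_alias and get_candidates are illustrative names, and spaCy's actual KnowledgeBase and InMemoryLookupKB classes have their own, richer API. The point is only that code depending on the abstract interface works with any subclass, so a custom backend can replace the in-memory default.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    """A possible entity for a mention, with a prior probability."""
    entity_id: str
    prior_prob: float


class ToyKnowledgeBase(ABC):
    """Abstract interface: callers only rely on candidate lookup."""

    @abstractmethod
    def get_candidates(self, mention: str) -> List[Candidate]:
        """Return possible entities for a mention string."""


class ToyInMemoryLookupKB(ToyKnowledgeBase):
    """Default-style implementation: an in-memory dict from alias to candidates."""

    def __init__(self) -> None:
        self._aliases: Dict[str, List[Candidate]] = {}

    def add_alias(self, alias: str, entity_ids: List[str], probs: List[float]) -> None:
        self._aliases[alias] = [Candidate(e, p) for e, p in zip(entity_ids, probs)]

    def get_candidates(self, mention: str) -> List[Candidate]:
        return self._aliases.get(mention, [])


kb = ToyInMemoryLookupKB()
kb.add_alias("Paris", ["E_CITY", "E_PERSON"], [0.9, 0.1])  # hypothetical entity IDs
print(kb.get_candidates("Paris"))
print(kb.get_candidates("Berlin"))  # unknown alias -> no candidates
```

A database-backed or remote knowledge base would subclass the same interface and implement get_candidates differently, without changes to the code that consumes it.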

Read more about all the improvements, updates and bug fixes in the full v3.5 release notes.

New additions to spaCy universe and projects

Many cool new plugins, extensions, pipelines and tutorials have been added to the spaCy universe and spaCy projects since v3.4:

  • BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics.
  • concepCy: A multilingual knowledge graph in spaCy.
  • greCy: Trained Ancient Greek models for use in spaCy.
  • English Interpretation Sentence Pattern: English interpretation for accurate translation from English to Japanese.
  • spaCy - Partial Tagger: Sequence tagger for partially annotated datasets in spaCy.
  • spacy-cleaner: Easily clean text with spaCy.
  • spaCy-PyThaiNLP: Add Thai support for spaCy.
  • Speedster pipeline acceleration: Named Entity Recognition (WikiNER) accelerated using Speedster.
  • Zshot: Zero- and few-shot named entity and relationship recognition.

Additionally, the spaCy team has added demo projects for two newer components:

  • experimental/coref: Use the new experimental coref component to train a coreference model using OntoNotes.
  • pipelines/spancat_demo: A minimal demo spancat project.

About the authors

  • Matthew Honnibal (CTO, Founder)
  • Ines Montani (CEO, Founder)
  • Sofie Van Landeghem (Machine Learning Engineer, spaCy Lead)
  • Adriane Boyd (Machine Learning Engineer)
  • Paul O’Leary McCann (Machine Learning Engineer)
  • Edward Schmuhl (Machine Learning Engineer)
  • Raphael Mitsch (Machine Learning Engineer)
  • Daniël de Kok (Machine Learning Engineer)
  • Madeesh Kannan (Machine Learning Engineer)
  • Richard Hudson (Machine Learning Engineer)
  • Lj Miranda (Machine Learning Engineer)
  • Peter Baumgartner (Machine Learning Engineer)
  • Victoria Slocum (Developer Advocate)