
Introducing spaCy v3.5

We’re excited to release v3.5 of the spaCy Natural Language Processing library. spaCy v3.5 introduces three new CLI commands, adds fuzzy matching, provides improvements to our entity linking functionality, and includes a range of language updates and bug fixes.

New CLI commands

  • apply applies a pipeline to one or more .txt, .jsonl or .spacy files.
  • benchmark speed profiles a pipeline’s speed with a warmup and a confidence interval.
  • find-threshold tests a range of threshold values for spancat, textcat_multilabel, etc., to identify the best one.

Examples of how to run these commands can be found in our CLI documentation as well as in our v3.5 usage notes.

Fuzzy matching

The new FUZZY operator allows fuzzy matches based on Levenshtein edit distance:

pattern = [{"LOWER": {"FUZZY": "definitely"}}]
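As a minimal sketch of this pattern in use, assuming spaCy v3.5 or later is installed: the blank English pipeline, the rule name "DEFINITELY" and the sample sentence are illustrative choices, not part of the release itself.

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline is enough here: only the tokenizer and vocab are needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Same pattern as above: compare each token's lowercase form to "definitely"
# within the operator's default edit-distance allowance.
pattern = [{"LOWER": {"FUZZY": "definitely"}}]
matcher.add("DEFINITELY", [pattern])  # "DEFINITELY" is a hypothetical rule name

doc = nlp("I definately agree, and she definitly does too.")
matched_texts = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched_texts)
```

Both misspellings are a single edit away from "definitely", so they should be picked up, while the other tokens in the sentence are too far from the target string to match.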

The FUZZY and REGEX operators are now also supported for lists with IN and NOT_IN:

pattern = [{"TEXT": {"REGEX": {"NOT_IN": ["^awe(some)?$", "^wonder(ful)?"]}}}]
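The list form can be tried out the same way. This is a small sketch assuming spaCy v3.5 or later; the rule name "NOT_PRAISE" and the three words are made up for illustration. As we understand the operator, a token matches only when none of the listed expressions is found in its text.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Same pattern as above: match tokens whose text matches none of the regexes.
pattern = [{"TEXT": {"REGEX": {"NOT_IN": ["^awe(some)?$", "^wonder(ful)?"]}}}]
matcher.add("NOT_PRAISE", [pattern])  # "NOT_PRAISE" is a hypothetical rule name

# Build the Doc from pre-split words so tokenization plays no role.
doc = Doc(nlp.vocab, words=["awesome", "wonderful", "terrible"])
kept = [doc[start:end].text for _, start, end in matcher(doc)]
print(kept)
```

Here "awesome" and "wonderful" are each caught by one of the expressions and are therefore excluded, which leaves "terrible" as the only matching token.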

Entity linking

The entity linker’s knowledge base has been refactored for easier customization. KnowledgeBase is now an abstract class and the default implementation is the new class InMemoryLookupKB.
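To give a feel for the default implementation, here is a minimal sketch assuming spaCy v3.5 or later. The entity ID "Q42", the alias "Douglas", the frequency, the vector values and the prior probability are all invented for illustration.

```python
import spacy
from spacy.kb import InMemoryLookupKB, KnowledgeBase

nlp = spacy.blank("en")

# InMemoryLookupKB is the concrete, in-memory knowledge base.
kb = InMemoryLookupKB(vocab=nlp.vocab, entity_vector_length=3)

# Register one entity with a toy 3-dimensional vector (values are arbitrary).
kb.add_entity(entity="Q42", freq=10, entity_vector=[1.0, 0.0, 0.0])

# Map a surface form to the entity with an assumed prior probability.
kb.add_alias(alias="Douglas", entities=["Q42"], probabilities=[0.9])

candidates = kb.get_alias_candidates("Douglas")
print(isinstance(kb, KnowledgeBase))       # the default KB subclasses the abstract class
print(kb.get_entity_strings())             # entity IDs stored in the KB
print([c.entity_ for c in candidates])     # candidate entities for the alias
```

A custom knowledge base would subclass KnowledgeBase in the same spirit and supply its own candidate lookup, for example one backed by an external store instead of memory.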

Read more about all the improvements, updates and bug fixes:

New additions to spaCy universe and projects

Many cool new plugins, extensions, pipelines and tutorials have been added to the spaCy universe and spaCy projects since v3.4:

  • BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics.
  • concepCy: A multilingual knowledge graph in spaCy.
  • greCy: Trained Ancient Greek models for use in spaCy.
  • English Interpretation Sentence Pattern: English interpretation for accurate translation from English to Japanese.
  • spaCy - Partial Tagger: Sequence tagger for partially annotated datasets in spaCy.
  • spacy-cleaner: Easily clean text with spaCy.
  • spaCy-PyThaiNLP: Add Thai support for spaCy.
  • Speedster pipeline acceleration: Named Entity Recognition (WikiNER) accelerated using Speedster.
  • Zshot: Zero and Few shot named entity & relationships recognition.

Additionally, the spaCy team has added demo projects for two newer components:

  • experimental/coref: Use the new experimental coref component to train a coreference model using OntoNotes.
  • pipelines/spancat_demo: A minimal demo spancat project.

Resources