We’re excited to release v3.5 of the spaCy Natural Language Processing library. spaCy v3.5 introduces three new CLI commands, adds fuzzy matching, provides improvements to our entity linking functionality, and includes a range of language updates and bug fixes.
New CLI commands
apply: applies a pipeline to one or more .txt, .jsonl or .spacy files
benchmark speed: profiles a pipeline’s speed with a warmup and a confidence interval
find-threshold: tests a range of threshold values for spancat, textcat_multilabel, etc., to identify the best one
Examples on how to run these commands can be found in our CLI documentation as well as in our v3.5 usage notes.
Fuzzy matching
The new FUZZY operator allows fuzzy matches based on Levenshtein edit distance:
pattern = [{"LOWER": {"FUZZY": "definitely"}}]
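As a minimal sketch of how this pattern might be used, the snippet below registers it with a Matcher on a blank English pipeline and runs it over a sentence containing two misspellings. It assumes spaCy v3.5 or later is installed; the match key "DEFINITELY" and the sample sentence are illustrative and do not come from the release itself.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough here: only the tokenizer is needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# The FUZZY pattern from the release notes, under an illustrative key.
pattern = [{"LOWER": {"FUZZY": "definitely"}}]
matcher.add("DEFINITELY", [pattern])

# Illustrative text with two misspellings, each one edit away from the target.
doc = nlp("I definately agree and she defintely does too")

# Sort by start offset so the result order is deterministic.
matches = sorted(matcher(doc), key=lambda m: m[1])
fuzzy_matches = [doc[start:end].text for _, start, end in matches]
print(fuzzy_matches)
```

Both misspelled tokens should be matched, while the remaining tokens are too far from "definitely" to qualify.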
The FUZZY and REGEX operators are now also supported for lists with IN and NOT_IN:
pattern = [{"TEXT": {"REGEX": {"NOT_IN": ["^awe(some)?$", "^wonder(ful)?"]}}}]
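The list forms can be tried out in the same way. The sketch below, assuming spaCy v3.5 or later, runs one FUZZY pattern with IN and the REGEX pattern with NOT_IN over short sample texts. The match keys, the helper name `matched_texts`, and the sample sentences are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")


def matched_texts(pattern, text):
    """Return the text of every span matched by a single pattern (hypothetical helper)."""
    matcher = Matcher(nlp.vocab)
    matcher.add("DEMO", [pattern])  # "DEMO" is an illustrative key
    doc = nlp(text)
    matches = sorted(matcher(doc), key=lambda m: m[1])
    return [doc[start:end].text for _, start, end in matches]


# FUZZY with IN: a token matches if it is close to any string in the list.
fuzzy_in = [{"TEXT": {"FUZZY": {"IN": ["awesome", "wonderful"]}}}]
fuzzy_in_result = matched_texts(fuzzy_in, "That was awsome and wonderfull")

# REGEX with NOT_IN: a token matches if none of the expressions match it.
regex_not_in = [{"TEXT": {"REGEX": {"NOT_IN": ["^awe(some)?$", "^wonder(ful)?"]}}}]
regex_not_in_result = matched_texts(regex_not_in, "awesome wonderful terrible")

print(fuzzy_in_result)
print(regex_not_in_result)
```

Short strings in a FUZZY list deserve some care: a four-letter entry can end up within the allowed edit distance of many unrelated short tokens.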
Entity linking
The entity linker’s knowledge base has been refactored for easier customization.
KnowledgeBase is now an abstract class and the default implementation is the new class InMemoryLookupKB.
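A minimal sketch of building a small knowledge base with the new default class follows, assuming spaCy v3.5 or later. The entity IDs, frequencies, vectors, alias and prior probabilities are invented for illustration, and the vector length of 3 is an arbitrary choice.

```python
import spacy
from spacy.kb import InMemoryLookupKB, KnowledgeBase

nlp = spacy.blank("en")

# The vector length is fixed when the knowledge base is created.
kb = InMemoryLookupKB(vocab=nlp.vocab, entity_vector_length=3)

# Illustrative entities: the IDs, frequencies and vectors are made up.
kb.add_entity(entity="Q42", freq=12, entity_vector=[1.0, 0.0, 0.0])
kb.add_entity(entity="Q463035", freq=5, entity_vector=[0.0, 1.0, 0.0])

# One alias that may refer to either entity, with assumed prior probabilities.
kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.8, 0.1])

# Look up the candidate entities for the alias.
candidates = kb.get_alias_candidates("Douglas")
candidate_ids = sorted(c.entity_ for c in candidates)
print(candidate_ids)

# The in-memory class is one implementation of the abstract base class.
is_kb_subclass = issubclass(InMemoryLookupKB, KnowledgeBase)
```

A custom knowledge base, for example one backed by an external store, would subclass KnowledgeBase in the same way and supply its own candidate lookup.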
New additions to spaCy universe and projects
Many cool new plugins, extensions, pipelines and tutorials have been added to the spaCy universe and spaCy projects since v3.4:
BERTopic | Leveraging BERT and c-TF-IDF to create easily interpretable topics. |
concepCy | A multilingual knowledge graph in spaCy. |
greCy | Trained Ancient Greek models for use in spaCy. |
English Interpretation Sentence Pattern | English interpretation for accurate translation from English to Japanese. |
spaCy - Partial Tagger | Sequence tagger for partially annotated datasets in spaCy. |
spacy-cleaner | Easily clean text with spaCy. |
spaCy-PyThaiNLP | Add Thai support for spaCy. |
Speedster pipeline acceleration | Named Entity Recognition (WikiNER) accelerated using Speedster. |
Zshot | Zero and Few shot named entity & relationships recognition. |
Additionally, the spaCy team has added demo projects for two newer components:
experimental/coref | Use the new experimental coref component to train a coreference model using OntoNotes. |
pipelines/spancat_demo | A minimal demo spancat project. |