We’re excited to release v3.6 of the spaCy Natural Language Processing library. spaCy v3.6 adds the span finder component to the core spaCy library and introduces trained pipelines for Slovenian.
SpanFinder component
The SpanFinder component identifies potentially overlapping, unlabeled spans by predicting span start and end tokens. It is intended for use in combination with a component like SpanCategorizer that may further filter or label the spans. See our Spancat blog post for a more detailed introduction to the span finder design.
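To make the start/end idea concrete, here is a simplified pure-Python sketch of how per-token start and end scores could be paired into overlapping, unlabeled candidate spans. It is not spaCy's implementation. The function name `spans_from_boundaries`, the 0.5 threshold and the optional `max_length` cutoff are hypothetical choices for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def spans_from_boundaries(
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    threshold: float = 0.5,
    max_length: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: pair predicted start and end tokens into spans.

    Returns (start, end) token offsets with an exclusive end, like doc[start:end].
    Spans may overlap and carry no labels.
    """
    if len(start_scores) != len(end_scores):
        raise ValueError("expected one start score and one end score per token")
    # Tokens whose score passes the (assumed) threshold count as boundaries.
    starts = [i for i, score in enumerate(start_scores) if score >= threshold]
    ends = [i for i, score in enumerate(end_scores) if score >= threshold]
    spans = []
    for start in starts:
        for end in ends:
            length = end - start + 1
            if length < 1:
                continue  # an end token before the start token cannot close this span
            if max_length is not None and length > max_length:
                continue  # optional cap on candidate length
            spans.append((start, end + 1))
    return spans


tokens = ["New", "York", "City", "is", "big"]
# Made-up scores: likely starts on "New"/"York", likely ends on "York"/"City".
spans = spans_from_boundaries([0.9, 0.8, 0.1, 0.0, 0.0], [0.1, 0.7, 0.9, 0.0, 0.0])
print([" ".join(tokens[s:e]) for s, e in spans])
# ['New York', 'New York City', 'York', 'York City']
```

The overlapping candidates ("New York", "New York City", "York", "York City") are the kind of output a downstream component like SpanCategorizer would then filter or label.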
To train a pipeline with span_finder + spancat, add span_finder (and its tok2vec or transformer, if required) to [training.annotating_components] so that the spancat component can be trained directly from its predictions:
[nlp]
pipeline = ["tok2vec","span_finder","spancat"]

[training]
annotating_components = ["tok2vec","span_finder"]
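As a rough illustration of how the two settings relate, the sketch below reads the sections above with Python's standard configparser and checks that every annotating component is in the pipeline and runs before spancat. The helper `check_annotating_components` is hypothetical and not part of spaCy; it only inspects these two keys and ignores the rest of a real training config.

```python
import configparser
import json

# The [nlp] and [training] sections from the release notes.
CONFIG_TEXT = """
[nlp]
pipeline = ["tok2vec","span_finder","spancat"]

[training]
annotating_components = ["tok2vec","span_finder"]
"""


def check_annotating_components(config_text: str, consumer: str = "spancat") -> list:
    """Hypothetical check: return a list of problems (empty if consistent)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # spaCy writes list values as JSON-style arrays, so json.loads can parse them here.
    pipeline = json.loads(parser["nlp"]["pipeline"])
    annotating = json.loads(parser["training"]["annotating_components"])
    if consumer not in pipeline:
        return [f"{consumer!r} is not in the pipeline"]
    consumer_pos = pipeline.index(consumer)
    problems = []
    for name in annotating:
        if name not in pipeline:
            problems.append(f"{name!r} is annotating but not in the pipeline")
        elif pipeline.index(name) > consumer_pos:
            # spancat could not see predictions from a component that runs after it.
            problems.append(f"{name!r} runs after {consumer!r}")
    return problems


print(check_annotating_components(CONFIG_TEXT))  # []
```

Swapping the order to ["tok2vec","spancat","span_finder"] would report that span_finder runs after spancat, since spancat could then not be trained from span_finder's predictions.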
Language updates
- Initial support for Malay.
- Support for noun chunks and other updates for Latin.
Trained pipelines
New trained pipelines
v3.6 introduces new pipelines for Slovenian, which use the trainable lemmatizer and floret vectors.
Package | UPOS | Parser LAS | NER F |
---|---|---|---|
sl_core_news_sm | 96.9 | 82.1 | 62.9 |
sl_core_news_md | 97.6 | 84.3 | 73.5 |
sl_core_news_lg | 97.7 | 84.3 | 79.0 |
sl_core_news_trf | 99.0 | 91.7 | 90.0 |
Pipeline updates
The English pipelines have been updated to improve handling of contractions with various apostrophes and to lemmatize “get” as a passive auxiliary.
New additions to spaCy universe
Many cool new plugins, extensions and pipelines have been added to the spaCy universe since v3.5:
Project | Description |
---|---|
LatinCy | Synthetic trained spaCy pipelines for Latin NLP. |
parsigs | Structuring prescription text made simple using spaCy. |
Sentimental Onix | Use ONNX for sentiment models. |
spaCysee | Visualize spaCy’s dependency parsing, POS tagging, and morphological analysis. |
spaCy-SetFit | An easy and intuitive approach to using SetFit in combination with spaCy. |
spaCy Visual Studio Code Extension | Work with spaCy’s config files in VS Code. |
spacy-wasm | spaCy in the browser using WebAssembly. |
SpanMarker | Effortless state-of-the-art NER in spaCy. |
Vetiver | Version, share, deploy, and monitor models. |