We’re excited to release v3.6 of the spaCy Natural Language Processing library. spaCy v3.6 adds the span finder component to the core spaCy library and introduces trained pipelines for Slovenian.
The SpanFinder component identifies potentially overlapping, unlabeled spans by identifying span start and end tokens. It is intended for use in combination with a component like SpanCategorizer that may further filter or label the spans. See our Spancat blog post for a more detailed introduction to the span finder design.
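To make the start/end idea concrete, here is a minimal, library-free sketch. It is not spaCy's implementation: the function name `candidate_spans`, the per-token scores, and the threshold and length parameters are all hypothetical and chosen only for illustration. It pairs every token that looks like a span start with every token at or after it that looks like a span end, which is why the resulting spans can overlap.

```python
from typing import List, Optional, Tuple


def candidate_spans(
    start_scores: List[float],
    end_scores: List[float],
    threshold: float = 0.5,
    min_length: int = 1,
    max_length: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets, end-exclusive, for candidate spans.

    Hypothetical helper: a token counts as a start or end if its score
    reaches the threshold.
    """
    starts = [i for i, score in enumerate(start_scores) if score >= threshold]
    ends = [j for j, score in enumerate(end_scores) if score >= threshold]
    spans = []
    for i in starts:
        for j in ends:
            length = j - i + 1
            if length < min_length:
                continue  # also skips end tokens that precede the start
            if max_length is not None and length > max_length:
                continue
            spans.append((i, j + 1))
    return spans


# Made-up scores for a six-token sentence.
tokens = ["The", "New", "York", "City", "council", "met"]
start_scores = [0.1, 0.9, 0.2, 0.1, 0.1, 0.0]
end_scores = [0.0, 0.1, 0.8, 0.7, 0.2, 0.1]

spans = candidate_spans(start_scores, end_scores)
for start, end in spans:
    print(" ".join(tokens[start:end]))
```

With these scores, one start token and two end tokens yield two overlapping, unlabeled candidates ("New York" and "New York City"). Deciding which to keep and how to label them is the job of a downstream component such as the span categorizer.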
To train a pipeline that combines span_finder with spancat, add span_finder (and its tok2vec or transformer if required) to the training annotating_components so that the spancat component can be trained directly from its predictions:
[nlp]
pipeline = ["tok2vec","span_finder","spancat"]

[training]
annotating_components = ["tok2vec","span_finder"]
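For readers who prefer to see the component order in Python, the sketch below assembles an untrained pipeline with the same two components. It assumes spaCy v3.6 or later is installed. It uses each component's default model rather than a shared tok2vec component, so it is not an exact equivalent of the config above. The `spacy.preset_spans_suggester.v1` suggester and the shared `"sc"` spans key reflect our understanding of how spancat can be pointed at spans set by an upstream component; treat them as assumptions and check them against the API documentation for your version. The annotating_components setting itself only applies during config-driven training and is not shown here.

```python
import spacy

# Assumes spaCy v3.6 or later, where the "span_finder" factory is available.
nlp = spacy.blank("en")

# Proposes unlabeled candidate spans and stores them under doc.spans["sc"].
nlp.add_pipe("span_finder", config={"spans_key": "sc"})

# Assumption: the preset-spans suggester makes spancat score the spans already
# stored under the given key instead of generating its own n-gram candidates.
nlp.add_pipe(
    "spancat",
    config={
        "spans_key": "sc",
        "suggester": {
            "@misc": "spacy.preset_spans_suggester.v1",
            "spans_key": "sc",
        },
    },
)

# The pipeline is untrained; this only confirms the component order.
print(nlp.pipe_names)
```

Both components still need to be trained on annotated spans before the pipeline produces useful output.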
- Initial support for Malay.
- Support for noun chunks and other updates for Latin.
v3.6 introduces new pipelines for Slovenian, which use the trainable lemmatizer and floret vectors.
New Trained Pipelines
|Package||UPOS||Parser LAS||NER F|
The English pipelines have been updated to improve handling of contractions with various apostrophes and to lemmatize “get” as a passive auxiliary.
Many cool new plugins, extensions and pipelines have been added to the spaCy universe since v3.5:
|LatinCy||Synthetic trained spaCy pipelines for Latin NLP.|
|parsigs||Structuring prescriptions text made simple using spaCy.|
|Sentimental Onix||Use ONNX for sentiment models.|
|spaCysee||Visualize spaCy’s Dependency Parsing, POS tagging, and morphological analysis.|
|spaCy-SetFit||An easy and intuitive approach to using SetFit in combination with spaCy.|
|spaCy Visual Studio Code Extension||Work with spaCy’s config files in VS Code.|
|spacy-wasm||spaCy in the browser using WebAssembly.|
|SpanMarker||Effortless state-of-the-art NER in spaCy.|
|Vetiver||Version, share, deploy, and monitor models.|