Introducing spaCy v3.4

By the spaCy team

We’re pleased to publish v3.4 of the spaCy Natural Language Processing library. spaCy v3.4 brings typing and speed improvements, new vectors for the English pipelines and new trained pipelines for Croatian. This release also includes prebuilt Linux aarch64 wheels for all spaCy dependencies distributed by Explosion.

Typing improvements

spaCy v3.4 supports pydantic v1.9 and mypy 0.950+ through extensive updates to types in Thinc v8.1.

Speed improvements

  • The parser now uses the C saxpy/sgemm routines provided by the Ops implementation, which lets it use Accelerate through thinc-apple-ops.
  • Improved speed of vector lookups.
  • Improved speed for Example.get_aligned_parse and Example.get_aligned.
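For readers unfamiliar with the BLAS names in the first item above: saxpy computes a * x + y on single-precision vectors, and sgemm is a single-precision matrix multiply. The pure-Python sketch below only illustrates what these operations compute. It is not spaCy's or Thinc's code, the simplified signatures are our own, and the real sgemm additionally supports scaling factors and transposition, which are left out here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def saxpy(a: float, x: Vector, y: Vector) -> Vector:
    """Return a * x + y, element by element (the BLAS "axpy" operation)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


def sgemm(A: Matrix, B: Matrix) -> Matrix:
    """Return the matrix product of A and B (the core of BLAS "gemm")."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("columns of A must match rows of B")
    n_cols = len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(n_cols)]
        for i in range(len(A))
    ]


print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
# [12.0, 24.0, 36.0]
print(sgemm([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
# [[19.0, 22.0], [43.0, 50.0]]
```

The parser spends much of its time in exactly these kinds of vector and matrix operations, which is why routing them to an optimized backend such as Accelerate can pay off.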

Trained pipelines

New trained pipelines

v3.4 introduces new CPU/CNN pipelines for Croatian, which use the trainable lemmatizer and floret vectors. Due to the use of Bloom embeddings and subwords, the pipelines have compact vectors with no out-of-vocabulary words.

New Trained Pipelines

Package            UPOS    Parser LAS    NER F
hr_core_news_sm    96.6    77.5          76.1
hr_core_news_md    97.3    80.1          81.8
hr_core_news_lg    97.5    80.4          83.0
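To give a feel for why vectors built from Bloom embeddings and subwords have no out-of-vocabulary words, here is a minimal pure-Python sketch of the general idea. It is an illustration under our own assumptions, not floret's implementation: the table size, vector width, n-gram range, the hashlib-based hash and the helper names (subwords, rows_for, vector) are all hypothetical, and the "table rows" are deterministic stand-ins rather than trained values.

```python
import hashlib
from typing import List

N_ROWS = 1000        # size of the shared table (hypothetical, tiny on purpose)
N_DIMS = 4           # vector width (hypothetical)
N_HASHES = 2         # rows looked up per subword
MIN_N, MAX_N = 3, 5  # character n-gram lengths (hypothetical)


def _row_vector(row: int) -> List[float]:
    """Stand-in for a trained table row: deterministic pseudo-values."""
    digest = hashlib.sha256(f"row-{row}".encode("utf8")).digest()
    return [digest[i] / 255.0 - 0.5 for i in range(N_DIMS)]


def subwords(word: str) -> List[str]:
    """The marked word itself plus its character n-grams."""
    marked = f"<{word}>"
    grams = [marked]
    for n in range(MIN_N, MAX_N + 1):
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams


def rows_for(subword: str) -> List[int]:
    """Hash one subword into N_HASHES rows of the shared table."""
    rows = []
    for seed in range(N_HASHES):
        # Assumption: sha256 is only a convenient stand-in hash function.
        digest = hashlib.sha256(f"{seed}:{subword}".encode("utf8")).digest()
        rows.append(int.from_bytes(digest[:8], "big") % N_ROWS)
    return rows


def vector(word: str) -> List[float]:
    """Average the table rows selected by every subword of the word."""
    total = [0.0] * N_DIMS
    count = 0
    for sw in subwords(word):
        for row in rows_for(sw):
            for d, value in enumerate(_row_vector(row)):
                total[d] += value
            count += 1
    return [t / count for t in total]


# Any string maps to rows of the fixed-size table, so unseen words get vectors too.
print(vector("doomscrolling"))
print(vector("a-word-never-seen-before"))
```

Because every lookup is a hash into a table of fixed size, the table stays compact no matter how many distinct words appear, and related word forms share many of the same subword rows.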

Pipeline updates

All CNN pipelines have been extended with whitespace augmentation.
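As a rough illustration of what whitespace augmentation means, the sketch below inserts whitespace tokens into a tokenized training sentence while keeping the tag sequence aligned. It is a simplified, hypothetical example rather than the augmenter spaCy uses: the function name, the rate parameter, the set of whitespace strings and the use of "_SP" as the whitespace tag are assumptions made for this example.

```python
import random
from typing import List, Tuple

# Hypothetical choice of whitespace tokens to insert.
WHITESPACE = [" ", "\n", "\t", "  "]


def augment_whitespace(
    words: List[str], tags: List[str], rate: float, rng: random.Random
) -> Tuple[List[str], List[str]]:
    """Insert a whitespace token after some words, keeping tags aligned."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    out_words: List[str] = []
    out_tags: List[str] = []
    for word, tag in zip(words, tags):
        out_words.append(word)
        out_tags.append(tag)
        if rng.random() < rate:
            out_words.append(rng.choice(WHITESPACE))
            out_tags.append("_SP")  # assumed tag for whitespace tokens
    return out_words, out_tags


rng = random.Random(0)  # seeded so that runs are repeatable
words, tags = augment_whitespace(
    ["Apply", "some", "sunscreen"], ["VERB", "DET", "NOUN"], rate=0.5, rng=rng
)
print(list(zip(words, tags)))
```

Training on examples like these exposes the model to irregular spacing, so stray newlines or double spaces in real text are less likely to throw off its predictions.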

The English CNN pipelines have new word vectors, which improve the NER performance and update the vectors with words like “AirTags”, “Brexit”, “covid” and “doomscrolling”:

New English Vectors

Package           Model Version    TAG     Parser LAS    NER F
en_core_web_md    v3.3.0           97.3    90.1          84.6
en_core_web_md    v3.4.0           97.2    90.3          85.5
en_core_web_lg    v3.3.0           97.4    90.1          85.3
en_core_web_lg    v3.4.0           97.3    90.2          85.6

New in the spaCy universe

Many cool new plugins, extensions, pipelines and tutorials have been added to the spaCy universe since v3.3:

  • Aim-spacy: An Aim-based spaCy experiment tracker.
  • Asent: Fast, flexible and transparent sentiment analysis.
  • spaCy fishing: Named entity disambiguation and linking on Wikidata in spaCy with Entity-Fishing.
  • spacy-report: Generates interactive reports for spaCy models.

About the authors

  • Matthew Honnibal (CTO, Founder)
  • Ines Montani (CEO, Founder)
  • Sofie Van Landeghem (Machine Learning Engineer, spaCy Lead)
  • Adriane Boyd (Machine Learning Engineer)
  • Paul O’Leary McCann (Machine Learning Engineer)
  • Daniël de Kok (Machine Learning Engineer)
  • Edward Schmuhl (Machine Learning Engineer)
  • Lj Miranda (Machine Learning Engineer)
  • Philip Vollet (Head of Developer Relations)
  • Peter Baumgartner (Machine Learning Engineer)
  • Richard Hudson (Machine Learning Engineer)
  • Vincent D. Warmerdam (Machine Learning Engineer)
  • Madeesh Kannan (Machine Learning Engineer)
  • Raphael Mitsch (Machine Learning Engineer)