Sentence Similarity

Texts can be "similar" in lots of different ways – they can have similar structure, discuss similar topics, or express similar ideas. Type two sentences to see what result spaCy's doc.similarity() method will produce.


How does this work?

By default, spaCy averages the word vectors of each text and compares the two averages using cosine similarity. Pre-trained vectors are used if the model provides them (e.g. the en_core_web_lg model). If not, the doc.tensor attribute is used instead, which is produced by the tagger, parser and entity recognizer. This is how the en_core_web_sm model provides similarities. The .tensor-based similarities usually reflect structure more, while the word vector similarities reflect topic more. You can also customize the .similarity() method by providing your own similarity function, which can be trained using supervised techniques.
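To make the average-of-vectors idea concrete, here is a minimal sketch in plain Python. It is not spaCy's implementation: the three-dimensional vectors in TOY_VECTORS are made up for illustration, and the helper names (average_vector, cosine, similarity) are hypothetical. It only shows the general technique of averaging per-word vectors and comparing the averages with cosine similarity.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# Real pre-trained vectors have hundreds of dimensions.
TOY_VECTORS = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.0],
    "purr": [0.7, 0.0, 0.3],
    "bark": [0.6, 0.3, 0.1],
    "stocks": [0.0, 0.1, 0.9],
    "fell": [0.1, 0.2, 0.7],
}


def average_vector(tokens, vectors):
    """Average the vectors of all tokens that have one; skip unknown tokens."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dims = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(text_a, text_b, vectors=TOY_VECTORS):
    """Whitespace-tokenize both texts, average their vectors, compare them."""
    vec_a = average_vector(text_a.lower().split(), vectors)
    vec_b = average_vector(text_b.lower().split(), vectors)
    if vec_a is None or vec_b is None:
        # No known words in one of the texts: nothing to compare.
        return 0.0
    return cosine(vec_a, vec_b)


if __name__ == "__main__":
    print(similarity("cats purr", "dogs bark"))
    print(similarity("cats purr", "stocks fell"))
```

With these toy vectors, the two pet sentences score higher than the pet and finance pair, which is the "topical" behaviour described above. Because averaging discards word order, two texts containing the same words in a different order get the same score under this scheme.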
