sense2vec: Semantic Analysis of the Reddit Hivemind

Our neural network read every comment posted to Reddit in 2015, and built a semantic map using word2vec and spaCy. Try searching for a phrase that's more than the sum of its parts to see what the model thinks it means. Try your favourite band, slang words, technical things, or something totally random.


How does this work?

We used spaCy to tag and parse every comment posted to Reddit in 2015, and fed the results to Gensim's word2vec implementation. Using the search above, you can explore how the Reddit hivemind uses a word or phrase and which other terms it appears alongside. See what spaCy and Gensim think Reddit thinks about almost anything.
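To make the tagging step more concrete, here is a minimal sketch of how tagged phrases might be turned into the kind of `phrase|TAG` tokens that a word2vec trainer would then consume. It is an illustration under assumptions, not the actual pipeline: the input list is hand-written rather than produced by spaCy, and the helper name `to_sense_token` is hypothetical.

```python
# Hypothetical input: one sentence as (text, tag) pairs, the kind of output a
# tagger and parser could provide once multi-word phrases have been merged.
tagged_sentence = [
    ("natural language processing", "NOUN"),
    ("rocks", "VERB"),
]


def to_sense_token(text, tag):
    """Join a phrase and its tag into a single 'phrase|TAG' token.

    Spaces become underscores so the whole phrase counts as one token.
    """
    return text.replace(" ", "_") + "|" + tag


# One training "sentence" for a word2vec-style model: a list of sense tokens.
sense_tokens = [to_sense_token(text, tag) for text, tag in tagged_sentence]
print(" ".join(sense_tokens))
```

Because the tag is part of the token, the same surface form with two different tags ends up as two separate vocabulary entries, each with its own vector.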

Read the blog post

Try sense2vec

The sense2vec library is a simple Python implementation for loading and querying sense2vec models. While it's best used in combination with spaCy, it can also be run as a standalone module.

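import sense2vec

# Load the Reddit vectors by name
s2v = sense2vec.load('reddit_vectors-1.1.0')
# Keys have the form phrase|TAG; a lookup returns the frequency and the vector
freq, vector = s2v[u'natural_language_processing|NOUN']
# Find the 10 entries closest to that vector
most_similar = s2v.most_similar(vector, n=10)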
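
For readers who want to see the idea behind a similarity query without downloading the model, the following standalone sketch ranks a handful of entries against a query vector. It assumes cosine similarity as the ranking measure, uses invented 3-dimensional toy vectors, and the names `toy_vectors`, `cosine` and `toy_most_similar` are hypothetical; it is not the library's implementation.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "natural_language_processing|NOUN": [0.9, 0.1, 0.0],
    "machine_learning|NOUN": [0.8, 0.3, 0.1],
    "duck|NOUN": [0.0, 0.2, 0.9],
    "duck|VERB": [0.1, 0.9, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_most_similar(vectors, query, n=10):
    """Return the n (key, score) pairs with the highest cosine similarity."""
    scored = [(key, cosine(vec, query)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


query = toy_vectors["natural_language_processing|NOUN"]
results = toy_most_similar(toy_vectors, query, n=2)
for key, score in results:
    print(key, round(score, 3))
```

The query entry itself comes back first, since any vector has a cosine similarity of 1 with itself, so callers that only want neighbours may prefer to skip the first result.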