The best AI products won't be
built from boxed solutions.

Custom Artificial Intelligence and Natural Language Processing.
We like projects that are easier said than done.

Industry Survey

The State of AI

Machine learning is changing the world by greatly extending the range of problems that software can solve. The web is full of predictions about where we might be headed, but where are we now? What’s already here and what’s coming soon? Who’s building these systems and how? The survey results will be 100% open source for everyone to access and analyze.

Open Source Library

spaCy: Industrial-Strength Natural Language Processing for Python

spaCy helps you write programs that do clever things with text. You give spaCy a string of characters, and it gives you an object that provides multiple useful views of its meaning and linguistic structure. Specifically, spaCy features a high-performance tokenizer, part-of-speech tagger, named entity recognizer and syntactic dependency parser, with built-in support for word vectors. All of the functionality is united behind a clean high-level Python API that makes it easy to use the different annotations together.

Open Source Visualiser

displaCy.js: An open-source NLP visualiser for the modern web

displaCy is a modern and service-independent visualisation library. We hope this makes it easy to compare different services, and explore your own in-house models. If you're using spaCy's syntactic parser, displaCy should be part of your regular workflow. Because spaCy's parser is statistical, it's often hard to predict how it will analyse a given sentence. Using displaCy, you can simply try and see. You can also share the page for discussion with your team, or save the SVG to use elsewhere. If you're developing your own model, you can run the service yourself — it's 100% open source.

Open Source Visualiser

displaCy ENT: A modern named entity visualiser

Data exploration is an important part of effective named entity recognition, because systems often make recurring, unexpected errors that are easily fixed once identified. Despite the apparent simplicity of the task, automatic named entity recognition systems still make many errors unless they are trained on examples closely tailored to the use case.

Open Source Demo

sense2vec: Semantic Analysis of the Reddit Hivemind

This project demonstrates a powerful and scalable approach to text mining, using our open-source library spaCy. We used spaCy to tag and parse every comment posted to Reddit in 2015, and fed the results to Gensim's word2vec implementation. Using the search, you can get a lot of interesting insights into the Reddit hivemind. See what a syntax-sensitive distributional similarity model thinks Reddit thinks about almost anything.
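
One way to picture what "syntax-sensitive" means here: if each token is combined with its tag before training, the same surface form used in different roles becomes a different vocabulary entry. The sketch below is only an illustration under assumptions. The tags are written by hand rather than produced by spaCy, and the "text|TAG" key format and the to_keys helper are hypothetical choices for this example, not a description of the demo's actual pipeline.

```python
from collections import Counter

# Hand-tagged toy sentences (hypothetical data; a real pipeline would get
# the tags from a part-of-speech tagger such as spaCy's).
TAGGED_SENTENCES = [
    [("I", "PRON"), ("saw", "VERB"), ("a", "DET"), ("duck", "NOUN")],
    [("Duck", "VERB"), ("when", "ADV"), ("you", "PRON"), ("hear", "VERB"), ("it", "PRON")],
]


def to_keys(tagged_sentence):
    """Join each lowercased token with its tag, e.g. ("Duck", "VERB") -> "duck|VERB"."""
    return [f"{text.lower()}|{tag}" for text, tag in tagged_sentence]


# Sentences of keys like these could then be passed to a word2vec-style
# trainer in place of plain tokens.
keyed_sentences = [to_keys(sentence) for sentence in TAGGED_SENTENCES]
vocabulary = Counter(key for sentence in keyed_sentences for key in sentence)
```

With this toy input, "duck|NOUN" and "duck|VERB" are counted as separate entries, so a model trained on the keys would learn a separate vector for each use.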

Demo & Visualisation

Sentence Similarity

We trained two deep neural networks on the Quora and StackExchange duplicate question data sets, and one baseline model using spaCy's similarity method. Type two questions or sentences and see how similar the models think they are. The baseline model computes a vector average using word vectors from the GloVe Common Crawl model. The Quora and StackExchange models use Siamese convolutional neural networks to reflect the symmetric relationship of the question pairs.
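
As a rough sketch of how a vector-average baseline like this can work: each sentence is reduced to the mean of its word vectors, and the two means are compared. The code below assumes cosine similarity as the comparison and uses tiny made-up vectors rather than real GloVe values. The names TOY_VECTORS, average_vector and baseline_similarity are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
import math

# Made-up 3-dimensional vectors for illustration only (not GloVe values).
TOY_VECTORS = {
    "learn": [1.0, 0.0, 0.0],
    "study": [0.9, 0.1, 0.0],
    "python": [0.0, 1.0, 0.0],
    "cook": [0.0, 0.0, 1.0],
    "pasta": [0.0, 0.1, 0.9],
}


def average_vector(tokens, vectors):
    """Return the element-wise mean of the vectors of all known tokens, or None."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return None
    dims = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def baseline_similarity(first, second, vectors=TOY_VECTORS):
    """Compare two sentences by the cosine of their averaged word vectors."""
    a = average_vector(first.lower().split(), vectors)
    b = average_vector(second.lower().split(), vectors)
    if a is None or b is None:
        return 0.0  # no known words in at least one sentence
    return cosine(a, b)


close = baseline_similarity("learn python", "study python")
far = baseline_similarity("learn python", "cook pasta")
```

With these toy vectors, close comes out much higher than far. Because averaging ignores word order, the score is the same whichever sentence comes first, which matches the symmetric nature of the duplicate question task.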