This is a guest post by Wah Loon Keng, the
author of spacy-nlp, a client that
exposes spaCy’s NLP text parsing to Node.js (and other
languages) via Socket.IO.
Natural Language Processing and other AI technologies promise to let us build
applications that offer smarter, more context-aware user experiences. However,
an application that’s almost smart is often very, very dumb. In this tutorial,
I’ll show you how to set up a better brain for your applications — a Contextual
Knowledge Base Graph.
Applications feel particularly stupid when they make mistakes that a human never
would, but which a human can sort of understand. These mistakes reveal how crude
the system’s actual logic is, and the illusion that you’re “talking” to
something “intelligent” shatters.
To avoid these mistakes, we’d like our application to have a way to remember
what the user has told it. We need to store these memories in a structured way —
we want information we can act on, not just text we can search. In this post,
I’ll show you how to start wiring up a solution to this problem, using free
open-source technologies. Here’s a sneak preview of what we’re building:
I call this memory storage mechanism a
Contextual Knowledge Base Graph (CKBG).
The graph is contextual: a query automatically resolves into grounded
knowledge, represented as a path in the graph.
For example, the query “call John” will invoke the function “call” with the
context “John”. “Call” knows that a phone number is needed, and “John” has one
(otherwise the bot can ask and remember the answer). So, the subgraph
(John)->(phone number) is selected and passed to “call” to execute the action.
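In code, this resolution step can be sketched with a plain dictionary standing in for the graph. The node and action names here are illustrative only, not spacy-nlp or Neo4j APIs:

```python
# Toy sketch of CKBG query resolution. The graph is a dict-of-dicts;
# "call", "John" and "phone number" are illustrative names.

# Ground knowledge: nodes and their attribute edges.
graph = {
    "John": {"phone number": "+1-555-0100"},
}

# Each action declares the attribute it needs from its context node.
actions = {
    "call": {"requires": "phone number"},
}

def resolve(action, context):
    """Select the subgraph (context)->(required attribute) for an action.

    Returns the attribute value, or None if the graph lacks it
    (a real bot would then ask the user and remember the answer).
    """
    required = actions[action]["requires"]
    return graph.get(context, {}).get(required)

print(resolve("call", "John"))   # the stored phone number
print(resolve("call", "Alice"))  # None -> ask the user and remember
```

The point of the sketch: the action only declares *what* it needs, and the graph lookup supplies the grounding, so new facts learned at runtime extend what the bot can do without code changes.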
Before we can resolve these queries, we first have to build the CKBG. We want
the knowledge base to be populated automatically. We don’t want the knowledge to
be hard-coded by humans. Instead, the brain should learn by itself. So let’s
start setting it up.
Before you start, make sure you have the latest versions of Python and Node
installed. We then install spaCy and
Socket.IO using the Python package manager pip. We also
have to download spaCy’s statistical models (about 1GB of data).
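A minimal install might look like the following. The socketIO-client package name and the exact model-download command are assumptions and may vary with your spaCy version:

```shell
# Install spaCy and a Python Socket.IO client
pip install -U spacy socketIO-client

# Download spaCy's English statistical models (about 1GB)
python -m spacy download en
```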
Next, we need to install Neo4j, the graph database for our
brain, which ships with a built-in browser visualizer:
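One convenient way to run Neo4j locally is Docker; the container name here is arbitrary, and installing a native package from neo4j.com works just as well:

```shell
# Expose the browser visualizer (7474) and the Bolt protocol (7687)
docker run -d --name neo4j-brain -p 7474:7474 -p 7687:7687 neo4j
```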
For the bot interface, install AIVA, my
open-source framework for cross-platform bot development. Fork
the repo and clone your fork locally:
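Assuming your fork keeps the repository name aiva, the clone-and-install step looks roughly like this:

```shell
# Replace <your-username> with your GitHub account
git clone https://github.com/<your-username>/aiva.git
cd aiva
npm install
```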
Start Neo4j and log in for the first time at http://localhost:7474
with the default username and password neo4j, neo4j. It will ask you to change
the password; use 0000 for this demo.
The next step will depend on which platform you want to run your bot on. You can
use AIVA on Slack, Telegram or Facebook. I prefer using
Slack, as it’s generally easier. All you have to do is sign
into your Slack account,
create a bot user, get the Slack token
and update config/default.json in your AIVA installation.
Finally, we can start the bot, and wait for it to be ready.
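From the repo root, starting the bot is typically a single command; npm start is an assumption here, so check AIVA's README for the exact script:

```shell
npm start
```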
You’re done! Now, go on Slack and talk to your bot. It should parse your input
into its brain. To see its brain, go to the Neo4j interface at
http://localhost:7474 and do a query MATCH (u) RETURN u.
The demo essentially shows the syntactic dependency parse tree of your latest
input. The NLP backend is powered by the node module
spacy-nlp that connects to spaCy. It draws
inspiration from displaCy, spaCy’s interactive dependency visualizer.
If you click on a graph node, you will see the parsed NLP information from spaCy.
In the terminal during debug mode, you can also see that information in JSON as
returned from spaCy to the bot, which then inserts it into the brain.
How can this information be used in an application? Let’s say we are writing a
flight-booking app. We can see that “Book” is a verb, i.e. an action to execute.
“New York” is a named entity of type "GPE", i.e. a location. It also modifies
“from”, so we know this is the origin. Likewise, “London” is the destination.
Finally, we know the “flight” is for “Sunday”, which is tagged as a "DATE".
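A slot-filling pass over such a parse can be sketched in a few lines. The token structure below mimics spaCy-style output but is hard-coded for illustration; the field names are assumptions, not the spacy-nlp wire format:

```python
# Illustrative parse of "Book a flight from New York to London on Sunday".
# "pos" is the part-of-speech tag, "ent" the entity label, and "head"
# the preposition the entity attaches to in the dependency tree.
parse = [
    {"text": "Book",     "pos": "VERB",  "ent": ""},
    {"text": "from",     "pos": "ADP",   "ent": ""},
    {"text": "New York", "pos": "PROPN", "ent": "GPE",  "head": "from"},
    {"text": "to",       "pos": "ADP",   "ent": ""},
    {"text": "London",   "pos": "PROPN", "ent": "GPE",  "head": "to"},
    {"text": "Sunday",   "pos": "PROPN", "ent": "DATE", "head": "on"},
]

def extract_booking(tokens):
    """Fill flight-booking slots from parsed tokens."""
    slots = {"action": None, "origin": None, "destination": None, "date": None}
    for tok in tokens:
        if tok["pos"] == "VERB" and slots["action"] is None:
            slots["action"] = tok["text"]           # the verb is the action
        elif tok.get("ent") == "GPE":
            # the preposition the entity modifies tells origin vs destination
            if tok.get("head") == "from":
                slots["origin"] = tok["text"]
            else:
                slots["destination"] = tok["text"]
        elif tok.get("ent") == "DATE":
            slots["date"] = tok["text"]
    return slots

print(extract_booking(parse))
```

This is exactly the reasoning from the paragraph above, mechanized: the verb becomes the action, entity labels pick out locations and dates, and the dependency attachment disambiguates origin from destination.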
If you have questions, or wish to collaborate (especially on the graph brain
CKBG), reach out to me on Twitter: @kengzwl.