
Building your bot’s brain with Node.js and spaCy

by Wah Loon Keng

This is a guest post by Wah Loon Keng, the author of spacy-nlp, a client that exposes spaCy's NLP text parsing to Node.js (and other languages) via Socket.IO.

Natural Language Processing and other AI technologies promise to let us build applications that offer smarter, more context-aware user experiences. However, an application that's almost smart is often very, very dumb. In this tutorial, I'll show you how to set up a better brain for your applications — a Contextual Graph Knowledge Base.

Applications feel particularly stupid when they make mistakes that a human never would, but which a human can sort of understand. These mistakes reveal how crude the system's actual logic is, and the illusion that you're "talking" to something "intelligent" shatters.

Sources: Funnyjunk, @Summerson

To avoid these mistakes, we'd like our application to have a way to remember what the user has told it. We need to store these memories in a structured way — we want information we can act on, not just text we can search. In this post, I'll show you how to start wiring up a solution to this problem, using free open-source technologies. Here's a sneak preview of what we're building:

Example of Contextual Graph Knowledge Base

I call this memory storage mechanism a Contextual Graph Knowledge Base (CGKB). The graph is contextual so that a query can automatically resolve into grounded knowledge as a path in the graph.

For example, the query "call John" invokes the function "call" with the context "John". "Call" knows that a phone number is needed, and "John" has one (otherwise the bot can ask and remember the answer). So the subgraph (John)->(phone number) is selected and passed to "call" for execution.
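The resolution idea can be sketched in a few lines of Python. This is purely illustrative — the graph, the `actions` table, and the `resolve` function are hypothetical stand-ins, not the actual CGKB schema or API:

```python
# Hypothetical sketch of contextual query resolution.
# The graph maps entities to known properties; each action declares
# what knowledge it needs before it can execute.
graph = {
    "John": {"phone number": "555-0123"},
}

actions = {
    "call": {"needs": "phone number"},
}

def resolve(verb, entity):
    """Select the subgraph (entity)->(needed property) and ground the action."""
    needed = actions[verb]["needs"]
    value = graph.get(entity, {}).get(needed)
    if value is None:
        # Missing knowledge: ask the user, then remember the answer.
        return f"What is {entity}'s {needed}?"
    return f"Executing {verb} with {value}"

print(resolve("call", "John"))   # the path (John)->(phone number) grounds the query
print(resolve("call", "Alice"))  # unknown entity triggers a follow-up question
```

The point is that the action never hard-codes the answer; it only names the kind of knowledge it needs, and the graph supplies (or learns) the rest.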

Before we can resolve these queries, we first have to build the CGKB. We want the knowledge base to be populated automatically. We don't want the knowledge to be hard-coded by humans. Instead, the brain should learn by itself. So let's start setting it up.

Note: All code and examples presented in this tutorial are early implementations and still a work in progress.

Before you start, make sure you have the latest versions of Python and Node installed. We then install spaCy and Socket.IO using the Python package manager pip. We also have to download spaCy's statistical models (about 1GB of data).

Dependencies

System: Python, Node, Neo4j
Node modules: spacy-nlp, cgkb
pip modules: socketIO-client, spacy
User interface: AIVA

1. Install spaCy and Socket.IO

pip install -U socketIO-client  # for communication with interface
pip install -U spacy
python -m spacy.en.download     # download spaCy's models

Next, we need to install Neo4j, the graph database for our brain, which comes with a built-in visualizer in the browser:

2. Install Neo4j for Mac or Linux

if which neo4j >/dev/null; then
  echo "Neo4j is already installed"
else
  if [ $(uname) == "Darwin" ]; then
    brew install neo4j
  else
    wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
    echo 'deb http://debian.neo4j.org/repo stable/' | sudo tee /etc/apt/sources.list.d/neo4j.list
    sudo apt-get update
    sudo apt-get -y install neo4j
  fi
fi

For the bot interface, install AIVA, my open-source framework for cross-platform bot development. Fork the repo and clone your fork locally:

3. Install AIVA interface

# forked repo: https://github.com/kengz/aiva
git clone https://github.com/YOURUSERNAME/aiva.git && cd aiva
# checkout to the demo branch of cgkb
git checkout cgkb

# run bin/setup, which checks and sets up bot dependencies including npm install
npm run setup

Start Neo4j and log in for the first time at http://localhost:7474 with the default username and password (neo4j / neo4j). It will ask you to change the password; use 0000 for this demo.

4. Start Neo4j

neo4j start             # for Mac
service neo4j start     # for Linux

The next step will depend on which platform you want to run your bot on. You can use AIVA on Slack, Telegram or Facebook. I prefer using Slack, as it's generally easier. All you have to do is sign in to your Slack account, create a bot user, get the Slack token and update config/default.json in your AIVA installation.

5. Configure your bot

{
  "BOTNAME": "NAME OF YOUR BOT",
  "PORTS": {
    "NEO4J": 7476,
    "SOCKETIO": 6466,
    "SLACK": 8345,
    "TELEGRAM": 8443,
    "FB": 8545
  },
  "NGROK_AUTH": null,
  "ADMINS": ["your_chat_account@email.com"],
  "ACTIVATE_IO_CLIENTS": {
    "ruby": false,
    "python3": true
  },
  "ADAPTERS": {
    "SLACK": {
      "ACTIVATE": true,
      "HUBOT_SLACK_TOKEN": "THE TOKEN YOU JUST GOT"
    },
    "TELEGRAM": {
      "ACTIVATE": false,
      "TELEGRAM_TOKEN": "get from bot father https://core.telegram.org/bots#3-how-do-i-create-a-bot",
      "BOTNAME": "your bot name from bot father",
      "WEBHOOK_KEY": "TELEGRAM_WEBHOOK"
    },
    "FB": {
      "ACTIVATE": false,
      "FB_PAGE_ID": "see aiva doc on adapters",
      "FB_APP_ID": "see aiva doc on adapters",
      "FB_APP_SECRET": "see aiva doc on adapters",
      "FB_PAGE_TOKEN": "see aiva doc on adapters",
      "FB_AUTOHEAR": true,
      "WEBHOOK_KEY": "FB_WEBHOOK_BASE",
      "FB_WEBHOOK_BASE": "optional: set a persistent webhook url if you have one on ngrok, since FB takes 10 mins to update it",
      "FB_ROUTE_URL": "/fb"
    }
  },
  "TEST": {
    "HUBOT_SHELL_USER_ID": "ID0000001",
    "HUBOT_SHELL_USER_NAME": "alice"
  }
}

Finally, we can start the bot, and wait for it to be ready.

6. Start the bot

npm start --debug

The stdout log should look something like this:

[Sat Oct 22 2016 17:36:12 GMT+0000 (UTC)] INFO Authenticated database successfully
[Sat Oct 22 2016 17:36:14 GMT+0000 (UTC)] DEBUG
Sequelize [Node: 6.7.0, CLI: 2.4.0, ORM: 3.24.3]
...
[Sat Oct 22 2016 17:36:14 GMT+0000 (UTC)] INFO Starting poly-socketio server on port: 6466, expecting 4 IO clients
...
[Sat Oct 22 2016 17:36:18 GMT+0000 (UTC)] INFO Logged in as aiva-dev of Global Hackers
[Sat Oct 22 2016 17:36:18 GMT+0000 (UTC)] INFO Slack client now connected
[Sat Oct 22 2016 17:36:18 GMT+0000 (UTC)] DEBUG Started global js socketIO client for SLACK at 6466
[Sat Oct 22 2016 17:36:19 GMT+0000 (UTC)] DEBUG global-client-js HAMbJ5QstqugAsABAAAC joined, 1 remains
[Sat Oct 22 2016 17:36:26 GMT+0000 (UTC)] DEBUG cgkb-py N4tr885fIOxzEuGTAAAD joined, 0 remains
[Sat Oct 22 2016 17:36:26 GMT+0000 (UTC)] INFO All 4 IO clients have joined

You're done! Now, go to Slack and talk to your bot. It should parse your input into its brain. To see its brain, open the Neo4j interface at http://localhost:7474 and run the query MATCH (u) RETURN u.

The demo essentially shows the syntactic dependency parse tree of your latest input. The NLP backend is powered by the node module spacy-nlp that connects to spaCy. It draws inspiration from displaCy, spaCy's interactive dependency visualizer.

If you click on a graph node, you will see the parsed NLP information from spaCy.

In the terminal during debug mode, you can also see that information in JSON as returned from spaCy to the bot, which then inserts it into the brain.

spaCy parse as JSON (excerpt)

{
  "word": "Book",
  "lemma": "book",
  "NE": "",
  "POS_fine": "VB",
  "POS_coarse": "VERB",
  "arc": "ROOT",
  "modifiers": [
    {
      "word": "me",
      "lemma": "me",
      "NE": "",
      "POS_fine": "PRP",
      "POS_coarse": "PRON",
      "arc": "dative",
      "modifiers": []
    },
    {
      "word": "flight",
      "lemma": "flight",
      "NE": "",
      "POS_fine": "NN",
      "POS_coarse": "NOUN",
      "arc": "dobj",
      "modifiers": [
        {
          "word": "a",
          "lemma": "a",
          "NE": "",
          "POS_fine": "DT",
          "POS_coarse": "DET",
          "arc": "det",
          "modifiers": []
        },
        {
          "word": "from",
          "lemma": "from",
          "NE": "",
          "POS_fine": "IN",
          "POS_coarse": "ADP",
          "arc": "prep",
          "modifiers": [
            {
              "word": "New York",
              "lemma": "New York",
              "NE": "GPE",
              "POS_fine": "NNP",
              "POS_coarse": "PROPN",
              "arc": "pobj",
              "modifiers": []
            }
          ]
        }
      ]
    }
  ]
}

How can this information be used in an application? Let's say we are writing a flight-booking app. We can see that "Book" is a verb, i.e. an action to execute. "New York" is a named entity of type "GPE", i.e. a location. It also modifies "from", so we know this is the origin. Likewise, "London" is the destination. Finally, we know the "flight" is for "Sunday", which is tagged as a "DATE".
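Extracting those slots is just a walk over the nested "modifiers" tree. Here is a minimal sketch; only the JSON shape comes from the excerpt above (abridged here), while the traversal function and its name are my own illustration:

```python
# Abridged parse tree in the shape returned by spaCy via spacy-nlp.
parse = {
    "word": "Book", "lemma": "book", "NE": "",
    "POS_coarse": "VERB", "arc": "ROOT",
    "modifiers": [
        {"word": "flight", "arc": "dobj", "NE": "", "POS_coarse": "NOUN",
         "modifiers": [
             {"word": "from", "arc": "prep", "NE": "", "POS_coarse": "ADP",
              "modifiers": [
                  {"word": "New York", "arc": "pobj", "NE": "GPE",
                   "POS_coarse": "PROPN", "modifiers": []},
              ]},
         ]},
    ],
}

def find_prep_object(node, prep, entity_type):
    """Depth-first search for a preposition whose object is a named entity."""
    if node.get("arc") == "prep" and node.get("word") == prep:
        for child in node.get("modifiers", []):
            if child.get("arc") == "pobj" and child.get("NE") == entity_type:
                return child["word"]
    for child in node.get("modifiers", []):
        found = find_prep_object(child, prep, entity_type)
        if found:
            return found
    return None

# The root verb is the action; the GPE under "from" is the origin.
action = parse["lemma"] if parse["POS_coarse"] == "VERB" else None
origin = find_prep_object(parse, "from", "GPE")
print(action, origin)
```

The same search with "to" instead of "from" would recover the destination, and looking for NE "DATE" would recover the travel day.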

Get started!

If you have questions, or wish to collaborate (especially on the graph brain CGKB), reach out to me on Twitter: @kengzwl.

About the Author

Wah Loon Keng

Developer by day, rock climber by night. Mathematician at heart. Love coffee and AI. Software engineer at Eligible Inc.
