Update (October 3, 2016)
A syntactic dependency parse is a kind of shallow meaning representation. It's an important piece of many language understanding and text processing technologies. Now that these representations can be computed quickly, and with increasingly high accuracy, they're being used in many applications; translation, sentiment analysis and summarization are major application areas.
I've been living and breathing similar representations for most of my career. But there's always been a problem: talking about these things is tough. Most people haven't thought much about grammatical structure, and the representations themselves are inherently abstract. When I left academia to write spaCy, I knew I wanted a good visualizer. Unfortunately, I also knew I'd never be the one to write it. I'm deeply graphically challenged. Fortunately, when I started working with Ines to build this site, she really nailed the problem, with a solution I'd never have thought of. I really love the result, which we're calling displaCy.
The best existing alternative is a Java command-line tool that outputs static images.
displaCy's idea is to use CSS to draw the shapes, mostly with border styling, and some arithmetic to figure out the spacing:
The arrow needs only one HTML element, <div class="arrow">, and CSS pseudo-elements. The :before pseudo-element is used for the arc and is essentially a circle (border-radius: 50%) with a black outline. Since its parent .arrow is only half its height and set to overflow: hidden, it's "cut in half" and ends up looking like a half circle. (Ines Montani, "Developing displaCy")
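A minimal sketch of that arithmetic follows, written in Python because that is spaCy's language. It is not displaCy's real code. The constants WORD_WIDTH and LEVEL_HEIGHT are hypothetical. So is the rule that an arc's height grows with the distance between its two tokens, along with the names arrow_style and arrow_html. The shared stylesheet applies the half-circle trick: the :before element is twice as tall as its .arrow parent, which clips it to the upper arch. Only the arc is sketched; the arrowhead and label are omitted.

```python
# Hypothetical layout constants; not displaCy's actual values.
WORD_WIDTH = 100    # horizontal space per token, in px
LEVEL_HEIGHT = 30   # arc height per token of distance, in px

# Shared stylesheet: each .arrow clips a :before ellipse that is twice its
# height, so only the upper half (an arch) is visible. All arrows sit on the
# same baseline (bottom: 0) inside a relatively positioned container.
ARROW_CSS = """
.arrow {
    position: absolute;
    bottom: 0;
    overflow: hidden;
}
.arrow:before {
    content: "";
    position: absolute;
    top: 0;
    left: 0;
    width: 100%;
    height: 200%;
    box-sizing: border-box;
    border: 2px solid black;
    border-radius: 50%;
}
"""


def arrow_style(start: int, end: int) -> str:
    """Return the inline style for one .arrow linking tokens start and end."""
    lo, hi = min(start, end), max(start, end)
    distance = hi - lo
    # The arc's endpoints sit at the horizontal centers of the two tokens.
    left = lo * WORD_WIDTH + WORD_WIDTH // 2
    width = distance * WORD_WIDTH
    # Longer arcs are drawn taller so they clear shorter arcs beneath them.
    height = distance * LEVEL_HEIGHT
    return f"left: {left}px; width: {width}px; height: {height}px;"


def arrow_html(start: int, end: int) -> str:
    """Emit the single element the arc needs; CSS does the drawing."""
    return f'<div class="arrow" style="{arrow_style(start, end)}"></div>'
```

For example, arrow_html(1, 3) yields <div class="arrow" style="left: 150px; width: 200px; height: 60px;"></div>. The page itself only carries positions; how the arc looks lives entirely in the stylesheet.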
At first this seemed to me like witchcraft, or at best a hack. But I was quickly won over: if all we do is declare the data and the relationships, in standards-compliant HTML and CSS, then we can simply step back and let the browser do its job. We know the code will be small, the layout will work on a variety of displays, and we'll have a ready separation of style and content. For long output, we simply let the graphic overflow and let users scroll.
What I'm particularly excited about is the potential for displaCy as an annotation tool. It may seem unintuitive at first, but I think it will be much better to annotate texts the way the parser operates, with a small set of actions and a stack, than by selecting arcs directly. Why? A few reasons:
- You're always asked a question. You don't have to decide-what-to-decide.
- The viewport can scroll with the user, making it easier to work with spacious, readable designs.
- With only 4-6 different actions, it's easy to have key-based input.
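To make the stack-based idea concrete, here is a minimal sketch of an arc-eager transition system. This is one common way a shift-reduce dependency parser operates, with four actions: SHIFT, REDUCE, LEFT and RIGHT. spaCy's own parser builds on an arc-eager scheme, but this code is not its implementation. The State class, the KEYMAP key bindings and the annotate helper are all hypothetical, and arc labels are left out. At each step the annotator is offered only the valid actions and picks one with a single key. The arcs fall out of the sequence of answers.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical key bindings for an annotation UI: one key per action.
KEYMAP = {"s": "SHIFT", "r": "REDUCE", "a": "LEFT", "d": "RIGHT"}


@dataclass
class State:
    words: list[str]
    stack: list[int] = field(default_factory=list)
    buffer_start: int = 0  # index of the next unprocessed word
    heads: dict[int, int] = field(default_factory=dict)  # child -> head

    def done(self) -> bool:
        return self.buffer_start >= len(self.words)

    def valid_actions(self) -> list[str]:
        """The only choices the annotator is offered at this step."""
        has_b0 = not self.done()
        s0 = self.stack[-1] if self.stack else None
        actions = []
        if has_b0:
            actions.append("SHIFT")
        if s0 is not None and s0 in self.heads:
            actions.append("REDUCE")
        if s0 is not None and has_b0 and s0 not in self.heads:
            actions.append("LEFT")
        if s0 is not None and has_b0:
            actions.append("RIGHT")
        return actions

    def apply(self, action: str) -> None:
        if action not in self.valid_actions():
            raise ValueError(f"{action} is not valid in this state")
        b0 = self.buffer_start
        if action == "SHIFT":     # move the next word onto the stack
            self.stack.append(b0)
            self.buffer_start += 1
        elif action == "REDUCE":  # top of stack already has a head; pop it
            self.stack.pop()
        elif action == "LEFT":    # next word becomes head of the stack top
            self.heads[self.stack.pop()] = b0
        elif action == "RIGHT":   # stack top becomes head of the next word
            self.heads[b0] = self.stack[-1]
            self.stack.append(b0)
            self.buffer_start += 1


def annotate(words: list[str], keys: str) -> dict[int, int]:
    """Replay a string of keypresses and return the child -> head arcs."""
    state = State(words)
    for key in keys:
        state.apply(KEYMAP[key])
    return state.heads
```

Take "She ate fish" as an example. The keys s a s d r (SHIFT, LEFT, SHIFT, RIGHT, REDUCE) attach both "She" and "fish" to "ate", giving {0: 1, 2: 1}. Because valid_actions() filters out impossible moves, the interface never has to ask the annotator anything but "which of these few actions next?"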
Efficient manual annotation is incredibly important. If we can get that right, then we can offer you cheap domain adaptation. You give us some text, we get it annotated, and we ship you a custom model that's much more accurate on your data. If you're interested in helping us beta test this idea, get in touch.