Displaying Linguistic Structure with CSS

by Matthew Honnibal

A syntactic dependency parse is a kind of shallow meaning representation. It's an important piece of many language understanding and text processing technologies. Now that these representations can be computed quickly, and with increasingly high accuracy, they're being used in lots of applications: translation, sentiment analysis and summarization are major application areas.

I've been living and breathing similar representations for most of my career. But there's always been a problem: talking about these things is tough. Most people haven't thought much about grammatical structure, and the idea is inherently abstract. When I left academia to write spaCy, I knew I wanted a good visualizer. Unfortunately, I also knew I'd never be the one to write it. I'm deeply graphically challenged. Fortunately, when I was working with Ines to build this site, she really nailed the problem, with a solution I'd never have thought of. I really love the result, which we're calling displaCy.

The best existing alternative is a Java command-line tool that outputs static images, which look like this:

Output of the Brat parse tree visualizer

I find the output of the CMU visualizer basically unreadable. Pretty much all visualizers suffer from this problem: they don't add enough space. I always thought this was a hard problem, and that a good JavaScript visualizer would need to do something crazy with Canvas. Ines quickly proposed a much better solution, based on native, web-standard technologies.

The idea is to use CSS to draw shapes, mostly with border styling, and some arithmetic to figure out the spacing:

Each arrow needs only one HTML element, <div class="arrow">, plus the CSS pseudo-elements :before and :after. The :before pseudo-element draws the arc: it is essentially a circle (border-radius: 50%) with a black outline. Since its parent, .arrow, is only half as tall as the circle and set to overflow: hidden, the circle is "cut in half" and ends up looking like a half circle.
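To make the arithmetic concrete, here is a minimal TypeScript sketch that computes the box for one .arrow and emits the CSS for it. It is not displaCy's code: the fixed column width (the hypothetical WORD_WIDTH), the function names arcBox and arcCss, and the 2px border are assumptions for illustration. It also assumes the arcs sit in a positioned container whose bottom edge lines up with the top of the word row, and it leaves out the arrowhead and label.

```typescript
// Hypothetical layout constant: every word gets a fixed-width column, in pixels.
const WORD_WIDTH = 100;

interface ArcBox {
  left: number;   // x offset of the .arrow container: centre of the leftmost word
  width: number;  // distance between the two word centres
  height: number; // half the width, so the clipped :before shows a half circle
}

// Place the .arrow container for an arc between two token indices.
function arcBox(start: number, end: number): ArcBox {
  const from = Math.min(start, end);
  const width = Math.abs(end - start) * WORD_WIDTH;
  return { left: from * WORD_WIDTH + WORD_WIDTH / 2, width, height: width / 2 };
}

// Emit CSS for one arc. The :before box is as wide as its parent and twice as
// tall, so border-radius: 50% makes it a circle, and overflow: hidden on the
// parent clips away its bottom half.
function arcCss(id: string, box: ArcBox): string {
  return [
    `#${id} { position: absolute; bottom: 0; left: ${box.left}px;`,
    `  width: ${box.width}px; height: ${box.height}px; overflow: hidden; }`,
    `#${id}:before { content: ""; position: absolute; top: 0; left: 0;`,
    `  width: 100%; height: 200%; box-sizing: border-box;`,
    `  border: 2px solid black; border-radius: 50%; }`,
  ].join("\n");
}

console.log(arcCss("arc-0", arcBox(1, 3)));
```

For an arc between the second and fourth words, arcBox(1, 3) places a 200px-wide, 100px-tall container 150px from the left; the browser does the rest. Making the height depend on the span is one simple way to keep longer arcs from overlapping shorter ones.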

To me, this seemed like witchcraft, or a hack at best. But I was quickly won over: if all we do is declare the data and the relationships in standards-compliant HTML and CSS, we can simply step back and let the browser do its job. We know the code will be small, the layout will work on a variety of displays, and we'll have a clean separation of style and content. For long output, we simply let the graphic overflow and let users scroll.
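The separation of style and content can be sketched as a function that turns a parse into plain markup. This is a hedged illustration, not displaCy's renderer: the Parse shape, the class names other than .arrow, and the data-* attributes are hypothetical. Positioning each arc uses the spacing arithmetic described above and is left out here. The markup only states which words exist and which arcs connect them, while the stylesheet decides how they look and lets long sentences scroll.

```typescript
// Hypothetical data shapes: a parse is a list of words plus labelled arcs.
interface ParsedWord { text: string; tag: string; }
interface ParsedArc { start: number; end: number; label: string; }
interface Parse { words: ParsedWord[]; arcs: ParsedArc[]; }

// Escape text so tokens such as "<" or "&" display literally.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Declare content only: which words exist and which arcs connect them.
// No positions or colours appear here; those belong to the stylesheet.
function parseToHtml(parse: Parse): string {
  const words = parse.words.map(
    (w, i) =>
      `<div class="word" data-index="${i}">${escapeHtml(w.text)}` +
      `<span class="tag">${escapeHtml(w.tag)}</span></div>`
  );
  const arcs = parse.arcs.map(
    (a) =>
      `<div class="arrow" data-start="${a.start}" data-end="${a.end}">` +
      `<span class="label">${escapeHtml(a.label)}</span></div>`
  );
  return `<div class="parse">${words.join("")}${arcs.join("")}</div>`;
}

// Presentation lives in CSS. Words stay on one line, so a long sentence
// overflows its container horizontally and overflow-x: auto adds a scrollbar.
const PARSE_CSS = [
  ".parse { position: relative; white-space: nowrap; overflow-x: auto; }",
  ".parse .word { display: inline-block; }",
].join("\n");

console.log(
  parseToHtml({
    words: [{ text: "Dogs", tag: "NOUN" }, { text: "bark", tag: "VERB" }],
    arcs: [{ start: 0, end: 1, label: "nsubj" }],
  })
);
console.log(PARSE_CSS);
```

Because the markup carries no layout, restyling the whole visualization (or adapting it to a narrow screen) is a stylesheet change rather than a code change.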

What I'm particularly excited about is the potential for displaCy as an annotation tool. It may seem unintuitive at first, but I think it will be much better to annotate texts the way the parser operates, with a small set of actions and a stack, than by selecting arcs directly. Why? A few reasons:

  • You're always asked a question. You don't have to decide-what-to-decide.
  • The viewport can scroll with the user, making it easier to work with spacious, readable designs.
  • With only 4-6 different actions, it's easy to have key-based input.
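Annotating "the way the parser operates" can be sketched as a transition system driven by key presses. The post doesn't say which transition system the annotator would use; this sketch assumes the arc-eager system, whose four actions (SHIFT, REDUCE, LEFT, RIGHT) fall within the 4-6 actions mentioned above. The key bindings, the State shape and the function names are hypothetical, and arc labels are left out. At each step the interface only needs to ask one question: which valid action applies to the top of the stack and the front of the buffer?

```typescript
type Action = "SHIFT" | "REDUCE" | "LEFT" | "RIGHT";

// Hypothetical key bindings for a keyboard-driven annotation UI.
const KEYMAP: Record<string, Action> = { s: "SHIFT", d: "REDUCE", l: "LEFT", r: "RIGHT" };

interface State {
  stack: number[];          // token indices; the top of the stack is the last element
  buffer: number[];         // tokens still to be processed; the front is buffer[0]
  heads: (number | null)[]; // heads[i] is the head of token i, or null if unassigned
}

function initState(nWords: number): State {
  const buffer: number[] = [];
  const heads: (number | null)[] = [];
  for (let i = 0; i < nWords; i++) {
    buffer.push(i);
    heads.push(null);
  }
  return { stack: [], buffer, heads };
}

// Arc-eager preconditions: which actions are legal in the current state.
function validActions(st: State): Action[] {
  const out: Action[] = [];
  const s = st.stack.length > 0 ? st.stack[st.stack.length - 1] : -1;
  const hasBuffer = st.buffer.length > 0;
  if (hasBuffer) out.push("SHIFT");
  if (s >= 0 && st.heads[s] !== null) out.push("REDUCE");
  if (s >= 0 && hasBuffer && st.heads[s] === null) out.push("LEFT");
  if (s >= 0 && hasBuffer) out.push("RIGHT");
  return out;
}

// Apply one action; returns false and leaves the state unchanged if it is invalid.
function applyAction(st: State, action: Action): boolean {
  if (validActions(st).indexOf(action) < 0) return false;
  switch (action) {
    case "SHIFT": // move the front of the buffer onto the stack
      st.stack.push(st.buffer.shift() as number);
      break;
    case "REDUCE": // pop a word that already has its head
      st.stack.pop();
      break;
    case "LEFT": { // stack top becomes a dependent of the buffer front, then is popped
      const s = st.stack.pop() as number;
      st.heads[s] = st.buffer[0];
      break;
    }
    case "RIGHT": { // buffer front becomes a dependent of the stack top, then is pushed
      const b = st.buffer.shift() as number;
      st.heads[b] = st.stack[st.stack.length - 1];
      st.stack.push(b);
      break;
    }
  }
  return true;
}

// Handle one key press; unknown keys and invalid actions are ignored.
function onKey(st: State, key: string): boolean {
  const action = KEYMAP[key];
  return action !== undefined && applyAction(st, action);
}
```

For "Dogs bark loudly", the key sequence s, l, s, r attaches "Dogs" and "loudly" to "bark" and leaves "bark" without a head, as the root. Rejecting invalid keys means the annotator can never build an inconsistent tree, and validActions doubles as the list of answers to offer at each step.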

Efficient manual annotation is incredibly important. If we can get that right, then we can offer you cheap domain adaptation: you give us some text, we get it annotated, and we ship you a custom model that's much more accurate on your data. If you're interested in helping us beta test this idea, get in touch.

About the author

Matthew Honnibal

Matthew is a leading expert in AI technology. He completed his PhD in 2009, and spent a further 5 years publishing research on state-of-the-art NLP systems. He left academia in 2014 to write spaCy and found Explosion.