✨ Prodigy v1.8.0 (May 20, 2019) · Support for spaCy v2.1, basic auth, multi-user sessions, review workflow & more
What 1.2 million parliamentary speeches can teach us about gender representation (The Pudding) · Analysis of parliamentary speeches using spaCy.
Can You Verifi This? Studying Uncertainty and Decision-Making About Misinformation (Karduni, Wesslen, Santhanam, Cho, Volkova, Arendt, Shaikh, Dou, 2018) · HCI interface to identify misinformation on social media, using spaCy for NER.
More than a Million Pro-Repeal Net Neutrality Comments were Likely Faked (Hackernoon) · Jeff Kao's analysis of net neutrality comments, using spaCy for word vectors.
Reflections on running spaCy: commercial open-source NLP (ines.io) · As more and more people and companies get involved with open-source software, balancing the expectations of an open community against those of a traditional provider/consumer relationship is becoming increasingly difficult. Are maintainers becoming too authoritarian? Are users becoming too demanding? Are large companies selling out open source?
Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models · Over the last six months, a powerful new neural network playbook has come together for Natural Language Processing. The new approach can be summarised as a simple four-step formula: embed, encode, attend, predict. This post explains the components of this new approach, and shows how they're put together in two recent systems.
Introducing Explosion AI · The problem with developing a machine learning model is that you don't know how well it'll work until you try — and trying is very expensive. Obviously, this risk is unappealing, but the existing solutions on the market, one-size-fits-all cloud services, are even worse. We're launching Explosion AI to give you a better option.
Multi-threading spaCy's parser and named entity recognizer · In v0.100.3, we quietly rolled out support for GIL-free multi-threading for spaCy's syntactic dependency parsing and named entity recognition models. Because these models take up a lot of memory, we'd wanted to release the global interpreter lock (GIL) around them for a long time. When we finally did, it seemed a little too good to be true, so we delayed celebration — and then quickly moved on to other things. It's now past time for a write-up.
Dead Code Should Be Buried · Natural Language Processing moves fast, so maintaining a good library means constantly throwing things away. Most libraries are failing badly at this, as academics hate to editorialize. This post explains the problem, why it's so damaging, and why I wrote spaCy to do things differently.
Parsing English in 500 Lines of Python · This post explains how transition-based dependency parsers work, and argues that this algorithm represents a breakthrough in natural language understanding. A concise sample implementation is provided, in 500 lines of Python, with no external dependencies. This post was written in 2013; by 2015, this type of parser had become increasingly dominant.
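The core of a transition-based parser is small enough to sketch here. This is a minimal, simplified arc-standard loop, not the post's implementation: a stack, a buffer of token indices, and a policy function that picks the next action. The `toy_policy` stands in for a trained model and simply builds a right-branching chain, so the output is illustrative, not a linguistically correct parse.

```python
SHIFT, LEFT_ARC, RIGHT_ARC = "shift", "left", "right"

def parse(words, choose_action):
    """Minimal arc-standard transition loop: a stack, a buffer, and a
    dict mapping each dependent token index to its head's index."""
    stack, buffer = [], list(range(len(words)))
    heads = {}
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer)
        if action == SHIFT and buffer:
            stack.append(buffer.pop(0))
        elif action == LEFT_ARC and len(stack) >= 2:
            dep = stack.pop(-2)   # second-from-top depends on top
            heads[dep] = stack[-1]
        elif action == RIGHT_ARC and len(stack) >= 2:
            dep = stack.pop()     # top depends on second-from-top
            heads[dep] = stack[-1]
        else:
            break
    return heads

# Hypothetical stand-in for a trained model's action scorer:
def toy_policy(stack, buffer):
    return SHIFT if buffer else RIGHT_ARC

heads = parse(["I", "ate", "pizza"], toy_policy)
# With this toy policy: every token's head is the token before it.
```

In a real parser, `choose_action` scores the legal transitions with a statistical model over features of the stack and buffer; the loop itself stays this simple, which is what makes these parsers so fast.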
FAQ #1: Tips & tricks for NLP, annotation & training with Prodigy and spaCy · In this video, Ines talks about a few frequently asked questions and shares some general tips and tricks for how to structure your NLP annotation projects, how to design your label schemes and how to solve common problems.
The AI Revolution will not be Monopolized (Hack Talks) · Who’s going to "win at AI"? There are now several large companies eager to claim that title. Others say that China will take over, leaving Europe and the US far behind. But short of true Artificial General Intelligence, there’s no reason to believe that machine learning or data science will have a single winner. Instead, AI will follow the same trajectory as other technologies for building software: lots of developers, a rich ecosystem, many failed projects and a few shining success stories.
Building new NLP solutions with spaCy and Prodigy (PyData Berlin) · “Commercial machine learning projects are currently like start-ups: many projects fail, but some are extremely successful, justifying the total investment. While some people will tell you to embrace failure, I say failure sucks — so what can we do to fight it? In this talk, I will discuss how to address some of the most likely causes of failure for new NLP projects.”
spaCy’s entity recognition model: incremental parsing with Bloom embeddings & residual CNNs · spaCy v2.0’s Named Entity Recognition system features a sophisticated word embedding strategy using subword features and "Bloom" embeddings, a deep convolutional neural network with residual connections, and a novel transition-based approach to named entity parsing.
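The "Bloom" embedding trick deserves a quick illustration. The idea is that instead of one row per vocabulary item, each word is hashed to several rows of a small table and its vector is the sum of those rows, so collisions are spread out and the table can be far smaller than the vocabulary. This is a simplified sketch of the hashing scheme, not spaCy's implementation; the table sizes and hash choice are illustrative.

```python
import hashlib

def bloom_rows(word, n_rows=1000, n_hashes=4):
    """Map a word to several row indices in a small embedding table.
    Using multiple seeded hashes means two words rarely collide on
    *all* of their rows, so summed vectors stay distinguishable."""
    rows = []
    for seed in range(n_hashes):
        digest = hashlib.md5(f"{seed}:{word}".encode()).hexdigest()
        rows.append(int(digest, 16) % n_rows)
    return rows

def embed(word, table):
    """The word's vector is the elementwise sum of its hashed rows."""
    rows = bloom_rows(word, n_rows=len(table))
    dim = len(table[0])
    return [sum(table[r][i] for r in rows) for i in range(dim)]

# Toy 1000 x 5 table; in practice the rows are learned parameters.
table = [[(r + 1) * 0.01] * 5 for r in range(1000)]
vec = embed("apple", table)
```

Because the mapping is deterministic, unknown words still get a usable vector at runtime, which is part of what keeps the models small.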
Pseudo-rehearsal: A simple solution to catastrophic forgetting for NLP · Sometimes you want to fine-tune a pre-trained model to add a new label or correct some specific errors. This can introduce the "catastrophic forgetting" problem. Pseudo-rehearsal is a good solution: use the original model to label examples, and mix them through your fine-tuning updates.
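The recipe in that last sentence is framework-agnostic and can be sketched in a few lines. All names here (`make_rehearsal_mix`, `old_model_predict`, `revision_size`) are hypothetical, and the "O"-tag predictor is a toy stand-in for the original model:

```python
import random

def make_rehearsal_mix(new_examples, unlabelled_texts, old_model_predict,
                       revision_size=100):
    """Mix gold examples for the new behaviour with 'revision' examples
    labelled by the original model, so fine-tuning on the new data
    doesn't drag the weights too far from their previous solution."""
    revision = [(text, old_model_predict(text))
                for text in unlabelled_texts[:revision_size]]
    mixed = new_examples + revision
    random.shuffle(mixed)
    return mixed

# Toy stand-in for the original model: tag every token "O".
old_predict = lambda text: " ".join("O" for _ in text.split())

batch = make_rehearsal_mix(
    [("Uber is a company", "ORG O O O")],   # gold data for the new label
    ["I like cats", "The sky is blue"],     # raw text for revision
    old_predict,
)
```

Each fine-tuning batch then contains both the new annotations and the old model's own predictions, so the update objective rewards keeping the previous behaviour as well as learning the new one.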
Supervised learning is great — it's data collection that's broken · Short of Artificial General Intelligence, we'll always need some way of specifying what we're trying to compute. Labelled examples are a great way to do that, but the process is often tedious. However, the dissatisfaction with supervised learning is misplaced. Instead of waiting for the unsupervised messiah to arrive, we need to fix the way we're collecting and reusing human knowledge.
spaCy v1.0: Deep Learning with custom pipelines and Keras · I'm pleased to announce the 1.0 release of spaCy, the fastest NLP library in the world. By far the best part of the 1.0 release is a new system for integrating custom models into spaCy. This post introduces you to the changes, and shows you how to use the new custom pipeline functionality to add a Keras-powered LSTM sentiment analysis model into a spaCy pipeline.
How front-end development can improve Artificial Intelligence · What's holding back Artificial Intelligence? While researchers rightly focus on better algorithms, there are a lot more things to be done. In this post I'll discuss three ways in which front-end development can improve AI technology: by improving the collection of annotated data, communicating the capabilities of the technology to key stakeholders, and exploring the system's behaviours and errors.
spaCy now speaks German · Many people have asked us to make spaCy available for their language. Being based in Berlin, German was an obvious choice for our first second language. Now spaCy can do all the cool things it does for English on German text too. But more importantly, teaching spaCy to speak German required us to drop some comfortable but English-specific assumptions about how language works, making spaCy fit to learn more languages in the future.
How spaCy Works · This post was pushed out in a hurry, immediately after spaCy was released. It explains some of how spaCy is designed and implemented, and provides some quick notes explaining which algorithms were used. The post pre-dates spaCy's named entity recogniser, but it provides some detail about the tokenisation algorithm, general design, and efficiency concerns.
A Good Part-of-Speech Tagger in about 200 Lines of Python · Up-to-date knowledge about natural language processing is mostly locked away in academia. And academics are mostly pretty self-conscious when we write. We’re careful. We don’t want to stick our necks out too much. But under-confident recommendations suck, so here’s how to write a good part-of-speech tagger.
spaCy IRL 2019: 2 days of NLP in Berlin · We were pleased to invite the spaCy community and other folks working on Natural Language Processing to Berlin this summer for a small and intimate event.
Advanced NLP with spaCy: A free online course · In this free and interactive online course, you’ll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.
The process: Transforming spaCy’s docs (Increment Magazine) · Making your documentation work for users with vastly different needs is a challenge. Here’s how spaCy, an open-source library for natural language processing, did it.
Embed, encode, attend, predict (Data Science Summit) · While there is a wide literature on developing neural networks for natural language understanding, the networks all have the same general architecture. This talk explains the four components (embed, encode, attend, predict), gives a brief history of approaches to each subproblem, and explains two sophisticated networks in terms of this framework.
Explosion in 2017: Our Year in Review · We founded Explosion in October 2016, so this was our first full calendar year in operation. We set ourselves ambitious goals this year, and we're very happy with how we achieved them. Here's what we got done.
Introducing custom pipelines and extensions for spaCy v2.0 · As the release candidate for spaCy v2.0 gets closer, we've been excited to implement some of the last outstanding features. One of the best improvements is a new system for adding pipeline components and registering extensions to the Doc, Span and Token objects. In this post, we'll introduce you to the new functionality, and finish with an example extension package, spacymoji.
Building Prodigy: Our new tool for efficient machine teaching (ines.io) · The philosophy behind Prodigy’s features and its cloud-free design.
Supervised similarity: Learning symmetric relations from duplicate question data · Supervised models for text-pair classification let you create software that assigns a label to two texts, based on some relationship between them. When the relationship is symmetric, it can be useful to incorporate this constraint into the model. This post shows how a siamese convolutional neural network performs on two duplicate question datasets, with experimental results.
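One common way to bake the symmetry constraint in is to combine the two sentence vectors only with commutative operations, so the classifier literally cannot distinguish (a, b) from (b, a). This sketch uses a toy character-count "encoder" in place of the post's siamese CNN, purely to make the symmetry property runnable:

```python
def encode(text):
    """Toy sentence encoder: a bag of letter counts. In the real model
    this would be the shared (siamese) convolutional encoder."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def pair_features(a, b):
    """Combine the two sentence vectors with commutative operations
    (elementwise sum and absolute difference), so the downstream
    classifier's input is identical for (a, b) and (b, a)."""
    va, vb = encode(a), encode(b)
    sums = [x + y for x, y in zip(va, vb)]
    diffs = [abs(x - y) for x, y in zip(va, vb)]
    return sums + diffs

f1 = pair_features("How old are you?", "What is your age?")
f2 = pair_features("What is your age?", "How old are you?")
# f1 == f2 by construction, whatever the encoder is.
```

The same trick works with any encoder: as long as the combination layer is order-invariant, the symmetry holds exactly rather than being something the model has to learn from data.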
An open-source named entity visualizer for the modern web · Named Entity Recognition is a crucial technology for NLP. Whatever you're doing with text, you usually want to handle names, numbers, dates and other entities differently from regular words. To help you make use of NER, we've released displaCy-ent.js. This post explains how the library works, and how to use it.
A natural language user interface is just a user interface · Let’s say you’re writing an application, and you want to give it a conversational interface: your users will type some command, and your application will do something in response, possibly after asking for clarification.
Statistical NLP in the Ten Hundred Most Common English Words · When I was little, my favorite TV shows all had talking computers. Now I’m big and there are still no talking computers, so I’m trying to make some myself. Well, we can make computers say things. But when we say things back, they don’t really understand. Why not?
Introducing spaCy · Computers don't understand text. This is unfortunate, because that's what the web almost entirely consists of. We want to recommend people text based on other text they liked. We want to shorten text to display it on a mobile screen. We want to aggregate it, link it, filter it, categorise it, generate it and correct it. spaCy provides a library of utility functions that help programmers build such products.
Introducing spaCy v2.1 · Version 2.1 of the spaCy Natural Language Processing library includes a huge number of features, improvements and bug fixes. In this post, we highlight some of the things we're especially pleased with, and explain some of the most challenging parts of preparing this big release.
Frag deinen Kühlschrank: Wie künstliche Intelligenz die Welt verändert (ARD alpha documentary, German) · In this documentary, we explore what it feels like to work with intelligent machines. At large research centers and small start-ups, we meet people who decide how and what AI learns today. Ines Montani teaches machines to understand the meaning of texts. Even for the young programmer, artificial intelligence is not magic, but a technology that everyone should understand.
How to Ignore Most Startup Advice and Build a Decent Software Business (EuroPython keynote) · “In this talk, I’m not going to give you one "weird trick" or tell you to ~* just follow your dreams *~. But I’ll share some of the things we’ve learned from building a successful software company around commercial developer tools and our open-source library spaCy.”
Rapid NLP annotation (Data Science Summit) · This talk presents a fast, flexible and even somewhat fun approach to named entity annotation. Using our approach, a model can be trained for a new entity type in only a few hours, starting from only a feed of unannotated text and a handful of seed terms.
Training a new entity type with Prodigy – annotation powered by active learning · In this video, we’ll show you how to use Prodigy to train a phrase recognition system for a new concept. Specifically, we’ll train a model to detect references to drugs, using text from Reddit.
Training an insults classifier with Prodigy in ~1 hour · In this video, we’ll show you how to use Prodigy to train a classifier to detect disparaging or insulting comments. Prodigy makes text classification particularly powerful, because you can try out new ideas very quickly.
Prodigy: A new tool for radically efficient machine teaching · Machine learning systems are built from both code and data. It's easy to reuse the code but hard to reuse the data, so building AI mostly means doing annotation. This is good, because the examples are how you program the behaviour – the learner itself is really just a compiler. What's not good is the current technology for creating the examples. That's why we're pleased to introduce Prodigy, a downloadable tool for radically efficient machine teaching.
Deep text-pair classification with Quora's 2017 question dataset · Quora recently released the first dataset from their platform: a set of 400,000 question pairs, with annotations indicating whether the questions request the same information. This dataset is large, real, and relevant — a rare combination. In this post, I'll explain how to solve text-pair tasks with deep learning, using both new and established tips and technologies.
displaCy.js: An open-source NLP visualizer for the modern web · With new offerings from Google, Microsoft and others, there is now a range of excellent cloud APIs for syntactic dependencies. A key part of these services is the interactive demo, where you enter a sentence and see the resulting annotation. We're pleased to announce the release of displaCy.js, a modern and service-independent visualization library. We hope this makes it easy to compare different services and explore your own in-house models.
SyntaxNet in context: Understanding Google's new TensorFlow NLP model · Yesterday, Google open-sourced their TensorFlow-based dependency parsing library, SyntaxNet. The library gives access to a line of neural network parsing models published by Google researchers over the last two years. I've been following this work closely since it was published, and have been looking forward to the software being released. This post tries to provide some context around the release — what's new here, and how important is it?
Sense2vec with spaCy and Gensim · If you were doing text analytics in 2015, you were probably using word2vec. Sense2vec (Trask et al., 2015) is a new twist on word2vec that lets you learn more interesting, detailed and context-sensitive word vectors. This post motivates the idea, explains our implementation, and comes with an interactive demo that we've found surprisingly addictive.
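The core preprocessing idea behind sense2vec is simple: annotate the corpus first, then merge each word (or multi-word entity) with its tag into a single token before training word2vec, so "duck|NOUN" and "duck|VERB" get separate vectors. This sketch shows that preprocessing step on a hypothetical, simplified input format of (words, tag) pairs; in practice the tags would come from spaCy and the output would be fed to a word2vec implementation such as Gensim's:

```python
def sense_tokens(tagged):
    """Turn (words, tag) pairs into 'word|TAG' tokens, joining
    multi-word entities with underscores, so each sense gets its
    own entry in the word2vec vocabulary."""
    out = []
    for words, tag in tagged:
        out.append("_".join(w.lower() for w in words) + "|" + tag)
    return out

corpus_line = sense_tokens([
    (["Natural", "Language", "Processing"], "ENTITY"),
    (["is"], "VERB"),
    (["fun"], "ADJ"),
])
```

After this transform, ordinary word2vec training needs no modification at all; the "sense" disambiguation lives entirely in the token stream.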
Writing C in Cython · For the last two years, I’ve done almost all of my work in Cython. And I don’t mean, I write Python, and then “Cythonize” it, with various type-declarations et cetera. I just, write Cython. I use "raw" C structs and arrays, and occasionally C++ vectors, with a thin wrapper around malloc/free that I wrote myself. The code is almost always exactly as fast as C/C++, because that's really all it is, but with Python right there, if I want it.