Who said what: using machine learning to correctly attribute quotes | The Guardian Engineering Blog | How the Guardian uses spaCy and Prodigy to train a custom coreference resolution model.
Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks | Halterman, Schrodt, Beger, Bagozzi, Scarborough (2023) | While in the past the process of generating training case has been quite time consuming and tedious, newer approaches such as those incorporated into the web-based Prodigy annotation system allow this to be done much more quickly.
Speech acts in the Dutch COVID-19 Press Conferences | Schueler, Marx (2022), Language Resources and Evaluation | We used the annotation tool Prodigy. Prodigy provides a simple interface in which the annotator sees a sentence and selects the applicable speech acts. The use of Prodigy considerably sped up the annotation process, allowing the annotators to annotate around 200 sentences per hour.
How We Found Pricey Provisions in New Jersey Police Contracts | ProPublica | ProPublica and the Asbury Park Press scoured hundreds of police union agreements for details on publicly funded payouts to cops, using spaCy under the hood.
Impoliteness and morality as instruments of destructive informal social control in online harassment targeting Swedish journalists | Björkenfeldt, Gustafsson (2023) | In the annotation tool Prodigy used for this process, the tweets directed towards journalists were displayed alongside the initial tweet that initiated the conversation thread and the subsequent reply from the journalist.
When Women Make Headlines | The Pudding | Using spaCy and other packages from the NLP ecosystem for analyzing more than 382,000 headlines to see how women are represented (or misrepresented) in the news.
How We Analyzed Google’s Search Results | The Markup | Using the Prodigy annotation tool, we created a user interface and a coder manual for two annotators to spot-check 741 stained images randomly sampled from our dataset.
What 1.2 million parliamentary speeches can teach us about gender representation | The Pudding | Analysis of parliamentary speeches using spaCy.
MP Interests Tracker: Utilising GenAI to uncover insights in the UK Register of Financial Interest | JournalismAI Blog | Project from teams at The Times and BBC using spacy-llm to make complex financial interests data more accessible.
Talking sense: using machine learning to understand quotes | The Guardian Blog | How the Guardian uses spaCy and Prodigy to train a machine learning model that helps extract quotes from news articles and match them to the correct source.
The Physical Traits that Define Men and Women in Literature | The Pudding | Analysis of physical traits most tied to gender in literature using spaCy.
Can You Verifi This? Studying Uncertainty and Decision-Making About Misinformation | Karduni, Wesslen, Santhanam, Cho, Volkova, Arendt, Shaikh, Dou (2018) | An interface for identifying misinformation on social media, using spaCy for NER.
How Good is the Model in Model-in-the-loop Event Coreference Resolution Annotation? | Ahmed, Nath, Regan, Pollins, Krishnaswamy, Martin (2023) | Figure 6 illustrates the interface design of the annotation methodology on the popular model-in-the-loop annotation tool - Prodigy. We use this tool for the simplicity it offers in plugging in the various ranking methods we explained.
How the Guardian approaches quote extraction with NLP | A case study of the Guardian's spaCy-Prodigy workflow to modularize quote extraction for content creation. This study includes iterative annotation guidelines and custom interface functionality.
Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence | Halterman, Keith, Sarwar, O’Connor (2021), ACL 2021 | Figure A2 shows a stylized version of the custom interface we built using the Prodigy annotation tool. Annotators are presented with an entire document, with sentences sequentially highlighted.
Millennials Kill Everything | The Pudding | Analysis of media reporting on millennials using spaCy. From napkins to marriage to Applebees, just looking at headlines you’d guess that for the past decade the millennial generation’s been on a rampage.
More than a Million Pro-Repeal Net Neutrality Comments were Likely Faked | Hackernoon | Analysis of net neutrality comments by Jeff Kao using spaCy for word vectors.