Presentations and Publications
May 11 - May 13, 2022
All times US Eastern time
Day 1 - May 11
Hands-on technical session
Instruction on how to load and utilize the models for research. Work time to address outstanding questions and technical problems.
Location: Center for Digital Humanities
Challenges in the development of NLP Resources for New Languages: Case Studies from Kannada, Quechua, and Russian
Moderators: Janco, Dombrowski
Languages: Russian, Quechua, Old Kannada
Right-To-Left and Back
Languages: Ottoman, Yiddish, Classical Arabic
Keynote: Representation in Literary NLP
While work in NLP has increasingly moved beyond the domains of news and Wikipedia to focus on literary texts, the availability of public domain collections like Project Gutenberg has still concentrated attention on the works of a small set of historical authors writing primarily in English. The "New Languages for NLP" workshop is expanding this attention along several important dimensions, and in this talk, I'll explore my own group's work to capture the diversity of representation in contemporary literature: creating a dataset of novels published between 1923 and 2020 annotated for a variety of NLP tasks; building models of referential gender that align characters in fiction with the pronouns used to describe them (he/she/they/xe/ze/etc.) rather than inferring an unknowable gender identity; and expanding BookNLP to Russian, Spanish, and a range of other languages to help drive multilingual work in cultural analytics centered on the analysis of character.
Location: Julis Romo Rabinowitz Building 399
Day 2 - May 12
Documentation & Publishing Research Data
Creating Annotated Corpora for Yoruba, Efik and Tigrinya
Languages: Yoruba, Tigrinya, Efik
East Asian Historical Language Models: Beyond the Mainstream
Languages: Old Chinese, Kanbun
Keynote: Advanced NLP for diverse languages with spaCy
Since its release in 2015, spaCy has become one of the most popular open-source libraries for applied natural language processing in Python, enabling a wide range of applications across different use cases and domains. In this talk, I'll discuss spaCy's philosophy for modern NLP, its extensible design, and recent features that enable the development of advanced natural language processing pipelines for typologically diverse languages.
Location: Computer Science Building 104
Day 3 - May 13
Andrew Janco, Natalia Ermolaev, Toma Tasovac, David Lassner, Nick Budak, Quinn Dombrowski
Reflections and future steps.