Workshop-III
Presentations and Publications
May 11 - May 13, 2022
All times are US Eastern Time
Day 1 - May 11

Hands-on technical session
Instruction on how to load and utilize the models for research. Work time to address outstanding questions and technical problems.
Moderator: Lassner
Location: Center for Digital Humanities

Challenges in the development of NLP Resources for New Languages: Case Studies from Kannada, Quechua, and Russian
Moderators: Janco, Dombrowski
Languages: Russian, Quechua, Old Kannada

Right-To-Left and Back
Moderator: Fellbaum
Languages: Ottoman, Yiddish, Classical Arabic

Keynote: Representation in Literary NLP
David Bamman
While work in NLP has increasingly moved beyond the domains of news and Wikipedia to focus on literary texts, the availability of public domain collections like Project Gutenberg has still concentrated attention on the works of a small set of historical authors writing primarily in English. The "New Languages for NLP" workshop is expanding this attention along several important dimensions, and in this talk, I'll explore my own group's work to capture the diversity of representation in contemporary literature: creating a dataset of novels published between 1923 and 2020 annotated for a variety of NLP tasks; building models of referential gender that align characters in fiction with the pronouns used to describe them (he/she/they/xe/ze/etc.) rather than inferring an unknowable gender identity; and expanding BookNLP to Russian, Spanish, and a range of other languages to help drive multilingual work in cultural analytics centered on the analysis of character.
Location: Julis Romo Rabinowitz Building 399
Day 2 - May 12

Documentation & Publishing Research Data
Moderator: Karajgikar

Creating Annotated Corpora for Yoruba, Efik and Tigrinya
Languages: Yoruba, Tigrinya, Efik

East Asian Historical Language Models: Beyond the Mainstream
Languages: Old Chinese, Kanbun

Keynote: Advanced NLP for diverse languages with spaCy
Ines Montani
Since its release in 2015, spaCy has become one of the most popular open-source libraries for applied natural language processing in Python, enabling a wide range of applications across different use cases and domains. In this talk, I'll discuss spaCy's philosophy for modern NLP, its extensible design, and recent features that enable the development of advanced natural language processing pipelines for typologically diverse languages.
Location: Computer Science Building 104
Day 3 - May 13
Instructors' roundtable
Andrew Janco, Natalia Ermolaev, Toma Tasovac, David Lassner, Nick Budak, Quinn Dombrowski
Reflections and future steps.