Workshop-III

Presentations and Publications

May 11 - May 13, 2022

All times US Eastern time

Course Materials

Day 1 - May 11

Hands-on technical session

Instruction on how to load and use the models for research, followed by work time to address outstanding questions and technical problems.

Moderator: Lassner

Location: Center for Digital Humanities

Challenges in the development of NLP Resources for New Languages: Case Studies from Kannada, Quechua, and Russian

Moderators: Janco, Dombrowski

Languages: Russian, Quechua, Old Kannada

Right-To-Left and Back

Moderator: Fellbaum

Languages: Ottoman, Yiddish, Classical Arabic

Keynote: Representation in Literary NLP

David Bamman

While work in NLP has increasingly moved beyond the domains of news and Wikipedia to focus on literary texts, the availability of public domain collections like Project Gutenberg has still concentrated attention on the works of a small set of historical authors writing primarily in English. The "New Languages for NLP" workshop is expanding this attention along several important dimensions, and in this talk, I'll explore my own group's work to capture the diversity of representation in contemporary literature: creating a dataset of novels published between 1923 and 2020 annotated for a variety of NLP tasks; building models of referential gender that align characters in fiction with the pronouns used to describe them (he/she/they/xe/ze/etc.) rather than inferring an unknowable gender identity; and expanding BookNLP to Russian, Spanish, and a range of other languages to help drive multilingual work in cultural analytics centered on the analysis of character.

Location: Julis Romo Rabinowitz Building 399

Day 2 - May 12

Documentation & Publishing Research Data

Moderator: Karajgikar

Creating Annotated Corpora for Yoruba, Efik and Tigrinya

Languages: Yoruba, Tigrinya, Efik

East Asian Historical Language Models: Beyond the Mainstream

Languages: Old Chinese, Kanbun

Keynote: Advanced NLP for diverse languages with spaCy

Ines Montani

Since its release in 2015, spaCy has become one of the most popular open-source libraries for applied natural language processing in Python, enabling a wide range of applications across different use cases and domains. In this talk, I'll discuss spaCy's philosophy for modern NLP, its extensible design, and recent features that enable the development of advanced natural language processing pipelines for typologically diverse languages.

Location: Computer Science Building 104

Day 3 - May 13

Instructors' roundtable

Andrew Janco, Natalia Ermolaev, Toma Tasovac, David Lassner, Nick Budak, Quinn Dombrowski

Reflections and future steps.