Workshop-II

Machine Learning and Statistical Language Models

January 10 - January 14, 2022

All times are in US Eastern Time.

Course Materials

Day 1 - January 10

Andrew Janco
David Lassner

Introduction to Machine Learning   Janco, Lassner

Preparation: Intro. to ML

Skills and concepts for model training.

Andrew Janco
Quinn Dombrowski

Practical Introduction to Model Training   Janco, Dombrowski

Preparation: LitBank Notebook

Hands-on activity with model training.

Day 2 - January 11

Toma Tasovac
Nick Budak

Overview: From Toy Data to Your Data   Tasovac, Budak

Preparation: None (slides available)

spaCy projects, data processing and model training with your project's data and requirements.

Toma Tasovac
Nick Budak

Demonstration: Training Models with INCEpTION Data   Tasovac, Budak

Preparation: New Language Training

Workflow for data preparation and model training with project data.

Day 3 - January 12

Toma Tasovac

Practical Session with Teams’ Data   Tasovac

Work in individual teams to run project files and train models. Assess model performance against applied research tasks.

Andrew Janco

Practical Session with Project Data and Requirements   Janco

Lightning talks to share with other groups. Continued project work.

Day 4 - January 13

Andrew Janco
David Lassner

Embeddings, Do You Need Them?   Janco, Lassner

Preparation: Embeddings

Adding FastText vectors to your model. Shared embedding layers. Transformer pipeline component.

Andrew Janco
David Lassner

Optional Applied Session   Janco, Lassner

Preparation: Applied embeddings

Training with embeddings. Assess utility for research tasks.

Day 5 - January 14

Andrew Janco
Natalia Ermolaev

Review of Key Topics and Discussion   Janco, Ermolaev

David Lassner
Nick Budak

Time for Team Meetings and Planning Work   Lassner, Budak
