Data science course delivers new perspectives on language
By Senta Scarborough | Emory Report | Nov. 16, 2020
Dan Sinykin, assistant professor of English, gets some help from pup Azul as he teaches QTM 340: Practical Approaches to Data Science with Text.
By combining the humanities with the cutting-edge field of data science, an innovative Emory College course examines how turning text into data can lead to new insights on topics ranging from poetry to social media posts.
“What’s exciting about the class is we are in a moment culturally and technologically with quickly developing advancements in data science and language,” says Dan Sinykin, assistant professor of English, who teaches Quantitative Theory and Methods 340: Practical Approaches to Data Science with Text.
Approaching language as text data that can be analyzed through computational methods powered by artificial intelligence allows researchers the creativity to uncover bias, discover untold stories and share new insights about the humanities and social sciences.
“It’s one of the most interesting areas in academia right now,” says Mack Hutsell, a computer science and English major. “The class has been good. It’s practical but we are learning theory and the human side of things. It shows how I can apply these methods and how we should use them.”
Each week, students learn a new method, such as web scraping, topic modeling, sentiment analysis or vector space models. They apply these methods to a variety of text data, including poetry, social media posts, presidential campaign rally music playlists, screenplay dialogue and obituaries.
For example, GPT-3, an extremely powerful language model, writes by predicting connections between words. Students used an earlier version, GPT-2, to write one of their posts on the course discussion forum by typing 30 words and letting the program write the rest.
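The idea of a program continuing a typed prompt can be illustrated with a far simpler stand-in. The sketch below is not GPT-2 or GPT-3 and is not taken from the course materials; it is a toy bigram model that only counts which word follows which in a small sample text, then extends a prompt by sampling from those counts. The function names and the sample sentence are hypothetical, chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Map each word to a Counter of the words observed right after it."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def continue_prompt(model, prompt, n_words=10, seed=0):
    """Extend the prompt one word at a time, sampling by observed frequency."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    output = prompt.lower().split()
    if not output:
        return ""
    for _ in range(n_words):
        followers = model.get(output[-1])
        if not followers:
            break  # the last word was never seen with a successor
        choices, weights = zip(*followers.items())
        output.append(rng.choices(choices, weights=weights)[0])
    return " ".join(output)


if __name__ == "__main__":
    # Hypothetical sample text; a real model is trained on vastly more data.
    sample = "the cat sat on the mat and the dog sat on the rug"
    model = build_bigram_model(sample)
    print(continue_prompt(model, "the dog", n_words=8))
```

Large language models condition on much longer stretches of context and learn from enormous collections of text, but the loop is similar in spirit: predict a next word, append it, repeat.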
One week, the class read an analysis of ratemyprofessors.com reviews revealing that the comments showed gender bias. Women were described as “bossy” or “bubbly” while men were described as “genius” or “arrogant.”
“It reflects their own collective bias on gender,” Sinykin says. “They can do this type of analysis on any text. The students find it exciting how rapidly the tech is evolving and being applied to everything.”
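A minimal sketch of this kind of comparison, assuming reviews have already been labeled by group: count how often chosen descriptors appear per 1,000 words in each group's text. This is not the method used in the analysis the class read; the function name and the sample reviews are invented for illustration.

```python
import re
from collections import Counter


def descriptor_rates(reviews, descriptors):
    """Return {group: {descriptor: occurrences per 1,000 words}}.

    reviews is an iterable of (group_label, review_text) pairs.
    """
    wanted = set(descriptors)
    counts = {}
    totals = Counter()
    for group, text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[group] += len(tokens)
        group_counts = counts.setdefault(group, Counter())
        group_counts.update(t for t in tokens if t in wanted)
    return {
        group: {w: 1000 * counts[group][w] / totals[group] for w in descriptors}
        for group in counts
        if totals[group]  # skip groups with no words to avoid dividing by zero
    }


if __name__ == "__main__":
    # Invented sample reviews, not real data.
    sample_reviews = [
        ("women", "She was bossy but her lectures were clear"),
        ("women", "Bubbly and helpful in office hours"),
        ("men", "A genius but a little arrogant"),
    ]
    words = ["bossy", "bubbly", "genius", "arrogant"]
    for group, rates in descriptor_rates(sample_reviews, words).items():
        print(group, rates)
```

Normalizing by total word count matters because one group may simply have more or longer reviews than another.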
Celia Hu, a senior majoring in quantitative science studies, values the freedom of sharing examples in class from students’ own diverse interests, including texts from psychology and music, a Mary Oliver poem and an article comparing how information spread virally in 19th-century newspapers versus social media today.
Because of COVID-19, the class is taught online for the first time. It’s also Sinykin’s first time teaching it. Lauren Klein, Emory associate professor of English and QTM, designed and taught the class last year in person. Klein, director of Emory’s Digital Humanities Lab, and Sinykin have built the class collaboratively while alternating teaching. It’s a pragmatic approach due to the fast pace of data science.
Tutorial exercises are a big part of learning. Last year, students worked through the tutorials together in class with their professor’s guidance.
“The goal is to bring students up to speed on practical skills of data science with text so they can do it themselves,” Sinykin says.
Adjusting for online learning
Sinykin initially held class discussions twice a week, on an online platform on Tuesdays and then live on Zoom on Thursdays, but realized the format was too much for students during the pandemic, especially with some in different time zones.
Sinykin’s solution: give students Tuesday’s class time for completing assignments independently.
To help, Sinykin — drawing from humanities experience — crafts each step of the tutorials with instruction between lines of code to be as conversational and clear as possible for students with varying levels of computational experience.
On Thursdays, Sinykin highlights key issues and answers questions on Zoom.
“He is giving us enough time to digest, have a few trials and errors in the homework and do corrections ourselves, and then time to talk to classmates or our professor,” Hu says. “I think that is valuable especially during the pandemic. Things can get stressful easily and this professor is generous in providing us with the time we need.”
To foster engagement, Sinykin also holds Zoom breakout sessions.
“I ask a lot more questions when I am online than in person. On Zoom, you are in a comfortable space and there’s less anxiety if you want to speak,” Alex Welsh, a senior and computer science major, says.
Data ethics and justice are an essential part of class discussion. Welsh shared an example in class: last year, the Apple Card gave women lower credit limits than men because women earn less, causing outrage.
“The rules are fixed in machine learning,” Welsh says. “The machine will follow the rules you give it. If you give it bad rules it will follow them. It is a reflection of our society.”