Written by Matthew Busekroos
Originally from China, CSAIL PhD candidate Shannon Shen received his undergraduate degree in Civil Engineering from Nanjing Tech University. After earning a Master’s degree in Data Science from Brown University, Shen spent several years working as a researcher at Harvard University and the Allen Institute for AI.
Shen now works alongside Professor David Sontag in the lab’s Clinical Machine Learning Group. Shen said he is fortunate to be advised by Professor Sontag.
“He is a role model for advancing machine learning foundations as well as applying them in high-stakes settings like healthcare,” Shen said. “I enjoyed chatting with him during the weekly meetings, where his mentorship goes beyond research guidance: he inspires me to think more deeply about the problems at hand and gives me new perspectives for understanding them. He has been a caring and considerate mentor as well as a great friend who not only supports various aspects of my research but also helps me become a better person overall.”
Shen’s research aims to build better AI (for example, large language models) for end users, especially expert users like doctors, lawyers, and even researchers.
“This is a joint effort between Machine Learning and Human Computer Interaction — we need to design more accurate models as well as better ways for people and AIs to interact and collaborate,” he said.
Shen cites the group’s recent work, Co-LLM, as an example.
“It’s a method that enables collaboration between two or more language models with different expertise to produce better generations,” Shen said. “For example, let’s say we have a general language model like ChatGPT that can produce friendly responses but might not have enough expert knowledge, and another one that is trained to specialize in medicine. Our method makes it possible for the general language model to ‘call’ the medical model when it needs to generate text involving that domain knowledge.”
Shen added that one machine learning challenge is learning how these models should collaborate without labeled data. He said they developed a method based on a latent variable model, which trains a “switch” variable between the models by observing both models’ confidence over the training text. During training, the switch learns which text each model is good at generating; at generation time, it can “switch” between the models when decoding each next token, combining the best of each.
He said this method has many benefits: the flexibility of the algorithm makes it easy to compose different off-the-shelf language models without expensive pre-training, and it can also speed up generation when a small model collaborates with a large one.
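To make the token-level switching idea concrete, here is a minimal sketch, not the actual Co-LLM implementation: it pairs two off-the-shelf GPT-2 checkpoints (placeholders for the “general” and “expert” models) that share a tokenizer, and it substitutes a simple confidence threshold for Co-LLM’s trained latent “switch” variable, deferring to the expert model whenever the base model is unsure about the next token.

```python
# Minimal sketch of token-level switching between two causal LMs.
# Hypothetical setup: both models share the same tokenizer/vocabulary,
# and a confidence threshold stands in for the learned "switch" variable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "gpt2"           # stands in for the general-purpose model
EXPERT = "gpt2-medium"  # stands in for the domain-specialized model

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE).eval()
expert = AutoModelForCausalLM.from_pretrained(EXPERT).eval()

@torch.no_grad()
def generate(prompt: str, max_new_tokens: int = 40, threshold: float = 0.5) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        base_logits = base(ids).logits[0, -1]
        probs = torch.softmax(base_logits, dim=-1)
        # If the base model is unsure about the next token, "switch" to
        # the expert model for this decoding step.
        if probs.max() < threshold:
            next_id = expert(ids).logits[0, -1].argmax()
        else:
            next_id = base_logits.argmax()
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)

print(generate("The standard treatment for community-acquired pneumonia is"))
```

Because the decision is made per token, the expert model only runs on the steps where it is actually needed, which is what allows a small general model and a larger specialist to split the work.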
Shen is interested in how to develop AIs that can enrich or enhance one’s intellectual work.
“Rather than only focusing on building ‘better’ AIs that can supersede us, I deeply care about how AIs can actually empower us, make us better at what we do, and let us do what we cannot do,” he said. “One interesting fact is that the R&D spending on ‘building better AIs’ is significantly larger than on ‘designing better ways for us to use AIs’: for example, companies like OpenAI spent many millions of dollars to build the model that powers AI chatbots like ChatGPT, but in the end, it’s a simple web portal for people to interact and chat with the models.”
Shen said he thinks it is critical to consider human factors from the outset when developing AIs, rather than adding post hoc “patches” after building an AI that might be problematic or not aligned with human values.
“For example, in a project called SymGen, we focus on improving the trustworthiness of AI generations by providing fine-grained attributions,” Shen said. “It’s well known that today’s AIs can ‘hallucinate,’ i.e., generate unsupported or incorrect facts. Existing RAG-based approaches try to elicit citations from the models alongside the generated text in an attempt to battle this issue. However, for a single sentence generated by the LLM, they might cite one or more documents or web pages that often take minutes to read. As a result, the user needs to spend far more time reading and verifying the sources than reading the generation alone, which makes the attribution less helpful in practice.”
Shen said that to solve this issue, they reconfigure language model generation so that it produces fine-grained attributions pointing to specific spans in the source data.
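The sketch below illustrates the general idea of span-level attribution, not SymGen’s actual reference format or pipeline: assuming the model is asked to emit symbolic placeholders instead of copying values directly, a small resolver can fill in each placeholder from the source record and report exactly which field supports each generated value. The record fields and placeholder syntax here are made up for illustration.

```python
# Toy resolver for symbolic references of the form {{path.to.field}}.
# Each resolved value is tied back to a specific source field, so a
# reader can verify it without rereading whole documents.
import re
from typing import Any

SOURCE = {  # hypothetical source record
    "patient": {"age": "67", "sex": "female"},
    "labs": {"wbc": "14.2", "creatinine": "1.1"},
}

def resolve(generated: str, source: dict[str, Any]) -> tuple[str, list[tuple[str, str]]]:
    """Replace {{...}} references with source values and return the
    resolved text plus an attribution list of (reference, value) pairs."""
    attributions: list[tuple[str, str]] = []

    def lookup(match: re.Match) -> str:
        path = match.group(1).strip()
        value: Any = source
        for key in path.split("."):
            value = value[key]  # a KeyError here flags an unsupported reference
        attributions.append((path, str(value)))
        return str(value)

    resolved = re.sub(r"\{\{(.*?)\}\}", lookup, generated)
    return resolved, attributions

text = "A {{patient.age}}-year-old {{patient.sex}} presented with a WBC of {{labs.wbc}}."
resolved, attrs = resolve(text, SOURCE)
print(resolved)
print(attrs)  # every generated value maps to an exact source span
```

Verification then becomes a per-value check against a named field rather than a document-level reading task, which is the usability gain the quote above describes.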
Shen added that most of the group’s work has direct use cases or applications in industry.
“In my work, I always try to understand the interplay between the data used, the methods developed, and the end users involved in the research project,” he said. “Often there are real-world motivations, and some projects we’ve worked on aim to directly solve the pain points of end users. We publicly share the code and data for our research, and we build demos and tools to help both the research and industry communities.”
Following his time at CSAIL, Shen said he would love to continue the journey of research, either in industry or at a research lab, and work on research that supports the creative endeavors of domain experts.
You can find more information about Shannon Shen and his work on his website: https://www.szj.io/.