Audrey Woods | CSAIL Alliances
In the last few years, it has become clear that AI is impressively adept at making predictions when fed enough raw data. Unsurprisingly, this ability has generated excitement in the healthcare industry, where AI is being applied to drug and treatment generation, early identification of disease, prediction of patient outcomes, and many more tasks.
However, CSAIL-affiliated Associate Professor Marzyeh Ghassemi argues that technologists have a responsibility to make sure a tool is actually going to make a positive difference and, as she puts it, “work at least as well as what [healthcare providers] are doing right now.” Acknowledging that AI tools have the ability to transform medicine and save lives, Professor Ghassemi is working to understand the risks of applying AI in medical settings and map out ways to mitigate them to empower the future of healthcare.
FINDING HER INTEREST
From a young age, Professor Ghassemi was interested in video games, puzzles, and health. She considered a career in healthcare, but she was drawn to computer science because it combined all of her interests, especially once she learned how computer science tools could be applied in medicine. Professor Ghassemi’s trajectory into academia was, she says, “nontraditional.” She was homeschooled, attended New Mexico State University, and had a daughter around the time she started graduate school at MIT. But she’s grateful to have found a mentor in her now-colleague, CSAIL Professor Peter Szolovits, who leads the Clinical Decision-Making group and calls Professor Ghassemi “a spectacularly good researcher [and] one of the best students I’ve had.”
As a PhD student, Professor Ghassemi wanted to broaden the work Professor Szolovits’s group was doing on extracting clinical information from physician notes with natural language processing (NLP) tools. Specifically, she wanted to see whether machine learning algorithms could make accurate predictions about patient outcomes when trained on multimodal data combining vitals, labs, and clinical notes. This project became the foundation of her thesis, “Representation Learning in Multi-dimensional Clinical Timeseries for Risk and Event Prediction.” During this time, she discovered how biases in health data lead to gaps in machine learning performance and learned that “unless you think carefully about them, models will naively reproduce and extend biases.” Since then, her research has focused on identifying such model weaknesses and building a broader conversation around identifying, rectifying, and protecting against the capacity of models to perpetuate inequality and unfairness. Toward that end, her group, Healthy ML, “focuses on creating and applying machine learning to understand and improve health in ways that are robust, private, and fair.”
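To make the multimodal idea concrete, here is a minimal, hypothetical sketch in Python. It is illustrative only: the synthetic data, the simple summary statistics, and the logistic regression model are assumptions for demonstration, not Professor Ghassemi’s actual thesis pipeline. The idea is to summarize each patient’s vital-sign time series with a few statistics, vectorize the clinical note, concatenate the two feature sets, and train a classifier to predict an outcome.

```python
# Illustrative sketch (not the actual research code): combine time-series
# features from vitals/labs with text features from clinical notes, then
# predict a binary patient outcome.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical data: 500 patients, 24 hourly measurements of 6 vitals/labs,
# one short note per patient, and a binary outcome label.
n = 500
vitals = rng.normal(size=(n, 24, 6))
outcomes = rng.integers(0, 2, size=n)
notes = ["pt stable, afebrile" if y == 0 else "acute distress, hypotensive"
         for y in outcomes]

# Summarize each patient's time series with simple statistics.
ts_features = np.concatenate(
    [vitals.mean(axis=1), vitals.min(axis=1), vitals.max(axis=1)], axis=1)

# Vectorize the notes and concatenate with the time-series features.
text_features = TfidfVectorizer().fit_transform(notes).toarray()
X = np.concatenate([ts_features, text_features], axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(X, outcomes, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```

In practice, each modality would get far richer representations (learned time-series embeddings, clinical language models), but the structure, separate feature extraction per modality followed by a joint predictor, is the same.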
THE PROBLEM: AI BIAS, USER TRUST, AND RISKY APPLICATIONS
In her conversation with CSAIL Alliances on the Alliances Podcast, Professor Ghassemi describes an experiment she learned about as a graduate student that had a profound impact on the way she thought about the deployment of technological tools. In this study, a group of people were brought into a room to do an unrelated task. Soon after everyone was inside, a fire alarm went off and fake smoke began to fill the room, mimicking a true emergency. A rescue robot came in, ostensibly to guide the subjects to safety, except that it guided them right past the safe door they came in through and into a dark room without any discernible exit. The subjects followed the robot anyway, even though some of them had watched a video in which the robot performed terribly on a similar navigation task. To Professor Ghassemi, this highlighted just how trusting people can be of technology, even when they have been shown the risks and pitfalls of the tool they’re using.
To educate the public about these risks, Professor Ghassemi’s group has published extensively on AI bias and on the risk scenarios of commonly used tools. For example, she and her research group showed that AI models can identify a patient’s race from medical images like chest X-rays, even when radiologists cannot. They also showed that many medical models optimized for the general population do not perform well for women and minorities, and that the more accurately a model could predict a patient’s race or gender from a medical image, the worse it performed for those subgroups.
Some of the solutions Professor Ghassemi and her fellow researchers have explored involve tailoring models for robustness to data shift and to minority populations, emphasizing that the more all-purpose a model is designed to be, the more likely it is to underperform on smaller subgroups. This can be challenging, though, because minority subgroups are, by definition, underrepresented, and gathering enough robust data to train a model accurately is difficult when you’re working with a narrow subset of the population. It’s also important to consider geographical differences, as a model trained in one part of the country isn’t guaranteed to work in a different hospital setting or on a different patient demographic. “A model that is well-balanced in one site may not function effectively in a different environment,” she says. “This impacts the utility of models in practice, and it’s essential that we work to address this issue for those who develop and deploy models.”
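One basic practice this line of work implies is auditing a model’s performance per subgroup rather than only in aggregate. The sketch below is a hypothetical illustration of that idea (the data, group names, and score generation are invented for demonstration; this is not the Healthy ML group’s code): compute the same metric separately for each demographic group, so that a gap hidden by the overall average becomes visible.

```python
# Illustrative per-subgroup performance audit: an aggregate AUC can look
# fine while one group is served much worse.
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(y_true, y_score, groups):
    """Return {group: AUC}, computed on each subgroup separately."""
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        # AUC is undefined if a subgroup contains only one class.
        if len(np.unique(y_true[mask])) == 2:
            results[g] = roc_auc_score(y_true[mask], y_score[mask])
        else:
            results[g] = float("nan")
    return results

# Hypothetical example: model scores that are noisier (less reliable)
# for the smaller group "B" than for the majority group "A".
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)
groups = rng.choice(["A", "B"], size=1000, p=[0.85, 0.15])
noise = np.where(groups == "A", 0.5, 1.5)
scores = y + rng.normal(scale=noise)

print("overall AUC:", roc_auc_score(y, scores))
print("per-group AUC:", subgroup_auc(y, scores, groups))
```

Running this shows a markedly lower AUC for the minority group even though the overall number looks reasonable, which is exactly the kind of gap the aggregate metric hides.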
One of the reasons Professor Ghassemi is so inspired to foster careful and thoughtful deployment of AI tools is her awareness of their revolutionary potential. “AI is extending our capacity, and what's really cool about that is it's transformative,” she explains. “It's not like we're asking a model to mimic our existing practice that is not so great but just do it faster. We're asking AI to do something better than us. Something that we're actually not great at, and I think that's really exciting within clinical settings.” She points, for example, to the predictive power of AI for early detection of domestic violence or endometriosis, both of which currently have “a long delay between the onset of a condition and a clinician recognizing that it's happening.” AI could also take on difficult personalization tasks, such as predicting which patients might do well on a specific antidepressant or chemotherapy treatment, incorporating social determinants of health like available resources and medical history into its recommendations, and generally improving patient outcomes. Furthermore, well-calibrated AI could radically reduce the raw volume of paperwork that doctors, nurses, and other healthcare personnel have to deal with, cutting down on a particularly burdensome aspect of modern medicine. With the looming problem of medical staff shortages and an aging population, it’s critical to get these tools right so they can be deployed fairly, accurately, and successfully.
GOING FORWARD: COLLABORATION, REGULATION, AND INTEGRATION
Unfortunately, AI models are not like iPhones, which can be tested in one market and then deployed broadly with confidence that they will work as demonstrated. For better or worse, this unevenness in performance is a problem across machine learning. As an example, Professor Ghassemi describes how computer vision models trained on data from the United States will label an Indian wedding as “performance art.” However, she points out, “the issue in health care is when something doesn't work on a new patient population, the result can be really disastrous.”
But even though AI deployment feels like a gold rush right now, Professor Ghassemi says there’s guidance to be found in examples like aerospace. In the industry’s early days, flying was incredibly dangerous, but now it’s safer to fly across the country than to drive to the grocery store. “This is because we now have multiple regulatory systems in the United States that were created by government funded regulators.” With that in mind, Professor Ghassemi says that when it comes to AI regulation, “I think this needs to be an extremely well-orchestrated effort… Many parts of the regulatory system need to be engaged.” This means that experts like her need to get “really comfortable” working with lawmakers to create guidance for those who develop and deploy models because, “we don’t have the rules yet, and it’s hard to play by the rules when there aren’t any.”
For companies considering AI solutions in healthcare—or more broadly—she says, “It's not a question anymore of whether we can predict something. The answer is almost always yes, we can. The question you should be asking is: Should we predict this? And if we do, what action will we take as a result?” With thorough research, collaborative effort, and deep consideration, Professor Ghassemi is working to help guide the industry on this question and bring forth the future of AI in healthcare.
Visit Professor Ghassemi’s website or CSAIL page to learn more about her work.