Chatbots like ChatGPT and Claude have experienced a meteoric rise in usage over the past three years because they can help you with a wide range of tasks. Whether you're writing Shakespearean sonnets, debugging code, or looking for the answer to an obscure trivia question, artificial intelligence (AI) systems seem to have you covered. The source of this versatility? Billions or even trillions of textual data points across the Internet.
In the months leading up to the 2024 U.S. presidential election, a team of researchers at MIT CSAIL, MIT Sloan, and MIT LIDS set out to answer a question no one had fully explored: how do large language models (LLMs) respond to questions about the election? Over four months, from July through November, the team queried 12 state-of-the-art models nearly every day with more than 12,000 carefully constructed prompts, producing a dataset of over 16 million LLM responses.
Annotating regions of interest in medical images, a process known as segmentation, is often one of the first steps clinical researchers take when running a new study involving biomedical images.
The artificial intelligence models that turn text into images are also useful for generating new materials. Over the last few years, generative materials models from companies like Google, Microsoft, and Meta have drawn on their training data to help researchers design tens of millions of new materials.