Ask CSAIL: Who’s Using What Model

A snapshot of how MIT researchers are working with AI


Audrey Woods, MIT CSAIL Alliances | April 7, 2026

AI models are proliferating fast. There’s Claude, ChatGPT, Gemini, Copilot, DeepSeek, Grok, Mistral, Llama, and many more emerging every day. But which ones to work with? And why? We asked MIT CSAIL faculty and students which AI tools they’re reaching for right now. The responses showed a variety of preferences, a clear winner in one area, and a word of caution about what goes into any public model’s memory.


Codex & Claude Code: The Current Standard

The most consistent finding was that Anthropic’s Claude Code and OpenAI’s Codex are—at least for now—the default coding assistants for students and faculty writing prototype-level code. Professor Srini Devadas says, “My students are using Claude Code. From what I hear, it is substantially better than the alternatives.” Assistant Professor Andreea Bobu agreed, calling Claude Code and Codex “amazing for coding efficiency.” Principal Research Scientist Michael Cafarella uses Claude Code for building prototypes and demos, but added, “I haven’t successfully used them for code I really care about. It’s mostly disposable stuff so far.”


Teaching, Writing, and Other Applications

For non-coding work, there’s a spread of preferences and workflows. Professor Devadas uses Microsoft Copilot, which runs on GPT-5.1, primarily as a teaching and brainstorming tool. He says, “I’m impressed with how much it can help. It does make things more efficient.” Professor Bobu has seen her students use Gemini successfully, and Dr. Cafarella uses Gemini for proofreading documents and for hypothesis generation.

Professor Peter Szolovits shared two projects he’s built by “vibe coding” with ChatGPT and, occasionally, Gemini. First, he built web pages that display a list of published papers and theses from his research group. He said, “ChatGPT did a fine job, though using it was frustrating. For example, it would generate JavaScript code that was syntactically incorrect or would issue calls to nonexistent functions. I could ask to fix this, which it did, but then making other changes to my specifications would revert to the incorrect code over and over again. Even if I included instructions to the prompt to ask it to double-check for such errors. Nevertheless, I found it quite useful.”

The second project, which is currently in the works, is to create an Apple Mail extension that examines incoming email and does a “binary classification to determine whether a message contains a request for me to submit an article to some workshop, conference or journal, to serve as editor for some issue or on the program committee of a meeting, etc. These messages should be moved to a special ‘Calls’ mailbox so the hundreds I receive each week don’t clutter up my inbox.” Importantly, Professor Szolovits doesn’t want to have to share his incoming email with an external vendor. “I have tried building such a tool using both ChatGPT and Gemini as built in to Apple’s Xcode development environment, but neither has come close to working… It’s been suggested that I try Claude, whose reputation for coding assistance is better, but have not had a chance to do that.”


Privacy & Security Concerns

Exciting as these models are, there’s still reason for caution. Professor Devadas says, “I would not upload any sort of private/sensitive information into LLMs. I am amazed at how much memory they have of interactions, and they seem to correlate things across chats over a period of time. Perhaps this is entirely within the environment of a single user, but I don’t know that for a fact. And they claim to be learning from these interactions and ask for feedback, so this means that the model is changing, and storing some form of the interaction (queries, responses) internally.” For his public class notes or the open code repositories his students are using and building, this isn’t a huge concern, but industries dealing with private or sensitive data might think twice about what they upload to an LLM.

Professor Szolovits also highlighted a paper published by a former MD/PhD student which “showed that for life-critical domains such as healthcare, LLMs make a lot of serious errors.”


A Wide-Open Field

None of this is prescriptive. In a field evolving so quickly, these perspectives are likely to change in a matter of months, if not weeks, as new tools enter the scene. What holds steady is the spirit of CSAIL, where researchers are constantly pushing the boundary of what’s possible with today’s models and building the foundation of what comes next.

Want to be a part of that process? Get in touch with CSAIL Alliances to learn more.