DeepSeek: PI Perspectives

Not sure what to think about DeepSeek R1, the most recent large language model (LLM) making waves in the global tech community? Faculty from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are here to help!

Hear from three esteemed MIT CSAIL researchers on what this development means, how the scientific community is responding, and what they think the future holds for academia and the business world.

Read our DeepSeek Fact Sheet, which pulls the most important information together on one quick page.

MIT CSAIL Associate Professor Phillip John Isola

  • "The basic recipe for DeepSeek is very similar to all the other frontier language models: it's an autoregressive transformer trained on a bunch of text from the internet, then further trained with reinforcement learning to do reasoning via chain of thought.
  • The thing that sets DeepSeek apart is that it is a really good implementation of this recipe. They cut out unnecessary bits and pared it down to a system that requires a lot less compute than previous systems used.
  • The interesting thing about DeepSeek from an algorithmic perspective is how simple the reinforcement learning part is. It omits a lot of the components that have been introduced in recent years. For example, it doesn't do any explicit test-time search, there is no process supervision, the reward function looks very simple, there's no learned critic function, etc. Making it really simple might have helped them make it efficient.
  • It's not clear to me why DeepSeek made tech stocks go down, rather than up. DeepSeek makes the value of an Nvidia chip higher: one chip can do more than we thought it could. People should be willing to pay more today for a chip than they were willing to pay yesterday, because the chips provide more utility per unit. Why did the stocks crash then? It could be because people assume that demand for AI is satiable, where once you have a smart enough AI you will be satisfied and there won't be demand for even smarter AIs. I think this is probably wrong. The long-term effect could even be to increase total chip use, and energy use, if the demand increase outweighs the efficiency gain (the Jevons paradox).
  • The other big news about DeepSeek is that it's from a Chinese company, and seems pretty homegrown in China. This is one of the first really splashy cases where a Chinese company seems to be on par with, or maybe ahead of, US companies in AI.
  • Also, DeepSeek is open source, open-weight, and open science (they are releasing papers).
  • The net effect should be to significantly increase the pace of AI development, since the secrets are being let out and the models are now cheaper and easier to train by more people.
  • I'm starting to see a narrative about DeepSeek potentially leading to future AI systems that use less energy, are smaller, are cheaper, etc. I think that's unlikely. Instead I think it's more likely that we will just get more powerful AI systems, which still use as much energy as possible, are as big as possible, etc."
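Isola's point that DeepSeek's reinforcement-learning stage uses a very simple reward and no learned critic can be made concrete with a small sketch. The Python below is illustrative only and rests on stated assumptions: the reward is a pair of hand-written rules (a format check for `<think>...</think>` tags and an exact-match check on a `\boxed{...}` answer), and the critic is replaced by a group-relative baseline, in the spirit of the group-relative policy optimization (GRPO) approach described in DeepSeek's published reports. The function names, tag conventions, and reward weights are hypothetical, not DeepSeek's actual code.

```python
import re
import statistics


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical rule-based reward: plain string checks, no learned reward model.

    Assumed weights: +0.5 for the expected format, +1.0 for a correct answer.
    """
    reward = 0.0
    # Format check (assumed convention): reasoning wrapped in <think>...</think>.
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 0.5
    # Accuracy check (assumed convention): final answer written as \boxed{...}.
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0
    return reward


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Critic-free baseline: score each sample against the other samples
    drawn for the same prompt, instead of asking a learned value network."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # Every sample scored the same, so there is no signal to learn from.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    # Four hypothetical completions sampled for the prompt "What is 2 + 2?"
    completions = [
        "<think>2 + 2 is 4</think> \\boxed{4}",  # right format, right answer
        "<think>a guess</think> \\boxed{5}",     # right format, wrong answer
        "\\boxed{4}",                            # right answer, no reasoning tags
        "no answer",                             # neither
    ]
    rewards = [rule_based_reward(c, "4") for c in completions]
    print(rewards)
    print([round(a, 3) for a in group_relative_advantages(rewards)])
```

The design choice this illustrates is the one Isola highlights: with no critic network to train, the only learned component is the policy itself, and above-average samples in each group are simply pushed up while below-average ones are pushed down.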

MIT CSAIL Assistant Professor Yoon Kim

"DeepSeek represents a significant advance in bringing down the costs of building performant LLMs. While the individual technical advances behind DeepSeek are not significantly novel in and of themselves, taken together they represent substantial progress.

The DeepSeek team should also be commended for providing illuminating details about their models through a series of technical reports, as well as making their models publicly available under a permissive open-source license. This makes it possible for businesses to use these models as part of their AI stack, and researchers to study these models in depth."


MIT CSAIL Senior Research Scientist Jim Glass

"It sounds like [the DeepSeek team] has come up with a more efficient training method for their foundation model, which is great. Because they have open-sourced the model and published the technical details, it should be beneficial to everyone."

DeepSeek Fact Sheet: What you need to know