Three new frameworks from MIT CSAIL reveal how natural language can provide important context for language models that perform coding, AI planning, and robotics tasks (Credit: Alex Shipps/MIT CSAIL, with components from the researchers and Pixabay).

Large language models (LLMs) are becoming increasingly useful for programming and robotics tasks, but for more complicated reasoning problems, the gap between these systems and humans looms large. Without the ability to learn new concepts like humans do, these systems fail to form good abstractions — essentially, high-level representations of complex concepts that skip less-important details — and thus sputter when asked to do more sophisticated tasks.

A team of MIT researchers found that highly memorable images elicit stronger, more sustained responses in ventro-occipital brain cortices, peaking at around 300 ms. Conceptually similar but easily forgettable images quickly fade away (Credits: Alex Shipps/MIT CSAIL).

For nearly a decade, a team of MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers has been seeking to uncover why certain images persist in people's minds while many others fade. To do this, they set out to map the spatio-temporal brain dynamics involved in recognizing a visual image. Now, for the first time, scientists have harnessed the combined strengths of magnetoencephalography (MEG), which captures the timing of brain activity, and functional magnetic resonance imaging (fMRI), which identifies active brain regions, to precisely determine when and where the brain processes a memorable image.

With their DMD method, MIT researchers created a one-step AI image generator that achieves image quality comparable to Stable Diffusion v1.5 while being 30 times faster (Credits: Illustration by Alex Shipps/MIT CSAIL using six AI-generated images developed by researchers).

In our current age of artificial intelligence, computers can generate their own “art” by way of diffusion models, iteratively adding structure to a noisy initial state until a clear image or video emerges. Diffusion models have suddenly grabbed a seat at everyone’s table: Enter a few words and experience instantaneous, dopamine-spiking dreamscapes at the intersection of reality and fantasy. Behind the scenes, however, image generation is a complex, time-intensive process requiring numerous iterations for the algorithm to perfect the image.
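The iterative sampling loop described above can be sketched in miniature. This is a toy illustration, not the researchers' DMD method or any real diffusion model: the hypothetical `predict_clean` function stands in for a trained neural denoiser, and the blending schedule is simplified for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a learned denoiser. In a real diffusion model this would
# be a neural network trained to predict the clean image from a noisy
# one; here it simply returns a fixed 8x8 target pattern.
TARGET = np.full((8, 8), 0.5)

def predict_clean(x_noisy):
    return TARGET

def sample(steps=50):
    """Iteratively refine pure noise toward the model's prediction."""
    x = rng.standard_normal((8, 8))   # start from a noisy initial state
    for t in range(steps):
        x0_hat = predict_clean(x)     # model's current guess at the clean image
        w = (t + 1) / steps           # trust the prediction more each step
        x = (1 - w) * x + w * x0_hat  # add structure, remove noise
    return x

result = sample()
```

A one-step generator of the kind the DMD work describes would, in this sketch, amount to calling `predict_clean` once on the initial noise instead of looping, trading the many refinement iterations for a single forward pass.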

FeatUp is an algorithm that upgrades the resolution of deep networks for improved performance in computer vision tasks such as object recognition, scene parsing, and depth measurement (Credits: Mark Hamilton and Alex Shipps/MIT CSAIL, top image via Unsplash).

Imagine yourself glancing at a busy street for a few moments, then trying to sketch the scene you saw from memory. Most people could draw the rough positions of the major objects like cars, people, and crosswalks, but almost no one can draw every detail with pixel-perfect accuracy. The same is true for most modern computer vision algorithms: They are fantastic at capturing high-level details of a scene, but they lose fine-grained details as they process information.
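The resolution loss described above can be made concrete with a small sketch. This is not FeatUp's algorithm; it is an illustrative toy showing how the repeated striding/pooling common in deep vision backbones shrinks spatial detail, using plain average pooling as a stand-in.

```python
import numpy as np

def downsample2x(feat):
    """2x2 average pooling: each output pixel summarizes a 2x2 input patch,
    discarding the fine spatial detail within it."""
    h, w = feat.shape
    return feat[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# A 224x224 "image" (the input size many vision backbones expect).
img = np.arange(224 * 224, dtype=float).reshape(224, 224)

feat = img
for _ in range(5):  # ResNet-style backbones downsample roughly 5 times
    feat = downsample2x(feat)

print(img.shape, "->", feat.shape)  # (224, 224) -> (7, 7)
```

After five halvings, a 224x224 input is summarized by a 7x7 grid of features: each remaining value covers a 32x32 pixel region, which is why such networks capture the rough scene layout but cannot localize pixel-level detail.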