Michael Cafarella

AI Strategy and Database Management Systems with Principal Research Scientist Michael Cafarella

Audrey Woods | MIT CSAIL Alliances

As a young computer scientist, MIT CSAIL Principal Research Scientist Michael Cafarella wanted to make science fiction a reality. Inspired by the idea of creating “something that was barely believable,” he started his undergraduate degree in robotics. However, he quickly realized the timescale for robotics was longer than he wanted. “It was basically too tough to build a science fiction robot,” Dr. Cafarella jokes, so he pivoted to a subfield of computer science which felt both thrilling and practical to build in the short term: the intersection of databases and artificial intelligence.

By the time he started as a graduate student at the University of Washington in the early 2000s, companies were looking to tackle big data problems, with new tools like Google web search gaining popularity. The exponential growth of dataset sizes made it increasingly important to create ways to manage datasets and extract useful information from them. “What I don't think was obvious to everyone, but maybe has become clear in retrospect, is that the extreme large-scale processing was really a prerequisite to making a lot of progress in building the models that today exploit incredibly large datasets.” In other words, Dr. Cafarella argues that previous work on database management served as a necessary precursor to our current and exciting era of widespread AI.

Now, working in the Data Systems Group at CSAIL, Dr. Cafarella is excited to be seeing, using, and building the kinds of tools and systems he never thought he’d witness in his lifetime.

PALIMPZEST: A SYSTEM FOR OPTIMIZING AI MODEL BUILDING
Anyone working in the world of AI understands how dizzying the rate of change has been over the last few years. “Unless you are a serious AI fanatic, it is actually pretty tough to keep track of practical developments in the field, much less all the intellectual developments.” This, Dr. Cafarella explains, makes it challenging for software designers to make useful tools. “If you are a practicing engineer and you're trying to build some system, in principle you should really be running a huge fleet of experiments every night to figure out exactly which piece of technology or package to draw upon in order to build the program that you're trying to build,” which is clearly impractical.

To address this problem, Dr. Cafarella and his team drew inspiration from the 1970s and 1980s, when a similar exponential evolution was happening in computer hardware. At that time, “the capacities of one machine versus another were dramatically different, and yet people wanted to do things that were consistent across machines and hopefully got better as machines got better.” The computer scientists of that time came up with the idea of relational database systems. These allowed computers to assess their available resources and figure out the fastest way to answer a given user question without requiring a specific application or program to do so.

In the same vein, Dr. Cafarella and his collaborators, including MIT students, MIT faculty, and people from the Universities of Arizona and Chicago, are “trying to build that kind of language abstraction and data system which can take a general high level description of what you're trying to build and then make all the lower level decisions on your behalf.” This system, called Palimpzest, helps the user choose which model, inference method, interface hardware, prompt design, etc. is best for a given project and adjusts based on the available technology and the parameters of the user. For example, if someone wants to prioritize cost savings over speed, the system will offer a different set of recommendations than it would to a user who wants the maximum speed no matter the price. Palimpzest draws on public service platforms to stay informed on the available models and technology and automatically compares the different options to help users make the best set of decisions for their given application.
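To make the cost-versus-speed trade-off concrete, here is a minimal Python sketch of picking a plan under a user priority. It is not Palimpzest's actual interface: the `Plan` class, the `choose_plan` function, the candidate names, and every number are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """One hypothetical way to run a task (a model plus its settings)."""
    name: str
    cost_usd: float   # estimated cost per run (illustrative)
    latency_s: float  # estimated time per run (illustrative)
    quality: float    # estimated output quality in [0, 1] (illustrative)


def choose_plan(plans, priority, min_quality=0.0):
    """Return the best plan for a user priority of "cost" or "speed"."""
    # Drop plans that fall below the quality floor the user will accept.
    eligible = [p for p in plans if p.quality >= min_quality]
    if not eligible:
        raise ValueError("no plan meets the quality floor")
    if priority == "cost":
        # Cheapest first; break ties by speed.
        return min(eligible, key=lambda p: (p.cost_usd, p.latency_s))
    if priority == "speed":
        # Fastest first; break ties by cost.
        return min(eligible, key=lambda p: (p.latency_s, p.cost_usd))
    raise ValueError(f"unknown priority: {priority!r}")


# Made-up candidates, not measurements of any real model or service.
CANDIDATES = [
    Plan("small-model", cost_usd=0.10, latency_s=4.0, quality=0.80),
    Plan("large-model", cost_usd=1.50, latency_s=1.0, quality=0.92),
    Plan("cached-answers", cost_usd=0.01, latency_s=0.5, quality=0.40),
]

cheapest = choose_plan(CANDIDATES, "cost", min_quality=0.75)
fastest = choose_plan(CANDIDATES, "speed", min_quality=0.75)
```

With the same candidates and the same quality floor, the two priorities select different plans, which is the behavior the article describes. A real system would also have to estimate cost, latency, and quality rather than take them as given.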

ECONOMICS: USING AI FOR ACCURATE INFLATION ADJUSTMENT
Beyond Palimpzest, Dr. Cafarella is also working with economists at the Universities of Michigan and Maryland to develop models for more accurate price adjustment in inflation statistics. Calculating inflation is not always a straightforward practice considering how quickly goods change. Dr. Cafarella uses the example of a plain cotton shirt, which might go up in price but also down in quality, perhaps from 100% cotton to a blend containing 5% polyester. “What [economists] would really like to answer is: what would I pay this year for the 100% cotton shirt if it were still on the shelf? Or, alternatively, if this polyester shirt had been sold last year, what would I have paid for it?”

Using the extensive data available on historical product descriptions and prices, Dr. Cafarella and his colleagues are working to build a series of models that, when given a product description, can accurately predict the price at a particular moment in time. This will capture “quantifiable human preferences” which can then be translated into adjustments on quality that allow economists to produce clearer inflation numbers.
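The shirt question can be written as a small calculation. The Python sketch below assumes a price predictor is already available; `toy_predict_price` is a hypothetical stand-in backed by a lookup table, not the team's model, and all prices are invented for illustration.

```python
def price_change(old_price, new_price):
    """Fractional price change between two periods."""
    return new_price / old_price - 1.0


def quality_adjusted_change(predict_price, description, old_period, new_period):
    """Price change for one fixed product description across two periods."""
    old = predict_price(description, old_period)
    new = predict_price(description, new_period)
    return price_change(old, new)


# Hypothetical predictions standing in for a learned model (made-up numbers).
TOY_PRICES = {
    ("plain shirt, 100% cotton", "last_year"): 20.00,
    ("plain shirt, 100% cotton", "this_year"): 22.00,  # counterfactual price
    ("plain shirt, cotton-polyester blend", "this_year"): 21.00,
}


def toy_predict_price(description, period):
    """Look up an illustrative price for a description and period."""
    return TOY_PRICES[(description, period)]


# Naive comparison: last year's cotton shirt vs. this year's blend shirt.
naive = price_change(
    toy_predict_price("plain shirt, 100% cotton", "last_year"),
    toy_predict_price("plain shirt, cotton-polyester blend", "this_year"),
)

# Quality-adjusted comparison: the same cotton shirt in both periods.
adjusted = quality_adjusted_change(
    toy_predict_price, "plain shirt, 100% cotton", "last_year", "this_year"
)
```

In this toy setup the naive comparison understates the price increase because the shirt on the shelf got worse; holding the description fixed removes that effect.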

While their models have performed well when compared to professional statisticians, offering efficiency in economic analysis, one area where the models have surprised them is food. These models have shown that food quality has gone up significantly, mitigating some degree of price inflation. Such results have offered economists an improved understanding of the US economy and, Dr. Cafarella feels, serve as “an example of how, by combining data and AI methods, they really help you understand the universe better.”

LOOKING AHEAD: UNCERTAINTY, OPTIMISM, AND CREATIVITY
Dr. Cafarella acknowledges the wide gamut of reactions associated with the rise of AI technology. “I know there’s a lot of excitement from some people but also fear in others who are worried about bad social outcomes from the widespread deployment of these things.” In response, he urges people to keep in mind that “there’s an incredibly large underappreciated amount of uncertainty about how [AI models] will be useful.” While the demos and announcements of new models are fun for scientists like Dr. Cafarella, it’s not immediately apparent how these tools can be deployed to create real value. Furthermore, as with other groundbreaking innovations like the World Wide Web, it might take society a long time to figure out how best to apply these advancements. Dr. Cafarella offers this advice: “I would ask people to be both modest and creative in figuring out how to use these things in their own lives and in their own organizations. If you’re totally happy or totally depressed [about AI technology], you’re probably wrong.”

Dr. Cafarella himself has found “modern models to be incredibly valuable for computer programming,” changing the way he writes software and automating the less appealing parts of the process. “I’m more productive. I can write more stuff. I can do more interesting things.” More broadly, he finds it “thrilling” to be at a place where society is talking about things like Artificial General Intelligence and regularly using “talking computers.” Over the course of his career, Dr. Cafarella has watched the science of database management evolve to support the rapid evolution and expansion of AI technology. Now, with science fiction technology truly becoming real, he’s excited to see what comes next.

Learn more about Dr. Cafarella on his website or MIT CSAIL page.
