WRITTEN BY: Thomas Forrister
With hybrid workspaces and decentralized work on the rise, many companies are looking for new ways to help distributed teams work together efficiently.
While new collaboration tools have succeeded in certain areas, such as collaborative design or whiteboarding apps, current data analytics tools are still designed for a single user and at most allow that user to share final results. But what if there were a data environment in which people could work together, even when they are not in the same physical meeting room, much as they do on a shared whiteboard?
This is one of the questions that led Professor Tim Kraska of MIT CSAIL and his team to develop the Northstar project. Northstar combines a novel user-interaction concept called Visual Data Computing with a new data-processing engine that delivers interactive response times, keeping the user immersed in the data analytics process. "Northstar helps domain experts and data scientists to actually work together during a meeting, either remote or in person, which significantly shortens the time to insight," says Professor Kraska.
Furthermore, to make data science more accessible, Prof. Kraska's team developed a set of machine-learning-based assistants that guide the user through data exploration and model building and help her avoid common pitfalls. "We wanted to create an interactive environment that people can call to collaborate and work together, and then it evolved over time to this interactive experience we wanted to create in particular for the citizen data scientist," Prof. Kraska explains.
After some trial and error, he and his team started deploying the software to different companies and received positive feedback and specific feature requests, which made them consider forming a company around it. Northstar is currently being commercialized by einblick.ai (an MIT and Brown University spin-off), backed by venture capital and Prof. Kraska’s ML for Systems work.
More broadly, Prof. Kraska's work on making data accessible to everyone falls into two categories: building systems for machine learning, and leveraging machine learning to improve the efficiency of data management systems.
While Northstar falls into the first category, he recently started a new project called SageDB for the latter. "In the second category for applying machine learning for systems, we are currently looking into instance-optimized systems: how we can create systems that self-adjust automatically to the data and the workload."
Right now, he and his fellow researchers are looking further into how this can apply to industry. Traditional systems are general-purpose: the same data warehouse is built to serve a whole range of use cases, for every kind of retailer and manufacturer. While this saves the time and resources of developing a customized system from scratch, it also sacrifices performance.
“The question we’re asking ourselves is how can we leverage machine learning to build something, which self-adjusts based on the workload as well as the data,” says Prof. Kraska.
He explains that this could lead to much more efficient system designs. "Machine learning plays an important role, because it gives us a tool to navigate this large subspace of potential configurations and options. In other cases, the bet goes further than that, because sometimes it's possible to replace traditional components of a system entirely through a model."
For example, he and his team have shown that machine-learning models can replace traditional B-tree indexes or enhance sorting algorithms. In another line of work, they showed that machine-learning models can replace the query optimizer in traditional system designs.
Many of these research projects, including Northstar, spark industry partnerships with the help of MIT CSAIL Alliances.
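To make the learned-index idea more concrete, here is a minimal sketch; it is not the team's implementation. As a simplifying assumption, a single least-squares line predicts where a key sits in a sorted array, and the model's worst-case error over the stored keys bounds a binary search that corrects the guess. The class name `LinearLearnedIndex` and its methods are hypothetical.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index (hypothetical): a single linear model predicts a key's
    position in a sorted array; a bounded binary search corrects the guess."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("this sketch assumes a non-empty key set")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the stored keys bounds every search.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Model output, rounded and clamped to a valid array position.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Search only the window the error bound guarantees, not the whole array.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


idx = LinearLearnedIndex([3, 7, 10, 15, 22, 40, 41, 90])
print(idx.lookup(22), idx.lookup(8))
```

The error bound is what makes the model safe to use: a B-tree always finds the right position, and the bounded search preserves that guarantee even when the model's guess is off. The published learned-index work uses hierarchies of models rather than one line, so this sketch only conveys the basic idea.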
“With the Northstar project and others, the CSAIL Alliances program has been great. Many of our initial customers came through CSAIL Alliances,” says Prof. Kraska. “We’re always looking for partners in industry to try out our software…and everybody benefits from it. The industry partners get direct access to new research results, and we can find out if it actually works in practice. We learn a ton by seeing how people actually use it and what the real problems are.”