Written by Matthew Busekroos | Produced by Andrew Zucosky
Originally from Thessaloniki, Greece, CSAIL PhD student Markos Markakis now works in the lab’s Data Systems Group.
Prior to CSAIL, Markakis interned twice in management consulting, but realized that his ideal career would involve a more significant technical component. He decided to go deeper into the technical side of things by pursuing a PhD. Once he was set on a computer science path, MIT and CSAIL quickly emerged as his top choice thanks to their combination of world-renowned academics and a location in the Boston area. Having experienced Princeton University’s more local feel during his undergraduate studies, Markakis wanted to pursue graduate studies in a more urban environment while staying on the East Coast.
Markakis remains happy with his choice and said the Data Systems Group has been an amazing work family over the past few years.
“We have students working on very different aspects of data management, from tiered memory all the way to LLM-powered query planning, which means that the Q&A portion of internal presentations can yield all kinds of insights from different perspectives about each presented project,” Markakis said.
He added that the faculty members in the Data Systems Group similarly bring complementary but equally inspiring perspectives to the table.
“From my advisor, Professor Tim Kraska, I have learned to always remember that we are ultimately building real systems, so maximizing impact means ensuring our problem statements and assumptions rise to the current challenges in the field,” Markakis said. “I have also worked closely with Principal Research Scientist Michael Cafarella, whose enthusiasm for exploring novel ideas and commitment to fostering a supportive work environment has been nothing short of amazing.”
One project Markakis is currently working on is part of a larger effort in the lab called “BRAD,” advised by Professors Kraska and Samuel Madden.
Markakis said that as database workloads move to the cloud, providers keep expanding their cloud-based database engine offerings to cater to different kinds of workloads.
“This crowded landscape makes it very challenging for businesses to select and provision the right engines for their workloads, in order to achieve satisfactory performance at the minimum possible cost,” he said. “BRAD addresses this problem using machine learning to arrive at the optimal configuration based on each customer's workloads.”
Within BRAD, Markakis focuses specifically on meeting latency targets efficiently. The cloud execution model has several sources of performance unpredictability (e.g., different hardware types, virtualization), even before considering the uncertainty in the customer workloads themselves. In this environment, ensuring that each query meets its latency target requires carefully crafted models and policies.
Markakis is also involved in LOGos, a project advised by Cafarella.
“Everyone that has ever programmed has tried ‘debugging by print statement,’ whereby the presence/sequence of printed messages helps locate a bug,” he said. “Large systems implement a proactive version of this approach in the form of logging – appending timestamped records of important events to a file. However, this quickly leads to large volumes of logs, which can be hard to analyze when a problem must be diagnosed. Worse yet, naively finding correlations in logged values can be misleading if detached from system mechanics.”
Markakis said that in LOGos, the team is developing a human-in-the-loop pipeline for applying causal inference to log data, enabling users to reach principled conclusions about the system faster.
Markakis said he hopes his research will help streamline two rather unappealing aspects of today’s computing paradigm: configuration and debugging.
“This can let people focus on implementing the ‘interesting’ part of their idea faster, rather than being bogged down by determining how big of a cloud instance to provision or why their program, which worked fine yesterday, has now crashed,” he said.
Markakis said advances in hardware capabilities and machine learning architectures have convinced most enterprises that collecting more data is always worth it, a reality that system builders must adapt to.
“The same ‘rise of data’ is true in our personal lives – each day we may take and exchange dozens of photos, exchange hundreds of messages and log various aspects of our lives: our meals, our workouts, our sleep and so forth,” he said. “Our data systems require continuous innovation to deliver good performance at previously unthinkable scales and ingestion rates.”
Markakis is excited that his research helps make this possible, ultimately turning all this data collection into something useful.
After graduating, Markakis plans to look for a role in industry.
“Data systems are meant to facilitate common interactions with real data,” he said. “Ensuring this within academia is challenging, since data and workloads from commercial applications are usually inaccessible.”
He said his dream job is one where he not only uses his technical skills to engineer effective solutions but also takes part in the human- and business-oriented aspects of a project.
“Ultimately, designing a system is only useful if you get people to use it – and that’s rarely achieved by just offering good performance,” he said. “You also have to understand users’ priorities, needs and constraints, and work with them to deliver a solution that fits their unique needs.”
More information about Markos Markakis is available on his website: https://people.csail.mit.edu/markakis/