Written by: Matthew Busekroos | Produced by: Nate Caldwell

Born in Toronto, Canada, Arash Nasr-Esfahany was raised in Isfahan, Iran. He studied Electrical Engineering at Sharif University of Technology before coming to MIT. Nasr-Esfahany said that after four years at MIT and CSAIL, he is happy to continue his education here. He said everyone he’s connected with is deeply interested in what they study, knowledgeable, smart, collaborative, and open to sharing their ideas.

Nasr-Esfahany is currently a PhD candidate working alongside Professor Mohammad Alizadeh. The research done in their group spans a broad range of scientific topics and real-world applications, including video conferencing, causal modeling, formal verification, cloud gaming, reinforcement learning, and blockchains.

“Interacting with my labmates has significantly increased the breadth of my knowledge,” Nasr-Esfahany said. “Furthermore, working on different areas gives us diverse perspectives. This helps foster deep technical and intellectually rich discussions about our projects that improve their quality.”

Nasr-Esfahany said the most important thing he learned from Professor Alizadeh is how to do research, including how to move from the high-level reasoning needed for decision-making to the low-level details needed to make a project really work.

“Another thing that I always strive to learn from [Alizadeh] is the ability to explain complex things in simple terms so that others understand it, get excited about it, and start contributing with their own ideas to it,” he said.

Nasr-Esfahany works on simulation and modeling of computer systems and networks. He said classical simulators model all the interactions and details of a system, which makes them accurate for what they model but also very slow for complex real-world systems with many details and components. For example, simulating a datacenter or a processor is three to four orders of magnitude slower than real time.

“If you want to experiment with only a specific component of a large-scale system though, as opposed to ‘full system simulation,’ there exists a method called ‘trace-driven simulation,’ which is much faster,” he said. “As a result, it has been widely used for a long time for simulating and evaluating new ideas especially in computer systems. However, it has a big underlying assumption. It assumes that changes that you experiment with in the component that you’re interested in would not affect the behavior of the other components of the system, and they’ll keep behaving exactly the same way they were before you did the change. In other words, it treats the collected trace as an exogenous property of the rest of the system. This key assumption makes this type of simulation much faster, but also biased because the assumption is often violated in real systems.”
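The bias described above can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration, not something taken from CausalSim or Puffer: the hypothetical `observed_throughput` model, the `RAMP_MBIT` constant, the chunk sizes, and the capacity values. The idea is that the throughput recorded in a trace depends partly on the chunk size the original algorithm chose, so replaying that trace under a different chunk size gives a different answer than the real system would.

```python
# Toy illustration of trace-driven simulation bias (all names and numbers are hypothetical).

RAMP_MBIT = 2.0  # assumed ramp-up cost: small chunks never reach full link capacity


def observed_throughput(capacity_mbps, chunk_mbit):
    """Achieved throughput depends on the latent capacity AND on the chunk size chosen."""
    return capacity_mbps * chunk_mbit / (chunk_mbit + RAMP_MBIT)


def download_time(chunk_mbit, throughput_mbps):
    """Seconds needed to download one chunk at the given throughput."""
    return chunk_mbit / throughput_mbps


# Latent link capacity per chunk; in practice this is never recorded in the trace.
capacities = [4.0, 8.0, 2.0, 6.0]

SMALL_CHUNK = 2.0  # chunk size used by the policy that produced the trace
LARGE_CHUNK = 8.0  # chunk size used by the new policy we want to evaluate

# Step 1: collect a trace while running the small-chunk policy.
trace = [observed_throughput(c, SMALL_CHUNK) for c in capacities]

# Step 2: trace-driven simulation replays the trace as if it were exogenous,
# i.e. as if throughput would have been the same under the new policy.
replayed_total = sum(download_time(LARGE_CHUNK, t) for t in trace)

# Step 3: ground truth, where throughput reacts to the new chunk size.
actual_total = sum(
    download_time(LARGE_CHUNK, observed_throughput(c, LARGE_CHUNK))
    for c in capacities
)

print(f"replayed download time: {replayed_total:.2f} s")
print(f"actual download time:   {actual_total:.2f} s")
```

In this toy setup the replayed estimate is pessimistic about the large-chunk policy, because the trace was recorded while small chunks were holding throughput down. Removing that kind of dependence between the trace and the policy that produced it is the problem the causal approach described in this article targets; the sketch only shows the bias and makes no attempt to correct it.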

In a recent project, CausalSim (winner of the best paper award at NSDI ’23), Nasr-Esfahany and his colleagues characterized this source of bias in trace-driven simulation and formulated it as a causal problem. They then developed a causal machine learning method that relaxes the assumption and removes the bias, leading to accurate simulations.

“We did an experiment with video streaming: We took two algorithms A and B and simulated them on some real-world network traces collected from real network sessions of users watching video,” he said. “Vanilla trace-driven simulation tells us that A is better than B and does much less rebuffering, while CausalSim tells us that B is better than A. We deployed both of them in an open-source video streaming system developed at Stanford (Puffer), and 9 months of data collection and measurement confirmed CausalSim’s conclusion, that B rebuffers much less than A. I think everyone in the community knew that vanilla trace-driven simulation is biased, but no-one knew how large the impact of this bias could be, which can lead to wrong conclusions. Using CausalSim, we can do accurate trace-driven simulation.”

In follow-up work, Nasr-Esfahany said he developed new methods that make the solution work with fewer restrictions on the data collection process. Even so, he added, trace-driven simulation is not always an option.

“In some cases, we are interested in experimenting with algorithms and their effect on all components of a system,” he said. “In these cases, we do full-system simulation which is slow. It’s slow because it models the system and all the interactions of different components at a very granular way, the way that we understand these systems and think they work. However, when we look at outcomes of these simulators, we’re often not interested in detailed events, but we care about aggregate overall statistics such as efficiency, latency, and quality of experience of user populations. I think low-level simulation is just an artifact of the way we think about these systems, makes it computationally intensive and slow.”

Currently, Nasr-Esfahany is working on using AI for learning high-level simulators of real-world large-scale systems. He said this will come with two main benefits. First, because it operates at higher levels of abstraction, it is much faster than traditional simulators that operate at lower levels. And second, it can learn to model the behavior of the system from real data collected from the system.

“Traditional full-system simulators have many parameters and configurations that one needs to specify, and it’s usually very hard to set them correctly,” he said. “As a result, even if we could magically run them very fast, there is always some simulation to reality gap. Learning models of systems from real data helps remove this gap as well.”

Nasr-Esfahany said his research will help develop fast and accurate simulators and models of computer systems including video conferencing, video streaming, cloud computing, and data centers.

“Using data-driven simulators, we can design better algorithms to manage and control complex systems,” he said. “This will result in higher efficiency in our systems and better user experience for everyone.”

Nasr-Esfahany said that after completing his PhD, his dream job would be any role where he can continue conducting research and collaborating with amazing people to make impossible things possible.

For more information on Arash Nasr-Esfahany, check out his webpage and project websites: causalsim.csail.mit.edu and bgm.csail.mit.edu.