New MIT CSAIL Open-Source Project Automates Experimental Design Optimization

Evaluation is an important part of experimental design: it drives improvements to the experiments themselves and leads to more accurate and reliable results. But whether designing a new molecule, a new chair, or a new financial tool, experiments can be expensive to evaluate in terms of time, money, and performance. When balancing tradeoffs between conflicting objectives, which are often expensive black-box functions, how can an experiment designer accelerate the discovery of optimal solutions?

Schmidt Science Postdoctoral Fellow Mina Konaković Luković and PhD student Yunsheng Tian of the Computational Design and Fabrication Group at MIT CSAIL, advised by Professor Wojciech Matusik, explored multi-objective Bayesian optimization as a possible method. Through their research, they introduced a new algorithm and a user-friendly platform that finds optimal designs for multiple user objectives simultaneously within a limited number of evaluations.
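For readers unfamiliar with the technique, the sketch below illustrates the general shape of a multi-objective Bayesian optimization loop: a Gaussian-process surrogate is fit to each expensive objective, and an acquisition score chooses the next design to evaluate. The toy objectives and the simple lower-confidence-bound acquisition are assumptions for illustration only; this is not the researchers’ algorithm or AutoOED’s API.

```python
# Minimal multi-objective Bayesian optimization loop (illustrative only).
# The toy objectives, random candidate pool, and mean-minus-std acquisition
# are assumptions for demonstration, not the researchers' method.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objectives(x):
    # Two conflicting goals of one design variable, both to be minimized.
    return np.array([x[0] ** 2, (x[0] - 2.0) ** 2])

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(5, 1))        # initial designs
Y = np.array([objectives(x) for x in X])      # expensive evaluations

for _ in range(10):                           # small evaluation budget
    # Fit one Gaussian-process surrogate per objective.
    models = [GaussianProcessRegressor(normalize_y=True).fit(X, Y[:, i])
              for i in range(Y.shape[1])]
    # Score random candidates optimistically (lower confidence bound).
    cand = rng.uniform(0.0, 2.0, size=(256, 1))
    score = np.zeros(len(cand))
    for m in models:
        mean, std = m.predict(cand, return_std=True)
        score += mean - std
    x_next = cand[np.argmin(score)]           # most promising candidate
    X = np.vstack([X, x_next])
    Y = np.vstack([Y, objectives(x_next)])
```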

Their project, called the Automated Optimal Experiment Design Platform (AutoOED), is an open-source platform for automatically guiding experiment design, even for users who aren’t experts in optimization or computer science.

“We tried to make the platform as general as possible, so that a lot of users find it useful,” said Luković. “We found out that there are many applications for this: molecule design, engineering design, materials science, and robotics. In computer science, it’s also popular in hyperparameter tuning for neural networks, which typically take forever to simulate; you’d rather choose the parameters in only a few simulation steps. Our platform also applies to trading strategies and so on. But it’s not limited to any one of these applications.”

The platform also has an easy-to-use interface. “The most important feature is that it really provides a way of doing automatic experiment design. After the problem setup is specified, the platform will take care of everything else and return the optimal solutions to users,” said Tian, adding that users still have the option of manually controlling the optimization through iterations. “You can see your optimization progress and enter your own design variables, maybe based on your prior knowledge, instead of having the algorithm generate everything automatically.”
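To give a flavor of what such a problem setup might contain, here is a hypothetical specification in Python; the field names and values are assumptions for illustration, not AutoOED’s actual configuration schema.

```python
# Hypothetical problem specification (illustrative only). The field names
# are assumptions, not AutoOED's actual configuration schema.
from dataclasses import dataclass

@dataclass
class ProblemSpec:
    name: str
    n_var: int          # number of design variables
    n_obj: int          # number of objectives to minimize
    var_lb: list        # lower bound for each design variable
    var_ub: list        # upper bound for each design variable

spec = ProblemSpec(
    name="chair-design",
    n_var=3,
    n_obj=2,            # e.g., weight vs. load capacity
    var_lb=[0.0, 0.0, 0.0],
    var_ub=[1.0, 1.0, 1.0],
)
```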

The CSAIL researchers began with the modest quest of finding optimal designs for their own projects in the lab. As they investigated, they found only two previous works that could select a batch of points to evaluate in parallel at each algorithm iteration, advancing multiple objectives in a promising way while keeping the number of evaluations small. However, neither fully met all of their criteria, so they decided to develop their own algorithm in-house.

To their pleasant surprise, they found that their algorithm, called DGEMO, outperforms the previous approaches on multi-objective optimization problems, and they decided they could help other researchers by developing the project further and building the algorithm into their platform.

Luković says the reason they were “inspired to make this open source and make a platform usable for researchers is because it does have a lot of applications, but also because there is nothing available out there that is easily accessible to researchers, especially the ones who have no experience with coding and, for example, machine learning or optimization.”

The strength of the DGEMO algorithm lies in how it selects and approximates the Pareto front, and in how it helps users find the designs to evaluate next. The researchers are also working on practical extensions to DGEMO and have implemented several other multi-objective Bayesian optimization algorithms in the platform to give users more choices. These Bayesian optimization algorithms produce learned prediction models of the unknown objectives, giving users better insight into their optimization problems and guidance on how to proceed.
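The Pareto front is the set of designs that no other design beats on every objective at once. The helper below computes that standard non-dominated set for a small objective table; it illustrates the concept DGEMO builds on, not DGEMO’s own front-approximation machinery.

```python
# Extract the Pareto front: keep designs whose objective vectors are not
# dominated by any other design (all objectives minimized). This is the
# standard definition, not DGEMO's front-approximation method.
import numpy as np

def pareto_front(Y):
    """Return a boolean mask of non-dominated rows in an (n, m) objective array."""
    n = len(Y)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        if not mask[i]:
            continue
        # Row j dominates row i if it is no worse in every objective
        # and strictly better in at least one.
        dominates = np.all(Y <= Y[i], axis=1) & np.any(Y < Y[i], axis=1)
        if dominates.any():
            mask[i] = False
    return mask

Y = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 3.5], [4.0, 1.0]])
print(pareto_front(Y))   # [ True  True False  True]
```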

No matter how hard the experiment is, “for most of the problems, we expect the optimization to finish after 200 or 300 evaluations,” said Luković. “You could, of course, go on to 1,000 evaluations or more, and the longer you go, maybe you’ll be able to get better results. But no matter how many evaluations you do, the results returned by the platform will in most cases be much better than hand-designed experiments based purely on researchers’ intuition.”

Users can also deploy several samples in parallel, further reducing the evaluation time, and take advantage of the platform’s synchronous and asynchronous batch evaluation capabilities. Asynchronous batch evaluations, in particular, are useful when multiple workers are running experiments with varying evaluation times.
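The snippet below sketches the difference between the two modes using Python’s standard concurrency tools; the `evaluate` stub and the two-worker setup are assumptions for illustration, not the platform’s internal scheduler.

```python
# Sketch of synchronous vs. asynchronous batch evaluation (illustrative
# only; the evaluate() stub simulates an expensive experiment).
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def evaluate(x):
    time.sleep(random.uniform(0.1, 0.5))   # simulate varying evaluation time
    return x * x

batch = [1, 2, 3, 4]
with ThreadPoolExecutor(max_workers=2) as pool:
    # Synchronous: wait until the whole batch is done, then proceed.
    sync_results = list(pool.map(evaluate, batch))

    # Asynchronous: handle each result as soon as its worker finishes,
    # so a new design can be dispatched without waiting for stragglers.
    futures = [pool.submit(evaluate, x) for x in batch]
    for fut in as_completed(futures):
        print("finished:", fut.result())
```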

Luković and Tian have also developed a team version of the platform that enables distributed collaboration around the globe. With a central database for storing information, a scientist can use one app to control the optimization while a technician uses another app just for running the evaluations, and multiple technician applications can be active at the same time. The technician app reports its results to the database after it finishes its evaluations, and the database then synchronizes the latest data to the scientist app. The scientist can then launch another experiment.
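As a rough illustration of that workflow, the toy sketch below mimics the scientist/technician split around a shared database using Python’s built-in sqlite3 module; the schema and function names are assumptions, not the team version’s actual design.

```python
# Toy scientist/technician workflow around a central database
# (illustrative only; schema and function names are assumptions).
import sqlite3

db = sqlite3.connect("experiments.db")
db.execute("CREATE TABLE IF NOT EXISTS evals (design REAL, result REAL)")

def technician_report(design, result):
    # Technician app: record a finished evaluation in the shared database.
    db.execute("INSERT INTO evals VALUES (?, ?)", (design, result))
    db.commit()

def scientist_fetch():
    # Scientist app: pull the latest synchronized results to drive the
    # optimizer and decide on the next experiment.
    return db.execute("SELECT design, result FROM evals").fetchall()

technician_report(0.5, 0.25)
print(scientist_fetch())   # [(0.5, 0.25)]
```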

Other features, such as support for custom evaluation programs and intuitive visualizations of the problem space and statistics, help make the platform an important automation tool for optimizing experiments with multiple objectives.

They also plan to test the software on more real industrial use cases, working with problems provided by the MachineLearningApplications@CSAIL Initiative companies to develop more features.

The Automated Optimal Experiment Design Platform is released as open source for both academic and commercial use.

The CSAIL researchers have released the open-source GitHub repository for the project; details and documentation can be found at https://autooed.readthedocs.io. They have also set up a website illustrating the platform’s features and welcome user feedback. Going forward, they plan to add more state-of-the-art algorithms and algorithmic features to the software, supporting as many tools as possible to help users.