Introduction to the Computational Biology Lab with Professor Manolis Kellis

 Manolis Kellis is a professor of Computer Science at MIT in the area of Computational Biology. He is a member of CSAIL and the Broad Institute of MIT and Harvard, and his research spans computational biology, genomics, epigenomics, gene regulation, and genome evolution. As the head of the Computational Biology Lab at CSAIL, he conducts research and oversees work in the following areas:

  • In genome interpretation, the Computational Biology Lab seeks to develop comparative genomics methods to identify genes and regulatory elements systematically in the human genome.
  • For gene regulation, they strive to understand the regulatory motifs involved in cell type specification during development, their combinatorial relationships, and how these establish expression domains in the developing embryo.
  • In epigenomics, they are trying to understand the chromatin signatures associated with distinct activity states, the changing chromatin states across different cell types and during differentiation, and the sequence signals responsible for the establishment and maintenance of chromatin marks.
  • In the area of evolutionary genomics, their work aims to understand the dynamics of gene phylogenies across complete genomes, the emergence of new gene functions by duplication and mutation, and the algorithmic principles behind phylogenomics.

Learn more about the Computational Biology Group’s work here.

Jackie (Jiekun) Yang | Analyzing Complex Diseases with Single Cell Technologies

Jackie (Jiekun) Yang is an assistant professor at Rutgers University's Department of Genetics and was a postdoc in the Computational Biology Lab at CSAIL. With a decade of experience in computational and systems biology, her research includes deep dives into the molecular pathways behind drug resistance in tumors, the cross-tissue effects of obesity and exercise, and the resilience against Alzheimer's in Down Syndrome using advanced single-cell, multi-tissue, and multi-omics techniques. Having developed novel CRISPR screening and GWAS variant analyses, she has pinpointed genes pivotal in cancer and obesity, with publications in journals like Cell Metabolism and Genome Biology.

Abstract: Exercise training is critical for the prevention and treatment of obesity, but its underlying mechanisms remain incompletely understood given the challenge of profiling heterogeneous effects across multiple tissues and cell types. Here, we address this challenge and opposing effects of exercise and high-fat diet (HFD)-induced obesity at single-cell resolution in subcutaneous and visceral white adipose tissue and skeletal muscle in mice with diet and exercise training interventions. We identify a prominent role of mesenchymal stem cells (MSCs) in obesity and exercise-induced tissue adaptation. Among the pathways regulated by exercise and HFD in MSCs across the three tissues, extracellular matrix remodeling and circadian rhythm are the most prominent. Inferred cell-cell interactions implicate within- and multi-tissue crosstalk centered around MSCs. Overall, our work reveals the intricacies and diversity of multi-tissue molecular responses to exercise and obesity and uncovers a previously underappreciated role of MSCs in tissue-specific and multi-tissue beneficial effects of exercise.

Learn More | Paper

Ben Lengerich | AI and Machine Learning for Healthcare

Ben Lengerich is an incoming Assistant Professor of Statistics at the University of Wisconsin-Madison in Fall 2024. Before that, he was a postdoctoral associate and Alana Fellow at MIT CSAIL and the Broad Institute of MIT and Harvard. Dr. Lengerich received his Ph.D. in Computer Science and M.S. in Machine Learning at Carnegie Mellon University. His research focuses on machine learning for healthcare, designing interpretable and context-adaptive models to dissect complex diseases and advance precision medicine. His work has been recognized with awards including “Rising Star in Data Science”, the CMLH Fellowship, and spotlight presentations at conferences including NeurIPS, ISMB, AMIA, and SMFM.

Abstract: Real-world evidence is confounded by treatments, so data-driven systems can learn to recapitulate biases that influenced treatment decisions. This confounding presents a challenge: uninterpretable black-box systems can put patients at risk by confusing treatment benefits with intrinsic risk, but also an opportunity: interpretable “glass-box” models can improve medical practice by highlighting unexpected patterns which suggest biases in medical practice. We propose a glass-box model that enables clinical experts to find unexpected changes in patient mortality risk. By applying this model to four datasets, we identify two characteristic types of biases: (1) discontinuities where sharp treatment thresholds produce step-function changes in risk near clinically-important round-number cutoffs, and (2) counter-causal paradoxes where aggressive treatment produces non-monotone risk curves that contradict underlying causal risk by lowering the risk of treated patients below that of healthier, but untreated, patients. While these effects are learned by all accurate models, they are only revealed by interpretable models. We show that because these effects are the result of clinical practice rather than statistical aberration, they are pervasive even in large, canonical datasets. Finally, we apply this method to uncover opportunities for improvements in clinical practice, including 8000 excess deaths per year in the US, where paradoxically, patients with moderately-elevated serum creatinine have higher mortality risk than patients with severely-elevated serum creatinine.
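The two bias patterns named in the abstract can be illustrated with a minimal sketch. It assumes an interpretable model has already produced a one-dimensional risk curve for a single feature. The feature grid, the risk values, and the helper names `largest_jump` and `is_non_decreasing` are hypothetical and chosen for illustration only; they are not the authors' model, code, or data.

```python
def largest_jump(feature_values, risk_scores):
    """Return (left, right, jump) for the largest absolute change in risk
    between two adjacent points on the curve."""
    best = None
    for i in range(len(feature_values) - 1):
        jump = risk_scores[i + 1] - risk_scores[i]
        if best is None or abs(jump) > abs(best[2]):
            best = (feature_values[i], feature_values[i + 1], jump)
    return best


def is_non_decreasing(risk_scores):
    """True if risk never drops as the feature increases."""
    return all(b >= a for a, b in zip(risk_scores, risk_scores[1:]))


# Hypothetical risk curve: synthetic numbers, not taken from any dataset.
feature_values = [1.0, 2.0, 3.0, 4.0, 5.0]
risk_scores = [0.10, 0.12, 0.30, 0.22, 0.25]

# A sharp step between neighbouring values is a candidate discontinuity.
left, right, jump = largest_jump(feature_values, risk_scores)
print(f"largest step: {jump:+.2f} between {left} and {right}")

# A drop in risk at higher feature values is a candidate non-monotone region.
print("non-decreasing:", is_non_decreasing(risk_scores))
```

A flagged step or dip is only a prompt for review by a clinical expert; the sketch does not distinguish a treatment effect from noise.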

Learn More | Paper

Yosuke Tanigawa | Studying Disease Heterogeneity with Genetics Data

Yosuke Tanigawa is a postdoctoral associate at MIT CSAIL working with Prof. Manolis Kellis in the Computational Biology Lab. He develops statistical and computational methods for precision medicine, focusing on the following areas:

  1. For therapeutic target discovery, he analyzes human genetics data from large-scale cohorts. For example, he led a study and nominated ANGPTL7 as an attractive therapeutic target for glaucoma, given that carriers of rare genetic variants in the gene have a ~34% risk reduction (learn more: press release and Editor’s choice).
  2. For disease heterogeneity dissection, he jointly analyzes multiple diseases and relevant phenotypes to nominate the cellular, molecular, and genetic basis of interindividual differences in disease. He focuses on Alzheimer’s disease in an ongoing project, where he integrates multidimensional phenotypic data with single-cell RNA-seq profiling data of 1.9 million cells, nominating transcriptional hallmarks in Alzheimer’s disease (learn more: a recent conference abstract).
  3. For polygenic prediction of human disease and medically relevant traits, he leads methodology development and large-scale applications to realize genomics-informed precision medicine. Recently, he developed an inclusive polygenic score training approach and substantially improved predictive accuracy by analyzing individuals across the continuum of genetic ancestry (learn more: MIT News article and the paper).

Together, these efforts aim to aid early detection and prevention of disease and help tailor therapeutic intervention based on individuals’ genetic profiles. He has received many awards for his research, including the Charles J. Epstein Trainee Awards for Excellence in Human Genetics Research from the American Society of Human Genetics and MIT Technology Review’s Innovators Under 35 Japan. Previously, he received Ph.D. training in Biomedical Informatics at Stanford University under joint supervision by Prof. Manuel A. Rivas and Prof. Gill Bejerano and a B.S. in bioinformatics and systems biology at the University of Tokyo.

Abstract: Admixed individuals offer unique opportunities for addressing limited transferability in polygenic scores (PGSs), given the substantial trans-ancestry genetic correlation in many complex traits. However, they are rarely considered in PGS training, given the challenges in representing ancestry-matched linkage-disequilibrium reference panels for admixed individuals. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is thus naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in Africans by 48.9% on average across 60 quantitative traits and up to 50-fold improvements for some traits (neutrophil count, R² = 0.058) over the baseline model trained on the same number of European individuals. When we allowed iPGS to use n = 284,661 individuals, we observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for the other individuals. We further developed iPGS+refit to jointly model the ancestry-shared and -dependent genetic effects when heterogeneous genetic associations were present. For neutrophil count, for example, iPGS+refit showed the highest predictive performance in the African group (R² = 0.115), which exceeds the best predictive performance for the White British group (R² = 0.090 in the iPGS model), even though only 1.49% of individuals used in the iPGS training are of African ancestry. Our results indicate the power of including diverse individuals for developing more equitable PGS models.
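For readers unfamiliar with polygenic scores, the quantities reported above can be sketched in a few lines. The sketch assumes a score is a weighted sum of allele counts, that R² is the squared Pearson correlation between score and phenotype, and that improvement is measured relative to a baseline R². These are common conventions, not details confirmed by the abstract. The effect sizes, genotypes, phenotypes, and function names are hypothetical, and the sketch does not implement the penalized regression that iPGS uses to fit the weights.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Weighted sum of allele counts (0, 1, or 2 per variant)."""
    return sum(count * beta for count, beta in zip(allele_counts, effect_sizes))


def r_squared(predicted, observed):
    """Squared Pearson correlation between predictions and observations."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


def relative_improvement(new_r2, baseline_r2):
    """Percentage gain in R² over a baseline model."""
    return (new_r2 - baseline_r2) / baseline_r2 * 100.0


# Hypothetical per-variant effect sizes, as a fitted model might provide.
effect_sizes = [0.5, -0.2, 0.1]

# Hypothetical genotypes: one row per individual, one column per variant.
genotypes = [
    [0, 1, 2],
    [1, 0, 0],
    [2, 2, 1],
    [1, 1, 1],
]

# Hypothetical measured phenotypes for the same four individuals.
phenotypes = [0.1, 0.6, 0.5, 0.4]

scores = [polygenic_score(row, effect_sizes) for row in genotypes]
print("scores:", scores)
print("R^2:", r_squared(scores, phenotypes))
```

With real data the R² would be computed on held-out individuals, separately for each ancestry group, which is how the per-group comparisons in the abstract are framed.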

Learn More | Paper | Conference Abstract | Paper

Tianlong Chen | Foundation Models for Alzheimer's Disease Research

Tianlong Chen is an incoming Assistant Professor of Computer Science at The University of North Carolina at Chapel Hill in Fall 2024. Before that, he was a research scientist at MIT CSAIL and the Department of Biomedical Informatics at Harvard. Dr. Chen received his Ph.D. degree in Electrical and Computer Engineering at The University of Texas at Austin in 2023. His research focuses on building accurate, trustworthy, and efficient machine learning systems. Recently, one of his core research missions has been to investigate the crucial role of foundation models in various computational biology applications. He received the IBM Ph.D. Fellowship, Adobe Ph.D. Fellowship, Graduate Dean's Prestigious Fellowship, AdvML Rising Star Award, and the Best Paper Award from the inaugural Learning on Graphs (LoG) Conference 2022. He has served as an area chair for ICIP'22-'24 and CPAL'23.

Abstract: Single-cell RNA sequencing technologies have facilitated complex characterizations of different cell types, enhancing our understanding of the mechanisms underlying disease onset. This has paved the way for exploring cellular differences, elucidating pathogenic mechanisms, and enabling individualized treatment. The advent of sequencing technologies has diversified data modalities, extending our comprehension from genomics to spatial omics and proteomics, offering multimodal insights. These breakthroughs also introduce new research questions, such as perturbation prediction and multi-omics integration, making the development of methods capable of effectively utilizing multimodal data crucial.

Our plan involves leveraging foundation model pre-training to address this challenge. Foundation models, a class of deep learning models pre-trained on large-scale, diverse datasets, can be easily fine-tuned for various downstream tasks. Intriguingly, these general pre-trained models consistently outperform task-specific models trained from scratch. Machine learning methods for multimodal single-cell data are still at an early stage, with many models designed for a single downstream task. To overcome this limitation, a foundation model pre-trained on large-scale multimodal data is required, one that can understand the interactions between genes across different tissues and capture hidden information, such as transcription factor activity, contained within multi-omics sequencing data.

Learn More | Paper to come

Eloi Schmauch | Xenotransplantation

Eloi Schmauch hails from the Alsace region of France and commenced Medical School at the University of Strasbourg in 2016, driven by a passion for aiding patients and a profound scientific intrigue. Subsequently, he enrolled in the University MD-PhD program and Ecole de l’INSERM Liliane Bettencourt, augmenting his medical education with research lectures and internships. During a summer stint in 2019 at the Kellis lab at CSAIL, he delved into the realm of single-cell RNA sequencing, applying it to cancer research.

Transitioning to the Master’s program in computational biology at University Paris-Saclay, he returned to the Kellis lab for his master’s thesis in 2020. Collaborating with Prof. Suvi Linna-Kuosmanen, he employed transcriptomics to elucidate coronary artery disease and heart failure pathophysiology. His PhD, like his overarching goal of integrating computational biology and medicine, focuses on applying computational biology to medical problems, particularly through transcriptomics studies for translational medicine. After defending his PhD in the upcoming summer, he plans to resume his clinical training in medical school.

Abstract: Recent advances in xenotransplantation in living and decedent humans using pig xenografts have laid promising groundwork towards future emergency use and first-in-human trials. Major obstacles remain, though, including a lack of knowledge of the genetic incompatibilities between pig donors and human recipients, which may lead to harmful immune responses against the xenograft or dysregulation of normal physiology. In 2022, two pig heart xenografts were transplanted into two brain-dead human decedents with a minimized immunosuppression regime, primarily to evaluate onset of hyper-acute antibody-mediated rejection and sustained xenograft function over 3 days.

Methods: We performed multi-omic profiling to assess the dynamic interactions between the pig and human genomes in the first two pig heart xenograft transplants into human decedents. To assess global and specific biological changes that may correlate with immune-related outcomes and xenograft function, we generated transcriptomic, lipidomic, proteomic, and metabolomic datasets across blood and tissue samples collected every 6 hours over the 3-day procedures.

Results: Single-cell datasets in the 3-day pig xenograft-decedent models show dynamic immune activation processes. We observe specific scRNA-seq, snRNA-seq, and geospatial transcriptomic changes of early immune activation leading to pronounced downstream T-cell activity and hallmarks of early antibody-mediated rejection (AbMR) and/or ischemia-reperfusion injury (IRI) in the first xenograft recipient. Using longitudinal multiomic integrative analyses from blood in addition to antigen presentation pathway enrichment, we also observe in the first xeno-heart recipient significant cellular metabolism and liver damage pathway changes that correlate with profound physiological dysfunction, whereas these signals are not present in the other xenograft recipient.

Conclusions: Single-cell and multiomics approaches reveal fundamental insights into early molecular immune responses indicative of IRI and/or early AbMR in the first human decedent, which were not evident in the conventional histological evaluations.

Learn More | Paper

Eloi Schmauch | Rare Disease Research

Abstract: Pityriasis rubra pilaris (PRP) is a rare inflammatory skin disease which lacks efficacious standard-of-care treatments. Molecular studies of skin lesions revealed that IL-1β is central to the pathogenesis of PRP. Treatment of three patients with the IL-1-targeting biologics anakinra and canakinumab resulted in rapid clinical improvement and reversal of the PRP-associated molecular signature. We identified an NF-κB-mediated IL-1β-CCL20 axis central to the inflammatory response in PRP. Our results reveal the central role of IL-1β signaling in the pathogenesis of PRP and highlight its prominence as a therapeutic target.

Learn More | Paper