Manolis Kellis is a Professor of Computer Science at MIT, a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and of the Broad Institute of MIT and Harvard, where he directs the MIT Computational Biology Group. His research spans computational biology, genomics, AI-driven drug discovery, and the development of Mantis, a cognitive cartography platform for interactive exploration of multimodal knowledge spaces across science and industry. He has received a number of awards, including the US Presidential Early Career Award for Scientists and Engineers (PECASE), the NIH Director’s Transformative Research Award, the Mendel Medal for Outstanding Achievements in Science, the NSF CAREER award, the Alfred P. Sloan Fellowship, the Karl Van Tassel chair in EECS, and the Boston Patent Law Association award. He has authored over 325 journal publications cited over 200,000 times. He obtained his PhD from MIT, where he received the Sprowls award for the best doctoral thesis in computer science and the first Paris Kanellakis graduate fellowship. Prior to computational biology, Kellis worked on artificial intelligence, sketch and image recognition, robotics, and computational geometry at MIT and at the Xerox Palo Alto Research Center.
Industry Impact
• Multimodal representation learning and cognitive cartography. We develop Mantis, a platform that learns joint multimodal representations from diverse data — documents, molecules, patient records, financial instruments, patent landscapes, legal corpora, and medical records at scale — and projects them into navigable cognitive cartographies that make the learned structure accessible to both humans and AI. The core insight is that the map is not a visualization for humans but the reasoning context for the AI itself, enabling transparent, glass-box human-AI collaboration grounded in the geometry of the domain rather than in opaque language generation, and outperforming frontier language models by over 30% on retrieval, synthesis, and organization.
• Drug development and personalized therapeutics. We build multimodal foundation models that learn joint representations across proteins, small molecules, cellular states, and patient biology to enable geometric navigation of chemical-biological space. Our AffinityNet platform screens one million candidate compounds every thirty seconds at nanomolar resolution, and our precision neuroscience pipeline links patient genotypes to individualized drug response through patient-derived brain organoids, closing the loop between computational prediction and biological validation. Our modular approach targets specific disease pathways (neuroinflammation, lipid dysregulation, cholesterol transport) as reusable therapeutic building blocks that can be recombined across disorders and patients.
• Disease circuitry and precision medicine. We develop and apply single-cell genomics, epigenomics, and multimodal profiling methods to map the molecular circuitry of complex diseases — including Alzheimer’s, obesity, schizophrenia, cardiac disorders, cancer, and immune disorders — across millions of cells and hundreds of individuals, identifying cell-type-specific disease mechanisms, patient subtypes, and actionable therapeutic targets.
• AI deployment across industries. We apply the Mantis framework across multiple verticals: in real estate, encoding locations as positions in rich semantic space to reveal the geometric drivers of commercial success; in enterprise intelligence, mapping organizational knowledge into meaning-space where strategic gaps and opportunities become visible; and in scientific discovery, deploying multi-agent AI systems (SPHINX) that autonomously generate hypotheses, analyze data, and propose experiments across computational biology.
Research / Thesis Topics
Multimodal Representation Learning and Cognitive Cartography
To develop a new paradigm for human-AI collaboration grounded in the geometry of meaning. Mantis learns joint multimodal representations from diverse data modalities — natural language, structured data, molecules, networks, images, and conceptual gradients — and projects them into navigable cognitive cartographies where users interact with the latent structure of their data through spatial selection, clustering, annotation, and agent orchestration. The intelligence resides in the learned representations; the cartography makes that intelligence accessible to human spatial cognition. The platform provides the reasoning context for autonomous AI agents, making every insight traceable to specific data and reasoning paths. Mantis supports real-time self-extension, writing new code and capabilities on demand, and a marketplace of composable tools across panels, scrapers, and conceptual axes.
Drug Development and Computational Chemistry
To enable precision therapeutics through multimodal representation learning across chemical, protein, and biological spaces. This includes AffinityNet, a multimodal chemical foundation model that aligns small molecules, protein sequences, functional domains, and patent literature into a shared latent space for drug-target affinity prediction, molecular generation, and drug repurposing. Trained on binding data, AffinityNet achieves state-of-the-art generalization and enables scaffold-level molecular reasoning across over 11,000 post-Phase I drugs. Our modular approach targets specific disease pathways — neuroinflammation, lipid dysregulation, cholesterol transport — as reusable therapeutic building blocks that can be recombined across disorders and patients, combined with patient-derived brain organoid systems for functional validation and agentic AI pipelines (SPHINX) for autonomous therapeutic prioritization.
Disease Circuitry and Single-Cell Genomics
To understand the cellular and molecular basis of complex disease at single-cell resolution. This includes the development of Cell-Projected Phenotypes (CPP), a framework for mapping donor-level clinical variables onto individual cells, revealing intra-individual heterogeneity in disease manifestation. Applied to 3.4 million cells from nearly 600 Alzheimer’s disease donors, CPP uncovers disease subtypes with distinct cognitive trajectories, identifies cell-type-specific axes of transcriptional dysregulation, and dramatically amplifies detection of disease-associated genes and metabolic alterations compared to traditional case-control analyses. This work extends to the role of sex, hormone therapy, and environmental factors in modulating disease trajectories across cell types and brain regions.
AI Deployment Across Science and Industry
To apply joint multimodal representation learning and cognitive cartography across diverse knowledge domains. In real estate, we encode locations in semantic space integrating demographics, mobility, commercial ecology, and street connectivity, revealing analog locations across the globe and the deep drivers of success. In enterprise intelligence, we map organizational knowledge — contracts, decisions, competitive landscapes — into navigable spaces where gaps and opportunities emerge from the structure of the data. In scientific discovery, we deploy SPHINX, a multi-agent system that autonomously conducts literature synthesis, hypothesis generation, data analysis, in-silico experimentation, and wet-lab recommendation, integrated with Mantis for transparent agentic reasoning at scale.
Recent Works
Variation and Disease
To understand the effects of genetic variation on molecular phenotypes and human disease. This includes methods for integrating diverse functional genomic datasets of transcription, chromatin modifications, and regulator binding, together with their changes across multiple conditions, to interpret genetic associations, identify causal variants, and predict the effects of genetic perturbations.
Genome Interpretation
To recognize the molecular basis of human biology and disease. This requires computational methods that can systematically interpret the functional elements encoded in the 4-letter DNA code. To that end, methods have been developed for the comprehensive annotation of proteins, RNAs, and regulatory control elements encoded in the human genome. Genome-wide comparative genomics datasets help recognize specific patterns of evolutionary change, or ‘evolutionary signatures’, that are associated with each class of functional elements and dictated by the specific constraints of each type of function.
Long non-coding RNAs
Many long transcripts in the human genome do not encode proteins, opening up a new field of study: long non-coding RNAs (lncRNAs). Two genomic signatures have facilitated their discovery and characterization: the chromatin signatures associated with promoters and transcribed regions, and the evolutionary signatures associated with protein-coding selection. These signatures have enabled participation in numerous collaborations that seek the discovery, annotation, and functional characterization of long non-coding RNAs. The development of computational methods also enables the study of the structural properties of non-coding RNAs, based on evolutionary signatures, biophysical folding properties, and recent types of experimental evidence that distinguish paired from unpaired positions and thereby constrain the folding algorithms.