Generative Language Models to Design Protein Therapeutics

We develop new language models that generate peptides to bind and modulate disease-causing proteins.

Lead PI:

Pranam Chatterjee

Collaborating Partners:

Cornell University
Sanford Burnham Prebys
Montreal AI Institute

Funding:

NIH/NIGMS, 2024-2029. Project Title: Towards the design of programmable, isoform-selective proteome editing system. Grant: 1R35GM155282-01
EndAxD, 2023-2025. Project Title: Design of a GFAP degradation platform for the treatment of Alexander’s Disease
NIH/NCI, 2023-2025. Project Title: Programmable peptide-guided protein degradation. Grant: 1R21-CA278468-01A1
CHDI Foundation, 2023-2025. Project Title: Design of mHTT and MSH3 Degraders with Language Model-Derived Peptide Guides
The Hartwell Foundation. Project Title: Design of a TRIM8 degradation platform for the treatment of Ewing sarcoma

The Challenge

Proteins drive many diseases, but a significant number are undruggable with small molecules because they lack well-defined binding pockets. Many of these proteins, such as fusion oncoproteins, disordered proteins, and transient signaling hubs, do not adopt stable structures, which makes structure-based drug design ineffective. Even when a protein does have structured domains, post-translational modifications (PTMs) and mutations can drastically alter its function in ways that traditional approaches cannot predict.

Beyond proteins, designing peptides that selectively bind metal ions is another challenge, as metal coordination depends on sequence context rather than a fixed structural template. Without a way to design binders directly from sequence, these targets remain inaccessible to conventional drug discovery.

Our Solution

We address these challenges with protein language models that operate directly on sequence rather than structure. Our generative models (PepPrCLIP, PepMLM, SaLT&PepPr, PepTune, moPPIt, muPPIt, and Metalorian) design peptides that selectively bind disease-causing proteins or metal ions, enabling targeted degradation, stabilization, and precise modulation of PTMs.

To improve peptide specificity, we use sequence representation models such as PTM-Mamba and FusOn-pLM, which capture the effects of PTMs and fusion events on protein function.

By working at the sequence level, our models generate high-affinity binders for previously undruggable targets, expanding the possibilities for therapeutic intervention.
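
As a rough illustration of what designing a binder "at the sequence level" can look like, the toy Python sketch below fills in a masked peptide one position at a time, using only the target's amino-acid string as input. It is not the method used by any of the models named above. The scoring function `toy_score` (simple charge complementarity), the `design_peptide` helper, and the example target string are all hypothetical stand-ins for a trained protein language model and a real target.

```python
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "_"                            # placeholder for an undecided position
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq: str) -> int:
    """Sum of side-chain charges; masked and neutral residues count as 0."""
    return sum(CHARGE.get(res, 0) for res in seq)


def toy_score(target: str, peptide: str) -> float:
    """Hypothetical stand-in for a learned model: rewards opposite net charge."""
    return -net_charge(target) * net_charge(peptide)


def design_peptide(
    target: str,
    length: int,
    score_fn: Callable[[str, str], float] = toy_score,
) -> str:
    """Greedily unmask a peptide left to right, conditioned only on sequence."""
    peptide = [MASK] * length
    for i in range(length):
        # Try every residue at position i and keep the highest-scoring one.
        peptide[i] = max(
            AMINO_ACIDS,
            key=lambda aa: score_fn(
                target, "".join(peptide[:i] + [aa] + peptide[i + 1:])
            ),
        )
    return "".join(peptide)


# Usage: an acidic (negatively charged) toy target yields a basic peptide.
print(design_peptide("DDEEDKL", length=4))
```

The point of the sketch is the interface, not the chemistry: the only input is a sequence string, with no structure or binding pocket required. A real system would replace `toy_score` with a trained model's predictions.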