In recent years, models that can predict the structure or function of proteins have been widely used for a variety of biological applications, such as identifying drug targets and designing new therapeutic antibodies.
These models, which are based on large language models (LLMs), can make very accurate predictions of a protein’s suitability for a given application. However, there has been no way to determine how these models make their predictions or which protein features play the most important role in those decisions.
In a new study, MIT researchers have used a novel technique to open up that “black box” and determine what features a protein language model takes into account when making predictions. Understanding what is happening inside that black box could help researchers choose better models for a particular task, streamlining the process of identifying new drugs or vaccine targets.
“Our work has broad implications for enhanced explainability in downstream tasks that rely on these representations,” says Bonnie Berger, the Simons Professor of Mathematics, head of the Computation and Biology group at MIT’s Computer Science and Artificial Intelligence Laboratory, and the senior author of the study. “Additionally, identifying features that protein language models track has the potential to reveal novel biological insights from these representations.”
Onkar Gujral, an MIT graduate student, is the lead author of the study, which appears this week in the Proceedings of the National Academy of Sciences. Mihir Bafna, an MIT graduate student, and Eric Alm, a professor of biological engineering at MIT, are also authors of the paper.
Unveiling the black box
In 2018, Berger and former MIT graduate student Tristan Bepler PhD ’20 introduced the first protein language model. Their model, like later protein language models that helped drive the development of AlphaFold, such as ESM2 and OmegaFold, was based on LLMs. These models, which include ChatGPT, can analyze huge amounts of text and figure out which words are most likely to appear together.
Protein language models use a similar approach, but instead of analyzing words, they analyze amino acid sequences. Researchers have used these models to predict the structure and function of proteins, as well as for applications such as identifying proteins that might interact with particular drugs.
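To make this concrete, the sketch below shows how a per-protein representation can be pulled from a publicly available protein language model. It assumes the open-source fair-esm package and the ESM2 model family mentioned in this article; the sequence, model size, and pooling step are illustrative choices, not the exact pipeline used in the study.

```python
import torch
import esm  # the fair-esm package (pip install fair-esm)

# Load a small pretrained ESM2 model and its "alphabet" (tokenizer-like object)
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

# A hypothetical amino acid sequence, given as (name, sequence) pairs
data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
labels, strs, tokens = batch_converter(data)

# Run the model and keep the hidden representation from its final (6th) layer
with torch.no_grad():
    out = model(tokens, repr_layers=[6])
residue_embeddings = out["representations"][6]   # shape: (1, seq_len + 2, 320)

# Average over residues (dropping the start/end tokens) to get one vector per protein
protein_embedding = residue_embeddings[0, 1:-1].mean(dim=0)
print(protein_embedding.shape)                    # torch.Size([320])
```

Representations like this one are what downstream models use to predict structure, function, or drug interactions.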
In a 2021 study, Berger and colleagues used a protein language model to predict which sections of viral surface proteins are less likely to mutate in ways that enable viral escape, which allowed them to identify possible targets for vaccines against influenza, HIV, and SARS-CoV-2.
However, in all of these studies, it was impossible to know how the models were making their predictions.
“We would come out with some prediction at the end, but we had no idea what was happening in the individual components of this black box,” Berger says.
In the latest study, the researchers sought to explore how protein language models arrive at their predictions. Much like LLMs, protein language models encode information as representations that comprise a pattern of activation among various “nodes” in a neural network. These nodes resemble the networks of neurons that store memories and other information in the brain.
The inner workings of LLMs are not easy to interpret, but in recent years researchers have begun using a type of algorithm known as a sparse autoencoder to help shed light on how those models make their predictions. The new study from Berger’s lab is the first to apply this algorithm to protein language models.
Sparse autoencoders work by changing how a protein is represented within a neural network. Typically, a given protein will be represented by a pattern of activation of a limited number of neurons, for example, 480. A sparse autoencoder expands that representation into a much larger number of nodes, say 20,000.
When information about a protein is encoded by only 480 neurons, each node fires for multiple features, making it very difficult to know what features each node is encoding. However, when the neural network is expanded to 20,000 nodes, this extra space, along with a sparsity constraint, gives the information room to “spread out.” Now, a feature of the protein that was previously encoded by multiple nodes can occupy a single node.
“In a sparse representation, the neurons that light up are doing so in a more meaningful manner,” Gujral says. “Before the sparse representations are created, the networks pack information so tightly together that it becomes difficult to interpret the neurons.”
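The general idea can be sketched in a few lines of PyTorch. This is a minimal illustration of a sparse autoencoder of the kind described above, using the 480- and 20,000-dimension figures from the article; the study’s actual architecture, sparsity penalty, and training details may differ.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Expands a dense protein embedding (e.g., 480 dimensions) into a much
    wider, mostly zero code (e.g., 20,000 dimensions), then reconstructs it."""

    def __init__(self, d_model: int = 480, d_hidden: int = 20_000):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))   # wide, non-negative code
        x_hat = self.decoder(z)           # reconstruction of the original embedding
        return x_hat, z

def sae_loss(x, x_hat, z, l1_weight: float = 1e-3):
    # Reconstruct the embedding while penalizing active units, pushing most of
    # the 20,000 code dimensions to zero (the "sparsity constraint").
    reconstruction = ((x - x_hat) ** 2).mean()
    sparsity = z.abs().mean()
    return reconstruction + l1_weight * sparsity

model = SparseAutoencoder()
x = torch.randn(8, 480)            # a batch of dense protein embeddings
x_hat, z = model(x)
loss = sae_loss(x, x_hat, z)
loss.backward()
```

After training, each of the wide code’s dimensions is a candidate “node” whose activations can be inspected across many proteins.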
Interpretable models
Once the researchers obtained sparse representations of many proteins, they used an AI assistant called Claude (related to the popular Anthropic chatbot of the same name) to analyze the representations. In this case, they asked Claude to compare the sparse representations with the known features of each protein, such as molecular function, protein family, or location within a cell.
By analyzing thousands of representations, Claude can determine which nodes correspond to specific protein features, then describe them in plain language. For example, the algorithm might say, “This neuron seems to detect proteins involved in transmembrane transport of ions or amino acids, particularly those situated in the plasma membrane.”
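In practice, that kind of automated labeling can be scripted against a language model API. The sketch below is a hypothetical version of this step: the protein annotations, the prompt wording, and the model name are invented for illustration and are not the prompts used in the study.

```python
import anthropic  # official Anthropic SDK; assumes an API key is configured

client = anthropic.Anthropic()

# Hypothetical annotations of the proteins that most strongly activate one
# sparse-autoencoder node (e.g., drawn from a protein annotation database)
top_activating_proteins = """\
protein_A | potassium channel            | plasma membrane | ion transmembrane transport
protein_B | amino acid permease          | plasma membrane | amino acid transport
protein_C | sodium/glucose cotransporter | plasma membrane | ion transmembrane transport
"""

prompt = (
    "The proteins below most strongly activate one node of a sparse autoencoder "
    "trained on protein language model representations.\n\n"
    f"{top_activating_proteins}\n"
    "In one sentence, describe the common feature this node appears to detect."
)

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model name
    max_tokens=200,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)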
This process makes the nodes far more “interpretable,” meaning the researchers can tell what each node is encoding. They found that the features these nodes most often encode are protein family and certain functions, including several different metabolic and biosynthetic processes.
“When you train a sparse autoencoder, you aren’t trying to make it interpretable, but it turns out that by encouraging the representation to be really sparse, that ends up resulting in interpretability,” Gujral says.
Understanding what features a particular protein model encodes could help researchers choose the right model for a given task, or tweak the type of input they give the model, to generate the best results. Additionally, analyzing the features that a model encodes could one day help biologists learn more about the proteins they are studying.
“At some point, when the models get a lot more powerful, you could learn more biology than you already know by opening up the models,” Gujral says.
The research received support from the National Institutes of Health.