Protein design with RL from AI feedback
Overview
AlphaFold2 allows us to predict protein structure from sequence. Inverse models have also been developed, which predict the sequences of naturally occurring proteins from their structure. However, both kinds of models fall short in many protein engineering contexts, where we want to generate an engineered protein with some particular structural feature.
A crucial problem here is that we don’t know exactly which protein structures are possible (i.e. which structures have some sequence that folds into them), and our structure-to-sequence models cannot yet find structures close to a target structure for which sequences do exist. As Po-Ssu Huang says, “creating novel backbones for which there exist a foldable sequence remains one of the greatest challenges in protein engineering”. Additionally, aside from the target design feature, the rest of the protein structure is often irrelevant, so we would like a way to perform a fuzzy structural search for proteins. One approach has been to search over a latent design space (e.g. from a VAE) for proteins with the desired structural feature.
Here, I propose using an RL protein design agent together with AlphaFold2 to assemble proteins conditioned on loose structural constraints. The RL agent generates an amino acid sequence incrementally and, after each generation step, receives feedback on how well the AlphaFold2-predicted structure of the current candidate sequence meets the design criteria.
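To make the loop concrete, here is a minimal sketch of one design episode. The `fold_and_score` wrapper around AlphaFold2 and the random-edit stand-in for the learned policy are placeholders, not the actual code:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def fold_and_score(sequence: str) -> float:
    """Placeholder for an AlphaFold2 call (e.g. via ColabFold): fold the
    sequence and return a scalar score for how well the predicted structure
    meets the design criteria (e.g. mean pLDDT plus a structural-feature term)."""
    return 0.0  # stand-in value; replace with a real fold-and-score pipeline

def propose_edit(sequence: str) -> str:
    """Stand-in for the RL policy: substitute one residue at random.
    The actual agent chooses the position and amino acid."""
    pos = random.randrange(len(sequence))
    return sequence[:pos] + random.choice(AMINO_ACIDS) + sequence[pos + 1:]

def run_episode(start_sequence: str, num_steps: int = 4):
    """One design episode: edit the sequence step by step, folding after each
    step and using the change in score as the per-step reward."""
    sequence, score = start_sequence, fold_and_score(start_sequence)
    trajectory = []
    for _ in range(num_steps):
        candidate = propose_edit(sequence)
        new_score = fold_and_score(candidate)
        trajectory.append((sequence, candidate, new_score - score))  # reward = improvement
        sequence, score = candidate, new_score
    return trajectory
```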
Experiment 1
I first trained the agent to apply a minimal number of edits to a starting sequence in order to maximize the AlphaFold pLDDT score, which quantifies how well-defined the protein’s structure is. I only looked at short sequences of 20 residues. I used a 2-layer transformer and jointly embedded the starting and current sequences on a per-residue basis. After training for 16h on 8 RTX 4090s, the model increases the aggregate pLDDT score by 25% within an edit distance of 4.
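As a rough illustration (not the exact network used), a policy along these lines might look like the following in PyTorch; the hyperparameters, the additive combination of the two embeddings, and the per-position substitution head are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EditPolicy(nn.Module):
    """Sketch of an edit policy: each position gets a joint embedding of the
    starting and current residue, a 2-layer transformer encoder mixes
    positions, and a head scores substituting each amino acid at each position."""

    def __init__(self, num_aa: int = 20, d_model: int = 128, seq_len: int = 20):
        super().__init__()
        self.start_embed = nn.Embedding(num_aa, d_model)
        self.current_embed = nn.Embedding(num_aa, d_model)
        self.pos_embed = nn.Embedding(seq_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.edit_head = nn.Linear(d_model, num_aa)  # per-position logits over substitutions

    def forward(self, start_seq: torch.Tensor, current_seq: torch.Tensor) -> torch.Tensor:
        # start_seq, current_seq: (batch, seq_len) integer-encoded residues
        positions = torch.arange(start_seq.size(1), device=start_seq.device)
        x = (
            self.start_embed(start_seq)
            + self.current_embed(current_seq)
            + self.pos_embed(positions)
        )
        h = self.encoder(x)
        return self.edit_head(h)  # (batch, seq_len, num_aa)
```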
Gallery
As seen in the gallery, the model’s strategy heavily favors adding Leucine (L) in particular locations.
Observations and musings on online learning
Possibly the most interesting part of this experiment is how much online learning was happening during training. I see a lot of talk these days about how deep neural networks can’t do online learning, for example the paper Loss of plasticity in deep continual learning from Rich Sutton’s group, which finds that simply adding L2 weight decay vastly improves online learning (and while this result is super cool, I find it ironic given the title and the general message I gathered the paper was trying to convey: that deep learning is doomed). Indeed, I used L2 weight decay, and training the agent with SGD on experiences from an episode led to much more consistent pLDDT scores later in the episode than when there was no SGD and the agent was purely in inference mode. My somewhat hot take is that the most vanilla form of deep learning can’t do online learning, but that simple modifications (e.g. L2 weight decay, injecting noise) exist that can essentially fill this gap.
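To illustrate the kind of within-episode updates I mean, here is a minimal sketch of taking an SGD step with L2 weight decay on each new experience; the stand-in linear policy and the REINFORCE-style loss are placeholders, not the actual training code:

```python
import torch
import torch.nn as nn

policy = nn.Linear(16, 20)  # stand-in for the real policy network
# weight_decay is the L2 regularization discussed above
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-4, weight_decay=1e-4)

def update_on_experience(features: torch.Tensor, action: int, advantage: float):
    """One online update on a single experience from the current episode."""
    logits = policy(features)
    log_prob = torch.log_softmax(logits, dim=-1)[action]
    loss = -advantage * log_prob  # REINFORCE-style example loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```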
More reflection on this approach
One can think of this setup in two ways: (1) the sequence generator is a model-free agent acting in a simulated environment, where AlphaFold2 is the environment; or (2) it is a model-based RL agent, where the sequence generator produces the policy and AlphaFold2 serves as the model, and the objective is to produce an output sequence that meets the design criteria when it is actually synthesized in a lab. Under view (2), one can see a sort of capability bootstrapping: we can increase intelligence related to proteins without having to collect any more measurements from the real world. This is along the same line of thinking as Yoshua Bengio’s idea of using a world model to develop a very large amortized inference machine.