
Equilibrium fluctuations of a single folded protein reveal a multitude of potential cryptic allosteric sites

Edited by Michael L. Klein, Temple University, Philadelphia, PA, and approved June 6, 2012 (received for review June 1, 2012)
July 2, 2012
109 (29) 11681-11686

Abstract

Cryptic allosteric sites—transient pockets in a folded protein that are invisible to conventional experiments but can alter enzymatic activity via allosteric communication with the active site—are a promising opportunity for facilitating drug design by greatly expanding the repertoire of available drug targets. Unfortunately, identifying these sites is difficult, typically requiring resource-intensive screening of large libraries of small molecules. Here, we demonstrate that Markov state models built from extensive computer simulations (totaling hundreds of microseconds of dynamics) can identify prospective cryptic sites from the equilibrium fluctuations of three medically relevant proteins—β-lactamase, interleukin-2, and RNase H—even in the absence of any ligand. As in previous studies, our methods reveal a surprising variety of conformations—including bound-like configurations—that implies a role for conformational selection in ligand binding. Moreover, our analyses lead to a number of unique insights. First, direct comparison of simulations with and without the ligand reveals that there is still an important role for an induced fit during ligand binding to cryptic sites and suggests new conformations for docking. Second, correlations between amino acid sidechains can convey allosteric signals even in the absence of substantial backbone motions. Most importantly, our extensive sampling reveals a multitude of potential cryptic sites—consisting of transient pockets coupled to the active site—even in a single protein. Based on these observations, we propose that cryptic allosteric sites may be even more ubiquitous than previously thought and that our methods should be a valuable means of guiding the search for such sites.
The degree to which standard, rational drug design protocols ignore protein-conformational changes is a potentially major flaw. Most assume that proteins are frozen in their crystallographic structures because we do not understand the intrinsic dynamics of proteins well enough to include them. Once this assumption is made, the only way to manipulate a protein’s activity is with inhibitors that bind more tightly to the protein’s active site than the molecule’s natural ligand. Unfortunately, only approximately 15% of proteins have a sufficiently deep pocket coinciding with their active site to make this strategy feasible (1).
Accounting for conformational heterogeneity could greatly expand the repertoire of available drug targets by revealing cryptic sites: transient pockets that are not readily visible in experimental structures, yet can inhibit enzymatic activity via allostery or block protein–protein interactions (2–5). For example, they may provide previously undescribed druggable pockets on proteins that are already considered viable targets, or even make it possible to target proteins that are currently considered undruggable. Targeting these sites may be easier than targeting the active site if there is no natural ligand to outcompete. Finally, cryptic sites open up the possibility of enhancing desirable activity (6).
As their name implies, identifying cryptic sites with conventional experimental techniques is a challenging task. Cryptic sites are not visible in ligand-free structures of proteins and often have only been discovered serendipitously when crystal structures of ligand-bound proteins reveal that the small molecule binds in a cryptic pocket. Based on this pattern, many have drawn the conclusion that cryptic ligands operate via an induced-fit mechanism—the ligand triggers conformational changes that open the pocket the molecule ultimately binds to. Regardless of the binding mechanism, a site-directed screening method called tethering (7) can, in principle, be used to discover cryptic sites and ligands. However, even tethering can benefit greatly from foreknowledge of the structures and locations of cryptic sites to avoid the need to exhaustively enumerate all possible pocket locations and ligands. Therefore, computational methods could prove valuable for finding new cryptic sites and guiding experiments to verify their existence.
We hypothesized that regions where transient pockets coincide with structural elements coupled to the active site are likely to be viable cryptic sites and developed computational methods for detecting these signature fluctuations. This approach was inspired by the fluctuation-dissipation theorem and an ensemble view of allostery (8, 9), which suggest that allosteric sites must be detectable in a protein’s natural fluctuations at equilibrium. Our methods draw on kinetic network models, called Markov state models (MSMs), built from extensive molecular-dynamics simulations to describe a protein’s intrinsic dynamics (10). Like a map of a molecule’s free energy landscape, an MSM provides a reduced view of the ensemble of spontaneous fluctuations the molecule undergoes at equilibrium. These models capture both thermodynamic and kinetic properties of the system being considered. Therefore, we can identify transient pockets, calculate how often they are open, and look for structural couplings between different regions of a protein with MSMs. Our approach builds on a number of existing methods for detecting cryptic pockets and allosteric sites (11–22). Important advances made here include (i) capturing events on much longer timescales (i.e., microseconds and beyond) and (ii) accounting for the requirement that residues surrounding a binding pocket are significantly (though perhaps indirectly) coupled to the active site. These methods lead to a number of unique insights into the ubiquity and mechanism of cryptic allosteric sites that are the focus of this work.
We apply these methods to three systems: TEM-1 β-lactamase, interleukin-2 (IL-2), and RNase H. We primarily focus on β-lactamase, an enzyme capable of conferring antibiotic resistance by hydrolyzing β-lactam antibiotics. Besides its relevance to antibiotic resistance, this target was chosen because it has one known cryptic site to validate our approach on. This site is invisible in the apo crystal structure (Fig. 1A) (23) and was only discovered when a small molecule thought to bind the active site turned out to bind between helices 11 and 12 (Fig. 1B) (2). We also analyze IL-2 and RNase H to assess the generality of our results from β-lactamase. These proteins have completely different folds from β-lactamase and are also medically relevant targets.
Fig. 1.
Crystal structures of TEM-1 β-lactamase (A) in the absence of ligand and (B) in the presence of an allosteric inhibitor that reveals a cryptic binding site between helices 11 and 12. The backbone is colored from blue to red starting at the N terminus, a key active site residue (Ser70) is highlighted in green, and the allosteric inhibitor is shown in cyan (with helix 12 to its left and helix 11 to its right).
Using our methods, we address questions like: Are known cryptic sites detectable as transient pockets in the absence of ligands? How do small molecules bind cryptic sites—via induced fit or conformational selection? How is this information transmitted to the active site? How common are cryptic sites?

Results and Discussion

To better understand cryptic allosteric sites, we constructed MSMs of the conformational space each protein explores. MSMs serve as maps of the free energy landscapes that ultimately control a protein’s structure and dynamics (10, 24). They have been successfully employed to understand processes like protein folding (25, 26), ligand binding (27–29), and functional dynamics (30–32). One powerful feature of these models is that they can combine many simulations to capture processes occurring on much longer timescales than a single simulation could ever address. For example, they have previously been used to capture protein folding on 10 millisecond timescales based on thousands of microsecond timescale simulations run on commodity hardware (33). Moreover, one could capture even longer timescales with MSMs by simply running a larger number of simulations in parallel. This is extremely attractive given that capturing such timescales in computer simulations is an enormously difficult task.
Our models for β-lactamase reveal atomistic details of functional dynamics within the native state on timescales as slow as a millisecond (SI Text). These models are built from hundreds of atomistic molecular dynamics simulations with explicit solvent, each up to 500 ns in length, for an aggregate of over 100 μs of simulation. As discussed shortly, this extensive sampling reveals important dynamic processes that could not have been identified with the much smaller data sets often used to infer structure and function from simulation. Kinetic clustering of this data with MSMBuilder (34, 35) is used to create a partitioning of conformational space with states akin to minima in the free energy landscape and the probabilities of transitioning between them in a 2 ns time interval (Fig. S1 and Fig. S2). Using these models, we survey the variety of structures β-lactamase visits, their equilibrium probabilities, and the transition times between pairs of states. We also assess how ligands modulate the protein’s free energy landscape by comparing models in the absence or presence of the small molecule. Finally, we calculate experimental observables to validate our models and make predictions. Similar analyses are performed for IL-2 and RNase H. More details are given in SI Text.
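The core idea of an MSM, counting transitions between discrete states at a fixed lag time and deriving equilibrium populations from the resulting transition matrix, can be illustrated with a minimal sketch. This is not the MSMBuilder implementation: the function names are hypothetical, the trajectories are assumed to already be assigned to integer state labels, and the symmetrized-count estimator is a simple stand-in chosen only because it enforces detailed balance.

```python
import numpy as np

def count_transitions(dtrajs, n_states, lag):
    """Count i -> j transitions separated by `lag` frames in each trajectory."""
    counts = np.zeros((n_states, n_states))
    for traj in dtrajs:
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i, j] += 1
    return counts

def transition_matrix(counts):
    """Row-normalize symmetrized counts (assumed estimator; enforces detailed balance)."""
    sym = counts + counts.T
    return sym / sym.sum(axis=1, keepdims=True)

def equilibrium_populations(T):
    """Stationary distribution: left eigenvector of T with eigenvalue 1."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()
```

With frames saved so that `lag` corresponds to the 2 ns interval mentioned above, `T[i, j]` would estimate the probability of moving from state `i` to state `j` in that interval. Every state must appear in at least one counted transition, otherwise the row normalization divides by zero.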

β-Lactamase’s Known Cryptic Site Opens Transiently during the Protein’s Natural Fluctuations, but There Is Still a Role for Induced Fit in Ligand Binding.

It has previously been proposed that ligands bind cryptic sites via an induced-fit mechanism because these sites are invisible in ligand-free crystal structures. If this were the case, then observing cryptic sites in the absence of ligands would be impossible and designing drugs to target them would require extensive screening. However, the fluctuation–dissipation theorem—which relates fluctuations at equilibrium to the system’s response to a perturbation—mandates that cryptic sites must open in the course of equilibrium dynamics, if only transiently, even in the absence of a ligand. Identifying such transient pockets could provide a starting point for rational drug design.
By applying a pocket-detection algorithm to each of the approximately 5,000 states in our MSM for ligand-free β-lactamase, we assess how frequently the known cryptic site opens in solution. Specifically, we have employed LIGSITE (36) to identify pockets within representative structures from each state of our MSM. Discarding pockets visible in the apo crystal structure allows us to identify transient pockets that could potentially serve as cryptic allosteric sites. Similar methods have previously been used to search for small, transient pockets (13). Our MSM enhances the power of such analyses by allowing us to quickly quantify properties like the probability of a pocket being open and the timescale for opening (SI Text). For example, the probability that a pocket is open is simply the sum of the equilibrium populations of all the states in which it is open. Moreover, we scale to much larger data sets and capture motions on significantly longer timescales (microseconds and longer compared to nanoseconds). This MSM approach does not remedy systematic errors in the force field used to generate trajectories, but its applicability extends to any model potential.
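The statement that a pocket's open probability is the sum of the equilibrium populations of the states in which it is open can be written as a short sketch. The function name and inputs are hypothetical: `populations` stands for the MSM's equilibrium state populations and `open_states` for the indices of states whose representative structure shows the pocket, for example as flagged by a pocket-detection pass.

```python
import numpy as np

def pocket_open_probability(populations, open_states):
    """Sum the equilibrium populations of the states in which the pocket is open."""
    populations = np.asarray(populations, dtype=float)
    populations = populations / populations.sum()  # guard against unnormalized input
    return float(populations[list(open_states)].sum())

# Toy example with three states, where the pocket is open in states 1 and 2.
p_open = pocket_open_probability([0.5, 0.3, 0.2], [1, 2])
```

Because the quantity depends only on populations, it inherits any sampling or force field error present in the underlying model.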
This analysis reveals that β-lactamase’s known cryptic site does indeed open transiently in solution (Fig. 2). This pocket is at least partially open 53% of the time, making it the most accessible transient pocket in β-lactamase. The site is not visible in apo crystal structures because the closed conformation is also quite common and is likely further stabilized by crystal packing.
Fig. 2.
The three most frequently open pockets (yellow spheres), two of which coincide with the known allosteric ligand (cyan). Each sphere represents a pocket with a radius of up to 5 Å (SI Text). The spheres are overlaid on the apo crystal structure with the backbone colored from blue to red starting at the N terminus and a key active site residue (Ser70) highlighted in green.
The observation of a transient pocket coinciding with the known cryptic site suggests a conformational selection (or population shift) binding mechanism, but there could still be a role for induced fit. As imagined in the conformational selection model, β-lactamase samples both unbound and bound conformations even in the absence of ligand. Ligands are thus able to bind to pre-existing bound conformations, causing their population to increase relative to unbound conformations. This observation, however, does not preclude a role for induced fit. A number of studies have now demonstrated an interplay between conformational selection and induced fit (27, 37–41), supporting an extended model for binding that combines the two traditional mechanisms (42).
To assess the roles of conformational selection and induced fit, we built an additional MSM for the ligand-binding process. First, we ran simulations with one of the known cryptic ligands present [N,N-bis(4-chlorobenzyl)-1H-1,2,3,4-tetraazol-5-amine, also called CBT]. We then evaluated rate constants for transitions among the same set of states determined for the ligand-free protein. Reusing the state space allows a quantitative comparison of the thermodynamics and kinetics of the two systems.
Visual inspection of binding pathways alone is insufficient to distinguish between conformational selection and induced fit. For example, Fig. 3 shows the highest flux binding pathway from a particular unbound state calculated following refs. (26) and (43). In this pathway, the cryptic ligand binds against the cryptic site, then the cryptic site opens more widely than in the holo structure while the ligand inserts into the cryptic site, and finally the protein closes around the ligand. Since the ligand binds against the protein before the protein opens, it is unclear from this depiction alone whether the ligand caused the protein to open or had to wait for the cryptic site to appear spontaneously.
Fig. 3.
The highest flux binding pathway, depicted as a series of configurations exemplifying intermediate Markov states. The crystallographically determined structure of helix 11 in the holo state is superimposed in yellow, emphasizing movement in this part of the protein during binding. One interesting feature of the pathway is that the two helices surrounding the cryptic site open 2–3 Å more widely than in the holo structure (particularly in states D and E) and then close around the ligand. The backbone is colored from blue to red starting at the N terminus, and the allosteric inhibitor is shown in cyan. The number associated with each structure quantifies progress along the binding pathway. Specifically, it indicates the probability that a trajectory initiated in the corresponding state reaches the bound state (F) before first reaching the unbound state (A).
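The number attached to each structure in Fig. 3, the probability of reaching the bound state before the unbound state, is commonly called a committor. A minimal sketch of how such values could be computed from a row-stochastic transition matrix is given below. The function name is hypothetical, and this linear-system formulation is a standard textbook approach rather than a reproduction of the procedure in refs. 26 and 43.

```python
import numpy as np

def committor(T, source, sink):
    """Probability of reaching `sink` before `source` from each state.

    Solves q_i = sum_j T_ij q_j for intermediate states,
    with q = 0 on source states and q = 1 on sink states.
    """
    n = T.shape[0]
    A = np.eye(n) - np.asarray(T, dtype=float)
    b = np.zeros(n)
    for s in source:          # boundary condition q = 0
        A[s, :] = 0.0
        A[s, s] = 1.0
        b[s] = 0.0
    for s in sink:            # boundary condition q = 1
        A[s, :] = 0.0
        A[s, s] = 1.0
        b[s] = 1.0
    return np.linalg.solve(A, b)
```

In a symmetric three-state chain with the unbound state at one end and the bound state at the other, the middle state has a committor of 0.5, meaning it is equally likely to proceed to either end.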
Quantitative comparison of β-lactamase’s dynamics in the absence and presence of the cryptic ligand suggests that induced fit plays an important role in ligand binding. If binding were entirely due to conformational selection, then one would expect the binding rate to be no faster than the opening rate of the cryptic site in the absence of ligands. However, we find that the cryptic site opens more rapidly in the presence of the allosteric ligand. For example, on average, it takes 16 μs to transition from the apo to the holo conformation with the ligand compared to 26 μs without the ligand. Therefore, we conclude the ligand modulates the free energy landscape in a manner that promotes its own binding by stabilizing protein conformations along the binding pathways. This finding may not surprise many experts; however, our method’s ability to reveal details of the binding mechanism (like this interplay) is an important result that may prove useful for drug design in the future. For example, the highest flux binding pathway shows that the cryptic site opens more broadly than in the bound conformation and then closes around the ligand. This broad opening could be important for providing sufficient space for the ligand to bind, in which case docking against these highly open conformations may be more fruitful for discovering new ligands than docking against bound-like conformations. Furthermore, despite an important role for induced fit, considering conformational selection alone still allows potential cryptic sites to be identified.
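Average transition times such as the 16 μs and 26 μs quoted above are mean first passage times, which can be obtained from a transition matrix by solving a linear system. The sketch below is one standard way to do this, under the assumption of a row-stochastic matrix `T` estimated at a known lag time; the function name is hypothetical and the authors' exact procedure is described in their SI Text, not here.

```python
import numpy as np

def mean_first_passage_times(T, target, lag_time=1.0):
    """Mean time to first reach any state in `target`, from every state.

    Solves m_i = lag_time + sum_j T_ij m_j for non-target states, with m = 0 on targets.
    """
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    target = set(target)
    others = [i for i in range(n) if i not in target]
    Q = T[np.ix_(others, others)]  # transitions among non-target states only
    m = np.zeros(n)
    m[others] = np.linalg.solve(np.eye(len(others)) - Q,
                                lag_time * np.ones(len(others)))
    return m
```

Comparing the value from an apo-like state to a holo-like state in models built with and without the ligand would mirror the comparison made in the text. The result is expressed in the same units as `lag_time`.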

Communities of Coupled Sidechains Link β-Lactamase’s Active and Cryptic Sites.

Understanding how cryptic sites work—and ultimately predicting new ones—also requires determining how they are coupled to the active site. The ability to detect transient pockets that coincide with the known cryptic site suggests we can automatically identify such sites even when they are invisible in experimental structures. However, binding of a small molecule to a transient pocket could have no effect on activity if there is no coupling between the cryptic and active sites. Therefore, it is also important to identify regions of the protein that are somehow coupled to the active site in addition to containing transient pockets.
The allosterically inhibited structure of β-lactamase suggests that the active and cryptic sites are not coupled through the protein’s backbone, though there is evidence other systems may behave differently (44). First, the Cα rmsd between the apo and holo structures is only 0.96 Å and the Cα rmsd between active site residues [defined as Ser70 to Lys73, Ser130 to Asn132, Glu166 to Asn170, Lys234 to Ala237, and Arg244] (45) is 0.68 Å. Our simulation results also show little variability in the backbone structure. The Cα rmsd of active site residues compared to the apo structure is 0.7 ± 0.3 Å in ligand-free simulations and 0.9 ± 0.2 Å when the cryptic ligand is present, so the ligand’s effect on the backbone structure is statistically insignificant. This conclusion is also consistent with recent studies of cytochrome P-450, which show that there is little coupling between the dynamics of the active site backbone and the rest of the protein (46).
Despite the fact that the backbone is relatively static, recent studies have suggested that there can be significant heterogeneity in sidechain rotameric states even in the context of a fixed backbone structure (47, 48) and that couplings between these rotameric states can allow long-range communication (12, 15, 49). Indeed, our simulations reveal a great deal of heterogeneity in sidechain rotameric states (Fig. S3). Communication via coupling between sidechain rotameric states would also be consistent with the fact that one of the most significant differences between the apo and holo structures of β-lactamase is that the sidechain of a key active site residue, Arg244, becomes disordered in the holo structure.
To explore the possibility that coupled sidechains are responsible for allosteric communication in β-lactamase, we used spectral clustering based on the mutual information between the rotameric states (χ1 dihedral angles) of pairs of amino acids to identify communities of coupled residues. The mutual information—defined in Eq. 1 of Materials and Methods—is a statistical measure of the interdependence between two random variables that has been used successfully in previous studies of allostery (12, 15). Examining the mutual information alone reveals extensive coupling between both nearby and distant residues (Fig. S4). Nearly equivalent results are obtained with the excess mutual information, which accounts for the spurious correlation expected from finite sampling (SI Text). Spectral clustering was chosen over other clustering methods as it provides a natural means of determining an appropriate number of clusters to construct based on the eigenvalue spectrum of the similarity matrix being clustered (SI Text and Fig. S5). Based on the gap between the fifth and sixth eigenvalues in this spectrum, we chose to construct five clusters.
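As an illustration of the coupling measure, the sketch below computes the standard Shannon mutual information between the discrete rotameric states of two residues. It is written under several assumptions: each residue's χ1 angle has already been binned into integer rotamer labels, the optional `weights` argument (for example, MSM equilibrium populations of the states the samples come from) is a hypothetical convenience, and the formula is the textbook definition rather than a transcription of the paper's Eq. 1.

```python
import numpy as np

def mutual_information(x, y, weights=None):
    """Mutual information (in nats) between two discrete label sequences."""
    x = np.asarray(x)
    y = np.asarray(y)
    w = np.ones(len(x)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    np.add.at(joint, (xi, yi), w)           # weighted joint distribution p(x, y)
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = joint > 0                          # skip empty cells, where 0 * log 0 = 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))
```

Two residues whose rotamers always change together give the maximum value (ln 2 for two equally populated rotamers), while independent residues give zero. A matrix of such pairwise values is the kind of similarity matrix that spectral clustering would then partition into communities.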
Applying this procedure to our model for ligand-free β-lactamase reveals a potential mechanism for allosteric communication: A community of coupled residues encompassing the allosteric site and a substantial portion of the active site that may be altered upon ligand binding (Fig. 4). Specifically, one of our clusters contains all residues within 3 Å of the cryptic ligand in the holo structure and 7 out of the 15 residues in the active site with distinguishable rotameric states. Arg244, the active site residue that displays the greatest change between the apo and holo structures, is one of the active site residues in this community. Therefore, we suggest that the known cryptic site in β-lactamase is opening and closing in solution but that this has little to no effect on the protein’s activity. Once a ligand binds this site, however, it alters the rotameric states of neighboring residues that are part of a cooperative community. This change is quickly propagated to other members of this cooperative community, including Arg244 and other active site residues. Thus, binding at the allosteric site alters the active site structure and, ultimately, inhibits enzymatic activity.
Fig. 4.
A structure highlighting the community of coupled residues encompassing the known cryptic allosteric site. Side chains in this community are shown as sticks and are colored green if they are in the active site, cyan if they are in the allosteric site (i.e., within 3 Å of the cryptic ligand in the holo structure), and blue otherwise. The backbone is colored from blue to red starting at the N terminus.
Coupling between rotameric states can communicate information over large distances, as seen in studies of coupling between local folding and unfolding events (9, 50) and a simple model for sidechain variability in proteins (15). For example, Fig. 4 shows that the residues in the community encompassing the known cryptic site span a large portion of β-lactamase. While many of the residues form contiguous groups—as seen in other systems (17)—there are large structural separations among others. For these discontinuous groups of residues, communication is likely achieved through an ensemble of pathways, none of which have strong enough pairwise correlations to appear in this analysis. A simple model that exhibits similar behavior despite only including local interactions supports this conclusion (15). Long-range interactions like electrostatics that are included in our model and energetic correlations that do not require a pathway may also contribute.
Our model for β-lactamase with its cryptic ligand further supports the mechanism inferred from fluctuations of the apo protein. First, the small molecule does alter the rotameric states of nearby residues (Fig. S6). The pattern of coupling between residues is also mostly conserved, demonstrating that our simulations have converged to local equilibrium within the native state. The Pearson correlation coefficient between mutual information values obtained from our models with and without the ligand is 0.55, showing that the degree of coupling differs slightly but that the catalog of strongly coupled residues is mostly conserved. Finally, this coupling alters the rotameric states of a subset of active site residues (Fig. S6).

A Single Protein Contains a Multitude of Prospective Cryptic Allosteric Sites.

An important next step is to investigate how common cryptic allosteric sites are. Typically, allosteric sites, and especially the cryptic variety, are assumed to be rare. However, the serendipitous discovery of the known cryptic site investigated here (2) and a variety of others (51–54) suggest that they may actually be quite common. Previous computational work focusing on transient pockets has even hinted that a single protein could conceivably contain multiple cryptic sites (13, 20), but this argument would be significantly strengthened by demonstrating that there is also coupling to the active site since both elements are needed for a cryptic allosteric site. Finally, the mechanisms of many drugs remain unknown, so elucidating their modes of action may reveal yet more cryptic allosteric sites.
We first investigated whether β-lactamase could contain more than one cryptic site. We can rationalize the accidental discovery of the known cryptic site in terms of it being the most common transient pocket and the surrounding residues having some of the greatest coupling to the active site. Other pockets may be open less frequently and have weaker coupling to the active site, yet still serve as viable cryptic sites. Our success with identifying and understanding the known cryptic site in β-lactamase suggests that the methods we present here can predict whether there are additional sites.
Indeed, we have identified a number of other transient pockets in β-lactamase that may be viable drug targets. Some of these pockets may be detectable with techniques like NMR, which has revealed partially closed conformations with equilibrium populations as low as approximately 5% (55). Based on this precedent, we can estimate that pockets in β-lactamase that are open more than 5% of the time at equilibrium may be visible with techniques like NMR. With this assumption, there are numerous pockets that are open sufficiently often to be detectable (Fig. S7). For example, the 50 pockets that are most likely to be open are all accessible more than 10% of the time. They are also distributed across the protein (Fig. 5A), and therefore, there are numerous distinct sites. While harder to detect, many of the pockets that are open less often may still be viable drug targets. For example, a pocket that is open 1% of the time is only about 1 kcal/mol less stable than one that is open 10% of the time. This difference could easily be overcome by binding a sufficiently high-affinity ligand.
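The free energy comparison between a pocket that is open 1% of the time and one that is open 10% of the time follows from the Boltzmann relation between populations and free energies. The sketch below assumes the simple form ΔG = −RT ln(p₁/p₂) at the 300 K simulation temperature; the function name is hypothetical, and the expression ignores the small correction from the closed-state populations.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)

def free_energy_difference(p1, p2, temperature=300.0):
    """Free energy of the state with population p1 relative to p2, in kcal/mol."""
    return -R_KCAL_PER_MOL_K * temperature * math.log(p1 / p2)

# Pocket open 1% of the time versus one open 10% of the time.
ddg = free_energy_difference(0.01, 0.10)
```

Under these assumptions the difference evaluates to roughly 1.4 kcal/mol, in line with the order-of-magnitude estimate in the text.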
Fig. 5.
A multitude of potential cryptic allosteric sites in (A) β-lactamase, (B) IL-2, and (C) RNase H. The 50 most frequently open pockets are shown as yellow spheres, each representing a pocket with a radius of up to 5 Å (SI Text). Sidechains within each coupled community are rendered in the same color. They are depicted as spheres if they are in the active site and as sticks otherwise. There are five communities for β-lactamase, six for IL-2, and four for RNase H. Almost every community contains at least one active site residue in each protein, so the vast majority of transient pockets could serve as cryptic sites. The light-blue, dashed circles encompass the known allosteric sites in β-lactamase and IL-2 and the backbone of each protein is colored from blue to red starting at the N terminus.
Almost all of these transient pockets could exert allosteric control over activity. Four out of five of the communities of coupled residues identified here contain portions of the active site. Therefore, a small molecule that targeted any of these communities could alter the conformational ensemble in a manner that affects function. Together, these communities encompass 212 out of the 263 residues in β-lactamase, so it is not surprising that almost every transient pocket is adjacent to a residue in at least one of these communities. Interestingly, 28 of the 50 most frequently open pockets could exert allosteric control through the same community that encompasses the known cryptic site. These pockets may be particularly attractive targets because we already know this community can exert allosteric control over the protein’s activity. Many of the pockets may also exert allosteric control to varying extents through multiple communities due to interactions with multiple clusters, as well as interactions among the communities. In fact, 48 out of the 50 most accessible pockets are adjacent to residues from multiple communities and, therefore, are particularly likely to exert allosteric control through multiple routes. Other transient pockets actually coincide with the active site, so they could be valuable for the design of competitive inhibitors that extend into these pockets to increase the drug’s affinity.
Based on these observations, a reasonable starting point for experimentally screening for cryptic sites is to target the transient pockets that are open most often and are adjacent to communities of residues with coupling to the active site. These sites should be the most accessible for ligand binding and, therefore, the most promising for rational drug design. Furthermore, the frequency with which these pockets are open in our simulations lends confidence to our model-based conclusions. Besides the known cryptic site, some of the best candidates in β-lactamase are found between Gly238 and Met272, between Gly245 and Ile279, between Ala42 and Phe66, and between Leu81 and Ala202. However, we note that our results also demonstrate an important role for induced fit, so starting with the pockets that are open most often may not be the optimal search strategy, as it assumes conformational selection is dominant. Feedback from experiments could allow us to develop a more efficient search strategy in this case.
To test the generality of our β-lactamase results, we applied our methods to two other proteins with different folds: IL-2 and RNase H. IL-2 was chosen because it plays an important role in regulating immune responses. Its relevance to human health has led to many experimental and theoretical investigations, including the discovery and characterization of a cryptic allosteric site coupled to a competitive site at IL-2’s binding interface for its α-receptor (IL-2Rα). RNase H is a similarly well-studied system because an RNase H domain in HIV reverse transcriptase plays a crucial role in HIV infection. There are no known drugs targeting this domain, so the discovery of cryptic sites could render it druggable for the first time. Both proteins are also known to have conformational heterogeneity in their native states (56, 57) that could result in cryptic allosteric sites.
Our results for IL-2 are consistent with those from β-lactamase. First, there are transient pockets coinciding with the known allosteric site (Fig. 5B), amongst others. Moreover, these pockets are connected by a community of coupled residues (magenta sidechains in Fig. 5B). Coupling between these sites was also observed in a previous simulation study (12). One important addition made in this work is that our more extensive sampling allows us to find many more residues with statistically significant coupling to the competitive site—which we will treat as the active site for this system. Residues from the blue and cyan communities are also adjacent to the competitive site. Together, these communities encompass the vast majority of the protein (98 of 128 residues). Therefore, as in β-lactamase, there are many transient pockets that could serve as cryptic sites exerting allosteric control over the competitive site. Some of the best candidate cryptic allosteric sites lie roughly between Leu53 and Lys97, Met23 and Leu85, and Lys48 and Ile89.
RNase H also contains a multitude of potential cryptic sites. First, there are numerous transient pockets spread across the surface of the protein (Fig. 5C). Moreover, there are two large communities of coupled residues (blue and green residues in Fig. 5C) that contain portions of the active site. Together, these communities encompass 90 out of 155 residues, a remarkably large portion of the protein given that RNase H contains 32 Gly and Ala residues that cannot participate in sidechain coupling because they have indistinguishable rotameric states. Again, many of the transient pockets are adjacent to at least one of these communities and, therefore, could serve as cryptic allosteric sites. Some of the most promising candidates lie roughly between His62 and Gln113, Tyr28 and Leu59, and Lys86 and Leu111.

Conclusions

The primary result of this work is that single proteins contain a multitude of potential cryptic allosteric sites that could greatly expand the repertoire of available drug targets. This insight was made possible by (i) recognizing that cryptic allosteric sites require both the formation of a transient binding pocket and coupling to the active site and (ii) developing methods to detect these elements in the equilibrium fluctuations of a folded protein even in the absence of any ligand. Our methods achieve this by combining MSMs built from extensive molecular dynamics simulations with quantitative statistical measures like the mutual information. We first validated this approach on a known cryptic site in β-lactamase and then demonstrated that this single protein contains numerous other potential cryptic allosteric sites. Application of these methods to two other systems—IL-2 and RNase H—supports the generality of these conclusions. In addition to discovering that cryptic allosteric sites may be even more ubiquitous than previously thought, we also found that ligand binding to these sites occurs via a combination of conformational selection and induced fit. In particular, ligands can bind to highly open proteins and then stabilize more closed, bound conformations. Therefore, it may be fruitful to dock against these more open conformations. In addition, allosteric signals can be transmitted over large distances via coupling between amino acid sidechains even in the absence of substantial backbone motions. Based on these results, our approach should be a powerful means of providing further understanding of cryptic allosteric sites and guiding experimental efforts to identify new sites.
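The two-part criterion in (i) can be illustrated with a small filter over detected pockets. This is only a sketch under simplifying assumptions, not the procedure used in this work: the function name, the `min_overlap` parameter, and the toy residue numbers are hypothetical, and "adjacent to a community" is approximated here as "lined by at least one residue from a community coupled to the active site".

```python
def candidate_cryptic_sites(pockets, coupled_residues, min_overlap=1):
    """Return names of pockets lined by at least `min_overlap` coupled residues.

    pockets          -- dict mapping a pocket name to the residue numbers lining it
    coupled_residues -- residue numbers in communities coupled to the active site
    """
    coupled = set(coupled_residues)
    return sorted(
        name
        for name, lining in pockets.items()
        if len(coupled & set(lining)) >= min_overlap
    )


# Toy input (made-up residue numbers, for illustration only).
toy_pockets = {"pocket_A": [12, 13, 47], "pocket_B": [80, 81, 82]}
toy_coupled = [13, 47, 101]
print(candidate_cryptic_sites(toy_pockets, toy_coupled))
```

A pocket that forms transiently but has no lining residue in a coupled community is dropped, reflecting the requirement that both elements be present.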

Materials and Methods

Molecular dynamics simulations were run at 300 K with the GROMACS software package (58) deployed on the Folding@home distributed computing environment (59). Atomic interactions were described with the Amber03 force field (60) and the TIP3P explicit solvent model. MSMBuilder was used to construct MSMs from these data (34, 35). LIGSITE (36) was used to identify pockets in representative structures for each state, and the mutual information was used to quantify correlations between the rotameric states of pairs of residues. The mutual information (MI) between two residues is
$$I(X,Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log\left(\frac{p(x,y)}{p(x)\,p(y)}\right) \qquad [1]$$
where p(x,y) is the joint probability distribution function of residues X and Y, and p(x) is the marginal probability distribution function of residue X. For our particular application, x ∈ X denotes one potential rotameric state of residue X. Protein structures were visualized with PyMOL (61). More details are available in SI Text.
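The mutual information defined above can be estimated from discrete rotameric-state assignments with a few lines of NumPy. The following is a minimal sketch, not the implementation used in this work: the function name and the optional `weights` argument are assumptions (the weights could, for example, be equilibrium populations of the MSM state each structure was drawn from), and the natural logarithm is used, so the result is in nats.

```python
import numpy as np


def mutual_information(x_states, y_states, weights=None):
    """Estimate MI (in nats) between two sequences of discrete rotameric states.

    x_states, y_states -- equal-length sequences of state labels, one per structure
    weights            -- optional per-structure weights (hypothetical; uniform if None)
    """
    x_states = np.asarray(x_states)
    y_states = np.asarray(y_states)
    if weights is None:
        weights = np.ones(len(x_states))
    weights = np.asarray(weights, dtype=float)

    # Map arbitrary state labels to integer indices 0..n-1.
    _, xi = np.unique(x_states, return_inverse=True)
    _, yi = np.unique(y_states, return_inverse=True)

    # Weighted joint histogram, normalized to the joint distribution p(x, y).
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), weights)
    joint /= joint.sum()

    # Marginals p(x) and p(y), kept 2-D so their product is an outer product.
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)

    # Sum only over cells with nonzero probability (0 * log 0 is taken as 0).
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


# Example with made-up rotamer labels for two residues.
chi_a = ["g+", "t", "g+", "t", "g-", "g-"]
chi_b = ["t", "g-", "t", "g-", "g+", "g+"]
print(mutual_information(chi_a, chi_b))
```

Estimates of this kind are biased upward for finite samples, so in practice some significance test or correction would be needed before calling a pair of residues coupled; the details used here are given in SI Text.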

Acknowledgments

Thanks to Veena Thomas for helpful insights into β-lactamase. G.R.B. was funded by the Miller Institute. Computing resources were provided by users of the Folding@home distributed computing environment and National Institutes of Health Grant R01-GM062868, courtesy of Vijay Pande.

References

1
AL Hopkins, CR Groom, The druggable genome. Nat Rev Drug Discov 1, 727–730 (2002).
2
JR Horn, BK Shoichet, Allosteric inhibition through core disruption. J Mol Biol 336, 1283–1291 (2004).
3
JA Hardy, JA Wells, Searching for new allosteric sites in enzymes. Curr Opin Struct Biol 14, 706–715 (2004).
4
DF Ceccarelli, et al., An allosteric inhibitor of the human Cdc34 ubiquitin-conjugating enzyme. Cell 145, 1075–1087 (2011).
5
MA Arkin, et al., Binding of small molecules to an adaptive protein-protein interface. Proc Natl Acad Sci USA 100, 1603–1608 (2003).
6
JD Sadowsky, et al., Turning a protein kinase on or off from a single allosteric site via disulfide trapping. Proc Natl Acad Sci USA 108, 6056–6061 (2011).
7
DA Erlanson, et al., Site-directed ligand discovery. Proc Natl Acad Sci USA 97, 9367–9372 (2000).
8
VJ Hilser, An ensemble view of allostery. Science 327, 653–654 (2010).
9
JO Wrabl, et al., The role of protein conformational fluctuations in allostery, function, and evolution. Biophys Chem 159, 129–141 (2011).
10
GR Bowman, X Huang, VS Pande, Network models for molecular kinetics and their initial applications to human health. Cell Res 20, 622–630 (2010).
11
G Toth, K Mukhyala, JA Wells, Computational approach to site-directed ligand discovery. Proteins 68, 551–560 (2007).
12
CL McClendon, G Friedland, DL Mobley, H Amirkhani, MP Jacobson, Quantifying correlations between allosteric sites in thermodynamic ensembles. J Chem Theory Comput 5, 2486–2502 (2009).
13
S Eyrisch, V Helms, Transient pockets on protein surfaces involved in protein-protein interaction. J Med Chem 50, 3457–3464 (2007).
14
VJ Hilser, D Dowdy, TG Oas, E Freire, The structural distribution of cooperative interactions in proteins: Analysis of the native state ensemble. Proc Natl Acad Sci USA 95, 9903–9908 (1998).
15
KH Dubay, JP Bothma, PL Geissler, Long-range intra-protein communication can be transmitted by correlated side-chain fluctuations alone. PLoS Comput Biol 7, e1002168 (2011).
16
MP Liang, DR Banatao, TE Klein, DL Brutlag, RB Altman, WebFEATURE: An interactive web tool for identifying and visualizing functional sites on macromolecular structures. Nucleic Acids Res 31, 3324–3327 (2003).
17
SW Lockless, R Ranganathan, Evolutionarily conserved pathways of energetic connectivity in protein families. Science 286, 295–299 (1999).
18
BK Ho, DA Agard, Probing the flexibility of large conformational changes in protein structures through local perturbations. PLoS Comput Biol 5, e1000343 (2009).
19
N Ota, DA Agard, Intramolecular signaling pathways revealed by modeling anisotropic thermal diffusion. J Mol Biol 351, 345–354 (2005).
20
BJ Grant, et al., Novel allosteric sites on Ras for lead generation. PLoS One 6, e25711 (2011).
21
P Schmidtke, A Bidon-Chanal, FJ Luque, X Barril, MDpocket: Open-source cavity detection and characterization on molecular dynamics trajectories. Bioinformatics 27, 3276–3285 (2011).
22
T Frembgen-Kesner, AH Elcock, Computational sampling of a cryptic drug binding site in a protein receptor: Explicit solvent molecular dynamics and inhibitor docking to p38 MAP kinase. J Mol Biol 359, 202–214 (2006).
23
X Wang, G Minasov, BK Shoichet, Evolution of an antibiotic resistance enzyme constrained by stability and activity trade-offs. J Mol Biol 320, 85–95 (2002).
24
F Noe, S Fischer, Transition networks for modeling the kinetics of conformational change in macromolecules. Curr Opin Struct Biol 18, 154–162 (2008).
25
GR Bowman, VS Pande, Protein folded states are kinetic hubs. Proc Natl Acad Sci USA 107, 10890–10895 (2010).
26
F Noe, C Schutte, E Vanden-Eijnden, L Reich, TR Weikl, Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations. Proc Natl Acad Sci USA 106, 19011–19016 (2009).
27
DA Silva, GR Bowman, A Sosa-Peinado, X Huang, A role for both conformational selection and induced fit in ligand binding by the LAO protein. PLoS Comput Biol 7, e1002054 (2011).
28
I Buch, T Giorgino, G De Fabritiis, Complete reconstruction of an enzyme-inhibitor binding process by molecular dynamics simulations. Proc Natl Acad Sci USA 108, 10184–10189 (2011).
29
M Held, P Metzner, JH Prinz, F Noe, Mechanisms of protein-ligand association and its modulation by protein mutations. Biophys J 100, 701–710 (2011).
30
F Morcos, et al., Modeling conformational ensembles of slow functional motions in Pin1-WW. PLoS Comput Biol 6, e1001015 (2010).
31
S Yang, NK Banavali, B Roux, Mapping the conformational transition in Src activation by cumulating the information from multiple molecular dynamics trajectories. Proc Natl Acad Sci USA 106, 3776–3781 (2009).
32
TJ Lane, GR Bowman, K Beauchamp, VA Voelz, VS Pande, Markov state model reveals folding and functional dynamics in ultra-long MD trajectories. J Am Chem Soc 133, 18413–18419 (2011).
33
GR Bowman, VA Voelz, VS Pande, Atomistic folding simulations of the five helix bundle protein λ6–85. J Am Chem Soc 133, 664–667 (2011).
34
GR Bowman, X Huang, VS Pande, Using generalized ensemble simulations and Markov state models to identify conformational states. Methods 49, 197–201 (2009).
35
KA Beauchamp, et al., MSMBuilder2: Modeling conformational dynamics on the picosecond to millisecond scale. J Chem Theory Comput 7, 3412–3419 (2011).
36
M Hendlich, F Rippmann, G Barnickel, LIGSITE: Automatic and efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Modell 15, 359–389 (1997).
37
T Wlodarski, B Zagrovic, Conformational selection and induced fit mechanism underlie specificity in noncovalent interactions with ubiquitin. Proc Natl Acad Sci USA 106, 19346–19351 (2009).
38
D Bucher, BJ Grant, JA McCammon, Induced fit or conformational selection? The role of the semi-closed state in the maltose binding protein. Biochemistry 50, 10530–10539 (2011).
39
GG Hammes, YC Chang, TG Oas, Conformational selection or induced fit: A flux description of reaction mechanism. Proc Natl Acad Sci USA 106, 13737–13741 (2009).
40
AY Lau, B Roux, The hidden energetics of ligand binding and activation in a glutamate receptor. Nat Struct Mol Biol 18, 283–287 (2011).
41
M D’Abramo, O Rabal, J Oyarzabal, FL Gervasio, Conformational selection versus induced fit in kinases: The case of PI3K-gamma. Angew Chem Int Ed Engl 51, 642–646 (2012).
42
P Csermely, R Palotai, R Nussinov, Induced fit, conformational selection and independent dynamic segments: An extended view of binding events. Trends Biochem Sci 35, 539–546 (2010).
43
A Berezhkovskii, G Hummer, A Szabo, Reactive flux and folding pathways in network models of coarse-grained protein dynamics. J Chem Phys 130, 205102 (2009).
44
MA Young, S Gonfloni, G Superti-Furga, B Roux, J Kuriyan, Dynamic coupling between the SH2 and SH3 domains of c-Src and Hck underlies their inactivation by C-terminal tyrosine phosphorylation. Cell 105, 115–126 (2001).
45
NC Strynadka, et al., Molecular structure of the acyl-enzyme intermediate in beta-lactam hydrolysis at 1.7 Å resolution. Nature 359, 700–705 (1992).
46
R Brandman, JN Lampe, Y Brandman, PR de Montellano, Active-site residues move independently from the rest of the protein in a 200 ns molecular dynamics simulation of cytochrome P450 CYP119. Arch Biochem Biophys 509, 127–132 (2011).
47
KH DuBay, PL Geissler, Calculation of proteins’ total side-chain torsional entropy and its influence on protein-ligand interactions. J Mol Biol 391, 484–497 (2009).
48
PT Lang, et al., Automated electron-density sampling reveals widespread conformational polymorphism in proteins. Protein Sci 19, 1420–1431 (2010).
49
T Lenaerts, et al., Quantifying information transfer by protein domains: Analysis of the Fyn SH2 domain structure. BMC Struct Biol 8, 43 (2008).
50
H Pan, JC Lee, VJ Hilser, Binding sites in Escherichia coli dihydrofolate reductase communicate by modulating the conformational ensemble. Proc Natl Acad Sci USA 97, 12020–12025 (2000).
51
JR Schames, et al., Discovery of a novel binding trench in HIV integrase. J Med Chem 47, 1879–1881 (2004).
52
JK Grimsley, B Calamini, JR Wild, AD Mesecar, Structural and mutational studies of organophosphorus hydrolase reveal a cryptic and functional allosteric-binding site. Arch Biochem Biophys 442, 169–179 (2005).
53
T Lundqvist, et al., Exploitation of structural and regulatory diversity in glutamate racemases. Nature 447, 817–822 (2007).
54
CL Gee, et al., Enzyme adaptation to inhibitor binding: A cryptic binding site in phenylethanolamine N-methyltransferase. J Med Chem 50, 4845–4853 (2007).
55
C Tang, CD Schwieters, GM Clore, Open-to-closed transition in apo maltose-binding protein observed by paramagnetic NMR. Nature 449, 1078–1082 (2007).
56
R Bernstein, KL Schmidt, PB Harbury, S Marqusee, Structural and kinetic mapping of side-chain exposure onto the protein energy landscape. Proc Natl Acad Sci USA 108, 10532–10537 (2011).
57
AM Levin, et al., Exploiting a natural conformational switch to engineer an interleukin-2 ‘superkine’. Nature 484, 529–533 (2012).
58
B Hess, C Kutzner, D van der Spoel, E Lindahl, GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory Comput 4, 435–447 (2008).
59
M Shirts, VS Pande, COMPUTING: Screen savers of the world unite! Science 290, 1903–1904 (2000).
60
JM Wang, P Cieplak, PA Kollman, How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem 21, 1049–1074 (2000).
61
WL DeLano, The PyMOL Molecular Graphics System, Version 1.5.0.3 (Schrödinger, LLC, 2002).

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 109 | No. 29
July 17, 2012
PubMed: 22753506

Submission history

Published online: July 2, 2012
Published in issue: July 17, 2012

Keywords

  1. molecular dynamics
  2. native state dynamics

Notes

This article is a PNAS Direct Submission.

Authors

Affiliations

Gregory R. Bowman1 [email protected]
Departments of Molecular and Cell Biology and Chemistry, University of California, Berkeley, CA 94720
Phillip L. Geissler
Department of Chemistry, University of California, Berkeley, CA 94720; and Physical Biosciences Division, Lawrence Berkeley National Lab, Berkeley, CA 94720

Notes

1
To whom correspondence should be addressed. E-mail: [email protected].
Author contributions: G.R.B. and P.L.G. designed research; G.R.B. performed research; G.R.B. contributed new reagents/analytic tools; G.R.B. analyzed data; and G.R.B. and P.L.G. wrote the paper.

Competing Interests

The authors declare no conflict of interest.
