Effects of frustration on the kinetics of helix formation in alanine polypeptides

The folding kinetics of a seven-residue long alanine polypeptide are investigated using a fully atomic protein model and molecular dynamics simulations. The peptide adopts helical conformations in the native state when simulated in two different implicit solvents: a model with a distance dependent dielectric constant and the generalized Born (GB) model. Although the two solvation models correctly predict the native state of the peptide, they lead to distinct folding kinetics. The model in which the dielectric constant is distance-dependent produces much broader distributions of the first passage time, as measured by the intermittency ratio, than the generalized Born model. In addition, the conformational diffusion search folding mechanism, introduced earlier to study the folding kinetics of helical peptides in explicit solvent, is seen to describe more closely the folding kinetics of the GB model than of the distance-dependent dielectric constant model. Based on these observations, we argue that the implicit solvent model that treats the dielectric constant as dependent on interatomic distances introduces excessive amounts of frustration into the underlying potential energy landscape and is thus far less suitable for modelling the kinetics of protein folding than the more sophisticated GB implicit solvent.


Introduction
Proteins are in general only capable of carrying out their biological functions once they have adopted their specific folded native state.Predicting which conformations are populated in the native state by a given protein under physiological conditions is an important aspect of the protein folding problem.However, in order to be functional, proteins need not only to reach their thermodynamic ground state, but they must do so quickly, on biologically relevant time scales.How rapidly a given protein can fold constitutes the second, equally important, component of the folding problem.
In recent years, considerable experimental efforts [1][2][3][4] have been directed toward studying the kinetics of protein folding kinetics focusing in particular on determining the upper speed limit of folding reactions.From a theoretical perspective, the folding kinetics of a given model are intimately linked to the underlying potential energy surface (PES).The topology of the energy surface is key to defining the nature of the resulting kinetics.For instance, potential energy surfaces with multiple local minima (frustrated surfaces), give rise to slow relaxation phenomena since the minima can act as kinetic traps that slow down the folding reaction.The observation that most proteins reach their native states on relatively short time scales led to the development of the principle of minimum frustration [5,6].This principle postulates that the amino-acid sequences of proteins, in contrast to those of random heteropolymers, have been evolutionary selected in such a way that conflicts among different types of interactions existing in a protein are relieved (or minimized) in the native state conformation.In other words, the native state conformation serves as the global minimum for all the interactions occuring in a protein.As a consequence, the energy landscape of a fast-folding protein satisfying the principle of minimal frustration resembles a funnel with the native state located at the bottom.The degree of frustration present in an energy landscape of a model protein, or how strongly this energy landscape deviates from the prototypical funnel image, can be used to gauge the success of the model at reproducing the properties of real proteins.
Among other methods, the amount of frustration in an energy surface can be quantified by the thermodynamic ratio R of the folding temperature T f to the glass transition temperature T g .Models with R > 1 are classified as fast folders while those with R < 1 are slow folders, characteristic of frustrated energy surfaces.Despite its utility as a parameter for quantitatively characterizing the degree of frustration of potential energy surfaces, the ratio R has not found widespread applications due to numerical difficulties associated with evaluating T g .Only very recently has it become possible to evaluate this quantity from computer simulations of glass-forming liquids [7][8][9] and minimalist protein models [10].To the best of our knowledge, however, the computational burden associated with these types of simulations has prevented systematic studies of T g for more realistic, fully atomic models of proteins.
Another approach that can be used to determine the amount of frustration in a protein model is to study its folding kinetics.Most short peptides and proteins (up to 100 residues long) follow the rule of the thumb that the folding kinetics are single exponential [11].There are however exceptions to this rule, most notably when the protein is considered at low temperatures.Non-exponential kinetics akin to those observed in glassy materials [12] have been reported for a number of proteins [13,14].The increasing broadening of the kinetic distributions at low temperatures is indicative of the underlying slow dynamical processes governed by trapping in local potential energy minima.As suggested by Wang on the basis of the random energy model [15], the kinetic distributions under such conditions may even lead to a Lévytype distribution [16], as a limiting case.The relationship between the amount of frustration on an energy landscape and the resulting kinetic distributions can serve as an important tool for characterizing protein models and evaluating their quality.Quantitatively, the intermittency ratio, or the ratio of higher moments of the folding time distribution functions to their first moment (mean value), can be used as a measure to carry out such characterizations with regard to the broadness of the distributions [17,15].
In the present work, the folding kinetics of an alanine heptapeptide are studied using an atomistic model of the peptide and molecular dynamics simulations.The kinetics simulations are carried out by using two implicit solvation methods.In one of them the presence of water molecules is modeled through distance-dependent dielectric constant e = r [18].For brevity we will refer to this model as the distance dependent solvent throughout the paper.The second solvation method employed is the generalized Born (GB) model [19,20].Both models take into account the electrostatic part of the solvation free energy and are widely used in the literature.By using the replica-exchange molecular dynamics method [21] we found that the ground state of the studied peptide is an α-helical conformation.This is in a good agreement with previous work on alanine polypeptides [22].In long canonical-ensemble simulations, repeated folding and unfolding events into and from the native state were observed and monitored as a function of temperature.The recorded folding times were compiled into distributions and analysed.The analysis of these distributions reveals that the two solvation models behave differently as far as the folding kinetics are concerned.The GB model exhibits narrow distributions of folding times that can be successfully fitted by bi-exponential expressions through the whole range of temperatures studied, including above and below the folding temperature T f .The intermittency ratios for the first three moments in the GB model never reach values higher than one hundred.The distance dependent solvent, on the other hand, produces single-or double-exponential kinetics at high temperatures only, i.e., temperatures at which the stability of the native state is negligible.At lower temperatures the distributions of the folding time broaden in shape and acquire broad tails indicating the onset of slow dynamics.In support of this observation, the intermittency ratios calculated for this model are seen to vary widely from a value of two at high temperatures to milions of bilions at lower T s.The qualitatively different types of folding kinetics observed for the two implicit solvation models led us to conclude that the models differ significantly in their degree of frustration.We argue that the GB model produces reduced amounts of frustration compared to its distance dependent counterpart and is thus a more suitable model for kinetics simulations of proteins and peptides.
In this work we also tested how well the conformational diffusion search mechanism of folding can describe the folding kinetics for the solvation models studied in this paper.Earlier work on explicit solvent simulations indicated that the conformational diffusion search mechanism adequately described the folding kinetics of alanine peptides [22].The conformational diffusion mechanism depicts folding as Brownian motion along a selected reaction coordinate [6,23].Folding within this model can be predicted theoretically from two elements a) the free energy profiles along the reaction coordinate and b) the configurational diffusion constant.Both of these quantities are readily obtainable from computer simulations [24].We used the root mean-square deviation (RMSD) of a given conformation from the helical native state as the progress reaction coordinate and evaluated the folding time theoretically for the two solvation models as a function of temperature.We find that the GB model produces much better agreement between the theoretical folding time and the folding time evaluated directly from computer simulations than does the distance dependent solvation model.For the distance dependent model it was not possible to estimate folding times theoretically at low temperatures at which the native state is expected to be stable.At higher temperatures where such estimates were successful, the theoretical folding time was found to be five to six times longer than the folding time computed directly from simulations.This contrasts with the simulations performed for the GB model which agree with up to a three-fold accuracy with the theoretical folding times for the whole range of temperatures studied.
The paper is organized as follows.In section 2 we describe the all-atom model of the polyalanine heptapeptide and provide details of the simulations.Our main results concerning the folding kinetics are presented in section 3 and conclusions in section 4.

Methods and models
In order to investigate the folding kinetics of short helical peptides we study an alanine heptapeptide as a model system.Polyalanines have been studied for years both experimentally [25] and theoretically [26][27][28][29]22] as a test bed system for the coil-to-helix transition.Here we consider a seven-residue long peptide with the amino acid sequence Ace-(Ala) 7 -Nme.We use the united atom CHARMM PARAM19 force field [30] in which some of the hydrogen atoms are absorbed into the nearest heavy atom and some are treated explicitly.No cutoff distances are applied for long-range van der Waals and Coulomb interactions.

Continuum solvation methods
The solvations effects in the present study are taken into account through two different models: a) the model in which the dielectric constant is assumed to be proportional to the interatomic distances = r [18] and b) the generalized Born solvation model [19,20].The use of implicit solvation methods is motivated by computational limitations: systematic simulations of peptides with explicit water molecules are numerically prohibitive [22].Implicit solvents present an important alternative for simulations of proteins, especially when folding kinetics are of interest.The theory of continuum solvation is now reaching the stage where direct comparisons of both folding thermodynamics and kinetics with experiments is becoming feasible [31].The models considered in this study fall into two extremes of the wide spectrum of implicit solvents available for protein simulations [32].On one extreme lies the distance-dependent dielectric model.The model is a computationally efficient, albeit simplistic in its way of accounting for solvation effects.It produced some encouraging results in the early days of molecular simulations [18] and due to its simplicity, has been widely used in the literature ever since [33,34].On the other extreme is the generalized Born (GB) model which belongs to the category of highly sophisticated continuum solvation methods.It was designed to accurately reproduce the electrostatic part of the solvation free energy [19,20] of large solute molecules in aqueous solutions.For a number of peptides and proteins the GB model has been shown to predict the native state with a high accuracy of RMSD = 3 Å or less [35].Parametrizations of the particular GB implementation used in the present study exist for organic molecules [36] as well as for proteins and nucleic acids [20].We employed the replica-exchange molecular dynamics method [21] in order to determine the native state of our peptide under the two solvation conditions.Specifically, we used the implementation of the method introduced by Brooks et al. [37].The simulations were carried out for 5 replicas taken at temperatures T = 200, 263, 346, 456 and 600 K over a 10 ns time period.As judged from the convergence of the potential energy and the RMS deviations from the helical state, sufficient equilibration occured over the course of the simulations allowing us to determine statistical averages as a function of temperature.We identified the native state of the peptide as the lowest energy conformation visited during the simulations.We found that this conformation is an α helix in both solvation models.A snapshot of the native state is shown in figure 1. Characteristic for α helices patterns of hydrogen bonds, formed between hydrogens of residue i and oxygens of residue i + 4, are readily seen in the figure.The energy of the native state within the distance-dependent dielectric model is −94.6 kCal/mol and that predicted by the GB solvation scheme is −291.5 kCal/mol.The values of the seven backbone (φ, ψ) dihedral angles of the native state differ slightly between two solvation models considered here but are consistent with the values obtained earlier in computer simulations of alanine peptides [22,28].These values are listed in table 1.
Table 1.The backbone dihedral angles (φ, ψ) for the native helical states of the alanine peptide studied here by using the solvation method that treats dielectric constant as distance dependent and the generalized Born model.

Details of kinetics simulations
The CHARMM macromolecular modeling package [38] was used to perform all molecular dynamics simulations.For the distance-dependent dielectric model six canonical-ensemble simulations at temperatures T =470, 500, 550, 600, 700, 800 K were carried out using the velocity Verlet algorithm and the Nosé-Hoover [39] thermostat method with the q friction parameter set to 50 ps.A set of eight simulations at temperatures T =250, 280, 320, 350, 400, 450, 500 and 600 K was performed for the GB model.The temperature ranges for both solvent conditions were chosen so that we could test the folding kinetics above and below folding temperature T f .From our replica-exchange calculations we determined T f to be 290 K for the GB solvation method.For the distance-dependent dielectric model the data did converge sufficiently to allow an accurate determination of the folding temperature.Therefore, we can only put an estimate for this temperature to be approximately 600 K.
In all our simulations, we employed an integration time step of δt = 1 fs.The time step was chosen such that the conformational statistics of the peptide did not change noticeably when shorter integration time steps were used.The length of the hydrogen-nitrogen bonds (the only bonds involving hydrogen atoms in the present system) was kept constant through the SHAKE algorithm [40].The lengths of the simulations performed depended on the temperature considered.The longest total simulation time was 7 µs, generated at the lowest temperature T = 470 K considered under the distance-dependent dielectric solvation.At the highest temperature T = 800 K the total physical time reached by the simulations was 0.15 µs.Simulations performed for the generalized Born solvent were all 1 µs long.During the simulations, peptide conformations were stored into a file at a regular time interval of ∆t = 400 time steps.We made sure, by running short test simulations, that the average unfolding time is longer than 400 time steps in order not to introduce a bias into the folding kinetics resulting from unaccounted folding events.

Kinetics of folding
Repeated folding and unfolding events were monitored during the course of the simulations using the RMS deviation from the native state as the progress variable of folding.Folding times were determined from the time required by the peptide to reach a conformation with RMS < 0.7 Å , when folding was initiated from a conformation with 3.6 < RMS < 4.2 Å.It should be noted that while an RMS < 1 Å is a standard definition for the native state in simulations of peptides and short proteins, there is no such a standard way of defining what exactly constitutes the unfolded state.Technically speaking, every conformation with RMS > 0.7 Å is unfolded.Unlike the native state which consists of a few similar conformations, the unfolded state can comprise a large set of diverse conformations.This set depends on the environment and is hence specific to the method of denaturation.As was pointed out earlier [41], it makes more sense to speak of ensembles of unfolded states in refolding experiments rather than of individual conformations.In the present work we chose to initiate the folding from conformations that are most frequently visited in simulations performed at high enough temperatures T > T f such that the peptide can be considered unfolded.Our unfolded state ensemble constructed in this manner thus can be viewed as an ensemble resulting from heat denaturation.The sequences of folding times recorded during the simulations were used to construct distributions of the folding time P f (τ ), as well as the probability distribution P (τ ) for the protein to remain unfolded at time τ , provided that the folding started at time zero (i.e. the survival probability).The latter is shown for a range of temperatures in figure 2(a) for the distance-dependent dielectric solvent and in figure 2(a) for the generalized Born model.Two features are immediately apparent from figure 2(a).Firstly, at the highest temperature of 800 K, the survival probability P (τ ) is determined by a single time scale and can be successfully fitted by a single-exponential function e −ατ .As the temperature decreases, the kinetics begin to split into two different time scales.The probability distributions become strongly non-exponential and can no longer be fitted by simple expressions such as a bi-exponential φe −ατ + (1 − φ)e −βτ or stretched exponential f e −(ατ ) β .Secondly, the range of the relaxation time scales increases almost by two orders of magnitude when the temperature is decreased from 800 to 470 K.The folding kinetics described for the distance-dependent dielectric model are to be compared with the folding kinetics obtained for the GB model, shown in figure 2(b).In contrast to the previous case, we do not see a splitting into two distinct time scales.The kinetics are narrow at all temperatures and can be well reproduced by the bi-exponential expression given above.It is clear from a visual inspection of figures 2(a) and (b) that the survival probabilities generated by the distance-dependent dielectric model experience considerably more pronounced broadening at reduced temperatures than in the case of the GB solvent.Folding events taking place at long times become increasingly more relevant at low temperature for the distance-dependent dielectric solvation model.A direct consequence of this are broad tails in the survival distribution functions.Wang and Wolynes [17,15] recently introduced intermittency ratios r n to quantitatively measure the broadness of statistical distributions.The analysis of these ratios can help us to unambiguously identify the type of the kinetics underlying a statistical model.r n are defined as the ratios of the n-th moment τ n of the folding time distribution function to its 1-st momemt τ n , or the mean folding time, raised to power n.Narrow statistical distributions, such as Gaussians or exponential distributions, have finite r n ratios on the order of 1. Broadening of the distributions is accompanied by rapid growth of r n .In the limit of power-law distributions, or socalled Lévy-flight distributions, the intermittency ratios r n diverge.In figure 3 we present r n up to the order 4, computed for the alanine heptapeptide using two different solvation methods.For visual convenience the curves are plotted against the scaled temperature T/T f where the scaling temperatures were determined in replica exchange simulations for each solvent model.It is apparent from the figure that the two solvation methods studied in this paper have quantitatively different folding kinetics.At temperatures T/T f > 1, where the native state of the peptide is unstable, the folding kinetics produced by both solvent models are narrow, characterized by r n on the order of 10.As the temperature is reduced, the distributions resulting from GB simulations remain narrow down to the lowest temperature studied.The distance-dependent dielectric constant model, in contrast, produces survival distributions that become increasingly broad as the temperature is lowered.At T ∼ T f where the peptide spends half of its time in the native state and the other half is unfolded, the intermittency ratio r 2 ∼ 100 is almost two orders of magnitude higher than the equivalent ratio obtained for the GB model.The ratios r n display trends towards divergence as the temperature is further reduced.

Conformational diffusion folding mechanism
Within the conformational diffusion model, the folding of a protein is viewed as diffusive motion in the phase space defined by a few selected reaction coordinates.The reaction coordinates are intended to capture the conformational dynamics around both folded and unfolded state ensembles and, more importantly, around the transition state ensemble.The diffusive motion of a protein takes place on the free energy surfaces which represent statistically averaged projections of all the remaining degrees of freedom onto the space of reaction coordinates.Within the conformational diffusion folding model, the time evolution of the probability distribution of a one-dimensional reaction coordinate χ is governed by the Smoluchowski equation [6,42].Solving this equation for the mean first passage time τ f , which we will refer to as the folding time, yields: where β = 1/k B T , D is the configurational diffusion coefficient, χ unf is the value of the reaction coordinate for conformations from unfolded states, and χ fol is this value for the folded ensemble conformations.The quantity U(χ) is the potential energy field in which the motion in the phase space of the reaction coordinate takes place.The potential energy is simply related to P (χ) the distribution function of the reaction coordinate U(χ) = −k B T log(P (χ)) and thus is readily accessible in computer simulations.In contrast, the conformational diffusion coefficient D in equation ( 1), is more difficult to evaluate numerically.As we noted earlier [24,43], a convenient way to do so is to consider the Brownian equation of motion for the dynamics of χ: Here F [χ(t)] and δR(t) are the regular and stochastic forces exerted on the degree of freedom χ respectively.The regular force results from the potential energy field and is given by the usual expression: The stochastic force δR in equation ( 2) is modeled as a Gaussian noise with zero mean and variance δR 2 = 2D.δR taken at a moment t is neither correlated with its value at any previous time nor is it correlated with the reaction coordinate χ at any time, including time t.By multiplying equation (2) by χ(0) and taking statistical average we can derive an equation for the time autocorrelation function Φ(t) = χ(t)χ(0) .Integration of this equation yields an expression for the diffusion constant: where we defined the time correlation function of the regular force at time t and the reaction coordinate at time zero as Ψ(t) = F [χ(t)]χ(0) .Equation ( 4) provides the means to compute configurational diffusion coefficient from the distribution function of the reaction coordinate P (χ) and time evolution of χ(t).
The above expressions were employed to compute the configurational diffusion coefficient and folding time τ f as a function of temperature for the helical heptapeptide using both solvation methods.The theoretical computations were done by choosing the RMS deviation from the helical native state over alpha carbons as the reaction coordinate of folding χ.In figure 4 the theoretical folding time is compared with the folding time computed directly from simulations.The unfolded state in our calculations (both kinetics simulations and theory) was defined as an ensemble of conformations with RMS ranging from χ unf = 3.6 to 4.2 Å.This choice of the unfolded ensemble is based on the observation (data not shown) that under denaturing conditions at high temperatures the conformations with RMS around 4 Å have the largest statistical weight and are sampled most frequently.Kinetics simulations performed in this work can thus be seen as emulating most closely biological conditions where folding is triggered by heat denaturation.In analyzing the folding trajectories, peptides were taken to be folded when the RMS deviation dropped below 0.7 Å.This same value of RMS deviation was taken to be the folded ensemble boundary χ fol in equation (1).  1) and directly from computer simulations for the alanine heptapeptide studied in this work within two continuum solvation methods: (a) the model that assumes that the dielectric constant is proportional to the interatomic distances and (b) the generalized Born implicit solvent.The GB model appears to produce much better agreement with experimental results than the distance-dependent dielectric constant method.
For simulations performed at the three lowest temperatures T = 470, 500 and 550 K for the distance-dependent dielectric solvation model, we were unable to compute the folding time τ theoretically because the time correlation function Ψ(t) would not converge.We attribute this failure to insufficient statistics in our simulations due to slow relaxation.We note that the above temperatures are found in the low-temperature region where folding time starts to rise again after a steep fall at higher temperatures.The conformational dynamics in this regime are dominated by trapping in local potential energy minima which contribute to long relaxation times.At temperatures where theoretical estimates of the folding time were possible we find that the folding time predicted by the conformational diffusion folding model is about 5 times longer than the folding time obtained from folding simulations.The performance of the theory versus simulation observed here is rather poor compared to how exact the same theory was seen to perform for other protein models [23,24].Folding times obtained in simulations when using the GB solvent model are shown in figure 4(b).In contrast to the distance-dependent dielectric solvation, we see much better agreement between theory and computer simulation.The folding times produced by equation ( 1) are still considerably longer than the true folding times of the model, but in this case the discrepancy is on the order of 3 which is almost two times smaller than the discrepancy obtained with the other solvation method studied in this work.

Conclusions
In this paper we studied the folding kinetics of an alanine heptapeptide using two continuum solvent models: a model which takes into account the electrostatic solvation energy through distance-dependent dielectric constant and the generalized Born model.We find signatures of distinct amounts of frustration underlying these models.The folding time distributions computed for the distance-dependent dielectric constant model are exponential at high temperatures above the folding temperature and acquire broad slowly-decaying tails when the temperature is lowered.The distributions generated for the GB model remain bi-exponential at all temperatures studied, both above and below the folding temperature.Quantitative measurements of the intermittency ratios at varying temperatures support the observation that the distance-dependent dielectric solvent model produces considerably broader folding time distributions than the GB solvation method.We argue that the distinct kinetics behavior of the solvation models is a direct consequence of the amounts of frustration present in their respective energy surfaces.As slow kinetics is a dynamic signature of excessive amounts of frustration we conclude that the distance-dependent dielectric model is more frustrated than its GB counterpart and is thus less suitable for simulations of folding kinetics.Our findings regarding the applicability of the conformational diffusion folding mechanism to the present models are consistent with this conclusion.We find that the folding time calculated theoretically using a diffusion-equation formalism is in much better agreement with the folding time obtained from simulations for the GB model than in the case of the distance-dependent dielectric model.

Figure 1 .
Figure 1.Native α-helical state of the polyalanine heptapeptide studied in the present work.

Figure 2 .
Figure 2. Probability for the peptide to remain unfolded at time τ provided that the folding started at time 0, for simulations performed in a) the distancedependent dielectric solvent and b) the generalized Born implicit solvent.

Figure 3 .
Figure 3. Intermittency ratios r n = τ n / τ n for n = 2, 3, 4 computed from the probability distribution functions of folding time.The data are shown for the two solvation methods studied in this work: open symbols -the solvation model in which the dielectric constant is taken to be proportional to the interatomic distances, solid symbols -the generalized Born model.For the sake of comparison the curves are plotted along temperatures normalized by the folding temperatures T f specific for each solvation method.

Figure 4 .
Figure 4. Folding time computed theoretically from equation (1) and directly from computer simulations for the alanine heptapeptide studied in this work within two continuum solvation methods: (a) the model that assumes that the dielectric constant is proportional to the interatomic distances and (b) the generalized Born implicit solvent.The GB model appears to produce much better agreement with experimental results than the distance-dependent dielectric constant method.