New optimized algorithms for molecular dynamics simulations

The method of molecular dynamics (MD) is a powerful tool for the prediction and investigation of various phenomena in physics, chemistry and biology. The development of efficient MD algorithms for integration of the equations of motion in classical and quantum many-body systems should therefore impact a lot of fields of fundamental research. In the present study it is shown that most of the existing MD integrators are far from being ideal and further significant improvement in the efficiency of the calculations can be reached. As a result, we propose new optimized algorithms which allow to reduce the numerical uncertainties to a minimum with the same overall computational costs. The optimization is performed within the well recognized decomposition approach and concerns the widely used symplectic Verlet-, Forest-Ruth-, Suzukias well as force-gradient-based schemes. It is concluded that the efficiency of the new algorithms can be achieved better with respect to the original integrators in factors from 3 to 1000 for orders from 2 to 12. This conclusion is confirmed in our MD simulations of a Lennard-Jones fluid for a particular case of secondand fourth-order integration schemes.


Introduction
Modelling various physical and chemical processes in molecular dynamics (MD) simulations, we come to the necessity of integrating the equations of motion for a many-body system of interacting particles.A lot of numerical algorithms have been devised and implemented over the years to perform such an integration.The traditional high-order explicit Runge-Kutta (RK) and implicit predictor-corrector (PC) schemes [1,2] were applied in early investigations.It was soon realized that the extra orders obtained in these schemes are not relevant since the truncation errors accumulate too rapidly on MD scales of time [3].This high instability restricts the application of RK and PC integrators in long-term MD simulations to very small time steps only, and, thus, reduces significantly the efficiency of the computations.In addition, the RK and PC algorithms produce solutions which, unlike exact phase trajectories, are neither symplectic nor time reversible.
In 1990, a new approach to the integration of motion in classical and quantum systems was proposed [4,5].Within this approach the exponential operator of time evolution is decomposed into analytically solvable parts.As a result, the symplectic map of the flow of particles and time reversibility can be conserved exactly, despite an approximate character of the produced trajectories.The preservation of symplecticity and reversibility appears to be very important because, as is now well established, this closely relates to the stability of an algorithm [6].The stability means that the numerical errors arising during the decomposition integration are bounded even for large sizes of the time step.This is contrary to the RK and PC schemes, where the truncation uncertainties increase linearly with the increase of the time of integration [3,[7][8][9][10].The decomposition algorithms are, therefore, ideal for long-duration evaluations in MD [6] and in astrophysical [11] simulations.
The construction of MD algorithms within the decomposition approach has been the subject of many investigations [4,5,7,[12][13][14][15][16][17][18][19][20].However, much of the previous work has been of a heuristic nature.Moreover, any freedom in the choice of schemes has been eliminated by restricting attention to particular schemes, mainly to those which lead to a minimal number of decomposition stages at a given order.The most notorious example is the widely used Verlet algorithm [21,22], which corresponds to a three-stages decomposition scheme of the second order.The fourth-order algorithm by Forest and Ruth [5] relates to a scheme with seven single-exponential stages.The sixth-order propagations are available [13,16] beginning from such fifteen stages.Higher-order integrators can be obtained by composing schemes of lower orders [4,12,14,16,[23][24][25][26].
Recently, a more consequent analysis of the exponential factorization process has allowed to extend the family of analytically integrable decomposition algorithms by new members from the so-called force-gradient class [15,17,18].But again, as in the non-gradient case, the study has been restricted to particular solutions, actually to the fourth order only.For example, Chin [18] introduced a set of five-and sevenstages of such force-gradient solutions.In order to derive the algorithms of higher orders, it has been proposed to apply the crude iterative procedure [27].Advantages of the force-gradient integrators over their standard non-gradient (including Verlet and Forest-Ruth) counterparts have been demonstrated in celestial [18,27], stochastic [28,29], and quantum [30] dynamics simulations.
The question of how to improve the efficiency of the decomposition integration for atomic systems with long-range interactions has also been considered.As a result, the so-called multiple time scale integrators have been introduced [22,31].In these integrators, the slow subdynamics is treated in a specific way using larger step sizes in view of the weakness of long-range forces.The faster motion, caused by the interactions at short interparticle distances, remains to be integrated using the original decomposition algorithms, such as the Verlet integrator.In addition, it has been shown that the decomposition approach can be adapted to integrate not only the translational motion in atomic liquids, but also to simulate more complicated molecular and spin systems at the presence of orientational degrees of freedom [7,19,20].
Despite these advances, there has been very little work done on the systematic construction of decomposition integrators.Some results in this direction have been obtained by Li [16], but he restricted ourselves to the most obvious solutions.In addition, his consideration was limited to the non-gradient class exclusively, omitting the case of force-gradient algorithms.Note that a complete systematization of decomposition integrators presents not only a methodological interest but may have an important practical significance, because the new solutions found can be more efficient than the already known ones.In particular, as has been shown quite recently [32][33][34], the decomposition propagators with minimal number of stages do not necessarily lead to an optimal performance.
The primary goal of this work is to provide a comprehensive study of a whole family of symplectic self-adjoint decomposition integrators.Note that the self-adjointness should be fulfilled to reproduce the property of time reversibility in the solutions for non-dissipative systems (in the case of stochastic dynamics the same is needed to recover the principle of detailed balance).In this context, we should mention another family of symplectic self-adjoint schemes developed within an extended Runge-Kutta approach [35].These schemes, however, are implicit and require the systems of nonlinear equations to be solved by expensive iterations at each step of the integration process.Since such equations cannot be solved exactly (because the number of the iterations is finite), the symplecticity and time reversibility will be violated.These disadvantages are absent in the present decomposition method, where the time propagation is performed explicitly by a set of canonical transformations.Another benefit of our approach is that we work with the most general solutions and, thus, are able to easily carry out an optimization with respect to their efficiency.The optimization is reached by imposing additional constraints, such as minimizing the principal error coefficients, and thus it allows us to choose the best integrator for each given order.

Basic equations of motion in classical and quantum systems
For classical systems, the equations of motion can be cast in the following compact form where ρ denotes the set of phase variables, [ • ] is the Poisson bracket, and H represents the Hamiltonian function.In the case of N classical particles interacting through the pair-wise potential ϕ(r ij ) ≡ ϕ(|r i − r j |), the explicit expression for the Hamiltonian is Here r i denotes the position of particle i (i = 1, 2, . . ., N ) moving with the velocity v i = dr i /dt and carrying mass m, so that T and U are the total kinetic and potential energies, respectively.Then ρ = {r i , v i } ≡ {r, v}, and the Liouville operator of the system takes the form where are the forces acting on the particles due to the interactions.
For quantum systems, the state evolution can be described by the time-dependent Schrödinger equation where 2 ∇ 2 i /m and U are the kinetic and potential energy operators, respectively, and ψ is the wave function.The stochastic [28,29] and hybrid quantumclassical [36] dynamics models can also be introduced (in the latter case we come to a coupled system of Newtonian (1) and Schrödinger (4) equations).But, for simplicity, we will deal only with the above purely classic and quantum considerations.
If an initial configuration ρ(0) or ψ(0) is specified, the unique solution to equation (1) or (4) can formally be presented as where ∆t is the size of the time step, l = t/∆t is the total number of steps, R denotes either ρ or ψ, and L corresponds to L or −iH/ .It is well known that the time evolution of many-particle systems (N > 2) cannot be carried out exactly.So that the only way of handling the problem ( 5) is to perform the calculations by numerical methods.

Symplectic self-adjoint decomposition schemes
The basic idea of a decomposition approach is to factor out the exponential propagator e L∆t on such suboperators which allow to be presented analytically or at least in quadratures.This is achieved by splitting the operator L = A + B into its kinetic A and potential B parts, where A = v•∂/∂r or A = −iT / and B = a•∂/∂v with a ≡ {a i } = {f i /m} being the acceleration or B = −iU / for the cases of classical or quantum mechanics, respectively.Then the total propagator can be decomposed as e (A+B)∆t+O(∆t K+1 ) = P p=1 e Aap∆t e Bbp∆t+Ccp∆t 3 , ( where C ≡ [B, [A, B]] is the auxiliary operator and [ , ] denotes the commutator of two operators.The coefficients a p , b p , and c p have to be chosen in such a way as to provide the highest possible value for the order K 1 at a given integer number P 1.As a result, taking into account the smallness of ∆t, the integration (5) can be performed approximately using the equation ( 6) by neglecting truncation terms O(∆t K+1 ).The precision of the integration will improve with the increase of the order K and with the decrease of the size ∆t of the time step.Formula (6) represents the decomposition approach in its most general form [34].For c p ≡ 0, equation ( 6) reduces to the usual non-gradient factorization [4,5,12,14,15].
In order to show that the exponential subpropagators arising on the right-handside of equation ( 6) are indeed analytically integrable, let us first consider more in detail the structure of the third-order operator C. In view of the expressions for operators A and B, it can be readily obtained in the case of classical systems that where g iα = 2 jβ (f jβ /m) ∂f iα /∂r jβ .The force-gradient evaluations ∂f iα /∂r jβ can also be explicitly represented because of f iα = − j(j =i) ϕ (r ij )(r iα − r jα )/r ij .This results in where ϕ ij ≡ ϕ (r ij ) = dϕ(r ij )/dr ij and ϕ ij = dϕ ij /dr ij .As can be seen, the function G like a does not depend on velocity and, thus, the operator C will commute with B. Then, taking also into account the independence of v on r, yields the following two equalities e Aap∆t {r, v} = {r + a p v∆t, v} , (9) e Bbp∆t+Ccp∆t 3 {r, v} = {r, v + b p a∆t + c p G∆t 3 } that represent simple shifts in position and velocity spaces, respectively [34].For quantum systems, where C = i i |∇ i U | 2 /( m), the calculation of e Bbp∆t+Ccp∆t 3 also presents no difficulties (at least for particles in external fields), because this requires only the knowledge of the potential and its gradient.The kinetic part e Aap∆t will require carrying out two, one direct and one inverse, spatial Fourier transforms [30].
An important feature of the decomposition integration (6) is its exact conservation of the symplectic map of the flow of particles in the phase space.This follows from the fact that separate shifts (9) of positions and velocities do not change the phase volume.The property S −1 (t) = S(−t) of time reversibility of evolution operators S(t) = e Lt can also be reproduced perfectly by imposing self-adjoint constraints on the coefficients a p , b p , and c p .There are two kinds of such constraints [33], namely, a 1 = 0, a p+1 = a P −p+1 , b p = b P −p+1 , c p = c P −p+1 , as well as a p = a P −p+1 , b p = b P −p , c p = c P −p with b P = 0 and c P = 0. Then single-exponential subpropagators will enter symmetrically into the decomposition (6), providing automatically the required reversibility.The existence of the two kinds of constraints is caused by the two kinds of single-exponential operators (9) and the fact that the symmetry is invariant with respect to the replacement of these operators between themselves.Note also that within the self-adjoint schemes, the total number of single-exponential operators (stages) in equation ( 6) is actually equal to S = 2P − 1, i.e. it takes only odd values (because one of the boundary set of coefficients is set to zero, a 1 = 0 or b P = c P = 0).

Order conditions
The above symmetry will result in the automatic disappearing of all even-order terms in the error function of decomposition transformation (6), i.e., (10) We see, therefore, that the order K of self-adjoint algorithms may accept only even numbers (K = 2, 4, 6, 8, . ..).The cancellation of the remaining odd-order terms (up to O K−1 ∆t K−1 for the order K) has to be provided by fulfilling a set of basic conditions for a p , b p , and c p .Let us write down explicit expressions for the functions O 1 , O 3 , O 5 , and O 7 (this will be enough to derive algorithms up to the eighth order).Expanding both sides of equation ( 6) into Taylor's series, and collecting the terms with the same powers of ∆t one finds: Here we take into account the equality [B, C] = 0 following from the commutation of operators B and C, so that any occurrence of combinations containing the chain
So that putting ν = 1 and σ = 1 will cancel the first-order truncation uncertainties, O 1 = 0.The next multipliers α, β, γ 1−4 , ζ 1−10 should be set to zero to kill higher-order truncation terms, namely, O 3 , O 5 , and O 7 (see equations ( 11)-( 13)).In such a way, we come to a set of the required order conditions.They present, in fact, a system of non-linear equations which should be solved with respect to the time coefficients a p , b p , and c p .

Optimization of decomposition schemes
The main idea of our approach lies in the fact that for each given order, we can always introduce an extended set of single-exponential time coefficients with one or more free parameters.Then, imposing an additional constraint related to the norm of the principal error coefficient, the best solutions can be chosen by minimizing that norm with respect to these parameters.In this section we will first describe in detail the optimization of the most known non-gradient Verlet [21,22] and Forest-Ruth [5] integrators of orders 2 and 4, respectively.The final results on the optimization of the schemes with higher orders and stage numbers, including the case of force-gradient integrators, will also be presented.

Verlet-like algorithms of the second order
The method just highlighted is quite general to build numerical integrators of arbitrary orders.In particular, the second-order (K = 2) velocity Verlet (VV) algorithm [21,22] e (A+B)∆t+O(∆t is immediately reproduced from equation ( 6) at P = 2 and The case when the operators A and B are replaced by each other (A ↔ B) is also possible, and we come to the so-called position-Verlet (PV) algorithm [22], e (A+B)∆t+O(∆t 3 ) = e A∆t/2 e B∆t e A∆t/2 , corresponding to the choice Despite the fact that the method of construction of time-reversible integrators using symplectic decompositions is not new, some important cases have never been considered and have been completely ignored in the literature.This concerns, in particular, the following question.Are the above Verlet algorithms optimal in view of the time efficiency among all possible basic (i.e. with single splitting of the Liouville operator) decomposition integrators of the second-order?We can say only that the Verlet algorithms do minimize the number of force evaluations per time step.
However, as will be shown below, this does not guarantee the optimization with respect to the overall number of force recalculations (the most time-consuming part of MD simulations), which are necessary to perform during a fixed observation time in order to achieve a given precision in solutions.
It can be readily seen that the Verlet algorithms (P = 2) require only one (P − 1 = 1) force evaluation per time step ∆t, whereas the fourth-and higher-order schemes (P 4) need three or more such evaluations (see the next subsections 3.2 and 3.3).Let us consider now the intermediate case P = 3 which leads to an extended self-adjoint propagation in the form e (A+B)∆t+O 3 ∆t 3 +O(∆t 5 ) = e Aξ∆t e B ∆t 2 e A(1−2ξ)∆t e B ∆t 2 e Aξ∆t (15) following from equation ( 6) at 15) represents a whole family of symplectic time-reversible integrators of the second-order in which a particular member can be extracted by choosing a value for a free parameter ξ.
For ξ = 0, equation ( 15) reduces to the VV (see equation ( 14)) or PV (at A ↔ B) algorithm.The extended (when ξ = 0) propagation requires already two, instead of one, force recalculation per time step.For this reason, we can come to an incorrect conclusion that such a propagation has no advantage over the Verlet algorithms.
In order to prove that the above conclusion is indeed incorrect, let us analyze more in detail the influence of truncation errors O 3 ∆t 3 on the result.The Taylor's ∆t-expansion of equation ( 15) yields so that the norm of O 3 with respect to the third-order commutators [A, [A, B]] and [B, [A, B]] (see equation (11)) is Then, the norm of local uncertainties appearing during a single-step propagation can be expressed in terms of d and ∆t as g = d∆t 3 .During a whole integration over a fixed time interval t, the total number l of such single steps is proportional to ∆t −1 (see equation ( 5)).As a result, the local third-order uncertainties will accumulate step by step leading, at t ∆t, to the second-order global errors Extended propagation (15) can now be optimized with respect to ξ by minimizing the function d(ξ).As can be readily verified, the minimum of d(ξ) is achieved at ξ = ξ 0 , where with d min ≡ d(ξ 0 ) ≈ 0.00855.On the other hand, the value d(0) of d corresponding to the Verlet algorithms (when ξ = 0) is equal to d(0) ≈ 0.0932, i.e., it increases in d(0)/d(ξ 0 ) ≈ 11 times.Remembering that the extended propagation requires two force evaluation per time step ∆t, it should be performed with double step size 2∆t with respect to that of the Verlet algorithms, in order to provide the same number of total force recalculations during the integration over the same time interval.Therefore, the extended propagation will be more efficient if the following inequality Γ 2 (ξ, 2∆t) < Γ 2 (ξ = 0, ∆t) takes place.Taking into account equation (18), such an inequality can be rewritten as d(0)/d(ξ) > 4, and thus it is fulfilled completely in the optimization regime.In particular, indicating that the optimized propagation, being applied even with double sizes of the time step, will reduce the global errors approximately three times.
In view of equations ( 9) and ( 15), more explicit expressions for the single-step propagation of position and velocity from time t to t + ∆t within the optimized VV-like algorithm are: whereas the optimized PV-like algorithm (when A ↔ B in equation ( 15)) reads: where the parameter ξ should take its optimal value ξ 0 (see equation ( 19)), and r I , r II , v I , and v II are the auxiliary quantities denoting positions and velocities of all particles in intermediate stages.
The algorithms are simple, require only slight modification with respect to the original Verlet versions, and can be easily implemented in the existing program codes.

Forest-Ruth-like algorithms of the fourth order
The fourth-order (K = 4) algorithm by Forest and Ruth (FR) [5] is reproduced from equation ( 6) at P = 4 as e (A+B)∆t+O(∆t 5 ) = e Aθ ∆t 2 e Bθ∆t e A(1−θ) ∆t 2 e B(1−2θ)∆t e A(1−θ) ∆t 2 e Bθ∆t e Aθ ∆t 2 (23) with 3512.Let us consider now an extended decomposition scheme of the fourth order by allowing P to exceed the necessary minimum (P = 4) by unity, i.e., letting P = 5.Remember that we cannot choose the number P to be too big, because this results in a too large number, namely P − 1, of expensive force recalculations.Choosing P = 5 we just hope to reduce the truncation errors significantly with a little additional computational cost, rather than to increase the order of the decomposition scheme (note that sixth-order integrators are derivable [4] beginning from P = 8).
For P = 5, the extended decomposition can be presented in the form e (A+B)∆t+O 3 ∆t 3 +O 5 ∆t 5 +O(∆t 7 ) = = e Bξ∆t e A(1−2λ) ∆t 2 e Bχ∆t e Aλ∆t e B(1−2(χ+ξ))∆t e Aλ∆t e Bχ∆t e A(1−2λ) ∆t 2 e Bξ∆t (24) following from equation ( 6) with Here the symmetry of time coefficients and the condition P p=1 a p = P p=1 b p = 1 have already been taken into account.The fourth-order conditions now read Equation ( 25) provides the cancellation (O 3 = 0) of third-order truncation uncertainties (see equation (11)) and thus describes a whole family of symplectic timereversible integrators of the fourth order.A particular member of the above family can be obtained by choosing the corresponding values for ξ, λ, and χ.As far as there are three parameters and only two constraints (see equation (25)), one of these parameters, ξ say, can be treated as a free one.Then, for example, putting ξ = 0, equation (24) reduces to the original FR algorithm (23).The extended (when ξ = 0) propagation will require already four, instead of three, force recalculation per time step.However, having a room in varying ξ, we can overcompensate the increased computational efforts by minimizing the fifth-order truncation uncertainties O 5 ∆t 5 .
In order to show this, let us consider explicit expressions for the γ-multipliers.They are: Then the norm of O 5 (see equation ( 12)) is and the global errors will behave as The optimization of the extended propagation (24) with respect to time coefficients ξ, λ, and χ can be carried out by finding the global minimum for the function γ(ξ, λ, χ), provided α = 0 and β = 0, i.e., solving the system of equations The result within sixteen significant digits is with the global minimum γ EFRL min ≈ 0.00065, where the abbreviation EFRL refers to the extended Forest-Ruth-like scheme (24).At the same time, the value of γ corresponding to the usual FR integration (23) is equal only to γ FR ≈ 0.028.We see, therefore, that applying the extended integration allows one to decrease the truncation errors approximately γ FR /γ EFRL min ≈ 43 times.Taking into account that such a decrease has been achieved by increasing the number of force evaluations per step from three to four, the extended propagation must be performed with step sizes which are a factor of 4/3 higher than those of the FR algorithm, in order to provide the same number of total force recalculations during the fixed overall interval of integration.Thus, we will come to more efficient calculations if the inequality so that the global errors can be reduced approximately 15 times with respect to the FR integration without spending any additional computational costs.

Classification of schemes through 11 stages
The decomposition schemes of higher orders and with larger stage numbers can be derived and optimized in a similar way as the above.Some of them have been reported in our recent papers [33,34,37].Herein, the optimization within the Suzuki composition approach [14] was considered as well [33].In this paper we present  (a) Particularly outstanding algorithms are shown by the bold font (b) The values corresponding to new algorithms (c) This nine-stages algorithm is, in fact, of the sixth order [34] only the final results with brief comments to the most important cases.All the decomposition algorithms, obtained by us within up to S = 11 stages, are collected in table 1.
In our classification, for each algorithm introduced we point out the order K of the integration as well as the numbers n f and n g of force and force-gradient evaluations per time step.The designations Err 3 , Err 5 , and Err 7 relate to the norms d, γ, and ζ = 10 i=1 ζ i 2 of third-, fifth-and seventh-order truncation errors, respectively (see equations ( 11)-( 13), (17), and ( 26)).Note that the number n f = P − 1 = (S + 1)/2 − 1 of force recalculations is directly connected with the number S = 2P − 1 of stages.The maximal possible value of n g is, thus, equal to n f .The minimal value is equal to zero and can be achieved when all the coefficients c p = 0 are equal to zero as well.The latter case will correspond to a scheme belonging to the non-gradient class.The force-gradient class will be filled up with algorithms having nonzero numbers n g .Different numbers n g will lead to different decomposition variants within the same value of S. In addition, we distinguished two versions, namely velocity and position ones, which are related to the first (a 1 = 0) and second (b P = c P = 0) type of boundary self-adjoint constraints.
The efficiency of the algorithms has been measured in terms of the quantity at G = 2, where Err K+1 is the leading error coefficient (i.e.Err 3 , Err 5 , or Err 7 , in dependence of the order K = 2, 4, or 6).The weight G reflects the fact that one gradient evaluation requires more (in factor G) processor time than that necessary for one force calculation.Each algorithm has been presented in an abbreviated form, so that the letters A and B correspond to the exponential operators exp(Aa p ∆t) and exp(Bb p ∆t), respectively, whereas the letter C denotes exp(Bb p ∆t + Cc p ∆t 3 ).The optimized values for time coefficients {a p , b p , c p } can be found in our quite recent paper [37].They have been obtained using the computer algebra packages, Maple 6 and Mathematica 4 (the two packages have been applied to ensure the correctness of the results and to minimize transcription mistakes).Each group of algorithms corresponding to the same number (S = 3, 5, 7, 9, and 11) of stages is separated by horizontal lines.Within the same group, the algorithms are allocated in the order of increasing of computational efforts.At the same values of n f and n g , the velocity versions (where the letters C or B, but not A, appear first) are written first as well.When all these parameters coincide among themselves, the preference is given to more precise schemes (with smaller values for the leading error coefficient, i.e. with higher efficiency).
It can be seen that within 11 stages there are 46 possible decomposition variants which lead to 45 (10 non-gradient plus 35 force-gradient) different algorithms overall (it has been established that the scheme CACACACAC actually reduces to BACACACAB, because the cancellation of fifth-order truncation errors is reached only when c 1 = 0).Among them, 4 non-plus 4 force-gradient integrators were known previously (see the references in the remarks to table 1).All other 37 schemes are new.According to the efficiency, the algorithms have been categorized using a threestar classification (the worst with no star and the best with 3 stars).Some of the integrators are particularly outstanding (they are marked by the bold font) because they lead to the most efficient computational scheme within a given order.For instance, the five-stages non-gradient algorithms (equation ( 15)), labelled by No. 5 and 6 in table 1, are the best among second-order decomposition schemes.The well-known Verlet integrators (equation ( 14)) No. 1 and 2, which are used in the majority of molecular dynamics simulations, are obviously worse (see also subsection 3.1).An interesting situation is observed in the fourth-order decomposition schemes.As we can see, the previously known seven-stages integrator by Forest and Ruth (equation ( 23)) No. 12 results in the worst (with no star) integration.The optimized nine-stages EFRL scheme No. 19 (equation ( 24)), described in subsection 3.2, is much better (two stars) but not the best.The best integrator of the order four is No. 36 and belongs to the gradient class with eleven stages.It allows to reduce the numerical uncertainties up to 120/0.32 ≈ 375 times at the same overall computational costs.The algorithm No. 23 leads to a similar reduction of numerical errors and appears to be the best within a nine-stages group.The scheme No. 30 also exhibits an equivalence in the efficiency and should be considered as the best within the non-gradient class.Finally, for the order six, the optimal integration can be carried out using the eleven-stages algorithm under No. 40.Its efficiency is higher by a factor of 0.0930/0.00067≈ 140 with respect to the worst sixth-order algorithm under No. 28.We see, therefore, that the best algorithms at each order considered belong to extended groups (where the number of stages exceeds the necessary minimum).Moreover, the efficiency of these new algorithms is higher with respect to the known schemes in factors from 3 up to 375 for orders 2 to 6.
It should be pointed out that formula (31) can be used to compare the algorithms of the same orders only.In view of the structure of this formula it follows that the relative efficiency of two algorithms is determined at a given K as the ratio of precisions which can be achieved during the integration within the same computational cost.The precision is introduced as the inverse to numerical uncertainties, i.e. as 1/Err.The leading term of these uncertainties behaves like Err (K) = C K Err K+1 ∆t K , where C K is the parameter which depends on the order K, on the system under consideration and on the measured quantity.For the same value of K, such a parameter is cancelled when determining the relative efficiency of one integrator with respect to another one.When considering algorithms of different orders, K and K + M say, one obtains that the relative efficiency is (K+M ) .It depends not only on the norms Err K+1 and Err K+M +1 of numerical uncertainties but also on the factor C K+M /C K .The last factor is not universal and cannot be evaluated theoretically.It should be investigated directly in simulations for each concrete system, where the final decision concerning the preference of higher-order algorithms over lower-order ones or vise versa can be given.
With the increase of the number of stages above 11, the number of decomposition variants increases too rapidly, so that the performance of a complete classification at S > 11 becomes difficult.Moreover, sufficiently high stage numbers will result in very cumbersome systems of non-linear order conditions.So that we may come to an unresolvable technical problem when trying to handle such conditions.Our investigations [34] have shown that because of the restricted capabilities of modern computers, the maximal number of stages, which can still be handled within the direct decomposition approach, is limited to 23.This corresponds to force-gradient algorithms of the order 8. Combining the decomposition method with an advanced composition technique allows us to increase the overall order up to 16.Note that in the earlier studies such extremely high-order algorithms were derived using the crude iterative procedure [27].As has been demonstrated [34], our composition technique is better in efficiency by factors 25 to 1000 for orders 8 to 12, respectively.For orders 14 and 16, the advanced composition schemes, at considerably smaller computational costs, permit to reduce the numerical uncertainties up to 100 000 times with respect to those of the standard iterative approach.

Application to molecular dynamics simulations
In order to verify our theoretical predictions presented in section 3, we have applied the optimized Verlet-and Forest-Ruth-like algorithms to MD simulations of a Lennard-Jones (LJ) fluid, and compared the results with those of the original approaches.The system under consideration was a collection of N = 256 particles interacting through a shifted LJ-like potential, ϕ(r) = Φ(r) − Φ(r c ) for r < r c and ϕ(r) = 0 otherwise, where Φ(r) = 4u[(σ/r) 12 − (σ/r) 6 ] is the genuine LJ potential.The particles were placed in a basic cubic box of volume V = L 3 , and the modification of Φ(r) with r c = L/2 ≈ 3.36σ as well as the periodic boundary conditions have been used to exclude the finite-size effects.The simulations were performed in a microcanonical ensemble at a reduced density of n * = N V σ 3 = 0.845 and at a reduced temperature of T * = k B T /u = 1.7.All the runs were of the length in l = 10 000 time steps and were started from an identical well equilibrated initial configuration ρ(0).The precision of the algorithms was measured in terms of the relative total energy fluctuations ) and denotes the microcanonical averaging.Note that if the equations of motion could be solved exactly, the above fluctuations should vanish, because in microcanonical ensembles the total energy is an integral of motion, E(t) = E(0).So that during approximate MD integrations, smaller values of E will correspond to a better precision in evaluation of phase trajectories.
The total energy fluctuations obtained in the simulations at the end of the runs for four (fixed within each run) dimensionless time steps, ∆t * = ∆t(u/mσ 2 ) 1/2 = 0.01, 0.005, 0.0025, and 0.001, are shown in figure 1 as depending on free parameter ξ.The subsets (a) and (b) of this figure correspond to the VV-(see equation ( 21)) and PV-(see equation ( 22)) like integration, respectively.As can be seen, all the dependencies E(ξ, ∆t) have one minimum which locates at the same point ξ ≈ 0.19 independently of the size ∆t of the time step.This point coincides completely with the minimum at ξ 0 (equation ( 19)) of function d(ξ) (equation ( 17)) which is included in figure 1 as well (dashed curves in the subsets).Moreover, the energy fluctuations E(ξ, ∆t) appear to be proportional to the norm Γ 2 (ξ, ∆t) of global errors (see equation ( 18)), and the coefficient of this proportionality almost does not depend on ξ and ∆t.In addition, at each step size considered the energy fluctuations decrease at the minimum more than ten times with respect to those at ξ = 0, that is in agreement with our predicted value d(0)/d(ξ 0 ) ≈ 11.
The MD results, obtained for the total energy fluctuations by the optimized VV-and PV-like algorithms (at ξ = ξ 0 ), are presented in figure 2. The calculated values are functions of the length of the simulations and are plotted by curves marked as OVV and OPV, respectively.For the purpose of comparison, the results corresponding to the original VV and PV integrators are also drawn there (curves marked as VV and PV).Note that for the original integrators, the time step within each subset was chosen to be always twice smaller than that of the optimized versions (this condition is necessary to provide the same number of force recalculations during the same observation time), namely, ∆t * = 0.005, 0.0025, 0.00125, and 0.0005 (see subsets (a), (b), (c), and (d), respectively).Note also that within the original Verlet algorithms, the higher-orders truncation uncertainties become too big at step sizes ∆t * > 0.005.In particular, then the ratio of the total energy fluctuations to the fluctuations in potential energy (the standard ratio for estimating the precision of the calculations) appears to be more than a few per cent.For this reason, such large step sizes cannot be used in precise MD simulations and, thus, are not considered in the present study.
As we see in figure 2, both the original (VV and PV) and optimized (OVV and OPV) algorithms exhibit very good stability properties.No systematic deviations in the total energy fluctuations can be observed for all the integrators.Instead, in each of the cases considered the amplitude of these deviations tends to its own value which does not increase with the further increase of the length of the simulations.However, this value appears to be significantly larger for the original versions VV and PV.On the other hand, using the optimized OVV and OPV algorithms even with double sizes of the time step allows us to decrease the unphysical energy fluctuations approximately by a factor of three.This is in an excellent accord with our theoretical prediction (20).Note also that the OPV algorithm is slightly better in energy conservation than its OVV version (whereas the VV integrator is better with respect to the PV counterpart).Furthermore, in view of the structure of equations ( 21) and ( 22), the OPV algorithm is more convenient when averaging macroscopic quantities.In particular, then the interparticle potentials can be calculated at the end of time steps simultaneously with the interparticle forces within the same loop, thereby increasing the time efficiency of the computations.
A similar pattern has been observed within the fourth-order integration, where the equations of motion were solved using the extended decomposition scheme (24).The total energy fluctuations, obtained in this case at the end of the runs, are plotted in figure 3 as functions of the parameter ξ for three dimensionless time steps, ∆t * = 0.00125, 0.0025, and 0.005.Note that the two other time coefficients λ(ξ) and χ(ξ), arising in the decomposition (24), are connected with ξ according to the order condition (25), so that γ(ξ, λ(ξ), χ(ξ)) ≡ γ(ξ) (see equation (26)).21) and ( 22), respectively).The simulation results are presented by circles connected by the solid curves.The function d(ξ) (see equation ( 17)) is plotted in both subsets by the dashed curve.24) and ( 29)) as well as the original FR integrator (equation ( 23)).The results corresponding to different values of the time step, namely, ∆t * = 0.00125 and 0.0009375, 0.0025 and 0.001875, 0.005 and 0.00375, as well as 0.01 and 0.0075 are presented in subsets (a), (b), (c), and (d), respectively.
Again, as in the case of second-order integrators (see figure 1), all the functions E(ξ, ∆t) display the global minimum at the same point on the ξ-axis, namely, at ξ ≈ 0.164, independently of ∆t.Such a point coincides with the minimum given by equation ( 29) for the function γ(ξ) (which is presented in figure 3 by the dashed curve).This is so because the fluctuations E(ξ, ∆t) prove to be proportional to the norm Γ 4 = γ∆t 4 (equation ( 27)) of fourth-order global errors.In addition, these fluctuations decrease at least 40 to 50 times with respect to those at ξ = 0, that is close to our predicted value γ FR /γ EFRL min ≈ 43.The EFRL-energy fluctuations obtained in the optimization regime (29) are presented in subsets (a), (b), (c), and (d) of figure 4 as depending on the length of the simulations for the time steps ∆t * = 0.00125, 0.0025, 0.005, and 0.01, respectively.The corresponding dependencies related to the original FR integrator (see equation ( 23)) are shown in this figure as well.Within the FR integration, the time step for each subset was by a factor of 4/3 smaller than that of the optimized version, i.e., ∆t * = 0.0009375, 0.001875, 0.00375, and 0.0075 (then the total number of force recalculations during a given observation time t ∆t is the same for the both approaches).As can be easily seen, the optimized EFRL algorithm leads to a much lower (15 to 25 times) level of numerical uncertainties than the original version, despite the usage of larger time steps.This is in a good agreement with our estimations obtained in subsection 3.2 (see equation (30)).
Some other fourth-order algorithms presented in table 1, including the forcegradient versions, have been applied to MD simulations as well [34,37].The estimated and actual efficiencies obtained for such algorithms were also close between themselves as was found for the cases considered above.

Conclusion
In the present study, a complete classification of all analytically integrable selfadjoint decomposition algorithms which include up to 11 stages has been given.The algorithms were consequently derived and the optimization processes used were described.As a result, we have found 37 (6 non-gradient plus 31 force-gradient) new schemes in addition to 8 (4 non-and 4 force-gradient) previously known integrators.It has been predicted theoretically and confirmed in our MD simulations that some of the new algorithms are particularly outstanding.They may lead to a much superior integration of the equations of motion in comparison with decomposition schemes widely used in the literature, such as, for example, the well-known Verlet and Forest-Ruth integrators.In particular, it has been shown that the new Verlet-like algorithms proposed should be considered as the best ones with respect to time efficiency among all possible decomposition integrators (with single splitting of the Liouville operator) of the second order.All the algorithms introduced are explicit (i.e., do not require any iteration), simple in implementation, and produce stable solutions which (like exact phase trajectories) are symplectic and time reversible.
The approach presented can also be adapted for the optimization of the integration of motion in more complicated systems, where splitting of the Liouville operator into more than two parts may pay dividends.The examples are multi-component fluids and systems with long-range interactions, when characteristic time scales of the dynamic processes can differ by many orders from each other.In some cases, e.g.systems with orientational degrees of freedom, additional splitting may be necessary to obtain analytically integrable parts.These and related problems will be considered in our further investigations.

Figure 1 .
Figure 1.The total energy fluctuations obtained in the MD simulations for different values of free parameter ξ at four reduced time steps, ∆t * = 0.01, 0.005, 0.0025, and 0.001, using the VV-(subset (a)) and PV-(subset (b)) like integration (equations (21) and (22), respectively).The simulation results are presented by circles connected by the solid curves.The function d(ξ) (see equation (17)) is plotted in both subsets by the dashed curve.

Figure 2 .
Figure 2. The total energy fluctuations as functions of the length of the MD simulations performed using the optimized VV-(solid curve marked as OVV) and PV-(dashed curve, OPV) like algorithms, as well as the original VV (solid curve, VV) and PV (dashed curve, PV) integrators.The results corresponding to different values of the time step, namely, ∆t * = 0.01 and 0.005, 0.005 and 0.0025, 0.0025 and 0.00125, as well as 0.001 and 0.0005 are presented in subsets (a), (b), (c), and (d), respectively.

Figure 3 .
Figure 3.The total energy fluctuations obtained in the MD simulations with the EFRL integration (equation (24)) for different values of free parameter ξ at three fixed time steps, ∆t * = 0.00125, 0.0025, and 0.005.The simulation results are presented by circles connected by the solid curves.The function γ(ξ) (see equation (26)) is plotted by the dashed curve.

Figure 4 .
Figure 4.The total energy fluctuations as functions of the length of the MD simulations performed using the optimized EFRL algorithm (equations (24) and (29)) as well as the original FR integrator (equation (23)).The results corresponding to different values of the time step, namely, ∆t * = 0.00125 and 0.0009375, 0.0025 and 0.001875, 0.005 and 0.00375, as well as 0.01 and 0.0075 are presented in subsets (a), (b), (c), and (d), respectively.

Table 1 .
The complete family of self-adjoint decomposition algorithms with up to eleven stages