On extrapolation of virial coefficients of hard spheres

Several methods of extrapolating the virial coefficients, including those proposed in this work, are discussed. The methods are demonstrated on predicting higher virial coefficients of one-component hard spheres. Estimated values of the eleventh to fifteenth virial coefficients are suggested. It has been speculated that the virial coefficients, B_n, beyond B_{14} may decrease with increasing n, and may reach negative values at large n. The extrapolation techniques may be utilized in other fields of science where the art of extrapolation plays a role.


Introduction
Virial coefficients play a key role in both molecular and phenomenological theory of fluids at low and medium densities. They are the coefficients in the density expansion of the equation of state expressed via the compressibility factor where β = 1/(k B T ) is the inverse temperature, P is the pressure, and ρ is the number density. Virial coefficients B i are defined by exact formulae in terms of integrals whose integrands are products of Mayer functions [1].
In this paper we concentrate on the simplest non-trivial model of fluid -the system of hard spheres. For hard spheres, i.e., hard body one-component fluids, the virial coefficients are numbers (they do not depend on temperature). For the system considered, the terms up to B 4 are known analytically [2,3]. It holds B 1 = 1,   The values are in the units of the packing fraction η = π 6 σ 3 ρ with σ standing for the sphere diameter.
The higher virial coefficients must be calculated numerically. Virial coefficients are sums of irreducible (cluster) integrals represented by diagrams [1]. It is a task for a high school student to determine the second virial coefficient for hard spheres. However, dimensionality of irreducible integrals increases rapidly with increasing order of the virial coefficient. In three dimensional space it is 3n − 6. For example the tenth virial coefficient contains 24-dimensional integrals (!) in the simplest case of spherically symmetric interactions between molecules.  d   unlabeled  labeled  unlabeled  labeled   2  1  1  1  1  1  3  1  1  1  1  3  4  3  10  2  4  6  5  10  238  5  68  9  6  56  11 368  23  3 053  12  7  Numbers of integrals (no general formula is known for the number of irreducible diagrams in dependence on n) increase even much more rapidly than the dimensionality of the integrals, see table 1.
Some of them are topologically equivalent which means that their integrands differ only in the numbering of variables. In the table, the heading "labeled" denotes the total number of irreducible diagrams and "unlabeled" denotes the number of topologically different diagrams with numbered black points. To reduce these numbers, Ree and Hoover in their pioneering work [4] replaced Mayer diagrams with fbonds by generalized Ree-Hoover diagrams with f -bonds and e-bonds using the identity f (r ) + e(r ) = 1.
Besides the reduction of the number of topologically different diagrams, there is another advantage of the Ree-Hoover approach: computer codes using these diagrams are more efficient than the Mayer approach.
For higher virial coefficients, say n > 7, it is impossible within an average lifetime of an explorer to determine the diagrams and their weights using "pencil and paper" avoiding "human factor" errors.
For example, for B 8 there are 7 123 different Mayer diagrams which can be reduced to 2 606 Ree-Hover diagrams, and for each of them their weights should be determined. The analysis should be done automatically on a computer. Different algorithms utilizing symbolic algebra programming and automatic code generation have been proposed [5][6][7][8].
Another problem is to evaluate the multi-fold integrals. This can be done by random-shooting Monte Carlo integration but it may require too much computer time. A more efficient way is to start from the so-called spanning diagrams -the integrals that can be calculated analytically. Configurations of the spanning diagrams are sampled either using a standard Metropolis Monte Carlo method or (for the linear chains) by the so-called reptation [9]. The values of the diagrams of interest are then evaluated by Monte Carlo integration. The fifth virial coefficient for hard spheres was calculated by Rosenbluth and Rosenbluth [10] and by Kratky [11][12][13][14], the sixth by Ree and Hoover [4], the seventh by Ree and Hoover [15], by Kim and Henderson [16], and by Janse van Rensburg and Torrie [17], the eighth by Janse van Rensburg [18] and by Vlasov et al. [5], the ninth by Labík et al. [6], and the tenth by Clisby and McCoy [7,8]. As a rule, while higher virial coefficients were just calculated, the lower ones were more precisely recalculated. The stateof-the-art values of the virial coefficients for hard spheres are shown in table 2 including their estimated uncertainties.

Extrapolation of Virial Coefficients
It is prohibitively difficult to calculate virial coefficients beyond B 10 because the numbers of diagrams increase enormously. There are 2 n 2 diagrams with n field points that should be analyzed. It is 2 45 for n = 10, 2 55 for n = 11, 2 66 for n = 12, etc. The number of diagrams increases by more than three orders with n going from 9 to 10. It is thus practical to estimate higher virial coefficients for n > 10 rather than to try to calculate them.
There are several approaches to extrapolation of virial coefficients based on an (unjustified) assumption that the higher coefficients depend on the lower ones. In the theory of fluids the most popular are Padé approximants, see [19] and references therein. They are based on the assumption that the compressibility factor may be expressed in the form where P n and Q m are polynomials of the order n and m, respectively, whose coefficients are combinations of the known virial coefficients. By expanding equation (2.1) into the Taylor series, higher virial coefficients are obtained. Differential approximants [20] are an extension of Padé approximants. The first order differential approximants have the form where P n , Q m , and R m are polynomials; for P n (η) = 0, the Padé approximants are revealed. Differential approximants have not been used in the theories of fluids so far. These two approaches utilize only a part of available information, namely the lower virial coefficients. Another source of information are computer simulation data on the compressibility factors. They can be utilized by proposing an equation of state (EOS) some of its parameters being determined from the known virial coefficients and the remaining constants fitted to the simulation data. For example, Kolafa et al. [21] used an equation of state in the form proposed by Barboy and Gelbart [22] where lower a i are combinations of the known B i and the higher ones are fitted to the computer simulation data. By expanding the equation in powers of η, the higher virial coefficients may be estimated.
This approach does not take into account the suggested convergence limit at high densities [23]. Therefore, we extended the form (2.3) by changing the pole η = 1 into where η pole is a (nonlinear) adjustable parameter. This pronounced pole should not be confused with a weak nonanalyticity at the freezing point discussed below. All the traditional extrapolation methods suffer from an assumption that the values of lower virial coefficients are known precisely. This is, certainly, not true, see table 2. To be accurate, uncertainties in the lower virial coefficients (as well as in EOS simulation data) should be taken into account [24].

23004-3
We believe that the safest way of utilizing both the lower virial coefficients data and the EOS data is to minimize the objective function where n is a number of known virial coefficients, σ i are their standard deviations, k is the number of simulated state points, and σ j are their standard deviations. Z j is the value of the compressibility factor at density η j given by a chosen compressibility factor in the form of equation (2.1) (Padé approximants), or in the form of equation (2.2) (differential approximants), or in the form of equation (2.3) (Barboy-Gelbart equation).
As a measure of fit accuracy we use a standard deviation S where p is a number of adjustable parameters.

Barboy-Gelbart approach
Let us return to equations (2.3) and (2.4). We allow for zero coefficients a i s in the expansions. This improves precision without adding more adjustable parameters especially at higher densities. For instance, equation (10) of [21], valid up to ρ = 0.98, uses powers x 1 to x 8 and x 12 . Therefore, the choice of powers serves as another (discrete) adjustable parameter. In order to elucidate the ability of expansions to predict unknown virial coefficients we repeated the calculations with forms (2.3) and (2.5), using various sets of input data. That is, we "forget" that we know the virial coefficients up to B 10 and use only a subset to n, 4 n 10.
Moreover, we employ only subsets of available EOS data. For ρ = 0.94 the error in B 11 ± 1 propagates to the error in Z ± 0.00083 which is 8 times the standard deviation of the MD datum. The density range lower than ρ max < 0.94, would not contain enough information to allow for extrapolating the virial coefficients. On the other hand, the equation of state is nonanalytical at the freezing point (density 0.947) [25], although the nonanalycity term is tiny and in moderately metastable region it is not detectable within available precision. Therefore, we choose ρ max = 0.98 as a compromise and calculate all fits with five EOS sets from low densities up to ρ max = 0.94, 0.95, 0.96, 0.97, 0.98.     The number of adjustable parameters is important in obtaining relevant predictions. We started with a small set and increased the number of adjustable parameters until the value of S dropped below 1.5. However, we never used more parameters if S < 1 was already reached to avoid a fitting noise. For each set of virial coefficients up to M, 4 M 10, we analyzed several maximum powers in expansion (2.3) with all the above mentioned EOS data sets. The results were sorted into three categories: the best with S < 1, the good ones with S < 1.2, and the worst results with S < 1.5. Then, we plotted the curves of the predicted B n vs. n and graphically determined the median value and also tried to estimate the uncertainty from the data scattering. This error estimate is rather sensitive to the systematic errors which are larger than the statistical ones. We admit a subjective factor present in this procedure. One example is shown in figure 1.
The results, based on more than a thousand of equations analyzed, are collected in tables 3 and 4. It is seen that the procedure is capable of predicting the virial coefficients even though the errors are 23004-5 sometimes underestimated. Of course, the virial coefficients obtained are less accurate than the directly obtained MC values. Only for B 10 , the precision of the predicted value based on all previous coefficients approaches the precision of the MC result [7,8]. In addition, the accuracy of B 14 and especially B 15 decreases.

Padé approximants
Simultaneous fitting of MD data and virial coefficients to rational functions (Padé approximants) is much more difficult because the equations are highly nonlinear. The approximants often give oscillating solutions from B 12 or B 13 ; we omit these solutions because we assume that the coefficients should behave "regularly".

Reliability of extrapolations
While the statistical errors in the predicted higher virial coefficients can be easily evaluated, this cannot be said about the systematic errors introduced by a particular functional form. The only way to assess the systematic errors is to compare as many functional forms (Padé indices, powers in polynomials, etc.) as possible. Then, we subjectively estimate (after cancelling the evidently defective values) the "most probable" values of them.
An example of a comparison of different methods of extrapolation is shown in  n this work ref. [7,8] ref. [26] ref. [27] ref. [28] ref. [ We cannot assert that really B 15 < B 14 ; in fact, oscillating behavior is typical of polynomial approximants and these results are consistent with a monotonous increase, at least for not too large n. Ultimately the sequence of B n cannot be monotonous because the equation of state is not analytical at the freezing point and consequently its radius of convergence cannot exceed the freezing density (we believe that it equals the freezing density).

Concluding remarks
In this work we have used several extrapolation methods to estimate the unknown virial coefficients of hard spheres from B 11 to B 15 and compared them with recent literature estimates. We believe that the presented results could be used in testing theoretical approaches of the thermodynamics of hard spheres.
The problem of reliability of any extrapolation result rests in the estimate of its uncertainty. This is, in principle, subjective depending on an author's view (either optimistic or pessimistic). Here we have tried neither to be too optimistic nor too pessimistic. We have used comparisons of extrapolated virial coefficients obtained using as many different approaches as possible.
The techniques used in this work can be readily extended to virial coefficients of hard body fluids such as spherocylinders, diatomics and multi-atomics, mixtures of hard spheres and other hard body fluids and their mixtures. What one needs is i) reliable values of lower virial coefficients and estimates of their uncertainties, and ii) precise equation-of-state computer simulation data. An extrapolation of virial coefficients of non-hard-body fluids is less straightforward due to their dependence on temperature.