A machine learning approach to the Berezinskii-Kosterlitz-Thouless transition in classical and quantum models

The Berezinskii-Kosterlitz-Thouless transition is a very specific phase transition where all thermodynamic quantities are smooth. Therefore, it is difficult to determine the critical temperature in a precise way. In this paper we demonstrate how neural networks can be used to perform this task. In particular, we study how the accuracy of the transition identification depends on the way the neural networks are trained. We apply our approach to three different systems: (i) the classical XY model, (ii) the phase-fermion model, where classical and quantum degrees of freedom are coupled and (iii) the quantum XY model.


Introduction
In many cases, thermodynamic phase transitions are clearly visible with well defined and easily identifiable critical points. In the Landau picture, we define a macroscopic order parameter that is nonzero in the ordered phase and vanishes when we cross the critical temperature. Typically, the transition is signaled by a discontinuity or divergence of some thermodynamic quantities such as specific heat or magnetic susceptibility. There also exist unconventional phase transitions which are much more difficult to find. One example are topological phase transitions, connected to the formation of topological defects, such as dislocations in two-dimensional crystals, vortices in two-dimensional superconductors and so on. The proliferation of these defects leads to the Berezinskii-Kosterlitz-Thouless (BKT) phase transition [1][2][3][4] where thermodynamic quantities behave smoothly. Therefore, traditional detection methods based on, e.g., the divergence of the specific heat or the spin susceptibility, cannot be used. In numerical approaches, the main difficulty with the identification of the BKT transition stems from the fact that the interaction between the topological charges depends logarithmically on the spatial separation. Therefore, numerical results converge very slowly with the size of the system, and a precise determination of the critical temperature is a computationally challenging task. The standard approach is based on the scaling properties of, e.g., the spin stiffness or superfluid density. This is particularly difficult for quantum models, where numerical methods are usually very involved and memory-and time-consuming.
Besides the traditional methods which rely on the idea of the order parameter or quantities accessible in experiments, such as the specific heat or magnetic susceptibility, attempts to use artificial neutral networks to identify phases in condensed matter and transition between them have been recently undertaken [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. These new methods have proven to be accurate and reliable for classical Ising-type models [5][6][7]. Less spectacular progress has been made for quantum systems [15,16] and for systems with topological phase transitions [17][18][19][20]. In the latter case, the main difficulty comes from the fact that the topologically ordered phase is not described by a local order parameter [1][2][3][4]. Instead, its formation is connected with suppression of non-local topological defects which are difficult to identify.
In this paper, we demonstrate the application of machine learning approaches to identify topological transitions in a few different types of two-dimensional classical and quantum systems. In particular, we study the classical XY (c-XY) model, the phase-fermion (PF) [21] where the interaction is only between quantum and classical degrees of freedom, and the fully quantum XY (q-XY) model.
The first of these models has already been thoroughly analyzed in [17]. The authors have shown there that treating spin configurations as raw images in the case of a feed-forward network does not lead to the correct value of the critical temperature. Instead, they propose to preprocess the spin configurations into vorticity and then use the results to train two kinds of artificial neural networks (ANN): a one-layer feed-forward network and a deep convolutional network. In both cases, the results are scaled with the system size towards the correct value of the critical temperature, but the one-layer network performed poorly for large systems. In the present approach, we do not have convolutional layers, but we use a deep feed-forward network composed of four fully-connected layers (we learned that the choice of particular meta-parameters is not crucial for the network performance). What is important is that, instead of using raw spin configurations where each spin is represented by a number from 0 to 2π (which gives rather inaccurate results, as demonstrated in [17]), we represent the configurations as vectors of sines and cosines of the spin angles, which reflects the system's symmetry.

Models
The first model describes classical spins of unit length in a square lattice with a nearest neighbor interaction given by the following Hamiltonian where J is the coupling between the spins, and θ i describes the direction of spin i. It is known that this model exhibits the BKT phase transition, and precise finite-size scaling gives the critical temperature T BKT ≈ 0.8935 in units of J [22]. In the next model, classical spins θ i interact with fermions. The Hamiltonian of the PF model is given by whereĉ † iσ (ĉ iσ ) is an operator that creates (annihilates) spin-σ electron at the lattice site i, and g describes the strength of the interaction between the classical and quantum degrees of freedom. We set the hopping integral t as the energy unit (t = 1). In this model, the fermions mediate an effective interaction between the classical spins θ i which also leads to the BKT phase transition. The critical temperature is a function of g, and for g = 2 Monte Carlo (MC) simulations give T BKT ≈ 0.12 [21]. The PF model can be treated as an approximation of the boson-fermion model [23], valid when fluctuations of the number of bosons at a lattice site can be neglected.
The Hamiltonian of the last model, the quantum XY model, is given by wheren i is the number operator that is canonically conjugate to the quantum phase operatorθ i , and E c is the charging energy.
To train neural networks and to classify phases, one needs extensive sets of spin configurations generated at different temperatures. They were produced with the help of the MC simulations. In the case of the classical XY model, we directly used the Metropolis algorithm. For the PF model, we also used the Metropolis algorithm, but in each MC step (i.e., for each generated spin configuration) we needed 33602-2 to diagonalize the fermionic Hamiltonian (2.2) [24]. For the quantum XY model, we used a quantum MC method in which the Hamiltonian was mapped to a classical action of spins on an effective (2+1)D lattice. We then used a Wolff cluster algorithm to simulate these spins. In all cases, the simulations were performed on 16 × 16 systems.
In all these models, the helicity modulus Υ, at finite temperature defined as the second derivative of the free energy with respect to an externally imposed global twist across the sample [25], has a universal jump at the BKT transition. In the thermodynamic limit, it drops at T BKT from 2 π T BKT to zero. However, in finite systems it evolves smoothly and converges only logarithmically to the thermodynamic limit, as shown in figures 1 (a), 1 (c), and 1 (e). A finite size scaling of the temperature at which Υ(T) crosses 2 π T can give a rough estimate of T BKT . This is demonstrated in figures 1 (b), 1 (d), and 1 (f). It is also possible to determine T BKT in a more precise way. At the critical temperature, the helicity modulus scales with the system size according to the Kosterlitz renormalization group (RG) equations [26], where L is a linear size of the system and C is a constant. Therefore, if one fits Υ L (T) calculated in MC simulations for different system sizes L to the RG predictions, the fitting errors drop almost to zero exactly at T BKT [29]. In figure 2 one can see the procedure.

Artificial neural network
Here, however, we study how the critical temperature can be determined with the help of ANN trained to identify the low-and high-temperature phases. When a system approaches the critical temperature, thermal fluctuations increase drastically so that the spin configurations can be very different from configurations generated at very low temperatures. However, as long as T is below T BKT , the system is still in the same low-temperature phase. The problem with systems where the BKT transition takes place, such as the c-XY, PF or q-XY models, is that in the low-temperature phase, MC simulations on finite clusters show finite magnetization, which according to the Mermin-Wagner theorem cannot exist in the thermodynamic limit. Since the MC results are used to train the ANN, it is possible that the network will learn finite-size features of the spin configurations, such as magnetization. It was proposed in [17] that this difficulty can be overcome by adding a convolutional layer designed to identify topological defects in spin configurations. Then, the network learns configurations of vortices and antivortices instead of raw spin configurations.
Our aim here is to show how the features that can be used to identify phases depend on the distance from the critical temperature T BKT . The standard ANN-based approach to the problem of finding a phase transition is to train the ANN to recognize features of the low-and high-temperature phases. Within the scheme of supervised learning, one needs to feed the ANN with labelled spin configurations generated at low and high temperatures. The network is supposed to extract characteristic patterns and to learn how to classify configurations which were not used at the training stage. Usually, this is an easy task for configurations generated at very low or at very high temperatures, since they are clearly distinctive. However, to precisely determine the critical temperature, the ANN must distinguish configurations generated slightly below T BKT and slightly above T BKT . Thermal fluctuations in this regime make those configurations very different from the fully ordered low-temperature configurations and from completely random high-temperature configurations. The question then is, whether an ANN trained at extreme temperatures will be capable of classifying the phases close to T BKT ? In other words, do the 33602-4 distinctive features learned by the ANN at the extremes balance each other out just right in the evaluation process such that the correct T BKT is predicted? To answer this question we trained the ANN at different distances from T BKT and checked how the distance |T − T BKT | affects the accuracy of finding T BKT . To be precise, we used MC simulations to generate sets of configurations {C 1 }, {C 2 }, . . . , {C N } at temperatures T 1 , T 2 , . . . , T N (T 1 < T 2 < . . . < T N ) below and above T BKT . For the c-XY, PF, and q-XY models, the ranges of temperatures were from 0.1J to 1.6J, from 0.02t to 0.2t, and from 0.1J to 1.5J, respectively. Then, we generated sets of configurations representing different low-(L m ) and high-temperature (H m ) ranges: where m < N/2 is the number of temperatures in each range. We randomly removed from L m and H m some number of configurations to keep the cardinality of these sets fixed. In this way, our study, on how the critical temperature predicted by the ANN depends on the range of temperatures used to train it, was not affected by a different number of configurations for different ranges. In order to connect m with a temperature range used in the trainings, we introduce τ as a measure of the relative temperature range: where T low and T high are the lowest and the highest temperatures lower than T BKT which were used to train the ANN. Configurations from L m and H m were mixed together and shuffled and then they were used to teach the ANN to identify the low-and high-temperature phases.
The spin configurations generated in MC are stored as numbers θ i from 0 to 2π. However, in order to take into account the character of classical two-dimensional spins, the configurations were rewritten as an array composed of cosines and sines: cos θ 1 , . . . , cos θ N , sin θ 1 , . . . , sin θ N , where N = L × L is the number of lattice sites. This is equivalent to a representation by complex numbers and has the advantage that almost parallel spins are represented by close numbers, which is not the case for the original representation by the angles θ i .
Since the generation of extensive sets of spin configurations for different system sizes and at different temperatures is a time-consuming task, especially for the PF and q-XY models, we used a technique known from machine learning-based image recognition to increase the number of configurations without additional MC simulations. Namely, we transformed the original configurations according to symmetries of the system. We used periodic boundary conditions, which allowed us to apply translations to create new configurations. Other transformations were reflections and rotations. Figure 3 schematically shows how the ANN is used to classify spin configurations.
The ANN was implemented using the KERAS package with TENSORFLOW as the computational backend. We used a deep feedforward network with four fully connected hidden layers with 512, 192,  64, 16, and 16 neurons. Such a structure was a balance between the number of training epochs required for convergence and the time needed for one epoch. It turns out, however, that the metaparameters are not crucial for the ANN performance. As the activation function, a rectified linear unit (ReLU) was used in the hidden layers and sigmoid function in the output layer. The network was trained to minimize the distance between the MC data and the model predictions defined by the binary cross entropy. The loss function is given by where y i are labels and p i are the corresponding predictions. We found that with the Tikhonov regularization (L2) [31], the network performs better in the classification of the low-and high-temperature phases. Figure 4 shows an example of learned weights used to classify phases of a 16 × 16 system. One can see that despite a rather large size of the network, most of the neurons are activated. Figure 5 shows how the predictions for T BKT of the c-XY model calculated by the ANN depend on the way the network was trained. In each case, we trained the neural network using a 10-fold cross validation technique [32] and repeated this procedure 10 times. As a result, we obtained 100 possible values of probability P that a given configuration belongs to the high-temperature phase (the probability that it belongs to the low-temperature phase is 1 − P). Each time, the ANN was initialized with random weights and biases, and a larger spread of the predictions indicates a greater difficulty in an unambiguous classification of the phase. The same method of multiple trainings starting from different random weights and biases was used in [13] to determine the standard error of the predicted critical temperature. One can see in figure 5 that even if the ANN was trained only at the extreme temperatures (T = 0.1 and T = 1.6, m = 1) corresponding to a fully ordered and completely random configurations, the average predicted T BKT is not far from the actual value. This could be an accidental coincidence because with an increasing m the deviation slightly increases, but always remains below 10%, which can be seen in figure 5 (d). For m = 16, the deviation is smaller than the line width. The problem, however, is that in this case, the uncertainty is large. So, for a precise determination of T BKT , the averaging over a large number of configurations generated at different temperatures is necessary. For example, for m = 1, the average prediction for 1000 statistically independent configurations is less than 1% off the actual T BKT , but individual predictions are spread within the range of ±18% around this value. This means that if one saves the computational time required to generate configurations for training, more configurations should be generated for the evaluation stage. With an increasing width of the temperature range used for training, the spread of the calculated probabilities decreases significantly. Figure 5 (d) shows how the average T BKT depends on the number of different temperatures used in training. On the right-hand vertical axis, showing the relative error, one can see that the error is always below 10%, and with an increasing number m, it converges to the actual value of T BKT for the c-XY model. In the case of the PF model, the results are different. As can be seen in figure 6, the spread of the calculated probabilities is less dependent on m, but the average critical temperature strongly depends on m. This means that for the PF model, an increase of the number of configurations used at the stage of phase classification will not guarantee a more precise estimation of T BKT . Instead, for this model, a sufficiently The distribution of probabilities calculated for the q-XY model is similar to that for its classical counterpart. It is presented in figure 7. Though for m = 1, the estimated critical temperature differs from its real value by 14% [see figure 7 (d)], the difference decreases very quickly with an increasing m and already for m 3, the relative error is around 2%.

Results
The results show that in the case of the PF model, a much richer set of configurations is required to properly train the ANN than for the XY models. The reason can be connected to a different character of this model. In both the classical and quantum XY models, the interaction range is limited to nearest neighbors. On the other hand, fermions in the PF model mediate the effective interactions between arbitrarily spaced lattice sites. This effect is seen in figure 1: at low temperature the helicity modulus in the c-XY and q-XY models converges even for very small systems. This is not the case for the PF model, where even at very low temperatures (i.e., in an almost fully ordered state) the energy per lattice site depends on the system size. This results from the delocalization of fermions which in the ordered state are similar to quantum particles in an infinite quantum well, with their energies strongly dependent on the size of the well.

Summary
We have demonstrated how the accuracy of finding the BKT transition in three different models depends on the range of temperatures at which the ANN was trained. We used a simple feedforward network with densely connected hidden layers. We did not perform any feature engineering of the spin configurations generated in MC simulations and we did not use convolutional layers. Therefore, the phase classification was based on raw spin configurations, not on the explicitly extracted vortices as in [17]. Nevertheless, in figure 5 (c) we compare the calculated probabilities and magnetization that results from the finite size of the system. One can see there that the section of P(T) that indicates the BKT transition is much steeper than the temperature dependence of the magnetization, even if the network was trained on the extreme temperatures [ figure 5 (a)]. Therefore, we believe that the ANN learns not only the magnetization (which would vanish in the thermodynamic limit), but also some topological features connected with the BKT transition. One also cannot exclude that the ANN is capable of learning the character of the spin-spin correlations which change their behavior at the BKT transition. To confirm this, however, at least a finite size analysis of the ML results would be necessary, which has not been performed here. However, our aim was different -we wanted to demonstrate how the critical temperature determined by the ANN depends on the composition of the training set. As one can expect, the larger is the variety of the configurations representing the low-temperature and high-temperature phases, the better is the accuracy of the critical temperature. We found, however, that for the c-XY and q-XY model, the average T BKT was close to the actual one even if the ANN was trained relatively far from the critical point. Increasing the range of temperatures at which the network was trained, only slightly improves the numerical accuracy (i.e., the difference between the average T BKT determined by the ANN and the value found from the RG equations), but significantly reduces the uncertainty. The situation is different for the PF model, where the numerical accuracy is strongly dependent on the temperature range used at the training stage. We attribute this behavior to the long-range effective interaction present in the PF model which can lead to a longer range of the spin-spin correlations and their different temperature dependence. Despite the difference in the results for the XY models and the PF model, in all cases it is important to train the ANN not only at very low and at very high temperatures, but also as close as possible to the critical temperature. The main problem in training at temperatures close to T BKT is that for supervised learning, the configurations must be labelled, so one should know at least an approximate value of the 33602-9 critical temperature. One of the ways to overcome this difficulty is to use the learning by confusion approach [14] based on a combination of supervised and unsupervised techniques.