Football fever: self-affirmation model for goal distributions


Introduction
Throughout Europe football undoubtedly is a social and economic phenomenon on a large scale with thousands of players organized in several levels of leagues in most countries and an audience in the arenas as well as in front of the television sets counting at least tens of millions. Naturally, an event of such dimensions has triggered a significant number of research studies, for instance concerned with the improvement of game tactics or the question of the predictability of results. Clearly less effort has been devoted, it seems, to the understanding of football (and other ball sports) from the perspective of the stochastic behavior of co-operative "agents" (i.e., players) in abstract models. From a statistical mechanics point of view, however, this problem appears to be an ideal testing ground for the application of its simplified stochastic models to subjects outside of condensed-matter physics [1,2]. A number of such attempts to study ball sports with a statistical mechanics machinery have been reported earlier; see, for instance, the examples collected in [3].
Score distributions of football and other ball games have been occasionally considered by mathematical statisticians for more than fifty years [4-10]. The first studies, involving only very limited statistical data, seemed to indicate that the resulting score distributions were compatible with the simplest assumption of a completely random, Poissonian process with a fixed, albeit possibly team-dependent, scoring probability [4]. Somewhat later, due to the observed deviations (especially) of the tails of the empirical distributions from the Poisson shape, the latter was, on purely empirical grounds, abandoned in favor of a negative binomial distribution (NBD) [11]. The NBD occurs naturally for a mixture of Poissonian processes with a certain distribution of (independent) success probabilities [5]. Much more recently, the availability of significantly more extensive collections of scoring data led the authors of [10] to the conclusion that, at least for some leagues, histograms of football scores resembled the generalized distributions of extreme value statistics [12] rather than any other distribution that had been considered so far. In total, these studies yielded a rather heterogeneous picture, offering statistical descriptions of the empirically observed goal distributions purely on the grounds of best fit, that is, without suggesting any microscopic justification for the manifestation of the assumed analytical forms of probability densities. Furthermore, for a system of highly co-operative entities it might be presumed that such models without correlations cannot be an adequate description anyway.
The distribution of extremes, i.e., the probability density function of ($k$th) maximal or minimal values of independent realizations of a random variable, is described by only a few universality classes, depending on the asymptotic behavior of the original distribution [12]. Apart from the direct importance of the problem of extremes in actuarial mathematics and engineering, generalized extreme value (GEV) distributions have been found to occur in such diverse systems as the statistical mechanics of regular and disordered systems [13-19], turbulence [20] or earthquake data [21]. However, in most cases global properties were considered instead of explicit extremes, and the occurrence of GEV distributions led to speculations about hidden extreme processes in these systems, which, however, could not be identified in most cases. It was only realized recently that GEV distributions can also arise naturally as the statistics of sums of correlated random variables [22-24], which could explain their ubiquity in physical systems.
For the problem of scoring in football, correlations naturally occur through processes of (positive and negative) feedback of scoring on both teams, and we shall see how the introduction of simple rules for the adaptation of the success probabilities in a modified Bernoulli process upon scoring a goal leads to systematic deviations from Gaussian statistics. We find simple models with a single parameter of self-affirmation to best describe the available data, including cases with relatively poor fits of the NBD. The latter is shown to result from one of these models in a particular limit, offering an explanation for the relatively good fits observed earlier. For the models under consideration, exact recurrence relations and precise closed-form approximations of the probability density functions can be derived. Although the limiting distributions of the considered models in general do not follow the statistics of extremes, it is demonstrated how alternative models leading to GEV distributions could be constructed. The best fits are found for the models where each extra goal encourages a team even more than the previous one: a true sign of football fever.

Probability distributions
Considering a simplified statistical description of a football match, the most natural quantity to start with is certainly given by the overall score of the game. Remaining on this level of a first approximation, we restrict ourselves here to the analysis of the distributions of goals scored by the home and away teams in football league or cup matches. Disregarding any effects of player skill, and thus degrading football to a pure game of chance, one might start out assuming independent and constant probabilities of each team for scoring during an appropriate time interval of the match. Since the scoring probabilities will be small, the resulting probabilities of final scores will follow a Poissonian distribution,

$$P_{\lambda_h}(n_h) = \frac{\lambda_h^{n_h}}{n_h!}\, e^{-\lambda_h}, \qquad P_{\lambda_a}(n_a) = \frac{\lambda_a^{n_a}}{n_a!}\, e^{-\lambda_a}, \tag{1}$$

where $n_h$ and $n_a$ are the final scores of the home and away teams, respectively, and the parameters $\lambda_h$ and $\lambda_a$ are related to the average number of goals scored by a team, $\lambda = \langle n \rangle$. As an additional check of the fit to the data, one might then also consider the probability densities of the sum $s = n_h + n_a$ and difference $d = n_h - n_a$ of goals scored under the assumption of Poissonian distributions for $n_h$ and $n_a$,

$$P^{\Sigma}_{\lambda_h,\lambda_a}(s) = \frac{(\lambda_h+\lambda_a)^{s}}{s!}\, e^{-(\lambda_h+\lambda_a)}, \qquad P^{\Delta}_{\lambda_h,\lambda_a}(d) = e^{-(\lambda_h+\lambda_a)} \left(\frac{\lambda_h}{\lambda_a}\right)^{d/2} I_d\!\left(2\sqrt{\lambda_h\lambda_a}\right), \tag{2}$$

where $I_d$ is the modified Bessel function (see [25], p. 374). Note that $P^{\Sigma}_{\lambda_h,\lambda_a}(s)$ is itself a Poissonian distribution with parameter $\lambda = \lambda_h + \lambda_a$.
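The statements about the sum and difference of two independent Poissonian scores can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the values of `lam_h` and `lam_a` are illustrative assumptions and not fitted parameters from the data discussed here. It compares the Bessel-function form of the goal-difference distribution with the explicit sum over home and away scores.

```python
import math

def poisson_pmf(n, lam):
    """Probability of n goals for a Poissonian process with mean lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def bessel_i(d, x, terms=60):
    """Modified Bessel function I_d(x) for integer d >= 0 from its power series."""
    return sum((x / 2.0) ** (2 * k + d) / (math.factorial(k) * math.factorial(k + d))
               for k in range(terms))

def sum_pmf(s, lam_h, lam_a):
    """Total goals s = n_h + n_a: again Poissonian, with mean lam_h + lam_a."""
    return poisson_pmf(s, lam_h + lam_a)

def diff_pmf(d, lam_h, lam_a):
    """Goal difference d = n_h - n_a in the form involving I_|d|."""
    return (math.exp(-(lam_h + lam_a)) * (lam_h / lam_a) ** (d / 2.0)
            * bessel_i(abs(d), 2.0 * math.sqrt(lam_h * lam_a)))

def diff_pmf_direct(d, lam_h, lam_a, n_max=60):
    """Same quantity from the explicit sum over independent scores (n_a = n)."""
    return sum(poisson_pmf(n + d, lam_h) * poisson_pmf(n, lam_a)
               for n in range(max(0, -d), n_max))

lam_h, lam_a = 1.9, 1.2  # illustrative values only (assumption)
for d in (-2, 0, 1, 3):
    print(d, diff_pmf(d, lam_h, lam_a), diff_pmf_direct(d, lam_h, lam_a))
```

The two columns printed for each goal difference should agree to numerical precision, since the closed form is just a resummation of the direct expression.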
Clearly, even the most pessimistic football fan would tend to allow for some effects of player behavior on the outcome of a match and hence consider the assumption of constant and independent scoring probabilities for the teams as not appropriate for real-world football matches. Since we are interested in averages over the matches during one or several seasons of a football league or cup, one might expect a distribution of scoring probabilities $\lambda$ depending on the different skills of the teams, the lineup for the match, tactics, weather conditions etc., leading to the notion of a compound Poisson distribution. It can be easily shown [26,27] that for the special case of the scoring probabilities $\lambda$ following a gamma distribution,

$$f(\lambda) = \frac{a^{r}}{\Gamma(r)}\, \lambda^{r-1} e^{-a\lambda}, \qquad \lambda > 0, \tag{3}$$

the resulting compound Poisson distribution has the form of a NBD [27],

$$P_{r,p}(n) = \frac{\Gamma(r+n)}{n!\,\Gamma(r)}\, p^{n} (1-p)^{r}, \tag{4}$$

where $p = 1/(1 + a)$. This distribution has been found to describe football score data rather well [6,10]. The underlying assumption of the scoring probabilities following a gamma distribution seems to be rather ad hoc, however, and fitting different seasons of our data with the Poissonian model (1), the resulting distribution of the parameters $\lambda$ does not resemble the gamma form (3). At the level of discussion up to this point, the parameter $r$ introduced in equation (3) only appears as an empirical fit parameter. As will be shown below, however, it corresponds to the ratio of initial scoring probability and "self-affirmation factor" in the context of one of the microscopic models considered here. Analogous to equation (2), for the NBD (4) one can evaluate the probabilities for the sum $s$ and difference $d$ of goals scored by the home and away teams [equation (5)], the result involving the hypergeometric function ${}_2F_1$ (see [25], p. 555). Restricting to $p_h = p_a = p$, the distribution of the total score simplifies to $P^{\Sigma}_{r_h,p,r_a,p}(s) = P_{r_h+r_a,p}(s)$, i.e., one finds a composition law similar to the case of the Poissonian distribution.
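The connection between a gamma-distributed scoring parameter and the NBD can be verified numerically. The sketch below is an illustration under stated assumptions: it takes the standard gamma density with shape $r$ and rate $a$ and the NBD in the form $\Gamma(r+n)/[n!\,\Gamma(r)]\,p^n(1-p)^r$ with $p = 1/(1+a)$, and the parameter values are arbitrary. It integrates the Poissonian probability over the gamma density by a simple midpoint rule and also checks the composition law for equal $p$.

```python
import math

def nbd_pmf(n, r, p):
    """NBD Gamma(r+n) / (n! Gamma(r)) * p**n * (1-p)**r, evaluated via log-gamma."""
    log_w = math.lgamma(r + n) - math.lgamma(n + 1) - math.lgamma(r)
    return math.exp(log_w + n * math.log(p) + r * math.log(1.0 - p))

def gamma_pdf(lam, r, a):
    """Gamma density with shape r and rate a for the scoring parameter lam > 0."""
    return math.exp(r * math.log(a) + (r - 1.0) * math.log(lam)
                    - a * lam - math.lgamma(r))

def poisson_pmf(n, lam):
    """Poissonian probability of n goals for mean lam > 0."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def compound_poisson_pmf(n, r, a, lam_max=40.0, steps=4000):
    """Midpoint-rule integral of Poisson(n | lam) weighted with the gamma density."""
    h = lam_max / steps
    return h * sum(poisson_pmf(n, (i + 0.5) * h) * gamma_pdf((i + 0.5) * h, r, a)
                   for i in range(steps))

r, a = 5.0, 3.0          # illustrative values only (assumption)
p = 1.0 / (1.0 + a)
for n in range(5):
    print(n, compound_poisson_pmf(n, r, a), nbd_pmf(n, r, p))

# Composition law for equal p: the sum of two NBD scores is NBD with r_h + r_a.
s, r_h, r_a, p_eq = 4, 2.0, 3.5, 0.3
conv = sum(nbd_pmf(k, r_h, p_eq) * nbd_pmf(s - k, r_a, p_eq) for k in range(s + 1))
print(conv, nbd_pmf(s, r_h + r_a, p_eq))
```

Working with `math.lgamma` avoids overflow of the gamma functions for larger $r$ or $n$, which matters once such a function is used inside a fit.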
Starting from the observation that the goal distributions of certain leagues do not seem to be well fitted by the NBD, Greenhough et al. [10] considered fits of the GEV distributions,

$$P_{\xi,\mu,\sigma}(n) = \frac{1}{\sigma} \left(1 + \xi\,\frac{n-\mu}{\sigma}\right)^{-1-1/\xi} \exp\left[-\left(1 + \xi\,\frac{n-\mu}{\sigma}\right)^{-1/\xi}\right], \tag{6}$$

to the data, obtaining good fits in some cases. Depending on the sign of the parameter $\xi$, these distributions are called Weibull ($\xi < 0$), Gumbel ($\xi = 0$) and Fréchet ($\xi > 0$) distributions, respectively. The shape parameter $\xi$ controls the asymptotic decay of $P_{\xi,\mu,\sigma}(n)$ for large $n$, such that increasing values of $\xi$ correspond to stronger feedback effects in terms of the self-affirmation models discussed in the next section.
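To illustrate how the shape parameter controls the tail, the following sketch evaluates a GEV density for the three cases. It assumes the standard parametrization with shape $\xi$, location $\mu$ and scale $\sigma$, in which the sign convention matches the Weibull/Gumbel/Fréchet naming used above; the numerical values of $\mu$ and $\sigma$ are arbitrary.

```python
import math

def gev_pdf(n, xi, mu, sigma):
    """GEV density in the standard (xi, mu, sigma) parametrization (assumed here).

    With t = 1 + xi*(n - mu)/sigma the density is
    t**(-1 - 1/xi) * exp(-t**(-1/xi)) / sigma, and the Gumbel form for xi = 0.
    """
    z = (n - mu) / sigma
    if abs(xi) < 1e-12:
        u = -z                      # Gumbel limit: exp(u) = exp(-z)
    else:
        t = 1.0 + xi * z
        if t <= 0.0:                # outside the support of the distribution
            return 0.0
        u = -math.log(t) / xi       # chosen so that exp(u) = t**(-1/xi)
    if u > 700.0:                   # exp(u) would overflow; the density vanishes
        return 0.0
    return math.exp((1.0 + xi) * u - math.exp(u)) / sigma

mu, sigma = 1.5, 1.2  # illustrative values only (assumption)
for xi, name in ((-0.1, "Weibull"), (0.0, "Gumbel"), (0.1, "Frechet")):
    print(name, [round(gev_pdf(n, xi, mu, sigma), 5) for n in (0, 2, 4, 8)])
```

Comparing the values at the largest score shows the ordering of the tails: at fixed $\mu$ and $\sigma$, a larger $\xi$ puts more weight on high scores.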

Scoring models
From the discussion of the previous section it should be apparent that the previously employed probability distributions for modelling football scores were chosen rather ad hoc, mainly by the criterion of best fit to the observed data, but without offering any explanation for their suitability to the purpose. The only exception to this observation is the use of the Poisson distribution which, however, has the drawback of not describing the empirical distributions well. The major shortcoming of this latter description appears to be the assumption of independent scoring events, ignoring the fact that scoring certainly has a profound feedback on the motivation and thus the likelihood of subsequent scoring of both teams (via direct motivation/demotivation of the players, but also, e.g., by a strengthening of defensive play in case of a lead), i.e., there is a fundamental component of (positive or negative) feedback in the system. We include such correlation effects by introducing feedback into the binomial model (being a discrete version of the Poissonian model (1) above): consider a football match divided into $N$ time steps (we restrict ourselves here to the natural choice $N = 90$, but good fits are found for any choice of $N$ within reasonable limits) with both teams having the possibility to either score or not score in each time step. Feedback is introduced into the system by having the scoring probabilities $p$ depend on the number $n$ of goals scored so far, $p = p(n)$. Several possibilities arise. For our model "A", upon each goal the scoring probability is modified as

$$p(n) = p(n-1) + \kappa, \tag{7}$$

with some fixed constant $\kappa$. Alternatively, one might consider a multiplicative modification rule,

$$p(n) = \kappa\, p(n-1), \tag{8}$$

which we refer to as model "B". The resulting modified binomial distributions $P_N(n)$ for the total number of goals scored by one team can be computed exactly from a Pascal type recurrence relation [28],

$$P_N(n) = [1 - p(n)]\, P_{N-1}(n) + p(n-1)\, P_{N-1}(n-1), \tag{9}$$

where, e.g., $p(n) = p_0 + \kappa n$ for model "A" and $p(n) = p_0 \kappa^{n}$ for model "B". For the case of the additive model "A", it can be shown that the continuum limit of $P_N(n)$, i.e., $N \to \infty$ with $p_0 N$ and $\kappa N$ kept fixed, is given by the NBD (4) with $r = p_0/\kappa$ and $p = 1 - e^{-\kappa N}$ [28]. Thus the good fit of a NBD to the data can be understood from the "microscopic" effect of self-affirmation of the teams or players, without making reference to the somewhat poorly motivated composition of the pure Poissonian model with a gamma distribution.

To elaborate on these simple models, one might relax the condition of independence of the scoring of the home and away teams by coupling the adaptation rules upon scoring, for instance by modifying the scoring probabilities of both teams upon a goal of the home team and, analogously, upon a goal of the away team [equation (10)], which we refer to as model "C". If both teams have $\kappa > 1$, this results in an incentive for the scoring team and a demotivation for the opponent. But a value $\kappa < 1$ is conceivable as well.

These microscopic models are not only related to the Poissonian ansatz and the NBD used earlier, but distributions of the GEV type can also result from a modified microscopic model with feedback. To see this, consider again a series of trials for a number $N$ of time steps. Assume that the probability to score $U_1$ goals in time step 1 is distributed according to $P_1(U_1) = P(U_1)$ (e.g., with a Poisson distribution $P$), while the probability to score $U_2$ goals in time step 2 depends on $U_1$, and so on for the following time steps. For any continuous distribution $P$, this means that due to the normalization factors $Z_i$ the distribution of $U_i$ will possess enhanced tails compared to the distribution of $U_{i-1}$ (unless $U_{i-1} = 0$), resulting in a positive feedback effect similar to that of models "A", "B" and "C". We refer to this prescription as model "D". From the results of Bertin and Clusel [23,24] it then follows that the limiting distribution of the total score $n = \sum_{i=1}^{N} U_i$ is a GEV distribution, where the specific form of the distribution [in particular the value of the parameter $\xi$ in (6)] depends on the falloff of the original distribution $P$ in its tails.
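The modified binomial distributions of models "A" and "B" are straightforward to generate numerically. The sketch below is a minimal illustration, not the implementation used for the fits reported here: it propagates the goal distribution one time step at a time, which is equivalent to a Pascal type recurrence, using $p(n) = p_0 + \kappa n$ and $p(n) = p_0 \kappa^n$. Capping $p(n)$ at 1 and all parameter values are assumptions made for the example. It then compares model "A" with the NBD obtained for $r = p_0/\kappa$ and $p = 1 - e^{-\kappa N}$.

```python
import math

def modified_binomial(N, p_of_n):
    """Distribution P_N(n) of goals after N time steps for a score-dependent
    scoring probability p(n), built up one time step at a time."""
    dist = [1.0]                          # before kick-off: zero goals
    for _ in range(N):
        new = [0.0] * (len(dist) + 1)
        for n, prob in enumerate(dist):
            p = p_of_n(n)
            new[n] += (1.0 - p) * prob    # no goal in this time step
            new[n + 1] += p * prob        # goal: score n -> n + 1
        dist = new
    return dist

def model_a(p0, kappa):
    """Additive rule p(n) = p0 + kappa*n (capped at 1, an assumption)."""
    return lambda n: min(1.0, p0 + kappa * n)

def model_b(p0, kappa):
    """Multiplicative rule p(n) = p0 * kappa**n (capped at 1, an assumption)."""
    return lambda n: min(1.0, p0 * kappa ** n)

def nbd_pmf(n, r, p):
    """NBD Gamma(r+n) / (n! Gamma(r)) * p**n * (1-p)**r."""
    log_w = math.lgamma(r + n) - math.lgamma(n + 1) - math.lgamma(r)
    return math.exp(log_w + n * math.log(p) + r * math.log(1.0 - p))

def nbd_limit_params(N, p0, kappa):
    """Continuum-limit parameters r = p0/kappa and p = 1 - exp(-kappa*N)."""
    return p0 / kappa, 1.0 - math.exp(-kappa * N)

N, p0, kappa = 90, 0.018, 0.003           # illustrative values only (assumption)
dist_a = modified_binomial(N, model_a(p0, kappa))
dist_b = modified_binomial(N, model_b(0.015, 1.1))
r, p = nbd_limit_params(N, p0, kappa)
for n in range(7):
    print(n, dist_a[n], nbd_pmf(n, r, p), dist_b[n])
```

Increasing $N$ while keeping $p_0 N$ and $\kappa N$ fixed should bring the model "A" column closer to the NBD column, which gives a quick numerical check of the continuum limit.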
Our "coarse-grained" scoring models with a single parameter of self-affirmation are clearly a gross over-simplification of the complex psycho-social phenomena on a football pitch and thus a plethora of opportunities for improvement of the description and further studies opens up. For instance, considering the averages over whole leagues or cups, we do not take into account the differences in skill between the teams. Likewise, if time-resolved scoring data were made available, a closer investigation of the intra-team and inter-team motivation and demotivation effects would provide an intriguing future enterprise to undertake. Such data would allow us to investigate the behavior of the (average) scoring probability as a function of playing time, and hence a direct test of our basic assumption of score-dependent scoring probabilities incorporated into the models "A"-"D" discussed above. In particular, the functional form of the thus extracted scoring probability $p(n)$ could be compared to the linear or exponential forms implied by equations (7) and (8) for models "A" and "B". Some data of this type have been analyzed in [29], showing a clear increase of scoring frequency as the match progresses, thus supporting the presence of feedback as discussed here.

Bundesliga and Oberliga
We now turn to the discussion of football matches played in leagues, using the example of football played in Germany. Our main data set consists of the matches played in the "Bundesliga" (men's premier league FRG, 1963/64-2004/05, ≈ 12 800 matches), the "Oberliga" (men's premier league GDR, 1949/50-1990/91, ≈ 7700 matches), and the "Frauen-Bundesliga" (women's premier league FRG, 1997/98-2004/05, ≈ 1050 matches) [30-33]. Beyond the question of which probability distribution or microscopic model might describe these data, we here wanted to see how the score distributions depend on cultural and political circumstances and are possibly different between men's and women's leagues. We first determined histograms estimating the probability density functions (PDFs) $P_h(n_h)$ and $P_a(n_a)$ of the final scores of the home and away teams, respectively [34]. Similarly, we determined histograms for the PDFs $P^{\Sigma}(s)$ and $P^{\Delta}(d)$ of the sums and differences of final scores. To arrive at error estimates on the histogram bins, we utilized the bootstrap resampling scheme [35].
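The bootstrap error estimate for histogram bins can be sketched as follows. This is a generic illustration of bootstrap resampling under stated assumptions, not the analysis code behind the tables: the list `scores` is synthetic (drawn from a simple binomial process), and the number of bins and resamples are arbitrary choices.

```python
import random

def histogram(scores, n_bins):
    """Normalized histogram of integer scores; the last bin collects overflow."""
    counts = [0] * n_bins
    for s in scores:
        counts[min(s, n_bins - 1)] += 1
    total = float(len(scores))
    return [c / total for c in counts]

def bootstrap_errors(scores, n_bins, n_resamples=200, seed=1):
    """Standard deviation of each bin over resampled data sets, each drawn
    with replacement and of the same size as the original sample."""
    rng = random.Random(seed)
    m = len(scores)
    samples = [histogram([scores[rng.randrange(m)] for _ in range(m)], n_bins)
               for _ in range(n_resamples)]
    errors = []
    for b in range(n_bins):
        vals = [h[b] for h in samples]
        mean = sum(vals) / n_resamples
        var = sum((v - mean) ** 2 for v in vals) / (n_resamples - 1)
        errors.append(var ** 0.5)
    return errors

# Synthetic stand-in for a list of final scores (assumption, not league data):
# 2000 "matches" with 90 time steps and a constant scoring probability of 0.02.
rng = random.Random(7)
scores = [sum(rng.random() < 0.02 for _ in range(90)) for _ in range(2000)]
hist = histogram(scores, 8)
errors = bootstrap_errors(scores, 8)
for n, (h, e) in enumerate(zip(hist, errors)):
    print(n, round(h, 4), "+/-", round(e, 4))
```

For well-populated bins the bootstrap error should be close to the binomial estimate $\sqrt{P(1-P)/M}$ for $M$ matches, which offers a simple plausibility check.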
We first considered fits of the PDFs of the phenomenological descriptions considered previously, namely the Poissonian form (1), the NBD (4) and the distributions (6) of extreme value statistics. The parameters of the fits of these types to the data are summarized in table 1 comparing the East German "Oberliga" to the West German "Bundesliga" (1963/64-1990/91, ≈ 8400 matches) during the time of the German division, and in table 2 comparing the data for all games of the German men's premier league "Bundesliga" to the German women's premier league "Frauen-Bundesliga". Not to our surprise, and in accordance with previous findings [5,10], the simple Poissonian ansatz (1) is not found to be an adequate description for any of the data sets. Deviations occur here mainly in the tails with large numbers of goals, which in general are found to be fatter than what can be accommodated by a Poissonian model, whereas the distribution peaks are reasonably well represented. In contrast, the NBD form (4) models all of the considered data well, as is illustrated with fits of the corresponding form to our data in figure 1 comparing "Oberliga" and "Bundesliga" and in figure 2 presenting "Bundesliga" and "Frauen-Bundesliga".
Comparing the leagues, we find that the parameters $r$ of the NBD fits for the "Bundesliga" are about twice as large as for the "Oberliga", whereas the parameters $p$ are smaller for the "Bundesliga", cf. the data in table 1. Recalling that the form (4) is in fact the continuum limit of the feedback model "A" discussed above, these differences translate into larger values of $\kappa$ and smaller values of $p_0$ for the "Oberliga" results. That is to say, scoring a goal in a match of the East German premier league was a more encouraging event than scoring a goal in a match of the West German league. Alternatively, this observation might be interpreted as a stronger tendency of the perhaps more professionalized teams of the West German league to switch to a strongly defensive mode of play in case of a lead. Consequently, the tails of the distributions are slightly fatter for the "Oberliga" than for the "Bundesliga". Comparing the results for the "Frauen-Bundesliga" to those for the "Bundesliga", even more pronounced tails are found for the former, resulting in very significantly larger values of the self-affirmation parameter $\kappa$ for the matches of the women's league, see the fit parameters collected in table 2 and the fits of the NBD type presented in figure 2.
Considering the fits of the GEV distributions (6) to the data for all three leagues, we find that extreme value statistics are in general a reasonably good description of the data. The shape parameter $\xi$ is always found to be small in modulus and negative in the majority of the cases, indicating a distribution of the Weibull type (which is in agreement with the findings of [10]). On the other hand, fixing $\xi = 0$ yields overall clearly larger values of $\chi^2$ per degree-of-freedom (d.o.f.), indicating that the data are hardly compatible with a distribution of the Gumbel type. Comparing "Oberliga" and "Bundesliga", we consistently find larger values of the parameter $\xi$ for the former, indicative of the comparatively fatter tails of these data discussed above, see the data in table 1. The location parameter $\mu$, on the other hand, is larger for the West German league which features a larger average number of goals per match (which can be read off also more directly from the $\lambda$ parameter of the Poissonian fits), while the scale parameter $\sigma$ is similar for both leagues. Compared to the results for the NBD, we do not find any cases where the GEV distributions would provide the best fit to the data, so clearly the leagues considered here are not of the type of the general "domestic" league data for which Greenhough et al. [10] found better matches with the GEV than for the NBD statistics. Similar conclusions hold true for the comparisons of "Bundesliga" and "Frauen-Bundesliga", with the latter taking on the role of the "Oberliga".

Table 2. Fits of the phenomenological distributions (1), (4) and (6) to the data for the German men's premier league "Bundesliga" between 1963/64 and 2004/05 and for the German women's premier league "Frauen-Bundesliga" for the seasons of 1997/…
In total, the best fits so far are clearly achieved by the NBD ansatz. Since this distribution is obtained only as the continuum limit of the microscopic model "A", it is interesting to see how fits of the exact distribution (for $N = 90$) resulting from the recurrence (9) for model "A", but also fits of the multiplicatively modified binomial distribution of model "B", compare to the results found above. We performed fits to the exact distributions of both models by employing the simplex method [36] to minimize the total $\chi^2$ of the data for the home and away scores. Alternatively, we also considered fitting additionally to the sums and differences in a simultaneous fit and found very similar results with only a slight improvement of the fit quality for the sums and differences at the expense of somewhat worse fits for the home and away scores. We summarize the fit results in table 3. We also performed fits to the more elaborate model "C", but found the results rather similar to those of the simpler model "B" and hence do not present the results here. Comparing the results of model "A" to the fits of the limiting NBD, we find almost identical fit qualities for the final scores of both teams. However, the sums and differences of scores are considerably better described by model "A", indicating that here the deviations from the continuum limit are still relevant. In figure 3, we present the differences of goals in the German women's premier league together with the fits of models "A" and "B". The multiplicative model "B", where each goal motivates a team even more than the previous one, within the statistical errors yields fits of the same quality as model "A", such that a distinct advantage cannot be attributed to either of them, cf. the data in table 3.

Figure 3. Goal differences in the German women's premier league together with fits of models "A" and "B".
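The logic of such a $\chi^2$ fit to the exact model distribution can be sketched with a deliberately simplified stand-in. The example below does not use the simplex minimizer mentioned above; instead it scans a coarse parameter grid, which is easier to follow and needs no external library. The "data" are synthetic (generated from model "A" itself at known parameters), and the error bars, the number of matches `M` and the grid ranges are assumptions made for illustration.

```python
def model_a_dist(N, p0, kappa, n_bins):
    """First n_bins entries of the model "A" distribution with p(n) = p0 + kappa*n,
    obtained by propagating the goal distribution through N time steps."""
    dist = [1.0]
    for _ in range(N):
        new = [0.0] * (len(dist) + 1)
        for n, prob in enumerate(dist):
            p = min(1.0, p0 + kappa * n)   # cap at 1 (assumption)
            new[n] += (1.0 - p) * prob
            new[n + 1] += p * prob
        dist = new
    return dist[:n_bins]

def chi2(data, errors, model):
    """Sum of squared deviations in units of the error bars."""
    return sum(((d - m) / e) ** 2 for d, e, m in zip(data, errors, model))

def grid_fit(data, errors, N, p0_grid, kappa_grid):
    """Return (chi2, p0, kappa) with the smallest chi2 on the given grid."""
    n_bins = len(data)
    return min((chi2(data, errors, model_a_dist(N, p0, k, n_bins)), p0, k)
               for p0 in p0_grid for k in kappa_grid)

N, N_BINS, M = 90, 8, 5000                   # M: hypothetical number of matches
p0_grid = [0.010 + 0.002 * i for i in range(9)]
kappa_grid = [0.001 * i for i in range(7)]
P0_TRUE, KAPPA_TRUE = p0_grid[4], kappa_grid[3]

# Synthetic, noise-free "data" with binomial-type error bars (assumption).
data = model_a_dist(N, P0_TRUE, KAPPA_TRUE, N_BINS)
errors = [(q * (1.0 - q) / M) ** 0.5 for q in data]

best = grid_fit(data, errors, N, p0_grid, kappa_grid)
dof = N_BINS - 2                             # two fit parameters: p0 and kappa
print("best p0 =", best[1], "best kappa =", best[2], "chi2/d.o.f. =", best[0] / dof)
```

In a real analysis the grid scan would be replaced by a proper minimizer such as the simplex method, and the total $\chi^2$ would sum the contributions of the home and away histograms.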

FIFA World Cup
Somewhat different conditions than for football in premier leagues apply to the case of international football tournaments. In particular, we considered the score data of the "FIFA World Cup" series from 1930 to 2006, focusing on the results from the qualification stage (≈ 4800 matches) [37] (the final knockout stage follows different rules: matches are played on neutral grounds, apart from the team of the host country, and games cannot end in a draw). The results of fits of the phenomenological distributions (1), (4) and (6) as well as the models "A" and "B" are collected in table 4. Compared to the domestic league data discussed above, the results of the World Cup show distinctly heavier tails, cf. the presentation of the data in figure 4. Considering the fit results, this leads to good fits for the heavy-tailed distributions, and, in particular, in this case the GEV distributions provide a better fit than the NBD, similar to what was found by Greenhough et al. [10] for some of their data. This difference to the German league data discussed above can be attributed to the possibly very large differences in skill between the opposing teams, occurring since all countries are allowed to participate in the qualification round. A glance back to table 2 reveals a remarkable similarity with the parameters of the "Frauen-Bundesliga" (e.g., in both cases the NBD parameters $p$ are comparatively large while $r$ is small, and the GEV parameters $\xi$ are positive), where a similar explanation appears quite plausible since the very good players are concentrated in two or three teams only. Turning to the fits of the models "A" and "B", we again find model "A" to fit rather similarly to its continuum approximation, the NBD. On the other hand, model "B" describes the data extremely well, for the away team even better than the GEV distributions (6).

Tabletop football
Finally, we also conducted our own empirical experiments relying on the football fever of the visitors of the "Science Summer 2008" ("Wissenschaftssommer 2008") exhibition, the central opening event of the German "Year of Mathematics" held on Leipzig's Augustus Square in July 2008. To this end we rented two football tables on which visitors could play matches in teams of up to two players, see figure 5. All results were recorded and analyzed on-site in order to involve the visitors as closely as possible. By the end of the week, a total of about 2500 visitors had participated in the table football matches, in total contributing about 1000 results. With a fixed playing time of three minutes, a typical match resulted in about 5-10 goals, quite significantly more than the average number of goals (≈ 3) scored in matches of the professional football leagues considered above. Still, the overall trend of the goal distribution turned out to be surprisingly similar to the features seen for professional leagues, indicating a certain degree of universality of our interpretation; the football fever was indeed already visually apparent during the whole exhibition week (which, admittedly, was particularly suited since it ended with the final match of the UEFA Euro 2008 in Vienna between Germany and Spain (0:1)). The results can be inspected in the left panel of figure 6, where the empirical goal distribution (for one-to-one matches) is compared to our various fit models. As before, the simple Poissonian ansatz (with $\chi^2$/d.o.f. = 8.9) does not work at all, but both feedback models "A" and "B" give satisfactory fits to the data (with $\chi^2$/d.o.f. ≈ 2). For the self-affirmation parameter $\kappa$ we find here $\kappa = 0.0074 \pm 0.0006$ for the additive model "A" and $\kappa = 1.11 \pm 0.01$ for the multiplicative model "B". In order to compare directly with the goal distribution of the home teams of the German Bundesliga, in the right panel of figure 6 we show the table football data together with the results from the German Bundesliga, where the former was renormalized to yield the same average number of goals per match. While it does not come as a surprise that the two curves do not really fall on top of each other, the overall trend is surprisingly similar, given the completely different set-ups leading to the two data sets.

Conclusions
By analyzing German domestic and international football score data we have shown that the goal distributions can be modeled with a certain class of modified binomial models supplemented by a built-in effect of self-affirmation of the teams upon scoring a goal. The simple Poissonian ansatz assuming independent scoring probabilities is clearly ruled out. The NBD suggested earlier [5], which fits many of our data sets quite satisfactorily, can in fact be understood as the limiting distribution of our additive model "A". It should be stressed that the exact distribution of model "A" provides in general rather better fits to the data than the limiting NBD. This is particularly pronounced for the sums and differences of goals scored. However, the quality of the fits is limited in cases with heavier tails such as the qualification round of the "FIFA World Cup" series. Here the multiplicative model "B", in which each goal motivates the team even more than the previous one, provides an outstanding fit to these data as well as the data from the German domestic leagues. Thus, the contradicting evidence for better fits of some football score data with NBD and other data with GEV distributions is reconciled with the use of a plausible microscopic model covering both cases by successfully interpolating between the two extremes. Also the tabletop football score data of our field study with visitors of the "Science Summer 2008" are well represented with our feedback models, underscoring their apparently rather universal applicability.
Comparing the score data between the separate German premier leagues during the cold war times, we find heavier tails for the East German league. In terms of our microscopic models, this corresponds to a stronger component of self-affirmation as compared to the West German league. Similarly, the German women's premier league "Frauen-Bundesliga" shows a much stronger feedback effect than the men's premier league, with at first glance surprisingly many parallels to the "FIFA World Cup" series. We also analyzed the results from further leagues, such as the Austrian, Belgian, British, Bulgarian, Czechoslovak, Dutch, French, Hungarian, Italian, Portuguese, Romanian, Russian, Scottish and Spanish premier leagues, and arrived at similar conclusions. In general, we find less professionalized leagues to feature stronger components of positive feedback upon scoring a goal, perhaps indicating a still stronger infection with the football fever there ...

Figure 1. Probability density of goals scored by home and away teams, and of the total number of goals scored in a match of the GDR "Oberliga" (left) and the FRG "Bundesliga" (right), restricted to the seasons of 1963/64-1990/91. The lines for "home" and "away" show fits of the NBD (4) to the data; the line for "total" denotes the resulting distribution (5) for the sum.

Figure 2. Probability density of goals scored in the German premier league "Bundesliga" for all seasons (left) and in the women's "Frauen-Bundesliga" (right).

Figure 4. Probability density of goals scored by the home and away teams in the qualification stage of the "FIFA World Cup" series on a linear (left) and logarithmic (right) scale.

Figure 5. Football fever infected visitors of the "Science Summer 2008" exhibition in Leipzig fighting in a tabletop football match. From left to right: Former Foreign Minister and Vice-Chancellor of the Federal Republic of Germany Dr. Klaus Kinkel, Vice-Rector for Research of the University of Leipzig Prof. Dr. Martin Schlegel, Lord Mayor of the City of Leipzig Burkhard Jung, and Parliamentary State Secretary to the German Federal Minister of Education and Research Thomas Rachel.

Table 1. Fits of the phenomenological distributions (1), (4) and (6) to the data for the East German "Oberliga" between 1949/50 and 1990/91 and for the West German "Bundesliga" for the seasons of 1963/64-1990/91.

Table 3. Fit results for models "A" and "B". Fits were performed to the score distributions of the home and away teams only and the resulting model estimates for the sums and differences of goals compared to the data.

Table 4. Fit results for the qualification phase of the "FIFA World Cup" series from 1930 to 2006.