Selection-mutation balance models with epistatic selection

We present an application of birth-and-death processes on configuration spaces to a generalized mutation-selection balance model. The model describes the aging of a population as a process of accumulation of mutations in a genotype. A rigorous treatment requires that mutations correspond to points in abstract spaces. Our model is an infinite-population, infinite-sites model in the continuum. The dynamical equation describing the system is of Kimura-Maruyama type. The problem can be posed in terms of the evolution of states (a differential equation) or, equivalently, represented by means of the Feynman-Kac formula. The questions of interest are the existence of a solution, its asymptotic behavior, and the properties of the limiting state. In the non-epistatic case the problem was posed and solved in [D. Steinsaltz, S.N. Evans, and K.W. Wachter, Adv. Appl. Math., 35(1), 2005]. In our model we take a topological space X as the space of positions of mutations and include the influence of an epistatic potential.


The model
First, we recall some genetic concepts and notions, see e.g. [1]. A gene is a (contiguous) coding region of DNA. It may occur in different forms, called alleles. Thus an allele is one of the variant forms of a gene that occupies a given locus (position) on a chromosome, i.e. alleles are the alternative DNA sequences coding for a gene. An individual's genotype is the collection of its alleles. A change of the genetic material is called a mutation, and the affected allele is called a mutant allele. A wild-type allele is an allele considered "normal" for the organism in question, as opposed to a mutant allele, which arises by mutation. We call the "null genotype" the genotype which has wild-type alleles at every locus and carries no mutant alleles. In this paper we use the word "genotype" in a sense somewhat different from the one above: a genotype is the set of mutant alleles that an individual carries. So, in contrast to the usual definition, we are interested only in the set of mutant alleles, rather than in the complete information about all alleles.
In this section we describe a model introduced in [7], which describes the aging of a population. Let $X$ be a Polish space, interpreted as the space of loci (i.e. positions of possible mutations). Denote the Borel σ-algebra on $X$ by $\mathcal{B}(X)$, and fix a locally finite, σ-finite measure $\sigma$ on $(X, \mathcal{B}(X))$, interpreted as the mutation rate. For simplicity, we assume that at most one mutation may occur at each locus. A locally finite configuration of points in $X$ (a countable subset of $X$ without accumulation points) is interpreted as a genotype. Then $\gamma = \emptyset$ plays the role of the null (wild-type) genotype, and the set of all genotypes is the configuration space $\Gamma := \Gamma(X)$. We assume that genotypes are subject to a selection cost $\Phi$, which is a continuous function $\Phi : \Gamma \to \mathbb{R}$, for example with $\Phi(\emptyset) = 0$ and $\Phi(\gamma) > 0$ for $\gamma \neq \emptyset$.
The emergence of mutant alleles is described by a stochastic process: the state of the population of genotypes at each fixed time $t$ is described by a probability measure $\mu_t$ on $\Gamma$. The time development of the population is modelled by a Kimura-Maruyama type equation (1), where $\mu_t(F) := \int_\Gamma F\,d\mu_t$ and $F : \Gamma \to \mathbb{R}$ is a bounded cylinder function (depending on $\gamma$ only locally). The questions of interest for us are the existence of a solution $\mu_t$, the convergence $\mu_t \to \mu$ as $t \to +\infty$, and the properties of the limiting state $\mu$. A useful choice of time parameterization is to start the process in the remote past, namely at time $t = -T < 0$, in the state $\mu_{-T}$. Then we arrive at $t = 0$ in the state $\mu_{0,T}$, and the long-time limiting state is given by $\lim_{T\to+\infty}\mu_{0,T}$.
Next, we consider another representation of the model, which gives an explicit solution of equation (1) by means of the Feynman-Kac formula. Denote $E := \Gamma(X)$ and recall that $E$ is a Polish space. Let $L$ be the Markov generator defined by $LF(\gamma) := \int_X \big(F(\gamma\cup\{x\}) - F(\gamma)\big)\,\sigma(dx)$ for bounded cylinder functions $F : E \to \mathbb{R}$.
The continuous function $\Phi : E \to \mathbb{R}$ plays the role of the potential in the Feynman-Kac formula. Rewriting (1) in this notation, we obtain equation (2). Denote by $\mu^T_t$, $-T \le t \le 0$, the measure-valued dynamical system which solves (2) for each bounded cylinder function $F : E \to \mathbb{R}$, started in $\mu^T_{-T} = \mu$. The solution $\mu^T_t$ of (2) can be written explicitly, up to a normalizing constant $Z_t$. Via the Feynman-Kac formula we can represent $\mu^T_t$ in terms of the Markov process $\xi^T_\tau$ corresponding to the generator $L$, started in $\mu^T_{-T} = \mu$. Performing the limit $T \to +\infty$ heuristically yields a measure $\nu_\Phi$, again up to a normalizing constant $Z$. The aim of the following sections is to give a rigorous meaning to $\nu_\Phi$, by defining the measure first in a bounded volume and for finite time and then passing to the limit. By means of $\nu_\Phi$ we derive the large-time asymptotics of $\mu^T_0$. In the non-epistatic case the problem was posed and solved in [7]. The articles [2], [3] were motivated by this work and treat the case of a more general, epistatic potential. In both articles the space of possible positions of mutations is $\mathbb{R}^d$. The generalization to a topological space $X$ seems important because of the nonlinear structure of DNA. In our model we take a topological space $X$ as the space of positions of mutations and include the influence of an epistatic potential.
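The relation between equation (1) and its normalized Feynman-Kac representation can be illustrated in a finite toy setting. The following Python sketch is not the construction of this paper: it assumes a finite set of `m` loci with hypothetical mutation rates `sigma`, costs `h` and a constant pairwise cost `phi`, encodes genotypes as bitmasks, and assumes that (1) has the standard Kimura-Maruyama form $\frac{d}{dt}\mu_t(F) = \mu_t(LF) - \mu_t(\Phi F) + \mu_t(\Phi)\mu_t(F)$ with the pure-birth generator $L$. It integrates this nonlinear equation directly and compares the result with the normalized solution $\nu_t/\nu_t(1)$ of the linear equation $\frac{d}{dt}\nu_t = \nu_t(L - \Phi)$, which is what the Feynman-Kac formula expresses.

```python
import numpy as np

# Hypothetical finite analogue: m loci, a genotype is a subset of loci (bitmask).
m = 3
sigma = np.array([0.5, 1.0, 0.8])   # assumed mutation rate at each locus
h = np.array([0.3, 0.6, 0.4])       # assumed non-epistatic cost of each mutation
phi = 0.2                           # assumed constant pairwise epistatic cost
n_states = 2 ** m

def cost(g):
    """Phi(gamma) = sum of h over present mutations + phi * (number of pairs)."""
    loci = [i for i in range(m) if (g >> i) & 1]
    k = len(loci)
    return sum(h[i] for i in loci) + phi * k * (k - 1) / 2

# Pure-birth generator: gamma -> gamma ∪ {i} at rate sigma[i] for each absent locus i.
Q = np.zeros((n_states, n_states))
for g in range(n_states):
    for i in range(m):
        if not (g >> i) & 1:
            Q[g, g | (1 << i)] += sigma[i]
            Q[g, g] -= sigma[i]

Phi = np.array([cost(g) for g in range(n_states)])

def km_rhs(mu):
    """Assumed Kimura-Maruyama right-hand side: mu L - mu Phi + mu(Phi) mu."""
    return mu @ Q - mu * Phi + mu * (mu @ Phi)

def linear_rhs(nu):
    """Unnormalized Feynman-Kac evolution: d nu/dt = nu (L - Phi)."""
    return nu @ Q - nu * Phi

def rk4(f, y, t_end, steps=4000):
    """Classical fourth-order Runge-Kutta integration of y' = f(y)."""
    dt = t_end / steps
    for _ in range(steps):
        k1 = f(y)
        k2 = f(y + dt / 2 * k1)
        k3 = f(y + dt / 2 * k2)
        k4 = f(y + dt * k3)
        y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

mu0 = np.zeros(n_states)
mu0[0] = 1.0                        # start from the null genotype (empty set)
T = 5.0
mu_direct = rk4(km_rhs, mu0, T)     # solve the nonlinear equation directly
nu = rk4(linear_rhs, mu0, T)        # solve the linear equation ...
mu_fk = nu / nu.sum()               # ... and normalize by Z_T = nu(1)

print(np.max(np.abs(mu_direct - mu_fk)))
```

Both solutions agree up to the integration error, which illustrates why the nonlinear equation can be solved by normalizing a linear (Feynman-Kac) evolution.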

Pure birth process
We define the pure birth Markov process on $\Gamma(X)$, starting with the empty configuration at time $t = -T$, via the generator $LF(\gamma) = \int_X \big(F(\gamma\cup\{x\}) - F(\gamma)\big)\,\sigma(dx)$ (5) for bounded cylinder functions $F(\gamma)$. In our interpretation this means that there are no mutant alleles at the beginning; in other words, we start with the null genotype. As time passes, mutations gradually appear at points $x_i \in X$ at times $t_i$, $-T < t_i \le 0$, and then stay there forever. Notational convention: for readability we prefer to work with positive times, while still regarding 0 as the final time. Therefore we reflect time with respect to the origin, so that the mutation at $x_i$ carries the mark $s_i := -t_i \in [0, T)$, the time it has been present up to the final time 0. Thus we consider our pure birth process on the space of marked configurations $\Gamma(X, \mathbb{R}_+)$, i.e. of configurations $\gamma \in \Gamma(X)$ in which each point $x \in \gamma$ carries a mark $s_x \in \mathbb{R}_+$. For more details about marked configuration spaces see e.g. [1,5,6]. The spaces $\Gamma(\Lambda, \mathbb{R}_+)$ and $\Gamma(\Lambda, [0,T])$ are defined analogously. Denote the marked Poisson measure on $\Gamma(X,[0,T])$ by $\nu^0_T$ and its restriction to $\Gamma(\Lambda,[0,T])$ by $\nu^0_{\Lambda,T}$. It is well known that the marked Poisson measure $\nu^0_T$ is characterized by its Laplace transform. The Markov birth process $\xi_\tau(\gamma)$, $0 \le \tau \le T$ (time runs backwards, i.e. the process starts at $T$ and ends at 0), on $(\Gamma(X,[0,T]), \nu^0_T)$ corresponding to the generator (5) can be realized by $\xi_\tau(\gamma) := \{x \in \gamma : s_x \ge \tau\}$, i.e. at time $\tau$ exactly the mutations with marks $s_x \ge \tau$ are present. Further, we take into account a selection cost function $\Phi : \Gamma \to \mathbb{R}_+$, which consists of two parts. $\Phi_{ne}(\gamma)$ is the non-epistatic part, which describes the life costs of the mutations; it is given by $\Phi_{ne}(\gamma) := \langle h, \gamma\rangle = \sum_{x\in\gamma} h(x)$ with a positive function $h$ on $X$. $\Phi_e(\gamma)$ is the epistatic part, which describes the coexistence costs of mutations; it is defined by $\Phi_e(\gamma) := \sum_{\{x,y\}\subset\gamma} \phi(x;y)$, where the conditions on $\phi$ are specified later.
As the configuration γ may contain, in general, an infinite number of points, the above cost functions are well-defined only in a bounded region Λ ⊂ X.
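A simulation of the pure birth process in a bounded region makes the role of the marks concrete. The Python sketch below assumes, purely for illustration, $\Lambda = [0,1] \subset X = \mathbb{R}$, $\sigma = c\cdot$(Lebesgue measure) on $\Lambda$, and hypothetical choices of `h` and `phi` that do not come from this paper. Births occur at total rate $\sigma(\Lambda) = c$ at uniformly distributed positions; after the time reflection, each mutation born at time $t \in (-T, 0]$ carries the mark $s = -t$. On such a finite configuration both cost functions are finite sums.

```python
import numpy as np

rng = np.random.default_rng(0)
c, T = 20.0, 1.0                    # assumed: sigma = c * Lebesgue measure on Lambda = [0, 1]

def h(x):
    """Hypothetical non-epistatic cost density, h > 0."""
    return 1.0 + x

def phi(x, y):
    """Hypothetical pair interaction phi(x; y), decaying with distance."""
    return np.exp(-abs(x - y) / 0.1)

def pure_birth(c, T, rng):
    """Births in Lambda = [0, 1] during (-T, 0]; returns positions and marks s = -t."""
    times, positions = [], []
    t = -T
    while True:
        t += rng.exponential(1.0 / c)   # waiting time at total rate sigma(Lambda) = c
        if t > 0:
            break
        times.append(t)
        positions.append(rng.uniform(0.0, 1.0))  # position distributed as sigma / sigma(Lambda)
    return np.array(positions), -np.array(times)  # reflected time: marks in [0, T)

def phi_ne(xs):
    """Non-epistatic cost <h, gamma> = sum over x in gamma of h(x)."""
    return float(np.sum(h(xs)))

def phi_e(xs):
    """Epistatic cost: sum of phi(x; y) over unordered pairs {x, y} in gamma."""
    n = len(xs)
    return float(sum(phi(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n)))

xs, ss = pure_birth(c, T, rng)
print(len(xs), phi_ne(xs), phi_e(xs))
```

The number of points is Poisson with mean $\sigma(\Lambda)\,T = cT$, which is why restricting to a bounded $\Lambda$ keeps both costs finite.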
For convenience we introduce the corresponding path space measure in two steps: first we consider only the effect of the nonepistatic part of the cost function and then take into consideration the effect of the epistatic part.

The effect of the nonepistatic part of the potential
First we construct the path space measure $\nu^h$ on the space $\Gamma(X, \mathbb{R}_+)$ obtained under the effect of $\Phi_{ne}$. The restriction of $\nu^h$ to $\Gamma(\Lambda,[0,T])$ is denoted by $\nu^h_{\Lambda,T}$; for bounded $\Lambda \subset X$ it is defined, with a normalizing constant $Z_{\Lambda,T}$, as the so-called Gibbs perturbation of the marked Poisson measure $\nu^0_{\Lambda,T}$. We then obtain the measure $\nu^h$ as the limit of the measures $\nu^h_{\Lambda,T}$, which are defined in a bounded volume $\Lambda$ and for finite time $T$. First we show that $\nu^h_{\Lambda,T}$ is still a Poisson measure. For this purpose we compute its Laplace transform, which identifies its intensity measure.
Then the normalizing constant $Z_{\Lambda,T}$ can be computed explicitly, and calculating the integral of $F = e^{\langle f,\gamma\rangle}$ with respect to the measure $\nu^h_{\Lambda,T}$ we obtain its Laplace transform.
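Since the displayed formulas are not reproduced here, the following sketch shows how such a computation can go, under two assumptions: that $\nu^0_{\Lambda,T}$ is the Poisson measure on $\Gamma(\Lambda,[0,T])$ with intensity $\sigma(dx)\,ds$, and that the Gibbs weight is the time-integrated non-epistatic cost $\exp\{-\sum_{x\in\gamma} s_x h(x)\}$, a mutation with mark $s_x$ having been present for time $s_x$. Here $(f - sh)(x,s) := f(x,s) - s\,h(x)$.

```latex
\begin{align*}
\int e^{\langle f,\gamma\rangle}\,d\nu^0_{\Lambda,T}(\gamma)
  &= \exp\Big\{\int_\Lambda\int_0^T \big(e^{f(x,s)}-1\big)\,ds\,\sigma(dx)\Big\},\\
d\nu^h_{\Lambda,T}(\gamma)
  &= Z_{\Lambda,T}^{-1}\, e^{-\langle sh,\gamma\rangle}\,d\nu^0_{\Lambda,T}(\gamma),
  \qquad \langle sh,\gamma\rangle := \sum_{x\in\gamma} s_x h(x),\\
Z_{\Lambda,T}
  &= \exp\Big\{\int_\Lambda\int_0^T \big(e^{-sh(x)}-1\big)\,ds\,\sigma(dx)\Big\},\\
\int e^{\langle f,\gamma\rangle}\,d\nu^h_{\Lambda,T}(\gamma)
  &= Z_{\Lambda,T}^{-1}\int e^{\langle f-sh,\gamma\rangle}\,d\nu^0_{\Lambda,T}(\gamma)
   = \exp\Big\{\int_\Lambda\int_0^T \big(e^{f(x,s)}-1\big)\,e^{-sh(x)}\,ds\,\sigma(dx)\Big\}.
\end{align*}
```

The last expression is the Laplace transform of a marked Poisson measure with intensity $e^{-sh(x)}\,\sigma(dx)\,ds$, in agreement with the intensity stated for $\nu^h_{\Lambda,T}$.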

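Thus $\nu^h_{\Lambda,T}$ is a marked Poisson measure on $\Gamma(\Lambda,[0,T])$ with intensity measure $e^{-sh(x)}\,d\sigma(x)\,ds$. In this paper we understand the "weak" limit $\lim_{\Lambda\uparrow X}\rho_\Lambda = \rho$ in the sense that $\lim_{\Lambda\uparrow X}\int F\,d\rho_\Lambda = \int F\,d\rho$ for all bounded cylinder functions $F \in FL^0(\Gamma(X,[0,T]))$. The set of cylinder functions $FL^0(\Gamma(X,[0,T]))$ is defined as the set of all measurable $F$ for which there exists $\Lambda \in \mathcal{B}_c(X)$ such that $F(\gamma) = F(\gamma_\Lambda)$ for all $\gamma$, where $\gamma_\Lambda$ denotes the restriction of $\gamma$ to $\Lambda\times[0,T]$. We are interested in the weak limit of $\nu^h_{\Lambda,T}$ as $\Lambda \uparrow X$ and $T \to +\infty$. In the case considered here the limit does not depend on the order in which the limits are taken, so we first take $\Lambda \uparrow X$ and then $T \to +\infty$. As a result we get the following statement:
Theorem 2.2. 1) There exists the "weak" limit $\lim_{\Lambda\uparrow X}\nu^h_{\Lambda,T} = \nu^h_T$, where $\nu^h_T$ is the marked Poisson measure on $\Gamma(X,[0,T])$ with intensity measure $e^{-sh(x)}\,\sigma(dx)\,ds$. 2) There exists the "weak" limit $\lim_{T\to+\infty}\nu^h_T = \nu^h$, where $\nu^h$ is the marked Poisson measure on $\Gamma(X,\mathbb{R}_+)$ with the same intensity measure $e^{-sh(x)}\,\sigma(dx)\,ds$.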
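The finite-volume part of Theorem 2.2 can also be checked numerically by reweighting samples of $\nu^0_{\Lambda,T}$ with the Gibbs weight. This Python sketch assumes, for illustration only, $\Lambda = [0,1]$, $\sigma = c\cdot$(Lebesgue measure), a hypothetical $h(x) = 1 + x$, and the weight $\exp\{-\sum_{x} s_x h(x)\}$; it compares Monte Carlo estimates of $Z_{\Lambda,T}$ and of the expected number of points with the values implied by a Poisson measure with intensity $e^{-sh(x)}\,\sigma(dx)\,ds$.

```python
import numpy as np

rng = np.random.default_rng(1)
c, T, N = 2.0, 1.0, 200_000          # assumed: sigma = c * Lebesgue measure on Lambda = [0, 1]

def h(x):
    return 1.0 + x                   # hypothetical cost density h > 0

# Sample N configurations from nu^0_{Lambda,T}: Poisson(c*T) points, uniform in [0,1] x [0,T].
counts = rng.poisson(c * T, size=N)
owner = np.repeat(np.arange(N), counts)          # which sample each point belongs to
xs = rng.uniform(0.0, 1.0, size=counts.sum())
ss = rng.uniform(0.0, T, size=counts.sum())

# Gibbs weight exp(-sum_x s_x h(x)) of each configuration.
energy = np.bincount(owner, weights=ss * h(xs), minlength=N)
w = np.exp(-energy)

Z_mc = w.mean()
mean_count_mc = np.sum(w * counts) / np.sum(w)

# Values implied by the intensity e^{-s h(x)} sigma(dx) ds (midpoint rule in x).
x_mid = (np.arange(10_000) + 0.5) / 10_000
int_s = (1.0 - np.exp(-T * h(x_mid))) / h(x_mid)  # = int_0^T e^{-s h(x)} ds
Z_exact = np.exp(c * np.mean(int_s - T))
mean_count_exact = c * np.mean(int_s)

print(Z_mc, Z_exact)
print(mean_count_mc, mean_count_exact)
```

Importance reweighting is used here only because it mirrors the definition of $\nu^h_{\Lambda,T}$ as a Gibbs perturbation of $\nu^0_{\Lambda,T}$; it is not part of the argument in the text.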
Equivalently, the measure $\nu^h$ can be described as a marked point field $(\gamma, s_\gamma)$, where $\gamma$ is distributed according to $\pi_{\sigma/h}$, the Poisson measure on $\Gamma(X)$ with intensity measure $\frac{1}{h(x)}\,\sigma(dx)$, and the marks $s_x \in \mathbb{R}_+$ are independent with distribution $p(ds) = h(x)e^{-h(x)s}\,ds$ on $\mathbb{R}_+$.
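The marked point field description can be sampled directly. The Python sketch below assumes $\Lambda = [0,1]$, $\sigma = c\cdot$(Lebesgue measure) and a hypothetical $h(x) = 1 + x$: positions are drawn from the Poisson measure with intensity $\sigma/h$ (by inverse-transform sampling of the density $1/((1+x)\ln 2)$), and each point receives an independent exponential mark with rate $h(x)$. The number of points whose marks fall in a window $[a,b]$ is then compared with $\int_\Lambda\int_a^b e^{-sh(x)}\,ds\,\sigma(dx)$, as predicted by the intensity in Theorem 2.2.

```python
import numpy as np

rng = np.random.default_rng(2)
c, N = 4.0, 100_000                  # assumed: sigma = c * Lebesgue measure on Lambda = [0, 1]

def h(x):
    return 1.0 + x                   # hypothetical h > 0 on [0, 1]

# Positions: Poisson with intensity c / h(x) dx on [0, 1]; total mass c * ln 2.
mass = c * np.log(2.0)
counts = rng.poisson(mass, size=N)
u = rng.uniform(size=counts.sum())
xs = 2.0 ** u - 1.0                  # inverse CDF of the density 1 / ((1 + x) ln 2)
# Marks: independent, exponential with rate h(x) (numpy uses scale = 1 / rate).
ss = rng.exponential(1.0 / h(xs))

a, b = 0.5, 1.5                      # a window of marks
mc = np.sum((ss >= a) & (ss <= b)) / N

# Prediction from the intensity e^{-s h(x)} sigma(dx) ds, midpoint rule in x.
x_mid = (np.arange(10_000) + 0.5) / 10_000
hx = h(x_mid)
pred = c * np.mean((np.exp(-a * hx) - np.exp(-b * hx)) / hx)
print(mc, pred)
```

The agreement reflects the identity $\frac{1}{h(x)}\cdot h(x)e^{-h(x)s} = e^{-sh(x)}$ between the two descriptions of the same intensity.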
The main object of our interest is the final distribution of mutations $\mu^h$, i.e. the distribution of the end points of the bars. Recall that we have chosen the time range so that the final time is 0. Similarly to the construction above, we obtain $\mu^h$ as the limit of the final distributions $\mu^0_{\Lambda,T}$ given in a bounded volume and for finite time. The measure $\mu^h_{\Lambda,T}$ on $\Gamma(X)$ is defined for bounded $\Lambda$ via the path space measure $\nu^h_{\Lambda,T}$. By the definition of $\mu^h_{\Lambda,t}$ (for small $t$), the integral with respect to $\mu^h_{\Lambda,t}$ can be computed explicitly, so $\mu^0_{\Lambda,T-t}$ is the solution of (1) in the bounded volume $\Lambda$ for finite time $T$, where $\Phi(\gamma) := \langle h, \gamma\rangle$. Note that for $f \in C_0(X)$ and $\gamma \in \Gamma(X, \mathbb{R}_+)$ we have $\sum_{x\in\gamma,\, s_x\le T} f(x) = \langle F, \gamma\rangle$, where $F(x,s) = f(x)\,\mathbf{1}_{[0,T]}(s)$. Therefore, the following lemma is a corollary of Lemma 2.1.
Now we calculate the integral in (11). As before, we are interested in the "weak" limit of $\mu^h_{\Lambda,T}$ as $\Lambda \uparrow X$ and $T \to +\infty$. Again the limit does not depend on the order; we can first take, for example, $\Lambda \uparrow X$ and then $T \to +\infty$. As a result, we get the following statement:
Theorem 2.4 (cf. [7]). 1) There exists the "weak" limit $\lim_{\Lambda\uparrow X}\mu^h_{\Lambda,T} = \mu^h_T$, where $\mu^h_T$ is a Poisson measure on $\Gamma(X)$ with intensity measure $\frac{1-e^{-Th(x)}}{h(x)}\,d\sigma(x)$.
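Here the "weak" limit is understood in the sense that $\lim_{\Lambda\uparrow X}\int F\,d\mu^h_{\Lambda,T} = \int F\,d\mu^h_T$ for all bounded cylinder functions $F \in FL^0(\Gamma(X))$.
2) By Lebesgue's dominated convergence theorem, there exists the "weak" limit $\lim_{T\to+\infty}\mu^h_T = \mu^h$, where $\mu^h$ is the Poisson measure on $\Gamma(X)$ with intensity measure $\frac{1}{h(x)}\,\sigma(dx)$.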
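The convergence in part 2) can be seen on the expected number of mutations in a bounded region. Note that $\frac{1-e^{-Th(x)}}{h(x)} = \int_0^T e^{-sh(x)}\,ds$, so the intensity of $\mu^h_T$ is what one obtains from $\nu^h_T$ by forgetting the marks, and it increases to $1/h(x)$ as $T \to +\infty$. The Python sketch assumes $\Lambda = [0,1]$, $\sigma = c\cdot$(Lebesgue measure) and a hypothetical $h(x) = 1 + x$, for which the limiting value equals $c\ln 2$.

```python
import numpy as np

c = 4.0                                     # assumed: sigma = c * Lebesgue measure on Lambda = [0, 1]

def h(x):
    return 1.0 + x                          # hypothetical h > 0

x_mid = (np.arange(10_000) + 0.5) / 10_000  # midpoint rule on [0, 1]

def expected_mutations(T):
    """Mean number of points of mu^h_T in Lambda: integral of (1 - e^{-T h}) / h d sigma."""
    hx = h(x_mid)
    return c * np.mean((1.0 - np.exp(-T * hx)) / hx)

limit = c * np.mean(1.0 / h(x_mid))         # same quantity for mu^h, intensity sigma / h
for T in [0.5, 1.0, 2.0, 5.0, 10.0]:
    print(T, expected_mutations(T), limit)
```

The values increase monotonically in $T$ and are dominated by the limit, which is the situation in which the convergence theorem quoted above applies.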

The effect of the epistatic part of the potential
Now we include the effect of the epistatic part $\Phi_e(\gamma)$ of the potential. We consider the Gibbs perturbation of the measure $\nu^h$ from Theorem 2.2 through $\Phi_e$, i.e.
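Before turning to the cluster expansion itself, it may help to see the classical Mayer-Ursell decomposition on which such expansions rest. For a finite configuration $\eta$ and a pair potential, $e^{-\beta\sum_{\{x,y\}\subset\eta}\phi(x;y)} = \prod_{\{x,y\}\subset\eta}(1 + f_{xy})$ with $f_{xy} := e^{-\beta\phi(x;y)} - 1$; expanding the product over graphs and grouping edges into connected components gives $\sum_{\text{partitions of }\eta}\prod_{\text{blocks }B} k(B)$, where $k(B)$ sums $\prod f_{xy}$ over connected graphs on $B$. The Python sketch checks this identity by brute force for four points with a hypothetical $\phi$. It uses unmarked points only, so it illustrates the combinatorics and is not the decomposition of the factor (19) used in this paper, which may also involve the marks.

```python
import itertools
import math

beta = 0.7
points = [0.0, 0.15, 0.4, 0.42]          # hypothetical finite configuration in X = R

def phi(x, y):
    return math.exp(-abs(x - y) / 0.2)    # hypothetical pair potential phi(x; y)

def mayer(x, y):
    """Mayer function f_xy = exp(-beta * phi(x; y)) - 1."""
    return math.exp(-beta * phi(x, y)) - 1.0

def is_connected(nodes, edges):
    """Depth-first search: do `edges` connect all of `nodes`?"""
    nodes = list(nodes)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        v = stack.pop()
        for a, b in edges:
            for u, w in ((a, b), (b, a)):
                if u == v and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return len(seen) == len(nodes)

def ursell(block):
    """k(B): sum over connected graphs on the index set B of the product of Mayer functions."""
    if len(block) == 1:
        return 1.0
    pairs = list(itertools.combinations(block, 2))
    total = 0.0
    for r in range(len(block) - 1, len(pairs) + 1):   # a connected graph needs >= |B|-1 edges
        for edges in itertools.combinations(pairs, r):
            if is_connected(block, edges):
                total += math.prod(mayer(points[i], points[j]) for i, j in edges)
    return total

def set_partitions(items):
    """Yield all partitions of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]

n = len(points)
gibbs = math.exp(-beta * sum(phi(points[i], points[j])
                             for i, j in itertools.combinations(range(n), 2)))
cluster_sum = sum(math.prod(ursell(tuple(block)) for block in part)
                  for part in set_partitions(list(range(n))))
print(gibbs, cluster_sum)
```

Brute-force enumeration is exponential in the number of points and is meant only to make the identity visible; the point of a cluster expansion is to control such sums analytically for small parameters.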

Cluster expansion
We start from the definition of $d\nu^{\beta,\phi}$. Denote $\hat\sigma(dx,ds) := e^{-sh(x)}\,\sigma(dx)\,ds$. By Theorem 2.2, $\nu^h_\Lambda$ is the Poisson measure on $\Gamma(\Lambda,\mathbb{R}_+)$ with intensity $\hat\sigma(dx,ds)$. Using the definitions of the Poisson and the Lebesgue-Poisson measures, (18) can be rewritten with respect to the Lebesgue-Poisson measure, with the normalizing constant $\hat Z_{\beta,\Lambda} = Z_{\beta,\Lambda}\cdot\exp\{\hat\sigma(\Lambda\times[0,+\infty))\}$. The cluster expansion is a tool for effectively estimating the Gibbs factor $e^{-\beta E(\gamma)}$ for small parameters, see e.g. [6]. Here we follow the presentation given in [4,5], where the cluster expansion was generalized to general metric spaces, i.e. to the setting where no translation-invariant structure is present. In our case the factor which we are going to decompose is given by (19). The cluster decomposition of (19) is as follows: