Gillespie algorithm

In probability theory, the Gillespie algorithm (or the Doob–Gillespie algorithm or stochastic simulation algorithm, the SSA) generates a statistically correct trajectory (possible solution) of a stochastic equation system for which the reaction rates are known. It was created by Joseph L. Doob and others (circa 1945), presented by Dan Gillespie in 1976, and popularized in 1977 in a paper where he used it to simulate chemical or biochemical systems of reactions efficiently and accurately using limited computational power (see stochastic simulation).{{Cite journal |last=Gillespie |first=Daniel T. |date=2007-05-01 |title=Stochastic Simulation of Chemical Kinetics |url=https://www.annualreviews.org/doi/10.1146/annurev.physchem.58.032806.104637 |journal=Annual Review of Physical Chemistry |language=en |volume=58 |issue=1 |pages=35–55 |doi=10.1146/annurev.physchem.58.032806.104637 |pmid=17037977 |bibcode=2007ARPC...58...35G |issn=0066-426X}} As computers have become faster, the algorithm has been used to simulate increasingly complex systems. The algorithm is particularly useful for simulating reactions within cells, where the number of reagents is low and keeping track of every single reaction is computationally feasible. Mathematically, it is a variant of a dynamic Monte Carlo method and similar to the kinetic Monte Carlo methods. It is used heavily in computational systems biology.

History

The development of the algorithm involved several important steps. In 1931, Andrei Kolmogorov introduced the differential equations describing the time evolution of stochastic processes that proceed by jumps, today known as the Kolmogorov equations (Markov jump process); a simplified version is known as the master equation in the natural sciences. In 1940, William Feller found the conditions under which the Kolmogorov equations admit (proper) probabilities as solutions. In Theorem I of his 1940 work, he established that the time to the next jump is exponentially distributed and that the probability of the next event is proportional to its rate. He thereby established the relation between Kolmogorov's equations and stochastic processes.

Later, Doob (1942, 1945) extended Feller's solutions beyond the case of pure-jump processes. The method was implemented on a computer by David George Kendall (1950) using the Manchester Mark 1, and later used by Maurice S. Bartlett (1953) in his studies of epidemic outbreaks. Gillespie (1977) obtained the algorithm in a different manner, by means of a physical argument.

Idea

= Mathematics =

In a reaction chamber, there is a finite number of molecules. In each infinitesimal slice of time, a single reaction might take place. The rate of each reaction is determined by the number of molecules of each chemical species.

Naively, we can simulate the trajectory of the reaction chamber by discretizing time and then simulating each time step. However, there might be long stretches of time in which no reaction occurs. The Gillespie algorithm instead samples a random waiting time until some reaction occurs, then takes another random sample to decide which reaction has occurred.

The key assumptions are that

each reaction is Markovian in time
there are no correlations between reactions

Given these two assumptions, the random waiting time until some reaction occurs is exponentially distributed, with a rate equal to the sum of the individual reactions' rates.
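This is a standard property of independent exponential random variables, and a short derivation may make it concrete. As notation introduced here only for illustration, let $T_j$ be the waiting time until reaction $j$ would fire on its own, assumed exponentially distributed with rate $a_j$ and independent of the other reactions. The time until the first reaction, $T = \min_j T_j$, then satisfies

: $P(T > \tau) = \prod_j P(T_j > \tau) = \prod_j e^{-a_j \tau} = \exp\left(-\tau \sum_j a_j\right),$

which is the survival function of an exponential distribution with rate $\sum_j a_j$. Under the same assumptions, the probability that reaction $j$ is the one that fires first is

: $P(T_j = T) = \int_0^\infty a_j e^{-a_j \tau} \prod_{k \neq j} e^{-a_k \tau}\, d\tau = \frac{a_j}{\sum_k a_k}.$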

= Validity in biochemical simulations =

Traditional continuous and deterministic biochemical rate equations do not accurately predict cellular reactions since they rely on bulk reactions that require the interactions of millions of molecules. They are typically modeled as a set of coupled ordinary differential equations. In contrast, the Gillespie algorithm allows a discrete and stochastic simulation of a system with few reactants because every reaction is explicitly simulated. A trajectory corresponding to a single Gillespie simulation represents an exact sample from the probability mass function that is the solution of the master equation.

The physical basis of the algorithm is the collision of molecules within a reaction vessel. It is assumed that collisions are frequent, but collisions with the proper orientation and energy are infrequent. It is assumed that the reaction environment is well mixed.

Algorithm

A review (Gillespie, 2007) outlines three different but equivalent formulations: the direct, first-reaction, and first-family methods, of which the first two are special cases of the last. The formulation of the direct and first-reaction methods is centered on performing the usual Monte Carlo inversion steps on the so-called "fundamental premise of stochastic chemical kinetics", which mathematically is the function

: $p(\tau,j\mid\boldsymbol{x},t) = a_j(\boldsymbol{x})\exp\left(-\tau\sum_{j'} a_{j'}(\boldsymbol{x})\right),$

where each $a_j$ is the propensity function of an elementary reaction, whose argument is $\boldsymbol{x}$ , the vector of species counts. The $\tau$ parameter is the time to the next reaction (or sojourn time), and $t$ is the current time. To paraphrase Gillespie, this expression, multiplied by $d\tau$ , is read as "the probability, given $\boldsymbol{X}(t) = \boldsymbol{x}$ , that the system's next reaction will occur in the infinitesimal time interval $[t+\tau, t+\tau+d\tau]$ , and will be of stoichiometry corresponding to the $j$ th reaction". This formulation provides a window to the direct and first-reaction methods by implying that $\tau$ is an exponentially distributed random variable, and $j$ is "a statistically independent integer random variable with point probabilities $a_{j}(\boldsymbol{x}) / \sum_{j'}a_{j'}(\boldsymbol{x})$ ".
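The independence of $\tau$ and $j$ can be seen by factoring the function above. Writing $a_0(\boldsymbol{x})$ for the sum of all propensities (a shorthand assumed here for brevity), a sketch of the factorization is

: $p(\tau,j\mid\boldsymbol{x},t) = \underbrace{a_0(\boldsymbol{x})\,e^{-a_0(\boldsymbol{x})\tau}}_{\text{density of }\tau}\;\times\;\underbrace{\frac{a_j(\boldsymbol{x})}{a_0(\boldsymbol{x})}}_{\text{probability of }j}.$

Summing over $j$ leaves the exponential density $a_0(\boldsymbol{x})\,e^{-a_0(\boldsymbol{x})\tau}$ for the sojourn time, while integrating over $\tau$ from $0$ to $\infty$ leaves the point probability $a_j(\boldsymbol{x})/a_0(\boldsymbol{x})$ for the reaction index. Because the joint function is the product of these two marginals, the two quantities can be sampled separately.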

Thus, the Monte Carlo generating method is simply to draw two pseudorandom numbers, $r_{1}$ and $r_{2}$ , uniformly distributed on $[0,1]$ , and compute

: $\tau = \frac{1}{\sum_{j}a_{j}(\boldsymbol{x})}\log\left(\frac{1}{r_{1}}\right),$

and

: $j ={}$ the smallest integer satisfying $\sum_{j'=1}^j a_{j'}(\boldsymbol{x}) > r_2 \sum_{j'} a_{j'} (\boldsymbol{x}).$
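The two formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `sample_next_reaction`, the 0-based reaction index, and the convention of returning an infinite waiting time when no reaction can fire are all assumptions made here.

```python
import math
import random


def sample_next_reaction(propensities, rng=random):
    """Draw (tau, j) from a list of propensity values a_j(x).

    j is a 0-based index into `propensities` (an assumption of this sketch).
    Returns (math.inf, None) when every propensity is zero.
    """
    a0 = sum(propensities)
    if a0 <= 0.0:
        return math.inf, None

    # rng.random() lies in [0, 1); using 1 - r keeps the log argument finite.
    r1 = 1.0 - rng.random()
    r2 = rng.random()

    # Sojourn time: tau = (1 / a0) * log(1 / r1).
    tau = math.log(1.0 / r1) / a0

    # Reaction index: smallest j whose cumulative propensity exceeds r2 * a0.
    threshold = r2 * a0
    cumulative = 0.0
    for j, a_j in enumerate(propensities):
        cumulative += a_j
        if cumulative > threshold:
            return tau, j

    # Only reachable through floating-point round-off: fall back to the
    # last reaction that has a positive propensity.
    return tau, max(j for j, a_j in enumerate(propensities) if a_j > 0.0)
```

For example, `sample_next_reaction([1.0, 3.0])` returns a waiting time with mean 1/4 and selects index 1 about three times out of four.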

Utilizing this generating method for the sojourn time and next reaction, the direct method algorithm is stated by Gillespie as

1. Initialize the time $t = t_0$ and the system's state $\boldsymbol{x} = \boldsymbol{x}_0$

2. With the system in state $\boldsymbol{x}$ at time $t$ , evaluate all the $a_j(\boldsymbol{x})$ and their sum $\sum_{j}a_j(\boldsymbol{x})$

3. Calculate the above value of $\tau$ and $j$

4. Effect the next reaction by replacing $t \leftarrow t + \tau$ and $\boldsymbol{x} \leftarrow \boldsymbol{x} + \nu_j$

5. Record $(\boldsymbol{x}, t)$ as desired. Return to step 2, or else end the simulation.

where $\nu_j$ is the state-change vector of the $j^\text{th}$ reaction, i.e. the change in the species counts caused by one firing of that reaction. This family of algorithms is computationally expensive, and thus many modifications and adaptations exist, including the next reaction method (Gibson & Bruck), tau-leaping, and hybrid techniques in which abundant reactants are modeled with deterministic behavior. Adapted techniques generally compromise the exactitude of the theory behind the algorithm as it connects to the master equation, but offer reasonable realizations for greatly improved timescales.

The computational cost of exact versions of the algorithm is determined by the coupling class of the reaction network. In weakly coupled networks, the number of reactions influenced by any given reaction is bounded by a small constant. In strongly coupled networks, a single reaction firing can in principle affect all other reactions. An exact version of the algorithm with constant-time scaling for weakly coupled networks has been developed, enabling efficient simulation of systems with very large numbers of reaction channels (Slepoy, Thompson & Plimpton 2008).

A generalized Gillespie algorithm that accounts for the non-Markovian properties of random biochemical events with delay was developed by Bratsun et al. (2005) and independently by Barrio et al. (2006), as well as by Cai (2007). See the articles cited below for details.
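The five steps of the direct method listed above can be sketched in Python as follows. The function name `gillespie_direct`, its argument layout (a list of propensity callables plus a matching list of state-change vectors), and the stopping rule based on a final time `t_end` are illustrative assumptions, not part of Gillespie's statement of the algorithm.

```python
import math
import random


def gillespie_direct(x0, propensity_fns, state_changes, t_end, t0=0.0, seed=None):
    """Simulate one trajectory with the direct method (illustrative sketch).

    x0             -- initial species counts
    propensity_fns -- list of callables; propensity_fns[j](x) returns a_j(x)
    state_changes  -- list of state-change vectors nu_j, one per reaction
    t_end          -- stop before the first reaction that would occur after t_end
    """
    rng = random.Random(seed)
    t, x = t0, list(x0)                       # step 1: initialize
    trajectory = [(t, tuple(x))]

    while True:
        a = [f(x) for f in propensity_fns]    # step 2: propensities and their sum
        a0 = sum(a)
        if a0 <= 0.0:
            break                             # no reaction can fire any more

        # Step 3: sojourn time tau and reaction index j.
        tau = math.log(1.0 / (1.0 - rng.random())) / a0
        if t + tau > t_end:
            break
        threshold = rng.random() * a0
        # Round-off fallback: last reaction with a positive propensity.
        j = max(idx for idx, a_j in enumerate(a) if a_j > 0.0)
        cumulative = 0.0
        for idx, a_j in enumerate(a):
            cumulative += a_j
            if cumulative > threshold:
                j = idx
                break

        # Step 4: advance time and apply the state-change vector nu_j.
        t += tau
        x = [x_i + d for x_i, d in zip(x, state_changes[j])]

        trajectory.append((t, tuple(x)))      # step 5: record, then loop
    return trajectory
```

As a usage example, a single decay reaction with a hypothetical rate constant of 0.5 could be simulated with `gillespie_direct([20], [lambda x: 0.5 * x[0]], [(-1,)], t_end=10.0)`. Each pass through the loop recomputes every propensity, which is the cost that the faster variants discussed above aim to reduce.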

Partial-propensity formulations, developed independently by Ramaswamy et al. (2009, 2010) and by Indurkhya and Beal (2010), can be used to construct a family of exact versions of the algorithm whose computational cost is proportional to the number of chemical species in the network, rather than the (larger) number of reactions. These formulations can reduce the computational cost to constant-time scaling for weakly coupled networks and to at most linear scaling in the number of species for strongly coupled networks. A partial-propensity variant of the generalized Gillespie algorithm for reactions with delays has also been proposed (Ramaswamy & Sbalzarini 2011). The use of partial-propensity methods is limited to elementary chemical reactions, i.e., reactions with at most two different reactants. Every non-elementary chemical reaction can be equivalently decomposed into a set of elementary ones, at the expense of a linear (in the order of the reaction) increase in network size.

Examples

= Reversible binding of A and B to form AB dimers =

A simple example may help to explain how the Gillespie algorithm works. Consider a system of molecules of two types, {{math|A}} and {{math|B}}. In this system, {{math|A}} and {{math|B}} reversibly bind together to form {{math|AB}} dimers, so that two reactions are possible: either an {{math|A}} and a {{math|B}} molecule react to form an {{math|AB}} dimer, or an {{math|AB}} dimer dissociates into {{math|A}} and {{math|B}}. The reaction rate constant for a given single {{math|A}} molecule reacting with a given single {{math|B}} molecule is $k_\mathrm{D}$ , and the reaction rate constant for an {{math|AB}} dimer breaking up is $k_\mathrm{B}$ .

If at time t there is one molecule of each type then the rate of dimer formation is $k_\mathrm{D}$ , while if there are $n_\mathrm{A}$ molecules of type {{math|A}} and $n_\mathrm{B}$ molecules of type {{math|B}}, the rate of dimer formation is $k_\mathrm{D}n_\mathrm{A}n_\mathrm{B}$ . If there are $n_\mathrm{AB}$ dimers then the rate of dimer dissociation is $k_\mathrm{B}n_\mathrm{AB}$ .

The total reaction rate, $R_\mathrm{TOT}$ , at time t is then given by

: $R_\mathrm{TOT}=k_\mathrm{D}n_\mathrm{A}n_\mathrm{B}+k_\mathrm{B}n_\mathrm{AB}$

So, we have now described a simple model with two reactions. This definition is independent of the Gillespie algorithm. We will now describe how to apply the Gillespie algorithm to this system.

In the algorithm, we advance forward in time in two steps: calculating the time to the next reaction, and determining which of the possible reactions the next reaction is. Reactions are assumed to be completely random, so if the total reaction rate at time t is $R_\mathrm{TOT}$ , then the time, δt, until the next reaction occurs is a random number drawn from an exponential distribution with mean $1/R_\mathrm{TOT}$ . Thus, we advance time from t to t + δt.

File:Example calculation illustrating the Gillespie algorithm for reversible dimerising molecules.png

The probability that this reaction is an {{math|A}} molecule binding to a {{math|B}} molecule is simply the fraction of the total rate due to this type of reaction, i.e.,

: $P(\mathrm{A} + \mathrm{B} \to \mathrm{AB}) = k_\mathrm{D}n_\mathrm{A}n_\mathrm{B}/R_\mathrm{TOT}.$

The probability that the next reaction is an {{math|AB}} dimer dissociating is just 1 minus that. So with these two probabilities we either form a dimer, reducing $n_\mathrm{A}$ and $n_\mathrm{B}$ by one and increasing $n_\mathrm{AB}$ by one, or dissociate a dimer, increasing $n_\mathrm{A}$ and $n_\mathrm{B}$ by one and decreasing $n_\mathrm{AB}$ by one.

Now we have both advanced time to t + δt and performed a single reaction. The Gillespie algorithm just repeats these two steps as many times as needed to simulate the system for however long we want (i.e., for as many reactions). The result of a Gillespie simulation that starts with $n_\mathrm{A}=n_\mathrm{B}=10$ and $n_\mathrm{AB}=0$ at t = 0, and where $k_\mathrm{D}=2$ and $k_\mathrm{B}=1$ , is shown at the right. For these parameter values, on average there are 8 {{math|AB}} dimers and 2 each of {{math|A}} and {{math|B}}, but because the numbers of molecules are small, fluctuations around these values are large. The Gillespie algorithm is often used to study systems where these fluctuations are important.
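The procedure described in this example can be sketched in Python as below. The function name `simulate_dimerisation`, the final time `t_end`, and the choice to report a time-averaged dimer count are assumptions made for illustration; the default parameter values are the ones quoted above.

```python
import random


def simulate_dimerisation(n_a=10, n_b=10, n_ab=0, k_d=2.0, k_b=1.0,
                          t_end=100.0, seed=0):
    """Simulate A + B <-> AB and return (final counts, time-averaged n_AB)."""
    rng = random.Random(seed)
    t = 0.0
    ab_time_integral = 0.0  # integral of n_AB over time, for the average

    while True:
        rate_bind = k_d * n_a * n_b      # rate of dimer formation
        rate_break = k_b * n_ab          # rate of dimer dissociation
        r_tot = rate_bind + rate_break
        if r_tot == 0.0:
            # Nothing can happen; the state persists until t_end.
            ab_time_integral += n_ab * (t_end - t)
            break

        # Time to the next reaction: exponential with mean 1 / r_tot.
        dt = rng.expovariate(r_tot)
        if t + dt > t_end:
            ab_time_integral += n_ab * (t_end - t)
            break
        ab_time_integral += n_ab * dt
        t += dt

        # Choose the reaction by its fractional contribution to r_tot.
        if rng.random() < rate_bind / r_tot:
            n_a, n_b, n_ab = n_a - 1, n_b - 1, n_ab + 1   # A + B -> AB
        else:
            n_a, n_b, n_ab = n_a + 1, n_b + 1, n_ab - 1   # AB -> A + B

    return (n_a, n_b, n_ab), ab_time_integral / t_end
```

Calling `simulate_dimerisation()` returns the final counts and the time-averaged number of dimers, which can be compared with the average of about 8 quoted above. Different seeds give different trajectories, which is one way to see the size of the fluctuations.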

That was just a simple example with two reactions. More complex systems with more reactions are handled in the same way: at each step, all reaction rates are calculated, and one reaction is chosen with probability equal to its fractional contribution to the total rate. Time is then advanced as in this example.