Gene genealogies and the coalescent - Raphael Leblois

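• Introduction
• Coalescent theory: 2 lineages, k lineages, TMRCA
• Trees and mutations
• Coalescent advantages
• Simulation algorithms: tree simulation (generation by generation, continuous time), simulating mutations
• Conclusions
• TD (tutorial)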

Master 2 Biostatistics module: population genetics models

Gene genealogies and the coalescent
Raphaël Leblois & François Rousset
Centre de Biologie pour la Gestion des populations (CBGP, UMR INRA)

December 2013

This is the introduction to coalescent theory. The coalescent will be used extensively in the next courses:

- fluctuating population size
- structured populations
- ML-based inferences under coalescent models (MCMC & IS)
- coalescent with recombination and inferences from genomic data (HMM)
- inference using ABC

The Wright-Fisher Model (reminder...)

• In a population of constant and finite size, with non-overlapping generations, and in which all individuals have equal reproductive success, each gene does not necessarily leave exactly one descendant in the next generation: the number of descendants of each gene is a random variable following a probability distribution with expectation equal to one.

→ Drift, fixation of alleles, loss of genetic variation, ...
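The reminder above can be made concrete with a small forward-in-time sketch. It assumes a haploid population in which every gene of the next generation picks its parent uniformly at random; the function names are hypothetical and are not taken from the course material.

```python
import random


def wright_fisher_offspring(n_genes, rng):
    """Number of descendants left by each of n_genes parental genes."""
    counts = [0] * n_genes
    for _ in range(n_genes):
        # Each gene of the next generation picks one parent uniformly at random.
        counts[rng.randrange(n_genes)] += 1
    return counts


def generations_to_fixation(n_genes, rng):
    """Go forward until all genes descend from a single initial gene."""
    labels = list(range(n_genes))  # label = initial ancestor of each gene
    generations = 0
    while len(set(labels)) > 1:
        labels = [labels[rng.randrange(n_genes)] for _ in range(n_genes)]
        generations += 1
    return generations


rng = random.Random(1)
counts = wright_fisher_offspring(10, rng)
# The counts vary from gene to gene, but their mean is 1 by construction.
print(counts, sum(counts) / len(counts))
print(generations_to_fixation(10, rng))
```

Some genes leave no descendant and others leave several, which is what drives drift and, eventually, fixation.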


The coalescent theory

In the coalescent framework, we look backward in time at the genealogy of a sample up to its most recent common ancestor (MRCA)

[Figure: genealogy of the population (forward in time) compared with the genealogy of the sample, i.e. the coalescent tree (backward in time)]

• Classical approach: population, gene frequencies, forward in time
• Coalescent approach: sample, gene genealogies, backward in time

• The main idea behind coalescent theory is the following:
• By definition, in neutral models, the number of descendants of a gene does not depend on its genetic type.
• Thus mutations do not affect the genealogy.

→ Mutational processes are independent of demographic processes, i.e. mutations and genealogy can be analyzed separately.

Coalescence of 2 lineages

In one generation (from t = 0 to t = 1): probability that the two genes have a common parental gene in the previous generation

$$P(G_2 = 1) = \frac{1}{N}$$

In two generations (haploid population of size N): probability that the two genes did not coalesce in the first generation, multiplied by the probability that the two genes have a common parental gene in the second generation

$$P(G_2 = 2) = \left(1 - \frac{1}{N}\right)\frac{1}{N}$$

Coalescence of 2 lineages in i generations

In i generations: probability that the two genes did not coalesce in the first (i − 1) generations, multiplied by the probability that the two genes have a common parental gene in the i-th generation

$$P(G_2 = i) = \left(1 - \frac{1}{N}\right)^{i-1}\frac{1}{N}$$

• Coalescence probability of two lineages in i generations:

$$P(G_2 = i) = \left(1 - \frac{1}{N}\right)^{i-1}\frac{1}{N}$$

• It is a geometric distribution with parameter 1/N.

[Figure: P(G_2 = i) (probability) against time]

→ The expectation of G_2 is

$$E(G_2) = \sum_{i=1}^{\infty} i\,P(G_2 = i) = N$$

[intuitively, if there is one chance in 6 of getting a 4 with a die, we need 6 rolls, on average, to get a 4...]

• Thus, on average, a common ancestor for the two genes is found N generations backward in time, but there is a large variance:

$$V(G_2) = N(N - 1) \approx N^2$$
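As a numerical check of the two moments above, the following sketch sums the geometric probabilities directly. The function names and the truncation point of the infinite sum are choices made for this illustration only.

```python
def p_g2(i, n):
    """P(G2 = i): two lineages coalesce exactly i generations back (haploid size n)."""
    return (1 - 1 / n) ** (i - 1) / n


def moments_g2(n, max_gen):
    """Mean and variance of G2, truncating the infinite sum at max_gen."""
    mean = sum(i * p_g2(i, n) for i in range(1, max_gen + 1))
    second = sum(i * i * p_g2(i, n) for i in range(1, max_gen + 1))
    return mean, second - mean ** 2


mean, variance = moments_g2(20, 5000)
print(mean, variance)  # close to N = 20 and N(N - 1) = 380
```

The standard deviation is almost as large as the mean, which is why coalescence times differ so much from one realization to the next.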

For x ≪ 1, we have $(1 - x)^t \approx e^{-tx}$.

→ For large N, the discrete geometric distribution can therefore be approximated by a continuous exponential distribution with rate 1/N, i.e. with mean N:

$$P(G_2 = i) \approx \frac{1}{N}\,e^{-i/N}$$

[Figure: probability against time; illustration with N = 20]
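The quality of this approximation can be inspected numerically. The sketch below compares the geometric probability with the exponential density for N = 20, the value used in the illustration; the function names are hypothetical.

```python
import math


def geometric_pmf(i, n):
    """Exact discrete-time probability P(G2 = i)."""
    return (1 - 1 / n) ** (i - 1) / n


def exponential_density(i, n):
    """Continuous-time approximation (1/N) exp(-i/N)."""
    return math.exp(-i / n) / n


n = 20
for i in (1, 5, 20, 60):
    print(i, round(geometric_pmf(i, n), 5), round(exponential_density(i, n), 5))

# Largest absolute gap between the two curves over the first 200 generations.
max_gap = max(abs(geometric_pmf(i, n) - exponential_density(i, n))
              for i in range(1, 200))
print(max_gap)
```

Even for a population as small as N = 20 the two curves stay close; the gap shrinks further as N grows.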

Coalescence of k lineages in i generations

Considering a sample of k genes, there are $\binom{k}{2} = \frac{k(k-1)}{2}$ pairs, each of which can coalesce with probability 1/N. Neglecting simultaneous coalescences (k ≪ N), the probability that a single pair of genes coalesces in the previous generation is thus

$$P(G_k = 1) = \binom{k}{2}\frac{1}{N} = \frac{k(k-1)}{2N}$$

The number of generations backward in time until a coalescence takes place in a sample of k genes then follows a geometric distribution with parameter $\frac{k(k-1)}{2N}$:

$$P(G_k = i) = \left(1 - \frac{k(k-1)}{2N}\right)^{i-1}\frac{k(k-1)}{2N}$$

[Figure: P(G_k = i) (probability) against time; illustration with N = 20]
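The parameter k(k − 1)/2N counts pairs and ignores generations in which more than two lineages merge. A minimal sketch, assuming a haploid Wright-Fisher population, compares it with the exact probability that at least one coalescence occurs in one generation; the function names are illustrative.

```python
def prob_no_coalescence(k, n):
    """Exact probability that k lineages pick k distinct parents among n genes."""
    p = 1.0
    for j in range(1, k):
        p *= 1 - j / n  # the (j + 1)-th lineage avoids the j parents already taken
    return p


def prob_coalescence_approx(k, n):
    """Pairwise approximation k(k - 1) / (2N) used in the geometric distribution."""
    return k * (k - 1) / (2 * n)


for n in (20, 1000, 100000):
    exact = 1 - prob_no_coalescence(4, n)
    print(n, exact, prob_coalescence_approx(4, n))
```

For k = 2 the two quantities coincide; for larger samples the approximation is only good when k is small relative to N.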

For large N, the distribution of coalescence times in a sample of k genes can thus be approximated by a continuous exponential distribution with parameter $\frac{k(k-1)}{2N}$:

$$P(G_k = i) \approx \frac{k(k-1)}{2N}\,e^{-i\,\frac{k(k-1)}{2N}}$$

Then, scaling time by the population size (change of variable T = G/N), we get

$$P(T_k = t) \approx \frac{k(k-1)}{2}\,e^{-t\,\frac{k(k-1)}{2}} = \binom{k}{2}\,e^{-t\binom{k}{2}}$$

$$E(G_k) = \frac{2N}{k(k-1)}$$

The larger the sample size (or number of lineages), the shorter the expected coalescence times.

$$V(G_k) \approx \frac{4N^2}{k^2(k-1)^2}$$

Coalescence times have high variance: two independent loci could show very different coalescence times, and thus very different coalescent trees (genealogies).

[Figure: distribution of coalescence times (probability against time); illustration with N = 20]

TMRCA, length of a coalescent tree

TMRCA = Time to the Most Recent Common Ancestor = length (or height) of the coalescent tree

For time in generations:

$$E(G_{MRCA}) = \sum_{j=2}^{k} E(G_j) = \sum_{j=2}^{k} \frac{2N}{j(j-1)} = 2N \sum_{j=2}^{k}\left(\frac{1}{j-1} - \frac{1}{j}\right) = 2N\left(1 - \frac{1}{k}\right)$$

For time scaled by population size:

$$E(T_{MRCA}) = 2\left(1 - \frac{1}{k}\right)$$

→ The expected TMRCA tends to 2N generations (or 2 in scaled time) for large sample sizes.
→ The TMRCA of a relatively small random sample is almost the same as that of the whole population.
→ E(T_2) = 1 means that about half of the expected tree height is due to the last coalescent event (i.e. that of the last pair of lineages).
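The telescoping sum can be checked both directly and by simulation. The sketch below works in time scaled by N and draws each waiting time T_j from an exponential distribution with rate j(j − 1)/2; the function names and the number of replicates are arbitrary choices for this illustration.

```python
import random


def expected_tmrca(k):
    """Sum of E(T_j) = 2 / (j (j - 1)) for j = 2..k (time in units of N generations)."""
    return sum(2 / (j * (j - 1)) for j in range(2, k + 1))


def simulate_tmrca(k, rng):
    """One random tree height: sum of the waiting times T_k, ..., T_2."""
    total = 0.0
    for j in range(k, 1, -1):
        total += rng.expovariate(j * (j - 1) / 2)
    return total


rng = random.Random(0)
k = 10
heights = [simulate_tmrca(k, rng) for _ in range(20000)]
print(expected_tmrca(k), 2 * (1 - 1 / k))  # the sum and the closed form agree
print(sum(heights) / len(heights))         # simulated mean, close to the value above
print(min(heights), max(heights))          # individual trees differ widely
```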

Addition of mutations to a coalescent tree

• Under the neutrality assumption, mutations are independent of the genealogy, because the genealogical process depends strictly on demographic parameters.

→ First, genealogies are built given the demographic parameters considered (e.g. N), then mutations are added a posteriori on each branch of the genealogy, from the MRCA to the leaves, given a mutation rate and a mutation model. This yields polymorphism data under the demographic and mutation models considered.

• The number of mutations on each branch of the tree is a function of the mutation rate of the genetic marker and the branch length.

Mutation rate µ = mean number of mutations per locus per generation, e.g. 5 × 10^-4 for microsatellites, 10^-8 per nucleotide for DNA sequences.

→ For a branch of length t generations, the number of mutations m thus follows a binomial distribution with t trials and success probability µ, often approximated by a Poisson distribution with parameter µt:

$$P(m \mid t) = \frac{(\mu t)^m\,e^{-\mu t}}{m!}$$

• Once the number of mutations on each branch is fixed, a genetic type (allele or haplotype) is chosen for the MRCA, and then the effect of each mutation is added step by step, from the MRCA to the leaves, given the mutation model considered.
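To see how close the Poisson approximation is, the sketch below compares it with the binomial distribution for a branch of 1000 generations and the microsatellite rate quoted above. The branch length is an arbitrary value chosen for the illustration, and the function names are hypothetical.

```python
import math


def binomial_pmf(m, t, mu):
    """P(m mutations) with t generations (trials), each mutating with probability mu."""
    return math.comb(t, m) * mu ** m * (1 - mu) ** (t - m)


def poisson_pmf(m, t, mu):
    """Poisson approximation with parameter mu * t."""
    lam = mu * t
    return lam ** m * math.exp(-lam) / math.factorial(m)


t, mu = 1000, 5e-4  # branch length in generations, microsatellite mutation rate
for m in range(4):
    print(m, binomial_pmf(m, t, mu), poisson_pmf(m, t, mu))
```

With µ this small the two distributions are nearly indistinguishable, which is why the Poisson form is the one generally used in simulations.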

Advantages of the coalescent

• It offers a probabilistic model for gene genealogies.
• The coalescent often simplifies the analysis of stochastic population genetic models and their interpretation.
• The coalescent allows extremely efficient simulations of genetic polymorphism under various demo-genetic models (only the sample is simulated, not the entire population).
• The coalescent allows the development of powerful methods for the inference of populational evolutionary parameters (genetic, demographic, reproductive, ...); some of those methods use all the information contained in the genetic data (likelihood-based methods).

Simulation of trees and polymorphism data under the coalescent theory

(Reminder)
• For neutral markers, the number of offspring is independent of the genetic types of the parents.
→ Demographic and mutational processes are thus independent.
• Simulation of polymorphism data is thus done in two steps:
(1) Tree simulation: topology and branch lengths
(2) Addition of mutations on the tree

Simulation of trees under the coalescent

There are two main methods for coalescent tree simulation:
• Using a continuous-time algorithm (Hudson, 1991): very fast, but the approximations are only valid for large population sizes and low mutation and migration rates.
• Using a generation-by-generation algorithm: can consider any mutational and demographic model, but can be very slow.

SPEED: continuous-time approximations > generation by generation
FLEXIBILITY: generation by generation > continuous-time approximations

Representation of a tree and usual terminology

[Figure: coalescent tree drawn along a time axis running from the present to the past]

Generation-by-generation algorithm

Very simple and without any approximations:
• Go backward in time generation by generation
• At each generation, stochastically draw all events that can affect the genealogy, e.g. coalescence, migration, recombination
• Stop when the most recent common ancestor (MRCA) of all sampled genes is reached

Toy example:
• 4 genes
• A single neutral locus
• A haploid WF population with N = 10

→ there are only coalescence events

Generation-by-generation algorithm: the toy example step by step

At each generation G, the parents are chosen at random by assigning a uniform random number between 1 and N to each lineage. Lineages that draw the same parental gene coalesce into a new node in generation G + 1.

G   Node / lineage   Random parental gene   Result
0   1, 2, 3, 4       5, 2, 6, 2             lineages 2 and 4 coalesce: node 5 is created at generation 1
1   1, 3, 5          1, 4, 1                lineages 1 and 5 coalesce: node 6 is created at generation 2
2   3, 6             3, 9                   nothing happens at generation 3
3   3, 6             5, 7                   nothing happens at generation 4
4   3, 6             10, 2                  nothing happens at generation 5
5   3, 6             3, 3                   lineages 3 and 6 coalesce: node 7 is created at generation 6

• Generation G = 6: node 7 is the only remaining lineage, so it is the MRCA.

[Figure: resulting tree, with leaves 1, 2, 4, 3, internal nodes 5 and 6, and the MRCA (node 7)]

• The coalescent tree is finished! We have the topology and the branch lengths.
• This is a stochastic process with a high variance, so if we build many trees, they will all be different but share common properties.
• To get polymorphism data, we need to add mutations on the tree...
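The walk-through above can be sketched in a few lines of code. This is a minimal illustration for a single haploid Wright-Fisher population with coalescence as the only event; the data layout (a dictionary mapping each node to its generation and children) and the function names are assumptions of this sketch, not part of the course material.

```python
import random


def simulate_tree_generation_by_generation(sample_size, pop_size, rng):
    """Return (nodes, mrca); nodes maps node id -> (generation, children)."""
    nodes = {i: (0, ()) for i in range(1, sample_size + 1)}  # leaves at generation 0
    active = list(nodes)          # lineages that have not coalesced yet
    next_id = sample_size + 1
    generation = 0
    while len(active) > 1:
        generation += 1
        # Each lineage draws a parental gene uniformly between 1 and pop_size.
        by_parent = {}
        for lineage in active:
            by_parent.setdefault(rng.randint(1, pop_size), []).append(lineage)
        active = []
        for children in by_parent.values():
            if len(children) == 1:
                active.append(children[0])  # no coalescence for this lineage
            else:
                # Lineages sharing a parent coalesce into a new node.
                nodes[next_id] = (generation, tuple(children))
                active.append(next_id)
                next_id += 1
    return nodes, active[0]


def leaves_below(nodes, node):
    """List the sampled genes descending from a node."""
    _, children = nodes[node]
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves_below(nodes, child)]


nodes, mrca = simulate_tree_generation_by_generation(4, 10, random.Random(42))
for node_id, (generation, children) in nodes.items():
    print(node_id, generation, children)
print("MRCA:", mrca)
```

Because parents are drawn exactly as in the Wright-Fisher model, three or more lineages may occasionally share a parent in the same generation; the sketch then creates a single node with more than two children.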

Continuous-time algorithm

Two steps in the case of a single WF population with large N:
• First simulate the topology of the tree by randomly coalescing lineages (all leaves are equivalent).
• Then draw coalescence times to add branch lengths, e.g. using continuous-time exponential approximations.

Same toy example:
• 4 genes
• A single neutral locus
• A haploid WF population with N = 10

→ there are only coalescence events

It is a bit more complex for structured models because the topology is constrained by migration events...

Continuous-time algorithm: the toy example step by step

(1) Build the topology by randomly coalescing ancestral lineages
→ Lineages 2 and 4 were randomly chosen. That's the 1st coalescent event (node 5).
→ Lineages 1 and 5 were randomly chosen. That's the 2nd coalescent event (node 6).
→ Lineages 3 and 6 were randomly chosen. That's the 3rd and last coalescent event (node 7, the MRCA).

(2) Draw the 3 branch lengths T4, T3 and T2.
• Branch lengths are drawn from exponential distributions:

$$P(T_k = t) \approx \frac{k(k-1)}{2}\,e^{-t\,\frac{k(k-1)}{2}}$$

• Random deviates → T4 = 0.8, T3 = 1.4 and T2 = 4.3.

[Figure: tree with leaves 1, 2, 4, 3, internal nodes 5 and 6, and the MRCA (node 7); T4, T3 and T2 are the successive time intervals from the leaves to the MRCA]

• The coalescent tree is finished! We have the topology and the branch lengths.
• Note: the distributions of coalescence times must be known under the demographic model considered!
• To get polymorphism data, we need to add mutations on the tree...
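A minimal sketch of this algorithm follows, for a single population with time scaled by N. Unlike the two-step presentation above, it draws the pair and the waiting time in the same backward pass; this is a convenience of the sketch and gives the same kind of tree, since the topology and the waiting times are drawn independently. The function name and data layout are hypothetical.

```python
import random


def simulate_tree_continuous_time(sample_size, rng):
    """Return (nodes, mrca); nodes maps node id -> (time, children).

    Times are in units of N generations, measured backward from the present.
    """
    nodes = {i: (0.0, ()) for i in range(1, sample_size + 1)}
    active = list(nodes)
    next_id = sample_size + 1
    time = 0.0
    while len(active) > 1:
        k = len(active)
        time += rng.expovariate(k * (k - 1) / 2)  # waiting time T_k
        pair = rng.sample(active, 2)              # every pair is equally likely
        active = [lineage for lineage in active if lineage not in pair]
        nodes[next_id] = (time, tuple(pair))
        active.append(next_id)
        next_id += 1
    return nodes, active[0]


nodes, mrca = simulate_tree_continuous_time(4, random.Random(7))
for node_id, (time, children) in nodes.items():
    print(node_id, round(time, 3), children)
print("MRCA:", mrca)
```

Only k − 1 random pairs and k − 1 exponential deviates are needed for a sample of k genes, whatever the population size, which is where the speed advantage over the generation-by-generation algorithm comes from.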

Addition of mutations

• General principle (reminder):
- Mutations are distributed on the different branches, from the MRCA to the leaves, as a function of the mutation rate µ.
- Each mutation induces a change in the allelic or nucleotide state of the descendant node.
- This change of genetic state is made according to the mutation model considered, which may reflect the real mutational processes of some genetic markers.

• For a branch of length t, the number of mutations m follows a binomial distribution with t trials and success probability µ, which is often approximated by a Poisson distribution with parameter µt:

$$P(m \mid t) = \frac{(\mu t)^m\,e^{-\mu t}}{m!}$$

• Example for microsatellites under a SMM (stepwise mutation model): the effect of each mutation is the gain or the loss of one motif (repeat).
- Random genetic type for the MRCA (drawn from the stationary distribution): here 20 repeats
- Then add the effect of each mutation

[Figure: coalescent tree with MRCA allele 20 and mutations (☇) placed on its branches]

→ A polymorphic sample of 4 genes is obtained with allelic states 20, 20, 21, 19.
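The microsatellite example can be sketched for a single branch as follows. The sketch assumes a symmetric stepwise model (gain and loss of one repeat are equally likely) and no bound on allele size, which the slides do not specify; the Poisson draw uses Knuth's multiplication method, and all function names are illustrative.

```python
import math
import random


def poisson_draw(mean, rng):
    """Draw a Poisson deviate (Knuth's method, adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def mutate_smm(allele, branch_length, mu, rng):
    """Apply stepwise mutations along one branch.

    Returns the allele at the bottom of the branch and the number of mutations.
    Assumption: each mutation adds or removes one repeat with equal probability.
    """
    n_mutations = poisson_draw(mu * branch_length, rng)
    for _ in range(n_mutations):
        allele += rng.choice((-1, 1))
    return allele, n_mutations


rng = random.Random(5)
# Ancestral allele of 20 repeats, branch of 1000 generations, mu = 5e-4.
print(mutate_smm(20, 1000, 5e-4, rng))
```

Applying the same function to every branch, from the MRCA down to the leaves, gives the allelic states of the sample.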

• Example on DNA sequence markers (5 bp):
- Choice of the ancestral sequence (ATTGC)
- Independent mutation on each site

[Figure: coalescent tree with MRCA sequence ATTGC and mutations (☇) placed on its branches]

→ A polymorphic sample of 4 genes is obtained with haplotypes TTTGC, TTAGC, TTAGC, AAACC.

What to do with these simulated coalescent trees and polymorphism data? (subject of the next courses)

• Exploratory approaches: study the effects of various parameters on the shape of coalescent trees, on the distribution of polymorphism in a sample, and on various summary statistics (e.g. He, FST, ...), e.g. the effect of past changes in population size
• Simulation tests: create simulated data sets to test the precision and robustness of genetic data analysis methods
• Inferential approach: estimate populational evolutionary parameters (population sizes, dispersal, demographic history) from polymorphism data

Books

Some examples in R: with the package 'coalesceR' by Renaud Vitalis

https://r-forge.r-project.org/

Gene genealogies are affected by demography

With population growth, recent coalescent events are less frequent (large N) than ancient coalescent events (small N). Hence external branches are longer, and internal branches shorter...

→ we expect an excess of low-frequency mutations.

With population decline, recent coalescent events are more frequent (small N) than ancient coalescent events (large N). Hence external branches are shorter, and internal branches longer...

→ we expect a deficit of low-frequency mutations.
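The effect of demography on the timing of coalescences can be illustrated with two lineages. The sketch below computes P(G2 = g) when the population size changes through time, by replacing 1/N with 1/N(g) in each generation. The two scenarios (sizes 100 and 1000, with a change 100 generations ago) are hypothetical values chosen for this illustration.

```python
def pair_coalescence_pmf(pop_size_at, max_gen):
    """P(G2 = g) for g = 1..max_gen when the size g generations back is pop_size_at(g)."""
    pmf = []
    still_distinct = 1.0  # probability that the pair has not coalesced yet
    for g in range(1, max_gen + 1):
        p = 1 / pop_size_at(g)
        pmf.append(still_distinct * p)
        still_distinct *= 1 - p
    return pmf


def growth(g):
    """Hypothetical growth: large population recently, small in the past."""
    return 1000 if g <= 100 else 100


def decline(g):
    """Hypothetical decline: small population recently, large in the past."""
    return 100 if g <= 100 else 1000


# Probability that the pair coalesces within the last 100 generations.
recent_growth = sum(pair_coalescence_pmf(growth, 100))
recent_decline = sum(pair_coalescence_pmf(decline, 100))
print(recent_growth, recent_decline)
```

Recent coalescences are much rarer in the growing population than in the declining one, in line with the longer external branches expected under growth.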