Introduction
Definitions
What’s a marker
Mutation models
Master 2 Biostatistics module: population genetics models
Genetic markers for the study of genetic polymorphism in natural populations
Raphaël Leblois & François Rousset
Centre de Biologie pour la Gestion des Populations (CBGP, UMR INRA)
December 2013
Population Genetics (reminder...)
• Infer allelic and genotypic frequencies, to study their distribution within and between populations
• Understand the evolution of gene and genotype frequencies within and between populations due to the different evolutionary forces: mutation, drift, gene flow, selection
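As a minimal sketch of the first point, allelic and genotypic frequencies at one locus can be counted directly from a sample of diploid genotypes. The sample below is invented for illustration, and the function names are hypothetical.

```python
from collections import Counter

def genotype_frequencies(genotypes):
    """Frequency of each unordered genotype in a sample of diploid individuals."""
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    n = len(genotypes)
    return {g: c / n for g, c in counts.items()}

def allele_frequencies(genotypes):
    """Frequency of each allele, counting the two gene copies of every individual."""
    counts = Counter(a for g in genotypes for a in g)
    total = 2 * len(genotypes)
    return {a: c / total for a, c in counts.items()}

# Invented sample: 4 diploid individuals at one locus with alleles "A" and "a".
sample = [("A", "A"), ("A", "a"), ("a", "A"), ("a", "a")]
geno_freq = genotype_frequencies(sample)
allele_freq = allele_frequencies(sample)
```

Genotypes are sorted before counting so that ("A", "a") and ("a", "A") are treated as the same heterozygote.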
Population Genetics (reminder...)
• Theoretical: necessary to test verbal hypotheses and models, and to produce new theoretical hypotheses and models
• “Experimental”: test theoretical models and hypotheses under controlled conditions
• “Empirical”: study the distribution of polymorphism in natural populations → infer the demographic and adaptive history of natural populations
Polymorphism in natural populations: different forms of variation
• Within individuals
• Between individuals / within population
Figure: collection of snails from a polymorphic population of Cepaea nemoralis in Poland, illustrating the variety of shell colours (yellow, pink, brown) and banding (0, 1, 5) typically found.
Polymorphism in natural populations: different forms of variation
• Between populations
Figure: 17,000 polymorphic sites in 876 DNA sequences from 96 individuals.
Some definitions
• Gene: a copy of a piece of genetic information, carried by a sequence of nucleotides. A diploid individual has two copies of a gene.
• Locus: location of a gene on a chromosome.
• Allele (“allelic state”): class of equivalent homologous genes (i.e. in the same state).
From those definitions: at a given locus, a diploid individual has two homologous genes, which belong to the same allelic class if the individual is homozygous.
What is a genetic marker?
Genetic markers are used to describe genetic polymorphism and its distribution within and between individuals, populations, or species (because we do not have direct access to individual genomes).
• A good genetic marker should:
  • have a simple mode of inheritance (e.g. Mendelian)
  • be polymorphic
  • be co-dominant
  • be neutral (only for demographic inferences)
The different genetic markers (1)
Figure: PCR
• Microsatellite markers: repetitions of short DNA motifs, many loci dispersed in the genome
  • Mendelian inheritance
  • high polymorphism
  • co-dominant
  • “neutral”
The different genetic markers (2)
• SNPs: Single Nucleotide Polymorphisms
  • Mendelian inheritance
  • low polymorphism: 0 / 1 = ancestral / derived states
  • very many SNPs in the genome
  • co-dominant
  • “neutral” or “selected” (good for the study of selection)
SNPs are the typical markers of the “next generation sequencing” (NGS) era.
The different genetic markers (4)
• DNA sequences: access to the nucleotide sequence of “short” DNA fragments.
  • Mendelian inheritance
  • intermediate polymorphism, depending on the length
  • co-dominant but difficult to “phase”
  • “neutral” or “selected”: intragenic vs. intergenic sequences
Phasing: for diploid organisms, each observed polymorphism (SNP) on the DNA sequence needs to be attributed to a given strand (i.e. maternal or paternal DNA) to get the two haplotypes carried by each individual.
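To illustrate why phasing is difficult, the sketch below enumerates every pair of haplotypes compatible with one unphased diploid genotype. The genotype coding (number of derived copies per site: 0, 1 or 2) and the function name `possible_phasings` are assumptions made for this example, not part of the course material.

```python
from itertools import product

def possible_phasings(genotype):
    """List all haplotype pairs compatible with an unphased diploid genotype.

    genotype: per-site counts of the derived allele (0, 1 or 2);
    a value of 1 marks a heterozygous site.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
    if not het_sites:
        h = tuple(base)
        return [(h, h)]
    phasings = []
    # The derived allele of the first heterozygous site is fixed on haplotype 1,
    # so that swapping the two haplotypes is not counted as a new phasing.
    for choice in product((0, 1), repeat=len(het_sites) - 1):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, (1,) + choice):
            h1[site] = allele
            h2[site] = 1 - allele
        phasings.append((tuple(h1), tuple(h2)))
    return phasings

# Invented genotype with 3 heterozygous sites out of 5.
pairs = possible_phasings([1, 2, 1, 0, 1])
```

With h heterozygous sites there are 2^(h-1) distinct haplotype pairs, so the number of candidate phasings doubles with each additional heterozygous site.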
The different genetic markers (5)
• Whole genomes: ideally the best genetic data!
  • Mendelian inheritance
  • high polymorphism
  • co-dominant but difficult to “phase”
  • “neutral” and “selected”
Next generation sequencing (NGS) techniques are clearly revolutionizing population genetics. Data are no longer limiting, but existing methods cannot deal with such large data sets: they must be adapted and new methods must be developed. See course 5: CoalHMM models.
Modelling of the mutation processes of the different genetic markers
• To use population genetic models with the data obtained from genetic markers, we need to take into account:
  • the mutation processes creating the DNA variants
  • but also the difference between the DNA variants and what we can observe given the molecular techniques used
→ need to define mutation models
Two contrasting examples of mutation processes: the case of microsatellites vs. DNA sequence data
Modelling of the mutation processes of the different genetic markers
• What is the main cause of mutation at microsatellite loci?
  • sequences of repeated short DNA motifs, e.g. (CA)10
  • creation of DNA loops during replication
→ (1) the new mutated allele has gained or lost (CA) motif(s)
→ (2) it is a relatively frequent process, leading to high mutation rates (e.g. $5 \times 10^{-4}$ per generation)
The different mutation models
• Several mutation models have been developed for allelic data:
  • The infinite allele model (IAM) assumes that every new allele created by mutation is different from the existing ones. Identity in state is thus equivalent to identity by descent.
  • The K-allele model (KAM) assumes there are K possible allelic states and that mutations from one state to each of the other states are equiprobable (probability 1/(K − 1) each).
  • The stepwise mutation model (SMM) was designed to analyze alleles characterized by their electrophoretic mobility. It assumes that each mutation increases or reduces the mobility by one “step”. It applies directly to microsatellite markers by considering the loss or gain of one repeated motif (e.g. (TG)).
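The stepwise mutation model can be sketched as a simulation of the repeat number of one microsatellite lineage over generations. This is only an illustration: the function names are hypothetical, the floor of one repeat is an assumption added to keep allele sizes positive, and the mutation rate reuses the order of magnitude quoted for microsatellites (5e-4 per generation).

```python
import random

def smm_mutate(repeats, u, rng):
    """One generation under the SMM: with probability u, gain or lose one repeat."""
    if rng.random() < u:
        step = rng.choice((-1, 1))
        # Assumption of this sketch: an allele keeps at least one repeat.
        return max(1, repeats + step)
    return repeats

def simulate_lineage(start, u, generations, seed=0):
    """Follow the repeat number of a single gene lineage through time."""
    rng = random.Random(seed)
    repeats = start
    history = [repeats]
    for _ in range(generations):
        repeats = smm_mutate(repeats, u, rng)
        history.append(repeats)
    return history

# A (CA)10 allele followed for 10,000 generations with u = 5e-4 per generation.
history = simulate_lineage(start=10, u=5e-4, generations=10_000)
```

Because steps go in both directions, a lineage can return to an allele size it had before, so two alleles of the same size are not necessarily identical by descent.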
Modelling of the mutation processes of the different genetic markers
• Main mutation processes occurring on DNA sequences are:
  - Single nucleotide changes (i.e. substitutions among A, T, G, C)
  - Insertions / deletions of one or more nucleotides
  - mutations are rare: $10^{-8}$ to $10^{-12}$ per nucleotide per generation
• Several models exist for nucleotide evolution:
  - The simplest: the infinitely many sites model (ISM)
    An infinitely long sequence → each mutation occurs at a different site
    Each polymorphic site can have 2 states: ancestral (0) and derived (1)
    Haplotypes of a sample can be written as a series of 0/1: e.g. 101011011
  - Many other models exist, with 0 to 10 parameters, e.g. combining different nucleotide transition rates, specific insertion/deletion rates, time-variable rates, etc. They are usually calibrated using phylogenetics and fossil data.
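The 0/1 coding of haplotypes used under the infinitely many sites model can be sketched as follows. The sequences, the known ancestral sequence and the function name `to_binary_haplotypes` are all invented for this example; in real data the ancestral state usually has to be inferred, for instance from an outgroup.

```python
def to_binary_haplotypes(ancestral, sequences):
    """Recode aligned sequences as 0/1 strings over the sites polymorphic in the sample.

    0 = ancestral state, 1 = derived state. At most two states per site
    are allowed, as assumed by the infinitely many sites model.
    """
    poly_sites = []
    for i in range(len(ancestral)):
        states = {seq[i] for seq in sequences}
        if len(states | {ancestral[i]}) > 2:
            raise ValueError(f"site {i} has more than two states")
        if len(states) > 1:
            poly_sites.append(i)
    return [
        "".join("0" if seq[i] == ancestral[i] else "1" for i in poly_sites)
        for seq in sequences
    ]

# Invented data: one ancestral sequence and a sample of four aligned sequences.
ancestral = "ATGCAT"
sample = ["ATGCAT", "ATGTAT", "CTGTAT", "ATGCAA"]
haplotypes = to_binary_haplotypes(ancestral, sample)
# haplotypes == ["000", "010", "110", "001"]
```

Only sites 0, 3 and 5 vary in this sample, so each haplotype is summarised by three binary characters.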
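Modelling of the mutation processes of the different genetic markers
More generally, mutation processes acting on the genetic markers can be modeled using Markov chains and a transition probability matrix between alleles or haplotypes (mutation matrix U). For a KAM with K = 6 states, the transition matrix is
$$
U \equiv (u_{ij}) =
\begin{pmatrix}
1-u & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & 1-u & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & 1-u & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & 1-u & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & 1-u & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & 1-u
\end{pmatrix}
$$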
With K = 6, each off-diagonal term is u/(K − 1) = u/5, so the KAM transition matrix is
$$
U \equiv (u_{ij}) =
\begin{pmatrix}
1-u & u/5 & u/5 & u/5 & u/5 & u/5 \\
u/5 & 1-u & u/5 & u/5 & u/5 & u/5 \\
u/5 & u/5 & 1-u & u/5 & u/5 & u/5 \\
u/5 & u/5 & u/5 & 1-u & u/5 & u/5 \\
u/5 & u/5 & u/5 & u/5 & 1-u & u/5 \\
u/5 & u/5 & u/5 & u/5 & u/5 & 1-u
\end{pmatrix}
$$
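The KAM mutation matrix can be built programmatically for any K. The sketch below uses plain Python lists; the function name `kam_matrix` and the value u = 0.001 are illustrative assumptions.

```python
def kam_matrix(K, u):
    """K x K mutation matrix of the K-allele model with mutation rate u.

    Diagonal terms are 1 - u (no mutation); each off-diagonal term is
    u / (K - 1), since a mutation is equally likely to lead to any other state.
    """
    off = u / (K - 1)
    return [[1 - u if i == j else off for j in range(K)] for i in range(K)]

U = kam_matrix(6, 0.001)
# Each row is a probability distribution over the states after one generation.
row_sums = [sum(row) for row in U]
```

Checking that every row sums to 1 is a quick way to confirm that the matrix is a valid Markov transition matrix.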
For an SMM with 6 states, the transition matrix is
$$
U \equiv (u_{ij}) =
\begin{pmatrix}
1-u & u & 0 & 0 & 0 & 0 \\
u/2 & 1-u & u/2 & 0 & 0 & 0 \\
0 & u/2 & 1-u & u/2 & 0 & 0 \\
0 & 0 & u/2 & 1-u & u/2 & 0 \\
0 & 0 & 0 & u/2 & 1-u & u/2 \\
0 & 0 & 0 & 0 & u & 1-u
\end{pmatrix}
$$
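The same matrix can be generated for any number of states. This is a sketch in plain Python: the function name `smm_matrix` is hypothetical, and the boundary rule (the smallest and largest alleles mutate inward with probability u) is read off the 6-state matrix above.

```python
def smm_matrix(n_states, u):
    """Mutation matrix of a bounded stepwise mutation model (n_states >= 2).

    Interior alleles move one step down or up with probability u/2 each;
    the two boundary alleles can only move inward, with probability u.
    """
    U = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        U[i][i] = 1 - u
        if i == 0:
            U[i][1] = u
        elif i == n_states - 1:
            U[i][i - 1] = u
        else:
            U[i][i - 1] = u / 2
            U[i][i + 1] = u / 2
    return U

U = smm_matrix(6, 0.01)
```

Unlike the KAM matrix, this one is tridiagonal: an allele can only reach a distant state through a series of single-step mutations.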
These mutation matrices are used throughout population genetics to go from identity by descent (i.e. no mutation since the common ancestor of two genes; IAM, ISM) to identity in state (i.e. the observed allelic type of two genes is the same).
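A small numerical sketch can make the difference between the two notions concrete. It assumes two genes that both descend from a common ancestor t generations ago, with mutation acting independently on each lineage according to a KAM matrix. The values of K, u and t and the function names are illustrative assumptions.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(U, t):
    """t-th power of U, i.e. transition probabilities over t generations."""
    n = len(U)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, U)
    return result

def identity_in_state(U, t, ancestor=0):
    """Probability that two lineages of t generations end in the same state."""
    Ut = mat_pow(U, t)
    return sum(p * p for p in Ut[ancestor])

K, u, t = 6, 0.001, 500
U = [[1 - u if i == j else u / (K - 1) for j in range(K)] for i in range(K)]

# Identity by descent: no mutation on either lineage since the common ancestor.
p_ibd = (1 - u) ** (2 * t)
# Identity in state: both genes are observed in the same allelic state.
p_iis = identity_in_state(U, t)
```

With a finite number of states, `p_iis` exceeds `p_ibd` because two genes can end up in the same state even after mutating. Under the IAM or ISM, where every mutation creates a new state, the two probabilities coincide.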
Books
The notions of genetic markers and mutation models are covered in all good population genetics books... especially developed from a biological point of view in