modèles de génétique des populations - Genetic ... - Raphael Leblois

Introduction

Definitions

What’s a marker

Mutation models

Master 2 Biostatistics module: population genetics models

Genetic markers for the study of genetic polymorphism in natural populations
Raphaël Leblois & François Rousset
Centre de Biologie pour la Gestion des Populations (CBGP, UMR INRA)

December 2013

Population Genetics (reminder...)

• Infer allelic and genotypic frequencies, to study their distribution within and between populations
• Understand the evolution of gene and genotype frequencies within and between populations due to the different evolutionary forces: mutation, drift, gene flow, selection
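
The first point, inferring allelic and genotypic frequencies, can be sketched in a few lines of Python. This is only an illustration for a single locus in diploid individuals; the function name `genotype_and_allele_frequencies` and the toy sample are assumptions made for the example, not part of the course material.

```python
from collections import Counter

def genotype_and_allele_frequencies(genotypes):
    """Return (genotype frequencies, allele frequencies) at one locus.

    genotypes: list of (allele, allele) tuples, one per diploid individual.
    """
    n = len(genotypes)
    # Sort each pair so that ("A", "a") and ("a", "A") are the same genotype.
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    # Each diploid individual contributes two gene copies.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    genotype_freqs = {g: c / n for g, c in genotype_counts.items()}
    allele_freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    return genotype_freqs, allele_freqs

# Hypothetical sample of five individuals.
sample = [("A", "A"), ("A", "a"), ("a", "A"), ("a", "a"), ("A", "A")]
geno_f, allele_f = genotype_and_allele_frequencies(sample)
```

Because the marker is assumed co-dominant, heterozygotes are distinguishable from homozygotes and allele frequencies follow from simple gene counting.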

Population Genetics (reminder...)

• Theoretical: necessary to test verbal hypotheses and models, and to produce new theoretical hypotheses and models
• “Experimental”: test theoretical models and hypotheses under controlled conditions
• “Empirical”: study the distribution of polymorphism in natural populations → infer the demographic and adaptive history of natural populations

Polymorphism in natural populations: different forms of variation

• Within individuals
• Between individuals / within population

[Figure: collection of snails from a polymorphic population of Cepaea nemoralis in Poland, illustrating the variety of shell colours (yellow, pink, brown) and banding (0, 1, 5) typically found.]

Polymorphism in natural populations: different forms of variation

• Between populations

[Figure: 17 000 polymorphic sites in 876 DNA sequences from 96 individuals]

Some definitions

• Gene: copy of a piece of genetic information, carried by a sequence of nucleotides. A diploid individual has two copies of a gene.
• Locus: location of a gene on a chromosome.
• Allele (“allelic state”): class of equivalent homologous genes (i.e. in the same state).

From those definitions: at a given locus, a diploid individual has two homologous genes, which belong to the same allelic class if the individual is homozygous.

What is a genetic marker?

Genetic markers are used to describe genetic polymorphism and its distribution within and between individuals, populations, or species (because we do not have direct access to individual genomes).

• A good genetic marker should:
  • have a simple mode of inheritance (e.g. Mendelian)
  • be polymorphic
  • be co-dominant
  • be neutral (only for demographic inferences)

The different genetic markers (1)

[Figure: PCR]

• Microsatellite markers: repetitions of short DNA motifs, many loci dispersed in the genome
  • Mendelian inheritance
  • high polymorphism
  • co-dominant
  • “neutral”

The different genetic markers (2)

• SNPs: Single Nucleotide Polymorphisms
  • Mendelian inheritance
  • low polymorphism: 0 / 1 = ancestral / derived states
  • very many SNPs in the genome
  • co-dominant
  • “neutral” or “selected” (good for the study of selection)

SNPs are the typical markers of the “next generation sequencing” (NGS) era.

The different genetic markers (4)

• DNA sequences: access to the nucleotide sequence of “short” DNA fragments.
  • Mendelian inheritance
  • intermediate polymorphism, depending on the length
  • co-dominant but difficult to “phase”
  • “neutral” or “selected”: intra- vs. intergenic sequences

Phasing: for diploid organisms, each observed polymorphism (SNP) on the DNA sequence needs to be attributed to a given strand (i.e. maternal or paternal DNA) to get the two haplotypes carried by each individual.
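
To see why phasing is difficult, the sketch below enumerates every pair of haplotypes compatible with one unphased diploid genotype. It is an illustration only: the function name `possible_haplotype_pairs` and the three-site genotype are hypothetical, and real phasing methods use population or family information to choose among these candidates.

```python
from itertools import product

def possible_haplotype_pairs(genotype):
    """List all haplotype pairs compatible with an unphased genotype.

    genotype: list of (allele, allele) tuples, one per site.
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # The orientation of the first heterozygous site is fixed, otherwise
    # (h1, h2) and (h2, h1) would be counted as two different solutions.
    for flips in product([False, True], repeat=max(len(het_sites) - 1, 0)):
        flip_at = dict(zip(het_sites[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.add(("".join(h1), "".join(h2)))
    return sorted(pairs)

# Hypothetical individual: heterozygous at sites 0 and 2, homozygous at site 1.
candidates = possible_haplotype_pairs([("A", "G"), ("C", "C"), ("T", "C")])
```

With k heterozygous sites (k ≥ 1) there are 2^(k−1) candidate pairs, so the ambiguity grows quickly with sequence length.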

The different genetic markers (5)

• Whole genomes: ideally the best genetic data!
  • Mendelian inheritance
  • high polymorphism
  • co-dominant but difficult to “phase”
  • “neutral” and “selected”

Next generation sequencing (NGS) techniques are clearly revolutionizing population genetics. Data are no longer limiting, but existing methods cannot deal with such large data sets: they must be adapted and new methods must be developed. See course 5: CoalHMM models.

Modelling of the mutation processes of the different genetic markers

• To use population genetic models with the data obtained from genetic markers, we need to take into account:
  • the mutation processes creating the DNA variants
  • but also the difference between the DNA variants and what we can observe given the molecular techniques used

→ need to define mutation models

2 contrasting examples of mutation processes: the case of microsatellites vs. DNA sequence data

• What is the main cause of mutation at microsatellite loci?
  • sequences of repeated short DNA motifs, e.g. $(CA)_{10}$
  • creation of DNA loops during replication

→ (1) the new mutated allele has gained or lost one or more (CA) motifs
→ (2) it is a relatively frequent process, leading to high mutation rates (e.g. $5 \times 10^{-4}$ per generation)

The different mutation models

Several mutation models have been developed for allelic data:

• The infinite allele model (IAM) assumes that every new allele created by mutation is different from the existing ones. Identity in state is thus equivalent to identity by descent.
• The K-allele model (KAM) assumes there are K possible allelic states and that a mutation is equally likely to lead to each of the K − 1 other states (probability 1/(K − 1) each).

• The stepwise mutation model (SMM) was designed to analyze alleles characterized by their electrophoretic mobility. It assumes that each mutation increases or reduces the mobility by one “step”. Application to microsatellite markers is direct, by considering the loss or gain of one repeated motif (e.g. (TG)).
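
The three allelic models differ only in where a mutation sends an allele, which the following Python sketch tries to make concrete. All names (`mutate_iam`, `mutate_kam`, `mutate_smm`, `evolve`) are hypothetical, alleles are coded as integers for convenience, and the lower bound of one repeat in the SMM is an assumption added for the example rather than part of the model as stated above.

```python
import random
from itertools import count

def mutate_iam(new_labels):
    """IAM: every mutation creates an allele never seen before."""
    return next(new_labels)

def mutate_kam(state, K, rng):
    """KAM: jump to one of the K - 1 other states, each with prob. 1/(K - 1)."""
    others = [s for s in range(K) if s != state]
    return rng.choice(others)

def mutate_smm(n_repeats, rng, min_repeats=1):
    """SMM: gain or lose exactly one repeat, with equal probability."""
    step = rng.choice([-1, 1])
    # Assumption: an allele cannot drop below min_repeats; the step is then
    # forced upward.
    if n_repeats + step < min_repeats:
        step = 1
    return n_repeats + step

def evolve(state, n_generations, u, mutate, rng):
    """Follow one gene lineage; it mutates with probability u per generation."""
    for _ in range(n_generations):
        if rng.random() < u:
            state = mutate(state)
    return state

rng = random.Random(42)
# A (CA)10 allele followed for 10 000 generations at u = 5e-4 under the SMM.
allele_after = evolve(10, 10_000, 5e-4, lambda s: mutate_smm(s, rng), rng)
```

Under the IAM two alleles are identical in state only if no mutation separates them, whereas under the KAM and SMM a lineage can return to a state it has already visited (homoplasy).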

Modelling of the mutation processes of the different genetic markers

• Main mutation processes occurring on DNA sequences are:
  - Single nucleotide changes (i.e. between A, T, G and C)
  - Insertions / deletions of one or more nucleotides
  - Mutations are rare: $10^{-8}$ to $10^{-12}$ per nucleotide per generation

• Several models exist for nucleotide evolution:
  - The simplest: the infinitely many sites model (ISM)
    An infinitely long sequence → each mutation occurs at a different site
    Each polymorphic site can have 2 states: ancestral (0) and derived (1)
    Haplotypes of a sample can be written as a series of 0/1, e.g. 101011011

  - Many other models exist, with 0 to 10 parameters, e.g. combining different nucleotide transition rates, specific insertion/deletion rates, time-variable rates, etc. They are usually calibrated using phylogenetic and fossil data.
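
The 0/1 coding used under the ISM can be illustrated with a short Python sketch. It assumes that the sequences are already aligned and that the ancestral state of every site is known; the function name `to_binary_haplotypes` and the toy sequences are hypothetical.

```python
def to_binary_haplotypes(sequences, ancestral):
    """Code aligned sequences as 0/1 strings over the polymorphic sites only.

    0 = ancestral state, 1 = derived state.
    """
    if any(len(s) != len(ancestral) for s in sequences):
        raise ValueError("sequences must be aligned to the ancestral sequence")
    polymorphic = []
    for i, anc in enumerate(ancestral):
        states = {s[i] for s in sequences}
        if len(states) > 1:
            # Under the ISM a site mutates at most once, so a polymorphic
            # site carries a single derived state.
            if len(states - {anc}) > 1:
                raise ValueError(f"site {i} has more than one derived state")
            polymorphic.append(i)
    return [
        "".join("0" if s[i] == ancestral[i] else "1" for i in polymorphic)
        for s in sequences
    ]

# Hypothetical sample of four sequences; sites 1 and 5 are polymorphic.
haplotypes = to_binary_haplotypes(
    ["ACGTAC", "ATGTAC", "ATGTAA", "ACGTAC"], ancestral="ACGTAC"
)
```

Sites with more than one derived state are rejected because they cannot be explained by a single mutation per site.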

More generally, mutation processes acting on the genetic markers can be modeled using Markov chains and a transition probability matrix between alleles or haplotypes (mutation matrix U). For a KAM with K = 6 states, the transition matrix is

\[
U \equiv (u_{ij}) =
\begin{pmatrix}
1-u & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & 1-u & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & 1-u & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & 1-u & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & 1-u & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & 1-u
\end{pmatrix}
\]

With K = 6, each off-diagonal entry u/(K − 1) equals u/5, so the KAM transition matrix is

\[
U \equiv (u_{ij}) =
\begin{pmatrix}
1-u & u/5 & u/5 & u/5 & u/5 & u/5 \\
u/5 & 1-u & u/5 & u/5 & u/5 & u/5 \\
u/5 & u/5 & 1-u & u/5 & u/5 & u/5 \\
u/5 & u/5 & u/5 & 1-u & u/5 & u/5 \\
u/5 & u/5 & u/5 & u/5 & 1-u & u/5 \\
u/5 & u/5 & u/5 & u/5 & u/5 & 1-u
\end{pmatrix}
\]
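
A KAM matrix of any size can be built programmatically. The sketch below uses exact fractions so that the row sums can be checked without rounding error; the function name `kam_matrix` and the value u = 1/1000 are assumptions chosen for the illustration.

```python
from fractions import Fraction

def kam_matrix(K, u):
    """Transition matrix of the K-allele model with mutation probability u."""
    off_diagonal = u / (K - 1)  # probability of moving to one given other state
    return [
        [1 - u if i == j else off_diagonal for j in range(K)]
        for i in range(K)
    ]

u = Fraction(1, 1000)  # hypothetical mutation probability per generation
U_kam = kam_matrix(6, u)

# Each row of a transition probability matrix must sum to 1.
row_sums = [sum(row) for row in U_kam]
```

Each row gives the distribution of the allelic state one generation later, given the current state.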

For an SMM with 6 states, the transition matrix is

\[
U \equiv (u_{ij}) =
\begin{pmatrix}
1-u & u & 0 & 0 & 0 & 0 \\
u/2 & 1-u & u/2 & 0 & 0 & 0 \\
0 & u/2 & 1-u & u/2 & 0 & 0 \\
0 & 0 & u/2 & 1-u & u/2 & 0 \\
0 & 0 & 0 & u/2 & 1-u & u/2 \\
0 & 0 & 0 & 0 & u & 1-u
\end{pmatrix}
\]
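
The bounded SMM matrix has the same tridiagonal structure for any number of states, as the following sketch shows. It reproduces the boundary behaviour of the 6-state matrix, where the two extreme states mutate to their single neighbour with probability u; the function name `smm_matrix` and the value of u are assumptions made for the example.

```python
from fractions import Fraction

def smm_matrix(n_states, u):
    """Transition matrix of a bounded stepwise mutation model."""
    if n_states < 2:
        raise ValueError("need at least two allelic states")
    zero = Fraction(0)
    U = [[zero] * n_states for _ in range(n_states)]
    last = n_states - 1
    for i in range(n_states):
        U[i][i] = 1 - u
        if i == 0:
            U[i][1] = u            # smallest allele can only gain a repeat
        elif i == last:
            U[i][last - 1] = u     # largest allele can only lose a repeat
        else:
            U[i][i - 1] = u / 2    # lose one repeat
            U[i][i + 1] = u / 2    # gain one repeat
    return U

u = Fraction(1, 2000)  # hypothetical value, i.e. 5e-4 per generation
U_smm = smm_matrix(6, u)
```

Unlike the KAM matrix, this one is not symmetric, because of the two boundary rows.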

Such mutation matrices are used throughout population genetics to go from identity by descent (i.e. no mutation since the common ancestor of two genes; IAM, ISM) to identity in state (i.e. the observed allelic type of two genes is the same).
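
As a worked illustration of this step for the symmetric KAM (a sketch derived here, not taken from the course), let $I$ be the identity matrix and $J$ the $K \times K$ matrix of ones. The KAM matrix and its powers can then be written in closed form:

```latex
\begin{aligned}
U &= \lambda I + \frac{1-\lambda}{K}\, J,
  \qquad \lambda = 1 - \frac{uK}{K-1}, \\
U^{t} &= \lambda^{t} I + \frac{1-\lambda^{t}}{K}\, J, \\
\Pr(\text{identity in state}) &= \left(U^{2t}\right)_{ii}
  = \frac{1}{K} + \left(1 - \frac{1}{K}\right)\lambda^{2t}, \\
\Pr(\text{identity by descent}) &= (1-u)^{2t}.
\end{aligned}
```

Here the two genes are assumed to be separated from their common ancestor by t generations each, so 2t mutation steps lie between them, and the symmetry of U is what allows the two lineages to be combined into a single power $U^{2t}$. Identity in state is more probable than identity by descent because two genes can carry the same allele even after mutating.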

Books

The notions of genetic markers and mutation models are covered in all good books on population genetics... especially developed from a biological point of view in