Evolutionary Genetics: Part 1
Polymorphism in DNA
S. chilense
S. peruvianum
Winter Semester 2012-2013
Prof Aurélien Tellier
FG Populationsgenetik
Color code:
Red = Important result or definition
Purple = exercise to do
Green = some bits of maths
Population genetics
Evolution = changes in the frequency of characters, traits or alleles
between generations
Why study the genetics of populations?
the population is the main unit at which selection acts!
Useful definitions
DNA has 4 bases: adenine, guanine, cytosine, thymine
Haploid = organisms with one set of unpaired chromosomes
Diploids, polyploids:
Tetraploid for maize
Up to hexaploid for wheat
Decaploid for strawberries
Pentaploid to dodecaploid for sugarcane
The chromosomal location of a gene is a locus
Several alleles can be observed at a locus in a population; a diploid individual carries two (one from its mother and one from its father)
The complete set of alleles in a species or population = gene pool
The proportion of one allele among all alleles in the gene pool = allele frequency
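To make the allele-frequency definition concrete, here is a minimal sketch assuming diploid genotypes written as pairs of allele labels. The function name `allele_frequencies` and the five-individual sample are hypothetical, chosen only for illustration: each individual contributes two allele copies, and each frequency is an allele's count divided by the total number of copies.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return the frequency of each allele in a sample of diploid genotypes.

    Each genotype is a pair of alleles (one maternal, one paternal), so a
    sample of n individuals contributes 2n allele copies to the gene pool.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


# Hypothetical sample of five diploid individuals at one locus
sample = [("A", "A"), ("A", "a"), ("a", "a"), ("A", "a"), ("A", "A")]
print(allele_frequencies(sample))  # {'A': 0.6, 'a': 0.4}
```

Counting allele copies instead of individuals is what makes the frequencies sum to 1 across the gene pool.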
Population genetics
Definition: population genetics is the study of the frequencies of alleles in
populations as well as their temporal and spatial changes
Populations and species show variability:
what type and how much genetic variation exists within populations/species?
what are the forces that influence the amount of variation within
populations?
First question: what is the variability at the genetic (DNA) level?
Polymorphism in DNA: Sanger sequencing
Polymorphism in DNA: Illumina NGS sequencing
www.seqanswers.com
Polymorphism in DNA: new new generation…
Polymorphism in DNA: what it looks like
Some definitions
Coding / non-coding DNA
For the coding DNA: start codon, stop codon
Some definitions: point mutations
Point mutation = change in the DNA base (e.g. T becomes G)
Insertions/deletions (indels) = insertion or removal of bases
Exercises 1.1 and 1.2:
use DNASP and the file 055twolines.fas
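The distinction between point mutations and indels can be sketched for two already-aligned sequences, assuming gaps are written as "-" (the usual FASTA alignment convention). The function `compare_aligned` and the example sequences are hypothetical. Note that it counts alignment columns, so a 2-bp deletion counts as two indel positions even though it may be a single mutational event.

```python
def compare_aligned(seq1, seq2, gap="-"):
    """Count point differences and indel positions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (same length)")
    substitutions = 0    # columns where both sequences have a base, but different ones
    indel_positions = 0  # columns where exactly one sequence has a gap
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        if gap in (a, b):
            indel_positions += 1
        else:
            substitutions += 1
    return substitutions, indel_positions


# Hypothetical pair: one C/T point mutation and a 2-bp deletion
print(compare_aligned("ATGCTAGTT", "ATGTTA--T"))  # (1, 2)
```

Ambiguity codes such as N are treated here as ordinary differences; a real analysis would usually exclude them.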
Some definitions: point mutations
Consequences on the protein:
Synonymous mutations: change the codon but not the amino acid
Non-synonymous mutations: change the Amino Acid
Silent sites = non-coding regions + synonymous sites
- Frameshift mutation: change reading frame (due to indels)
- Nonsense mutation: stop codon is introduced
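The consequences of a point mutation can be sketched by translating the codon before and after the change with the standard genetic code (NCBI translation table 1, written in its usual T, C, A, G order). The helper `classify_point_mutation` is a hypothetical name. Frameshifts are not covered because they come from indels, not from single-base substitutions.

```python
# Standard genetic code (NCBI table 1), bases ordered T, C, A, G; "*" = stop
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_point_mutation(codon, position, new_base):
    """Classify a single base change at `position` (0, 1 or 2) of a codon."""
    codon, new_base = codon.upper(), new_base.upper()
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODE[codon], CODE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"  # a stop codon has been introduced
    return "non-synonymous"  # note: a lost stop codon also ends up here


print(classify_point_mutation("GGT", 2, "A"))  # synonymous (Gly -> Gly)
print(classify_point_mutation("GAA", 2, "T"))  # non-synonymous (Glu -> Asp)
print(classify_point_mutation("TGG", 2, "A"))  # nonsense (Trp -> stop)
```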
Positions in the codon:
- Fourfold degenerate: any nucleotide change at this position specifies the same AA (only found at 3rd positions of codons), e.g. 3rd position of glycine codons (GGN)
- Twofold degenerate: two of the four possible nucleotides specify the same AA, e.g. 3rd position of glutamic acid codons (GAA, GAG)
- Threefold degenerate? 3rd position of isoleucine codons (ATT, ATC, ATA)
- Non-degenerate: any change specifies a different AA
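The degeneracy classes can be checked by trying all four bases at one codon position and counting how many keep the same amino acid: 4 means fourfold, 3 threefold, 2 twofold, 1 non-degenerate. This is a minimal sketch under the standard genetic code. The function `degeneracy` is hypothetical, and it treats "stop" like an amino acid, which is one of several conventions for stop codons.

```python
# Standard genetic code (NCBI table 1), bases ordered T, C, A, G; "*" = stop
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def degeneracy(codon, position):
    """Number of the 4 possible bases at `position` that keep the same amino acid."""
    codon = codon.upper()
    amino_acid = CODE[codon]
    return sum(
        CODE[codon[:position] + base + codon[position + 1:]] == amino_acid
        for base in BASES
    )


print(degeneracy("GGT", 2))  # 4: fourfold (glycine, 3rd position)
print(degeneracy("GAA", 2))  # 2: twofold (glutamic acid, 3rd position)
print(degeneracy("ATT", 2))  # 3: threefold (isoleucine, 3rd position)
print(degeneracy("ATG", 0))  # 1: non-degenerate (methionine, 1st position)
```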
Practical exercise
Using databases to find information on sequences
Exercise: Download the file: data-plants.fas
1) Open it with DNASP. What do you see?
2) Go to www.ncbi.nih.gov to the BLAST tools
3) Look at the options: these are plant sequences. Can you find out where the
sequences come from?
4) You will be directed to the results of the BLAST: let's look at them by moving the
cursor over the lines. Look at the scores for the alignments.
5) What are the best hits? Then you will be directed to the GenBank directory of
sequences. What information is available for these sequences?
Then we will insert information from the database into the sequences in DNASP.
Assign the coding region using data -> assign coding regions
You will then see changes in the way DNASP displays the sequences (see example)
Also open the file in Mesquite
Can you find out how many changes are silent, synonymous or non-synonymous?
How many SNPs are there? (Single Nucleotide Polymorphisms = mutational changes)
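Counting SNPs in an alignment amounts to finding the columns where more than one base is observed. The sketch below is not DnaSP's algorithm. It assumes one common convention ("complete deletion"): columns containing a gap ("-") or missing data ("N", "?") are skipped. The function `segregating_sites` and the toy alignment are hypothetical.

```python
def segregating_sites(alignment, gap="-", missing="N?"):
    """Return the 0-based columns that are polymorphic (SNPs) in an alignment."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    snps = []
    for column, bases in enumerate(zip(*alignment)):
        bases = {b.upper() for b in bases}
        if gap in bases or bases & set(missing):
            continue  # complete deletion: skip columns with gaps or missing data
        if len(bases) > 1:
            snps.append(column)
    return snps


# Hypothetical toy alignment of three sequences
alignment = ["ATGCAT-", "ATGTACA", "ACGCATA"]
print(segregating_sites(alignment))  # [1, 3, 5]
```

The last column is excluded because one sequence has a gap there. Counting it as a SNP instead would be an equally valid choice, but the choice must be stated when reporting the number of SNPs.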
Different types of data
Patterns of diversity can be observed in populations, species or among species.
Phylogenetic trees fall into the between-species comparison class
Type of data: DNA sequences (SNPs), protein sequences, microsatellites
Microsatellites are short stretches of repeated DNA:
TATATATATATATA
What matters is the number of motif repeats
One can measure their size (length) by gel electrophoresis, but they contain less
information than SNP data.
Their mutation rate is also higher, due to slippage of the polymerase on the repeats.
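Since what matters for a microsatellite is the number of motif repeats, an allele can be summarised by the longest run of consecutive motif copies. This is a minimal sketch assuming the motif is already known. The function `count_repeats` and the flanking sequences are hypothetical.

```python
def count_repeats(sequence, motif):
    """Return the largest number of consecutive copies of `motif` in `sequence`."""
    sequence, motif = sequence.upper(), motif.upper()
    step = len(motif)
    best = 0
    for start in range(step):  # try every reading phase of the motif
        run = 0
        for i in range(start, len(sequence) - step + 1, step):
            if sequence[i:i + step] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0  # the run of tandem copies is broken
    return best


print(count_repeats("TATATATATATATA", "TA"))  # 7
# Two hypothetical alleles that differ only in repeat number
print(count_repeats("GGC" + "TA" * 7 + "CC", "TA"))  # 7
print(count_repeats("GGC" + "TA" * 9 + "CC", "TA"))  # 9
```

On a gel, these two alleles would differ in length by 4 bp, which is exactly the repeat-number information used in microsatellite studies.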
Questions to ask when looking at data
-Are the sequences already aligned?
-Are the data from one population or more than one? Or different species?
-Are the data from sexually or asexually reproducing organisms?
-Are the sequences from coding or non-coding, or both DNA?
-Are the data from one locus or several loci?
-Do we see all sites or only the variable ones (SNP, indels or both)?
-Do we see all sequences or only the different ones?
-Are the data from microsatellites or SNPs?
=> Go to Exercise 1.3, also for the Solanum data
Population genetics: 4 evolutionary forces
[Figure: four forces act on molecular diversity: random genomic processes (mutation, duplication, recombination, gene conversion), natural selection, a random spatial process (migration) and a random demographic process (drift).]
Population genetics investigates the laws governing the genetic structure of
populations, and changes in allele frequencies over time
Population genetics: 4 evolutionary forces
[Figure: the same four forces (random genomic processes, natural selection, migration, drift) shape both molecular diversity and phenotypic variability.]
We want to infer the role of the evolutionary forces from sequence data
(a very useful tool for this is coalescent theory)
Divergence and mutation rate
Molecular clock
when DNA is passed from one generation to the next, there is a constant
probability, called µ, that a mutation occurs,
because the polymerase is not error-free
we assume the rate is constant (though over long periods of time this may not
hold)
Probability of mutation of what?
at a site (per site mutation rate)
at an entire locus (locus mutation rate)
genome wide mutation rate
µ is the probability per generation of a mutation, and (1- µ) is the probability that
no mutation occurs
Molecular clock
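thus $P[\text{no mutation for } t \text{ generations}] = (1 - \mu)^{t}$
as long as $\mu \ll 1$ we can use the approximations (valid for small $x$):
$1 + x \approx e^{x}$ and $1 - x \approx e^{-x}$
result: $P[\text{no mutation for } t \text{ generations}] \approx e^{-\mu t}$
(What does the graph of this exponential look like?)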
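A quick numerical check shows why the approximation needs $\mu \ll 1$: the exact probability $(1-\mu)^t$ and $e^{-\mu t}$ are compared below for several mutation rates, with $t$ chosen (arbitrarily, for illustration) so that $\mu t$ stays comparable. The function names are hypothetical.

```python
import math


def p_no_mutation_exact(mu, t):
    """Exact probability of no mutation in t generations: (1 - mu)**t."""
    return (1.0 - mu) ** t


def p_no_mutation_approx(mu, t):
    """Exponential approximation exp(-mu * t), valid when mu << 1."""
    return math.exp(-mu * t)


# Illustrative (mu, t) pairs; mu * t is 0.01 or 1 so only the size of mu changes
for mu, t in [(1e-8, 1_000_000), (1e-3, 1_000), (0.1, 10), (0.5, 2)]:
    exact = p_no_mutation_exact(mu, t)
    approx = p_no_mutation_approx(mu, t)
    rel_error = abs(exact - approx) / exact
    print(f"mu={mu:g} t={t}: exact={exact:.6f} approx={approx:.6f} rel. error={rel_error:.2e}")
```

For realistic per-site or per-locus rates (µ far below 1e-3) the two values are practically identical. With µ = 0.5, the approximation clearly breaks down.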
Molecular clock
Maths 1: probability, expectation and variance; exponential distribution
Molecular clock
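T is the time to the next mutation
$P[\text{no mutation for } t \text{ generations}] = P[T \ge t] = e^{-\mu t}$, i.e. $T$ is exponentially distributed with rate $\mu$, which has
$E[T] = \frac{1}{\mu}$ and $\mathrm{Var}[T] = \frac{1}{\mu^{2}}$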
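The mean and variance of the waiting time can be checked by simulation. This sketch draws waiting times from the continuous exponential model $P[T \ge t] = e^{-\mu t}$ using Python's `random.expovariate`. The rate µ = 0.01 and the sample size are arbitrary illustrative values, not estimates for any organism.

```python
import random
import statistics


def simulate_waiting_times(mu, n, seed=1):
    """Draw n waiting times T (in generations) until the next mutation.

    Uses the continuous exponential model P[T >= t] = exp(-mu * t).
    """
    rng = random.Random(seed)
    return [rng.expovariate(mu) for _ in range(n)]


mu = 0.01  # hypothetical mutation rate per generation
times = simulate_waiting_times(mu, 200_000)
print("mean    ", statistics.fmean(times), "expected", 1 / mu)
print("variance", statistics.variance(times), "expected", 1 / mu**2)
```

Both sample values should land close to $1/\mu = 100$ and $1/\mu^2 = 10\,000$. Simulating generation by generation (a geometric waiting time) gives nearly the same result when µ is small.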
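Two sequences from populations that split t generations ago are separated by at least 2t generations (t along each branch)
Example: the split between the Drosophila populations, t = 100 000 generations (10 000 years)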
Molecular clock
D is the number of sites that differ between the two sequences, also called the
divergence
K mutations have occurred along the branches leading to the two descendant populations,
whose total length is 2t
Molecular clock
Maths 2: estimator
assuming K = D (every mutation hits a new site, so no difference is hidden by multiple hits)
thus $E[D] = E[K] = 2\mu t$
so we obtain a first estimator for µ:
$\hat{\mu} = \frac{D}{2t}$
So we can calculate the mutation rate if we know the divergence time and
the divergence measured from molecular data!
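The estimator $\hat{\mu} = D/(2t)$ is a one-line function. The sketch below also checks it under the model by simulating K on two branches of t generations each, assuming one mutation per generation per branch with probability µ and no multiple hits (so D = K). The function names and all numbers are hypothetical, not the Drosophila data of Exercise 1.4.

```python
import random


def estimate_mu(d, t):
    """Estimator mu_hat = D / (2t) from divergence D and split time t (generations)."""
    if t <= 0:
        raise ValueError("t must be positive")
    return d / (2 * t)


def simulate_divergence(mu, t, rng):
    """Simulate K: number of mutations on two branches of t generations each.

    Assumes each generation on each branch carries one mutation with
    probability mu and that every mutation hits a new site, so D = K.
    """
    return sum(1 for _ in range(2 * t) if rng.random() < mu)


# Hypothetical numbers: 20 differences after t = 100 000 generations
print(estimate_mu(20, 100_000))  # 0.0001

# Check that the estimator is unbiased under the model (small t keeps it fast)
rng = random.Random(42)
true_mu, t = 0.01, 1_000
estimates = [estimate_mu(simulate_divergence(true_mu, t, rng), t) for _ in range(500)]
print(sum(estimates) / len(estimates), "vs true", true_mu)
```

If D counts differences over a whole locus, $\hat{\mu}$ is a per-locus rate. Dividing D by the number of sites compared gives a per-site rate.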
Exercise 1.4: calculate the mutation rate in Drosophila