SNP and Variation
Ka-Lok Ng
Asia University, Taiwan
References
• http://www.mun.ca/biology/scarr/4241rm_chapter31.html
• http://www.bioinfo.rpi.edu/~bystrc/courses/biol4540/lecture21/lec21.pdf
Introduction
• Having sequenced a genome, the next step is to study the nature and
distribution of variation between individuals
• Variation at the DNA level = nucleotide insertions, deletions, and
Single Nucleotide Polymorphisms (SNPs), or small nucleotide
polymorphisms
• A SNP is any site where two or more different nucleotides are
segregating in a population.
• A cluster of linked SNPs = haplotype
• SNPs and haplotypes are an increasingly important component of
biological studies, ranging from ecology and evolution to
biomedicine (disease association studies)
• These variations are used to characterize population structure
and history, and in functional studies of genes.
• They are indispensable for recombination mapping
(linkage analysis) and serve as positional markers for physical
mapping
• SNPs are the most common type of genetic variation, occurring roughly
once every 100 to 300 bases.
The Nature of Single Nucleotide Polymorphisms
Classification of SNP’s
• Most common = a change from one base to another
• This can be either a transversion or a transition
• SNPs can also be insertions and deletions, also termed “indels”
• Some geneticists regard two-nucleotide changes and small
insertions/deletions of a few nucleotides as SNPs, so
“simple nucleotide polymorphism” may be a better description
• Microsatellites, longer sequence repeats, and other
molecular polymorphisms (transposable element insertions,
deletions, chromosome inversions and translocations, and
aneuploidy) are not regarded as SNPs
• Aneuploidy is an error in cell division that results in the "daughter"
cells having the wrong number of chromosomes. In some cases
a chromosome is missing, while in others there is an extra one.
Classification of SNP’s
• SNPs are classified by the nature of the affected nucleotide
• Noncoding SNP – 5’ or 3’ nontranscribed region (NTR), 5’ or 3’
untranslated region (UTR), intron, or intergenic
• Coding SNP – replacement polymorphisms (change the amino acid encoded) or
synonymous polymorphisms (change the codon but not the amino acid)
• Nonreplacement polymorphisms include both synonymous and noncoding
polymorphisms, but they can still affect gene function through effects on
transcriptional or translational regulation, splicing, or RNA stability.
• Such polymorphisms are an important source of functional genetic variation (Fig. 3.1).
3.1 (Part 1) Human promoter SNPs that affect gene expression
Fig. 3.1 – a collection of over 140 human promoter SNPs that have been associated with
an effect on gene expression or TF binding, and in many cases, a clinical outcome
Fig. 3.1. Human promoter SNPs that affect gene expression. These are loci,
dispersed throughout the human genome, for which a SNP has been implicated
in modulating transcript levels, either by statistical association or by a
biochemical assay in cell lines. The figure shows where some of these
nonreplacement polymorphisms lie and affect gene expression.
3.1 (Part 2) Human promoter SNPs that affect gene expression
SNPs can also be classified as transitions or transversions
• Transitions – change a purine to a purine (A ↔ G) or a
pyrimidine to a pyrimidine (C ↔ T)
• Transversions – change a purine to a pyrimidine and vice versa
(A or G ↔ C or T)
• Transitions occur more frequently than transversions,
despite there being twice as many possible transversions
• This holds broadly true for both coding and noncoding SNPs
• This is in part a result of differences in the mechanisms by
which certain types of mutations arise and are repaired
• Due to the nature of the genetic code, transitions are less
likely than transversions to change the encoded amino acid
• Transitions are therefore more likely to preserve the encoded
protein sequence
• Hence number of transitions/number of transversions (Ts/Tv) > 1 in coding regions
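A minimal Python sketch of the classification above, assuming each SNP is given as a (reference, alternate) pair of bases; the function names are hypothetical. Enumerating all 12 ordered single-base changes shows why the excess of transitions is notable: if every change were equally likely, Ts/Tv would be 0.5, not > 1.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= set("ACGT"):
        raise ValueError(f"not a single-base substitution: {ref}->{alt}")
    # Same chemical class (purine->purine or pyrimidine->pyrimidine) = transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps) -> float:
    """Transition/transversion ratio for an iterable of (ref, alt) pairs."""
    kinds = [classify_substitution(r, a) for r, a in snps]
    ts = kinds.count("transition")
    tv = kinds.count("transversion")
    return ts / tv if tv else float("inf")


# 4 of the 12 possible ordered changes are transitions, 8 are transversions.
all_changes = list(permutations("ACGT", 2))
print(ts_tv_ratio(all_changes))  # 0.5
```

Applied to a real SNP set, a ratio well above 0.5 (and above 1 in coding regions) reflects the mutational and coding biases described above rather than the raw number of possible changes.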
Synonymous
– TGT → TGC results in Cys → Cys
Nonsynonymous: replacement
– TGT → TGG results in Cys → Trp
– can be conservative or nonconservative
– Nonsynonymous: nonsense mutation,
introduction of a stop codon
– TGT → TGA results in Cys → stop
– Nonsynonymous: read-through
mutation
– TAA → TTA results in stop → Leu
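The codon examples above can be checked mechanically against the standard genetic code. This is a minimal sketch, assuming the standard nuclear code (built from the conventional TCAG codon ordering); `classify_codon_change` is a hypothetical helper, and it treats any stop-to-stop change as synonymous.

```python
from itertools import product

# Standard genetic code: codons in TCAG order, one amino-acid letter each ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

THREE_LETTER = {"A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
                "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
                "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
                "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
                "*": "stop"}


def classify_codon_change(ref_codon: str, alt_codon: str):
    """Classify a codon change as synonymous, replacement, nonsense or read-through."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"      # a new stop codon
    elif ref_aa == "*":
        kind = "read-through"  # a stop codon is lost
    else:
        kind = "replacement"
    return kind, THREE_LETTER[ref_aa], THREE_LETTER[alt_aa]


for ref, alt in [("TGT", "TGC"), ("TGT", "TGG"), ("TGT", "TGA"), ("TAA", "TTA")]:
    print(ref, "->", alt, classify_codon_change(ref, alt))
# TGT -> TGC ('synonymous', 'Cys', 'Cys')
# TGT -> TGG ('replacement', 'Cys', 'Trp')
# TGT -> TGA ('nonsense', 'Cys', 'stop')
# TAA -> TTA ('read-through', 'stop', 'Leu')
```

The same helper classifies the sickle-cell codon change GAG → GTG as a replacement, Glu → Val.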
SNP and disease
• Sickle-cell anaemia – a disease caused by a specific SNP: an A→T mutation (GTGAG → GTGTG)
in the β-globin gene changes Glu → Val, creating a sticky surface on the haemoglobin molecule
that leads to polymerization of the deoxy form
SNP and blood groups – A, B and O alleles
• A and B alleles differ by four SNP substitutions
• They code for related enzymes that add different saccharide units (sugars, general
formula (CH2O)n) to an antigen on the surface of red blood cells (RBCs)
Allele   Sequence              Saccharide
A        ….gctggtgacccctt      N-acetylgalactosamine
B        ….gctcgtcaccgcta      galactose
O        ….cgtggt-acccctt      --
The O allele has undergone a mutation (the single-base deletion shown above) that causes
a frameshift, so it produces no functional enzyme. The RBCs of type O individuals
carry neither the A nor the B antigen, which is why people with
type O blood are universal donors in blood transfusions.
The loss of activity of the protein does not seem to carry
any adverse consequences.
The ABO antigens are terminal sugars found at the end of long
sugar chains (oligosaccharides) that are attached to lipids on
the red cell membrane. The A and B antigens are the last
sugar added to the chain. Type "O" is defined by the lack of A or B
antigens, but O red cells carry the largest amount of the next-to-last
terminal sugar, called the H antigen.
http://matcmadison.edu/is/hhps/mlt/mljensen/BloodBank/lectures/abo_blood_group_system.htm
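The allele comparison above can be reproduced by walking the aligned fragments column by column. A minimal sketch, assuming the short excerpts in the table are already aligned (with '-' marking the deleted base); `compare_alleles` is a hypothetical helper and the excerpts are not the full gene. In these excerpts, A and B differ at four positions.

```python
def compare_alleles(seq1: str, seq2: str):
    """Count substitutions and gap (indel) columns between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = gaps = 0
    for a, b in zip(seq1.lower(), seq2.lower()):
        if a == "-" or b == "-":
            if a != b:
                gaps += 1           # one sequence has a base, the other a deletion
        elif a != b:
            substitutions += 1      # both have a base, but different ones
    return substitutions, gaps


# Excerpts from the ABO table (aligned; '-' = deleted base in the O allele).
ABO = {"A": "gctggtgacccctt", "B": "gctcgtcaccgcta", "O": "cgtggt-acccctt"}
print(compare_alleles(ABO["A"], ABO["B"]))  # (4, 0)

# A one-base deletion shifts every downstream codon (a frameshift),
# because the reading frame no longer lines up in steps of 3.
print(len(ABO["A"]) - len(ABO["O"].replace("-", "")))  # 1
```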
• In classical population genetic theory, a locus is regarded as
polymorphic only if the frequency of the most common allele is < 95%,
i.e. the other allele(s) together exceed 5%
• Most SNPs are first detected in a sample of fewer
than 10 individuals, so the frequency criterion is not
applied; all single nucleotide changes are initially described
as candidate SNPs.
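The classical 95% criterion is simple to state in code. A minimal sketch, assuming allele counts per locus are available as a dictionary; `is_polymorphic` and its `threshold` parameter are hypothetical names. The last example illustrates why the criterion is not applied to discovery samples: with so few chromosomes the estimate is very uncertain.

```python
def is_polymorphic(allele_counts: dict, threshold: float = 0.95) -> bool:
    """Classical criterion: polymorphic if the most common allele is below threshold."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles counted")
    most_common = max(allele_counts.values()) / total
    return most_common < threshold


print(is_polymorphic({"G": 90, "A": 10}))  # True  (most common allele at 90%)
print(is_polymorphic({"G": 97, "A": 3}))   # False (most common allele at 97%)
print(is_polymorphic({"G": 9, "A": 1}))    # True, but 10 chromosomes say little
```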
• NCBI – dbSNP http://www.ncbi.nlm.nih.gov/SNP/index.html
• Seattle SNP http://pga.mbt.washington.edu
• From Fig. 3.1, take ‘FY’ on chromosome 1 and run an NCBI search
• NCBI → SNP → keyword “FY AND homo” → refSNP ID: rs17851571
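The same keyword search can also be issued programmatically through NCBI's E-utilities. This minimal sketch only builds the `esearch` URL for the dbSNP database (`db=snp`) with the keyword above; it does not contact NCBI. Any result fetched today reflects the current dbSNP build and may differ from the slide. The helper name and the `retmax` value are illustrative.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def dbsnp_search_url(term: str, retmax: int = 20) -> str:
    """Build an E-utilities esearch URL against the dbSNP database ('snp')."""
    params = {"db": "snp", "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = dbsnp_search_url("FY AND homo")
print(url)
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&term=FY+AND+homo&retmax=20
```

Passing the URL to `urllib.request.urlopen` would return the XML list of matching record IDs.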
Comment - polymorphisms ≠ mutations
Confusion arises over the distinction between
polymorphisms and mutations, largely due to dual usage
of the term “mutation”.
All SNPs arise as mutations, in the sense that the
conversion of one nucleotide into another is a mutational
event. But by the time a seq. variant is observed in a
population, the event that created it is usually long past,
so the observed SNP is no longer a mutation – it is just a
rare seq. variant or a polymorphism.
Since this sense of “mutation” applies to only a small fraction of all
SNPs, polymorphism is the more general term.
Distribution of SNPs
• The distribution of SNPs lies within the domain of population
genetics
• The study of the relationship between SNPs and phenotypic variation
lies in the domain of quantitative genetics
• Application of SNPs → mapping quantitative trait loci (QTL), which
are loci that contribute to polygenic phenotypic variation
Neutral theory of molecular evolution
• Polymorphism reflects a balance between mutation and genetic drift
• The rate at which mutations are introduced into a population = the rate at which
polymorphisms are lost
• Most mutations, whether deleterious, advantageous or neutral
in effect, are lost within a few generations
• Selection acts to reduce the frequency of
slightly deleterious alleles, but on occasion favors a
new allele (positive selection) or maintains two or more
alleles (balancing selection) at some loci
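The claim that most new mutations are lost within a few generations can be illustrated with a Wright–Fisher simulation. This is a minimal sketch, assuming a constant population of N = 100 diploids, random mating and pure neutral drift (no selection or further mutation); all names and parameter values are illustrative. Under these assumptions a new neutral mutation eventually fixes with probability 1/(2N).

```python
import random


def generation_of_loss(pop_size: int, max_gens: int, rng: random.Random):
    """Follow one new neutral mutant copy in a Wright-Fisher population of
    pop_size diploids (2N gene copies). Return the generation in which it is
    lost, or None if it is still present (or fixed) after max_gens."""
    copies = 2 * pop_size
    count = 1  # a single new mutant copy
    for gen in range(1, max_gens + 1):
        p = count / copies
        # Binomial sampling of the next generation's 2N gene copies (pure drift).
        count = sum(rng.random() < p for _ in range(copies))
        if count == 0:
            return gen
        if count == copies:
            return None  # fixed
    return None


rng = random.Random(42)
trials = 1000
lost = sum(generation_of_loss(100, 10, rng) is not None for _ in range(trials))
print(f"{lost / trials:.2f} of new mutations lost within 10 generations")
```

With these settings, roughly four out of five new mutations disappear within ten generations. Selection would shift these numbers for deleterious or advantageous alleles, but drift alone already removes most new variants early.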
Three key concepts are important in characterizing SNP variation
• Allele frequency distribution
• Linkage disequilibrium
• Population stratification
Aspects of frequency distribution
• Population structure – for example, a SNP can be more frequent in one population than
another. Migration is a potent source of diversity, so isolation affects the rate at
which variation is lost due to drift.
• Nucleotide diversity – the average fraction of nucleotides that differ between a pair of
alleles chosen at random from a population
• Humans (Homo sapiens) – low nucleotide diversity, with an average of one SNP per kb
between any two chromosomes
• Fly and maize – an order of magnitude more polymorphism, with one SNP every 50–100 bp
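Nucleotide diversity (π) follows directly from its definition: average the proportion of differing sites over every pair of sequences in the sample. A minimal sketch, assuming the sequences are already aligned and gap-free; `nucleotide_diversity` is a hypothetical name. On this scale, one SNP per kb between two human chromosomes corresponds to π ≈ 0.001, and one SNP every 50–100 bp to π ≈ 0.01–0.02.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average proportion of differing sites over all pairs of aligned sequences (pi)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Total number of mismatching sites, summed over all pairs.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


sample = ["AAAA", "AAAT", "AATT"]
print(round(nucleotide_diversity(sample), 4))  # 0.3333
```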
Linkage Disequilibrium and Haplotype Maps
• Linkage disequilibrium (LD) – non-random association of alleles at different loci
• LD allows mapping of disease loci in large populations
• In humans, LD is commonly observed over several tens of kb, and in many cases ~100 kb,
on either side of a SNP
• Because of LD, haplotypes display a clustered (block-like) distribution
• Broad approximation:
– Genome = tens of thousands of blocks
– Each block = up to 100,000 bases, with 3 ~ 5 common haplotypes
– Each haplotype = tens or hundreds of SNPs in LD
• International HapMap Project – effort to map all common haplotypes in the human genome
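LD between two biallelic loci is usually measured by D, the difference between the observed haplotype frequency and the one expected under independence, and by Lewontin's normalized D′ = D/D_max. A minimal sketch, assuming haplotype and allele frequencies are already known; the function name is hypothetical, and D′ is returned with its sign (|D′| is often reported).

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D, D') for allele a at locus 1 and allele b at locus 2.

    p_ab: frequency of the haplotype carrying both a and b
    p_a, p_b: allele frequencies of a and b
    """
    d = p_ab - p_a * p_b  # departure from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


print(linkage_disequilibrium(0.5, 0.5, 0.5))   # (0.25, 1.0)  complete LD
print(linkage_disequilibrium(0.25, 0.5, 0.5))  # (0.0, 0.0)   independent loci
```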
Population Stratification – the partitioning of genetic variation among populations within
a species
3.2 (Part 1) Nucleotide diversity in natural populations
Fig. 3.2 Nucleotide diversity in natural populations. (A) Observed and expected
SNP frequency distributions for 874 SNPs from 75 candidate human hypertension loci.
Rare alleles are the most frequent, and the number of SNPs in each frequency
class declines as the rarer allele becomes more common.
In a sample of several hundred alleles, the most common class of SNPs are
singletons (which appear only once in the sample), followed by doubletons, tripletons,
and so on. Only between 1/3 and ½ of all SNPs are “common” in the sense that the
rarer allele is present in more than 5% of the individuals.
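The pattern in panel (A), with singletons most common followed by doubletons and tripletons, matches what the standard neutral model predicts. There, the expected number of SNPs whose derived allele appears i times in a sample of n sequences is proportional to 1/i. A minimal sketch, assuming that model (constant population size, no selection) and the unfolded spectrum of derived-allele counts; the function name is hypothetical, and the "expected" curve in Fig. 3.2 may have been computed differently.

```python
def expected_neutral_sfs(n: int) -> dict:
    """Expected proportion of segregating sites whose derived allele appears
    i times (i = 1..n-1) in a sample of n sequences, under the standard
    neutral model, where the expectation is proportional to 1/i."""
    if n < 2:
        raise ValueError("need a sample of at least two sequences")
    weights = [1 / i for i in range(1, n)]
    total = sum(weights)
    return {i: w / total for i, w in zip(range(1, n), weights)}


sfs = expected_neutral_sfs(200)
for i in (1, 2, 3):
    print(i, round(sfs[i], 3))  # singletons > doubletons > tripletons
```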
3.2 (Part 2) Nucleotide diversity in natural populations
(B) LD (D’) decays with time (number of generations) in proportion to the recombination rate r.
(C) The level of nucleotide diversity is a function of recombination rate, and hence chromosomal
position, as in this example for the fruit fly.
(B) As the number of generations ↑, SNPs increasingly segregate independently (no more clustering)
→ LD ↓
(C) As r ↑, nucleotide diversity ↑
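Panel (B) follows from the classical result that, under random mating with recombination fraction r between two loci, LD shrinks by a factor (1 − r) each generation: D_t = D_0 (1 − r)^t. A minimal sketch of that formula, assuming an infinite, randomly mating population with no new mutation; the function name is hypothetical.

```python
def ld_after(d0: float, r: float, generations: int) -> float:
    """Expected LD after t generations of random mating: D_t = D_0 * (1 - r)**t."""
    if not 0 <= r <= 0.5:
        raise ValueError("recombination fraction r must lie in [0, 0.5]")
    return d0 * (1 - r) ** generations


# Tightly linked SNPs (small r) keep their LD far longer than loosely linked ones.
for r in (0.001, 0.01, 0.1):
    print(r, round(ld_after(0.25, r, 100), 4))
```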
NCBI – dbSNP http://www.ncbi.nlm.nih.gov/SNP/index.html
dbSNP accepts submissions for SNP, microsatellite repeats, and small-scale
deletion and insertion polymorphisms
dbSNP summary for various species
dbSNP
Submitted data
1. The submitter HANDLE is a short tag that uniquely identifies each submitting
laboratory in the database
2. Each submitted SNP record receives a unique ssSNP identifier, such as ss4923558
(HANDLE = YUSUKE)
3. Keyword: ss4923558 AND homo
Searching for ss4923558 returns multiple records: more than 11 rsSNP records
More than one submitter → more than one ssSNP → these ssSNPs are clustered
into a reference SNP identifier (rsSNP)
dbSNP
Alleles: A/G
Ancestral Allele: G
Handle: YUSUKE, EGP_SNPS, PERLEGEN, ABI
Fasta seq.:
>gnl|dbSNP|rs3737559|allelePos=301|totalLen=601|taxid=9606|snpclass=1|alleles='A/G'|mol=Genomic|build=126
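The FASTA header above packs the record's metadata into '|'-separated key=value fields. This minimal parser assumes the layout seen in this one example (gnl | dbSNP | rs ID | key=value …); it is a sketch, not a complete description of dbSNP's format, and the function name is hypothetical. It also makes explicit that allelePos=301 within totalLen=601 means 300 bp of flanking sequence on each side of the SNP.

```python
def parse_dbsnp_header(header: str) -> dict:
    """Split a dbSNP FASTA defline into its rs ID and key=value fields."""
    fields = header.lstrip(">").split("|")
    # Layout assumed from the example: gnl | dbSNP | rsID | key=value ...
    record = {"db": fields[1], "rs_id": fields[2]}
    for field in fields[3:]:
        key, _, value = field.partition("=")
        record[key] = value.strip("'")  # alleles='A/G' -> A/G
    return record


header = (">gnl|dbSNP|rs3737559|allelePos=301|totalLen=601|taxid=9606"
          "|snpclass=1|alleles='A/G'|mol=Genomic|build=126")
info = parse_dbsnp_header(header)
print(info["rs_id"], info["alleles"], info["allelePos"])  # rs3737559 A/G 301

left_flank = int(info["allelePos"]) - 1
right_flank = int(info["totalLen"]) - int(info["allelePos"])
print(left_flank, right_flank)  # 300 300
```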
• Gene View of SNP
• Go to the bottom of the page:
– JBIC – sample size 1270, allele frequencies of A and G
– Other populations have smaller sample sizes
• Click NCBI Assay ID → ss4923558 →
– Japanese Millennium Genome Project
– Measured in a group of East Asian DNA samples
– There is no individual genotype data for ss4923558
• Click Handle|Submitter ID YUSUKE|IMS-JST082810 →
Allele frequency
G : 0.8929
A : 0.1071
Sample Size : 1270 (number of chromosomes)
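The reported frequencies are simply allele counts divided by the number of chromosomes sampled. In this minimal sketch, the counts are back-calculated from the reported frequencies and n = 1270 (an assumption, since dbSNP shows only the frequencies here), and the helper name is hypothetical. Because each diploid individual contributes two chromosomes, the sample corresponds to 635 people.

```python
def allele_frequencies(counts: dict):
    """Convert allele counts (number of chromosomes) into frequencies."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}, total


# Hypothetical counts, back-calculated from the reported frequencies and n = 1270.
freqs, n_chrom = allele_frequencies({"G": 1134, "A": 136})
print({a: round(f, 4) for a, f in freqs.items()})  # {'G': 0.8929, 'A': 0.1071}
print("diploid individuals:", n_chrom // 2)          # diploid individuals: 635
```

With a minor allele frequency of about 10.7%, this site also meets the classical 95% polymorphism criterion.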
Entrez SNP search terms
• http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Snp
SNP integration in Genome Browsers
Ensembl http://www.ensembl.org/index.html
rs3737559
Alleles
BioMart
• SNP rs3737559 is located in the following transcripts
• Genotype and Allele frequencies per population
The local DNA seq. within 100 kb on either side of the SNP is shown.
The different types of SNPs are color coded as to type (e.g. coding, intronic,
flanking or other). Deletion and insertion polymorphisms are indicated with a
triangle. The letters (K, M, R, S, W, Y) inside the SNP squares indicate the type
of SNP using IUPAC ambiguity codes.
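The single-letter codes in the SNP squares are the standard IUPAC two-base ambiguity codes. This minimal lookup assumes a biallelic substitution SNP written as 'X/Y' (indels have no such code), and the function name is hypothetical. Note that R (A/G) and Y (C/T) are exactly the transitions, while K, M, S and W are transversions.

```python
IUPAC_TWO_BASE = {
    frozenset("AG"): "R", frozenset("CT"): "Y",  # transitions
    frozenset("CG"): "S", frozenset("AT"): "W",  # transversions
    frozenset("GT"): "K", frozenset("AC"): "M",  # transversions
}


def iupac_code(alleles: str) -> str:
    """Map a biallelic SNP such as 'A/G' to its IUPAC ambiguity code."""
    bases = frozenset(alleles.upper().split("/"))
    if bases not in IUPAC_TWO_BASE:
        raise ValueError(f"expected two distinct bases from ACGT, got {alleles!r}")
    return IUPAC_TWO_BASE[bases]


print(iupac_code("A/G"))  # R  (the alleles of rs3737559)
```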
UCSC Genome browser http://genome.ucsc.edu/cgi-bin/hgGateway
BRCA1 gene
SNP
NCBI Entrez Gene
Gene: BRCA1
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=search&term=brca1
SNP GeneView
The coding SNPs in the BRCA1 gene. Those that do not change the aa
are colored in green, those that result in a different aa are colored in red.
SNP association studies
Association studies
• A case group of people vs. a control group of people
• The case group – people who are diagnosed with some disease (e.g.
cystic fibrosis), react to some type of medicine, or are
even exceptionally healthy (e.g. more than 100 years old)
• The control group – people who do not exhibit the
feature selected for the case group.
• For case-control studies, a selection of SNPs is
genotyped in both the case and control groups
• Alleles significantly more frequent in the case group than in the control group → potential
markers for the observed phenotype
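"More frequent in cases than controls" is usually judged with a 2×2 allelic test. This minimal sketch assumes allele counts (not genotypes) for one SNP and uses Pearson's chi-square without continuity correction. The p-value comes from the chi-square distribution with 1 degree of freedom, which equals erfc(√(χ²/2)). The counts and the function name are hypothetical.

```python
import math


def allelic_association(case_counts, control_counts):
    """2x2 allelic test. Each argument is (risk_allele_count, other_allele_count).
    Returns (chi-square statistic, p-value with 1 df, odds ratio)."""
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    table = [[a, b], [c, d]]
    row = [a + b, c + d]
    col = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n  # count expected if allele and status are independent
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical counts: 60/100 risk alleles in cases vs 40/100 in controls.
chi2, p, odds = allelic_association((60, 40), (40, 60))
print(round(chi2, 2), round(p, 4), odds)  # 8.0 0.0047 2.25
```

In a real study testing many SNPs, a single-SNP p-value like this would still need correction for multiple testing before the allele is treated as a marker.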
SNP and disease
• Functional variation – a SNP may be
associated with a nonsynonymous
substitution in a coding region