Genomics
An introduction
Aims of genomics I

Establishing integrated databases that are
far more than mere storage

Linking genomic and expressed
gene sequences (cDNA)
Aims of genomics II

Describing every gene:
• function/expression
data/relationships/phenotype
• 3-d structure and features (introns/exons,
domains, repeats)
• similarities to other genes

Characterize sequence diversity in
population
Genomics can be:

Structural
– where is it?

Functional
– what does it do?
– DNA microarrays

Comparative
– finding important fragments
Mapping genomes

Past
– Genetic maps
Distance between simple markers
expressed in units of recombination
– Cytological maps
Stained chromosomes, observable
under microscope

Present
– Physical maps
Distance between nucleotides
expressed in bases
– Comparative maps
Detection of corresponding genes;
detection of regulatory sequences
Genome sizes

Organism                    DNA length               Genes
Mycoplasma genitalium       0.5 Mb                   470
Deinococcus radiodurans     3 Mb in 4-10 copies!     3 200
Escherichia coli            4.5 Mb                   4 400
Saccharomyces cerevisiae    12 Mb                    6 200
Caenorhabditis elegans      97 Mb                    22 000
Drosophila melanogaster     120 Mb                   18 000
Homo sapiens                3200 Mb                  32 000
Genetic differences among humans

Goals
– Genetic diseases
– Identifying criminals

Methods
– Genetic markers (fingerprints) and DNA sequence.
Repeats:
• Microsatellites (repeats of 1-12 nucleotides)
• Minisatellites (> 12)
– Other types of variation
• Genome rearrangements
• Single nucleotide mutations
Microsatellites and disease

Huntington’s disease
– Huntingtin gene of unknown (!) function
– Number of repeats: 6-35: normal; 36-120: disease

Friedreich’s ataxia
– GAA repeat in non-coding (intron) region
– Number of repeats: 7-34: normal; 35 and up: disease
– Repeat expansion reduces expression of the frataxin gene
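
The repeat thresholds above can be turned into a small check. The following is a minimal sketch, assuming the allele is available as a plain DNA string; the function names and the toy sequence are hypothetical, and only the 7-34 / 35-and-up thresholds come from the slide.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


def classify_gaa_repeats(count: int) -> str:
    """Apply the Friedreich's ataxia thresholds quoted on the slide."""
    if 7 <= count <= 34:
        return "normal"
    if count >= 35:
        return "disease"
    return "below the range listed on the slide"


# Toy allele (invented for illustration): 40 GAA copies with short flanks.
toy_allele = "TTC" + "GAA" * 40 + "CCT"
run = longest_repeat_run(toy_allele, "GAA")
print(run, classify_gaa_repeats(run))
```

Real genotyping has to cope with interrupted repeats and sequencing errors, which this sketch ignores.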
SNP - Single Nucleotide Polymorphism

Definition
– SNP and phenotype

Occurrence in genome
– Rarity of most SNPs (agrees with
neutral molecular evolutionary theory)
– SNPs in human population:
• Inter-genic regions: every 1400 bp
• Coding regions: every 1430 bp
• High variance in genome!

Detection of SNPs: Hybridization
Sickle cell anemia
[Figure: sickle-shaped red blood cell]
SNP in the beta-globin gene; the disease is
recessive:
• 2 faulty copies: red blood cells
change shape under stress, causing anemia
• 1 faulty copy: red blood cells
change shape under heavy stress,
but this gives resistance to the malaria
parasite
SNPs and haplotypes
Passengers and their evolutionary
vehicles
SNP - Phase inference

In the data from sequencing the genome, the chromosome of origin of
each SNP allele is scrambled:
...CT[G/T]AC[G/A]GT...

Possibility 1
...CTGACGGT...   (one chromosome)
...CTTACAGT...   (the other chromosome)

Possibility 2
...CTGACAGT...   (one chromosome)
...CTTACGGT...   (the other chromosome)

Which SNPs are on the same chromosome (are in phase)?
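
To make the ambiguity concrete, the sketch below enumerates every pair of haplotypes compatible with an unphased genotype. It assumes each SNP site is given as an unordered pair of alleles; the function name is hypothetical.

```python
from itertools import product


def possible_phasings(genotypes):
    """List every distinct haplotype pair consistent with unphased genotypes.

    `genotypes` holds one (allele, allele) tuple per SNP site.
    """
    phasings = set()
    for choice in product((0, 1), repeat=len(genotypes)):
        # `choice` decides, site by site, which allele goes on the first chromosome.
        hap1 = "".join(site[c] for site, c in zip(genotypes, choice))
        hap2 = "".join(site[1 - c] for site, c in zip(genotypes, choice))
        # Sorting the pair makes (hap1, hap2) and (hap2, hap1) count once.
        phasings.add(tuple(sorted((hap1, hap2))))
    return sorted(phasings)


# The two SNP sites from the slide: G/T at the first, G/A at the second.
print(possible_phasings([("G", "T"), ("G", "A")]))
```

For the slide's genotype this yields the two possibilities shown above (GG with TA, or GA with TG). With n heterozygous sites the number of candidate phasings grows as 2^(n-1), which is why phase cannot simply be guessed.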
SNP – phase inference
Determining the parent of origin for each SNP
Parent 1: ...CT[G/A]AC[C/G]GT...
Parent 2: ...CT[C/T]AC[A/A]GT...
Child:    ...CT[G/T]AC[G/A]GT...
In this case the child’s haplotypes are:
GG (from parent 1)
TA (from parent 2)
Phase inference is the reason why, in many studies, SNP sequencing is done for a child
and both parents.
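
The trio reasoning can be sketched in a few lines. This is a minimal illustration, assuming unordered allele pairs per site for the child and both parents and ignoring genotyping errors and new mutations; the function name is hypothetical, and the example genotypes are the ones read off the slide.

```python
def phase_child_with_parents(child, parent1, parent2):
    """Split a child's genotype into the haplotype inherited from each parent.

    Each argument holds one (allele, allele) tuple per SNP site.
    Sites that cannot be resolved are marked with '?'.
    """
    from_parent1, from_parent2 = [], []
    for (a, b), p1, p2 in zip(child, parent1, parent2):
        options = set()
        if a in p1 and b in p2:
            options.add((a, b))
        if b in p1 and a in p2:
            options.add((b, a))
        if len(options) == 1:
            x, y = options.pop()
        else:
            # Either both assignments are possible or none is consistent.
            x, y = "?", "?"
        from_parent1.append(x)
        from_parent2.append(y)
    return "".join(from_parent1), "".join(from_parent2)


parent1 = [("G", "A"), ("C", "G")]
parent2 = [("C", "T"), ("A", "A")]
child = [("G", "T"), ("G", "A")]
print(phase_child_with_parents(child, parent1, parent2))
```

A site stays unresolved when the child and both parents are heterozygous for the same two alleles, which is the main case where trio data alone does not determine phase.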
Linkage Disequilibrium, intro
How hard is it to break a chromosome?

Alleles (traits, SNP variants) A and a occupy the same position in the genome
(locus), so a single chromosome carries
either of them, but not both
– f_A: frequency of allele A in the population
– f_a = 1 - f_A
– f_B and f_b = 1 - f_B: frequencies of alleles B and b at a second locus

Frequencies with which both alleles occur on the same
chromosome:

Chromosome carries    Frequency
A and B               f_AB
A and b               f_Ab
a and B               f_aB
a and b               f_ab

LD and genomic recombination
Linkage Disequilibrium, calculation

When these alleles are not correlated, we expect them to occur
together by chance alone:
f_AB = f_A f_B
f_Ab = f_A f_b
f_aB = f_a f_B
f_ab = f_a f_b
But if A and B occur together more often (disequilibrium
state), we can write
f_AB = f_A f_B + D
f_Ab = f_A f_b - D
f_aB = f_a f_B - D
f_ab = f_a f_b + D
where D is called the measure of disequilibrium
From the definitions above, D = f_AB - f_A f_B
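
As a worked example, D can be computed directly from a sample of phased chromosomes. The sketch below assumes each chromosome is encoded as a two-letter string ("AB", "Ab", "aB" or "ab"); the function name and the sample counts are invented for illustration.

```python
from collections import Counter


def disequilibrium(haplotypes):
    """Return D = f_AB - f_A * f_B for a sample of two-locus haplotypes."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    f_AB = counts["AB"] / total
    f_A = (counts["AB"] + counts["Ab"]) / total  # A with either allele at the second locus
    f_B = (counts["AB"] + counts["aB"]) / total  # B with either allele at the first locus
    return f_AB - f_A * f_B


# Toy sample of 100 chromosomes in which A tends to travel with B.
sample = ["AB"] * 40 + ["Ab"] * 10 + ["aB"] * 10 + ["ab"] * 40
print(round(disequilibrium(sample), 3))
```

Here f_A = f_B = 0.5 and f_AB = 0.4, so D is about 0.15; the same D satisfies the other three equations, for example f_Ab = 0.5 * 0.5 - 0.15 = 0.10. A sample with 25 chromosomes of each type gives D = 0.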
How can we use it?
Phase inference tells us how SNPs are
organized on a chromosome
Linkage disequilibrium measures the
correlation between SNPs

Back to SNPs
Daly et al (2001), Figure 1
Haplotypes - vehicles for SNPs


Daly et al (2001) were able to infer offspring haplotypes largely
from parents. They say that “it became evident that the region
could be largely decomposed into discrete haplotype blocks,
each with a striking lack of diversity”
The haplotype blocks:
– Up to 100kb
– 5 or more SNPs
For example, this block shows just two distinct haplotypes
accounting for 95% of the observed chromosomes
Haplotypes on the genome fragment
[Figure with three panels]
a) Observed haplotypes, with dotted lines wherever the probability of switching to another line is > 2%
b) Percent of explanation by haplotypes
c) Contribution of specific haplotypes
Another genetic test
Do haplotypes exist?
– Each row represents an SNP
– Blue dot = major allele, yellow = minor allele
– Each column represents a single chromosome
– The 147 SNPs are divided into 18 blocks defined by black lines.
– The expanded box on the right is an SNP block of 26 SNPs over 19kb of genomic DNA. The 4 most common of 7 different haplotypes include 80% of the chromosomes, and can be distinguished with 2 SNPs
How many SNPs can we ignore
…and still predict haplotypes with high accuracy?
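
One way to explore this question is to search for the smallest set of SNPs that still tells the common haplotypes apart. The sketch below is an exhaustive search chosen for clarity; it is an assumption for illustration, not the method used in the cited papers, and the toy haplotypes are invented (0 = major allele, 1 = minor allele).

```python
from itertools import combinations


def minimal_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that distinguishes all haplotypes."""
    n_sites = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    for size in range(1, n_sites + 1):
        for positions in combinations(range(n_sites), size):
            # Project every haplotype onto the chosen positions only.
            signatures = {tuple(h[p] for p in positions) for h in haplotypes}
            if len(signatures) == n_distinct:
                return positions
    return tuple(range(n_sites))


# Four toy haplotypes over six SNPs.
common_haplotypes = ["000000", "110010", "001101", "111111"]
print(minimal_tag_snps(common_haplotypes))
```

For these four haplotypes two SNPs are enough, and the other four can be ignored. The search tries every subset, so it is only practical for small blocks; larger data sets would need a heuristic.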
Literature
Gibson, Muse. “A Primer of Genome Science”.
N Patil et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 2001: 1719-1723.
M J Daly et al. High-resolution haplotype structure in the human genome. Nat. Genet. 29, 2001: 229-232.
