Major insights from the HGP on
1) Gene content
2) Proteome content
3) SNP identification
4) Distribution of GC content
5) CpG islands
6) Recombination rates
7) Repeat content
Nature (2001) 15th Feb Vol 409 special issue; pgs 814 & 875-914.
1) Gene content
30,000-40,000 protein-coding genes estimated
based on known genes and predictions
          definite genes   possible genes
IHGSC     24,500           5,000
Celera    26,383           12,000
Genes encode either
protein or
noncoding RNAs
rRNA, tRNA, snRNA, snoRNA
Nature (2001) 15th Feb Vol 409 special issue; pg 814-816 and 860-914.
Gene content….
More genes: Twice as many as Drosophila /
C. elegans
Uneven gene distribution: Gene-rich and gene-poor regions
More paralogs: some gene families have
extended the number of paralogs e.g.
olfactory gene family has 1000 genes
More alternative transcripts: Increased RNA
splice variants produced, thereby expanding
the number of primary protein products about 5-fold (e.g.
neurexin genes)
Nature (2001) 409: pp 892
Gene content
Uneven gene distribution
Gene-rich
E.g. MHC on chromosome 6 has 60 genes
with a GC content of 54%
Gene-poor regions
82 gene deserts identified
? Large or unidentified genes
What is the functional significance of these
variations?
Genetics by Hartwell: pp 341-347
2) Proteome content
proteome more complex than invertebrates
Protein Domains (sections with identifiable
shape/function)
Domain arrangements in humans
largest total number of domains is 130
largest number of domain types per protein is 9
Mostly identical arrangement of domains
[Figure: Protein X, drawn as a linear arrangement of repeated domains A, B and C]
Nature (2001) 15th Feb Vol 409 special issue; pg 847
2) Proteome content….
proteome more complex than invertebrates……
no huge difference in domain number in humans
BUT, frequency of domain sharing very high in human
proteins (structural proteins and proteins involved in
signal transduction and immune function)
However, only 3 cases where a combination of 3 domain
types shared by human & yeast proteins.
e.g. carbamoyl-phosphate synthase (involved in the first 3
steps of de novo pyrimidine biosynthesis) has 7 domain
types, an arrangement which occurs once in human and yeast but twice in
Drosophila
Nature (2001) 15th Feb Vol 409 special issue; pg 847
3) SNPs (single nucleotide polymorphisms)
More than 1.4 million SNPs
identified
One every 1.9 kb on average
Densities vary over regions and
chromosomes
e.g. HLA region has a high SNP density,
reflecting maintenance of diverse
haplotypes over many millions of
years
Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928
How does one distinguish sequence errors
from polymorphisms?
sequence errors
Each piece of genome sequenced at least 10 times
to reduce error rate (0.01%)
Polymorphisms
Sequence variation between individuals is 0.1%
To be defined as a polymorphism, the altered
sequence must be present in a significant
fraction of the population (conventionally at least 1%)
Rate of polymorphism in diploid human genome is about 1 in
500 bp
Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928
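The two rates quoted on this slide can be compared with simple arithmetic. The sketch below is a minimal illustration, assuming a hypothetical 1 Mb window; the function name `expected_sites` and the window size are not from the lecture. It uses exact fractions so the percentages are not subject to floating-point rounding.

```python
from fractions import Fraction

# Rates quoted on the slide, stored as exact fractions.
ERROR_RATE = Fraction(1, 10_000)     # 0.01% residual sequencing error
VARIATION_RATE = Fraction(1, 1_000)  # 0.1% variation between individuals


def expected_sites(window_bp, rate):
    """Expected number of affected bases in a window of `window_bp` bases."""
    return window_bp * rate


window = 1_000_000  # hypothetical 1 Mb window (an assumption for illustration)
errors = expected_sites(window, ERROR_RATE)
variants = expected_sites(window, VARIATION_RATE)

print(f"expected sequencing errors: {errors}")     # 100
print(f"expected variant sites:     {variants}")   # 1000
print(f"variant sites per error:    {variants / errors}")  # 10
```

Under these figures, true differences between individuals outnumber residual errors about ten to one, which is why deep coverage makes apparent variants credible candidates for polymorphisms.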
3) SNPs……
• Sites that result from point mutations in
individual base pairs
• Biallelic
• ~60,000 SNPs lie within exons and
untranslated regions (85% of exons lie within
5 kb of a SNP)
• May or may not affect the ORF
• Most SNPs may be regulatory
Nature (2001) 15th Feb Vol 409 special issue; pg 821 & 928
http://www.genetics.gsk.com/kids/medicine01.htm
3) SNPs……and disease
3) SNPs……and risk of disease
3) SNPs……and drug prescription
4) Distribution of GC content
Genome wide average of 41%
Huge regional variations exist
E.g. distal 48 Mb of chromosome 1p has 47%
but chromosome 13 has only 36%
Confirms cytogenetic staining with G-bands
(Giemsa)
dark G-bands – low GC content (37%)
light G-bands – high GC content (45%)
Nature (2001) 15th Feb Vol 409 special issue; pg 876-877
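Regional GC figures like those above come from counting G and C bases in windows along the sequence. The following is a minimal Python sketch of that calculation, assuming the sequence is available as a plain string; the function names `gc_fraction` and `windowed_gc` and the toy sequence are illustrative and not from the lecture.

```python
def gc_fraction(seq):
    """Fraction of G or C among the unambiguous A/C/G/T bases in `seq`."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    # Windows made only of N or other ambiguity codes have no defined GC content.
    return gc / total if total else 0.0


def windowed_gc(seq, window, step):
    """GC fraction for successive full windows of `window` bases."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


toy = "ATATATATGCGCGCGC"  # made-up 16-base sequence, AT-rich then GC-rich
print(gc_fraction(toy))        # 0.5
print(windowed_gc(toy, 8, 8))  # [0.0, 1.0]
```

The toy sequence averages 50% GC overall while its two halves sit at 0% and 100%, a miniature version of how a genome-wide average can hide large regional variation.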
5) CpG islands
CpG → methyl-CpG (methylated at C) → TpG (by deamination)
CpG islands show no methylation
Significance of CpG islands
1) Non-methylated CpG islands associated
with the 5’ ends of genes
2) Aberrant methylation of CpG islands is
one mechanism of inactivating tumor
suppressor genes (TSGs) in neoplasia
http://www.sanger.ac.uk/HGP/cgi.shtml
CpG islands
CpG dinucleotides are greatly under-represented in the human
genome
• ~28,890 in number
• Variable density
e.g. chromosome Y has 2.9/Mb but
chromosomes 16, 17 & 22 have 19-22/Mb
Average is 10.5/Mb
Nature (2001) 15th Feb Vol 409 special issue; pg 877-888
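One common way to quantify CpG depletion is the observed/expected CpG ratio: the number of CpG dinucleotides, multiplied by the sequence length, divided by the product of the C and G counts. The sketch below assumes that formula and a plain sequence string; the function name `cpg_obs_exp` and the toy sequences are illustrative, and the lecture does not state which definition it uses.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n_c == 0 or n_g == 0:
        return 0.0  # ratio is undefined without both bases; report 0 here
    return n_cpg * len(seq) / (n_c * n_g)


island_like = "CGCGCGCG"  # made-up sequence in which every C is followed by G
depleted = "CATGCATG"     # made-up sequence with C and G but no CpG
print(cpg_obs_exp(island_like))  # 2.0
print(cpg_obs_exp(depleted))     # 0.0
```

A ratio near 1 means CpG occurs about as often as the base composition predicts; values well below 1 indicate depletion, which is what loss of methylated CpG through deamination to TpG would produce over time.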
6) Recombination rates
2 main observations
• Recombination rate increases with
decreasing arm length
• Recombination rate suppressed near
the centromeres and increases
towards the distal 20-35Mb
7) Repeat content
a) Age distribution
b) Comparison with other genomes
c) Variation in distribution of repeats
d) Distribution by GC content
e) Y chromosome
Nature (2001) 409: pp 881-891