The Human Genome
Some interesting facts

Biological system overview
Genes have variability, which causes a phenotype
Genes need to be expressed at the right time in the right place: ~5k – 10k genes per tissue
Proteins and RNAs interact in pathways and networks: ~8 interactions pp
Genes encode proteins which may be processed or modified: 100k – 500k proteins

The human genome
Genome size: 3200 Mbp
24 chromosomes + mitochondrion
http://www.ensembl.org

Sequencing the genome
In 1953 James Watson and Francis Crick discovered the structure of DNA, the code of instructions for all life on earth
50 years later the human genome was sequenced by hierarchical shotgun sequencing

Sequencing the genome
The human genome was sequenced by:
  The International Human Genome Sequencing Consortium
  Celera Genomics
Technique: hierarchical shotgun sequencing (the public consortium; Celera used whole-genome shotgun sequencing)
Draft sequences were released in early 2001, but ~10% of the euchromatin was missing and there were 150 000 gaps!
After finishing, the sequence was rereleased in 2004 with 341 gaps, covering 99% of the euchromatic genome

Sequencing time period
First human genome took ~5 years and cost ~$3 billion
Now, can sequence in a few weeks for ~$5,000
BUT: doesn't consider cost and time for data analysis!
International Human Genome Sequencing Consortium 2001. Nature 409, 860 – 921.

Size of the genome
There are 100 trillion (100,000,000,000,000) cells in your body.
There are three billion (3,000,000,000) base pairs in the DNA code within each cell.
The genome requires more than 3 gigabytes of computer storage space.
A full genome sequenced by NGS costs $100 per genome per year to store.

http://www.pbs.org/wgbh/nova/genome/facts.html
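
As a rough check on the storage figure above, the sketch below estimates the raw size of three billion bases under two encodings. Both encodings (one byte per base as plain text, and two bits per base for a packed A/C/G/T alphabet) are assumptions made for illustration and are not taken from the slides; real sequencing files also carry quality scores and metadata, so they are larger.

```python
BASE_PAIRS = 3_000_000_000  # figure quoted on the slide


def storage_bytes(n_bases: int, bits_per_base: int) -> float:
    """Raw storage needed for n_bases at a fixed number of bits per base."""
    return n_bases * bits_per_base / 8


# Assumed encodings (not from the slides):
plain_text = storage_bytes(BASE_PAIRS, 8)  # one ASCII character per base
packed = storage_bytes(BASE_PAIRS, 2)      # four symbols fit in two bits

print(f"plain text: {plain_text / 1e9:.2f} GB")
print(f"2-bit packed: {packed / 1e9:.2f} GB")
```

With one character per base, the estimate lands at about 3 GB, which matches the order of magnitude quoted on the slide.
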
Interesting facts
If all the DNA in your body was put end to end, it would reach to the sun and back over 600 times (100 trillion times six feet/92 million miles).
If unwound and tied together, the strands of DNA in one cell would stretch almost six feet but would be only 50 trillionths of an inch wide.
It would take a person typing 60 words per minute, eight hours a day, around 50 years to type the human genome.
If all three billion letters in the human genome were stacked one millimeter apart, they would reach a height 7,000 times the height of the Empire State Building.
http://www.pbs.org/wgbh/nova/genome/facts.html
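
The arithmetic behind these facts can be reproduced with a short sketch. The slide figures are used as given; the conversion of 5,280 feet per mile, the typing convention of five letters per word, and an Empire State Building height of 443 m (to the antenna tip) are assumptions added here, so the results are only expected to agree with the slide to within rounding.

```python
# Figures quoted on the slide
CELLS_IN_BODY = 100e12      # 100 trillion cells
DNA_FEET_PER_CELL = 6       # unwound DNA per cell, in feet
SUN_DISTANCE_MILES = 92e6   # one-way distance to the sun
GENOME_LETTERS = 3e9        # three billion letters

# Assumptions not stated on the slide
FEET_PER_MILE = 5280
LETTERS_PER_WORD = 5        # common typing-speed convention
EMPIRE_STATE_M = 443        # height to the antenna tip, in meters

# DNA laid end to end, expressed as sun round trips
total_miles = CELLS_IN_BODY * DNA_FEET_PER_CELL / FEET_PER_MILE
round_trips = total_miles / (2 * SUN_DISTANCE_MILES)

# Typing the genome at 60 words per minute, eight hours a day
letters_per_day = 60 * LETTERS_PER_WORD * 60 * 8
typing_years = GENOME_LETTERS / letters_per_day / 365

# Stacking letters one millimeter apart
stack_m = GENOME_LETTERS * 1e-3
stack_ratio = stack_m / EMPIRE_STATE_M

print(f"sun round trips: {round_trips:.0f}")
print(f"typing time: {typing_years:.0f} years")
print(f"stack height: {stack_ratio:.0f} x Empire State Building")
```

Under these assumptions the results come out near 620 round trips, about 57 years of typing, and roughly 6,800 building heights, all of the same order as the figures quoted above.
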
Some statistics
Only 1.5% of the genome is coding
Other non-protein-coding sequence is for other kinds of "genes" or "lost genes"
A proportion of our genome is not our own!
  50% is repeat regions, most of viral origin!
  The single most common protein-coding sequence is the "recipe" for making reverse transcriptase
99.9% of our sequences are identical between individuals
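
To put these percentages on an absolute scale, the sketch below converts them to megabase pairs. It assumes the 3200 Mbp genome size quoted earlier in the slides and treats each percentage as a simple fraction of that total; the variable names are illustrative only.

```python
GENOME_MBP = 3200  # genome size quoted earlier in the slides

# Percentages from the slide, expressed as fractions of the genome
fractions = {
    "protein-coding": 0.015,
    "repeat regions": 0.50,
    "differs between individuals": 0.001,  # the complement of 99.9% identity
}

spans_mbp = {name: GENOME_MBP * frac for name, frac in fractions.items()}

for name, mbp in spans_mbp.items():
    print(f"{name}: {mbp:.1f} Mbp")
```

On these figures, the coding portion is about 48 Mbp, repeats about 1600 Mbp, and the part that differs between two people about 3 Mbp.
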
Number of human genes
First estimates of between 20 000 and 150 000 genes
Seems to be between 20 000 and 30 000 genes
Expansion of the number of different protein molecules due to:
  (a) alternative splicing (30 to 50% increase);
  (b) post-translational modifications (5 to 10 fold increase)
There could be about 1 million different protein molecules in the human body
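
The expansion factors above can be combined in a small sketch. It assumes the two effects simply multiply (a 30 to 50% increase from splicing, then a 5 to 10 fold increase from post-translational modification); the slides do not state how the factors combine, so this is one plausible reading and the function name is hypothetical.

```python
def protein_forms(genes: int, splicing_increase: float, ptm_fold: float) -> float:
    """Estimate distinct protein molecules, assuming the factors multiply."""
    return genes * (1 + splicing_increase) * ptm_fold


# Lower and upper ends of the ranges quoted on the slide
low = protein_forms(20_000, 0.30, 5)
high = protein_forms(30_000, 0.50, 10)

print(f"estimated protein molecules: {low:,.0f} to {high:,.0f}")
```

The result spans roughly 130,000 to 450,000, which is in line with the 100k – 500k range quoted on the overview slide and within an order of magnitude of the 1 million figure.
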
Gene numbers
[Figure: gene counts by organism; surviving labels: 21000, 14000 genes, 22000, 19000 genes, 2000-5000 genes, 6000 genes, 24000 genes]

Latest genome build
Known protein-coding genes: 20,442
Novel protein-coding genes: 434
Pseudogenes: 15,007
RNA genes: 12,523
Gene exons: 649,964
Gene transcripts: 181,744

Protein coding genes
Many of the genes are alternatively spliced
Human genes have short exons (50 codons) and long introns (10 kb)
Average gene length is 3000 bp, max is 2.4 million bp
We know the function of less than half of all the genes

Comparative genomics
Comparing the human genome to others:

Organism     Genome size (Mbp)    No. of genes
Human        3000                 21,000
Mouse        2800                 22,000
Fruit fly    180                  14,000
Worm         97                   19,000
Yeast        12                   6,000
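
One way to read this comparison is as gene density, that is, genes per megabase pair. The sketch below computes it from the table values; the dictionary layout and names are illustrative, and the numbers are copied from the slide rather than from a current genome build.

```python
# (genome size in Mbp, number of genes), as listed on the slide
genomes = {
    "Human": (3000, 21_000),
    "Mouse": (2800, 22_000),
    "Fruit fly": (180, 14_000),
    "Worm": (97, 19_000),
    "Yeast": (12, 6_000),
}

# Genes per Mbp for each organism
density = {name: genes / size for name, (size, genes) in genomes.items()}

for name, value in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name:<10} {value:7.1f} genes per Mbp")
```

On these numbers human and mouse sit at roughly 7 to 8 genes per Mbp while yeast reaches 500, which shows that gene count does not scale with genome size.
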

Evolution of humans
Genes in common with other organisms
About 75% of human genes have non-human homologues, ~70% match mouse proteins
International Human Genome Sequencing Consortium 2001. Nature 409, 860 – 921.

Functional composition
Humans have more multifunctional genes, and genes involved in cell-cell communication and signalling
International Human Genome Sequencing Consortium 2001. Nature 409, 860 – 921.

Human genome resources
Ensembl
UCSC Genome Browser
OMIM: human genes and inherited disorders
dbSNP: single nucleotide polymorphisms
Genetic Map at NCBI
Etc.

http://www.ncbi.nlm.nih.gov