Download Gaber`s Lecture

Bio-Breaks
July 1st, 2009
Genes to Genomes
What does life do?
subunits
A
T
length
DNA
“Structure”
Deoxyribo-nucleic Acid
chromosome
The Genome
nucleus
And more
order
Genome: All of the genes
Genes: Instructions
Proteins: Workhorses
histones
Genes
Central Dogma: DNA → RNA → Protein
DNA sequence
“Structure”
More
order
“We wish to suggest a structure for … [DNA]. This structure has novel features which are of considerable biological interest.”
Watson
Crick
DNA Replication
DNA
New Watson
Crick
Watson
Complementary
Base-pairing
New Crick
Base Pairing
Complementary
Base-pairing
Typical “sequence” of DNA
A G T A C G
| | | | | |
T C A T G C
“velcro”
nucleotide = Phosphate + Sugar + Base
[Figure: two DNA strands, each a Sugar-Phosphate backbone with a Base on every Sugar, running 5’ to 3’ in opposite directions and paired base-to-base as a Double Helix]
Length: 4 base-pairs long (4 bp)
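The complementary base-pairing rule on this slide (A with T, G with C) can be sketched in a few lines of Python. This is only an illustration and not part of the lecture; the names `PAIRS`, `complement` and `reverse_complement` are made up for the example.

```python
# Pairing rule from the slide: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the base that pairs with each position of `strand`."""
    return "".join(PAIRS[base] for base in strand)

def reverse_complement(strand):
    """Return the partner strand read in its own 5' to 3' direction."""
    return complement(strand)[::-1]

# The 6-base "typical sequence" from the slide and its partner strand.
print(complement("AGTACG"))          # TCATGC
print(reverse_complement("AGTACG"))  # CGTACT
```

Because each base has exactly one partner, either strand fully determines the other, which is what makes copying the molecule possible.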
Instructions (DNA) for every function (proteins)
Gene Expression
Gene Structure - Chromatin
Sequence has great importance
Central Dogma: DNA → RNA → Protein
Watson
New Crick
100 million base pairs long
Gene Product
(protein)
RNA: Ribo-nucleic acid
Sequence has great importance
[Figure: DNA and RNA strands of nucleotides (Phosphate, Sugar, Base), each running 5’ to 3’]
DNA vs. RNA
DNA         RNA
Guanine     Guanine
Adenine     Adenine
Thymine     Uracil
Cytosine    Cytosine
Transcription
DNA
RNA
Occurs in the nucleus
Complementary Base-pairing (DNA base with RNA base): A with U, T with A, C with G, G with C
[Figure: RNA strand (Sugar-Phosphate backbone, 5’ to 3’) base-paired to the DNA strand it is copied from]
Digital copy…
…of digital info
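As a rough sketch of transcription as a "digital copy", the snippet below builds an RNA strand from a DNA template using the pairing rule (RNA uses U where DNA would use T). The function name `transcribe` and the example template are hypothetical, and the sketch assumes the template is written 3' to 5' so that the RNA comes out 5' to 3'.

```python
# DNA template base -> RNA base that pairs with it (U replaces T in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the RNA (5' to 3') complementary to a DNA template written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

# Hypothetical 6-base template, chosen only for illustration.
print(transcribe("TACATG"))  # AUGUAC
```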
(19-bp)
146 base-pairs of DNA wrap around the octamer
8 histone proteins
(octamer)
[Figure: DNA of a gene wrapped around a chain of nucleosomes, each an octamer of histones H2A, H2B, H3 and H4, spanning the Control Region and the Coding Region]
a gene
Smallest gene: ILGFII (252 bp)
Largest gene: Titin (>300,000 bp)
a gene
Genes
Watson  A T T T A G G A G A C G A T T G G A T A C C T C T A G A G C
        | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Crick   T A A A T C C T C T G C T A A C C T A T G G A G A T C T C G
[Figure: the gene’s DNA packaged into nucleosomes (histones H2A, H2B, H3, H4) across the Control Region and the Coding Region]
a gene
Control of Gene Expression
Transcription
Activators (regulatory proteins): GATA-1
RNA Polymerase (Pol II): ~30 nucleotides/second
Control Region, Coding Region
[Figure: phosphorylated (P) Pol II with TFIIB, TFIIH and Kin28 on the DNA]
Nucleotide sequence
Transcription
Nucleotide sequence
5’
3’
messenger RNA(s) (proper # of copies)
Translation
Met•Ser•Ser•Val•Asn•Ala•Asn•Gly•Gly•Tyr…..Xxx
Proteins are made from the
20 different amino acids
(Proper sequence of amino acids)
proper shape
proper amount
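To make the step from nucleotide sequence to amino-acid sequence concrete, here is a minimal sketch of translation using the standard genetic code. It is an illustration rather than anything from the lecture: the names `CODON_TABLE` and `translate` are made up, amino acids are written with one-letter codes (M = Met, S = Ser, V = Val), and the example codons are just one possible spelling of Met•Ser•Ser•Val.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(coding_dna):
    """Translate a coding sequence codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# ATG TCT TCT GTT TGA -> Met Ser Ser Val, then stop
print(translate("ATGTCTTCTGTTTGA"))  # MSSV
```

The table has 64 codons for 20 amino acids plus stop, so several codons spell the same amino acid.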
Transcription
Activators (regulatory proteins): GATA-1
Basal transcription machinery: Pol II (phosphorylated, P), TFIIB, TFIIH, Kin28
Pol II: ~30 nucleotides/second
mRNA
Control Region
CCCCTATAGGGG
||||||||||||
GGGGATATCCCC
Feelin’ Groovy!
Sequence
specificity
TTAACCGGGATATTAACCGG
||||||||||||||||||||
AATTGGCCCTATAATTGGCC
Transcription factors: Sequence-specific binding proteins
How do we know the function of a gene?
Differences in WT & Mutant DNA sequences
mutations - functional aspects
Met•Tyr•Leu•His•Ser•Ala•Asn•Gly•Gly•Tyr•Thr•Lys•Pro•Gln•Lys•Tyr…..
Normal (“wild-type”) protein
Trans________
Trans________
Wild-type Gene
A T T T C G G A G A C A A T T A T G T A C C T C C A T A G C
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
T A A A G C C T C T G T T A A T A C A T G G A G G T A T C G
Control Region
(promoter)
Mutations
Coding Region
Smallest change possible: 1bp
Mutant GENOTYPE
A T T T C G G A G A C A A T T A T G T A C C T C C A G A G C
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
T A A A G C C T C T G T T A A T A C A T G G A G G T C T C G
Mutant Gene
Mutant protein (T•A base pair changed to G•C)
Met•Tyr•Leu•Gln•Ser•Ala•Asn•Gly•Gly•Tyr•Thr•Lys•Pro•Gln•Lys•Tyr…..
Mutant protein fails to function, results in a mutant PHENOTYPE
Sequence has great importance
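The wild-type and mutant genes on this slide can be compared directly. The sketch below, which is an illustration and not the lecturer's method, finds the positions where the two top strands differ and translates each from its first ATG. The helper names are hypothetical, and amino acids use one-letter codes (H = His, Q = Gln).

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)), AMINO_ACIDS))

# Top strands copied from the slide.
WILD_TYPE = "ATTTCGGAGACAATTATGTACCTCCATAGC"
MUTANT = "ATTTCGGAGACAATTATGTACCTCCAGAGC"

def differences(a, b):
    """List (position, base_in_a, base_in_b) wherever two sequences differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

def translate_from_first_atg(dna):
    """Translate complete codons starting at the first ATG (empty if none)."""
    start = dna.find("ATG")
    if start == -1:
        return ""
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

print(differences(WILD_TYPE, MUTANT))        # [(26, 'T', 'G')]
print(translate_from_first_atg(WILD_TYPE))   # MYLHS
print(translate_from_first_atg(MUTANT))      # MYLQS
```

A single changed base turns the codon CAT (His) into CAG (Gln), matching the His to Gln change shown in the two protein sequences.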
1 in 15,000 births
Normal sequence of PKU gene
A
T
MUTANT sequence of PKU gene
C
G
A
C
MUTANT sequence of PKU mRNA
MUTANT sequence of PKU polypeptide
Phenylalanine hydroxylase: Phenylalanine -> Tyrosine
Mutant Phenylalanine hydroxylase (non-functional): Phenylalanine -X-> Tyrosine
Human Genome Sequencing
GGGCGGCCCTAAATA…
AAATTTGGGCCCTATATTGCGCCATAATGGGCGGCCC
GGGATCCCATAAATTTGGGCCC
seven “reads” (1 to 7), each ~ 500 nucleotides long
one contig
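The idea of joining overlapping reads into one contig can be sketched with the three short fragments shown on this slide. This is a simplified illustration, not how a real assembler works: it assumes the reads are already in left-to-right order, come from the same strand, and contain no errors. The function names are hypothetical.

```python
def overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def merge_reads(reads):
    """Join reads (already in order) into one contig, collapsing overlaps."""
    contig = reads[0]
    for read in reads[1:]:
        contig += read[overlap(contig, read):]
    return contig

# Fragments copied from the slide, ordered left to right.
reads = [
    "GGGATCCCATAAATTTGGGCCC",
    "AAATTTGGGCCCTATATTGCGCCATAATGGGCGGCCC",
    "GGGCGGCCCTAAATA",
]
contig = merge_reads(reads)
print(contig)
print(len(contig))  # 53
```

The first pair of reads shares 12 bases and the second pair shares 9, so 74 bases of raw reads collapse into a 53-base contig.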
(watson) GGGTTTCCCTTT
||||||||||||
(crick) CCCAAAGGGAAA
2 million reads of ~ 500 nucleotides/read
Grouped into contigs ~ 10,000 base pairs
> 29,000 contigs = ~ 3 Billion base pairs
Human Genome Sequencing
>3,000 genes
M
bp
AAATTTGGGCCCTATATTGCGCCATAATGGGCGGCCCTAAATA…
200-400 genes
47
M
bp
2
25
21
1
22 autosomes
1 sex chromosome
Bufo bufo        6.9 Billion
Budding yeast    12 million
TCCAGTCCCCTCAAGTCCGAAGCCCCTACCCACTCTCACGCCAGGCAGGGGTGGGGGCCG
CCGGGGTCATATAACCGGGCCCCTTCTCTGCCTTGATGAGCTCCGTTAACGCAAATGGAG
GATATACCAAACCACAAAAATATGTGCCAGGGCCAGGTGATCCTGAACTTCCACCCCAAC
TATCCGAATTTAAAGATAAAACATCGGATGAAATCTTGAAAGAAATGAACAGAATGCCTT
TTTTCATGACCAAGTTGGATGAAACAGACGGTGCAGGTGGTGAAAACGTGGAGTTAGAAG
CTTTAAAGGCATTAGCTTATGAAGGCGAACCACACGAAATCGCTGAAAATTTCAAGAAGC
AAGGTAACGAACTATACAAAGCAAAAAGATTCAAGGATGCAAGGGAACTTTACTCAAAGG
GCTTGGCTGTAGAATGCGAAGATAAATCAATAAATGAGTCACTATATGCCAATAGAGCGG
CATGTGAGTTAGAGCTGAAAAATTACAGGAGGTGTATCGAGGACTGCAGTAAAGCTCTAA
CTATTAACCCCAAGAATGTTAAGTGCTACTATCGTACAAGCAAGGCTTTTTTCCAATTAA
ACAAGTTGGAGGAGGCCAAATCAGCCGCAACATTTGCCAATCAAAGGATTGACCCAGAGA
ACAAATCAATTTTGAATATGTTATCAGTGATTGATAGAAAAGAACAAGAATTGAAAGCAA
AAGAAGAAAAACAGCAAAGAGAAGCTCAGGAACGTGAAAACAAGAAAATTATGTTAGAGA
GCGCAATGACGCTGAGAAACATAACTAACATCAAAACTCACTCTCCAGTAGAGTTACTTA
ATGAGGGTAAAATAAGGCTAGAAGACCCAATGGATTTTGAATCTCAATTGATCTATCCCG
CATTAATTATGTACCCCACGCAAGATGAATTTGATTTTGTAGGTGAAGTAAGTGAGTTAA
CTACTGTGCAAGAACTTGTTGACCTAGTTTTGGAAGGGCCGCAAGAACGCTTCAAAAAAG
AAGGTAAGGAAAACTTCACACCAAAGAAAGTGTTGGTGTTCATGGAAACAAAGGCAGGTG
GTTTGATTAAAGCTGGTAAGAAACTGACATTTCACGATATCTTGAAGAAAGAGTCGCCAG
ATGTACCATTGTTCGATAACGCTTTGAAAATATATATTGTGCCAAAGGTAGAAAGTGAAG
GGTGGATTTCCAAGTGGGATAAGCAAAAAGCCTTAGAAAGAAGATCTGTGTGAGGGGGCC
CGGGGGACGTCTTCCCAGGGCTCACTAAAACCGGCCGGGAAGCCTGGGCTGCACTAGGAG
CCGGCGACCCTGGGGCGAGGGGCGGCCCGGAGCCCTGCGGGAGGAGCTGGCGGCCGCCCC
AGGTAGCAACCATCCTGCCTCCCGCTGGAGCGGCGTCTCCTCCCCGGGAGGAGGGCAGGG
Various Genome Sizes
Amoeba dubia             670 Billion
Muntiacus muntjak        2.5 Billion
Homo sapiens             3 Billion
Plasmodium falciparum    25 Million
Bioinformatics = Molecular Biology + Computer Science
Transcription start sites
Transcription termination sites
Translation start sites
Translation stop sites
Bioinformatics = Molecular Biology + Computer Science
Gene Hallmarks
Transcription: DNA sequence → mRNA
Translation: ribosome reads the mRNA from AUG (Translation start site) to UGA (Translation STOP site) → protein
DNA sequence
X X X T A T A X X X X X X X X X X X X A T G X X X X X X X X X X X X X X X X X X X X X X X X T G A X X X X X X X T T A A A X X X X
T A T A: Transcription start site
A T G … T G A: Protein-coding region
T T A A A: Transcription termination site
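The hallmarks above suggest a simple computational search: find an ATG, then read codons in frame until a stop codon. The sketch below shows that idea only. It is not a real gene finder, the function name `find_first_orf` is hypothetical, and the toy sequence was invented for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_first_orf(dna):
    """Return (start, end) of the first ATG followed in frame by a stop codon.

    `end` is the index just past the stop codon; returns None if no such
    open reading frame exists.
    """
    start = dna.find("ATG")
    while start != -1:
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = dna.find("ATG", start + 1)
    return None

# Invented toy sequence: flanking DNA, then ATG TAC CTC CAT TGA, then more DNA.
toy = "GGTATAAACC" + "ATGTACCTCCATTGA" + "CCTTAAAGG"
start, end = find_first_orf(toy)
print(start, end)       # 10 25
print(toy[start:end])   # ATGTACCTCCATTGA
```

Real genomes need more than this (both strands, all reading frames, minimum lengths), which is part of why predicted genes still need experimental verification.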
[Figure: the same sequence annotated with its Transcription Start Control Sequence, Start Translation Sequence and STOP Translation Sequence]
CATGGGTCATATAACCGGGCCCCTTCTCTGCCTTGATGAGCTCCGTTAACGCAAATGGAG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
GTACCCAGTATATTGGCCCGGGGAAGAGACGGAACTACTCGAGGCAATTGCGTTTACCTC
Schematic of Gene
Control Region (promoter)
Region encoding protein: CNS1, 1,155 base pairs long
Chromosome Level with ORFs
[Figure: map of Chromosome II showing the ORFs ARA1, APD1, TBS1, SPP3, RIB7, CNS1, RPB5, AMN1, SLI15, YBR159W and ICS2; axis: Length of DNA (x 1,000) (=kb), marked at 200, 400, 600 and 800]
22,000 genes (DNA expressed as protein products)
Need experimental verification
Only ~ 2% of genome
Protein products of many are unknown
Most are closely similar to those of other organisms
February 2001
Only 1% of human genes are unique to humans
Comparative Genomics
Comparative Genomics I (different organisms)
Bioinformatics
Species          Base Pairs     Genes
Human            3 Billion      22,000
Worm             100 Million    19,000
Fruit Fly        120 Million    14,000
Arabidopsis      125 Million    25,000
Baker’s yeast    12 Million     6,000
E. coli          4 Million      4,800
Nature September, 2005
No Alzheimer’s, little cancer
~ 99% of chimp DNA same as human
35 million differences: one every ~ 100 base pairs
Krystii Melaine
Pan troglodytes
~ 29% of all proteins identical
71% proteins have some sequence difference
~ 1-2 aa differences between most proteins
~ 24% of the differences are in regions that control expression of a gene.
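The chimp and human figures on this slide can be checked against each other with a little arithmetic. The sketch below assumes a genome of 3 billion base pairs (the figure used elsewhere in the lecture) and treats each of the 35 million differences as a single base pair, which is a simplification made only for this example.

```python
GENOME_BP = 3_000_000_000   # ~3 Billion base pairs, as quoted in the lecture
DIFFERENCES = 35_000_000    # 35 million differences, as quoted on this slide

# Average distance between differences, and the fraction of identical positions.
spacing_bp = GENOME_BP / DIFFERENCES
percent_same = 100 * (1 - DIFFERENCES / GENOME_BP)

print(round(spacing_bp, 1))    # 85.7
print(round(percent_same, 1))  # 98.8
```

Both results line up with the slide's rounded figures of one difference every ~100 base pairs and ~99% identical DNA.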
Human: a b c d e f g h i j k
Apes:  a b f e d c g h i j k
Inversions on 9 chromosomes
Present in each of the 18 species of great apes
February 2001
Genomes “Sequenced”
bioinformatics suite
Organism      “Sequenced”
Eukaryotes    > 150
Bacteria      > 800
Archaea       ~ 50
Viruses       > 200
Human: 2.9 Billion BP, ~ $ Billion
Today: ~ $10,000-$50,000/human genome
Dog Genome (Tasha), July 2004: 2.4 Billion BP, ~ $30 Million