Opportunities and Challenges
in Computational Biology
Biology easily has 500 years of exciting problems to work on
-Donald E. Knuth
Molecular Biology Background
Biological Data
DNA:
• Self-replicating
• Codes for proteins
Proteins:
• Perform most functions in living organisms
DNA: Sequence of nucleotides
Nucleotide: Deoxyribose sugar + Phosphate + Base
Nucleotides: A, T, G, and C
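[Figure: chemical structure of a nucleotide (phosphate group, deoxyribose with carbons numbered 1’ to 5’, and a base), and a schematic of double-stranded DNA in which a 5’-to-3’ strand carrying A, C, G pairs with an antiparallel 3’-to-5’ strand carrying T, G, C]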
For computational purposes,
DNA = A sequence over alphabet {A,C,G,T}
5’ A T T C G G G A A T G C A T G C C A 3’
3’ T A A G C C C T T A C G T A C G G T 5’
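Treating DNA as a string over {A,C,G,T} makes the second strand computable from the first. The following is a minimal Python sketch, not part of the original slides; the function names `complement` and `reverse_complement` are illustrative choices.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Base-by-base complement, kept in the same left-to-right order."""
    return strand.translate(COMPLEMENT)

def reverse_complement(strand: str) -> str:
    """The opposite strand, rewritten so that it also reads 5' to 3'."""
    return complement(strand)[::-1]

top = "ATTCGGGAATGCATGCCA"          # top strand from the slide, 5' to 3'
print(complement(top))              # TAAGCCCTTACGTACGGT (bottom strand, 3' to 5')
print(reverse_complement(top))      # TGGCATGCATTCCCGAAT (bottom strand, 5' to 3')
```

Because the two strands determine each other, sequence databases usually store only one strand, written 5’ to 3’.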
Genome: Entire genetic constitution of a
living organism
Chromosome: Linear strand of DNA
Gene: A contiguous stretch of DNA that
codes for a protein
Species                               Number of     Genome Size
                                      Chromosomes   (base pairs)
Bacteriophage λ                       1             5 × 10^4
Escherichia coli (bacterium)          1             5 × 10^6
Saccharomyces cerevisiae (yeast)      32            1 × 10^7
Caenorhabditis elegans (worm)         12            1 × 10^8
Drosophila melanogaster (fruit fly)   8             2 × 10^8
Homo sapiens (human)                  46            3 × 10^9
Proteins: Chains of amino acid residues.
There are 20 different amino acids.
Functions:
• Tissue building blocks (structural proteins)
• Catalysts (enzymes)
• Oxygen transport
• Antibody defense
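[Figure: polypeptide backbone of three residues with side chains R1, R2, R3, showing the amino terminus (+H3N), the carboxyl terminus (COO-), and the backbone dihedral angles Φ and ψ on either side of a Cα atom]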
First      Second Position             Third
Position   G      A      C      U      Position
G          Gly    Glu    Ala    Val    G
           Gly    Glu    Ala    Val    A
           Gly    Asp    Ala    Val    C
           Gly    Asp    Ala    Val    U
A          Arg    Lys    Thr    Met    G
           Arg    Lys    Thr    Ile    A
           Ser    Asn    Thr    Ile    C
           Ser    Asn    Thr    Ile    U
C          Arg    Gln    Pro    Leu    G
           Arg    Gln    Pro    Leu    A
           Arg    His    Pro    Leu    C
           Arg    His    Pro    Leu    U
U          Trp    STOP   Ser    Leu    G
           STOP   STOP   Ser    Leu    A
           Cys    Tyr    Ser    Phe    C
           Cys    Tyr    Ser    Phe    U
Protein Synthesis (DNA → Protein)
DNA → (Transcription) → mRNA → (Translation) → Protein
Example
RNA:     AUG GGA GAG CUA UGA
Protein: Met Gly Glu Leu STOP
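The two steps of protein synthesis can be sketched in a few lines of Python. This is an illustration rather than code from the slides: the names `ROWS`, `CODON_TABLE`, `transcribe`, and `translate` are hypothetical, and `transcribe` assumes its input is the coding (sense) strand, so that transcription amounts to replacing T with U. The codon table is the standard genetic code, keyed by first and second base with entries listed for third base G, A, C, U.

```python
BASES = "GACU"

# Key: first + second base. Value: amino acids for third base G, A, C, U.
ROWS = {
    "GG": "Gly Gly Gly Gly",  "GA": "Glu Glu Asp Asp",
    "GC": "Ala Ala Ala Ala",  "GU": "Val Val Val Val",
    "AG": "Arg Arg Ser Ser",  "AA": "Lys Lys Asn Asn",
    "AC": "Thr Thr Thr Thr",  "AU": "Met Ile Ile Ile",
    "CG": "Arg Arg Arg Arg",  "CA": "Gln Gln His His",
    "CC": "Pro Pro Pro Pro",  "CU": "Leu Leu Leu Leu",
    "UG": "Trp STOP Cys Cys", "UA": "STOP STOP Tyr Tyr",
    "UC": "Ser Ser Ser Ser",  "UU": "Leu Leu Phe Phe",
}

# Expand to a full codon -> amino acid mapping with 64 entries.
CODON_TABLE = {
    prefix + third: amino_acid
    for prefix, row in ROWS.items()
    for third, amino_acid in zip(BASES, row.split())
}

def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA (assumption: input is the sense strand)."""
    return coding_strand.replace("T", "U")

def translate(mrna: str) -> list[str]:
    """Read the mRNA codon by codon, stopping after the first STOP codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "STOP":
            break
    return protein

mrna = transcribe("ATGGGAGAGCTATGA")
print(mrna)             # AUGGGAGAGCUAUGA
print(translate(mrna))  # ['Met', 'Gly', 'Glu', 'Leu', 'STOP']
```

The sketch keeps STOP in the output to mirror the slide example; a real protein sequence would end at the residue before it.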
Summary
What Can Be Done
Experimentally?
• DNA sequences of length up to 700-800
bp can be read (Sanger’s method).
• DNA samples can be amplified (PCR).
• Protein sequences can be determined.
• Structure of proteins can be determined
using X-ray crystallography (expensive,
tedious, time-consuming).
Challenges in Computational
Biology
1. Find the genomes of all organisms.
2. Identify and annotate genes.
3. Find the sequences, three dimensional
structures and functions of all proteins.
4. Find sequences of proteins that have desired
three dimensional structures.
5. Compare DNA sequences and protein
sequences for similarity.
6. Understand gene expression, expression
regulation, and genetic networks.
7. Study the evolution of sequences and species.
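As one concrete instance of challenge 5, sequence similarity is commonly scored by global alignment. The sketch below uses the Needleman-Wunsch dynamic program, a standard technique that the slides themselves do not name; the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch).

    Only the previous row of the dynamic-programming table is kept,
    so memory use is proportional to len(b).
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]

print(global_alignment_score("ACGT", "ACGT"))  # 4: four matches
print(global_alignment_score("ACGT", "AGT"))   # 2: three matches, one gap
```

The same recurrence works for protein sequences; in practice the fixed match and mismatch values are replaced by a substitution matrix.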