Chapter 1: Bio Primer
1.1 Cell Structure; DNA; RNA; transcription; translation; proteins
Prof. Yechiam Yemini (YY)
Computer Science Department
Columbia University
COMS 4761 --2007
Overview
• Cell structure and mechanisms
• DNA; RNA; Transcription; Regulation
• Translation; protein; sequence & structure
References:
• B. Alberts et al., “Molecular Biology of the Cell”, 4th edition, Garland Science.
• R. Horton et al., “Principles of Biochemistry”, 3rd edition, Prentice Hall.
• J.D. Watson et al., “Molecular Biology of the Gene”, 5th edition, Pearson Benjamin Cummings.
• NCBI introductory overview: http://www.ncbi.nih.gov/About/primer/index.html
• Animation sites:
  o http://www.johnkyrk.com/
  o http://vcell.ndsu.nodak.edu/~christjo/vcell/animationSite
Organisms Are Made of Cells
Prokaryotes & Eukaryotes Have Different Cells
• Prokaryotes: single-cell organisms without a nucleus
  E.g., bacteria: E. coli, H. pylori
• Eukaryotes: single- or multi-cell organisms with a nucleus
  E.g., yeast, plants, Drosophila, humans
Timeline:
  -4.5B yrs: Earth formed
  -3.5B yrs: Prokaryotic bacteria
  -1.5B yrs: Nucleated cells
  -0.5B yrs: Multi-cellular eukaryotes
© Pearson Benjamin Cummings
Cell type    Prokaryotes                      Eukaryotes
             Single cell; size 0.2-2 µm       Single or multi cell; cell size 10-100 µm
Structure    No nucleus                       Nucleus
             One membrane at cell boundary    Multiple membranes/compartments
             No organelles                    Organelles: mitochondria, Golgi, chloroplasts
             No cytoskeleton                  Cytoskeleton
DNA          Single circular DNA              Two or more chromosomes
             Genes code proteins              Genes have large non-coding regions (introns)
             90% of DNA encodes proteins      95-97% non-coding DNA
             ~10^5-10^6 base pairs            ~10^7-10^9 base pairs
             DNA is loosely organized         DNA is tightly packed (chromatin + histones)
             Cell division through fission    Mitosis
Proteins     1-2k protein species             5-20k protein species
             ~10^6 proteins per cell          ~10^9 proteins per cell
Cells Are Made of Macromolecules
Small molecules (3%) are the building blocks of macromolecules (26%):
  Sugars → Polysaccharides
  Fatty acids → Fats, lipids, membranes
  Amino acids → Proteins
  Nucleotides → Nucleic acids (DNA, RNA)

Molecules                                               % weight
Water                                                   70%
Inorganic ions                                          1%
Sugars                                                  1%
Amino acids                                             0.4%
Nucleotides                                             0.4%
Fatty acids                                             1%
Other small molecules                                   0.2%
Macromolecules (proteins, DNA, RNA, polysaccharides)    26%
DNA Structure
The Central Dogma of Biology
DNA → (Transcription) → RNA → (Translation) → Protein
• DNA stores hereditary information
• DNA is transcribed into RNA
• RNA is translated into proteins
• Proteins perform the key functions of cells
DNA Consists of Sequences of Nucleotides
• DNA strands are sequences of nucleotides
  Nucleotide = Sugar + Phosphate + Base; the sugar-phosphate chain forms the backbone
  Example strand: A C T T A C G C
• Bases: Adenine (A), Guanine (G), Thymine (T), Cytosine (C)
• DNA is organized in complementary double strands
• Hydrogen bonds hybridize complementary pairs: A-T, C-G
[Figure: two paired strands, labeled 5’-end and 3’-end, joined by hydrogen bonds]
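Complementary pairing means that one strand determines the other. A minimal Python sketch of this, assuming sequences are plain uppercase strings over A, C, G, T; the name `reverse_complement` is illustrative and does not come from the slides:

```python
# Watson-Crick pairing from the slide: A-T and C-G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    The two strands are antiparallel, so the complement is read in
    reverse order to keep the 5'->3' convention.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


print(reverse_complement("ACTTACGC"))  # GCGTAAGT
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.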
DNA Forms A Double Helix
• Helix full turn: 10.5 bp
• Hydrogen bonds between paired bases support the structure
• Major and minor grooves provide access for proteins (e.g., transcription factors)
DNA Is Tightly Packed
• DNA is ~2 m long; it needs to fold into a nucleus on the order of 10^-6 m across
• DNA wraps around histone octamers (two copies each of 4 core histones) to form chromatin "beads"
• Transcription needs to unpack the DNA to copy it
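The ~2 m figure can be checked with back-of-the-envelope arithmetic. The sketch below relies on values that are not stated on the slides and are assumed here as approximate textbook figures: a diploid human genome of about 6.4 × 10^9 base pairs and a rise of about 0.34 nm per base pair for B-form DNA.

```python
# Assumed values (not from the slides): approximate textbook figures.
DIPLOID_BASE_PAIRS = 6.4e9   # human diploid genome, roughly 2 x 3.2e9 bp
RISE_PER_BP_M = 0.34e-9      # B-DNA rise per base pair, in meters
BP_PER_TURN = 10.5           # helix full turn, from the double-helix slide

total_length_m = DIPLOID_BASE_PAIRS * RISE_PER_BP_M
helix_turns = DIPLOID_BASE_PAIRS / BP_PER_TURN
pitch_nm = BP_PER_TURN * RISE_PER_BP_M * 1e9

print(f"DNA length per cell: {total_length_m:.1f} m")
print(f"Helix pitch: {pitch_nm:.2f} nm per turn")
print(f"Helical turns: {helix_turns:.1e}")
```

Under these assumptions the total comes out slightly above 2 m, consistent with the slide, and gives a sense of the compaction needed to fit it into a micrometer-scale nucleus.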
Sample Bioinformatics Challenges
Sequencing the genome
Discovering sequence similarity
Discovering genes
Analyzing evolutionary relationships
Discovering other important structures
Distinguishing exons from introns
Regulatory structures (promoters & transcription factors)
Regions expressing micro RNA
….
Transcription
Schematics
DNA → (Transcription) → mRNA → (Translation) → Protein
Overview
A. Assembling transcription complex
B. Transcribing DNA to mRNA
C. Removing introns
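Steps B and C can be sketched in a few lines of Python. This is a toy model under stated assumptions: the input is the coding (sense) strand, so transcription reduces to replacing T with U, and intron positions are supplied as hypothetical 0-based half-open intervals rather than detected from splice signals. The function names and the example sequence are made up for illustration.

```python
def transcribe(coding_strand: str) -> str:
    """Step B: pre-mRNA has the coding strand's sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Step C: drop each intron, given as a (start, end) half-open interval."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Made-up gene: exon "ATGGCT", intron "GTAAGTCAG", exon "TGCTAA".
gene = "ATGGCT" + "GTAAGTCAG" + "TGCTAA"
pre_mrna = transcribe(gene)
mrna = splice(pre_mrna, introns=[(6, 15)])
print(pre_mrna)  # AUGGCUGUAAGUCAGUGCUAA
print(mrna)      # AUGGCUUGCUAA
```

Step A (assembling the transcription complex) is a regulatory event and is not modeled here.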
Animation
The Transcription Process
Transcription Details
http://cwx.prenhall.com/horton/medialib/
From PDB
Transcription Factors
• TFs bind to promoter regions and to RNA polymerases
• TFs regulate the rate of transcription (up/down)
• Regulation is not yet well understood
Transcription Is Regulated
http://cwx.prenhall.com/horton/medialib/
Example: The Lac Operon
Lac consists of 3 genes that are transcribed together
Used by bacteria to transport and metabolize lactose
cAMP activates transcription to initiate transport & metabolism of lactose
Lac Activation
• Low sugar (glucose) level → the cell generates cAMP
• cAMP → binds with CRP; CRP adjusts its alpha helix to fit the DNA grooves and binds to the DNA
• CRP-cAMP → accelerates polymerase binding
http://cwx.prenhall.com/horton/medialib/
Splicing The Introns
http://cwx.prenhall.com/horton/medialib/
From Genes To Networks
Regulation is organized in networks
Top: gene network regulating the body development of the sea urchin
Middle: a promoter region
Bottom: interaction of two modules
Regulatory Networks Can Be Complex
Genetic regulatory network controlling the development of the body plan of the sea urchin embryo
Davidson et al., Science, 295(5560):1669-1678.
Sample Bioinformatics Challenges
Discovering and analyzing transcription factors
Evolutionary analysis; motif finding
Discovering the structure of regulatory networks
Analyzing the operations of regulatory networks
Designing synthetic regulatory networks
Translation
RNA Encodes Protein Sequences
DNA → (Transcription) → RNA → (Translation) → Protein
• Proteins are sequences of amino acids (AA)
• Translation uses the RNA sequence as a template to construct the AA sequence
• The coding problem:
  o Encode 20 amino acids using 4 nucleotides
  o 2 nucleotides can encode only 4^2 = 16 amino acids
  o Codon: a sequence of 3 nucleotides (4^3 = 64 combinations); encodes one amino acid
• Translation: translate mRNA codons to amino acids
  o Start/Stop codons define an open reading frame (ORF)
  o Translation requires reading/identifying codons and forming the respective protein sequence
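The counting argument behind the 3-letter codon can be checked directly. A small Python sketch; the helper name `min_codon_length` is illustrative:

```python
def min_codon_length(alphabet_size: int, symbols_needed: int) -> int:
    """Smallest k such that alphabet_size**k >= symbols_needed."""
    k = 1
    while alphabet_size ** k < symbols_needed:
        k += 1
    return k


for k in (1, 2, 3):
    print(k, 4 ** k)  # number of distinct words of length k over {A, C, G, U}

# 20 amino acids plus at least one Stop signal must be distinguishable.
print(min_codon_length(4, 21))  # 3
```

With 64 codons available for 20 amino acids plus Stop, most amino acids are encoded by more than one codon.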
The Genetic Code
First                    Second base                      Third
base    U           C           A           G             base
U       UUU Phe     UCU Ser     UAU Tyr     UGU Cys       U
        UUC Phe     UCC Ser     UAC Tyr     UGC Cys       C
        UUA Leu     UCA Ser     UAA Stop    UGA Stop      A
        UUG Leu     UCG Ser     UAG Stop    UGG Trp       G
C       CUU Leu     CCU Pro     CAU His     CGU Arg       U
        CUC Leu     CCC Pro     CAC His     CGC Arg       C
        CUA Leu     CCA Pro     CAA Gln     CGA Arg       A
        CUG Leu     CCG Pro     CAG Gln     CGG Arg       G
A       AUU Ile     ACU Thr     AAU Asn     AGU Ser       U
        AUC Ile     ACC Thr     AAC Asn     AGC Ser       C
        AUA Ile     ACA Thr     AAA Lys     AGA Arg       A
        AUG Met     ACG Thr     AAG Lys     AGG Arg       G
G       GUU Val     GCU Ala     GAU Asp     GGU Gly       U
        GUC Val     GCC Ala     GAC Asp     GGC Gly       C
        GUA Val     GCA Ala     GAA Glu     GGA Gly       A
        GUG Val     GCG Ala     GAG Glu     GGG Gly       G
Key: Phe = Phenylalanine, Leu = Leucine, Ser = Serine, Tyr = Tyrosine, Cys = Cysteine, Trp = Tryptophan, Pro = Proline, His = Histidine, Gln = Glutamine, Arg = Arginine, Ile = Isoleucine, Met = Methionine, Thr = Threonine, Asn = Asparagine, Lys = Lysine, Val = Valine, Ala = Alanine, Asp = Aspartate, Glu = Glutamate, Gly = Glycine
tRNA Provides Translation Units
• Anticodon 3’ CGA 5’ binds to codon 5’ GCU 3’ of mRNA
• It translates GCU to Alanine
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html
Translation Basics
• Initiation:
  o Ribosome binds to mRNA; moves in the 5’→3’ direction until it finds the Start codon AUG
• Elongation:
  o Ribosome recruits tRNA to match the next codon
  o The tRNA's AA is joined by a peptide bond to the growing protein
  o Ribosome releases the tRNA and moves to the next codon
• Termination:
  o Elongation continues until a Stop codon is reached
  o Release factor releases the polypeptide from the ribosome
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html
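The initiation, elongation, and termination steps map onto a short loop. The Python sketch below is a simplification under stated assumptions: the mRNA is an uppercase string over A, C, G, U, the first AUG found is taken as the Start codon, and the ribosome, tRNAs, and release factors are reduced to a table lookup. The compact 64-letter string is one common way to write the standard genetic code in UCAG order; the names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino-acid codes for all 64 codons, in UCAG order; "*" marks Stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate the first open reading frame found scanning 5' to 3'."""
    start = mrna.find("AUG")                     # initiation: locate the Start codon
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # elongation: one codon at a time
        if amino_acid == "*":                    # termination: Stop codon reached
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCUUGCUAAGG"))  # MAC
```

In the example, the reading frame AUG GCU UGC UAA yields Met-Ala-Cys and then stops. If no Stop codon is reached, the sketch simply returns what was translated up to the end of the sequence.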
Animation
Translation of RNA into proteins
Proteins Are Sequences of Amino Acids
• Proteins are constructed through peptide bonds
• Proteins are folded into complex conformations
• Proteins perform functions by binding
  o Transcription factors and polymerase bind to DNA
  o Enzymes bind to molecules to accelerate their reactions
  o Globins bind to oxygen to transport it
  o Antibodies bind to pathogens
Example: Hemoglobin
Sickle-Cell Anemia: A Single Nucleotide Change
Codon 6 in β-globin
Sickle structure
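The point of this slide, that one nucleotide change alters the protein, can be illustrated with the well-known sickle-cell substitution at codon 6 of β-globin: GAG (glutamate) becomes GUG (valine) in the mRNA. The snippet below is only a sketch; it includes just the two codons needed, and the helper `point_mutate` is a made-up name.

```python
# Only the two codons relevant to this example (mRNA alphabet).
CODONS = {"GAG": "Glu", "GUG": "Val"}


def point_mutate(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at `position` (0-based) replaced."""
    return codon[:position] + new_base + codon[position + 1:]


normal = "GAG"
sickle = point_mutate(normal, 1, "U")  # A -> U at the middle base
print(normal, CODONS[normal])  # GAG Glu
print(sickle, CODONS[sickle])  # GUG Val
```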
Evolution of β-Globin
(The α-globin cluster is encoded on chromosome 16)
The Evolution of α-Globin Across Species
Protein Structures
Protein Structure Is Of Central Importance
• Structure is determined experimentally through complex techniques
  o X-ray crystallography (diffraction); NMR
• The holy grail: compute structure from sequence
  o Ab initio: compute structure directly from sequence
  o Homology techniques: use similarity to known proteins
• Structure is conserved across wide variations in sequence
  o Small number of fold families (α-helix, β-sheets…)
  o There are rules (e.g., hydrophobic AA are packed inside)
• Nature folds proteins very fast
  o So why is it so difficult to predict structure?
SwissProt vs. PDB Statistics
PDB ~30k structures
Proteins Interact Via Active Sites
• Protein interactions are defined by active sites
  o E.g., antibody with pathogen
  o E.g., drug design
• Proteins use geometry: ligands latch into holes
• Proteins use physics: electrical fields
• How can protein-protein interactions be computed?
Sample Bioinformatics Challenges
Analyzing protein sequence similarity
Evolutionary conservation/changes
Computing structure from sequences
Analyzing structure homologies
Analyzing protein-protein interactions
Inferring function from structure
The Cell Cycle
Cells Operate In Cycles
• G0 Phase
  o Cell is at rest
• G1 Phase (4 hrs)
  o Cell either progresses into synthesis or leaves the cell cycle to differentiate
• S Phase (10 hrs)
  o DNA synthesis
  o Checkpoint determines integrity of DNA
• G2 Phase (4 hrs)
  o Cell prepares for mitosis
  o Checkpoint determines integrity of DNA
  o DNA is repaired or cell dies (apoptosis)
• Mitosis (2 hrs)
  o Chromosomes are separated
  o Cell divides
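The phase durations on this slide add up to a 20-hour cycle. A small Python sketch that tabulates them and looks up the phase at a given time; G0 is left out because a resting cell is outside the cycle, and `phase_at` is an illustrative helper that ignores checkpoints and assumes the cell cycles continuously.

```python
# Phase durations in hours, as given on the slide.
PHASES = [("G1", 4), ("S", 10), ("G2", 4), ("Mitosis", 2)]
CYCLE_HOURS = sum(hours for _, hours in PHASES)


def phase_at(hours_elapsed: float) -> str:
    """Return the phase of a continuously cycling cell at a given time."""
    t = hours_elapsed % CYCLE_HOURS
    for name, hours in PHASES:
        if t < hours:
            return name
        t -= hours
    raise AssertionError("unreachable: t is always below CYCLE_HOURS")


print(CYCLE_HOURS)   # 20
print(phase_at(5))   # S
print(phase_at(19))  # Mitosis
print(phase_at(21))  # G1
```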
The Cell Cycle is Regulated
• Transition among phases is controlled by a regulatory network
• Checkpoints are used to assure quality
Evolution
Optimizing Functionality
• DNA is substantially conserved through evolution
• Evolution = mutation + selection
  o Mutation = single nucleotide polymorphism (SNP); duplication of entire DNA segments; mating; recombination
  o Selection = optimize fitness of species
• Examples
  o Metabolic nets learn to optimize energy budget (Alon 05)
• Functional similarity ↔ Sequence similarity
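Sequence similarity, used above as a proxy for functional similarity, is often first measured as percent identity. A minimal Python sketch, assuming the two sequences are already aligned and of equal length (real comparisons need an alignment step, which is not shown); the function name and the example strings are made up.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Two made-up sequences differing by one substitution.
print(percent_identity("ACTTACGC", "ACTTGCGC"))  # 87.5
```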