Genes: Regulation and Structure
Many slides from various sources, including S. Batzoglou
Cells respond to environment
[Figure: a cell receives various external messages (heat, food supply) and responds to environmental conditions]
Genome is fixed – Cells are dynamic
• A genome is static
  - Every cell in our body has a copy of the same genome
• A cell is dynamic
  - Responds to external conditions
  - Most cells follow a cell cycle of division
• Cells differentiate during development
Gene regulation
• Gene regulation is responsible for the cell's dynamic behavior
• Gene expression varies according to:
  - Cell type
  - Cell cycle
  - External conditions
  - Location
Where gene regulation takes place
• Opening of chromatin
• Transcription
• Translation
• Protein stability
• Protein modifications
Transcriptional Regulation
• Strongest regulation happens during transcription
• Best place to regulate: no energy wasted making intermediate products
• However, slowest response time
After a receptor notices a change:
1. Cascade message to nucleus
2. Open chromatin & bind transcription factors
3. Recruit RNA polymerase and transcribe
4. Splice mRNA and send to cytoplasm
5. Translate into protein
Transcription Factors Binding to DNA
• Transcription regulation: certain transcription factors bind DNA
• Binding recognizes DNA substrings: regulatory motifs
Promoter and Enhancers
• Promoter necessary to start transcription
• Enhancers can affect transcription from afar
Regulation of Genes
[Figure, three animation steps: labels are Transcription Factor (protein), RNA polymerase (protein), DNA, Regulatory Element, and Gene; the final step adds New protein]
Example: A Human heat shock protein
[Figure: promoter of heat shock hsp70, from position -158 to 0 (start of GENE), with sites in order: SP1, CCAAT, AP2, HSE, CCAAT, SP1, TATA, AP2]
• TATA box: positioning transcription start
• TATA, CCAAT: constitutive transcription
• GRE: glucocorticoid response
• MRE: metal response
• HSE: heat shock element
Gene expression
DNA      CCTGAGCCAACTATTGATGAA
  (transcription)
RNA      CCUGAGCCAACUAUUGAUGAA
  (translation)
Protein  PEPTIDE
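The example above can be reproduced in a few lines. This is a minimal sketch, assuming the DNA shown is the coding strand (so transcription amounts to replacing T with U) and using the standard genetic code; the function names `transcribe` and `translate` are illustrative, not from the slides.

```python
# Standard genetic code, indexed by codons in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def transcribe(dna):
    """Coding-strand DNA to mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")

def translate(rna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    dna = rna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":  # stop codon
            break
        protein.append(amino)
    return "".join(protein)

rna = transcribe("CCTGAGCCAACTATTGATGAA")
print(rna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(rna))  # PEPTIDE
```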
The Genetic Code
Eukaryotes vs Prokaryotes
• “Typical” human & bacterial cells drawn to scale.
• Eukaryotic cells are characterized by membrane-bound compartments, which are absent in prokaryotes.
Brown Fig 2.1, BIOS Scientific Publishers Ltd, 1999
Prokaryotic genes – searching for ORFs
- Small genomes have high gene density
  Haemophilus influenzae – 85% genic
- No introns
- Operons
  One transcript, many genes
- Open reading frame (ORF): a contiguous set of codons that starts with a Met codon and ends with a stop codon.
Example of ORFs.
Each sequence has six possible reading frames: three for each direction of transcription.
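The six-frame ORF search described above can be sketched as follows. This is an illustrative scan, not a production gene finder: it assumes an ORF runs from the first ATG in a frame to the next in-frame stop codon, and the names `find_orfs` and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_frame(seq, frame):
    """Yield (start, end) of ATG...stop ORFs in one frame; end is exclusive."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3
            start = None

def find_orfs(seq, min_codons=2):
    """Scan three frames on each strand; return (strand, frame, orf) tuples."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame):
                if (end - start) // 3 >= min_codons:
                    results.append((strand, frame, s[start:end]))
    return results

print(find_orfs("CCATGAAATGACC"))  # [('+', 2, 'ATGAAATGA')]
```

In a real prokaryotic genome the `min_codons` threshold would be set much higher, since short ORFs occur by chance in every frame.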
Gene structure
[Figure: a gene laid out as exon1, intron1, exon2, intron2, exon3, passing through transcription, splicing, and translation]
• exon = protein-coding
• intron = non-coding
• Codon: a triplet of nucleotides that is converted to one amino acid
Finding genes
[Figure: 5’ to 3’ layout of Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, marking the start codon (ATG), the splice sites, and the stop codon (TAG/TGA/TAA)]
Sequence signals shown along the figure:
atg  caggtg  ggtgag  cagatg  ggtgag  cagttg  ggtgag  caggcc  ggtgag  tga
0. We can sequence the mRNA
• Expressed Sequence Tag (EST) sequencing is expensive
• It has a false-positive rate (aberrant splicing)
• The method sequences all RNAs, not just those that code for proteins
• It is difficult for rare genes (those expressed rarely or in low quantities)
• Still, this is an invaluable source of information (when available)
Biology of Splicing
(http://genes.mit.edu/chris/)
1. Consensus splice sites
Donor: 7.9 bits
Acceptor: 9.4 bits
(Stephens & Schneider, 1996)
(http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html)
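The "bits" quoted for the donor and acceptor sites are sequence-logo information content. As a rough sketch, the information at one position of an aligned site is 2 bits minus the Shannon entropy of the base frequencies there, and a site's total is the sum over its positions. The function below is illustrative and omits the small-sample correction used in published logos.

```python
import math

def information_content(freqs):
    """Bits at one position: log2(4) minus the entropy of the base frequencies."""
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
    return 2.0 - entropy

def site_information(columns):
    """Total bits for a site, given one frequency list (A, C, G, T) per position."""
    return sum(information_content(col) for col in columns)

print(information_content([0.25, 0.25, 0.25, 0.25]))  # 0.0 (no preference)
print(information_content([0.0, 0.0, 1.0, 0.0]))      # 2.0 (fully conserved)
```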
2. Recognize “coding bias”
• Each exon can be in one of three frames
ag—gattacagattacagattaca—gtaag Frame 0
ag—gattacagattacagattaca—gtaag Frame 1
ag—gattacagattacagattaca—gtaag Frame 2
Frame of next exon depends on how many nucleotides are left over from previous exon
• Codons “tag”, “tga”, and “taa” are STOP
  - No STOP codon appears in-frame until the end of the gene
  - A stretch with no in-frame STOP is called an open reading frame (ORF)
• Different codons appear with different frequencies: coding bias
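Both ideas above, the frame carried from exon to exon and the absence of in-frame STOP codons, are easy to check mechanically. The sketch below assumes one particular convention (an exon's frame is the total length of the preceding exons modulo 3); other texts number frames differently, and the function names are hypothetical.

```python
STOP_CODONS = {"tag", "tga", "taa"}

def exon_frames(exon_lengths):
    """Frame of each exon: nucleotides consumed by earlier exons, modulo 3."""
    frames, consumed = [], 0
    for length in exon_lengths:
        frames.append(consumed % 3)
        consumed += length
    return frames

def in_frame_stops(cds):
    """Positions of STOP codons that fall in frame 0 of a spliced sequence."""
    cds = cds.lower()
    return [i for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] in STOP_CODONS]

print(exon_frames([7, 5, 9]))           # [0, 1, 0]
print(in_frame_stops("atggattacatag"))  # [] ('tag' occurs, but out of frame)
print(in_frame_stops("atggattactag"))   # [9]
```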
2. Recognize “coding bias”
Amino acid      SLC    DNA codons
Isoleucine      I      ATT, ATC, ATA
Leucine         L      CTT, CTC, CTA, CTG, TTA, TTG
Valine          V      GTT, GTC, GTA, GTG
Phenylalanine   F      TTT, TTC
Methionine      M      ATG
Cysteine        C      TGT, TGC
Alanine         A      GCT, GCC, GCA, GCG
Glycine         G      GGT, GGC, GGA, GGG
Proline         P      CCT, CCC, CCA, CCG
Threonine       T      ACT, ACC, ACA, ACG
Serine          S      TCT, TCC, TCA, TCG, AGT, AGC
Tyrosine        Y      TAT, TAC
Tryptophan      W      TGG
Glutamine       Q      CAA, CAG
Asparagine      N      AAT, AAC
Histidine       H      CAT, CAC
Glutamic acid   E      GAA, GAG
Aspartic acid   D      GAT, GAC
Lysine          K      AAA, AAG
Arginine        R      CGT, CGC, CGA, CGG, AGA, AGG
Stop codons     Stop   TAA, TAG, TGA
Can map 61 non-stop codons to frequencies & take log-odds ratios
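One way to turn codon frequencies into a score is sketched below. The background here is an assumption made for illustration (every non-stop codon equally likely, 1/61); a real gene finder would more likely estimate the background from non-coding sequence. The pseudocount and the function names are also hypothetical.

```python
import math
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    a + b + c
    for a in "ACGT" for b in "ACGT" for c in "ACGT"
    if a + b + c not in STOP_CODONS
]  # the 61 non-stop codons

def codon_log_odds(coding_seqs, pseudocount=1.0):
    """log2(observed codon frequency / uniform 1/61 background) per codon."""
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts[c] for c in SENSE_CODONS) + pseudocount * len(SENSE_CODONS)
    return {
        c: math.log2(((counts[c] + pseudocount) / total) * len(SENSE_CODONS))
        for c in SENSE_CODONS
    }

def coding_score(seq, table):
    """Sum of log-odds over the frame-0 codons of seq (stop codons skipped)."""
    seq = seq.upper()
    return sum(table.get(seq[i:i + 3], 0.0) for i in range(0, len(seq) - 2, 3))

table = codon_log_odds(["ATGGCCGCCGCCTAA"])  # toy training set
print(coding_score("GCCGCC", table) > coding_score("GCAGCA", table))  # True
```

A positive score means the sequence uses codons that are over-represented in the training genes; scoring the same sequence in all three frames helps pick the coding frame.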
3. Genes are “conserved”
Approaches to gene finding
• Homology
  - Procrustes
• Ab initio
  - Genscan, Genie, GeneID
• Comparative
  - TBLASTX, Rosetta
• Hybrids
  - GenomeScan, GenieEST, Twinscan, SLAM…
HMMs for single species gene finding:
Generalized HMMs
HMMs for gene finding
State path: intergene, exon, intron, exon, intron, exon, intergene
GTCAGAGTAGCAAAGTAGACACTCCAGTAACGC
GHMM for gene finding
[Figure: the sequence below is segmented into Exon1, Exon2, and Exon3, each state emitting a whole segment with its own duration]
TAATATGTCCACGGGTATTGAGCATTGTACACGGGGTATTGAGCATGTAATGAA
Observed duration times
Better way to do it: negative binomial
• EasyGene: prokaryotic gene-finder (Larsen TS, Krogh A)
• Negative binomial with n = 3
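To see why a negative binomial is a better length model, compare it with the geometric distribution that a single self-looping HMM state produces. The sketch below uses the standard "k failures before the n-th success" parametrization, which is an assumption here; the slides do not say how EasyGene parametrizes it, and p = 0.1 is an arbitrary illustrative value.

```python
import math

def neg_binomial_pmf(k, n, p):
    """P(k failures before the n-th success), success probability p."""
    return math.comb(k + n - 1, k) * p**n * (1 - p)**k

p = 0.1
# n = 1 is the geometric case: the shortest duration is always the most likely.
print(neg_binomial_pmf(0, 1, p) > neg_binomial_pmf(10, 1, p))  # True
# n = 3 moves the peak away from zero, closer to observed length histograms.
print(neg_binomial_pmf(0, 3, p) < neg_binomial_pmf(10, 3, p))  # True
```

In HMM terms, n = 3 corresponds to chaining three identical self-looping states, so the total duration is the sum of three geometric waiting times.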
Splice Site Models
• WMM: weight matrix model = PSSM (Staden 1984)
• WAM: weight array model = 1st order Markov (Zhang & Marr 1993)
• MDD: maximal dependence decomposition (Burge & Karlin 1997), a decision-tree-like algorithm that takes significant pairwise dependencies into account
Splice site detection
Donor site (5’ to 3’): base frequency (%) by position
Position   -8   …   -2   -1    0    1    2   …   17
A          26   …   60    9    0    1   54   …   21
C          26   …   15    5    0    1    2   …   27
G          25   …   12   78   99    0   41   …   27
T          23   …   13    8    1   98    3   …   25
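The donor-site percentages above can be used directly as a weight matrix model (WMM). The sketch below scores a 5-mer covering positions -2 to 2 (the only columns fully given around the splice junction) as a sum of per-position log-odds. The uniform 25% background and the pseudocount of 0.5 (needed because some entries are 0) are assumptions, not values from the slides.

```python
import math
from itertools import product

# Percentages for positions -2, -1, 0, 1, 2, copied from the donor-site table.
DONOR_PERCENT = {
    "A": [60, 9, 0, 1, 54],
    "C": [15, 5, 0, 1, 2],
    "G": [12, 78, 99, 0, 41],
    "T": [13, 8, 1, 98, 3],
}

def wmm_score(site, pseudo=0.5, background=0.25):
    """Log2-odds of a 5-mer under the donor WMM versus a uniform background."""
    site = site.upper()
    if len(site) != 5:
        raise ValueError("site must cover positions -2..2 (5 bases)")
    score = 0.0
    for pos, base in enumerate(site):
        freq = (DONOR_PERCENT[base][pos] + pseudo) / (100 + 4 * pseudo)
        score += math.log2(freq / background)
    return score

best = max(("".join(s) for s in product("ACGT", repeat=5)), key=wmm_score)
print(best)                        # AGGTA
print(wmm_score("AGGTA") > 0)      # True
print(wmm_score("CCCCC") < 0)      # True
```

A WMM treats positions as independent; the WAM and MDD models listed earlier relax that assumption by conditioning on neighbouring or strongly dependent positions.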