Genes: Regulation and Structure

Many slides from various sources, including S. Batzoglou.

Cells respond to environment
• Various external messages, such as heat and food supply
• The cell responds to environmental conditions

Genome is fixed, cells are dynamic
• A genome is static
  - Every cell in our body has a copy of the same genome
• A cell is dynamic
  - Responds to external conditions
  - Most cells follow a cell cycle of division
• Cells differentiate during development

Gene regulation
• Gene regulation is responsible for the dynamic cell
• Gene expression varies according to:
  - Cell type
  - Cell cycle
  - External conditions
  - Location

Where gene regulation takes place
• Opening of chromatin
• Transcription
• Translation
• Protein stability
• Protein modifications

Transcriptional Regulation
• Strongest regulation happens during transcription
• Best place to regulate: no energy is wasted making intermediate products
• However, it has the slowest response time. After a receptor notices a change:
  1. Cascade message to nucleus
  2. Open chromatin & bind transcription factors
  3. Recruit RNA polymerase and transcribe
  4. Splice mRNA and send to cytoplasm
  5. Translate into protein

Transcription Factors Binding to DNA
• Transcription regulation: certain transcription factors bind DNA
• Binding recognizes DNA substrings: regulatory motifs

Promoter and Enhancers
• Promoter necessary to start transcription
• Enhancers can affect transcription from afar

Regulation of Genes
[Figure, three panels. Labels: transcription factor (protein), RNA polymerase (protein), DNA, regulatory element, gene. The last panel adds a new protein.]

Example: A Human heat shock protein
[Figure: promoter of heat shock hsp70, positions -158 to 0 (GENE). Labelled elements: SP1, CCAAT, AP2, HSE, CCAAT, SP1, TATA, AP2.]
• TATA box: positioning transcription start
• TATA, CCAAT: constitutive transcription
• GRE: glucocorticoid response
• MRE: metal response
• HSE: heat shock element

Gene expression
DNA       CCTGAGCCAACTATTGATGAA
            (transcription)
RNA       CCUGAGCCAACUAUUGAUGAA
            (translation)
Protein   PEPTIDE

The Genetic Code
[Figure: genetic code table.]

Eukaryotes vs Prokaryotes
• “Typical” human & bacterial cells drawn to scale.
• Eukaryotic cells are characterized by membrane-bound compartments, which are absent in prokaryotes.
(Brown, Fig 2.1, BIOS Scientific Publishers Ltd, 1999)

Prokaryotic genes: searching for ORFs
- Small genomes have high gene density (Haemophilus influenzae: 85% genic)
- No introns
- Operons: one transcript, many genes
- Open reading frame (ORF): a contiguous set of codons that starts with a Met codon and ends with a stop codon

Example of ORFs
Each sequence has six possible reading frames in which an ORF can lie: three for each of the two directions of transcription.
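The ORF definition above (a run of codons that starts with a Met codon and ends with a stop codon, searched in six reading frames) can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any particular gene finder: the function names (`reverse_complement`, `orfs_in_frame`, `find_orfs`) are hypothetical, the input is assumed to contain only A, C, G, T, and each ORF is reported from the first in-frame ATG to the next in-frame stop.

```python
# Illustrative six-frame ORF scan. All names here are hypothetical helpers.

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame):
    """Yield (start, end) for each ORF in one frame.

    start is the index of the ATG; end is exclusive and includes the stop codon.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i                  # first in-frame Met codon opens the ORF
        elif start is not None and codon in STOPS:
            yield start, i + 3         # in-frame stop codon closes it
            start = None


def find_orfs(seq):
    """Return (strand, frame, orf_sequence) over all six reading frames."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for a, b in orfs_in_frame(s, frame):
                results.append((strand, frame, s[a:b]))
    return results


if __name__ == "__main__":
    # Made-up toy sequence, not taken from the slides.
    print(find_orfs("AAATGCCCTAGAA"))
```

On the toy input the only ORF is ATGCCCTAG, in frame 2 of the forward strand. A real prokaryotic gene finder would also impose a minimum length and score the candidates, which this sketch leaves out.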
Gene structure
exon1 intron1 exon2 intron2 exon3
(transcription, then splicing, then translation)
exon = protein-coding
intron = non-coding
Codon: a triplet of nucleotides that is converted to one amino acid

Finding genes
[Figure: Exon 1 (5’, start codon ATG), Intron 1, Exon 2, Intron 2, Exon 3 (3’, stop codon TAG/TGA/TAA), with splice sites at the exon/intron boundaries.]
atg caggtg ggtgag cagatg ggtgag cagttg ggtgag caggcc ggtgag tga

0. We can sequence the mRNA
• Expressed Sequence Tag (EST) sequencing is expensive
• It produces some false positives (aberrant splicing)
• The method sequences all RNAs, not just those that code for proteins
• It is difficult for rare genes (those that are expressed rarely or in low quantities)
• Still, this is an invaluable source of information (when available)

Biology of Splicing
(http://genes.mit.edu/chris/)

1. Consensus splice sites
Donor: 7.9 bits
Acceptor: 9.4 bits
(Stephens & Schneider, 1996)
(http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html)

2. Recognize “coding bias”
• Each exon can be in one of three frames
  ag-gattacagattacagattaca-gtaag   Frame 0
  ag-gattacagattacagattaca-gtaag   Frame 1
  ag-gattacagattacagattaca-gtaag   Frame 2
  The frame of the next exon depends on how many nucleotides are left over from the previous exon
• Codons “tag”, “tga”, and “taa” are STOP
  - No STOP codon appears in-frame until the end of the gene
  - A stretch with no in-frame STOP is called an open reading frame (ORF)
• Different codons appear with different frequencies: coding bias

Amino Acid       SLC    DNA codons
Isoleucine       I      ATT, ATC, ATA
Leucine          L      CTT, CTC, CTA, CTG, TTA, TTG
Valine           V      GTT, GTC, GTA, GTG
Phenylalanine    F      TTT, TTC
Methionine       M      ATG
Cysteine         C      TGT, TGC
Alanine          A      GCT, GCC, GCA, GCG
Glycine          G      GGT, GGC, GGA, GGG
Proline          P      CCT, CCC, CCA, CCG
Threonine        T      ACT, ACC, ACA, ACG
Serine           S      TCT, TCC, TCA, TCG, AGT, AGC
Tyrosine         Y      TAT, TAC
Tryptophan       W      TGG
Glutamine        Q      CAA, CAG
Asparagine       N      AAT, AAC
Histidine        H      CAT, CAC
Glutamic acid    E      GAA, GAG
Aspartic acid    D      GAT, GAC
Lysine           K      AAA, AAG
Arginine         R      CGT, CGC, CGA, CGG, AGA, AGG
Stop codons      Stop   TAA, TAG, TGA

Can map the 61 non-stop codons to frequencies & take log-odds ratios

3. Genes are “conserved”

Approaches to gene finding
• Homology: Procrustes
• Ab initio: Genscan, Genie, GeneID
• Comparative: TBLASTX, Rosetta
• Hybrids: GenomeScan, GenieEST, Twinscan, SLAM…

HMMs for single species gene finding: Generalized HMMs

HMMs for gene finding
[Figure: state sequence intergene, exon, intron, exon, intron, exon, intergene over the sequence GTCAGAGTAGCAAAGTAGACACTCCAGTAACGC.]

GHMM for gene finding
[Figure: nucleotide sequence segmented into Exon1, Exon2, Exon3, with state durations marked.]

Observed duration times
Better way to do it: negative binomial
• EasyGene: prokaryotic gene-finder (Larsen TS, Krogh A)
• Negative binomial with n = 3

Splice Site Models
• WMM: weight matrix model = PSSM (Staden 1984)
• WAM: weight array model = 1st order Markov (Zhang & Marr 1993)
• MDD: maximal dependence decomposition (Burge & Karlin 1997), a decision-tree like algorithm to take significant pairwise dependencies into account

Splice site detection
Donor site, 5’ to 3’: nucleotide frequency (%) by position

Position   -8    …    -2    -1     0     1     2    …    17
A          26    …    60     9     0     1    54    …    21
C          26    …    15     5     0     1     2    …    27
G          25    …    12    78    99     0    41    …    27
T          23    …    13     8     1    98     3    …    25
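To make the weight matrix model (WMM) from the splice site slides concrete, the sketch below turns the donor-site percentages for positions -2 to 2 into log-odds scores and slides the matrix along a sequence. Several choices here are assumptions rather than anything stated in the slides: a uniform 25% background, a pseudocount of 1 percentage point so that the 0% entries do not produce log(0), base-2 logarithms, and the hypothetical names `log_odds_matrix`, `wmm_score` and `best_donor`.

```python
import math

# Donor-site percentages for positions -2, -1, 0, 1, 2 (from the slide table).
DONOR_PERCENT = {
    "A": [60, 9, 0, 1, 54],
    "C": [15, 5, 0, 1, 2],
    "G": [12, 78, 99, 0, 41],
    "T": [13, 8, 1, 98, 3],
}
BACKGROUND = 0.25   # assumption: uniform background frequency
PSEUDOCOUNT = 1.0   # assumption: avoids log(0) for the 0% entries


def log_odds_matrix(percent, background=BACKGROUND, pseudo=PSEUDOCOUNT):
    """Build one {base: log2(p / background)} dict per position."""
    width = len(percent["A"])
    matrix = []
    for pos in range(width):
        total = sum(percent[b][pos] + pseudo for b in "ACGT")
        matrix.append({
            b: math.log2((percent[b][pos] + pseudo) / total / background)
            for b in "ACGT"
        })
    return matrix


def wmm_score(site, matrix):
    """Sum the per-position log-odds; positions are treated as independent."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal matrix width")
    return sum(col[base] for col, base in zip(matrix, site.upper()))


def best_donor(seq, matrix):
    """Return (offset, score) of the highest-scoring window in seq."""
    width = len(matrix)
    seq = seq.upper()
    windows = ((i, wmm_score(seq[i:i + width], matrix))
               for i in range(len(seq) - width + 1))
    return max(windows, key=lambda pair: pair[1])


if __name__ == "__main__":
    matrix = log_odds_matrix(DONOR_PERCENT)
    # Exon/intron example string from the coding-bias slide, boundary marks removed.
    print(best_donor("gattacagattacagattacagtaag", matrix))
```

On the example string the best window is the one whose third and fourth bases are the GT at the start of gtaag. The per-position independence in `wmm_score` is exactly what WAM and MDD relax by conditioning on neighbouring or otherwise dependent positions.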