Genetics I (prokaryotes)
IT Carlow Bioinformatics
September 2006

Biochemistry
• How biology works
• Mechanisms

Genetics
• How things are inherited
• Why you are like your parents but also different
• Where genes, pathways, wings and flippers come from
• How things develop from a zygote
• But it’s all molecular biology nowadays

Genetics
• The interesting stuff
• “Nothing in biology makes sense except in the light of evolution” (Dobzhansky)
• “Nothing in bioinformatics makes sense except in the light of evolution” (Higgs & Attwood)
• Evolution = change in gene frequency over time
• What is a gene? What is frequency? What is change? What is time? What is life?

Genome size and differences
Species    Genome size (bp)    Genes
Human      3,000,000,000       25,000
Yeast      16,000,000          6,500
E. coli    4,000,000           4,000
• Gene ≈ 1000 bp ≈ 300 aa
• All descended from LUCA (the last universal common ancestor). How?

DNA
• Double helix
  – 10 Å radius (1 nm, or better 1.2 nm)
  – 34 Å per turn
  – 3.4 Å per base pair (0.34 nm)
  – 10 bp per turn
• The E. coli genome is 4 Mb: how many Å³ (or nm³) of DNA is that?
• Size of E. coli? About 1 × 2 µm
• Thinking exercise: what % of E. coli’s volume is DNA?

Mutation
• DNA damage from UV light, coal tar
• Replicative failure (DNA pol is good, but...)
• Humans: 2.5 × 10⁻⁸ per bp per cell division
• E. coli: 1 × 10⁻⁷ per bp per division
• Humans and chimps differ by only 1%, but that is 35 million differences
• You now have 10¹⁴ cells, starting from 1

Bases
• Purines (R): A, G are big. Purines Are biG
• Pyrimidines (Y): C, T (U in RNA) are tiny. CUT tinY

Base pairs
• Tm (melting temperature)!
• A-T pairs are weak (two hydrogen bonds); G-C pairs are strong (three), so GC-rich DNA has a higher Tm

Mutation 2
• E. coli has mutations
• Humans have somatic and germline mutations
• Point mutation
  – Transition: R to R or Y to Y (C↔T, G↔A)
  – Transversion: R to Y or Y to R (e.g. A↔T, C↔G, C↔A)
  – Missense: changes an amino acid
  – Nonsense: creates a stop codon (TGA, TAG, TAA)
  – Non-coding: splice sites, 5′ and 3′ regions, introns

Mutation 3
• Insertions and deletions
  – A 1 bp indel is sometimes called a “point” mutation
  – Frameshift:
      ATGCCCTGCAATGAC    (Met-Pro-Cys-Asn-Asp)
      ATGCCCCTGCAATGAC   (Met-Pro-Leu-Gln-stop)   Ooops
  – Methylation of C

Mutations 4
• Chromosomal rearrangement
  – Inversion
  – Translocation
• Chromosome copy number
  – Aneuploidy (Down’s syndrome)
  – Polyploidy (e.g. tetraploid)
  – Whole genome duplication (WGD)
• Mutational hotspots
• Repeats such as GCGCGCGCGC slip during replication = microsatellites

Genetic code
The “Universal” Genetic Code (rows: 1st base; columns: 2nd base; right: 3rd base)
1st   U         C         A         G         3rd
U     UUU Phe   UCU Ser   UAU Tyr   UGU Cys   U
      UUC Phe   UCC Ser   UAC Tyr   UGC Cys   C
      UUA Leu   UCA Ser   UAA stop  UGA stop  A
      UUG Leu   UCG Ser   UAG stop  UGG Trp   G
C     CUU Leu   CCU Pro   CAU His   CGU Arg   U
      CUC Leu   CCC Pro   CAC His   CGC Arg   C
      CUA Leu   CCA Pro   CAA Gln   CGA Arg   A
      CUG Leu   CCG Pro   CAG Gln   CGG Arg   G
A     AUU Ile   ACU Thr   AAU Asn   AGU Ser   U
      AUC Ile   ACC Thr   AAC Asn   AGC Ser   C
      AUA Ile   ACA Thr   AAA Lys   AGA Arg   A
      AUG Met   ACG Thr   AAG Lys   AGG Arg   G
G     GUU Val   GCU Ala   GAU Asp   GGU Gly   U
      GUC Val   GCC Ala   GAC Asp   GGC Gly   C
      GUA Val   GCA Ala   GAA Glu   GGA Gly   A
      GUG Val   GCG Ala   GAG Glu   GGG Gly   G

[Figure: Willie Taylor’s amino acids]

Mutations 5
• Synonymous: usually at the 3rd codon position
• Non-synonymous
  – Conservative: AAA to AGA, Lys to Arg
  – Radical: AAA to UAU, Lys to Tyr
• CpG methylation is a mutational hotspot
• CpG islands at the 5′ end of mammalian housekeeping genes

Mutations quiz
[Figure: a gene drawn 5′ to 3′ with exon and intron; mutation sites labelled T, A, U, C, G]
• Which of the mutations A, U, T, C, G are most likely to be baaaad?

Mutations & evolution
• Most bacteria have a characteristic mutational bias
• This gives a species-specific G+C content
  – E. coli 50%
  – B. subtilis 40%
  – Extremes: Mycoplasma (low), Micrococcus (high)
• Many bacteria have strand bias because the Okazaki enzymes have a different bias
• High-GC and low-GC Gram-positive bacteria

Quiz “answers”
Location          Rate (substitutions/site/year × 10⁻⁹)
5′                2.36
Synonymous        4.65   (synonymous sites not neutral?)
Non-synonymous    0.88
Intron            3.7
3′                4.46
Pseudogene        4.85

Substitution
• A mutation that has been sieved by selection
• Selection is a population/probability term
• Probability that a mutation will
  a) survive?
  b) become a polymorphism?
  c) replace the existing allele?
• Depends on population size

Bacterial genes/genomes
• E. coli: about 4000 genes, 4 Mb
• Tightly packed, usually no overlap
  – Viruses: even more tightly packed, with overlapping genes
• Origin of replication
  – Usually near dnaA
• DNA polymerase
  – Binds and copies
  – Needs gyrase, helicase etc.
  – 5′-3′ strand is read straight through
  – 3′-5′ strand is read in chunks: Okazaki fragments

Operons
• Jacob and Monod (and Lwoff)
• Lac operon: i p o z y a
• lacZ, lacY, lacA are induced and transcribed together
• lacI is adjacent but a separate transcript
• Molecular biology? Measure mRNA levels, β-gal
• Evolution? Co-transcription for better control

Odd operons
• Easy explanation when only E. coli and B. subtilis were available
• But M. jannaschii (the first archaeon sequenced)
  – Linked and co-transcribed, but biochemically mad
• Fallout from genome sequencing
• tRNA complement informs expression

Bioinformatic consequences
• RNA polymerase needs a binding site
• Promoter site upstream from the transcription start
• -35 and -10 boxes: TTGACA NNNNNNNNNNNNNNNNN TATAAT (17 bp spacer)
• Site-directed mutagenesis can parse the information
• Remember lacZ, Y, A are co-transcribed
• Then a 3′ trailer after the last stop codon
• Try to think of the 3-D picture

Gene structure
• Upstream control regions
• Start codon
• Open reading frame (ORF)
• Stop codon: UGA, UAG, UAA
• 3′ downstream region
• So gene prediction is “easy”

Consequences
• This view of how the process works
  – Colours our view of sequences
• Central dogma:
  – DNA makes RNA makes PROTEIN makes everything else
• RNA makes DNA means inheritance of acquired characteristics (Lamarck).
• Leads to a particular definition of “gene”

Translation
• Transcription gives you mRNA
• Translation gives you protein
• In bacteria, transcription and translation happen simultaneously
• Ribosome: a complex (cottage loaf) of two subunits, 50S and 30S = 70S
• 30S: 21 proteins (rpsX) and 16S RNA
• 50S: 34 proteins (rplX) and 23S + 5S RNA
• Needs tRNA, mRNA
• Ribosome binding site (RBS) upstream from the ATG

Summary
• What we know about genetics can help us identify genes bioinformatically
  – DNA signatures (RBS, promoter)
  – Start - ORF - stop pattern
  – Consistent codon usage
• Have we predicted a real gene?
  – Is it present as mRNA?
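The thinking exercise on the DNA slide (what % of E. coli’s volume is DNA?) can be worked as a back-of-envelope calculation. The Python sketch below assumes that both the double helix and the cell are plain cylinders, and uses the slide’s figures: a 1 nm (or 1.2 nm) helix radius, 0.34 nm per base pair, a 4 Mb genome and a 1 × 2 µm cell. The constant and function names are illustrative only.

```python
import math

# Figures from the DNA slide; the helix is modelled as a solid cylinder.
RISE_NM_PER_BP = 0.34       # 3.4 Å per base pair
HELIX_RADIUS_NM = 1.0       # 10 Å radius (the slide suggests 1.2 nm may be better)

# E. coli figures from the slides; the cell is also treated as a cylinder (assumption).
GENOME_BP = 4_000_000
CELL_DIAMETER_NM = 1_000    # about 1 µm across
CELL_LENGTH_NM = 2_000      # about 2 µm long


def cylinder_volume(radius: float, length: float) -> float:
    """Volume of a cylinder: pi * r^2 * length."""
    return math.pi * radius ** 2 * length


def dna_volume_nm3(genome_bp: int, radius_nm: float = HELIX_RADIUS_NM) -> float:
    """Volume of a genome's DNA stretched out as one straight cylinder."""
    return cylinder_volume(radius_nm, genome_bp * RISE_NM_PER_BP)


def dna_fraction_of_cell(genome_bp: int, radius_nm: float = HELIX_RADIUS_NM) -> float:
    """DNA volume divided by cell volume (a fraction, not a percentage)."""
    cell = cylinder_volume(CELL_DIAMETER_NM / 2, CELL_LENGTH_NM)
    return dna_volume_nm3(genome_bp, radius_nm) / cell


if __name__ == "__main__":
    for r in (1.0, 1.2):
        vol = dna_volume_nm3(GENOME_BP, r)
        pct = 100 * dna_fraction_of_cell(GENOME_BP, r)
        print(f"radius {r} nm: DNA volume {vol:.2e} nm^3, {pct:.2f}% of the cell")
```

Because π cancels, the fraction is simply r² × 0.34 nm × 4 × 10⁶ divided by (500 nm)² × 2000 nm: about 0.27% with a 1 nm radius and about 0.39% with 1.2 nm. Stretched out, the chromosome would be about 1.36 mm long, several hundred times the length of the cell.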
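The point-mutation categories on the Mutation 2 and Mutations 5 slides can be made concrete with a small classifier. This Python sketch assumes the standard genetic code from the Genetic code slide and single-base or single-codon changes; the function names are hypothetical. It deliberately stops at synonymous versus non-synonymous: telling conservative changes (Lys to Arg) from radical ones (Lys to Tyr) would need an amino-acid similarity scheme, which is not modelled here.

```python
from itertools import product

# Standard ("universal") genetic code, codons enumerated in TCAG order
# (first base varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

PURINES = {"A", "G"}      # R
PYRIMIDINES = {"C", "T"}  # Y (U is converted to T first)


def _dna(seq: str) -> str:
    """Upper-case and convert RNA (U) to DNA (T)."""
    return seq.upper().replace("U", "T")


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = _dna(ref), _dna(alt)
    pair = {ref, alt}
    if not pair <= (PURINES | PYRIMIDINES):
        raise ValueError("bases must be A, C, G, T or U")
    if ref == alt:
        raise ValueError("reference and alternative base are identical")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"      # R to R, or Y to Y
    return "transversion"        # R to Y, or Y to R


def codon_effect(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[_dna(ref_codon)]
    alt_aa = CODON_TABLE[_dna(alt_codon)]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "non-synonymous"


if __name__ == "__main__":
    print(substitution_type("C", "T"), substitution_type("A", "T"))
    print(codon_effect("AAA", "AAG"))   # Lys to Lys, 3rd-position change
    print(codon_effect("AAA", "AGA"))   # Lys to Arg, the slide's "conservative" case
    print(codon_effect("AAA", "UAU"))   # Lys to Tyr, the slide's "radical" case
    print(codon_effect("AAA", "UAA"))   # Lys to stop
```

Both of the slide’s non-synonymous examples come out simply as "non-synonymous"; the conservative/radical distinction is a judgement about amino-acid properties rather than about the code itself.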
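The Mutations & evolution slide says mutational bias gives each species a characteristic G+C content, and that many bacteria also show a strand bias. Two common summary statistics make this measurable: G+C content, and GC skew, (G - C)/(G + C), computed in windows along the chromosome. The Python sketch below assumes a plain DNA string; the function names and the window size are arbitrary choices, and using the cumulative-skew minimum to guess the replication origin is a heuristic that works for many, not all, bacteria.

```python
from itertools import accumulate


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_skew(seq: str, window: int = 10_000) -> list[float]:
    """(G - C) / (G + C) in consecutive, non-overlapping windows.

    A trailing partial window is ignored."""
    seq = seq.upper()
    skews = []
    for i in range(0, len(seq) - window + 1, window):
        chunk = seq[i:i + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_gc_skew(seq: str, window: int = 10_000) -> list[float]:
    """Running total of the windowed skew along the sequence."""
    return list(accumulate(gc_skew(seq, window)))


def skew_minimum_position(seq: str, window: int = 10_000) -> int:
    """Start coordinate of the window where the cumulative skew is lowest.

    Heuristic only: in many bacteria this lies near the origin of replication."""
    cumulative = cumulative_gc_skew(seq, window)
    if not cumulative:
        raise ValueError("sequence is shorter than one window")
    best = min(range(len(cumulative)), key=cumulative.__getitem__)
    return best * window
```

On a real genome one would load the chromosome sequence (for example from a FASTA file) and look at where the cumulative curve turns; the G+C content of the whole sequence is the species-level figure the slide quotes (about 50% for E. coli).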
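The Substitution slide asks how likely a new mutation is to replace the existing allele, and notes that the answer depends on population size. One classical way to put numbers on this, not taken from the slides, is Kimura’s diffusion approximation for a single new mutant in a diploid population of size N with selective advantage s: u = (1 - e^(-2s)) / (1 - e^(-4Ns)), which tends to 1/(2N) for a neutral mutation. The Python sketch below assumes additive fitnesses (1, 1 + s, 1 + 2s) and an effective population size equal to N; the function name is illustrative.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for one new mutant (initial frequency 1/2N).

    s: selective advantage per copy (fitnesses 1, 1 + s, 1 + 2s), assumed additive.
    n: diploid population size, assumed equal to the effective size.
    """
    if n < 1:
        raise ValueError("population size must be positive")
    if s == 0:
        return 1 / (2 * n)   # neutral: each of the 2N copies is equally likely to win
    try:
        # (1 - e^-2s) / (1 - e^-4Ns), written with expm1(x) = e^x - 1 for precision
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    except OverflowError:
        # strongly deleterious in a large population: effectively zero
        return 0.0


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, fixation_probability(0.0, n),
              fixation_probability(0.001, n), fixation_probability(-0.001, n))
```

The pattern matches the slide’s point: neutral mutations fix with probability 1/(2N), so they become rarer winners as populations grow, while a favourable mutation’s chance approaches about 2s and a slightly deleterious one is purged far more efficiently in a large population.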
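The Bioinformatic consequences slide gives the E. coli promoter as a -35 box and a -10 box separated by a spacer. A simple way to use that bioinformatically is to slide both boxes along a sequence and count mismatches to the consensus. The Python sketch below assumes the commonly quoted consensus TTGACA and TATAAT, a 17 bp spacer by default and an arbitrary mismatch cutoff; the names are hypothetical. Real promoters rarely match the consensus exactly and a scan like this produces many false positives, so hits are candidates only.

```python
MINUS_35 = "TTGACA"   # -35 box consensus
MINUS_10 = "TATAAT"   # -10 box consensus (Pribnow box)


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_promoters(seq: str, max_mismatches: int = 2, spacers=(17,)):
    """Yield (start, spacer, mismatches) for candidate -35/-10 box pairs.

    start is the 0-based position of the first base of the -35 box;
    mismatches is the total over both boxes.
    """
    seq = seq.upper()
    for spacer in spacers:
        span = len(MINUS_35) + spacer + len(MINUS_10)
        for i in range(len(seq) - span + 1):
            box35 = seq[i:i + len(MINUS_35)]
            box10 = seq[i + len(MINUS_35) + spacer:i + span]
            mm = mismatches(box35, MINUS_35) + mismatches(box10, MINUS_10)
            if mm <= max_mismatches:
                yield i, spacer, mm


if __name__ == "__main__":
    toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"
    print(list(scan_promoters(toy)))
    # Looser search: spacers of 16 to 18 bp and up to 3 mismatches.
    print(list(scan_promoters(toy, max_mismatches=3, spacers=range(16, 19))))
```

Site-directed mutagenesis, as the slide says, is what tells you which positions in the boxes actually matter; a position-specific scoring matrix built from known promoters would be the natural next step beyond plain mismatch counting.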
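The Gene structure and Summary slides describe the simplest gene-finding signal: a start codon, an open reading frame and an in-frame stop codon. The Python sketch below translates DNA with the standard genetic code and scans all six reading frames for ATG-to-stop ORFs. The function names are hypothetical, the minimum ORF length is an arbitrary assumption, and the sketch ignores the other signals the Summary lists (RBS, promoter, codon usage), which is why gene prediction is only “easy” in quotes.

```python
from itertools import product

# Standard genetic code, codons in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str, frame: int = 0) -> str:
    """Translate DNA (or RNA) from frame 0, 1 or 2; '*' is a stop codon.

    Codons containing anything other than A, C, G, T become 'X';
    a trailing partial codon is ignored."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def reverse_complement(seq: str) -> str:
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Yield (strand, start, end, protein) for ATG...stop ORFs in six frames.

    Coordinates are 0-based and end-exclusive on the strand that was scanned
    (for '-' they index into the reverse complement). Each ORF begins at the
    first in-frame ATG after the previous in-frame stop; ORFs with no stop
    before the end of the sequence are skipped.
    """
    seq = seq.upper().replace("U", "T")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, start, i + 3, translate(s[start:i + 3])
                    start = None


if __name__ == "__main__":
    # The frameshift example from the Mutation 3 slide.
    print(translate("ATGCCCTGCAATGAC"))
    print(translate("ATGCCCCTGCAATGAC"))
    # A toy sequence with one short ORF; min_codons is lowered to fit it.
    print(list(find_orfs("CCATGAAACCCGGGTTTTAAGG", min_codons=3)))
```

Translating the two Mutation 3 sequences shows what the “Ooops” is: MPCND becomes MPLQ*, because the extra C shifts the frame and brings a TGA stop into phase. Any ORF found this way is still only a prediction until, as the Summary asks, it is shown to be present as mRNA.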