Genomics of sensory systems
Lecture #4: Comparing genes
9/14/09

This week
  Homework #2 due on Wed
    Email with questions
    Email me answers or hand in in class
  Wed - I will be at Dept of Biology retreat
    Lecture will be given by Kelly O’Quin - expert in phylogenetics
    He will go over homework so it must be done before class

Questions for today
  0. More BLAST
  1. Where do we get high quality gene sequences?
  2. How do genes evolve?
  3. How do we compare genes?

How to find genes
  Start with genes which are known from model organisms
  Use these to pull out genes from genomes
  Compare genes to learn about sensory evolution

Blast - Genbank
  What database do you want to search?
  What do you want to compare?
  What program do you want to do the searching?

Types of blast queries
  Query                  Database               Type
  Nucleotide             Nucleotide             Blastn, Megablast, Discontiguous megablast
  Protein                Protein                Blastp, Psi-blast, Phi-blast
  Translated nucleotide  Protein                Blastx
  Protein                Translated nucleotide  Tblastn
  Translated nucleotide  Translated nucleotide  Tblastx

Defaults
  Database
  Program
  Confirm

Nucleotide BLAST = DNA nucleotide query vs nucleotide database
  Choices for programs
  Program                  Use for                          Word length
  Megablast                Highly similar sequences (>95%)  28
  Discontiguous megablast  Pretty similar sequences         11
  Blastn                   Dissimilar sequences             11

Translated blast = protein query vs translated database

BLAST a genome
  Request ID AWJ4D4B7012

BLASTing is fun
  This is meant to be enjoyable
  Be a genome explorer
  Find out what kind of data is out there
  Find out what kind of data isn’t there

QUESTIONS?

Q1. There is so much data in Genbank. How do you find GOOD data?
Example
  Bovine rhodopsin - 1st G protein coupled receptor to be sequenced
  Search Genbank with text
    49 entries
    Bovine opsin
    Bovine rhodopsin

Searching for genes
  Searching by text is fraught with peril
    Genbank has too many links
    Pull up many things that are not what you want
  BLAST is a better approach
  NCBI has also made records which combine all similar sequences into one

NCBI has done some of the work
  They have hand-curated data for some species to make a set of reference sequences
    Nucleotide sequences  NM_xxxxxx
    Protein sequences     NP_xxxxxx
  For human rhodopsin
    NM_000539
    NP_000530
  These are the gold standard for sequences

Homologene
  Homologs
    Two genes which arise in the common ancestor of two organisms and are passed down
    Implies genes perform same function in two organisms
    Therefore they can be compared to learn about evolution
  These 4 primates have many genes which are homologs and have been passed down from the primate ancestor
    Human, Chimp, Macaque, Bushbaby

Homologene search for rhodopsin

Three primary sequence portals:
  1. NCBI
  2. Ensembl - European Bioinformatics Institute (EBI)
  3. DNA database of Japan

  Select just genes
  Scroll down to find the gene you want
  Location
  Links to transcript and protein
  Orthologues are predicted and linked

OMIM - Online Mendelian Inheritance in Man

Good places to find genes
  Model organisms: NCBI homologene
  Genes from models and other organisms: Sanger Ensembl gene families
    NOTE: These are often predicted from genome sequences
    If there is a sequence in NCBI homologene, it may be different (and more accurate) than Sanger predictions
  OMIM is a good reference

Q2. How do genes change through time?
Change in actual sequence
  Mutation
  Recombination
Change in frequency of a sequence
  Selection - “survive” better
  Drift - get passed on by chance
  Migration - move between populations

Mutation vs selection
  Mutation = sequence change
    ATGCCGTGACGT
    ATGCCTTGACGT
  Selection/drift/migration = sequence frequency changes across a number of individuals
    ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG
    ATGTT ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG
    ATGTT ATGTT ATGTT ATGTT ATGTT ATGTT ATGTT ATGTT ATGTT ATGTT

Evolution as tinkerer
  Changes are typically small
  Mutation is source of new sequence
    Not all mutations are created equal
    Some occur more often than others
  Other forces shift frequency of particular sequence

Triplet amino acid code
  TTT  F, phe    TCT  S, ser    TAT  Y, tyr     TGT  C, cys
  TTC  F, phe    TCC  S, ser    TAC  Y, tyr     TGC  C, cys
  TTA  L, leu    TCA  S, ser    TAA  O, stop    TGA  J, stop
  TTG  L, leu    TCG  S, ser    TAG  B, stop    TGG  W, trp

  CTT  L, leu    CCT  P, pro    CAT  H, his     CGT  R, arg
  CTC  L, leu    CCC  P, pro    CAC  H, his     CGC  R, arg
  CTA  L, leu    CCA  P, pro    CAA  Q, gln     CGA  R, arg
  CTG  L, leu    CCG  P, pro    CAG  Q, gln     CGG  R, arg

  ATT  I, ile    ACT  T, thr    AAT  N, asn     AGT  S, ser
  ATC  I, ile    ACC  T, thr    AAC  N, asn     AGC  S, ser
  ATA  I, ile    ACA  T, thr    AAA  K, lys     AGA  R, arg
  ATG  M, met    ACG  T, thr    AAG  K, lys     AGG  R, arg

  GTT  V, val    GCT  A, ala    GAT  D, asp     GGT  G, gly
  GTC  V, val    GCC  A, ala    GAC  D, asp     GGC  G, gly
  GTA  V, val    GCA  A, ala    GAA  E, glu     GGA  G, gly
  GTG  V, val    GCG  A, ala    GAG  E, glu     GGG  G, gly

Mutation causes nucleotide change
  What about AA sequence?
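One way to explore this question for the mutation example above is to translate both sequences and compare them codon by codon. The following is a minimal sketch, not part of the lecture: it assumes the standard genetic code, writes stop codons as `*`, and assumes the slide's sequences are read in frame starting from their first base. The names `translate` and `changed_codons` are hypothetical helpers introduced for illustration.

```python
from itertools import product

# Standard genetic code, built in T, C, A, G order so that the 64 codons
# line up with the amino acid string below ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon, starting at the first base.

    Assumption: the sequence is already in the correct reading frame.
    Trailing bases that do not fill a whole codon are ignored.
    """
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def changed_codons(before, after):
    """List (codon index, codon before, codon after, AA before, AA after)
    for every codon that differs between two equal-frame sequences."""
    changes = []
    for i in range(0, min(len(before), len(after)) - 2, 3):
        cb, ca = before[i:i + 3], after[i:i + 3]
        if cb != ca:
            changes.append((i // 3, cb, ca, CODON_TABLE[cb], CODON_TABLE[ca]))
    return changes


# The two sequences from the "Mutation = sequence change" example.
original = "ATGCCGTGACGT"
mutated = "ATGCCTTGACGT"

print(translate(original))
print(translate(mutated))
for index, cb, ca, aa_b, aa_a in changed_codons(original, mutated):
    outcome = "AA stays the same" if aa_b == aa_a else "AA changes"
    print(f"codon {index}: {cb} -> {ca}, {aa_b} -> {aa_a} ({outcome})")
```

Under these assumptions both sequences translate to `MP*R`: the G to T change turns CCG into CCT, and both codons encode proline, so the nucleotide sequence changes while the amino acid sequence does not.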
Synonymous change
  Syn = same
  AA stays same
Nonsynonymous change
  Not same
  AA changes

Amino acid (AA) types
  Non-polar   A, F, G, I, L, M, P, V, W
  Polar       N, Q, S, T, Y
  Charged, +  H, K, R
  Charged, -  D, E
  Other       C
  Often changing AA within a group does not affect protein function

Selection
  Stabilizing selection - Acts to keep protein function the same
    Synonymous change more frequent than nonsynonymous
    Amino acid changes within a group are much more common than changes between groups
      Non-polar to non-polar
      Polar to polar

Similarity matrix
  A = alanine
  C = cysteine
  D = aspartic acid
  E = glutamic acid
  F = phenylalanine
  G = glycine
  H = histidine

Comparing sequences
  Can do at either nucleotide or AA level
  Gather sequences from a bunch of different organisms
  Need to align them so that sites which perform the same function can be compared

Aligning sequences
  Sequences may differ in length
  Often have differences at amino- or carboxy-terminus of the protein
  Need a way to align parts of protein that are performing the same function

Example - RH2 opsin in fishes
  Goldfish   MNGTEGNNFYVPLSNR
  Medaka     MENGTEGKNFYIPMNNR
  Zebrafish  MNGTEGSNFYIPMSNR
  Killifish  MGYGPNGTEGNNFYIPMSNK
  Trout      MQNGTEGSNFYIPMSNR
  Halibut    MVWDGGIEPNGTEGKNFYIPMSNR
  Cod        MRMEANGTEGKNFYIPMSNR
  Tetraodon  MVWDGGIEPNGTEGKNFYIPMSNR

Align sequences
  Zebrafish  M--------NGTEGSNFYIPMSNR
  Trout      M------Q-NGTEGSNFYIPMSNR
  Medaka     M------E-NGTEGKNFYIPMNNR
  Cod        M----RMEANGTEGKNFYIPMSNR
  Halibut    MVWDGGIEPNGTEGKNFYIPMSNR
  Tetraodon  MVWDGGIEPNGTEGKNFYIPMSNR
  Goldfish   M--------NGTEGNNFYVPLSNR
  Killifish  M---GYG-PNGTEGNNFYIPMSNK
             *        *****.***:*:.*:

  * identical
  : conserved
  . semi-conserved
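The conservation line under the alignment can be approximated with a short script. This is a simplified sketch, not the method used to produce the slide: it marks a column `*` when every sequence has the same residue, `:` when all residues fall in the same amino acid group from the lecture (non-polar, polar, charged +, charged -, other), and leaves a blank otherwise, including any column that contains a gap. Alignment programs usually derive `:` and `.` from a substitution matrix instead, so this scheme does not reproduce the `.` marks. The names `GROUP_OF` and `conservation_line` are hypothetical.

```python
# Amino acid groups as listed in the lecture.
AA_GROUPS = {
    "nonpolar": "AFGILMPVW",
    "polar": "NQSTY",
    "positive": "HKR",
    "negative": "DE",
    "other": "C",
}
GROUP_OF = {aa: name for name, members in AA_GROUPS.items() for aa in members}

# The aligned RH2 opsin fragments from the slide ('-' is a gap).
ALIGNMENT = {
    "Zebrafish": "M--------NGTEGSNFYIPMSNR",
    "Trout":     "M------Q-NGTEGSNFYIPMSNR",
    "Medaka":    "M------E-NGTEGKNFYIPMNNR",
    "Cod":       "M----RMEANGTEGKNFYIPMSNR",
    "Halibut":   "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Tetraodon": "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Goldfish":  "M--------NGTEGNNFYVPLSNR",
    "Killifish": "M---GYG-PNGTEGNNFYIPMSNK",
}


def conservation_line(seqs):
    """Return one mark per alignment column.

    '*' = identical in all sequences
    ':' = different residues, but all in the same lecture group
    ' ' = anything else, including columns with a gap
    """
    marks = []
    for column in zip(*seqs):
        residues = set(column)
        if "-" in residues:
            marks.append(" ")
        elif len(residues) == 1:
            marks.append("*")
        elif len({GROUP_OF[aa] for aa in residues}) == 1:
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


for name, seq in ALIGNMENT.items():
    print(f"{name:<10} {seq}")
print(f"{'':<10} {conservation_line(list(ALIGNMENT.values()))}")
```

With the lecture's groups, the I/V, M/L, S/N and R/K columns are all marked `:`, while the S/K/N column is left blank because K is charged and S and N are polar.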