What do you do with a whole genome sequence?

Translate it into all 6 reading frames. Identify all of the stop codons, and the start codons. You can then identify all Open Reading Frames (ORFs). But are they all real genes?

Three major prokaryotic gene modelers:

Generation uses predominantly 6-mer statistics to recognize coding regions; it uses a proximity rule-based start call with ATG and GTG as potential starts.

Glimmer uses interpolated Markov models (IMMs) to identify the coding regions; it uses ATG, GTG, and TTG as potential starts.

Critica uses blastn to produce alignments from the entire dataset and derives dicodon statistics to recognize coding sequences. It uses an SD (Shine-Dalgarno) sensor with ATG, GTG, and TTG as potential starts.

Now what? BLAST genes: to assign functions based on similarity with known genes.

BLAST (Basic Local Alignment Search Tool) finds regions of local similarity between sequences.

Query:
>my favorite gene
Atgtcgctagctagctsctagctag

Database of many gene sequences (GenBank is one example).

Answers the questions: Is there a match? And how good is it?

What are the genes doing? Function is assigned based on the degree of similarity to an already characterized gene in the database.

2 potential problems with this approach

Transitive catastrophe

Gene A: assigned function based on mutant phenotype or biochemical characterization of protein product
Gene B: from genome sequence, 70% identity to gene A
Gene C: from genome sequence, 60% identity to gene B
Gene D: from genome sequence, 70% identity to gene C
But Gene D has only 20% identity to gene A!
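
The transitive catastrophe can be illustrated with a small sketch in Python. The pairwise identities are the ones from the Gene A to Gene D example; the 50% cutoff and the function names are assumptions made only for this illustration, not values or methods from the lecture.

```python
# Pairwise percent identities from the Gene A..D example.
IDENTITY = {
    ("A", "B"): 70,
    ("B", "C"): 60,
    ("C", "D"): 70,
    ("A", "D"): 20,
}

THRESHOLD = 50  # hypothetical cutoff, chosen only for illustration


def identity(g1, g2):
    """Return percent identity for a pair in either order, or None if unknown."""
    return IDENTITY.get((g1, g2), IDENTITY.get((g2, g1)))


def propagate_transitively(chain, threshold):
    """Naive annotation: each gene inherits the function of its predecessor
    in the chain whenever the two are similar enough."""
    annotated = [chain[0]]  # the first gene is the characterized one
    for prev, gene in zip(chain, chain[1:]):
        ident = identity(prev, gene)
        if prev in annotated and ident is not None and ident >= threshold:
            annotated.append(gene)
    return annotated


def supported_by_source(gene, source, threshold):
    """Stricter check: compare the gene directly with the characterized gene."""
    ident = identity(gene, source)
    return ident is not None and ident >= threshold


naive = propagate_transitively(["A", "B", "C", "D"], THRESHOLD)
print("Annotated by chaining:", naive)
print("D directly supported by A:", supported_by_source("D", "A", THRESHOLD))
```

Chaining annotates all four genes, yet the direct comparison between Gene D and the experimentally characterized Gene A fails the same cutoff. That is the error the slide warns about: each step looks reasonable, but the function has drifted far from its evidence.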
Would like to propagate function only to orthologous genes.

Homolog: genes sharing a common origin. Note: two genes are homologs or they are not; there is no such thing as % homology or "more homologous".

Two main kinds of homologs:

Orthologs: genes originating from a single ancestral gene in the last common ancestor of the compared genomes
Paralogs: genes related via duplication

[Figure: X, Y, Z are genes in the same family; A, B, C are three species.]

Two more complicated cases:

Xenologs: genes originating from an HGT (horizontal gene transfer) of an ortholog in a distant lineage
Pseudoparalogs: homologous genes that appear to be paralogs in a single-genome analysis but have arisen due to a combination of vertical and lateral descent

How to identify orthologs. One way: reciprocal BLAST analysis.

Queries:
>Genome A gene 1
AGTGCATGTCCC
>Genome A gene 2
TGTGCGTAGTCCAAA
Database: Genome B

AND

Queries:
>Genome B gene 1
GGTTTTTACA
>Genome B gene 2
AAACCTCTCTGA
Database: Genome A

ASK: are two genes each other's best BLAST hit?

Can be confounded by lineage-specific gene loss.

What if there is nothing at all similar in the database? Call it a "hypothetical" gene.

If it has a match, but that match is to another hypothetical gene? Call it a "conserved hypothetical".

[Figure: pie chart of predicted genes by functional category. Labeled slices: Conserved Hypothetical 32%, Hypothetical 25%. The remaining categories each account for between 1% and 20% in the original chart, but the extracted text does not preserve which value belongs to which category: DNA Replication & Repair, Energy Metabolism, Lipid Metabolism, Amino Acid Metabolism, Carbohydrate Metabolism, Cofactor Metabolism, Nucleotide Metabolism, Transcription, Translation, Transport, Unassigned.]
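
The reciprocal BLAST test described above (are two genes each other's best hit?) can be sketched in Python. This is a minimal illustration, not a full pipeline: it assumes the BLAST results have already been reduced to (query, subject, score) tuples, and the gene names and scores below are invented for the example. Parsing real BLAST output is not shown.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score) tuples.
    On a tied score the hit seen first is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_in_A, gene_in_B) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: Genome A genes searched against Genome B ...
a_vs_b = [
    ("A_gene1", "B_gene1", 250.0),
    ("A_gene1", "B_gene2", 90.0),
    ("A_gene2", "B_gene1", 120.0),
    ("A_gene2", "B_gene2", 80.0),
]
# ... and Genome B genes searched against Genome A.
b_vs_a = [
    ("B_gene1", "A_gene1", 248.0),
    ("B_gene1", "A_gene2", 118.0),
    ("B_gene2", "A_gene2", 80.0),
]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data A_gene2's best hit is B_gene1, but B_gene1's best hit is A_gene1, so only A_gene1 and B_gene1 are reported as a candidate ortholog pair. A one-way best hit like A_gene2's is the pattern expected when the true ortholog has been lost in one lineage, which is how lineage-specific gene loss confounds simple best-hit analysis.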