Introduction to Bioinformatics II
Lecture 6
By Ms. Shumaila Azam

• Gene: A sequence of nucleotides coding for a protein.
• Gene Prediction Problem: Determine the beginning and end positions of genes in a genome.

Gene Prediction: Computational Challenge
aatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgc
taatgcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcg
gctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggct
atgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccga
tgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcg
gctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgc
ggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctg
ggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcat
gcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctat
gctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcgg
ctatgctaagctcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgaca
atgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctat
gctaatgcatgcggctatgctaagctcggctatgctaatgaatggtcttgggatttaccttggaatgctaag
ctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatg
catgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggc
tatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatg
cggctatgctaagctcatgcgg
Gene!

Central Dogma: DNA -> RNA -> Protein
DNA: CCTGAGCCAACTATTGATGAA
(transcription)
RNA: CCUGAGCCAACUAUUGAUGAA
(translation)
Protein: PEPTIDE

Gene Prediction
• Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.
• In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes:
– protein-coding genes
– RNA genes
– regulatory regions

Gene Prediction
• In its earliest days, gene finding relied on experiments on living cells and organisms: statistical analysis of the rates of homologous recombination of several different genes could determine their order on a certain chromosome, and information from many such experiments could be combined to create a genetic map specifying the rough location of known genes relative to each other.
• Determining that a sequence is functional should be distinguished from determining the function of the gene or its product.
– Determining function still relies on in vivo experimentation, such as gene knockout.
– Advances in bioinformatics research are making it increasingly possible to predict the function of a gene based on its sequence alone.

Extrinsic approaches
• In extrinsic (or evidence-based) gene finding systems, the target genome is searched for sequences that are similar to extrinsic evidence in the form of the known sequence of a messenger RNA (mRNA) or protein product.
• Given an mRNA sequence, it is trivial to derive a unique genomic DNA sequence from which it had to have been transcribed.
• Given a protein sequence, a family of possible coding DNA sequences can be derived by reverse translation of the genetic code.

Extrinsic approaches
• Once candidate DNA sequences have been determined, it is a relatively straightforward algorithmic problem to efficiently search a target genome for matches, complete or partial, and exact or inexact.
• BLAST is a widely used system designed for this purpose.

Ab initio approaches
• Ab initio gene prediction is an intrinsic method based on gene content and signal detection.
• Because of the inherent expense and difficulty in obtaining extrinsic evidence for many genes, it is also necessary to resort to ab initio gene finding.
• The genomic DNA sequence alone is systematically searched for certain tell-tale signs of protein-coding genes.
• These signs can be broadly categorized as either signals (specific sequences that indicate the presence of a gene nearby) or content (statistical properties of the protein-coding sequence itself).

Ab initio approaches (prokaryotes)
• In the genomes of prokaryotes, genes have specific and relatively well-understood promoter sequences (signals).
• The sequence coding for a protein occurs as one contiguous open reading frame (ORF).
• Because 3 of the 64 possible codons are stop codons, one would expect a stop codon approximately every 20–25 codons, or 60–75 base pairs, in a random sequence. A much longer stretch without a stop codon therefore suggests a gene.
• These characteristics make prokaryotic gene finding relatively straightforward, and well-designed systems are able to achieve high levels of accuracy.

Open Reading Frame Finder (Input)

Open Reading Frame Finder (Output)

Ab initio approaches (eukaryotes)
• Ab initio gene finding in eukaryotes, especially complex organisms like humans, is considerably more challenging.
• First: the promoter and other regulatory signals in these genomes are more complex and less well understood.
• Two classic examples of signals identified by eukaryotic gene finders are CpG islands and binding sites for a poly(A) tail.
• Second: splicing mechanisms mean that a protein-coding sequence is divided into several exons separated by non-coding introns, so a gene is no longer one contiguous ORF.

Combined approaches
• Combined approaches merge extrinsic and ab initio methods by mapping protein and EST data to the genome to validate ab initio predictions.

Comparative genomics approaches
• As the entire genomes of many different species are sequenced, a promising direction in current research on gene finding is a comparative genomics approach.
• This is based on the principle that the forces of natural selection cause genes and other functional elements to undergo mutation at a slower rate than the rest of the genome.
• Genes can thus be detected by comparing the genomes of related species.
• This approach was first applied to the mouse and human genomes.

GeneMarkS
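
As a rough illustration of the Central Dogma slide and of reverse translation in the extrinsic approach, the Python sketch below transcribes and translates the slide's example sequence, then counts how many coding DNA sequences could encode a given protein. It assumes the standard genetic code and that the DNA shown is the coding strand. The function names (transcribe, translate, count_reverse_translations) are made up for this example and do not come from the lecture.

```python
from math import prod

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Coding-strand DNA -> mRNA (assumption: input is the coding strand)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate mRNA codon by codon from position 0, stopping at a stop codon."""
    dna = rna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


def count_reverse_translations(protein):
    """Number of distinct coding sequences that encode the given protein."""
    codons_per_aa = {}
    for amino_acid in CODON_TABLE.values():
        codons_per_aa[amino_acid] = codons_per_aa.get(amino_acid, 0) + 1
    return prod(codons_per_aa[aa] for aa in protein.upper())


rna = transcribe("CCTGAGCCAACTATTGATGAA")
print(rna)                                    # CCUGAGCCAACUAUUGAUGAA
print(translate(rna))                         # PEPTIDE
print(count_reverse_translations("PEPTIDE"))  # 1536
```

Even a seven-residue protein has 1536 possible coding sequences, which is why reverse translation yields a family of candidates rather than a single DNA sequence.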
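
The prokaryotic ORF idea can be sketched in a few lines of Python. This is a minimal six-frame scan that reports ATG-to-stop runs. It is not the algorithm used by the Open Reading Frame Finder or GeneMarkS tools named on the slides. The names find_orfs and min_codons are hypothetical, and the sketch assumes ATG is the only start codon and that the standard stop codons apply.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# With uniform random bases, 3 of 64 codons are stops, so a stop is
# expected about every 64 / 3 = 21.3 codons (roughly 64 base pairs).
EXPECTED_CODONS_BETWEEN_STOPS = 64 / len(STOPS)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Scan all six reading frames for ATG...stop runs.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    end-exclusive, and relative to the strand that was scanned.
    min_codons counts the start and stop codons as part of the ORF.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


# Toy example: one 5-codon ORF (ATG AAA TTT GGG TAA) in frame 2.
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=5))
# [('+', 2, 2, 17)]
```

On real genomes a threshold well above the roughly 21-codon random expectation would normally be used, since short ATG-to-stop runs occur by chance. The default of 30 here is only a placeholder.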
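
To make the CpG island signal from the eukaryote slide concrete, the sketch below computes the two statistics usually used to describe one: GC content and the observed/expected CpG ratio. The thresholds (length of at least 200 bp, GC content above 50%, ratio above 0.6) are a commonly cited heuristic and are an assumption here, not something stated in the lecture. The function names are hypothetical.

```python
def cpg_stats(window):
    """Return (gc_content, observed/expected CpG ratio) for a DNA window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # CpG dinucleotides: C immediately followed by G
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (c * g) / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(window, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed heuristic thresholds to a single window."""
    gc_content, obs_exp = cpg_stats(window)
    return len(window) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CGCGCGCG"))              # (1.0, 2.0)
print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("AT" * 100))  # False
```

A gene finder would slide such a window along the genome and treat high-scoring regions as one piece of evidence for a nearby promoter, alongside other signals.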