On line (DNA and amino acid) Sequence Information
Lecture 9

Introduction
• Annotation of genes
• Basic bioinformatics databases
• NCBI home page
• Query and return results
• DNA sequence results page
• Protein sequence results page

Bioinformatics Databases
• Biological data generated by various labs is submitted to and stored in specific databases.
• The data can be:
– Nucleotide: DNA and mRNA (cDNA)
– Protein sequences
• The main nucleotide sequence databases are:
– United States: GenBank (NCBI)
– Europe: Nucleotide Sequence Database (EMBL)
– Japan: DNA Data Bank of Japan (DDBJ)
• These databases also contain sequences related to:
– Expressed sequence tags (ESTs): short (800 bp) stretches of mRNA that can be used to see which genes are expressed…

Protein Databases
• The main protein database is UniProt, which contains data from three related databases:
– SWISS-PROT (most up-to-date information)
– TrEMBL (translation of coding sequences)
– PIR (Protein Information Resource)
• Both the nucleotide and protein databases contain much more detail than just sequences. This additional data is referred to as gene annotation data.

The Annotation of Genes
• Once the gene sequences have been determined, the data must be annotated. The basic annotation includes (Klug 2010):
– Regulatory regions
– Coding sequences (CDS) and, if the sequence is eukaryotic, the exons and introns….
– The amino acid sequence for the gene
– Other organisms in which the DNA or amino acid sequence is found
– Journal references for where the data came from
– Links to other databases that contain information about the gene

Global Sequence Bioinformatics Database
• To facilitate finding annotated data about genes and proteins, a number of sites provide specific search engines:
– NCBI has Entrez
– EMBL has the EBI search page (previously the SRS engine)
– SIB has the ExPASy search engine (this is more focused on protein-related information)
• Consider the following query:
– What is the DNA and amino acid sequence for the following gene: Human BTEB
– Type the following into the search text box:
– Human[organism] AND BTEB[title]

NCBI Entrez search page

BTEB NCBI Nucleotide Record

Coding section of gene
The exon/intron structure is also available in graphic form.

Further information
• In the right-hand column you will find links to online analytical resources, e.g. BLAST (PSI-BLAST), a tool to search for similar sequences contained in the database.
• Information on the amino acid sequence obtained for the CDS of the gene. The text box also provides a link to information on the protein in the UniProt database.

An EMBL nucleotide record
• Annotated data can also be found in the EMBL database:
• BTEB EMBL record: shows the main record.
• Clicking on the “text” link at the top right-hand corner will give the essential features of the gene: BTEB-EMBL-EBI_text_record.
• An ExPASy database search gives the following information for this gene: type BTEB, and then BTEB and Human.

The BTEB Protein record
A link to a graphic representation of the protein and the relevant annotated data can be found at: BTEB Human Protein

Other databases
• The nucleotide (GenBank and EMBL) and protein (UniProt) databases contain the “raw data” and are referred to as “primary databases”.
– More specific databases derive data from these and are referred to as secondary databases; examples include protein family and sequence similarity databases such as PROSITE and PRINTS.
– There are databases that contain information about specific organisms, such as E. coli; these can be found using the Genomes OnLine Database (GOLD).

Other databases
– Databases for specific types of sequences, such as those associated with promoters and other regulatory elements; dbEST; Homologous Structure Alignment Database.
– Structural databases from the Protein Data Bank
– Online Mendelian Inheritance in Man (OMIM), which contains information on human genes and genetic disorders.
• The January issue of the journal Nucleic Acids Research provides an up-to-date analysis of current online bioinformatics databases: Nucleic Acids Research database issue

Other important information sources
• PubMed: literature research covering journal articles, conference proceedings, books, etc.
– Search under many fields: keyword, author….
– Returns: journal articles/abstracts
– Two types: general/review.
– BTEB PubMed search found at:
• http://www.ncbi.nlm.nih.gov/pubmed?term=BTEB&cmd=DetailsSearch
• The user can register an NCBI account to manage their activity and store the findings of gene searches, PubMed searches…. This information can be downloaded, emailed….

BTEB PubMed search result

Exercise
• The EMBL-EBI record: BTEB_“text”_record.
• The NCBI record: BTEB NCBI Nucleotide Record
• The DDBJ record: BTEB flatfile Record
• Exercise: write a brief report comparing and contrasting the core elements of these records. Refer to pages 8-16 in Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd edition; the book can be found in the library.

Exercise
• Search for the following gene “DNA” sequence:
– Human Leukocyte Elastase gene, linear DNA [hint: it should be 5292 bp long].
– Retrieve the record, then download and save the FASTA file.
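
One way to check the file saved in the last exercise is to read it back and count the bases it contains. The following is only a minimal sketch, assuming a plain-text FASTA file (a header line starting with ">" followed by sequence lines). The function name read_fasta and the toy record are illustrative and are not part of the lecture material; the toy sequence is not the real elastase sequence.

```python
from io import StringIO


def read_fasta(handle):
    """Return a list of (header, sequence) tuples from a FASTA text handle."""
    records = []
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without newlines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy record for illustration only (hypothetical header and sequence).
example = StringIO(">toy_record hypothetical example\nATGCATGCAT\nGGCCTTAA\n")
records = read_fasta(example)
header, sequence = records[0]
print(header, len(sequence))
```

To apply this to the downloaded record, pass an opened file instead of the toy text, for example open("elastase.fasta") (a hypothetical file name), and compare len(sequence) with the 5292 bp given in the hint.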