Sequence Analysis
Bioinformatics – a definition?

  The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology
  OR
  Biologists doing “stuff” with computers?

  Here we consider the use of Bioinformatics tools rather than their design and construction.
  Here we consider the access and analysis of data and information items rather than their generation, storage or annotation.

Software Tools for Sequence Analysis

  General Packages: Packages that offer a comprehensive range of bioinformatics tools for sequence analysis. Most researchers would expect to use such packages at some time.
  Specialised Packages: Packages that offer tools for a particular type of analysis. Used intensely by researchers in the relevant area, not at all by everyone else.
  WWW Resources: Tools whose nature inclines them to be primarily accessed over the network.

  These categorisations are very general.
  Many specialist programs are incorporated into the general packages.
  Most things can be done at a web site somewhere.
Sequence Analysis – an Overview

  Sequencing Project Management; Database Retrieval; Restriction Mapping; Primer Design
  Nucleic Acid Sequences; DNA/RNA Folding; Database Retrieval; Nucleic Acid Sequence Analysis
  Protein Sequences; Database Similarity Searching; Seeking Coding regions; Translation to amino acids
  Pairwise Sequence Comparison; Protein Sequence analysis; Prediction of Function; Phylogeny
  Motifs and Patterns; Multiple Sequence Alignment; Structure prediction; Structure analysis

Software Tools for Sequence Analysis
General Packages:

  GCG Wisconsin Package
  Commercial
  UNIX only
  WWW and X GUIs
  Comprehensive
  Widely available

  Open source
  UNIX only
  Several GUIs (java, WWW, X)
  Comprehensive
  Similar structure to the GCG package

  Windows, MacOS X, UNIX
  Open source
  Excellent GUI including interactive graphical output
  Not comprehensive but allows access to EMBOSS

Software Tools for Sequence Analysis
General Packages:

  Commercial
  Expensive
  Windows PCs or Macintoshes
  Good GUIs

  Public Domain
  Windows, Macintosh, UNIX
  Modern intuitive GUI
  Access remote databases

  Other options

Software Tools for Sequence Analysis
Specialised Packages
Sequencing Project Management

  “The Phred - Phrap Package” by Phil Green et al
  Free academic licence
  Excellent base call confidence estimation (phred)
  Excellent large scale contig assembler (phrap)

  Available by anonymous ftp
  Excellent GUI
  Excellent contig editor
  Excellent finishing tools
  Simple confidence estimation
  Contig assembler – not good for big projects
  BUT phred and phrap can be
  accessed from Staden GUI

Software Tools for Sequence Analysis
Specialised Packages

  DNA/RNA Folding
    Free for academic use
    Can be installed locally or run via a WWW page
    Michael Zuker's Programs
    Incorporated into the GCG general package

  Protein Structure Analysis
    Nominal fee for academic use
    LINUX, IRIX, Windows
    Whatif by Gert Vriend

Software Tools for Sequence Analysis
Specialised Packages
Protein Structure Analysis – for very rich people

  SYBYL
    IRIX, HP-UX, LINUX
  Insight II
    IRIX, AIX, LINUX
  Both systems are very impressive and very expensive

Software Tools for Sequence Analysis
Specialised Packages
Phylogeny

  Available by anonymous ftp
  Windows, Macintosh, UNIX
  PHYLIP
  Incorporated into the EMBOSS general package

  Commercial, but reasonable
  UNIX, VMS, DOS and Windows
  Incorporated into the GCG general package

Software Tools for Sequence Analysis
WWW Resources
Database Retrieval

  Sequence Retrieval System
  Retrieves MUCH more than sequences
  Core elements free to academic sites
  Bioscience AG
  Implemented in many places
  It is possible to integrate analysis tools
  Elements of SRS are incorporated into EMBOSS

Software Tools for Sequence Analysis
WWW Resources
Database Retrieval

  Retrieves MUCH more than sequences
  Access to NCBI databases only
  Entrez client software available by anonymous ftp

  Most general packages include tools to access local sequence databases
  EMBOSS programs can access sequences from remote SRS servers

Software Tools for Sequence Analysis
Database Similarity Searching
WWW Resources

  Very popular,
  very widely available
  Not sensitive – But extremely fast

  FASTA
  Popular, widely available
  Not sensitive – much slower than blast

  BOTH blast & fasta
  Can be installed locally or run via a WWW page
  Available by anonymous ftp (blast, fasta)
  DNA/Protein query vs DNA/Protein database
  Incorporated into the GCG general package

Software Tools for Sequence Analysis
Database Similarity Searching
WWW Resources

  MPsrch
  Fully sensitive
  Slow algorithm – fast computers
  Protein vs Protein only
  Major use when blast/fasta fail
  Exclusively a WWW resource

Software Tools for Sequence Analysis
WWW Resources
Structure prediction

  Was consensus service now JNet only
  JNet available by anonymous ftp

  Older service, similar approach to JNet
  Burkhard Rost
  Main element is called PHD

  Both JPred and PHD work best from aligned protein families
  Simpler methods predicting from single sequences in most general packages

Software Tools for Sequence Analysis
WWW Resources
Other WWW services

  General Services:
    EBI
    Pasteur Institute
    And many more
  Protein sequence analysis
    Expasy
  Gene finding
    genscan at MIT (Free academic license)
    Simple gene finding in most general packages
  Primer design
    primer3 at MIT (Available by anonymous ftp)
    Primer design in most general packages
    Primer design in EMBOSS is primer3

Databases

  Databases are available from WWW sites and highly interlinked
  Clinical and Mutation
    OMIM
    MGMD
  Bibliographic
    PubMed
  Raw Sequence
    As accessed for “sequence retrieval”

Databases
Sequence Databases

  Contain both raw sequence data and annotation
  DNA Sequences
    (European Molecular Biology Laboratory)
    GenBank (NCBI)
    Refseq (NCBI)
    DNA Data Bank of Japan
  Protein Sequences
    Refseq (NCBI)
    PIR
    Trembl (GenPept)

Databases

  Databases are available from WWW sites and highly interlinked
  Clinical and Mutation
    OMIM
    MGMD
  Bibliographic
    PubMed
  Raw Sequence
    As accessed for “sequence retrieval”
  Alignments and Patterns
    As generated by analysis software

Databases
Alignments and Patterns
Alignments

  Aligned protein families
  Comprised of a number of sections

  Aligned protein domains
  Automatically generated from protein sequence databases

  Conserved “blocks” of protein alignments
  Used to compute scoring schemes for protein comparisons

Databases
Alignments and Patterns
Patterns

  Patterns are largely derived from the conserved portions of aligned protein families

  Representations of single motifs
  Now comprised of both simple patterns and HMM profiles

  Representations of patterns of motifs (fingerPRINTS)

Databases

  Databases are available from WWW sites and highly interlinked
  Clinical and Mutation
    OMIM
    MGMD
  Bibliographic
    PubMed
  Raw Sequence
    As accessed for “sequence retrieval”
  Alignments and Patterns
    As generated by analysis software
  Structural
    PDB
  Integrated
    Ensembl

The End.
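Supplementary sketch: scanning a protein with a "simple pattern"

The Patterns slide mentions databases built from "simple patterns" of single motifs. The following is a minimal sketch of how such a pattern might be applied to a sequence, assuming the patterns follow PROSITE-style syntax (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for forbidden residues, "(n)" or "(n,m)" for repeats). The slides do not name the syntax, so that is an assumption, and the function names prosite_to_regex and find_motifs are hypothetical helpers written for this illustration, not part of any package listed above. Less common syntax (for example anchors inside brackets) is not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles the common subset only: x, [..], {..}, (n), (n,m), < and >.
    """
    pattern = pattern.rstrip(".")  # published patterns often end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # '<' anchors the match at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # '>' anchors the match at the C-terminus
            suffix, element = "$", element[:-1]
        # Split an optional repeat count, e.g. "x(2,4)" -> ("x", "2", "4")
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # forbidden residues
            core = "[^" + core[1:-1] + "]"
        # "[ABC]" and single letters are already valid regex fragments
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motifs(pattern: str, sequence: str):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = "(?=(" + prosite_to_regex(pattern) + "))"  # lookahead keeps overlaps
    return [(m.start() + 1, m.group(1)) for m in re.finditer(regex, sequence)]
```

For example, the N-glycosylation site pattern N-{P}-[ST]-{P} becomes the regular expression N[^P][ST][^P], and find_motifs("N-{P}-[ST]-{P}", "MNGSANPTLNKTA") returns [(2, 'NGSA'), (10, 'NKTA')]; the sequence here is made up for the illustration. The HMM profiles mentioned on the same slide need dedicated software and are not covered by this sketch.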