Bioinformatics and Phylogenetic Analysis
Edgar Scott
Multicampus Bioinformatics Education Specialist

What is Bioinformatics?
- Interdisciplinary field that applies principles and techniques from computer science, probability and statistics, and linguistics to the study of genomic and proteomic sequences.
- Biological databases for storing and organizing DNA and protein sequences
- Computational tools for analyzing sequences

Phylogenetic Analysis and Bioinformatics
- Phylogenetics – study of evolutionary relationships
- Phylogenetic trees are used to represent evolutionary relationships
- Use of protein or DNA sequences, rather than morphological characters, to detect relationships
- Bioinformatics provides both sequence repositories and sequence analysis software.

Overview
- Acquiring Data Set
  - Text searching at the National Center for Biotechnology Information (NCBI)
  - Sequence similarity and homology
  - Sequence similarity searching with Basic Local Alignment Search Tool (BLAST)
- Analyzing Data Set
  - Phylogenetic Analysis with Molecular Evolutionary Genetics Analysis (MEGA) 3.1 software
  - Build multiple sequence alignments using ClustalW
  - Build phylogenetic trees

Text Searching at NCBI
- NCBI provides molecular information and bioinformatics tools to the scientific community
- GenBank – an archival DNA and protein sequence database
- RefSeq – a curated DNA and protein sequence database
- Entrez Gene – a gene-centered database

Sequence Similarity and Homology
- Homology – sequences that share a common ancestral sequence
  - Paralogs – arise via gene duplication
  - Orthologs – arise via a speciation event
  - Xenologs – arise via horizontal gene transfer
- Evolutionarily related sequences have similar sequences.
- Sequence differences correspond to the amount of change that has occurred since the sequences last shared a common ancestral sequence.

Sequence Alignments
- Sequence alignment – a process that identifies a series of characters or character patterns that are in the same order in both sequences.
- Pairwise global alignment
- Pairwise local alignment
- Optimal alignment – an alignment between sequences in which the number of matching characters is maximized and the number of mismatching characters is minimized.
- Quantifying alignments
  - Alignment score of the optimal alignment
  - Percent identity scores
  - Percent similarity scores

Sequence Similarity Searching
- Basic Local Alignment Search Tool (BLAST)
  - blastp, blastn, blastx, tblastn, and tblastx
  - Local alignments are reported
  - Expectation value – the number of times an investigator can expect to find, by chance, an alignment whose score is as good as or better than the alignment score under consideration.

Steps to Build a Tree
1. Build a multiple sequence alignment of the data set.
2. Analyze the multiple sequence alignment using either distance based methods or character based methods.

Molecular Evolutionary Genetics Analysis (MEGA) 3.1
- Phylogenetic analysis program
- Constructs multiple sequence alignments using ClustalW
- Provides tree building methods
  - Distance based methods: UPGMA, Neighbor-Joining, Minimum Evolution
  - Character based method: Maximum Parsimony
- Provides a great help document!

Multiple Sequence Alignment
- Multiple sequence alignment – an alignment between three or more sequences.
- Finding the optimal multiple sequence alignment is computationally classified as NP-hard
- Programs
  - ClustalW – fast, applies a progressive method
  - T-Coffee – slower, applies an advanced progressive method
  - Dialign – slow, applies an iterative method
  - Combine – combines multiple sequence alignments

Tree Building Methods
- UPGMA, Neighbor-Joining, Minimum Evolution
  - Distance based methods
  - Analyze the multiple sequence alignment to calculate a distance matrix.
  - A clustering algorithm analyzes the distance matrix to determine which sequences should be clustered.
- Maximum Parsimony
  - Character based method
  - Analyze the multiple sequence alignment to create a tree whose tree length has been minimized.

Tree Reliability
- Bootstrapping – a method for assessing the reliability of trees.
- Steps
  1. The original data set is resampled with replacement several times (e.g. 1000).
  2. For each resampling, a tree is built.
  3. The trees created from the resampling iterations are compared to the original tree.

Review
- Acquiring Data Set
  - Text searching at the National Center for Biotechnology Information (NCBI)
  - Sequence similarity and homology
  - Sequence similarity searching with Basic Local Alignment Search Tool (BLAST)
- Analyzing Data Set
  - Phylogenetic Analysis with Molecular Evolutionary Genetics Analysis (MEGA) 3.1 software
  - Build multiple sequence alignments using ClustalW
  - Build phylogenetic trees
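
Worked example: pairwise global alignment and percent identity

The Sequence Alignments slide describes an optimal alignment, its alignment score, and percent identity. The sketch below illustrates those ideas with a small Needleman-Wunsch global alignment. The scoring values (match +1, mismatch -1, gap -2) and the function names are assumptions chosen for illustration; they are not taken from the slides and are not the scoring used by BLAST, which reports local alignments found heuristically. Percent identity is computed here over alignment columns, which is only one of several conventions in use.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same character."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Columns that are gaps in both sequences carry no information; skip them.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


score, top, bottom = global_align("ACGT", "ACT")
print(score)                          # 1
print(top)                            # ACGT
print(bottom)                         # AC-T
print(percent_identity(top, bottom))  # 75.0
```

With these scores the three matches contribute +3 and the single gap contributes -2, giving the alignment score of 1.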
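
Worked example: distance matrix and UPGMA clustering

The Tree Building Methods slide describes distance based methods in two stages: calculate a distance matrix from the multiple sequence alignment, then let a clustering algorithm decide which sequences to join. A minimal sketch of both stages follows, assuming the simplest possible distance (the proportion of differing sites, often called the p-distance) and UPGMA as the clustering algorithm. The sequence names and sequences are invented for illustration, the tree is returned as a Newick-style string without branch lengths, and this is not how MEGA implements the method.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(alignment):
    """Cluster aligned sequences with UPGMA; returns a Newick-style topology string.

    `alignment` maps sequence name -> aligned sequence (all of equal length).
    Branch lengths are omitted to keep the sketch short.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    size = {name: 1 for name in alignment}  # cluster label -> number of leaves
    # Stage 1: distance matrix, keyed by unordered pairs of cluster labels.
    dist = {
        frozenset((a, b)): p_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }
    # Stage 2: repeatedly join the two closest clusters.
    while len(size) > 1:
        # Ties are broken by label so the result is deterministic.
        closest = min(dist, key=lambda pair: (dist[pair], sorted(pair)))
        a, b = sorted(closest)
        merged = f"({a},{b})"
        merged_size = size[a] + size[b]
        # Distance to the new cluster is the size-weighted average of the old distances.
        for other in size:
            if other not in (a, b):
                dist[frozenset((merged, other))] = (
                    size[a] * dist[frozenset((a, other))]
                    + size[b] * dist[frozenset((b, other))]
                ) / merged_size
        # Drop every entry that refers to the two clusters just joined.
        for pair in [p for p in dist if a in p or b in p]:
            del dist[pair]
        del size[a], size[b]
        size[merged] = merged_size
    return next(iter(size)) + ";"


# Hypothetical aligned sequences, for illustration only.
example = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACGGAT",
    "D": "TCGAACGGTT",
}
print(upgma(example))  # ((A,B),(C,D));
```

In this example A and B differ at one site out of ten (distance 0.1) and are joined first; C and D (distance 0.2) are joined next, since both lie farther from the A and B cluster than from each other.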
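
Worked example: bootstrap resampling

The three bootstrapping steps listed under Tree Reliability can be sketched as follows. Alignment columns are resampled with replacement, a tree is built from each resampled alignment, and each grouping in the original tree is scored by the fraction of replicates that recover it. To stay short, the sketch takes the tree builder as a parameter and represents a tree as a set of clades (each clade a frozenset of sequence names). The builder shown, `closest_pair_clades`, is a toy stand-in that reports only the single closest pair of sequences; it and all other names here are assumptions for illustration, not part of MEGA.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Step 1: draw alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def closest_pair_clades(alignment):
    """Toy tree builder: returns one clade holding the two most similar sequences."""
    def mismatches(pair):
        s1, s2 = alignment[pair[0]], alignment[pair[1]]
        return sum(x != y for x, y in zip(s1, s2))

    best = min(combinations(sorted(alignment), 2), key=mismatches)
    return {frozenset(best)}


def bootstrap_support(alignment, build_clades, replicates=1000, seed=0):
    """Fraction of bootstrap replicates that recover each clade of the original tree."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    original = build_clades(alignment)
    counts = {clade: 0 for clade in original}
    for _ in range(replicates):
        # Step 2: build a tree from each resampled data set.
        replicate = build_clades(resample_columns(alignment, rng))
        # Step 3: compare the replicate tree with the original tree.
        for clade in original:
            if clade in replicate:
                counts[clade] += 1
    return {clade: counts[clade] / replicates for clade in original}


# Hypothetical alignment in which A and B are identical and C differs at every site.
toy = {"A": "AAAAAAAA", "B": "AAAAAAAA", "C": "TTTTTTTT"}
support = bootstrap_support(toy, closest_pair_clades, replicates=100, seed=1)
for clade, value in support.items():
    print(sorted(clade), value)  # ['A', 'B'] 1.0
```

Because C differs from A and B in every column, every resampled alignment groups A with B, so the support is 1.0 regardless of the seed. Real data would give values between 0 and 1, and a full tree builder would return every clade in the tree rather than a single pair.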