Bioinformatics Toolbox
Yaohang Li
Department of Computer Science
North Carolina A&T State University

Bioinformatics Toolbox
Extends MATLAB to provide an integrated software environment
– Genome Analysis
– Proteome Analysis
Applications
– Drug Discovery
– Genetic Engineering
– Biological Research

Functionalities of Bioinformatics Toolbox
Data Analysis Functions
– Connecting to Web-accessible databases
– Reading and converting between multiple data formats
– Determining statistical characteristics of data
– Manipulating and aligning sequences
– Modeling patterns in biological sequences using Hidden Markov Model (HMM) profiles
– Reading, normalizing, and visualizing microarray data
– Creating and manipulating phylogenetic tree data
– Interfacing with other bioinformatics software

Functionalities of Bioinformatics Toolbox (cont.)
Prototype and Develop Algorithms
Visualize Data
– Sequence alignments
– Gene expression data
– Phylogenetic trees
– Protein structure analysis
Share and Deploy Applications
– Create stand-alone applications
– GUI interface

Installation
Required Software
– MATLAB
– Statistics Toolbox
Additional Software
– Signal Processing Toolbox
– Image Processing Toolbox
– Optimization Toolbox
– Neural Network Toolbox
– Database Toolbox
– MATLAB Compiler

Data Formats and Databases
Web-based databases
– GenBank (getgenbank)
– GenPept (getgenpept)
– European Molecular Biology Laboratory EMBL (getembl)
– Protein Sequence Database PIR-PSD (getpir)
– Protein Data Bank PDB (getpdb)
Raw Data
– Read data generated from gene sequencing instruments
Reading/Writing Data Formats
– Sequence data
– Multiply aligned sequences
– Gene expression data from microarrays

Sequence Analysis
Sequence Analysis
– Find information about a nucleotide or amino acid sequence
– Using computational methods
Tasks
– Identify genes
– Determine the similarity of two genes
– Determine the protein coded by a gene
– Determine the function of a gene by finding a similar gene in another organism with a known function
Example
– Sequence Statistics
– Sequence Alignment

Sequence Statistics
Task
– Starting with a DNA sequence, calculate statistics for the nucleotide content
Example: Determining Nucleotide Content
– Task
    Studying the human mitochondrial genome
    While many genes that code for mitochondrial proteins are found in the cell nucleus, the mitochondrion has genes that code for proteins used to produce energy
– Procedure
    Find the nucleotide sequence for the genome
    Look at the nucleotide content for the entire sequence
    Determine open reading frames and extract specific gene sequences

Determining Nucleotide Content
Step 1:
– Use the MATLAB Help browser to explore the NCBI website
Step 2:
– Search the NCBI website for information
Step 3:
– Select a result page

Getting Sequence Information into MATLAB
MATLAB provides an integrated environment for bringing sequence information into MATLAB
Get sequence information from a Web database
You can also load the sequence from a MAT file
Get information about the sequence

Determining Nucleotide Composition
Knowledge
– Sections of a DNA sequence with a high percentage of A+T nucleotides usually indicate intergenic parts of the sequence
– Low A+T and higher G+C nucleotide percentages indicate possible genes
– High CG dinucleotide content is often located before a gene
Statistics functions of Bioinformatics Toolbox
– Determine if the sequence has the characteristics of a protein-coding region

Determining Nucleotide Composition (II)
Count the nucleotides
– basecount(mitochondria)
– In the reverse complement of a sequence
    basecount(seqrcomplement(mitochondria))
– Show the pie chart

Determining Codon Composition
Background
– Trinucleotides (codons) code for an amino acid
– 64 possible codons
– Knowing the percentage of codons in a sequence can be helpful when comparing with tables for expected codon usage
Bioinformatics Toolbox
– Count codons in a nucleotide sequence
    codoncount(mitochondria)

Amino Acid Conversion and Composition
Determining the relative amino acid composition
– Characteristic profile for the protein
    Amino acid composition
    Atomic composition
    Molecular weight
Convert a nucleotide sequence to an amino acid sequence

Amino Acid Conversion and Composition (cont.)
Count the amino acids in the protein sequence
– aacount(ND2AASeq, 'chart', 'bar')
Determine the atomic composition and molecular weight of the protein

Sequence Alignment
Task
– Determine the similarity between two sequences
Example
– Starting with a DNA sequence for a human gene, locate and verify a corresponding gene in a model organism

Comparing Amino Acid Sequences
Convert the DNA sequences to amino acid sequences
Draw a dot plot comparing the human and mouse amino acid sequences

Global Alignment
Align two amino acid sequences
– Using the Needleman-Wunsch algorithm

DNA Microarray Data Analysis
DNA Microarray
– A parallel snapshot of gene activities
– Simultaneously measure the activity and interactions of genes
– Insights into mechanisms of living systems
Scientific Tasks
– Identification of coexpressed genes
– Discovery of sample or gene groups with similar expression patterns
– Identification of genes whose expression patterns are highly differentiating with respect to a set of discerned biological entities
– Study of gene activity patterns under various stress conditions

Microarray Analysis
Microarray Data
– Research the function of cells
– Compare the differences between healthy and diseased tissue
– Observe changes with the application of drugs
Example
– Visualizing Microarray Data
– Analyzing Gene Expression Profiles

Statistics of Microarray
Look at the distribution of data in each of the blocks

Other Functions
– Phylogenetic Tree Tool
– Protein Structure Analysis
– Data Visualization
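
Worked sketch: nucleotide and codon statistics. The slides on Determining Nucleotide Composition and Determining Codon Composition describe counting bases, counting bases in the reverse complement, and counting codons. The Python sketch below illustrates what those computations amount to. It is not the toolbox implementation: the function names (base_count, reverse_complement, gc_fraction, codon_count) and the short example sequence are made up for illustration, and the sequence is not real mitochondrial data.

```python
from collections import Counter

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_count(seq):
    """Count A, C, G and T in a DNA string (case-insensitive)."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of counted bases that are G or C."""
    counts = base_count(seq)
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0


def codon_count(seq, frame=0):
    """Count non-overlapping trinucleotides starting at offset `frame`."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return Counter(codons)


# Hypothetical toy sequence, used only to exercise the functions.
example = "ATGGCGTTAGCCTAA"

print(base_count(example))
print(base_count(reverse_complement(example)))
print(round(gc_fraction(example), 3))
print(codon_count(example))
```

In the reverse complement the A and T counts swap, as do the G and C counts, so the A+T and G+C percentages mentioned on the slide are the same for both strands.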
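
Worked sketch: amino acid conversion and composition. The Amino Acid Conversion and Composition slides describe converting a nucleotide sequence to an amino acid sequence and counting the amino acids. The Python sketch below shows one way to do this with a codon lookup table. The slides do not say which genetic code is used, so the choice between the standard code and the vertebrate mitochondrial code (NCBI translation tables 1 and 2) is an assumption here. The function names and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"  # conventional ordering for compact genetic-code strings

# Amino acids for the 64 codons in TCAG order ('*' marks a stop codon).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERTEBRATE_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def make_table(amino_acids):
    """Map each codon ('TTT', 'TTC', ...) to its one-letter amino acid."""
    codons = ("".join(triple) for triple in product(BASES, repeat=3))
    return dict(zip(codons, amino_acids))


def translate(seq, table):
    """Translate a DNA string codon by codon, starting at position 0.

    Codons containing characters other than A, C, G, T become 'X'.
    """
    seq = seq.upper()
    return "".join(
        table.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def amino_acid_count(protein):
    """Count each residue in a protein string, ignoring stop symbols."""
    return Counter(protein.replace("*", ""))


standard_table = make_table(STANDARD)
mito_table = make_table(VERTEBRATE_MITO)

# Hypothetical toy sequence, not a real gene.
protein = translate("ATGGCGTTAGCCTAA", standard_table)
print(protein)
print(amino_acid_count(protein))
```

The two tables differ in only a few codons (for example TGA and ATA), which is why the choice of genetic code matters when translating mitochondrial genes.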
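
Worked sketch: global alignment. The Global Alignment slide names the Needleman-Wunsch algorithm. The Python sketch below is a minimal version of that dynamic-programming method, written for illustration only. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1), whereas alignment of real amino acid sequences would normally use a substitution matrix. The function name and parameters are hypothetical and do not reflect the toolbox's interface.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        """Substitution score for aligning a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy example with made-up sequences.
best, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(best)
print(top)
print(bottom)
```

Several alignments can share the same optimal score. The traceback above prefers a diagonal step when there is a tie, so it returns just one of them.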