Bioinformatics Toolbox
Yaohang Li
Department of Computer Science
North Carolina A&T State University
Bioinformatics Toolbox

Extends MATLAB to provide an integrated
software environment for
– Genome Analysis
– Proteome Analysis

Applications
– Drug Discovery
– Genetic Engineering
– Biological Research
Functionalities of
Bioinformatics Toolbox

Data Analysis Functions
– Connecting to Web accessible databases
– Reading and converting between multiple data formats
– Determining statistical characteristics of data
– Manipulating and aligning sequences
– Modeling patterns in biological sequences using
Hidden Markov Model (HMM) profiles
– Reading, normalizing, and visualizing microarray data
– Creating and manipulating phylogenetic tree data
– Interfacing with other bioinformatic software
Functionalities of
Bioinformatics Toolbox

Prototype and Develop Algorithms

Visualize Data
– Sequence alignments
– Gene expression data
– Phylogenetic trees
– Protein structure analysis

Share and Deploy Applications
– Create stand-alone applications
– Build graphical user interfaces (GUIs)
Installation

Required Software
– MATLAB
– Statistics Toolbox

Additional Software
– Signal Processing Toolbox
– Image Processing Toolbox
– Optimization Toolbox
– Neural Network Toolbox
– Database Toolbox
– MATLAB Compiler
Data Formats and Databases

Web-based databases
– GenBank (getgenbank)
– GenPept (getgenpept)
– European Molecular Biology Laboratory EMBL
(getembl)
– Protein Sequence Database PIR-PSD (getpir)
– Protein Data Bank PDB (getpdb)

Raw Data
– Read data generated from gene sequencing instruments

Reading/Writing Data Formats
– Sequence data
– Multiply Aligned Sequences
– Gene Expression Data from Microarrays
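
As an illustration of what reading a sequence data format involves, here is a minimal sketch of a FASTA parser. FASTA is an assumption here: the slides do not say which sequence formats are meant. The sketch is written in plain Python rather than MATLAB so that it runs without the toolbox; `parse_fasta` is a hypothetical helper, not a toolbox function.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (header, sequence) pairs.

    A record starts at a line beginning with '>' and its sequence is the
    concatenation of the lines that follow, up to the next '>' line.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    demo = ">seq1 demo\nACGT\nAC\n>seq2\nGG\n"
    for name, sequence in parse_fasta(demo):
        print(name, sequence)
```

Sequence lines before the first header are silently dropped in this sketch; a stricter reader might raise an error instead.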
Sequence Analysis

Sequence Analysis
– Find information about a nucleotide or amino acid
sequence
– Using computational methods

Tasks
– Identify genes
– Determine the similarity of two genes
– Determine the protein coded by a gene
– Determine the function of a gene by finding a similar
gene in another organism with a known function

Example
– Sequence Statistics
– Sequence Alignment
Sequence Statistics

Task
– Starting with a DNA sequence, calculate statistics for
the nucleotide content

Example: Determining Nucleotide Content
– Task
  • Studying the human mitochondrial genome
  • While many genes that code for mitochondrial proteins are
    found in the cell nucleus, the mitochondrion has its own
    genes that code for proteins used to produce energy
– Procedure
  • Find the nucleotide sequence for the genome
  • Look at the nucleotide content for the entire sequence
  • Determine open reading frames and extract specific gene
    sequences
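
The last step of the procedure, locating open reading frames, can be sketched as a scan of the three forward reading frames for an ATG start codon followed by an in-frame stop codon. This is a plain-Python illustration, not the toolbox's method. `find_orfs` and its parameters are hypothetical names, and the default stop codons are those of the standard genetic code, which is an assumption: vertebrate mitochondria use a different set, so pass a different `stops` value for that case.

```python
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})  # standard code (assumption)


def find_orfs(seq, min_codons=2, stops=STANDARD_STOPS):
    """Return (frame, start, end) for ORFs in the three forward frames.

    frame is 1-based; start and end are 0-based with end exclusive, so
    seq[start:end] runs from the ATG through the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in stops:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATAGCC"
    for frame, start, end in find_orfs(dna):
        print(frame, dna[start:end])
```

Only the forward strand is scanned here; running the same function on the reverse complement would cover the other three frames.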
Determining Nucleotide
Content

Step 1:
– Use the MATLAB Help browser to open the NCBI website

Step 2:
– Search the NCBI website for the human mitochondrial genome

Step 3:
– Select a result page
Getting Sequence Information
into MATLAB

MATLAB provides an integrated environment for
importing and inspecting sequence information

Get sequence information from a Web database

You can also load the sequence from a MAT file

Get information about the sequence
Determining Nucleotide Composition

Knowledge
– Sections of a DNA sequence with a high percentage of A+T
nucleotides usually indicate intergenic parts of the
sequence
– Low A+T and higher G+C nucleotide percentages
indicate possible genes
– High CG dinucleotide content is often found upstream of a
gene

Statistics functions of the Bioinformatics Toolbox
– Determine whether the sequence has the characteristics
of a protein-coding region
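
The A+T versus G+C heuristic above can be made concrete with a sliding window over the sequence. The following is a minimal sketch in plain Python, not the toolbox's implementation. The function names and the window size are hypothetical choices for illustration.

```python
def at_gc_density(seq, window=20):
    """Return (start, at_fraction, gc_fraction) for each window position."""
    seq = seq.upper()
    densities = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        at = (chunk.count("A") + chunk.count("T")) / window
        gc = (chunk.count("G") + chunk.count("C")) / window
        densities.append((start, at, gc))
    return densities


def cg_dinucleotide_count(seq):
    """Count occurrences of the dinucleotide CG (C followed by G)."""
    return seq.upper().count("CG")


if __name__ == "__main__":
    for start, at, gc in at_gc_density("AAAACCCC", window=4):
        print(start, at, gc)
    print(cg_dinucleotide_count("ACGCGT"))
```

Windows where the G+C fraction rises above the A+T fraction are the candidate gene regions in the sense of the slide; where to set a threshold is left open here.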
Determining Nucleotide Composition (II)

Count the nucleotides
– basecount(mitochondria)
– In the reverse complement of the sequence
  • basecount(seqrcomplement(mitochondria))
– Show the counts as a pie chart
  • basecount(mitochondria, 'chart', 'pie')
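
To show what these two calls compute, here is a small plain-Python sketch of base counting and reverse complementing. It is an illustration only, not how `basecount` or `seqrcomplement` are implemented; `base_count` and `reverse_complement` are hypothetical names.

```python
from collections import Counter

# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_count(seq):
    """Count A, C, G and T in a nucleotide sequence (other symbols ignored)."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def reverse_complement(seq):
    """Complement every base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "AACGTT"
    print(base_count(dna))
    print(reverse_complement("AACG"))
    print(base_count(reverse_complement("AACG")))
```

Counting on the reverse complement swaps the A and T totals and the C and G totals, which is a quick consistency check on both functions.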
Determining Codon Composition

Background
– A trinucleotide (codon) codes for an amino acid
– 64 possible codons
– Knowing the percentage of each codon in a sequence can be
helpful when comparing with tables of expected codon usage

Bioinformatics Toolbox
– Count codons in a nucleotide sequence
  • codoncount(mitochondria)
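
Codon counting amounts to stepping through the sequence three bases at a time from a chosen reading frame. The sketch below shows this in plain Python. It illustrates the idea rather than the toolbox's implementation, and the `frame` parameter (1, 2 or 3) is an assumed convention for this example.

```python
def codon_count(seq, frame=1):
    """Count codons read in steps of three, starting at frame 1, 2 or 3."""
    seq = seq.upper()
    counts = {}
    for i in range(frame - 1, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        counts[codon] = counts.get(codon, 0) + 1
    return counts


if __name__ == "__main__":
    dna = "ATGATGTAA"
    print(codon_count(dna))           # reading frame 1
    print(codon_count(dna, frame=2))  # reading frame 2
```

Trailing bases that do not fill a complete codon are ignored, so the same sequence yields different codon sets in each frame.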
Amino Acid Conversion and
Composition

Determining the relative amino acid composition
– Characteristic profile for the protein
  • Amino acid composition
  • Atomic composition
  • Molecular weight

Convert a nucleotide sequence to an amino acid
sequence
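
Converting a nucleotide sequence to an amino acid sequence is a codon-by-codon table lookup. The sketch below, in plain Python, builds two lookup tables from the NCBI-style 64-letter strings for the standard code and the vertebrate mitochondrial code, the latter being relevant to the mitochondrial example. It is an illustration under those assumptions, not the toolbox's implementation, and `translate` is a hypothetical name.

```python
from itertools import product

# Codons in NCBI order: first base varies slowest, bases ordered T, C, A, G.
CODONS = ["".join(triple) for triple in product("TCAG", repeat=3)]

STANDARD = dict(zip(CODONS,
                    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"))

# Vertebrate mitochondrial code: TGA -> W, ATA -> M, AGA/AGG -> stop.
VERTEBRATE_MITO = dict(zip(CODONS,
                           "FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
                           "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG"))


def translate(seq, table=STANDARD):
    """Translate frame 1 of a DNA sequence; '*' marks stop, 'X' unknown."""
    seq = seq.upper()
    return "".join(table.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    dna = "ATGGCCTGA"
    print(translate(dna))                   # standard code
    print(translate(dna, VERTEBRATE_MITO))  # mitochondrial code
```

The same DNA can yield different proteins under the two codes, which is why the choice of genetic code matters when translating mitochondrial genes.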
Amino Acid Conversion and
Composition (cont.)

Count the amino acids in the protein sequence
– aacount(ND2AASeq, 'chart', 'bar')

Determine the atomic composition and molecular
weight of the protein
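
Amino acid counts and molecular weight both follow directly from the protein sequence. The plain-Python sketch below shows one way to compute them. It is not the toolbox's implementation: the residue masses are commonly tabulated average values (in daltons) entered here as an assumption, and the function names are hypothetical.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water); assumed values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # one water is added back for the free chain termini


def amino_acid_count(protein):
    """Count each one-letter amino acid code in the sequence."""
    return dict(Counter(protein.upper()))


def molecular_weight(protein):
    """Sum residue masses plus one water; raises KeyError on unknown codes."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER_MASS


if __name__ == "__main__":
    peptide = "MAAG"
    print(amino_acid_count(peptide))
    print(round(molecular_weight(peptide), 2))
```

As a sanity check, a single glycine comes out at about 75.07 Da and a single alanine at about 89.09 Da, which matches the free amino acids.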
Sequence Alignment

Task
– Determine the similarity between two
sequences

Example
– Starting with a DNA sequence for a human
gene, locate and verify a corresponding gene in
a model organism
Comparing Amino Acid Sequences

Convert the DNA sequences to amino acid
sequences

Draw a dot plot comparing the human and
mouse amino acid sequences
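
A dot plot marks every position pair where the two sequences agree, so that similar regions show up as diagonal runs. Here is a minimal text-based sketch in plain Python. The `window` and `min_matches` parameters are assumptions for illustration (a window of residues must contain at least that many matches to produce a dot), and the function names are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=1, min_matches=1):
    """Return a grid of booleans: rows follow seq_a, columns follow seq_b."""
    grid = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


def render(grid):
    """Draw the grid as text, '*' for a dot and '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row)
                     for row in grid)


if __name__ == "__main__":
    print(render(dot_plot("MKTAYIA", "MKTGYIA")))
```

Raising `window` and `min_matches` filters out isolated chance matches, leaving mainly the diagonals that correspond to conserved stretches.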
Global
Alignment

Align two amino acid sequences
– Using the Needleman-Wunsch algorithm
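
The Needleman-Wunsch algorithm fills a score matrix by dynamic programming and then traces back from the bottom-right corner to recover one optimal global alignment. The plain-Python sketch below uses a deliberately simple scoring scheme (match +1, mismatch -1, linear gap penalty -1). That scheme is an assumption made for readability: protein alignments in practice normally use a substitution matrix, so scores from this sketch will not match those of a toolbox alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a aligned against leading gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b aligned against leading gaps

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align pair
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the corner, preferring the diagonal on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

When several alignments share the best score, the traceback order decides which one is reported; the score itself is unaffected.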
DNA Microarray Data Analysis

DNA Microarray
– A parallel snapshot of gene activities
– Simultaneously measure the activity and
interactions of genes
– Insights into mechanisms of living systems

Scientific Tasks
– Identification of coexpressed genes
– Discovery of sample or gene groups with similar
expression patterns
– Identification of genes whose expression patterns
are highly differentiating with respect to a set of
discerned biological entities
– Study of gene activity patterns under various
stress conditions
Microarray Analysis

Microarray Data
– Research the function of cells
– Compare the differences
between healthy and diseased
tissue
– Observe changes with the
application of drugs

Example
– Visualizing Microarray Data
– Analyzing Gene Expression
Profiles
Statistics of Microarray Data

Look at the distribution of data in each of
the blocks
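
One way to look at the distribution per block is a five-number summary (minimum, quartiles, maximum) for each block, which is the information a box plot displays. The plain-Python sketch below assumes the data arrive as (block, intensity) pairs. That layout, the function name, and the demo values are hypothetical, not taken from the slides or from any real array.

```python
from collections import defaultdict
from statistics import quantiles


def block_summaries(spots):
    """Group intensities by block and return a five-number summary for each.

    Each block needs at least two values for the quartiles to be defined.
    """
    by_block = defaultdict(list)
    for block, intensity in spots:
        by_block[block].append(intensity)

    summaries = {}
    for block, values in sorted(by_block.items()):
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        summaries[block] = {"min": min(values), "q1": q1, "median": median,
                            "q3": q3, "max": max(values)}
    return summaries


if __name__ == "__main__":
    # Made-up intensities for two blocks, for illustration only.
    demo = [(1, v) for v in (1, 2, 3, 4, 5)] + [(2, v) for v in (10, 20, 30, 40, 50)]
    for block, summary in block_summaries(demo).items():
        print(block, summary)
```

Blocks whose medians or spreads differ markedly from the rest may point to spatial artifacts on the array rather than biology, which is the usual reason for inspecting them separately.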
Other Functions

Phylogenetic Tree Tool

Protein Structure Analysis

Data Visualization