Tools for BioInformatics
Eileen Kraemer
Computer Science Dept.
The University of Georgia
Types of Tools
Lab samples
Production Sequencing Software
Sequence data
Databases, Database Search Tools
Production Sequencing
Software used throughout the sequencing
procedure, from preparation of the DNA
through to the finishing of clones.
Example: Sanger Centre,
Shotgun Sequencing of typical human
Data collection
Transfer to UNIX
Gel image processing
Sequence pre-processing
DNA Fragment Assembly
Finishing Services
Quality Control and Assessment
& more -- see links at:
See: for both:
Non-human and human genome projects
• PomBase - a compilation of data relating to
the organism Schizosaccharomyces pombe
• Wormpep - predicted proteins from the C.
elegans genome sequencing project.
Annotation Tools
• Annotation of sequences with information such as homologies
to known genes, possible gene locations, and gene signals
such as promoters.
• Example: Genotator (Nomi Harris), a workbench under
development for automatic sequence annotation and for
viewing and editing annotations. The goal is to run a
series of sequence analysis tools and display the results
in such a way that the various predictions can be
compared, and the researcher decides what to
Database Software
ACEDB is an acronym for "A
Caenorhabditis elegans DataBase". It can
refer to a database and data concerning
the nematode C. elegans, or to the
database software alone.
Other groups may adapt existing software or
create their own, for example David Hall’s
workflow project at UGA for Neurospora
Types of Tools
Gene Prediction
Caution: accuracy is at most roughly 70%
Good review: Snyder and Stormo,
(chapter 11 of the book Nucleic Acid and Protein Sequence
Analysis: A Practical Approach, second edition, 1994. )
Gene Prediction
GRAIL(Xgrail, JavaGrail, etc.)
Fexon, Hexon
Genefinder (University of Washington)
Predicts coding regions
Uses a neural network which combines a series
of coding prediction algorithms.
recognizes coding potential within a fixed
size (100 base) window; evaluates coding
potential without looking for additional
later versions incorporate additional info
• human and other species
Based on inhomogeneous Markov models
predicts coding and non-coding regions
based on statistical patterns in
dinucleotide frequencies … more next
week from Mark B.
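The idea of scoring coding potential in a fixed-size window from dinucleotide statistics can be sketched roughly as follows. This is a simplified illustration only: it uses a plain first-order Markov model (the same transition probabilities at every position), not the inhomogeneous models mentioned above, and it is not the algorithm of any tool on these slides. The function names, the pseudocount, the step size, and the toy training sequences are all assumptions made for the example.

```python
import math
from collections import Counter

BASES = "ACGT"

def dinucleotide_model(training_seqs, pseudocount=1.0):
    """Estimate P(next base | current base) from example sequences."""
    counts = Counter()
    for seq in training_seqs:
        for a, b in zip(seq, seq[1:]):
            if a in BASES and b in BASES:
                counts[a + b] += 1
    model = {}
    for a in BASES:
        # Pseudocounts keep unseen dinucleotides from getting probability 0.
        total = sum(counts[a + b] for b in BASES) + 4 * pseudocount
        for b in BASES:
            model[a + b] = (counts[a + b] + pseudocount) / total
    return model

def log_odds(window, coding, noncoding):
    """Sum of log(P_coding / P_noncoding) over the dinucleotides in a window."""
    score = 0.0
    for a, b in zip(window, window[1:]):
        pair = a + b
        if pair in coding:  # skip pairs containing ambiguous bases such as N
            score += math.log(coding[pair] / noncoding[pair])
    return score

def scan(seq, coding, noncoding, window=100, step=10):
    """Score each window; positive scores look more 'coding-like'."""
    return [(start, log_odds(seq[start:start + window], coding, noncoding))
            for start in range(0, len(seq) - window + 1, step)]

# Toy (hypothetical) training data, far too small for real use.
coding_model = dinucleotide_model(["ATGGCGGCGGCG" * 5])
noncoding_model = dinucleotide_model(["ATATATATTATA" * 5])
for start, score in scan("ATATATATAT" * 10 + "GCGGCGGCGG" * 10,
                         coding_model, noncoding_model):
    print(start, round(score, 2))
```

In practice the models would be trained on large sets of known coding and non-coding sequence, and the caution about accuracy above applies to any such statistical predictor.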
Sequence Alignment
Pairwise alignments
Multiple sequence alignments
Pairwise Alignments
• SIM (Protein only) - k best non-intersecting alignments
• ALIGN - optimal global alignment with no short-cuts
• LALIGN - calculates the N-best local alignments
• LFASTA - local similarity searches showing local
alignments (EERIE)
• BLAST 2 - local alignment using BLAST (NCBI)
• LAP2 - local DNA to protein alignment with LAP2 (MTU)
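To make "optimal global alignment" concrete, here is a minimal sketch of the classic dynamic-programming approach (Needleman-Wunsch) with a linear gap penalty. It is an illustration, not the implementation used by ALIGN or any other tool listed above; the function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

score, top, bottom = global_align("ACGT", "AGT")
print(score)
print(top)
print(bottom)
```

A local alignment (as in LALIGN) follows the same recurrence with scores clamped at zero and the traceback started from the highest-scoring cell rather than the corner.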
Multiple Sequence Alignments
• ClustalW 1.7 (DNA/Protein) - Global progressive (BCM)
• CAP Sequence Assembly (DNA) - Contig Assembly
• MAP (DNA/Protein) - Global progressive in linear space
• PIMA 1.4 (Protein only) - Pattern-Induced (local) Multiple
Alignment (BCM)
• MSA 2.1 (Protein only) - Near-optimal sum-of-pairs
global (WashU)
• BLOCK MAKER (Protein only) - Finds conserved blocks
in seq sets (FHCRC)
• MEME 2.2 (DNA/Protein) - Multiple EM for Motif
Elicitation (SDSC)
Similarity Searching
A nucleotide or protein sequence sent to the
BLAST server is compared against sequence databases, and a
summary of matches is returned to the user.
BLAST allows all combinations of DNA or protein
query sequences with searches against DNA
or protein databases:
BLAST variations
• blastp compares an amino acid query sequence against a
protein sequence database.
• blastn compares a nucleotide query sequence against a
nucleotide sequence database.
• blastx compares the six-frame conceptual translation
products of a nucleotide query sequence (both strands) against
a protein sequence database.
• tblastn compares a protein query sequence against a
nucleotide sequence database dynamically translated in all
six reading frames (both strands).
• tblastx compares the six-frame translations of a nucleotide query
sequence against the six-frame translations of a nucleotide
sequence database.
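The "six-frame conceptual translation" used by blastx, tblastn and tblastx can be illustrated with a short sketch: three reading frames on the given strand plus three on its reverse complement. This is not BLAST code; the function names and frame labels (+1 to +3, -1 to -3) are assumptions for the example, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame_translations(dna):
    """Map frame label ('+1'..'+3', '-1'..'-3') to its protein translation."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

for label, protein in six_frame_translations("ATGGCCTAA").items():
    print(label, protein)
```

Stop codons are written as `*`, so open reading frames show up as stretches without an asterisk.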
Types of Tools
Protein Structure
Ab initio -- based on energy minimization
fold recognition -- sequence -> secondary
structure, then align secondary structures
with corresponding secondary structures
in related proteins, etc.
statistical -- based on “hidden patterns”;
similar patterns -> similar structure
Protein Secondary
Structure Prediction
Coils - prediction of coiled coil regions
nnPredict - uses a 2 layer neural network
PSSP / SSP - segment-oriented prediction
PSSP / NNSSP - nearest-neighbor prediction
SAPS - statistical analysis of protein sequences
Paircoil - coiled coil regions of pairwise residue
Protein Hydrophilicity /Hydrophobicity
SOPM - self optimized prediction method
Types of Tools
Protein Function
Pfam groups of similar function proteins aligned
and HMMs generated for each “cluster”
HMM generated for unknown function protein
and compared to HMMs of known proteins for
predicted function classification
Pfam components
• PROTEIN HMM SEARCH - Analyze a protein query
sequence to find Pfam domain matches.
• DNA HMM SEARCH - Analyze a DNA query sequence
to find Pfam domain matches. (Uses the GeneWise
server at the Sanger Centre.)
• BROWSE PFAM - View Pfam annotation and
• TEXT SEARCH - Query Pfam by keywords.
• BROWSE SWISSPFAM - View the domain organization
of any SWISSPROT/TrEMBL sequence according to
Types of Tools
Across organisms …
Phylogeny Reconstruction
Construct evolutionary trees based on
divergences that occur in related
parsimony, minimum distance, etc.
parsimony -- construct tree so that number
of mutation events is minimized
PHYLIP, PAUP, others, some interactive
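The parsimony criterion can be made concrete with a small sketch of Fitch's algorithm, which counts the minimum number of mutation events needed to explain one alignment column on a given rooted binary tree. Packages such as PHYLIP and PAUP also search over tree shapes; this example only scores trees that are supplied. The nested-tuple tree representation, the function names, and the toy alignment are assumptions for illustration.

```python
def fitch(tree, leaf_states):
    """Return (possible ancestral states, minimum changes) for one character.

    A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra mutation needed here.
        return common, left_cost + right_cost
    # Children disagree: one mutation is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[col] for name, seq in alignment.items()})[1]
        for col in range(length)
    )

# Toy (hypothetical) aligned sequences and two candidate trees.
alignment = {"human": "ACGT", "chimp": "ACGT", "mouse": "ATGA", "rat": "ATGA"}
tree_a = (("human", "chimp"), ("mouse", "rat"))
tree_b = (("human", "mouse"), ("chimp", "rat"))
print(parsimony_score(tree_a, alignment))
print(parsimony_score(tree_b, alignment))
```

Under parsimony the tree with the lower score is preferred, since it explains the same data with fewer mutation events.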
Visualization Tools
Database viewers
Sequence viewers
Molecular viewers
Physical Mapping Software
Used to physically locate genetic markers.
• FPC - Software for FingerPrinting Contigs.
• Image 3.x - Software for processing fingerprint gel
• RHServer - This web interface positions one or more
markers on the 1998 International Gene Map (GB4).
• SAM - System for Assembling Markers. SAM takes as input
a set of clones and their associated markers, and
outputs a partially ordered marker map.
• Z-RHMAPPER - Extensions to the RHMAPPER (Whitehead)
Radiation Hybrid Mapping Package.
Good Resources
Pedro’s BioMolecular Research
BCM pages
Sanger Centre
Mining Co. Web Site
& many others