Figure omitted because of copyright reason. A printed version can be found at Leung YF, Lam DSC, Pang CP. The miracle of microarray data analysis. Genome Biol. 2001 Aug 29; 2: 4021.1-4021.2.
“I think you should be more explicit here in step two”

~ Normal science consists largely of mopping-up operations. Experimentalists carry out modified versions of experiments that have been carried out many times before ~ Thomas S. Kuhn

The FAQ of biologists: What is the best microarray analysis software?

Different kinds of microarray software
• Image analysis software
• Data mining software
  – Statistics software
    • R packages for microarray analysis
• SNP analysis software
• Database/ LIMS software
• Public expression database
• Primer design
• Software for further data mining: annotation, promoter analysis & pathway reconstruction

Software not discussed today
• Hardware control software
  – Arrayer controlling
  – ArrayMaker
  – Scanner controlling/ image acquisition

Statistics on current microarray software *
(columns: 28 Feb 2002, Jan 2001)
Image analysis 17 17
Data mining 39
R packages
SNP analysis 14 1
Database/ LIMS 14 4
Public Database 16 8
Accessory 8 -
Further data mining 9 -
Total 116 29
* Extracted from http://ihome.cuhk.edu.hk/~b400559/arraysoft.html

Image analysis software
• Spot recognition
• Segmentation
  – Foreground calculation
  – Background calculation
• Spot quality measures

Major image analysis software
• AIDA Array
• ArrayPro
• ArrayVision
• Dapple
• F-scan
• GenePix Pro 3.0.5
• ImaGene 4.0
• Iconoclust
• Iplab
• Lucidea Automated Spotfinder
• Phoretix Array3
• P-scan
• QuantArray 3.0
• ScanAlyze 2
• Spot
• TIGR Spotfinder
• UCSF Spot

Examples of commonly used image analysis software
• ScanAlyze 2 (Mike Eisen, LBNL)
• GenePix Pro 3.0.5 (Axon Instruments)
• QuantArray 3.0 (Packard Instrument)
• ImaGene 4.0 (Biodiscovery)

Spot recognition
• ArrayPro from Media Cybernetics
• Automated and fast grid, subgrid and spot finding algorithms

Segmentation
• Purpose
  – Classification between foreground and background
  – Fixed circle
  – Adaptive circle
  – Adaptive shape
  – Histogram method

Segmentation
• Using extra dye
  – DAPI, avoids the morphology assumption
UCSF Spot

Spot quality measure
• E.g. QuantArray 3.0
  – Diameter
  – Spot Area
  – Footprint
  – Circularity
  – Spot Signal/Noise
  – Spot Uniformity
  – Background Uniformity
  – Replicate Uniformity
• Problem: no rigorous definition of spot quality, and no experimental verification

Future image analysis software
• Rigorous definition of quality measures
• Extra dye for better segmentation
• Automated analysis

Data mining software
• Main purposes
  1. Filtering and normalization
  2. Statistical inference of differentially expressed genes
  3. Identification of biologically meaningful patterns, i.e. expression profile; expression fingerprint/ signature
  4. Visualization
  5. Other analyses, such as pathway reconstruction

Different categories
• Turnkey system
• Comprehensive software
• Specific analysis software
• Extension/ accessory of other software

Major data mining software
• AIDA Array
• AMADA
• ANOVA program for microarray data
• ArrayMiner
• arraySCOUT
• ArrayStat
• BRB ArrayTools
• CHIPSpace
• Cleaver
• CIT
• CLUSFAVOR
• Cluster
• Cyber T
• DNA-arrays analysis tools
• dchip
• Expression Profiler
• Expressionist
• Freeview & FreeOView
• Gene Cluster
• GeneLinker Gold
• GeneMaths
• GeneSight
• GeneSpring
• Genesis
• Genetraffic
• J-Express
• MAExplorer
• Partek
• R cluster
• Rosetta Resolver
• SAM
• SpotFire Decision Site
• SNOMAD
• TIGR ArrayViewer
• TIGR Multiple Experiment Viewer
• TreeView
• Xcluster
• Xpression NTI

Turnkey system
• Definition: A computer system that has been customized for a particular application. The term derives from the idea that the end user can just turn a key and the system is ready to go.
• For microarray, this includes everything from OS, server software, database, client software and statistics software to the hardware itself
• Examples
  – Genetraffic (Iobion)
    • Uses open source software: Linux, the R statistical language, PostgreSQL, and the Apache Web server
  – Rosetta Resolver (Rosetta Biosoftware)
    • Sun Fire server and drive array, Oracle 8i, Rosetta server and client side software

Turnkey system
• Advantages
  – Performance
  – Security
  – Supports multiple users
  – Incorporates the experiment and data standards in its design
• Disadvantages
  – Expensive
  – Not suitable for small labs
  – Requires dedicated support staff
  – Closed system

Comprehensive software
• Definition: Software that incorporates many different analyses for different stages in a single package.
• Examples
  – Cluster (Mike Eisen, LBNL)
  – GeneMaths (Applied Maths)
  – GeneSight (Biodiscovery)
  – GeneSpring (Silicon Genetics)

Comprehensive software
• Cluster
  – Filter data
  – Adjust data: normalization, log transform, etc.
  – Clustering
  – Self-Organizing Maps (SOMs)
  – Principal Component Analysis (PCA)
• GeneSpring
  – Promoter analysis
  – Gene annotation with public database information
  – Scripting tools
  – Access to Open DataBase Connectivity (ODBC) databases

Comprehensive software
• GeneMaths
  – Bootstrap analysis for clustering
  – Fast clustering algorithms
  – Access to Open DataBase Connectivity (ODBC) databases
• GeneSight
  – Confidence analysis for replicated data
  – Statistical analysis for significant genes
  – Graphical data set builder

Comprehensive software
• Advantages
  – Standardized operation
  – Generates various analyses easily
  – Shorter learning curve for biologists
  – Script language for automated process control
  – Some brilliant ideas or analyses within particular software
  – “False” sense of security?
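
The “adjust data” step listed for Cluster above (log transform, normalization) can be illustrated with a minimal sketch. This is not Cluster's own implementation: it assumes background-corrected, positive two-channel intensities and uses a simple global median centering, which in turn assumes that most genes are unchanged between channels. The function names and the intensities are made up for illustration.

```python
import math
from statistics import median


def log2_ratios(red, green):
    """Per-spot log2(red / green) for positive, background-corrected intensities."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(values):
    """Global normalization: subtract the array-wide median log ratio."""
    m = median(values)
    return [v - m for v in values]


# Made-up intensities for five spots (illustrative only, not real data).
red = [800.0, 1600.0, 400.0, 3200.0, 1600.0]
green = [400.0, 800.0, 200.0, 400.0, 800.0]

normalized = median_center(log2_ratios(red, green))
print(normalized)
```

After centering, a value of 0 means “same ratio as the typical spot on this array”, so only the fourth spot stands out here.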
Comprehensive software
• Disadvantages
  – Inflexible to the latest analysis developments
  – Generates various analyses too easily
  – Implicit data analysis/ statistics background and definitions
  – Proprietary script language
  – Data compatibility with other software
  – Necessity to design and maintain your own database
  – Commercial software can be expensive!
  – Particular analyses added for marketing purposes; extra spending on unnecessary functions
  – Sometimes only available on a few computing platforms

Specific analysis software
• Definition: Software performing one or a few specific analyses
• Examples
  – GeneCluster (Whitehead Institute Center for Genome Research)
  – INCLUSive: INtegrated CLustering, Upstream Sequence retrieval and motif Sampler (Katholieke Universiteit Leuven)
  – SAM: Significance Analysis of Microarrays (Stanford University)

Specific analysis software
• GeneCluster
  – Performs normalization, filtering and SOM

Specific analysis software
• INCLUSive: INtegrated CLustering, Upstream Sequence retrieval and motif Sampler
• SAM
  – Finds statistically significant differentially expressed genes

Specific analysis software
• Advantages
  – Better statistical background reference, usually with literature support
• Disadvantages
  – Non-standardized environment: Java, web, Excel, etc.
  – Data compatibility problem
  – Data preprocessing problem

Extension/ accessory of other software
• Definition: extension of other software's capability
• Examples
  – Freeview: Visualization and Optimization of Gene Clustering Dendrograms for Cluster
  – ArrayMiner: extension of GeneSpring

Statistics software
• Excel
• MATLAB
• Octave
• SAS
• SPSS
• S-PLUS
• Statistica
• R

Statistics software
• Advantages
  – Highly flexible
  – High level, multivariate analyses are either standard or easily programmable
• Disadvantages
  – Usually command line driven, impossible to learn intuitively (a disadvantage??)
  – Requires a much better understanding of statistical data analysis to follow the steps (a disadvantage??)
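
As a small illustration of “easily programmable” inference of differential expression, the sketch below runs an exact two-sided permutation test on the difference in group means for a single gene. It is a generic textbook permutation test, not the SAM algorithm and not the method of any package named above; the function name and the log ratios are hypothetical. It is written in Python rather than R purely for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    extreme = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so ties with the observed value count as extreme.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical log2 ratios for one gene in two conditions (made-up numbers).
control = [0.1, -0.2, 0.0, 0.3]
treated = [1.9, 2.2, 1.7, 2.4]

p = permutation_p_value(control, treated)
print(f"p = {p:.4f}")
```

With four replicates per group there are only 70 possible relabellings, so the smallest attainable two-sided p-value is 2/70. With thousands of genes tested at once, multiple-testing control would also be needed, which this sketch leaves out.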
R-packages
• A language and environment for statistical computing and graphics
• Highly compatible with S/ S-PLUS
• Open source under the GNU General Public Licence
• Runs on many UNIX, Linux, Windows and MacOS platforms
• A growing number of microarray analysis packages are written in R

R-packages
• Dedicated to microarray analysis
  – affy
  – Bioconductor
  – SMA extension
  – Cyber T
  – GeneSOM
  – Permax
  – OOMAL (S-Plus)
  – SMA
  – YASMA
• General packages
  – cclust
  – cluster
  – mclust
  – multiv
  – mva
  – …etc!

R-packages
• SMA: Statistical Microarray Analysis (Terry Speed, UC Berkeley)
• Bioconductor

R-packages
• SMA
  – Performs intensity and spatial dependent normalization
  – Replicated array data analysis by an empirical Bayes approach

R-packages
• Result of replicated data output: B vs M plot

R-packages
• Bioconductor
  – Open source software project to provide infrastructure in terms of design and software to assist biologists and statisticians in analysing genomic data, with primary emphasis on inference using DNA microarrays
  – Most software produced by the Bioconductor project will be in the form of R libraries
    • Variation 1: provide basic infrastructure support that will help other developers produce high quality software
    • Variation 2: provide innovative methodology for analyzing genomic data
  – Provide some form of graphical user interface for selected libraries
  – A mechanism for linking together different groups with common goals

Future data mining software
• Standardized, open-source (free) platform?
  – EMBOSS: European Molecular Biology Open Software Suite
• More supervised analysis packages and pathway prediction packages?
• Plugin modules
  – J-Express
  – GeneSpring

Mutation analysis software
• Chip based SNP or chromosomal aberration analysis (arrayCGH)
• Various forms of protocols, e.g. primer extension, ligase chain reaction, MALDI-TOF MS, hybridization, etc.
• Result is in the form of base calling or allelic imbalance
• Example
  – Genorama

Database
• Definition: large collection of data organized especially for rapid search and retrieval
• Two categories
  – Within laboratory/ institute database; LIMS
  – Public expression database
• Standardized definition of data
  – Minimum Information About a Microarray Experiment (MIAME)
    • Experimental design
    • Array design
    • Samples
    • Hybridizations
    • Measurements
    • Normalization controls

Database/ LIMS software
• The database within your lab/ institute
• The quality of in-house data management will affect the quality of the final public data repository
• Database structure may be relatively simple

Major database/ LIMS software
• AMAD
• ARGUS
• ArrayDB
• ArrayInformatics
• Clonetracker
• GeNet
• Genetraffic
• GeneX
• MAD
• Maxd
• NOMAD
• Partisan Array LIMS
• Phoretix Array2 Database
• Rosetta Resolver
• SMD

Public expression database
• Necessities
  – Provide raw data to validate published array results and develop new analysis tools
  – Further understanding of your data
  – Comparison among different groups, meta-data mining
  – Source for specialty array design
• Different categories
  – Generic
  – Species specific
  – Disease specific
• The importance of data standardization

Major public gene expression databases
• 3D-GeneExpression Database
• ArrayExpress
• BodyMap
• ChipDB
• ExpressDB
• Gene Expression Omnibus (GEO)
• Gene Expression Database (GXD)
• Gene Resource Locator
• GeneX
• Human Gene Expression Index (HuGE Index)
• RIKEN cDNA Expression Array Database (READ)
• RNA Abundance Database (RAD)
• Saccharomyces Genome Database (SGD)
• Stanford Microarray Database (SMD)
• TissueInfo
• yeast Microarray Global Viewer (yMGV)

Primer/ probe design
• Array designer
• GAP (Genome-wide Automated Primer finder servers)
• OligoArray
• Primer3
• ProbeWiz Server

Other useful software for further data mining
• Data annotation
  – DRAGON
  – Gene Ontology
  – PubGene
  – Resourcerer
• Promoter analysis
  – AlignACE
  – INCLUSive
  – MEME
  – Sequence Logo
• Pathway reconstruction
  – GenMAPP
  – PathFinder

• Data annotation
  – Link GI to a particular name
  – Literature mining to infer network
• Network reconstruction
  – Cluster + promoter analysis
  – Statistical inference from experimental data

Some suggestions for biologists who are serious about microarray study
• Communicate or even collaborate with statisticians, mathematicians and bioinformaticians
• Learn a high level statistical language, e.g. R
• Learn programming, e.g. C
• Learn databases, e.g. SQL
• Learn Linux
• Revise your statistics, probability and maybe even calculus
• Lucky…?!
Picture omitted because of copyright reason

Conclusion: the future
• A unified open environment for standard analysis and development
• The best microarray analysis software?

~ Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone -- as the first step. ~ John W. Tukey