Large Scale Biological Data Handling and Analysis Using Open Source Alternatives
Dr Karol Kozak, ETH Zurich
May 2009, Paris OME

Large Scale Experiments and Informatics
- Instrument management
- Data acquisition
- Data flow
- Image processing
- Normalization, QC
- STORAGE
- Data storage
- Archiving
- Data mining
- Bioinformatics

Detection points
Data Automation:
- Library Handling
- Robotics
- Microscopy
- Image processing
- Cell reliability
- Hit Definition

Database architecture (LIMS)
- WEB USER INTERFACE
- Command Line Client
- DB SCHEME
- Import / Export
- Read/Write
- Search
- WEBAPPLICATION
- MySQL / PostgreSQL
- Oracle/DB/SQL Server
- External Bioinformatics Databases: NCBI, PubMed, SCOPE, PDB, Harvester, …

STORAGE: Link to microscope

STORAGE: Link to microscope (Olympus SCAN R)

Data archiving
Tape, ~100MB
Microscope format: BD, ThermoFisher, Zeiss, Leica, Olympus, MD
Image processing
- tiff
- lsm
- lei
Convert to JPEG: 50x smaller
In Screening Process: JPEG, Results, Database (DB), Data Management Tool
After Screen: JPEG + numeric results, plots, ~2MB

*.tiff
In Screening Process: ETH NAS, Users, Tape, LMC interface (user), Buffer server 2x25TB, LMC (6 weeks), JPEG, Results, Database (DB), Data Management Tool
After Screen: JPEG, ETH NAS User storage + numeric results, plots, ~0.5MB

Dataflow architecture
siRNA Inhibit.
Library, Pre-processing, Screening process, Library checker, Booking system, Library DB, Protocol, Equipment manager, Plates with compounds, Microscope images, DB
We need: Screen software (Library Handling, QC, Data Mining, Visualization, Export, Flexibility)
HitBase, Screen results, Image analysis, Published screens, IMAGE SERVER, Data mining, Bioinformatics, Post-processing

HC/DC

Read data from database

Link with image storage

Data mining
STORAGE, LIMS, Microscope, File system
Statistics
- One/two parameters
- Normalization
- Correlation
- Compare distribution
- E.g. statistical tests, Z-score, Z', etc.
Pattern recognition: Machine Learning
- Multi-parameters
- Clustering
- Dimensionality reduction
- Check parameter importance

Classification
- Unsupervised learning (cluster analysis, clustering) seeks to discover the classes

Classification problem
- Supervised learning (classification) assumes classes are known

2 class problem
- Positive control
- Negative control
- ?

Train data for supervised learning
5 people

Train data for supervised learning
WEKA/R-Project nodes (KNIME)
Niels Landwehr, Mark Hall, Eibe Frank (2005). Logistic Model Trees.
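The Statistics list on the Data mining slide names the Z-score and Z' as example measures. As a minimal sketch of how these two quantities are commonly computed (not the HC/DC implementation), the snippet below uses only the Python standard library. The function names, the use of the sample standard deviation, and all well values are assumptions made for illustration.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize readouts: (x - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("all values are identical; the Z-score is undefined")
    return [(v - m) / s for v in values]


def z_prime(positive, negative):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are equal; Z' is undefined")
    return 1 - 3 * (stdev(positive) + stdev(negative)) / separation


# Hypothetical readouts (illustrative values only, not data from a real screen).
positive_controls = [90.0, 100.0, 110.0]
negative_controls = [8.0, 10.0, 12.0]
sample_wells = [10.0, 20.0, 30.0]

print("Z' of the controls:", z_prime(positive_controls, negative_controls))
print("Z-scores of the sample wells:", z_scores(sample_wells))
```

A Z' close to 1 indicates well-separated positive and negative controls; values near or below 0 indicate that the two control distributions overlap.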
92% Accuracy

HC/DC
- Type I: 1 or 2 Input(s) and 1 or 2 Output(s) (Input data, Output data)
- Type II: 1 Output (Output data; mostly data readers)
- Type III: 1 Input (Input data; mostly for visualization)
Each node has a submenu (right mouse button)
Settings: Parameters, Rules, Algorithms rules, Filters

HC/DC KNIME nodes
- HC/DC nodes
- WEKA/R-Project nodes (KNIME)

Next-Generation sequencing nodes: Roche 454
- Type I: 1 or 2 Input(s) and 1 or 2 Output(s) (Input data, Output data)

Next-Generation sequencing nodes: SOLEXA Illumina
- Type I: 1 or 2 Input(s) and 1 or 2 Output(s) (Input data, Output data)

Proteomics
- Type I: 1 or 2 Input(s) and 1 or 2 Output(s) (Input data, Output data)

Drag-and-drop

Pioneering study

Plot image parameters/descriptors

Compound screening

Menu HC/DC

Library Handling
Data Automation:
- Volume
- Concentration
- Dilution
- Splitting
- Take liquid
- Barcode

Workflows for HCS

Link to OME/OMERO

Acknowledgements & Partners
Contribution in development:
- ETH Zurich
- MPI-CBG, Dresden
- Eberhard Krausz (former) & team
- LMC-RISC: Gabor Csucs & team
- SystemX
- Adrian Honegger & team
- Eugenio Fava & team
- Marc Bickle & team
- MPI-IB, Berlin
- Nikolaus Machuy & team

Web page HCDC: http://hcdc.ethz.ch
Workshops
- Webinar: 04.06.2009, 2-4pm
- Workshop HCDC + KNIME: 16-17.10.2009, 2 days, ETH Zurich