Bioinformatics for Microarray Studies at IBS
Pei-Ing Hwang, Ph.D.
Mar. 24, 2005 (TIGP)

Different aspects for life science research
- genomics
- transcriptomics
- proteomics

Building blocks for DNA or RNA
- DNA: A, T, G, C
- RNA: A, U, G, C

DNA: deoxyribonucleic acid
- Double stranded
- Antiparallel

Why microarray?
Gene expression
- To simultaneously study multiple genes
- To obtain an overview of gene expression at the transcriptional level under specific experimental conditions
- To study gene interaction networks from the transcriptional aspect
Genome
- SNP detection
- To find recombination sites in the chromosome/genome
- Hopefully to discover the gene responsible for a genetic disease

Outline
- Introduction to microarray experiments
- Experiences at IBS for the cDNA arrays
- Data generated with microarray
- DNA annotation
- Data analysis
- Data management

About Microarray Technology-1
- Up to hundreds of thousands of spots in a fixed area on a glass slide or a membrane
- One species of DNA molecule per spot
- A spot is also called a "feature"
- The DNA fixed on the chip or membrane is also called the "probe"
- The sequence and/or function of each DNA species on the spot is known

About Microarray Technology-2
- Making use of the "hybridization method": A pairs with T or U, G pairs with C
- Image processing
- Data analysis
- Result interpretation from the biology aspect

Types of Microarray
- Types of DNA immobilized on the solid support: cDNA vs. oligonucleotides
- Manufacturing methods: printing vs. photolithography
- Solid support: glass slides, membrane
- Nucleotide labeling (slide scanning condition): one color vs. two colors

GeneChip® Array Manufacturing
Figure 1. Affymetrix uses a unique combination of photolithography and combinatorial chemistry to manufacture GeneChip® Arrays.
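The base-pairing rule behind the hybridization method (A pairs with T or U, G pairs with C), together with the antiparallel arrangement of the two strands, can be illustrated with a small reverse-complement check. This is only a sketch: the function names `reverse_complement` and `hybridizes` are hypothetical, and the check assumes exact, full-length complementarity, whereas real hybridization tolerates some mismatches depending on experimental conditions.

```python
# Watson-Crick pairing for a DNA probe; RNA targets carry U where DNA has T.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence.

    The sequence is reversed because the two strands are antiparallel.
    """
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def hybridizes(probe_dna: str, target: str) -> bool:
    """Return True if the target (DNA or RNA) is perfectly complementary to the probe.

    Simplifying assumption: exact, full-length complementarity only.
    """
    target_as_dna = target.upper().replace("U", "T")
    return reverse_complement(probe_dna) == target_as_dna


if __name__ == "__main__":
    print(reverse_complement("ATGC"))   # GCAT
    print(hybridizes("ATGC", "GCAU"))   # True: the RNA target pairs with the probe
```

In a real array each feature holds one probe sequence, and a labeled target contributes signal at a feature only where such pairing can occur.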

Microarray printing machine
http://arrayit.com/Products/MicroarrayI/NanoPrint/Nano-Print-new-600.jpg

Procedure for one-channel array

Experimental Procedure for 2-channel Microarray

Data Analyses
- Image analyses: feature intensity acquisition
- To identify differentially expressed genes:
  - Normalization (global, local, print-tip, between arrays, etc.)
  - Clustering or classification
- Analyses from the biology aspect:
  - Significant genes
  - Transcriptional regulation study
  - Cellular pathway or network finding

Experiences at IBS for the cDNA arrays

About IBS tomato arrays
- ~13000 spots/features per chip
- 1 clone per spot
- cDNA clones from about a dozen different cDNA libraries
- At least two different protocols were followed and six different vectors were used
- More than ten technicians involved

Bioinformatics for Microarray at IBS (cont'd)
- IBS tomato EST database construction
- Installation, management and maintenance of data analysis software
- Reference information searching
- Batch submission of EST sequences

Bioinformatics Needs for Microarray Studies at IBS
- Pre-arraying: cDNA info collection, vector trimming, sequence annotation, EST submission, etc.
- Array information management: gene set characterization, data storage, data retrieval
- Post-hybridization data analysis and management: array data analyses, storage of the scanning result, biology-oriented bioinformatics analyses

Bioinformatics Service Work for Microarray Studies at IBS
- Data pre-processing for the cDNAs
  - Clone id assignment
  - Sequence trimming
  - Gene annotation
  - Function classification
- Data sheet preparation for commercial software to analyze microarray data
  - GAL file preparation for GenePix Pro
  - Master Gene List preparation for GeneSpring

[Workflow diagram. Labels: cDNA clones; sequencing; vector trimming; assembly; function annotation; database; PCR; GenePix (feature intensities, normalization); data analysis (normalization, variance, clustering, gene-gene interaction); biological meaning (Spotfire, GeneSpring: pathway analysis, transcription network)]

Pre-array Bioinformatics
1. Clone id generation
2. Vector trimming
3. Sequence assembly
4. Seq annotation (BLAST)
5. EST submission to NCBI
6. Database construction
[Diagram labels: clones from labs; sequencing; raw EST seq; data processing and management]

Clone id generation
- Data centralization following sequencing
- Rules for re-arraying 96-well plates to/from 384-well plates
- PCR from 96-well and spotting from 384-well
- Order of A1, A2, B1, B2

[Diagram labels: cDNA clones (96 or 384 well); sequencing (96 well); PCR (96 well); 384 well]

96-well to 384-well plates
[Diagram labels: B2, B1, A2, A1]

Data collection
- Raw sequencing data obtained from the sequencing company
- Both ABI and text files organized and stored by lab and by date
- Clone info confirmed with each sequence contributor
- Clone ids matched with raw sequences

Processing the sequencing data
- cDNA library procedures confirmed with each single lab
- Vector/linker/primer trimming (Seqclean)
- Function annotation
  - BLAST against different databases
  - Gene Ontology annotation
- Sequence assembly (Phrap)

Procedure to generate cDNA clones

IBS tomato EST Database
- Cloning information
- Sequencing data
- Vector/adaptor trimming information
- EST assembly
- Function annotation
- Cross reference

The Tomato Database Entity-Relationship model
- Untrimmed Sequence: 1. Seq id; 2. Trimmed Sequence
- Trimmed Sequence: 1. Seq id; 2. Trimmed Sequence; 3. Method; 4. Trim set
- Assembly Information: 1. Contig_id; 2. Contig Sequence; 3. BLAST Result; 4. Position; 5. Component seq id
- TAIR Result: 1. Seq id; 2. At number; 3. E-Value; 4. Description; 5. Identity; 6. Other result
- NCBI BLAST Result: 1. Seq id; 2. NCBI_id; 3. E-Value; 4. Description; 5. Identity; 6. Other result
- TIGR Result: 1. Seq id; 2. TC number; 3. E-Value; 4. Description; 5. Identity; 6. Other result
- Lab info: 1. Seq id; 2. Comment; 3. Primer; 4. Biotech; 5. Sender; 6. Collect From
- ID MAP: 1. Seq id; 2. Clone_id; 3. Contig id; 4. Lab_id#1; 5. Lab_id#2; 6. NCBI_sbmt_id93; 7. NCBI_sbmt_id94; 8. dbEST_accn_no; 9. note
- cDNA Library Information: 1. Clone_id(3)(4); 2. Name; 3. Date made; 4. Developmental stage; 5. Cloning sites; 6. Description; 7. Library; 8. Host; 9. Species; 10. Vector; 11. Antibiotic; 12. Authors; 13. Tissue; 14. Primer
- Gene Ontology: 1. TC number; 2. EC number; 3. Process (GO_id, Description); 4. Function (GO_id, Description); 5. Component (GO_id, Description)
[Other diagram labels: linking keys Seq_id, Clone_id and TC number; 1-to-n cardinality on Clone_id; TOM 3, TOM 4]

Information to be further analyzed
- Gene set characterization
  - Number of unique genes on the array
  - Number of known/unknown genes
- Coordinates of each spotted sequence
- Statistics about spotted cDNA
  - grouped by function/pathway
  - grouped by sequence similarity

Post-hybridization data analysis and management

Post-hybridization data analysis
Software for microarray analysis at IBS
- GenePix Pro 5.0: image processing
- GeneSpring: microarray data analysis
- Spotfire: microarray data analysis and data storage
- TransPath: pathway searching

Image Processing
- GenePix Pro 5.0
- GAL (GenePix Array List) file

From multi-well plate to microarray

GAL online

GeneSpring at IBS for microarray data analyses
- standalone software
- providing statistical methods for data analysis
- some bioinformatics
- providing visualization
- licensed annually
- rigid format requirement for input data
- requiring installation of a master gene list (master table) prior to data analysis

Master table for GeneSpring
- The master table contains information on:
  - Id
  - Source of DNA
  - Gene name
  - Gene function annotation (from BLAST results)
  - GO annotation
- Each array needs its own master table
- The format of the master table may vary with different versions of the software
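
A master table with the fields listed above could be assembled as a simple tab-delimited file. The sketch below is illustrative only: the `MasterRow` type, the column names, the placeholder clone id and the tab-delimited layout are all assumptions, not the format GeneSpring actually requires (which, as noted, may vary by version).

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class MasterRow:
    # Hypothetical column names mirroring the fields listed on the slide.
    id: str
    dna_source: str
    gene_name: str
    function_annotation: str
    go_annotation: str


def write_master_table(rows, handle):
    """Write a header line plus one tab-delimited line per array feature."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([f.name for f in fields(MasterRow)])
    for row in rows:
        writer.writerow(astuple(row))


if __name__ == "__main__":
    # Placeholder values, not real IBS data.
    demo = [MasterRow("clone_0001", "example cDNA library", "unknown", "no BLAST hit", "")]
    buffer = io.StringIO()
    write_master_table(demo, buffer)
    print(buffer.getvalue())
```

Keeping one record type per array makes it easier to regenerate the table when the required column layout changes between software versions.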

To generate master table for GeneSpring
- Batch BLAST against three sequence databases
- Parsing BLAST results
- Incorporating EC number, GO number and other related data from the best BLAST matched results
- Integrating all required data from various files and generating the master table
- Checking

Spotfire for microarray data analyses
- server-client software
- linked to an Oracle database for data storage
- providing various statistical methods for data analysis
- capability in establishing links to more bioinformatics tools
- can record the analysis procedure
- more flexible format requirement for input data

One color array for Arabidopsis
- Affymetrix ATH1 chip
- Annotation information provided by the company and available on the internet

Bioinformatics support at Affymetrix

Projects for now and the near future
- Infrastructure build-up
- Microarray data management system
- Platform for bioinformatics analyses
- Plant Signaling Pathway Database

Team

Thank you!
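
The "parse BLAST results, then take annotations from the best match" steps in the master table procedure might look roughly like the following. This is a hedged sketch, not the pipeline used at IBS: it assumes the BLAST results are in the standard 12-column tabular format (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), it picks the best hit by bit score, and the `subject_annotations` lookup of EC and GO terms is a hypothetical stand-in for whatever annotation files are available.

```python
def best_hits(tabular_text):
    """Map each query id to its best hit (highest bit score).

    Assumes 12-column tab-delimited BLAST output; comment and short lines are skipped.
    """
    best = {}
    for line in tabular_text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {"subject": subject, "evalue": evalue, "bitscore": bitscore}
    return best


def annotate(best, subject_annotations):
    """Attach EC/GO terms of each query's best-matching subject.

    subject_annotations is a hypothetical dict: subject id -> {"ec": ..., "go": ...}.
    """
    rows = []
    for query, hit in sorted(best.items()):
        extra = subject_annotations.get(hit["subject"], {})
        rows.append({
            "id": query,
            "best_hit": hit["subject"],
            "evalue": hit["evalue"],
            "ec": extra.get("ec", ""),
            "go": extra.get("go", ""),
        })
    return rows


if __name__ == "__main__":
    # Made-up example lines, for illustration only.
    sample = (
        "q1\ts1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t150\n"
        "q1\ts2\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t200\n"
    )
    for row in annotate(best_hits(sample), {"s2": {"ec": "1.1.1.1", "go": "GO:0008150"}}):
        print(row)
```

Running the same two functions once per sequence database and merging the rows by id would give one record per spotted clone, which is the shape a master table needs.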