Probe selection for Microarrays
Considerations and pitfalls
Swiss Institute of Bioinformatics Institut Suisse de Bioinformatique LF-2001.11

Probe selection wish list
Probe selection strategy should ensure
- Biologically meaningful results (The truth...)
- Coverage, Sensitivity (... The whole truth...)
- Specificity (... And nothing but the truth)
- Annotation
- Reproducibility

Technology
Probe immobilization
- Oligonucleotide coupling: synthesis with linker, covalent coupling to surface
- Oligonucleotide photolithography
- ds-cDNA coupling: cDNA generated by PCR, nonspecific binding to surface
- ss-cDNA coupling: PCR with one modified primer, covalent coupling, 2nd strand removal
Spotting
- With contact (pin-based systems)
- Without contact (ink jet technology)

Technology-specific requirements
General
- Not too short (sensitivity, selectivity)
- Not too long (viscosity, surface properties)
- Not too heterogeneous (robustness)
- Degree of importance depends on method
Single strand methods (Oligos, ss-cDNA)
- Orientation must be known
- ss-cDNA methods are not perfect
- ds-cDNA methods don't care

Probe selection approaches
[Figure labels: Accuracy, Throughput; Selected Gene Regions, Selected Genes, ESTs, Cluster Representatives]

Anonymous Non-Selective Approaches
- Anonymous (blind) spotting
  - Using clones from a library without prior sequencing
  - Only clones with interesting expression pattern are sequenced
  - Normalization of library highly recommended
  - Typical uses:
    - HT-arrays of 'exotic' organisms or tissues
    - Large-scale verification of Differential Display clones
- EST spotting
  - Using clones from a library after sequencing
  - Little justification, since sequence availability allows selection

Spotting of cluster representatives
Sequence Clustering
- For human/mouse/rat EST clones: public cluster libraries
  - Unigene (NCBI)
  - THC (TIGR)
- For custom sequence: clustering tools
  - STACK_PACK (SANBI)
  - JESAM (HGMP)
  - PCP (Paracel, commercial)

A benign clustering situation

In the absence of 5'-3' links
  ! Two clusters corresponding to one gene

Overlap too short
  ! Three clusters corresponding to one gene

Chimeric ESTs
  ! One cluster corresponding to two genes

Chimeric ESTs ... continued
- Chimeric ESTs are quite common
- Chimeric ESTs are a major nuisance for array probe selection
- One of the fusion partners is usually a highly expressed mRNA
- Double-picking of chimeric ESTs can fool even cautious clustering programs
- Unigene contains several chimeric clusters
- The annotation of chimeric clusters is erratic
- Chimeric ESTs can be detected by genome comparison
- There is one particularly bad class of chimeric sequences that will be the subject of the exercises

How to select a cluster representative
- If possible, pick a clone with completely known sequence
- Avoid problematic regions
  - Alu-repeats, B1, B2 and other SINEs
  - LINEs
  - Endogenous retroviruses
  - Microsatellite repeats
- Avoid regions with high similarity to non-identical sequences
- In many clusters, orientation and position relative to ORF are unknown and cannot be selected for
- Test selected clone for sequence correctness
- Test selected clone for chimerism
- Some commercial providers offer sequence-verified UNIGENE cluster representatives

Selection of genes
- If possible, use all of them
- Biased selection
  - Selection by tissue
  - Selection by topic
  - Selection by visibility
  - Selection by known expression properties
- Selection from unbiased pre-screen
- Use sources of expression information
  - EST frequency
  - Published array studies
  - SAGE data

Selection of gene regions
[Figure labels: 3' UTR, ORF, 5' UTR]

Alternative polyadenylation
- Constitutive polyA heterogeneity
  - 3'-fragments: reduced sensitivity
  - No impact on expression ratio
- Regulated polyA heterogeneity
  - Fragment choice influences expression ratio
  - Multiple fragments necessary
- Detection of cryptic polyA signals
  - Prediction (AATAAA)
  - Polyadenylated ESTs
  - SAGE tags

Alternative splicing
- Constitutive splice form heterogeneity
  - Fragment in alternative exon: reduced sensitivity
  - No impact on expression ratio
- Regulated splice form heterogeneity
  - Fragment choice influences expression ratio
  - Multiple fragments necessary
- Detection of alternative splicing events
  - Hard/impossible to predict
  - EST analysis (beware of pre-mRNA)
  - Literature

Alternative promoter usage
- What is the desired readout?
  - If promoter activity matters most: multiple fragments
  - If overall mRNA level matters most: downstream fragment
- Detection of alternative promoter usage
  - Prediction difficult (possible?)
  - EST analysis
  - Literature

UDP-Glucuronosyltransferases
[Figure labels: UGT1A8, UGT1A7]

Selection of gene regions
- Coding region (ORF)
  - Annotation relatively safe
  - No problems with alternative polyA sites
  - No repetitive elements or other funny sequences
  - Danger of close isoforms
  - Danger of alternative splicing
  - Might be missing in short RT products
- 3' untranslated region
  - Annotation less safe
  - Danger of alternative polyA sites
  - Danger of repetitive elements
  - Less likely to cross-hybridize with isoforms
  - Little danger of alternative splicing
- 5' untranslated region
  - Close linkage to promoter
  - Frequently not available

A checklist
- Pick a gene
- Try to get a complete cDNA sequence
- Verify sequence architecture (e.g. cross-species comparison)
- Mask repetitive elements (and vector!)
- If possible, discard 3'-UTR beyond first polyA signal
- Look for alternative splice events
- Use remaining region of interest for similarity searches
- Mask regions that could cross-hybridize
- Use the remaining region for probe amplification or EST selection
- When working with ESTs, use sequence-verified clones
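
Two of the checklist steps lend themselves to a small illustration: discarding the 3'-UTR beyond the first polyA signal, and masking repetitive sequence. The Python sketch below is only a rough approximation under stated assumptions. The function names `trim_after_first_polya_signal` and `mask_simple_repeats` are hypothetical, only the canonical AATAAA hexamer named on the slides is searched for, the trimmed region is assumed to end with the hexamer itself, and the regex-based masking covers only short tandem (microsatellite-like) repeats. It does not detect Alu, B1/B2, LINE, retroviral or vector sequence, for which a dedicated masking tool and repeat library would be needed.

```python
import re

POLYA_SIGNAL = "AATAAA"  # canonical hexamer named on the slides


def trim_after_first_polya_signal(utr3, signal=POLYA_SIGNAL):
    """Return the 3'-UTR up to and including the first polyA signal.

    Assumption: the region of interest ends with the signal hexamer.
    If no signal is found, the (upper-cased) sequence is returned unchanged.
    """
    seq = utr3.upper()
    pos = seq.find(signal)
    if pos == -1:
        return seq
    return seq[: pos + len(signal)]


def mask_simple_repeats(seq, min_units=6):
    """Replace tandem repeats of 1-4 nt units with 'N'.

    A run counts as a repeat when the unit occurs at least `min_units`
    times in a row. This is a crude stand-in for real repeat masking.
    """
    seq = seq.upper()
    # One captured unit followed by at least (min_units - 1) further copies.
    pattern = re.compile(r"([ACGT]{1,4}?)\1{%d,}" % (min_units - 1))
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq)


if __name__ == "__main__":
    # Made-up 3'-UTR containing a CA repeat and two AATAAA hexamers.
    utr = "GGT" + "CA" * 6 + "TTGACCAATAAAGTCTTTGCAATAAACC"
    # Trim first so that masking cannot destroy a signal preceded by an A-run.
    region = trim_after_first_polya_signal(utr)
    print(mask_simple_repeats(region))
```

Trimming is applied before masking here because a homopolymer A-run directly in front of the hexamer would otherwise be masked and hide the signal. The resulting region would still need the similarity searches and cross-hybridization masking listed in the checklist.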