DEVELOPMENT OF OMICS-BASED PROFILING TESTS FOR TOXICOLOGY AND CLINICAL TRIALS
Gilbert S. Omenn, MD, PhD
Center for Computational Medicine and Bioinformatics
University of Michigan, Ann Arbor, MI, USA
8th International Symposium on Recent Advances in Environmental Health Research, Jackson State University, Jackson, MS, 19 September 2011

Disclosures
• Board of Directors, Amgen Inc (biotech leader)
• Board of Directors, Armune Biosciences Inc (Univ of Michigan spinoff, cancer diagnostics)
• Boards of Scientific Advisers: Arboretum Venture Partners (Ann Arbor); Compendia Biosciences (UM spinoff, bioinformatics); Galectin Therapeutics (early-stage; cancer and fibrosis); Innocentive Innovation (Eli Lilly spinoff)
• No conflicts relevant to this presentation.

Near-Completion of Human Genome Sequence, Feb 2001
[Photographs: Eric Lander; J. Craig Venter and Francis Collins; Ari Patrinos; Lee Hood. Image labels: Protein, DNA]

“Unlock the Secrets of the Laboratory”
In 1965, U.S. President Lyndon B. Johnson swooped down on the campus of the National Institutes of Health by helicopter from the White House in downtown Washington, DC. He applauded the prowess of the biomedical scientists at NIH and around the country. He reminded us that the Nation is impatiently waiting for results from research, translated into better medicines and better health.

Vision of Biology as an Information Science: Key Components
• An avalanche of ’omic information: validated SNPs, haplotype blocks, candidate genes/alleles, sequences, proteins, & metabolites, to be associated with disease risks
• Powerful computational methods and cyberinfrastructure
• Effective linkages with better environmental, dietary, and behavioral datasets for eco-genetic analyses
• Credible privacy and confidentiality protections in research and clinical care
• Breakthrough tests, vaccines, drugs, behaviors, and regulatory actions to reduce health risks and cost-effectively treat patients globally

Integrating High-Throughput Measurements with the Phenotype: from Science to Medicine
Theme: From Data to Knowledge

Further Disclosure
I currently chair the Institute of Medicine (National Academy of Sciences) Committee on “Omics-Based Predictive Tests for Clinical Trials”. The goal of these omics-based studies is more effective, more specific, safer, more “personalized” medical care. Our Committee is also charged with making the research more transparent and reproducible.

Developing Omics Clinical Tests
• Molecular analyses of cancers can reveal information about mechanisms of initiation and progression and provide the foundation for clinical tests.
• The aims of such tests include proper diagnosis, earlier diagnosis, prognosis/risk of metastases, response to specific therapies, and evidence of recurrence: “clinical utility”.
• Cancers are very heterogeneous in causation, progression, response to therapies, and risks of metastases and death.
• Gene expression, genomic, epigenomic, proteomic, and metabolomic studies are complementary “omics platforms” for development of clinically useful tests.
• It is a long path of discovery, confirmation, validation, clinical trials, and FDA approval to establish test validity and utility.

Molecular Signatures in Medical Practice
Cancer Diagnosis and Prognosis
• MammaPrint®: 70-gene RNA signature from breast cancer (FDA-approved) (Agendia); plus BluePrint: 80-gene RNA signature that distinguishes basal, luminal, and ERBB2 subgroups of breast cancer
• Oncotype DX®: 21-gene RNA signature for breast cancer; 12-gene signature for colon cancer (Genomic Health)
• Pathwork® Tissue of Origin Test: 2000 RNAs (PathworkDx)
• OVA1: 5 abundant proteins, for decision about pelvic surgery
Cardiovascular Disease Diagnosis and Prognosis
• AlloMap®: 11-gene RNA signature for rejection after cardiac transplant (XDx)
• Corus CAD™: 23-gene blood RNA signature for CAD (CardioDx)
• Triage® Cardiac Panel: 3-protein blood signature for assessment of chest pain and shortness of breath for potential AMI (Biosite)
• Genotype Panels: CYPs (drug metabolism); VKORC1 (Genelex)

Gene Expression Signatures to Guide Treatment Decisions for Breast Cancer
• MammaPrint test to guide use of systemic chemotherapy
• Based on early gene expression prognosis studies by Van’t Veer et al (Nature 2002): node-negative, stage I/II, age <55, distant metastases <5 yrs vs none in 5 yrs, distinguished with a panel of 70 genes (transcripts)
• One of the few examples of a research test taken through the entire process to an FDA-approved clinical In Vitro Diagnostic Multivariate Index Assay (IVDMIA), from Agendia, Inc (Amsterdam/Irvine)

Breast Cancer – MammaPrint Signature Confirmation on Retrospective Consecutive Series

Oncotype DX: 21 Genes to Guide Decision on Chemotherapy in ER+, Node-Negative Breast Cancer
• Generates a risk score: <18, 18-30, and ≥31 correspond to low, intermediate, and high risk, with steeply rising 10-yr recurrence rates (6.8, 14.3, 30.5%) and mortality rates (2.8, 10.7, 15.5%).
• Launched in 2004 by Genomic Health Inc.
• Includes tests for estrogen and progesterone receptors, which also inform the decision to go ahead with Tamoxifen or an aromatase inhibitor (endocrine preventive protocol).
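
The risk-score bands on the Oncotype DX slide can be written out as a small lookup. This is only a sketch of the banding as stated on the slide, reading the high cutoff as 31 and above; the names `RiskGroup` and `classify_risk_score` are hypothetical, the 0-100 score range is an assumption, and nothing here reproduces how the score itself is computed from the 21 genes.

```python
from typing import NamedTuple


class RiskGroup(NamedTuple):
    label: str
    recurrence_10yr_pct: float  # 10-yr recurrence rate quoted on the slide
    mortality_10yr_pct: float   # mortality rate quoted on the slide


LOW = RiskGroup("low", 6.8, 2.8)
INTERMEDIATE = RiskGroup("intermediate", 14.3, 10.7)
HIGH = RiskGroup("high", 30.5, 15.5)


def classify_risk_score(score: float) -> RiskGroup:
    """Map a risk score to the slide's three bands (<18, 18-30, >=31)."""
    if not 0 <= score <= 100:  # assumed valid range for the score
        raise ValueError("risk score expected in the range 0-100")
    if score < 18:
        return LOW
    if score < 31:
        return INTERMEDIATE
    return HIGH


if __name__ == "__main__":
    for s in (10, 18, 30, 31):
        group = classify_risk_score(s)
        print(s, group.label, group.recurrence_10yr_pct, group.mortality_10yr_pct)
```

The point of the lookup is simply that a continuous score is reported to clinicians as one of three groups, each tied to the outcome rates listed above.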

Testimony by Richard Simon, DSc, Chief, Biometrics, NCI, to IOM Committee
• “Most cancer treatments benefit only a minority of patients to whom they are administered.”
• Prognostic, predictive, and effect-modifier biomarkers could make a difference if “actionable” in clinical decision-making.
• “The one thing that different kinds of biomarkers have in common is that they are generally developed and validated poorly.”

Critical Evidence for Clinical Utility
• Claims of medical utility for prognostic and predictive biomarkers based on analysis of archived tissues can have either a high or low level of evidence depending on: analytical validation of the assay, nature of the study yielding archived specimens, the number and condition of the specimens, and prior development of a focused written plan for analysis of a completely specified biomarker classifier.
• Studies using archived tissues from prospective clinical trials, conducted under ideal conditions and independently confirmed, can provide the highest level of evidence.
• Analyses of prognostic or predictive factors, using non-analytically validated assays on a convenience sample of tissues and conducted in an exploratory and unfocused manner, provide poor evidence for clinical utility.

Prostate Cancer Diagnosis and Prognosis
• The standard test for >30 yrs is PSA (prostate-specific antigen), a single protein.
• The test is quite good for monitoring treated patients for response (drop in elevated PSA) and recurrence (reappearance of PSA).
• The test is poor for screening large numbers of men for the diagnosis: Sn 0.6, Sp 0.6, very low predictive value, i.e. many false-positives and false-negatives.
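
To see why Sn 0.6 and Sp 0.6 translate into a very low predictive value in screening, Bayes' rule can be worked through for the positive predictive value (PPV). The prevalence p = 0.03 used below is purely an illustrative assumption, not a figure from the talk.

```latex
\[
\mathrm{PPV}
  = \frac{\mathrm{Sn}\,p}{\mathrm{Sn}\,p + (1-\mathrm{Sp})(1-p)}
  = \frac{0.6 \times 0.03}{0.6 \times 0.03 + 0.4 \times 0.97}
  = \frac{0.018}{0.406}
  \approx 0.044
\]
```

Under that assumed prevalence, only about 4 in 100 positive screens would correspond to cancer, while a sensitivity of 0.6 means that 40% of the cancers present would be missed.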

Bioinformatics Approach Led to the Discovery of Gene Fusions in Prostate Cancers
Fusions of TMPRSS2 to the ETS Family of Transcription Factors
Scott Tomlins et al, Science (2005)

Gene Fusions Reveal Molecular Subtypes of Prostate Cancers: Personalized Oncology
• ETS family fusions (50-60%): ETS and PARP inhibitors
• SPINK1 (10-15%): SPINK1 mAb/EGFR inhibitors
• B-RAF, K-RAS (1-2% each): Raf kinase and Ras inhibitors
With the drugs in hand, seek the patients with the corresponding target for specific therapy.

Making a Difference: Asking Patient-Centric Questions
• Especially among prostate cancers, a small percentage of cancers account for the mortality: those which are invasive and metastasize.
• What are the molecular markers and mediators for such cellular behaviors?
• How can we tell apart the lethal cancers from the relatively innocuous cancers that look the same by histology and stage? This is more important than earlier detection in most cases.
• The story of sarcosine (N-methylglycine) follows.

Metabolomic Profiles Delineate Potential Role for Sarcosine in Prostate Cancer Progression
• Sreekumar et al (Nature 2009) combined high-throughput liquid- and gas-chromatography-based mass spectrometry to profile 1126 metabolites across 262 clinical samples related to prostate cancer (42 tissues, 110 urine, 110 plasma).
• Few differences were seen in urine or plasma; of the 626 metabolites identified in tissue, 60 were found in prostate tumor tissue but not in benign prostate.
• Six compounds showed an increase from benign to PCA to metastatic PCA: sarcosine, uracil, kynurenine, glycerol-3-phosphate, leucine, and proline.
• Oncomine Concept Maps showed increased amino acid metabolism and methyltransferase activity.

Metabolomic Profiling of Cancer Progression
Sreekumar et al, Nature 2009
Sarcosine concentration is greatly increased in metastatic prostate cancers, compared with localized tumors and especially benign tissue.

Sarcosine as Biomarker/Mediator
• Sarcosine (N-methylglycine) was much higher in metastatic tumors than in localized tumors, and nearly undetectable in benign prostate.
• Its levels were also increased in invasive prostate cancer cell lines relative to benign prostate epithelial cells.
• Knockdown of glycine-N-methyltransferase (GNMT) attenuated prostate cancer invasion.
• Exogenous sarcosine, or knockdown of sarcosine dehydrogenase (SARDH), the enzyme that leads to sarcosine degradation, induced an invasive phenotype in benign prostate epithelial cells.
• Androgen receptor and the ERG gene fusion product coordinately regulate components of the sarcosine pathway, binding to the promoter of GNMT.
• A test on urine sediment and supernatant is under development by Metabolon after licensing from the Chinnaiyan Lab at U of M.

Schematic Representation of the Sarcosine Pathway
[Figure: schematic showing glycine, sarcosine, SAM, SAH, GNMT, SARDH, and PIPOX, with sarcosine linked to cancer: Invasion? Migration? Aggressivity?]
SAM: S-Adenosyl-L-methionine; SAH: S-Adenosyl-L-homocysteine; GNMT: Glycine-N-methyltransferase; SARDH: Sarcosine dehydrogenase; PIPOX: L-Pipecolate oxidase

SARDH overexpression reduces tumor growth and decreases sarcosine levels in mouse xenograft model
[Figure panels: A. Tumor Growth; B. Sarcosine Levels]

AUC of Individual Metabolites and the Panel of the Training Set
Metabolite         AUC
Sarcosine          0.76
Glutamic acid      0.74
Glycine            0.79
Cysteine           0.73
Multiplex panel    0.88
Since the AUC of the panel (0.88) is higher than the AUC of each individual metabolite, we expect the panel to outperform the individual markers.

Metabolite Panel Construction
[Figure: ROC curve of the training set]
● The multiplex panel of 4 metabolites was developed using logistic regression on a training set of 70 urine sediments.
● The performance of the panel was evaluated using leave-one-out cross-validation.
● The AUC (area under the ROC curve) is 0.88, indicating high performance.

Validation in an Independent Cohort
• The performance of the panel was evaluated in a blinded fashion using 88 urine sediments (28 biopsy negative, 28 biopsy positive, and 32 radical prostatectomy). The AUC is 0.80.
• These data support the utility of the multiplex metabolite marker panel in the non-invasive diagnosis of prostate cancer.

Future Directions/Next Steps
• Continue validation of the multivariate panel with independent cohorts in the Chinnaiyan Lab/UM Ctr for Translational Pathology
• Assist Metabolon in deployment/modification of the assay in the Metabolon CLIA lab; consider the UM CLIA/CAP lab, too
• Facilitate EDRN validation trial of the metabolite multiplex
• Test additional metabolites for an expanded multiplex
• Evaluate clinical utility for different use scenarios: (a) diagnosis when PSA is 4-10 ng/ml; (b) aggressivity/risk that tumor is metastatic
• Explore multiplexes with other classes of molecular alterations, including TMPRSS2-ERG and PCA3

Lifestage Exposures and Adult Disease
UM NIEHS P30 Center (Howard Hu, PI)
Bioinformatics Core (BIC), launched May 2011

The Bioinformatics Core
The range of high-throughput technologies available for studying the mechanisms of epigenetic modification is rapidly expanding. Thus, the importance of epigenetics researchers having access to advanced bioinformatics collaborators is growing. The Bioinformatics Core (BIC) of the University of Michigan NIEHS P30 Center aims to enhance the interpretation of experimental and clinical results from a broad range of epigenetic studies.

BIC Staff
• Leader: Maureen A. Sartor, PhD, Research Assistant Professor, Center for Computational Medicine and Bioinformatics; [email protected], 2044 Palmer Commons
• Co-Leader: Gilbert S. Omenn, MD, PhD, Director, Center for Computational Medicine and Bioinformatics; Professor of Internal Medicine, Human Genetics, & Public Health; [email protected]
• Member: Richard C. McEachin, PhD, Research Investigator, Center for Computational Medicine and Bioinformatics; [email protected], NCRC Bldg 10, Suite A121

Areas of Expertise
We offer guidance in using bioinformatics tools related to, but not limited to, transcription factor binding motifs/modules, gene/toxin relationships, functions and biological processes, public high-throughput data repositories, genome visualization, regulatory prediction, and protein interaction networks. Also, tools for natural language search and processing of the biomedical literature.
• Epigenomics: DNA methylation (microarrays, Methyl-Seq, MeDIP-Seq), histone modifications (ChIP-Seq), and microRNA analyses
• Genomics: microarrays, RNA-Seq, genome-wide association, linkage
• Proteomics
• Metabolomics
• Regulatory mechanisms and transcriptomics
• Integrative analyses and systems biology: pathways, annotation
• Phenotype definitions
• Data management

Assessment of Environmental Influences
[Figure: DNA methylation profile, from the lab of Dana Dolinoy (NIEHS #1R01 ES017524-01). One of several mouse samples after exposure to BPA. Samples were prepared using the MethylPlex technique from Rubicon (Ann Arbor) and deep sequenced using Illumina GAIIx.]
[Figure: Sample throughput vs. genome coverage for various DNA methylation techniques. Laird PW, Principles and challenges of genome-wide DNA methylation analysis, Nat Rev Genet. 2010 Mar;11(3):191-203]

Collaborations
Epigenomics Web Portal (Bisphenol A): Genomics Portals is an integrative, web-based computational platform for the analysis and mining of genomics data, developed at the University of Cincinnati by a BIC external advisor (http://eh3.uc.edu/GenomicsPortals).
Shinde K, Phatak M, Freudenberg JM, Chen J, Li Q, Joshi VK, Hu Z, Ghosh K, Meller J, Medvedovic M. Genomics Portals: Integrative Web-Platform for Mining Genomics Data. BMC Genomics. 2010 Jan 13;11(1):27.

High-Throughput Data Analysis
• ConceptGen is a gene set enrichment and relationship mapping tool that can help you identify, explore, and visualize relationships among gene sets (http://conceptgen.ncibi.org).
• LRpath is an alternative gene set enrichment testing method for interpreting high-throughput results, such as from DNA methylation experiments (http://lrpath.ncibi.org).
[Figure: ConceptGen graphic of related biological concepts (p < 0.05) for genes with increased methylation and decreased expression level in HPV(-) relative to HPV(+) cell lines. Sartor MA, Dolinoy DC, Jones TR, Colacino JA, Prince MEP, Carey TE, and Rozek LS, Epigenetics June 2011; 6 (6): 777-787]
[Figure: LRpath graphic, methylation in 6 cancer datasets]

Other Resources (https://portal.ncibi.org)
• MiMI: The MiMI database comprehensively includes protein interaction information that has been integrated and merged from diverse protein interaction databases. (mimi.ncibi.org)
• Gene2MeSH: Gene2MeSH uses a statistical approach to automatically annotate genes with the concepts defined in Medical Subject Headings (MeSH). (gene2mesh.ncibi.org)
• Comparative Toxicogenomic Database (CTD): CTD advances understanding of the effects of environmental chemicals on human health. (ctd.mdibl.org)
• GeneGo: MetaCore and ToxHunter together provide an integrated knowledgebase and software suite for pathway analysis and systems toxicology. (www.genego.com)
• GenePattern: GenePattern is a powerful genomic analysis platform developed at the Broad Institute. (http://www.broadinstitute.org/cancer/software/genepattern)
• Genomatix MatInspector identifies transcription factor binding sites. (www.genomatix.de)
• Cytoscape is an open source bioinformatics platform for visualizing molecular interaction networks and biological pathways. (http://www.cytoscape.org)

Bioinformatics Training
Classes and/or one-on-one sessions are available. Recent sessions include training on Cytoscape (hands-on and webinar formats), ConceptGen, Gene2MeSH, and MiMI.
Suggestions for future offerings are welcome. For more information on training sessions, contact Marci Brandenberg ([email protected]), Maureen Sartor, or Rich McEachin.

Comparative Toxicogenomics Database
• Manually curated, public resource of the triad of chemical-gene, chemical-disease, and gene-disease relationships, integrated to construct chemical-gene-disease networks.
• As of July 2010, CTD contained 1.4 million triad data points, along with analytical tools: GeneComps and ChemComps to find comparable genes and chemicals that share toxicogenomic profiles, enriched Gene Ontology terms, Venn diagram tools to discover overlapping and unique attributes of any set of chemicals, genes, or diseases, and enhanced gene pathways data.
• Indexed at numerous other databases, including PubChem, PharmGKB, UniProt, T3DB, GAD, ChemID, and TOXNET.
• Links also with databases of gene variants and eco-genetic relationships.
• Datasets from microarray and proteomics studies in various species are available at the Chemical Effects in Biological Systems Knowledgebase (Waters et al., 2008; http://cebs.niehs.nih.gov/cebs).

Aims for Proteomics of Cancers
1. Profile tumor specimens
   a) for diagnosis and stratification of patients
   b) for prognosis with particular therapies
   c) for clues to circulating biomarkers
2. Profile circulating proteins
   a) to discover and validate biomarkers for earlier diagnosis
   b) to apply such biomarkers to predict/monitor response to treatment and recurrence

OVA1 Test for Ovarian Cancers
• Based on empirical results from mass spectrometry-based proteomics, not tied to cancer pathways or mechanisms
• Five abundant plasma proteins:
  up: beta-2 microglobulin and CA125 (MUC16)
  down: transthyretin, apolipoprotein A1, transferrin
• Approved by FDA for the narrow indication of testing for cancer prior to surgery in women who have pelvic masses (80-90% benign)
• Licensed by Johns Hopkins (Dan Chan) to Vermillion, Inc. (Sn 0.92, Sp 0.45)
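
The OVA1 figures quoted above (Sn 0.92, Sp 0.45, with 80-90% of pelvic masses benign) can be turned into expected counts for a hypothetical group of 1000 women. This is a back-of-the-envelope sketch under those stated numbers only; the cohort size and the function name `natural_frequencies` are assumptions made for illustration.

```python
def natural_frequencies(n, prevalence, sensitivity, specificity):
    """Expected confusion-matrix counts for n tested patients (rounded to whole people)."""
    diseased = round(n * prevalence)
    healthy = n - diseased
    tp = round(sensitivity * diseased)   # cancers flagged by the test
    fn = diseased - tp                   # cancers missed
    tn = round(specificity * healthy)    # benign masses correctly cleared
    fp = healthy - tn                    # benign masses flagged
    return {
        "tp": tp, "fn": fn, "tn": tn, "fp": fp,
        "ppv": tp / (tp + fp),           # share of positives that are cancers
        "npv": tn / (tn + fn),           # share of negatives that are benign
    }


if __name__ == "__main__":
    # 80-90% benign on the slide corresponds to 10-20% malignant.
    for prevalence in (0.10, 0.20):
        r = natural_frequencies(1000, prevalence, sensitivity=0.92, specificity=0.45)
        print(f"malignant {prevalence:.0%}: TP={r['tp']} FN={r['fn']} "
              f"TN={r['tn']} FP={r['fp']} PPV={r['ppv']:.2f} NPV={r['npv']:.2f}")
```

Read this way, the low specificity produces many false positives, while the high sensitivity means that a negative result is rarely wrong under these assumptions.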

Aptamer-based Proteomics, Lung Cancer Biomarker Panel (Ostroff/Gold, 2010)
Aptamer 12-Biomarker Performance in Distinguishing NSCLC from Controls
Performance of Aptamer Classifier on Serum Samples from 4 Sites

Barriers for Proteomic Cancer Biomarker Discovery in Plasma
• Human cancers are very heterogeneous
• Tumor proteins are in low abundance for early detection of cancers
• Tumor proteins are greatly diluted upon release to extracellular fluid and blood
• Plasma is an extraordinarily complex specimen dominated by high abundance proteins
• Knowledge of the plasma proteome is still limited (latest, least-redundant Human Plasma PeptideAtlas has 1929 canonical proteins: Farrah et al, Mol Cell Proteomics June 2011; www.peptideatlas.org)

Biomarker Discovery from Tumor Tissues and Plasma: Strategies
1. Start with microarray or next-gen sequencing evidence for carcinogenic pathway mechanisms in tumor and track corresponding protein biomarker candidates to the plasma; e.g., TMPRSS2/ETS fusion and sarcosine in prostate cancers.
2. Perform targeted proteomics with SRM/MRM to identify and quantitate these candidates.
3. Detect auto-antibodies in plasma as a biological amplification of tumor protein signals, then confirm in tumor tissue.
4. Identify alternative splice isoforms of biologically meaningful proteins in cancers and in plasma of humans and mouse models: exciting new work from my lab.

A New Class of Biomarker Candidates, from Alternative Splicing
• Alternative splicing generates protein diversity without increasing genome size
• Most genes produce alternative transcripts
• Greatly improved MS/MS instrumentation enables confident identification of peptides from proteins coded by mRNA transcript sequences expressed at quite low levels
• Alternative splice isoforms
  – Contribute to diseases, especially cancers
  – Potentially useful as biomarkers for cancer

Alternative Splicing Events
Rajan, P. et al., Nature Reviews Urology, 2009, 6, 454-460

Mouse Models and Human Cancers used in Splice Variant Studies
• Pancreatic Cancer
  – mutations: Kras G12D activation and INK4a/ARF deletion
  – Menon et al, Cancer Res 2009;69:300-309
• Breast Cancer
  – mutation: Her2/Neu amplification
  – Menon & Omenn, Cancer Res 2010;70:3440-49
• Human Prostate Cancer and VCaP/RWPE Cell Lines
  – Yocum et al (2011, submitted)

Different Types of Alternative Splicing Events among Novel Peptides
Protein: M14C435_25_s53_e5231_1_rf1_c1_n0|
  Novel peptide: ITFDDHKNGSCGVSYIAQEPDAP
  MGI symbol: flnb
  Sample type: tumor
  Probable splicing mechanism: intron retention
Protein: M19C1480_7_s383_e953_1_rf1_c1_n0|
  Novel peptide: ATETARLLPGTALAEAQSPLRRLTLTQAPPR
  MGI symbol: fth1
  Sample type: tumor
  Probable splicing mechanism: located in the 5'UTR region; alternate translation start site
Protein: M15C7603_11_s1181_e1415_1_rf1_c1_n0|
  Novel peptide: QTSSRPAMGGGTARWQR
  MGI symbol: gapdh1
  Sample type: normal
  Probable splicing mechanism: different frame; frame 2
Protein: MXC910_48_s179_e2030_1_rf0_c1_n0|
  Novel peptide: LLEELAAARPGEPALMSSSPLSKKRR
  MGI symbol: uba1
  Sample type: normal
  Probable splicing mechanism: novel peptide from Ensembl exon 2 and 3 junction; the currently annotated Ensembl cDNA starts from exon 3
Protein: M7C13466_20_s2_e518_1_rf2_c1_n0|
  Novel peptide: EARSLSDGGPADSVEAAK
  MGI symbol: nap1l4
  Sample type: tumor
  Probable splicing mechanism: exon skipping; exon 2 skipped

Protein Interaction Network Displayed by MiMI-Cytoscape Plugin
(Only the direct interactions between the input genes are shown)
The parent gene symbols of the alternative splice variants found only in the tumor sample of the breast cancer dataset were used as the input gene list.
The gene symbols in bold are differentially expressed proteins.

Summary
• We have identified many biologically interesting novel and known Alternative Splice Variants
• Many were over-expressed in the cancer samples versus the normal samples, either by labeling or by spectral counting
• Some cancer-associated splice variants have more CK-2 and PKC phosphorylation sites
• Predictive structural studies can show the effects of splicing on phosphorylation
• Alternative Splice Variants could be used as biomarker candidates

Ongoing Special Studies
• Structural analyses of conformational and functional features of the differentially-expressed splice variants may help us understand the underlying mechanisms in different types of cancers.
• We are combining next-generation mRNA sequencing and proteomics-based identification of splice variants with targeted (MRM) proteomics to develop biomarker assays.
• We have studies in progress on human prostate, lung, and colon tumors or cell lines.

Human Plasma PeptideAtlas
91 expts; 3,172,759 peptide-spectrum matches; 20,679 distinct peptides at FDR 0.0016; 1929 canonical proteins at FDR 0.01

The HUPO Human Plasma Project (HPP)

Major Goals of the Human Proteome Project
• Identify and characterize proteins from all of the 20,300 protein-coding genes.
• Identify and quantify protein isoforms from post-translational modifications, splicing, SNPs, tissue-specific expression in health & disease
• Lay foundation for biomarker discovery, confirmation, validation, and development of clinically useful multiplex assays

Contribution to the HPP by Antibody-based Proteomics
Mathias Uhlén, Human Protein Atlas V7.0, 2010, with immunohistochemistry for 10,000 proteins

www.mrmatlas.org
Picotti et al, Nature Methods 2007; Picotti et al, Nature Methods 2010

Peptide/Protein SRM Coverage by Chromosome

Chromosome-Based HPP (9/2010) Pipeline
[Figure: per-gene annotation map for ERBB2, C17orf37, GRB7, IKZF3, ZPBP2, PSMD3, GSDMA, LOC728129, ORMDL3, GSDMB, CSF3, MED24, THRA, NR1D1, MSL1, RARA, CDC6, WIPF2, RAPGEFL1, and CASC3. Legend: genes found in Ensembl; genes with MS information (GPMDB); antibody found in HPA or Antibodypedia; no antibody in HPA and Antibodypedia; number of pseudogenes; number of hypothetical genes. PTMs: Ser/Thr acetyl (GPM); Met acetyl (GPM); Lys acetyl (GPM); Ser/Thr phospho (UniProt, GPM); Tyr phospho (UniProt, GPMDB); N-glyco (Unipep).]

Integrating High-Throughput Measurements with the Phenotype: from Science to Medicine

Acknowledgements
• Prostate Cancer Studies: Arul Chinnaiyan (UM), Arun Sreekumar (Baylor)
• Epigenomics: Maureen Sartor, Dana Dolinoy, Laura Rozek (UM)
• Peptide Atlas: Terry Farrah, Eric Deutsch, Ruedi Aebersold, Rob Moritz (ISB/ETHZ)
• Protein Alternative Splice Variants: Rajasree Menon, Anastasia Yocum, Yang Zhang (UM)
• Chromosome 17, Human Proteome Project (HPP): Bill Hancock, Michael Snyder, Ron Beavis