Maximizing the Value of NGS and
Gene Expression Experiments (Synopsis):
Strategies to Streamline Data Analysis and Interpretation
for Actionable Research Outcomes
HIGHLIGHTS
• Research projects require large sums of money to produce data, but investment and funding decrease once the data need to be analyzed. Data analysis becomes the greatest roadblock in research because of the amount of data now being produced and the time it takes to analyze it. Scientists are looking for the most effective, cheapest ways to streamline their gene expression data analysis in the shortest amount of time. IPA is the industry-leading gene expression analysis software, enabling labs to quickly narrow in on relevant information and examine data with biological references. In this paper, we discuss how IPA (a commercial product) compares to open-source tools and the resulting ROI (return on investment).
• The ROI is compared by calculating the time to analyze RNA-Seq and microarray data sets with a commercial product (IPA) and with open-source tools.
• Gene expression datasets were analyzed to calculate the time required to complete the three most common scientific tasks: (1) research an unfamiliar gene to create experimental hypotheses, (2) analyze gene expression data (transcription factors, pathway and biological function effects) and (3) identify target genes regulated by microRNAs.
• The time saved on these common analyses with one product rather than the other corresponds to the ROI.
• Not only did IPA outperform open-source tools in the time it took to analyze gene expression data, it also exceeded their analysis capabilities. IPA was found to save over 30 hours per data analysis compared with open-source tools. Also, with the powerful combination of the Ingenuity Knowledge Base and IPA, researchers gain a deeper understanding of the underlying biology of experimental systems and models.
• Scientists who use IPA can go into greater biological detail, have access to scientific findings and perform more frequent analyses. Scientists can make more informed decisions on the next steps to take in their research studies.

INTRODUCTION
Today, scientists are able to leverage high-throughput techniques such as microarrays, proteomics or NGS to measure levels for nearly every mRNA, protein or DNA sequence variation, and easily produce tens or hundreds of thousands of data points. The resulting data analysis is exponentially more complicated and time consuming, and scientists grapple with the daunting task of analyzing these data to determine what actually occurred in the experiment. Gaining the full value from the experiment demands thorough biological interpretation to understand cause and effect. Scientists often look for “upstream” regulatory molecules such as transcription factors or microRNAs that may be responsible for gene expression changes observed in the experiment. To fully understand the effects of the experimental results, scientists must analyze the data for molecular pathways, biological functions and known toxicities, and identify particular gene(s) for further research (i.e. candidate targets or biomarkers).

Historically, scientists could rely on their own expertise and published literature to perform simple analysis tasks. With the proliferation of published literature, this becomes increasingly challenging and results in an analysis and information bottleneck. Today, scientists can turn to internet-based software tools, including informational websites (e.g. PubMed) and open-source or commercial analysis software (e.g. DAVID, Ingenuity IPA), for help interpreting data. Conducting literature searches, reading as many papers as possible and investigating “top” genes with the highest changes in expression are common practices used to interpret high-throughput data and design hypotheses for the next experiments. Unfortunately, the chance of missing a critical facet of biology is greatly increased, simply because of the abundance of available information in molecular databases and publications coupled with the amount of experimental data. This complexity is further increased with RNA-sequencing data, which provides a more precise measurement of the level of transcripts and isoforms than microarray technologies. Tools that help understand the data from different perspectives, bring different sources of information together and allow the scientist flexibility to explore avenues of interest are critical to understanding experimental results.

In this study, we compared a commercially available tool, IPA, with several open-source tools that are commonly used for analysis of ‘omics data from high-throughput experiments, in terms of time investment and effectiveness of results. Three representative tasks were performed with the goal of determining which analysis was the most efficient for biological interpretation: 1) research an unfamiliar gene, 2) analyze gene expression data to identify regulatory transcription factors, pathway and biological function effects and 3) find target genes regulated by microRNAs in a data set. Time to completion and information gained were recorded, including possible benefits or liabilities of a particular method. Consolidated tools with efficient workflows, such as IPA and DAVID, provided the highest ROI compared to using search websites such as PubMed or miRBase. In addition, this exercise identified key drivers to consider with any biological data analysis solution purchase.

DISCUSSION
DETERMINING THE RETURN ON INVESTMENT (ROI)
The goal of this study was to quantitate the time required for analyzing gene expression data resulting from microarray or RNA-sequencing experiments using both commercial and open-source tools. Based on the analysis times, time savings could be determined and a return on investment could be calculated. For purposes of this study the ROI compares the net benefits per scientist of implementing the software with its total cost per scientist. The ROI is calculated as the net benefits divided by the software costs and is expressed as a percentage. The ROI was calculated for a single year because IPA is an annual subscription.

To determine the return on investment for a single year of IPA, the average number of datasets uploaded and analyzed and the average amount of time a user spends in IPA were collected from the production system logs. IPA is the market-leading commercial pathway analysis tool, and its significant user base makes it possible to quantify how the average user utilizes the system. In 2011 the average IPA user uploaded and analyzed 20 data files from gene expression, metabolomics and proteomic experiments. This average user spent 62 hours in IPA analyzing this data, which equates to 1.6 weeks per year per person. For the simplistic test analysis conducted in this business brief, IPA was found to exceed the capabilities of the combined use of 3 open-source tools (DAVID, PathVisio and PubMed), saving almost 30 hours for a single data file and analysis (Figure 1). This is a 60% time savings per person per analysis over other tools. Extrapolating the savings to the number of data files uploaded by the average IPA user gives 600 hours (30 hours x 20 datasets analyzed) over the course of a year. At a fully loaded rate of $100/hour, this equates to savings of $60,000 per person per year. Taking a conservative approach to the cost of an annual subscription for IPA, the average cost per user was determined from commercial account licensing fees. The resulting return on investment was determined to be 253% for one year.

[Figure 1: workflow diagrams]
A. IPA Data Analysis Workflow: Upload Data; Run Core Analysis; Pathways (overlay), Functional Effects, Transcription Regulators; Research Genes of Interest; Save, Export.
   IPA analysis time: 32.65 min. Review findings (research 10 genes): 2.33 h (average/gene) x 10. Total: 20.87 hours.
B. Open-Source Data Analysis Workflow*
   DAVID (functions analysis*): Upload Data; Run Analysis; Save, Export. Analysis time: 26.47 min.
   PathVisio (10 pathways): Upload Data; Pathway (overlay); Save, Export. 8.32 min. x 10.
   PubMed (research 10 genes): Research Genes of Interest. 4.75 h (average/gene) x 10.
   Total: >49.29 hours.
*Direction of effect and transcription regulators prediction not included.

Figure 1. Comparison of IPA and open-source data analysis workflows. A) Conservative estimate of time for IPA data analysis is 20.87 h, including high-value functional effects and transcription regulator prediction. B) The open-source workflow requires 3 tools to conduct a complete analysis, including functions and pathways, view pathways with data overlay, and follow up with manual gene research to interpret effects. This analysis workflow still lacks significant benefits only available in IPA, including predicted activation state of upstream transcription factors, the microRNA-mRNA Target Filter, and directional downstream effects.

To maximize the value of large-scale gene expression experiments, researchers must understand their data from a biological perspective. IPA helps scientists understand the biology most relevant to their experimental results and generate more confident hypotheses. We have identified five key drivers important for a biological data analysis solution to drive value and ensure successful ROI. In addition, indirect benefits can be realized through successful implementation of these drivers, such as a deeper understanding of the underlying biology of experimental systems and models, an increased level of experimental confidence, an improvement in researchers' ability to prioritize work and make decisions, and an overall improvement in innovation among research teams.

The five key drivers important for a biological data analysis solution:
1. Analysis: Elucidate cause and effect of observed gene expression changes. The ability to predict the activation state of upstream causes of gene
expression changes, including transcription factors, microRNA, and other molecules that are upstream of the genes in a dataset, is key to a better understanding of the biological system and impact of the experiment.
2. Analysis: Enable a systems biology approach through network exploration. Using an iterative exploratory approach enables a deeper understanding of the biological system being studied. Tools need to accommodate multiple data types such as microRNA, metabolomics and proteomics in addition to gene expression data. Scientists can then use tools to generate networks and further explore the biology associated with the data, such as building second messenger cascades, identifying clinically validated biomarkers associated with the data, or determining which pathways are significantly impacted by selected molecules.
3. Platform: Support for cutting-edge research. The research community is fast paced, especially with the advent of next-generation sequencing technologies and the race to better understand the biology associated with the data. The best analysis solutions will be at the forefront of these technologies, embracing new ways to interpret data from these complex studies. Comprehensive tools should aid researchers using RNA-Seq in understanding experimental results at the isoform level and provide the ability to visualize specific biology associated with splice variants and their impacted protein domains. In addition, look for a comprehensive tool to identify and prioritize microRNA-mRNA target pairings by biological context such as pathways or disease.
4. Integration: Time cost of integrating all of the results. Even if you use multiple pieces of software to get different types of insights, it takes an extremely long time to integrate and interpret the results, since other software is usually designed to answer a single question about your data and is not designed for integration with other types of information.
5. Content: Comprehensive and timely quality content. The quality and timeliness of the content in the database supporting the analytical tools is critical. In addition, how the manually extracted facts are organized is crucial to enable computation and inferencing, semantic and linguistic consistency, and directional predictions. Comprehensive content can be incorporated from published literature and third-party databases for maximum coverage of biological and chemical interactions, functional annotations, protein domains, biomarkers, mutations, and microRNA-mRNA relationships, to name a few. Manual review by experts to ensure the content is accurate and detailed is key to providing confidence in the information and resulting analytics.

CONCLUSION
Researchers and laboratories invest thousands of dollars in instrumentation to produce data, but that investment can be lost or misguided if the data analysis is lacking or haphazard. For the large amounts of data produced from microarray or RNA-Seq experiments, biological interpretation aided by software such as IPA is key to enabling scientists to quickly narrow in on relevant information and examine data within a consistent set of biological references. Examining the results from an RNA-Seq or microarray dataset in the context of established and known molecular relationships, and identifying upstream causes of those expression changes and the downstream effects on biological processes and disease, provides a faster and more reliable, replicable way to identify key insights from complex data. Using a commercial tool such as IPA leads to a savings of over 30 hours per data analysis, which can mean optimization of resources leading to new testable hypotheses in a shorter amount of time.

When considering an analysis strategy, identifying the tools that give the highest return on investment is critical to maximizing the value of each experiment and of the investment in reagents and instrumentation. This paper systematically described and calculated the time required for three critical steps within the biological analysis of microarray data, resulting in a significant ROI of 253% for the commercial tool IPA. In addition, five key drivers were outlined to ensure a successful environment for achieving the maximum ROI and other benefits.

FULL VERSION
This is a synopsis of the paper Maximizing the Value of NGS and Gene Expression Experiments (Synopsis): Strategies to Streamline Data Analysis and Interpretation for Actionable Research Outcomes. To access the unabridged version please go to: http://www.ingenuity.com/products/ipa#/?tab=resources

ACKNOWLEDGEMENTS
The authors are grateful to the many scientists at EBI, Ingenuity, NIH-NCBI and Stanford Labs for working to provide gene, microRNA, and protein databases and analysis tools that help solve research problems.
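
As a worked illustration of the ROI arithmetic described in the Discussion, the short Python sketch below recomputes the time savings from the Figure 1 totals and the annual dollar savings from the quoted inputs (30 hours, 20 datasets, $100/hour). It is only a sketch under stated assumptions: the function names are illustrative, "net benefits" is assumed to mean savings minus software cost, and the subscription cost used here is a hypothetical placeholder, since this synopsis does not state the actual figure behind the reported 253%.

```python
def hours_saved(open_source_hours: float, commercial_hours: float) -> float:
    """Hours saved per analysis by the faster workflow."""
    return open_source_hours - commercial_hours


def annual_savings(hours_per_analysis: float,
                   analyses_per_year: int,
                   hourly_rate: float) -> float:
    """Dollar value of the time saved over one year."""
    return hours_per_analysis * analyses_per_year * hourly_rate


def roi_percent(savings: float, cost: float) -> float:
    """ROI as a percentage: net benefit divided by cost.

    Assumption: net benefit = savings - cost. The brief says only that
    ROI is "net benefits divided by the software costs".
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (savings - cost) / cost * 100.0


# Workflow totals quoted in Figure 1 (hours).
OPEN_SOURCE_TOTAL_H = 49.29
IPA_TOTAL_H = 20.87

saved = hours_saved(OPEN_SOURCE_TOTAL_H, IPA_TOTAL_H)
fraction_saved = saved / OPEN_SOURCE_TOTAL_H

# Inputs quoted in the Discussion: 30 h saved per analysis (rounded),
# 20 datasets per user per year, $100/hour fully loaded rate.
yearly = annual_savings(30, 20, 100.0)

# HYPOTHETICAL subscription cost, not taken from the source; it is used
# only to exercise the formula.
EXAMPLE_COST = 10_000.0
example_roi = roi_percent(yearly, EXAMPLE_COST)

print(f"Hours saved per analysis: {saved:.2f}")
print(f"Share of open-source time saved: {fraction_saved:.1%}")
print(f"Annual savings per person: ${yearly:,.0f}")
print(f"ROI at the hypothetical cost: {example_roi:.0f}%")
```

With the Figure 1 totals, the saving works out to 28.42 hours per analysis, roughly 58% of the open-source total, which the text rounds to "almost 30 hours" and 60%. Replacing EXAMPLE_COST with a real per-user licensing fee would give the ROI for that fee under the same net-benefit assumption.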
Ingenuity Systems, Inc.
1700 Seaport Blvd. Third Floor
Redwood City, CA 94063
Tel. +1 650 381 5100
Fax. +1 650 381 5190
www.ingenuity.com
© 2013 Ingenuity Systems, Inc. All Rights Reserved.