Next Generation Genome Sequencing (NGS) Data Analysis Made Lightning-Fast with SAP HANA
Enterprise Solutions
Overview
The Human Genome Project, an effort supported by a global consortium of scientists, demystified the human genetic code by producing a comprehensive blueprint of a human being's genetic makeup.

This groundbreaking achievement has given scientists the means to decipher and analyze billions of DNA sequences, determine what specific genes do, gain insight into how the body works, and develop new treatments and medicines. It has also driven rapid advances in Next Generation Sequencing (NGS) technologies, producing a flood of data.

To allow researchers to analyze and mine genomic data with significantly more speed, Tata Consultancy Services (TCS) offers an Accelerated NGS Data Analysis Platform for building automated analysis pipelines powered by the lightning-fast SAP HANA platform.

Deciphering and analyzing DNA sequences with NGS technologies is a data-intensive process. Once genes are sequenced, the first order of business is genomic annotation, which essentially determines what a specific gene is designated to do.

With a baseline understanding of a particular gene or set of genes considered healthy or normal, researchers can begin to gain insights into “variations”, or anomalies, within DNA sequences. The analysis of DNA variations, also referred to as variant calling, is helping researchers pinpoint the genetic causes of human diseases and, in some cases, develop specialized treatments and, potentially, personalized cures.

NGS technologies are helping medical providers and life sciences organizations advance, and are supporting new discoveries and innovations that improve human health. But they present one significant obstacle: NGS analysis produces massive volumes of data, up to a terabyte for a single DNA sample.

Benefits
• Reduces the total time taken for the analysis.
• Automates clinical interpretation of patient or individual genetic variations through automated annotation and reporting.
• Enables scientists to identify new markers within a patient population.
• Reduces query lag when a researcher queries part of the input data or ancillary public or prior annotation data.
• Reduces time delays in exploratory research through faster response.
Our Solution
TCS NGS Data Analysis accelerates read assembly, variant calling
and variant annotation
Working with the Center for Computational Biology at the University of
California at Berkeley, TCS has developed new methods for the rapid
interpretation of genome variation data.
The Next Generation Sequencing Analysis platform, built on SAP HANA,
automates parts of the read assembly, variant calling and variant annotation
processes using a pipeline-based approach to clean data and dramatically
speed up analysis processes.
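The pipeline-based approach can be illustrated with a minimal sketch, assuming each stage is a Python function that takes and returns a dictionary of intermediate results. The `run_pipeline` helper and the stage functions (`example_align`, `example_call_variants`, `example_annotate`) are hypothetical placeholders, not part of the TCS platform; real stages would invoke the read aligner, the R variant-calling programs and the annotation sources.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Stage = Tuple[str, Callable[[Context], Context]]


class PipelineError(RuntimeError):
    """Raised when a stage fails; the message names the failing stage."""


def run_pipeline(stages: List[Stage], context: Context) -> Tuple[Context, List[str]]:
    """Run stages in order, feeding each one the previous stage's output."""
    completed: List[str] = []
    for name, step in stages:
        try:
            # Shallow copy: a stage cannot add or overwrite keys in the caller's dict.
            context = step(dict(context))
        except Exception as exc:
            raise PipelineError(f"stage {name!r} failed after {completed}") from exc
        completed.append(name)
    return context, completed


# Hypothetical placeholder stages standing in for the real processing steps.
def example_align(ctx: Context) -> Context:
    ctx["sam_path"] = f"{ctx['sample_id']}.sam"  # placeholder for the aligner's SAM output
    return ctx


def example_call_variants(ctx: Context) -> Context:
    ctx["variants"] = [("chr1", 5, "A", "T")]  # placeholder for the caller's output
    return ctx


def example_annotate(ctx: Context) -> Context:
    ctx["report"] = [(v, "unannotated") for v in ctx["variants"]]
    return ctx


EXAMPLE_STAGES: List[Stage] = [
    ("align", example_align),
    ("call_variants", example_call_variants),
    ("annotate", example_annotate),
]

if __name__ == "__main__":
    result, done = run_pipeline(EXAMPLE_STAGES, {"sample_id": "S1"})
    print(done)
    print(result["report"])
```

Chaining stages through a single context keeps each step independently testable and lets a failure report exactly which stage broke and which stages had already completed.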
Researchers must perform many manual steps to assemble raw sequence data, create DNA annotations, and root out false negatives and errors in variant call data, tedious processes that greatly slow the analysis.

The broad adoption of NGS in clinical research and diagnostics has also been hampered by workflow challenges that become more pronounced in clinical settings, due in part to the larger number of samples being processed.

Our solution includes:

Automated Read Assembly: The read assembler generates read assembly maps of single patient samples as Sequence Alignment/Map (SAM) files. The read-assembly process deploys an ultrafast read aligner to align short DNA sequences, or reads, to the human reference genome, the baseline data from the Human Genome Project. The read-assembly process runs on high-speed Hadoop clusters; Hadoop is a stable, open-source technology designed to manage large data sets.
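The read-assembly step runs on Hadoop clusters, which parallelize work using the MapReduce model. The following sketch emulates that model in a single Python process, assuming SAM-formatted input already divided into splits: mappers emit `(reference name, 1)` for each mapped read, a shuffle groups the pairs by key, and reducers sum them into per-chromosome read counts. It does not use the Hadoop API; the function names are hypothetical, and worker threads merely stand in for cluster nodes (they illustrate the structure, not real speed-up).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_phase(sam_line: str) -> List[Pair]:
    """Mapper: emit (reference_name, 1) for one mapped SAM alignment line."""
    if sam_line.startswith("@"):
        return []  # header lines carry no alignments
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname = int(fields[1]), fields[2]
    if flag & 0x4:
        return []  # FLAG bit 0x4 marks an unmapped read
    return [(rname, 1)]


def map_split(split: List[str]) -> List[Pair]:
    """Run the mapper over every line of one input split."""
    pairs: List[Pair] = []
    for line in split:
        pairs.extend(map_phase(line))
    return pairs


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Pair:
    """Reducer: total the counts for one reference sequence."""
    return key, sum(values)


def run_job(splits: List[List[str]]) -> Dict[str, int]:
    # One mapper task per split; threads stand in for cluster nodes.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_split, splits))
    grouped = shuffle(chain.from_iterable(mapped))
    return dict(reduce_phase(k, vs) for k, vs in sorted(grouped.items()))
```

Because each mapper only sees its own split and reducers only see one key, the same job shape scales out across machines, which is what makes MapReduce a good fit for per-read bioinformatics work.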
Variant Calling: The variant calling process is enabled by R, another stable, open-source technology widely used by statisticians and data miners. The platform comes with custom-developed R programs that analyze SAM files and extract variations in the patient's or individual's genome sequence.
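The platform's variant-calling programs are written in R and are not shown in this document. To illustrate the general idea only, here is a deliberately naive Python sketch, assuming single-nucleotide variants and a reference sequence held in memory: it walks each read's CIGAR string to build a pileup of observed bases per reference position, then reports a variant where a non-reference base meets hypothetical depth and allele-fraction thresholds (`min_depth`, `min_alt_frac`). Production callers also model base quality, mapping quality and genotype likelihoods.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Splits a CIGAR string such as "2S4M" into [("2", "S"), ("4", "M")].
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line: str) -> dict:
    """Pick the SAM mandatory fields this sketch needs (QNAME..CIGAR, SEQ)."""
    f = line.rstrip("\n").split("\t")
    return {"qname": f[0], "flag": int(f[1]), "rname": f[2],
            "pos": int(f[3]),  # 1-based leftmost mapping position
            "mapq": int(f[4]), "cigar": f[5], "seq": f[9]}


def pileup(sam_lines: Iterable[str], min_mapq: int = 0) -> Dict[Tuple[str, int], Counter]:
    """Count the bases observed at each (chromosome, 1-based position)."""
    counts: Dict[Tuple[str, int], Counter] = defaultdict(Counter)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        rec = parse_sam_line(line)
        if rec["flag"] & 0x4 or rec["cigar"] == "*" or rec["mapq"] < min_mapq:
            continue  # unmapped or low-quality alignment
        ref_pos, read_pos = rec["pos"], 0
        for length, op in CIGAR_RE.findall(rec["cigar"]):
            n = int(length)
            if op in "M=X":  # aligned bases consume read and reference
                for i in range(n):
                    counts[(rec["rname"], ref_pos + i)][rec["seq"][read_pos + i]] += 1
                ref_pos += n
                read_pos += n
            elif op in "IS":  # insertion or soft clip: read only
                read_pos += n
            elif op in "DN":  # deletion or skipped region: reference only
                ref_pos += n
            # H and P consume neither read nor reference
    return counts


def call_snvs(counts, reference: Dict[str, str], min_depth: int = 3,
              min_alt_frac: float = 0.3) -> List[tuple]:
    """Return (chrom, pos, ref, alt, alt_count, depth) for candidate SNVs."""
    calls = []
    for (chrom, pos), bases in sorted(counts.items()):
        depth = sum(bases.values())
        ref_base = reference[chrom][pos - 1]
        alts = [(c, b) for b, c in bases.items() if b != ref_base]
        if depth < min_depth or not alts:
            continue
        alt_count, alt = max(alts)  # most frequent non-reference base
        if alt_count / depth >= min_alt_frac:
            calls.append((chrom, pos, ref_base, alt, alt_count, depth))
    return calls
```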
With TCS' Accelerated NGS Data Analysis Platform, researchers can
automate the process of establishing read assemblies, creating annotations
and rooting out false negatives in variant call data. They can then run these
automated processes on the very fast SAP HANA platform, allowing them to
analyze genetic variations in minutes, rather than days.
Variant Annotation: The variant annotation process maps variant calls against known conditions and diseases to identify patterns and trends. The annotation currently draws on 11 different data sources, and we are adding more.
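At its core, mapping variant calls against annotation sources is a join on variant coordinates, plus an interval lookup to find the genes a variant falls in. This sketch assumes each source is a dictionary keyed by `(chrom, pos, ref, alt)` and that gene regions are 1-based inclusive intervals; the source names, gene names and notes used with it are invented placeholders, not the 11 sources the platform uses.

```python
from typing import Dict, List, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, 1-based pos, ref allele, alt allele)
Gene = Tuple[str, int, int, str]     # (chrom, start, end, gene name), 1-based inclusive


def annotate_variants(variants: List[Variant],
                      sources: Dict[str, Dict[Variant, str]],
                      genes: List[Gene]) -> Dict[Variant, dict]:
    """For each variant, list overlapping genes and every matching source record."""
    report: Dict[Variant, dict] = {}
    for v in variants:
        chrom, pos, _, _ = v
        # Linear scan for clarity; a real system would use an interval index.
        overlapping = [name for g_chrom, start, end, name in genes
                       if g_chrom == chrom and start <= pos <= end]
        # Exact-match lookup in each source, in a stable (sorted) source order.
        hits = [(src, table[v]) for src, table in sorted(sources.items()) if v in table]
        report[v] = {"genes": overlapping, "annotations": hits}
    return report
```

Keeping each source as a separate table means a new data source can be added without changing the join logic, which matches the goal of growing the set of sources over time.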
Key Attributes of the Platform
• In-memory database for faster data access and analysis
• Faster query processing
• HANA-R integration to extend analysis capabilities
• A workflow engine supporting end-to-end pipelines
• Hadoop technologies for computational volumes

[Figure: TCS NGS Analysis Platform architecture: a UI and a Pipeline Engine over the Data Management, Alignment, Variant Analysis, Annotation and Expression Analysis modules (...), running on Hadoop and SAP HANA]

SAP HANA Platform: The key to the solution is the lightning-fast computing power of SAP HANA’s in-memory database. It consolidates and stores comprehensive read assembly data (SAM data) and variant calls with annotation data, which other programs and R scripts then access for further analysis and reporting. This allows DNA testing and analysis that could take days to be processed in minutes.
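The pattern of consolidating variant calls and annotation data in one in-memory store and querying them together can be sketched with Python's built-in `sqlite3` module and an in-memory (`:memory:`) database. SQLite is only a stand-in for SAP HANA here; the table layout and the `build_store` and `annotated_variants` helpers are assumptions for illustration, not SAP HANA's schema or API.

```python
import sqlite3
from typing import List, Tuple


def build_store(variants: List[tuple], annotations: List[tuple]) -> sqlite3.Connection:
    """Load variant calls and annotation records into one in-memory database."""
    conn = sqlite3.connect(":memory:")  # data lives in RAM only, like an in-memory store
    conn.execute("CREATE TABLE variants (sample TEXT, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT)")
    conn.execute("CREATE TABLE annotations (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, "
                 "source TEXT, note TEXT)")
    # Index the join key so lookups do not scan the whole annotation table.
    conn.execute("CREATE INDEX idx_ann ON annotations (chrom, pos, ref, alt)")
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", variants)
    conn.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?, ?, ?)", annotations)
    return conn


def annotated_variants(conn: sqlite3.Connection, sample: str) -> List[Tuple]:
    """Return every (variant, source, note) match for one sample."""
    sql = """
        SELECT v.chrom, v.pos, v.ref, v.alt, a.source, a.note
        FROM variants AS v
        JOIN annotations AS a
          ON a.chrom = v.chrom AND a.pos = v.pos AND a.ref = v.ref AND a.alt = v.alt
        WHERE v.sample = ?
        ORDER BY v.chrom, v.pos, a.source
    """
    return conn.execute(sql, (sample,)).fetchall()
```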
SAP BusinessObjects Data Services: By integrating genomic data into powerful business intelligence tools, such as SAP BusinessObjects, researchers can gain detailed biological insights into their DNA data.

Hadoop: The Hadoop distributed computing framework provides an effective, easy-to-use MapReduce model for parallelizing many bioinformatics data analysis algorithms. SAP has already integrated Hadoop into the SAP HANA Big Data Bundle, which makes Hadoop even more attractive for this solution.

The TCS Advantage
As one of an elite group of SAP Global Solutions Partners, TCS delivers solutions that address the strategic, tactical and operational aspects of the life sciences supply chain required for product serialization, perfect plant best practice, and transformative data management and analysis, including SAP HANA. This powerful in-memory analytics solution is a key enabler for life sciences organizations, including pharmaceutical and diagnostics companies, that use NGS technologies to identify new markers useful in clinical trials and diagnostics.

With more than two decades of experience working with global life sciences companies in diverse geographies, TCS helps organizations in their transformation journey, leveraging its people, platforms, products and services across the value chain.

TCS' R&D labs, Technology Excellence Groups (TEG) and Process Excellence Groups (PEG) constantly work in collaboration with project teams to provide better solutions and deployments. Customers also benefit from our Co-Innovation™ (COIN) labs, which bring the best minds together to create innovative solutions for complex business challenges.

Contact
To learn more, contact [email protected]

About Tata Consultancy Services Ltd (TCS)
Tata Consultancy Services is an IT services, consulting and business solutions organization that delivers real results to global business, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled infrastructure, engineering and assurance services. This is delivered through its unique Global Network Delivery Model™, recognized as the benchmark of excellence in software development. A part of the Tata Group, India’s largest industrial conglomerate, TCS has a global footprint and is listed on the National Stock Exchange and Bombay Stock Exchange in India.
For more information, visit us at www.tcs.com

IT Services | Business Solutions | Consulting

All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties. Copyright © 2013 Tata Consultancy Services Limited
TCS Design Services | P | 08 | 13