Next Generation Genome Sequencing (NGS) Data Analysis Made Lightning Fast with SAP HANA Enterprise Solutions

Overview

The Human Genome Project, an effort supported by a global consortium of scientists, demystified the human genetic code by producing a comprehensive blueprint of a human being's genetic makeup. This groundbreaking achievement has given scientists the means to decipher and analyze billions of DNA sequences, determine what specific genes do, and gain insight into how the body works in order to develop new treatments and medicines. It has also driven rapid advances in Next Generation Sequencing (NGS) technologies, producing a flood of data.

To allow researchers to analyze and mine genomic data with significantly more speed, Tata Consultancy Services (TCS) offers an Accelerated NGS Data Analysis Platform for building automated analysis pipelines powered by the lightning-fast SAP HANA platform.

Deciphering and analyzing DNA sequences with NGS technologies is a data-intensive process. Once genes are sequenced, the first order of business is genomic annotation, which essentially determines what a specific gene is designated to do. With a baseline understanding of which genes or sets of genes are considered healthy or normal, researchers can begin to gain insight into “variations”, or anomalies, within DNA sequences. The analysis of DNA variations, also referred to as variant calling, is helping researchers pinpoint the genetic causes of human diseases and, in some cases, develop specialized treatments and, potentially, personalized cures.

NGS technologies are helping medical providers and life sciences organizations advance, and are supporting new discoveries and innovations that improve human health. But they present one significant obstacle: NGS analysis produces massive volumes of data, up to a terabyte for a single DNA sample.

Benefits
• Reduces the total time taken for the analysis.
• Automates clinical interpretation of patient/individual genetic variations through automated annotation and reporting.
• Enables scientists to identify new markers within a patient population.
• Reduces query lag when a researcher queries part of the input data or ancillary public/prior annotation data.
• Reduces time delays in exploratory research through faster response.

Our Solution

TCS NGS Data Analysis accelerates read assembly, variant calling and variant annotation.

Researchers must perform many manual steps to assemble raw data sequences, create DNA annotations and root out false negatives and errors in variant call data: tedious processes that greatly slow the analysis. The broad adoption of NGS in clinical research and diagnostics has also been hampered by workflow challenges that become more pronounced in clinical settings due, in part, to the increased number of samples being processed.

With TCS' Accelerated NGS Data Analysis Platform, researchers can automate the process of establishing read assemblies, creating annotations and rooting out false negatives in variant call data. They can then run these automated processes on the very fast SAP HANA platform, allowing them to analyze genetic variations in minutes rather than days.

[Figure: TCS NGS Analysis Platform architecture: UI; Pipeline Engine; Data Management, Alignment, Variant Analysis, Annotation, Expression Analysis, … modules; Hadoop; SAP HANA]

Key Attributes of the Platform
• In-memory database for faster data access and analysis
• Faster query processing
• HANA-R integration to empower analysis possibilities
• A workflow engine supporting end-to-end pipelines
• Hadoop technologies for computational volumes

Working with the Center for Computational Biology at the University of California at Berkeley, TCS has developed new methods for the rapid interpretation of genome variation data. TCS' R&D labs, Technology Excellence Groups (TEG) and Process Excellence Groups (PEG) constantly work in collaboration with project teams to provide better solutions and deployments. Customers also benefit from our Co-Innovation™ (COIN) labs, which bring the best minds together to create innovative solutions for complex business challenges.

The Next Generation Sequencing Analysis platform, built on SAP HANA, automates parts of the read assembly, variant calling and variant annotation processes, using a pipeline-based approach to clean data and dramatically speed up analysis. Our solution includes:

Automated Read Assembly: The read assembler generates read assembly maps of single patient samples as Sequence Alignment/Map (SAM) files. The read-assembly process deploys an ultrafast read aligner to align short DNA sequences, or reads, to the human reference genome. It runs on high-speed Hadoop clusters; Hadoop is a stable, open-source technology designed to manage large data sets.

Variant Calling: The variant calling process is enabled by R, another stable, open-source technology widely used among statisticians and data miners. The platform comes with custom-developed R programs that analyze SAM files and extract variations in the patient or individual genome sequence.

Variant Annotation: The variant annotation process maps variant calls against all known conditions and diseases to define patterns and trends. The annotation uses 11 different data sources for this purpose, and we are adding more.

SAP HANA Platform: The key to the solution is the lightning-fast computing power made possible by SAP HANA's in-memory database. It consolidates and stores comprehensive read assembly data (SAM data) and variant calls with annotation data, which other programs and R scripts then access for further analysis and reporting. It allows DNA testing and analysis that could take days to be processed in minutes.
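The read assembly, variant calling and variant annotation stages described above can be illustrated with a minimal Python sketch. It is not TCS's implementation, which according to the text uses an ultrafast aligner on Hadoop and custom R programs. The sketch assumes SAM input whose reads align without gaps, calls a single-nucleotide variant when a non-reference base reaches hypothetical depth and allele-fraction thresholds (`min_depth`, `min_alt_fraction`), and looks calls up in a hypothetical in-memory table of known conditions (`known`).

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost mapping position, as in the SAM format
    mapq: int
    cigar: str
    seq: str


def parse_sam(lines):
    """Yield alignment records from SAM text, skipping '@' header lines."""
    for line in lines:
        if not line or line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        # SAM alignment lines carry 11 mandatory tab-separated fields;
        # field 10 (index 9) is the read sequence.
        yield SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5], f[9])


def pileup(records):
    """Count the bases observed at each (chromosome, position).

    Simplifying assumption: only reads whose CIGAR is a single match run
    (e.g. '8M') are used; unmapped reads (FLAG bit 0x4) and gapped
    alignments are skipped.
    """
    counts = defaultdict(Counter)
    for r in records:
        if r.flag & 0x4 or not re.fullmatch(r"\d+M", r.cigar):
            continue
        for offset, base in enumerate(r.seq):
            counts[(r.rname, r.pos + offset)][base] += 1
    return counts


def call_snvs(counts, reference, min_depth=3, min_alt_fraction=0.5):
    """Call a variant where the commonest non-reference base is frequent enough."""
    calls = []
    for (chrom, pos), bases in sorted(counts.items()):
        depth = sum(bases.values())
        ref = reference[chrom][pos - 1]  # 1-based position -> 0-based index
        alts = [(n, b) for b, n in bases.items() if b != ref]
        if depth < min_depth or not alts:
            continue
        n_alt, alt = max(alts)
        if n_alt / depth >= min_alt_fraction:
            calls.append((chrom, pos, ref, alt, depth))
    return calls


def annotate(calls, known):
    """Attach a known-condition label to each call (hypothetical lookup table)."""
    return [(c, known.get(c[:4], "no known association")) for c in calls]


# Hypothetical toy data: a 10-base reference and three reads sharing a T>A change.
reference = {"chr1": "ACGTACGTAC"}
sam_text = [
    "@SQ\tSN:chr1\tLN:10",
    "r1\t0\tchr1\t1\t60\t8M\t*\t0\t0\tACGAACGT\t*",
    "r2\t0\tchr1\t2\t60\t8M\t*\t0\t0\tCGAACGTA\t*",
    "r3\t0\tchr1\t3\t60\t8M\t*\t0\t0\tGAACGTAC\t*",
]
known = {("chr1", 4, "T", "A"): "hypothetical condition X"}

calls = call_snvs(pileup(parse_sam(sam_text)), reference)
print(annotate(calls, known))
# [(('chr1', 4, 'T', 'A', 3), 'hypothetical condition X')]
```

Keeping each stage a separate function mirrors the pipeline-based approach: a stage can be replaced (for example, a real aligner's output or an R-based caller) without touching the others.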
SAP Business Objects Data Services: By integrating genomic data into powerful business intelligence tools such as SAP Business Objects, researchers are able to gain detailed biological insights into their DNA data.

The TCS Advantage

As one of an elite group of SAP Global Solutions Partners, TCS delivers solutions that address the strategic, tactical and operational aspects of the life sciences supply chain required for product serialization, perfect plant best practice, and transformative data management and analysis, including SAP HANA. This powerful in-memory analytics solution is a key enabler for life sciences organizations, including pharmaceutical and diagnostics companies using NGS technologies to identify new markers that will be useful in clinical trials and diagnostics. With more than two decades of experience working with global life sciences companies in diverse geographies, TCS helps organizations in their transformation journey, leveraging its people, platforms, products and services across the value chain.

Contact

To learn more, contact [email protected]

About Tata Consultancy Services Ltd (TCS)

Tata Consultancy Services is an IT services, consulting and business solutions organization that delivers real results to global business, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled infrastructure, engineering and assurance services. This is delivered through its unique Global Network Delivery Model™, recognized as the benchmark of excellence in software development. A part of the Tata Group, India's largest industrial conglomerate, TCS has a global footprint and is listed on the National Stock Exchange and Bombay Stock Exchange in India.

For more information, visit us at www.tcs.com

All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS).
The content / information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

Copyright © 2013 Tata Consultancy Services Limited

Hadoop: The Hadoop distributed computing framework provides MapReduce, an effective and easy-to-use method for parallelizing many bioinformatics data analysis algorithms. SAP HANA has already integrated Hadoop into its Big Data Bundle, which makes it even more attractive for this solution.
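As an illustration of how MapReduce parallelizes a bioinformatics computation, the sketch below computes per-position read coverage in three phases: map, shuffle and reduce. It is a plain-Python rendering of the MapReduce pattern, not Hadoop's API. Reads are assumed to be simplified `(chromosome, 1-based start, sequence)` tuples, and the `splits` layout is hypothetical. On a real Hadoop cluster, each split would go to a separate mapper and the framework would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_read(read):
    """Map phase: emit ((chromosome, position), 1) for every base a read covers."""
    chrom, start, seq = read
    for offset in range(len(seq)):
        yield (chrom, start + offset), 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum per-read contributions into a coverage depth."""
    return key, sum(values)


def coverage_mapreduce(splits):
    # Each split would be handled by an independent mapper on a cluster;
    # here the mappers simply run one after another in a single process.
    mapped = chain.from_iterable(map_read(r) for split in splits for r in split)
    return dict(reduce_counts(k, v) for k, v in sorted(shuffle(mapped).items()))


# Hypothetical input: three reads divided between two splits.
splits = [
    [("chr1", 1, "ACGA"), ("chr1", 3, "GAAC")],
    [("chr1", 2, "CGAA")],
]
print(coverage_mapreduce(splits))
# {('chr1', 1): 1, ('chr1', 2): 2, ('chr1', 3): 3, ('chr1', 4): 3, ('chr1', 5): 2, ('chr1', 6): 1}
```

The map and reduce functions carry no shared state, which is why a framework such as Hadoop can run many instances of them in parallel across a cluster.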