DiscoveryLink for data integration

IBM Global Services
IBM® DiscoveryLink® middleware helps integrate multiple, heterogeneous data sources into a single virtual database.
With one query, researchers can get one cohesive view of results for manipulation, comparison and analysis — while the
data itself remains unchanged in separate databases across different platforms.
With DiscoveryLink, researchers can collaborate more easily on unique searches. Businesses can protect their
technology investments and extend the value of legacy data sources. And IT professionals can add research functionality
without additional development of complex applications.
The communication between DiscoveryLink and a specific data source is handled by a software "wrapper." And now,
IBM has added new wrappers, wrapper toolkit APIs, enhanced performance and new functionality to this proven data
integration solution. In addition, a library of user-defined functions (UDFs) now provides specialized tools for data query
and analysis.
Key capabilities of DiscoveryLink
Provides expanded access to more data sources with new DB2 Information Integrator™ technology
Accesses multiple and specialized databases with a single query
Provides a single-format virtual database view of multiple heterogeneous data sources
Uses automatic query optimization to provide the most efficient query execution
Adds new data sources quickly and easily, keeping the original sources intact
Runs on multiple servers and a variety of operating systems
Key benefits of DiscoveryLink
Eliminates the need to write multiple point-to-point connections to data sources
Increases laboratory productivity and efficiency
Complements and extends existing data warehouse capabilities; eliminates the need to build query data
warehouses
Enables collaborative research across companies
New DiscoveryLink components add greater functionality
The IBM DiscoveryLink solution delivers unified, single-query access to multiple, heterogeneous life sciences databases,
helping to accelerate the discovery process. With a single query to internal and external bioinformatics and
cheminformatics data sources, researchers can get one cohesive view of results for manipulation, comparison and
analysis – while the data itself remains unchanged in separate databases across different platforms.
The DiscoveryLink solution consists of robust middleware technology, delivered with the expertise of IBM Life Sciences
Consulting Services. Built on the new IBM DB2 Information Integrator middleware technology, the DiscoveryLink solution
provides enhanced functionality and access to more life sciences data sources. DB2 Information Integrator expands the
selection of DiscoveryLink "wrappers," the specially written software modules that allow the server to connect to
different types of data sources and retrieve a wide variety of information for use in a query.
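The wrapper idea can be illustrated with a minimal Python sketch. It is not the DiscoveryLink wrapper toolkit API, which this document does not show; the `Wrapper` protocol, the two toy wrapper classes, and `join_on` are hypothetical names invented for illustration. The point is only that once every source exposes rows in one common shape, a single operation can combine them.

```python
import csv
import io
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, str]


class Wrapper(Protocol):
    """Hypothetical wrapper contract: expose a source as rows of named columns."""

    def rows(self) -> Iterable[Row]: ...


class CsvWrapper:
    """Toy wrapper over CSV text (stands in for a spreadsheet-like source)."""

    def __init__(self, text: str) -> None:
        self._text = text

    def rows(self) -> Iterable[Row]:
        return csv.DictReader(io.StringIO(self._text))


class DictWrapper:
    """Toy wrapper over in-memory records (stands in for an application source)."""

    def __init__(self, records: List[Row]) -> None:
        self._records = records

    def rows(self) -> Iterable[Row]:
        return iter(self._records)


def join_on(left: Wrapper, right: Wrapper, key: str) -> List[Row]:
    """Join rows from two wrapped sources on a shared column (simple hash join)."""
    index: Dict[str, List[Row]] = {}
    for rrow in right.rows():
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left.rows() for rrow in index.get(lrow[key], [])]


# Made-up sample data: two differently shaped sources, one combined result.
genes = CsvWrapper("gene_id,symbol\nG1,TP53\nG2,BRCA1\n")
assays = DictWrapper([{"gene_id": "G1", "compound": "C-17"}])
result = join_on(genes, assays, "gene_id")
print(result)
```

Neither source is copied or converted ahead of time; each wrapper translates its own format into rows only when asked.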
New wrappers enable access to, and integration of, NCBI Entrez, XML, ODBC, HMMER, and a broad array of public
bioinformatics data sources accessible through BioRS. Existing wrappers in DiscoveryLink also access and integrate
data in Oracle, Sybase, SQL Server, Microsoft Excel, Documentum, and BLAST.
DB2 Information Integrator database software also features integrated in-memory text search capabilities, integration of
internal and external data sources and applications, and integration of information from the Web. DB2 Information
Integrator makes access to relational databases transparent to the calling application, and while DB2 software is
embedded in the solution, existing databases do not need to be perturbed or migrated.
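As a loose analogy for querying separate databases in place, the following Python sketch uses SQLite's `ATTACH DATABASE` to join tables that live in two independent databases with one SQL statement. This is an assumption-laden stand-in, not DB2 Information Integrator: the schema names `bio` and `chem`, the tables, and the sample values are all invented for illustration.

```python
import sqlite3

# One connection plays the role of the "virtual database".
conn = sqlite3.connect(":memory:")

# Each ATTACH creates a separate, independent in-memory database.
conn.execute("ATTACH DATABASE ':memory:' AS bio")
conn.execute("ATTACH DATABASE ':memory:' AS chem")

conn.execute("CREATE TABLE bio.proteins (protein_id TEXT, name TEXT)")
conn.execute("CREATE TABLE chem.binding (protein_id TEXT, compound TEXT, ic50_nm REAL)")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO bio.proteins VALUES (?, ?)",
    [("P1", "kinase A"), ("P2", "protease B")],
)
conn.executemany(
    "INSERT INTO chem.binding VALUES (?, ?, ?)",
    [("P1", "C-17", 12.5), ("P1", "C-42", 300.0), ("P2", "C-17", 45.0)],
)

# A single query spans both databases; no data is moved between them.
rows = conn.execute(
    """
    SELECT p.name, b.compound, b.ic50_nm
    FROM bio.proteins AS p
    JOIN chem.binding AS b ON b.protein_id = p.protein_id
    WHERE b.ic50_nm < 100
    ORDER BY b.ic50_nm
    """
).fetchall()
print(rows)
```

The calling code sees one connection and one result set, while each table stays in its own database.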
The Entrez wrapper provides direct and fast access to the key PubMed and Nucleotide/GenBank data sources
available from NCBI.
The BLAST wrapper, through its significant enhancements in this new release, provides even more power for gene
and protein similarity searches using NCBI BLAST or TurboBLAST from TurboWorx. DiscoveryLink provides an
SQL-based "front-end" for the BLAST application, and easily integrates BLAST searches with other data sources.
The HMMER wrapper provides an SQL-based "front-end" to the HMMER application and eases the integration of
these HMMER searches with other data sources.
The XML wrapper provides SQL-based access to XML-based data sources that are becoming increasingly
prevalent in life sciences. The XML wrapper can augment or extend a data warehouse by providing a fast and easy way
to "shred" XML documents into DB2 or other relational databases.
The BioRS wrapper provides access to a broad array of public bioinformatics data sources through the integration
with BioRS from Biomax, such as SwissProt, TrEMBL, and many more.
The ODBC wrapper provides access to a wide variety of additional relational data sources, many of which are
common in life sciences, including MySQL, Microsoft Access, PostgreSQL and more.
The Extended Search wrapper provides the ability to integrate information from many unstructured data sources,
including Web sites, Lotus® Notes®, Sametime®, LDAP directories, text indexes, and much more.
The Teradata wrapper provides access to information residing in a Teradata data warehouse.
Specialized built-in DB2 user-defined functions for life sciences give researchers many valuable algorithms for
further analysis, such as conversion of an amino acid sequence into a nucleotide sequence (or the reverse), generalized
pattern matching (to identify areas of interest in an amino acid or peptide sequence), alignment of a protein sequence to
a genomic sequence, and FASTA parsing.
DB2 Control Center, with the new "Discover" feature, will allow fast and easy configuration of any data source
within DiscoveryLink, including XML, Oracle, Entrez and more.
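To give a feel for what a life sciences user-defined function does inside a query, here is a minimal Python sketch that registers a function with SQLite and calls it from SQL. It does not reproduce the DB2 UDF library, whose function names and signatures are not given in this document; `parse_fasta`, `translate`, and the `sequences` table are hypothetical. The sketch covers two of the operations mentioned above: FASTA parsing and conversion of a nucleotide sequence into an amino acid sequence, using the standard genetic code.

```python
import sqlite3
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA to protein in frame 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3].upper(), "X")  # "X" for unknown codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (identifier, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = (line[1:].split() or [""])[0], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up FASTA input with two toy records.
FASTA = ">seq1 toy record\nATGGCC\nTGA\n>seq2\nATGTTTGGG\n"

conn = sqlite3.connect(":memory:")
conn.create_function("translate", 1, translate)  # expose the function to SQL
conn.execute("CREATE TABLE sequences (id TEXT, seq TEXT)")
conn.executemany("INSERT INTO sequences VALUES (?, ?)", parse_fasta(FASTA))

proteins = conn.execute(
    "SELECT id, translate(seq) FROM sequences ORDER BY id"
).fetchall()
print(proteins)
```

Because the function runs inside the query, its output can be filtered, joined, or sorted like any other column.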
The choice of leading research centers
San Diego Supercomputer Center (SDSC), with a staff of more than 400 scientists, software developers and research
support personnel, has served researchers from more than 350 institutions and 50 industrial partners since 1985.
Recently, IBM DiscoveryLink enabled SDSC to federate five distinct major bioinformatics databases from four different
platforms: Linux, Solaris, Windows® and AIX®, aggregating more than 600 tables -- helping ensure information is readily
accessible by both local and off-site users.
With DiscoveryLink, SDSC scientists were able to query across these databases, performing more comprehensive
analyses than ever before. For administrators, routine updates and other operations that were previously performed in
one database are now automatically captured in others -- freeing both scientists and administrators to focus on higher
priority projects, productivity and quality.
DiscoveryLink Flash demonstration
Flash demonstration: how DiscoveryLink can increase R&D productivity with single-query access to heterogeneous
databases.