Introduction to Oracle Semantic Technologies V1.0
Shintaro Nagaoka, Presales, Oracle Netherlands
January, 2016
Copyright © 2015 Oracle and/or its affiliates. All rights reserved. Oracle Confidential

Oracle Spatial and Graph and Big Data Spatial and Graph: Graph Overview
Speaker: Bill Beauregard, Senior Principal Product Manager, Oracle

Graph Data Model
• What is a graph?
  – A set of links and nodes (and optionally attributes)
  – A graph is simply linked data
• Why do we care?
  – Rise in commercial use of Big Data
    • Web log files, Twitter feeds, sensor readings, Internet of Things
    • Cyber networks, power grids, protein interaction graphs
    • Knowledge graphs (IBM Watson, Apple SIRI, Google Knowledge Graph)
  – Graphs are intuitive and flexible
    • Easy to navigate, easy to form a path, natural to visualize
    • Do not require a predefined schema
[Figure: example graph with nodes A, B, C, D, E, F]

Oracle’s Graph Database Strategy
Support graph data types…
• Add graph analytics to applications, tools, and information technology platforms
• Deliver a scalable, secure, and high performing product
• Simplify development with integrated graph analysis, APIs and services
…on all enterprise platforms
• Oracle Database
• Cloudera with Apache Hadoop
• Oracle NoSQL Database
• Oracle Big Data Appliance
• Oracle Exadata Database Machine
• Oracle Cloud
3 Graph Models / 3 Domain Use Cases
• Network Data Model
  – Use case: Spatial Network Analysis
    • Network path analysis
    • Multi-model modeling
  – Industry domains: Logistics, Transportation, Utilities, Telcoms
• RDF Data Model
  – Use case: Linked Data / Semantic Mediation
    • Data federation
    • Knowledge representation
    • Master Metadata Mgmt
  – Industry domains: Life Sciences, Finance, Publishing, Public Sector
• Property Graph Model
  – Use case: Social Network Analysis
    • Graph Search & Analysis
    • Big Data analytics
    • Entity analytics
  – Industry domains: National Intelligence, Public Safety, Social Media search, Marketing - Sentiment

Oracle Spatial and Graph: RDF Semantic Graph Overview

“Semantic technologies include software standards and methodologies that are aimed at providing more explicit meaning from the information that’s at our disposal.”
  The CIO’s Guide to Semantics, Dave McComb, Semantic Arts, Inc.
• Standards defined by W3C & OGC
  – RDF, RDF/S, OWL, SKOS
  – SPARQL, RDFa, RDB2RDF, R2RML
  – GeoSPARQL
• RDF embeds semantics in the data

Fundamental Concepts and “building blocks”
1) Anything can be described by its unique relationship to something else
  – John Smith Is At OpenWorld
  – OpenWorld Is In San Francisco
  – Seema Is Presenter of OOW Semantic Session

  Subject                 Relationship               Item
  John Smith              Is At                      Openworld
  Openworld               Is In                      San Francisco
  Oracle                  Has A Conference Called    Openworld
  Seema Rao               Works At                   Oracle
  Seema Rao               Is Presenter of            OOW Semantic Session
  John Smith              Is Registered for          OOW Semantic Session
  OOW Semantic Session    Is Held                    10/6/11, 12:00 Noon

  – This is called a “triple”
  – Uniqueness in the triple is enforced by the inclusion of a URI
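The subject / relationship / item table above can be sketched as a small in-memory triple list. This is only an illustration of the idea, not how Oracle stores triples: the `http://example.org/` namespace, the `uri` helper, the relationship names and the `match` function are all hypothetical, since the slides do not give real URIs.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace: the slides say URIs make each element unique,
# but they do not show the actual URIs.
EX = "http://example.org/"


def uri(name: str) -> str:
    """Turn a readable name into an illustrative URI."""
    return EX + name.replace(" ", "_")


# The seven statements from the table, as (subject, relationship, item).
TRIPLES = [
    (uri("John Smith"), uri("isAt"), uri("Openworld")),
    (uri("Openworld"), uri("isIn"), uri("San Francisco")),
    (uri("Oracle"), uri("hasConference"), uri("Openworld")),
    (uri("Seema Rao"), uri("worksAt"), uri("Oracle")),
    (uri("Seema Rao"), uri("isPresenterOf"), uri("OOW Semantic Session")),
    (uri("John Smith"), uri("isRegisteredFor"), uri("OOW Semantic Session")),
    # The date is kept as a plain literal rather than a URI.
    (uri("OOW Semantic Session"), uri("isHeld"), "10/6/11, 12:00 Noon"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that agree with every position that is not None."""
    for subj, pred, obj in triples:
        if s is not None and subj != s:
            continue
        if p is not None and pred != p:
            continue
        if o is not None and obj != o:
            continue
        yield (subj, pred, obj)


# Who is connected to the OOW Semantic Session, in any role?
people = [subj for subj, _, _ in match(TRIPLES, o=uri("OOW Semantic Session"))]
```

Leaving a position as `None` acts like a variable in a query pattern, which is the same idea that SPARQL graph patterns build on.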
Fundamental Concepts and “building blocks”
2) Implied relationships can be found in the data using rules
  – This is called “inferencing”
  – RULE: 1. OOW is the same as Openworld
  – Derived (inferred) information: “John and Seema were in San Francisco on 10/6/11”

  Subject                 Relationship               Item
  Openworld               Is In                      San Francisco
  Openworld               Has A Session Called       OOW Semantic Session
  Seema Rao               Is Presenter of            OOW Semantic Session
  John Smith              Is Registered for          OOW Semantic Session
  OOW Semantic Session    Is Held                    10/6/11, 12:00 Noon

Fundamental Concepts and “building blocks”
3) Standard sets of related concepts can be stored to describe relationships and referenced to enhance query and discovery
  – This is called an “ontology”

  Type of Relationship   What you evaluate                  What you compare    Opposite/Inverse Relationship
  Lends to               Businesses and related parties     Businesses          Borrows from
  Owns                   Institutions and related parties   Institutions        Is owned by
  Now known as           Corporate names and symbols        Corporate names     Previously known as
  Operates in            Geographic hierarchy               Geographic name     No presence in

  – Holding companies own banks
  – Banks lend to other institutions
  – …

Fundamental Concepts and “building blocks”
4) Conceptually, semantic applications look at things as being represented as graphs, rather than tables
  – In Oracle Database, we use triples and key relationships to represent nodes and links in the graph.
[Figure: the relationship table from point 3 shown next to a graph of bank names linked by “Is owned by” and “Now known as” relationships; node labels: Wells Fargo, Wachovia, Norwest, Core States, First Union, Crocker National, First Nat Bank of Philadelphia]

Fundamental Concepts and “building blocks”
5) Querying is based on graphs
  – Ex: Find sub-prime mortgage exposure for “Wells Fargo” bank…

    SELECT (SUM(?subprime_amount) AS ?exposure)
    WHERE {
      ?loan_instance rdf:type :mortgage_loan .
      ?lending_institution rdfs:subClassOf :wells_fargo .
      ?loan_instance :subprime_loan ?subprime_amount
    }

[Figure: loan ontology graph; node labels: LoanProducts, SecuredLoan, MortgageLoan, AutoLoan, Prime_M, Sub-prime_M, 2/28 ARM, RMBS, CDOs, Lender, Lending_Institution, Wells Fargo, Wachovia, Norwest, JPMC, BofA; edge labels: “Is type of”, “Is seller of”, “Is name of”, “Is owned by”, “Now known as”; note: Lender and Lending Institution are the same]

Recap: Key ideas
• Based on the fundamentally different Open World Assumption
  – What is unknown is undefined (not false), which supports discovery
• Schemas are flexible, evolving, and can’t be known in advance
  – Rich, real world relationships are modeled in the data
• Every data element is uniquely identified, which supports integration
  – Data & relationships are machine-readable
• Pattern query language supports discovery workflows

Two Application Use Cases For RDF Semantic Graph
• Linked Data
  – Unified metadata model for distributed data sources
  – Flexible model for sparse and evolving data
  – Validate semantic and structural consistency
• Entity Analytics
  – SPARQL pattern matching: detecting related entities across large, sparse, disparate collections of data
  – Inferencing: applying rules on asserted data
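Inferencing, described above as applying rules on asserted data, can be sketched as a small forward-chaining loop over the OpenWorld example. This is a toy illustration and not Oracle's inference engine. The relationship names are hypothetical, and the second rule (attending a session of an event places a person where the event is) is an assumption added here to make the derived fact explicit; the slides name only the "same as" rule.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Asserted data. The session is attached to "OOW" while the location is
# attached to "Openworld", so the sameAs statement is needed to connect them.
ASSERTED: Set[Triple] = {
    ("Openworld", "isIn", "San Francisco"),
    ("OOW", "hasSession", "OOW Semantic Session"),
    ("Seema Rao", "isPresenterOf", "OOW Semantic Session"),
    ("John Smith", "isRegisteredFor", "OOW Semantic Session"),
    ("OOW", "sameAs", "Openworld"),  # rule 1 from the slide
}

ATTENDS = ("isPresenterOf", "isRegisteredFor")


def infer(asserted: Set[Triple]) -> Set[Triple]:
    """Apply both rules until nothing new appears; return only derived triples."""
    facts = set(asserted)
    while True:
        new: Set[Triple] = set()
        # Rule 1: statements about one name also hold for its alias.
        for a, p, b in facts:
            if p != "sameAs":
                continue
            for s, q, o in facts:
                if q == "sameAs":
                    continue
                if s == a:
                    new.add((b, q, o))
                if s == b:
                    new.add((a, q, o))
        # Rule 2 (assumed): person attends session, event has that session,
        # event is in a place, therefore person was in that place.
        for person, p, session in facts:
            if p not in ATTENDS:
                continue
            for event, q, s2 in facts:
                if q != "hasSession" or s2 != session:
                    continue
                for e2, r, place in facts:
                    if r == "isIn" and e2 == event:
                        new.add((person, "wasIn", place))
        if new <= facts:  # fixed point reached
            return facts - set(asserted)
        facts |= new


derived = infer(ASSERTED)
```

Removing the `sameAs` statement from `ASSERTED` leaves the session and the location disconnected, so neither `wasIn` fact is derived. Nothing is concluded to be false in that case; the facts are simply not known, in line with the Open World Assumption in the recap.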
Semantic Technologies Partners
Integrated Tools and Solution Providers:
• Ontology Engineering
• Reasoners
• Open Source Frameworks (Joseki, Sesame)
• NLP Entity Extractors
• Standards
• Applications
• SI / Consulting

Linked Data: Industry Adoption
Industries
• Life Sciences
• Finance
• Media
• Networks & Communications
• Defense & Intelligence
• Police
Named customer: Hutchinson 3G Austria

Novartis Institutes for BioMedical Research (NIBR)
Business Challenge
• Link database information on genes, proteins, metabolic pathways, compounds, ligands, etc. to original sources.
• Increase productivity for accessing, sharing, searching, navigating, cross-linking, analyzing internal/external data
Solution
• Semantic integration layer on RDF graph
• Rich domain-specific terminology (biology, chemistry and medicine): 1.6 M terms
• Terminology Hub: 8 GB of referential data that cross-references between data repositories.

EU Publications Office
Linked Metadata Platform for European Union
Objectives
• Common metadata model supports:
  – Search and discovery of EU Publications
  – Multiple domains and languages
Solution
• Validate and tag EU law, tenders, and publicity to standardized vocabularies
• Unified RDF graph metadata model
• Supports discovery of content through user’s terminology and language
• Provides variety of dissemination modes

Object-based Intelligence/Entity Analytics
[Figure: National Intelligence Scenario. Data sources (enterprise data, spatial images, documents, contents repository, databases, web resources, blogs, mails, news, RSS feeds) feed information extraction, feature extraction and term extraction. The extracted entities and relationships are held as RDF with intelligence ontologies, queried with SQL/SPARQL, and used for search, presentation, report, visualization and query. The example graph links nodes such as Person: Abduwali Abdukhadir Muse, Person: Chehab Abdouljamid Bouyaly, Group: Al Shabab, Group: al Qaeda, Country: UK, Country: Morocco, Country: Pakistan, Nationality: Somalian, Nationality: Pakistani and Ideology: Islamist through “Member of”, “Supports”, “Has” and “Currently resides” relationships; unknown persons, groups and links are marked “?”.]

Oracle Database 12c Spatial and Graph Tooling
• Data Sources
  – Transaction Systems
  – Unstructured Content
  – RSS, email
  – Other Data Formats
• Transform & Load, Modeling Tools
  – Relational2RDF
  – Support for Protégé
  – Support for Apache Jena
  – Natural Language Processing Extraction (partners)
• Query & Inference (Oracle Database 12c)
  – RDF/OWL Data Management
  – SQL & SPARQL Query
  – OWL Inferencing
  – Semantic Rules
  – Scalability & Security
  – Semantic Indexing
• Applications & Analysis Tools
  – Java, HTTP access
  – JSON, XML output
  – Graph visualization (Cytoscape)
  – Oracle Advanced Analytics (R, Mining)
  – Oracle Business Intelligence (OBIEE)
  – Map (GIS) Visualization

Oracle Database 12c RDF Semantic Graph Database
• Compression & partitioning
• Parallel load, inference, query
• High availability
• Label security: triple-level
• W3C standards compliance
• Semantic Indexing of text
• Enterprise Manager
• Support for Open Source
  – Development framework, ontology editing, visualization
• Exadata ready
Load / Storage
• Native RDF graph data store
• Manages billions of triples
• Optimized storage architecture
• RDF Views on Relational Data
Query
• SPARQL-Jena/Joseki
• SQL/graph query, B-tree indexing
• Ontology assisted SQL query
Reasoning
• RDFS, OWL2 RL, EL, SKOS
• User-defined rules
• Incremental, parallel reasoning
• User-defined inferencing
• Plug-in architecture
Analytics
• Semantic indexing framework
• Integration with
  – OBIEE, Oracle R Enterprise
  – Oracle Data Mining
Manageability of RDF Semantic Graph
Built in support from Oracle Database utilities and tools
Ingest / Replicate / Recover
• Bulk load:
  – Apache Jena bulk loader
  – Oracle external tables & SQL*Loader (Direct Path) w/ PL/SQL Bulk Load API
• Replicate & recover:
  – Data Guard: physical standby
  – Data Pump: staging tables
  – Recovery Manager: RMAN
Tune / Analyze
• Tune load / query / inference:
  – Parallelism
  – B-tree indexing triple/quad
  – Typed literals indexing
  – SPARQL query hints
  – Statistics gathering
  – Dynamic Sampling
• Analyze performance:
  – Enterprise Manager: view optimizer plans, monitor execution / resource usage
Manage
• Control query execution:
  – in database & Jena client
• Create & monitor graph w/ SQL Developer:
  – Semantic Network
  – Models, virtual models
  – B-tree indexes
  – Rule bases
  – Entailments
  – Security data labels
  – Semantic index policies

World’s Fastest Big Data Graph Benchmark
1 Trillion Triple RDF Benchmark with Oracle Spatial and Graph
• World’s fastest data loading performance
• World’s fastest query performance
• World’s fastest inference performance
• Massive scalability: 1.08 trillion edges
• Platform: Oracle Exadata X4-2 Database Machine
• Source: w3.org/wiki/LargeTripleStores, 9/26/2014
Oracle Database 12c can load, query and inference millions of RDF graph edges per second
[Figure: bar chart of millions of triples per second for Query, Load and Inference; data labels 1.42, 1.52 and 1.13]

Oracle RDBMS with RDF Graph
Similar to Spatial.
Combining the strength of relational and object-relational approach
• Allows meta-data storage
• Repository (created post installation)
• Operators and functions
• Storage of common RDF objects in the relational tables
• SQL and SPARQL support
• Leveraging all the traditional Oracle RDBMS strengths
  – Security, availability, scalability, manageability
• Exadata Ready

Many Known Graphs and Vocabularies on the Web
• DBPedia • Wordnet • Semantic XBRL • SIOC • Drug Bank • US Census • NCI • ACM • YAGO • SNOMED • Daily Med • Cyc/Open Cyc • FOAF • Linked CT • PubMed • Geonames • Eurostat • Freebase • CIA World Fact Book • KEGG • Gene Ontology • DBLP • Data.gov.uk • UniRef • UniProt • Music Brainz Data • Smart Link • UniParc • Semantic Tweet • Reactome • CiteSeer • CO2 Emission • Diseasome
And so much more!

Oracle Big Data Spatial and Graph: Property Graph Overview

Oracle Property Graph Data Model
• A set of vertices (or nodes)
  – each vertex has a unique identifier.
  – each vertex has a set of in/out edges.
  – each vertex has a collection of key-value properties.
• A set of edges
  – each edge has a unique identifier.
  – each edge has a head/tail vertex.
  – each edge has a label denoting type of relationship between two vertices.
  – each edge has a collection of key-value properties.
• Blueprints Java APIs
• Implementations
  – Oracle, Neo4j, DataStax (Titan), Spark GraphX, Dato GraphLab Create, InfiniteGraph, Dex, Sail, MongoDB …
• A property graph can be modeled as an RDF Graph
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model

RDF Graph v. Property Graph
• RDF Semantic Graphs
  – Use case: linked data, semantic metadata layer
  – Analytics: pattern matching, inferencing
• Property Graph
  – Use case: social network analysis
  – Analytics: clustering, centrality, page rank, path finding

Common (Property) Graph Analysis Use Cases
• Product Recommendation: recommend the most similar item purchased by similar people
• Influencer Identification: find people that are central in the given network, e.g. influencer marketing
• Community Detection: identify groups of people that are close to each other, e.g. target group marketing
• Graph Pattern Matching: find all the sets of entities that match the given pattern, e.g. fraud detection
[Figure: example inputs: purchase records (customer, items) and a communication stream (e.g. tweets)]

CyberSecurity Modeling / Internet of Things (IoT)
• Property graph model
• Dynamic construction of IP network
• The graph includes metadata as well as events/enriched data
• Extensible by other data sources (add properties, relations)
• Search
  – Text search on graph DB properties

Oracle Property Graph Overview
• Massively-Scalable Graph Database
  – Scales securely to trillions of edges
  – Optimized & secure schemas:
    • Apache HBase, NoSQL Database
  – Parallel loading
  – Support for open & Oracle optimized file formats
    • GML, GraphML, GraphSON, Oracle
• In-Memory Analyst
  – 35 built-in parallel graph analysis algorithms
  – Flexible deployment: Embedded, Remote, YARN
• Simple interfaces
  – Java: Tinkerpop: Blueprints, Gremlin, Rexster
  – Search: Apache Lucene and SolrCloud
  – Scripting languages: Groovy, Python…
[Figure: algorithm categories: Detecting Components and Communities, Ranking/Walking, Evaluating Communities, Path-Finding]
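The property graph data model described in this section (vertices and edges with unique identifiers, one label per edge, and key-value properties on both) can be sketched in a few lines, together with a naive version of the product recommendation use case. This is a sketch under stated assumptions, not the Blueprints or Oracle API: the class names, the `purchased` label, the sample data and the `also_bought` helper are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Vertex:
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    tail: int   # id of the vertex the edge leaves (convention chosen here)
    head: int   # id of the vertex the edge enters
    label: str  # type of relationship between the two vertices
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph; not a real graph database API."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}

    def add_vertex(self, vid: int, **props: Any) -> None:
        self.vertices[vid] = Vertex(vid, dict(props))

    def add_edge(self, eid: int, tail: int, head: int, label: str, **props: Any) -> None:
        if tail not in self.vertices or head not in self.vertices:
            raise KeyError("both end vertices must exist before adding an edge")
        self.edges[eid] = Edge(eid, tail, head, label, dict(props))

    def out_edges(self, vid: int, label: Optional[str] = None) -> List[Edge]:
        return [e for e in self.edges.values()
                if e.tail == vid and (label is None or e.label == label)]

    def in_edges(self, vid: int, label: Optional[str] = None) -> List[Edge]:
        return [e for e in self.edges.values()
                if e.head == vid and (label is None or e.label == label)]


def also_bought(g: PropertyGraph, item_id: int) -> Counter:
    """Count other items purchased by customers who purchased item_id."""
    buyers = {e.tail for e in g.in_edges(item_id, "purchased")}
    return Counter(
        e.head
        for buyer in buyers
        for e in g.out_edges(buyer, "purchased")
        if e.head != item_id
    )


g = PropertyGraph()
g.add_vertex(1, kind="customer", name="alice")
g.add_vertex(2, kind="customer", name="bob")
g.add_vertex(10, kind="item", name="book")
g.add_vertex(11, kind="item", name="lamp")
g.add_vertex(12, kind="item", name="pen")
g.add_edge(100, 1, 10, "purchased", qty=1)
g.add_edge(101, 1, 11, "purchased", qty=2)
g.add_edge(102, 2, 10, "purchased", qty=1)
g.add_edge(103, 2, 11, "purchased", qty=1)
g.add_edge(104, 2, 12, "purchased", qty=3)

scores = also_bought(g, 10)  # items bought alongside the book
```

Note that the edge itself carries properties (`qty` here), which is the point the deck makes when it says properties are easy to associate with edges in this model.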
Architecture for Property Graph Support
• Graph Analytics
  – Parallel In-memory Analytic Engine
• Graph Data Access Layer (APIs)
  – Java APIs
  – REST/Groovy
• Text Search
  – Apache Lucene and/or Apache Solr (SolrCloud)
• Scalable and Persistent Storage
  – Apache HBase
  – Oracle NoSQL Database
• Graph model and formats
  – Property Graph formats: GraphML, GML, Graph-SON, Flat Files
  – RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON)

Key Features: In-Memory Analyst
Built-in Algorithms and Graph Mutation
A rich set of built-in, parallel algorithms
• Detecting Components and Communities
  – Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Sparsification
• Evaluating Community Structures
  – Conductance, Modularity, Clustering Coefficient (Triangle Counting)
• Ranking and Walking
  – Pagerank, Personalized Pagerank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants)
• Path-Finding
  – Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s
• Link Prediction
  – SALSA (Twitter’s Who-to-follow)
• Other Classics
  – Vertex Cover
Parallel graph mutation operations
• Create Undirected Graph
• Create Bipartite Graph (e.g. left set: “a,b,e”)
• Sort-By-Degree (Renumbering)
• Filtered Subgraph
• Simplify Graph
[Figure: the original graph with vertices a to i, and the result of each mutation operation]

Features: Support Big Data SQL
• Apache HBase
• Apache Hive external table
• Oracle RDBMS: SQL based aggregation and analytics
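Two of the path-finding algorithms named in the list above, Hop-Distance (BFS) and Dijkstra's, can be sketched in plain, single-threaded Python to show what they compute. These are textbook versions, not the parallel implementations of the In-Memory Analyst; the adjacency-list format and the sample graph are assumptions made for the example.

```python
import heapq
from collections import deque
from typing import Dict, List, Tuple

# Adjacency list: vertex -> list of (neighbour, edge weight).
Graph = Dict[str, List[Tuple[str, float]]]


def hop_distance(adj: Graph, source: str) -> Dict[str, int]:
    """Breadth-first search: number of edges on the shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, _weight in adj.get(u, []):
            if v not in dist:  # first visit is the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def dijkstra(adj: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: lowest total weight from source (weights >= 0)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Small directed sample graph (hypothetical data).
GRAPH: Graph = {
    "a": [("b", 1.0), ("d", 5.0)],
    "b": [("d", 1.0), ("e", 4.0)],
    "d": [("e", 1.0)],
    "e": [],
}

hops = hop_distance(GRAPH, "a")
costs = dijkstra(GRAPH, "a")
```

On this sample the two results differ for vertex `d`: it is one hop from `a` over the direct edge, but the cheapest weighted route goes through `b`. That difference is why both algorithms appear as separate entries.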
Performance on Oracle NoSQL Database
Oracle NoSQL Database on a 6-Node BDA cluster (128GB RAM/node), 2+ billion edges
[Figure: two charts: “Loading Time of LiveJ Graph” (time in secs against DOP from 1 to 32) and “Execution Time of Basic Operation (ms)”]

Performance on Apache HBase
Apache HBase on a 6-Node BDA cluster (128GB RAM/node), 5+ billion edges
[Figure: two charts: “Loading performance” (time in min against Degree of Parallelism (DOP)) and “GetEdgesPartitioned: splitsPerRegion=1, splits=24” (time in secs against Degree of Parallelism)]

Oracle’s In-Memory Analyst vs Spark GraphX 1.1
• Oracle on a single node is up to 2 orders of magnitude faster than Spark GraphX distributed execution on 2 to 16 nodes
• Environment: homogeneous computing cluster:
  – CPU: Intel “Sandy Bridge”, Xeon E5-2660, 2.20 GHz, 8 Cores (x 2 HT)
  – Memory: 256 GB (DDR3 – 1600)
  – SSD: 3 x 256 GB (combination of OCZ Vertex 4 and Samsung 840 Pro)
  – Network Card: Mellanox Connect-IB (InfiniBand Adapter)
  – Switch: Mellanox SX6512 (InfiniBand Switch)
• Data sets (vertices / edges):
  – Twitter followership 2010: 41,652,230 / 1,468,365,182
  – Web page relations .UK domain: 77,741,046 / 2,965,197,340
[Figure: four panels (Pagerank, Single-Source Shortest Path, Hop-Dist, Eigenvector Centrality), each showing execution time in secs on the Twitter and Web data sets for Oracle and for Spark on 2, 4, 8 and 16 nodes]

Oracle’s In-Memory Analyst vs. GraphLab
• On a single machine is faster than existing
  – Distributed execution or
  – Out-of-core execution
• Two orders-of-magnitude faster than disk-based execution
• 3x – 10x faster than 16-machine distributed execution
• Systems compared:
  – In-Memory Analyst: x86 and SPARC (T5)
  – GraphLab (state-of-art distributed framework)
  – SQL: disk-based
[Figure: execution time of two popular graph analysis algorithms, PageRank and Triangle Counting (runtime in seconds, log scale, lower is better), on the LiveJ, Twitter and Web-UK data sets for PGX (SPARC), PGX (X86), GraphLab (X86 x 16) and SQL (X86)]

Summary

Property Graph and RDF Graph
RDF Graph
• Has formal theoretical foundation: interpretation, entailment, description logic
• Hard to associate properties with edges
• Numerous W3C and OGC standards
• Very natural to handle multiple RDF graphs at the same time
• Has many curated terms, ontologies
• A property graph can be modeled as an RDF Graph
Property Graph
• Simpler: no formal theoretical foundation, no semantics, no inference
• Easy to associate properties with edges
• Community driven. No standards yet
• Processing multiple property graphs is hard
• Has no standard terms, vocabularies
• An RDF graph can be modeled as a property graph with a loss of semantics

RDF Semantic Graph Summary
• Standards based: W3C, OGC
• Multi-platform: Oracle Database, NoSQL Database, Oracle Cloud
• Scalability: Trillions of triples
• Transactional: Concurrent loading and updates with ACID properties
• Security: Oracle Label Security (OLS) labels at the “triple” level
• Manageable: Use existing DB tools, utilities and expertise
• Multi-type support: graph, relational, search, geospatial …
• Oracle & 3rd Party Tools: OBIEE, Oracle Advanced Analytics, TopQuadrant, Tom Sawyer, IO Informatics, Jena, Protégé, Cytoscape

Property Graph Summary
• Complete platform:
  – Secure databases + Text indexing/search + Built-in analytics + Open source Java APIs & scripting languages for developers + Groovy console + integration w/ relational & SQL-based analytics
• Scalable:
  – Distributed database and text indexing/search; parallel in-memory analytics are concurrent & multiuser; filter queries refine graph data read into memory for analysis
• Fast:
  – Parallel everywhere: load, query and in-memory analytics
• Flexible:
  – Deploy on-premise or in the Cloud, store in RDBMS, HBase & NoSQL, text search w/ Lucene & SolrCloud, 3 ways to deploy in-memory analytics, extensible analytics architecture
• Open Source-based:
  – Apache, Java, Tinkerpop APIs; Groovy, Python… scripting languages

Q&A

Resources
• OTN: Oracle Spatial and Graph - RDF Semantic Graph: http://www.oracle.com/technetwork/database/options/spatialandgraph/overview/rdfsemantic-graph1902016.html
• OTN: Big Data Lite Virtual Machine (a free sandbox environment to get started): http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
• Oracle.com: https://www.oracle.com/database/big-data-spatial-and-graph
• OTN: Oracle Big Data Spatial and Graph, property graph (trial software downloads, doc, help forum): http://www.oracle.com/technetwork/database/database-technologies/bigdata-spatialandgraph
• Blog (technical examples and tips): https://blogs.oracle.com/bigdataspatialgraph/
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.