Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Data Modeling for Predictive Analytics SQL Saturday Providence – Dec 10 Beth Wolfset, Data Architect Email: [email protected] SQL Saturday 575 - Thanks to our Sponsors! • Gold • Silver • Bronze • Swag • Blog About BlueMetal (an event sponsor) Modern technology, craftsman quality. We’re an interactive design and technology architecture firm matching the most experienced consultants in the industry to the most challenging business and technical problems facing our clients. Founded August 2010 and as of October 2015 we are an Insight company. 6 | YEARS IN OPERATION 5 | LOCATIONS 6 | SERVICE AREAS 4 | INDUSTRY SPECIALIZATIONS We are hiring Data Engineers in Boston and NYC Data Is An Asset “Whether you want it or not, the amount and variety of data are expanding exponentially. Embrace that trend and transition your organizations to understand information as a competency that needs the right people, processes and platforms” John Lewis, president & CEO, consumer group, NA, at Nielsen “organizations integrating high-value, diverse, new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20%.” Regina Casonato, et al, Gartner Research “Information is the oil of the 21st century, and analytics is the combustion engine.” Peter Sondergaard, Gartner Research Why Are We Here? • What is Data Modeling • What is the Predictive Analytics Process • Database Types and how No SQL fits in • What are the Types of Data Models and where they are appropriate • A Word on other Data Topics • Data Architecture and No SQL • Code First vs Model First What is Data Modeling Data modeling is a process used to define and analyze data requirements needed to support the business processes within the scope of corresponding information systems in organizations. Therefore, the process of data modeling involves professional data modelers working closely with business stakeholders, as well as potential users of the information system. The Data Journey Predictive Analytics Process Identify Business Objectives Monitor & Analyze Exploratory Analysis Implement Statistical Model Develop Statistical Model Test Statistical Model Popular Types of Databases Database Type Example Relational (SQL) Database Type Example MySQL Document Lotus Notes CouchBase CouchDB MongoDB OrientDB Raven Terrastore Graph & Resource Neo4J Description Flock Framework (RDF) HyperGraph Infinite Graph Jena Sesame AllegoGraph Search Engine ElasticSearch Splunk Solr MarkLogic Sphinx Key-Value Riak Redis Column-Family Cassandra Amazon Simple DB Oracle Sybase Object-Oriented Berkely Level Memcached Cache Db4o ObjectStore Versant Objectivity/DB Hypertable Hierarchical DATA ARCHITECTURE AND NOT ONLY SQL NO SQL – Not Only SQL • Implies a non-relational database • Does not imply there is no structure to the data • Data may be stored in methods that do not require the structure to be understood a priori • Structure of data may be defined at query time NO SQL does not mean NO Data Architect • The Data Architect understands what data is available and the methods of accessing it, regardless of the management system • Data requires documentation • Generally uses Schema-On-Read Data Models Model Types • • • • • • Conceptual Logical Physical Star Schemas Summary Tables and Cubes Others Business Case • To capture information for an Educational System. This will include information about the Employees, Faculty, Students , Alumni, Courses and Educational Organization Data Modeling Tools Tool ERwin Data Modeler ER/Studio Creator Supported Database Platforms Supported Supported data OSs models (conceptual, logical, physical) ERwin Inc. Access, IBM DB2, Informix, Windows Conceptual, logical, (formerly part Ingres, MySQL, Oracle, physical of CA Progress, MS SQL Server, Technologies) Sybase, Teradata Embarcadero Access, IBM DB2, Informix, Windows Conceptual, logical, (acquired by Hitachi HiRDB, Firebird, physical, ETL IDERA) Interbase, MySQL, MS SQL Server, Netezza, Oracle, PostgreSQL, Sybase, Teradata, Visual Foxpro and others via ODBC/ANSI SQL Enterprise Architect Sparx Systems IBM DB2, Firebird, InterBase, Informix, Ingres, Access, MS SQL Server, MySQL, SQLite, Oracle, PostgreSQL, Sybase Windows, Linux, Mac Conceptual, Logical & Physical + MDA Transform of Logical to Physical SQL Server Management Studio Oracle SQL Developer Data Modeler PowerDesigner Microsoft MS SQL Server Windows Physical Oracle Oracle, MS SQL Server, IBM DB2 Crossplatform Logical, physical Sybase MS SQL Server, Oracle, PostgreSQL, MySQL, IBM DB2, Informix Conceptual, logical, physical Windows Supported notations Forward Reverse Engineering Engineering Model/database comparison and synchronization Update database and/or update model Repository IDEF1X, IE (Crows feet), and more Yes Yes IDEF1X, IE (Crows feet) Yes Yes Update database and/or update model ER/Studio Repository and Team Server (formerly Portal/CONNECT) for collaboration IDEF1X, UML DDL, Information Engineering & ERD Yes Yes Update database and/or update model Multi-user collaboration using File, DBMS or Cloud Repository (or transfer via XMI, CVS/TFS or Difference Merge). Yes IDEF1X, IE (Crows feet), and more IDEF1X, IE (Crows feet), and more Yes Yes Yes Yes Workgroup edition provides collaboration Yes Update database and/or update model Update database and/or update model Yes Yes Conceptual Model A conceptual data model is a high level or coarse model which is abstract in structure and content that is intended to represent a business area SubType SuperType Transactional Data Relational Data Warehouse Data Non-Relational Data IoT Data SuperType SubType Logical Model A logical data model (LDM) is a type of data model showing aAttribute representation of the organization's data, independent of any particular Entity technology Primary / Surrogate Key Business Key / Alternate Key SuperType Audit ColumnsDomains SuperType Foreign Key Transactional Data Warehouse Data IoT Data SubType Relational Data Non-Relational Data SubType { Physical Model "course": { { "courseId": "ENG 301", A physical data model (or database design) is a representation of a data design which "person": { "instructor": { "name": { takes into account the capabilities and constraints of a given database management "name": { "first_name": "Severus", system "first_name": "Severus", "last_name": "Snape" What "last_name": "Snape" }, }, about reverse "ssn": "999-999-9999", "ssn": "999-999-9999" engineering? "active_status": true, SuperType }, "person_type": "External", "students": [ "role": { { "faculty": { "name": { "department": "Engineering", "first_name": "Harry", "salary": "80000" "last_name": "Potter" }, }, "student": { "ssn": "111-111-1111" "fee": 0 }, } { }, Transactional Data "name": { "course": { "first_name": "Luna", Warehouse Data "courseId": "ENG 301" "last_name": "Lovegood" } IoT Data }, } "ssn": "222-222-2222" } Relational Data } SubType X Non-Relational Data ] } } Star Schema A star schema is a representation of a dimensional data model which consists of facts and dimensions. Transactional Data Warehouse Data IoT Data X Relational Data Non-Relational Data X Summary Tables and Cubes A summary or aggregation table will be created when metrics have been standardized and reports or analytics want to use the information frequently or consistently. Transactional Data Warehouse Data IoT Data Relational Data Non-Relational Data Predictive Analytics Process Identify Business Objectives Monitor & Analyze Exploratory Analysis Implement Statistical Model Develop Statistical Model Test Statistical Model Other Areas • • • • • • Data Model Patterns Master Data Management Code First vs Model First Data Flow Diagrams Use Case Diagrams Data Governance Other Areas? Data Model Patterns Logs Taxonomies & Reference Data Events Business Objects Demographic Data Auditing Utilization Master Data Management Master Data tCustomer Customer Config Sales Customer Data • core data that is essential operation of the business Three Ways to Masterto Data • • Mutually Exclusiveset of identifiers and extended attributes that describes consistent and uniform the core• entities Vertically Fragmented Master Data Management • Match and Merge • • • Name: S. Snape Master Plan SSN: 123-45-6789 List Degree: Engineering a methodology that identifies the most critical information within an Which is the organization—and creates a single view of truth to power business right processes customer Name: Prof. Severus Snape discipline in which business and IT work together toEngineering ensure the address? SSN: 123-45-6789 Classes and uniformity, accuracy, stewardship, semantic consistency Emp Id: 456 accountability of the enterprise’s official shared master data assets Address: 9 Galen St Phone: 617-555-1212 Math Degree: Engineering may be technology Masterenabled Classes Name: Prof. Snape Class List Emp Id: 456 Phone: 617-555-1212 Name: Severus Snape SSN: 123-45-6789 Address: 9 Galen St Philosophy ClassesPhone: 617-555-1212 Code First vs Model First Code First Model First • • • • • • • Build code-data structures in memory Measured on speed to provide functionality OR/M maps structures to DB Limits need to understand DB Access • Understand data and future growth Use standards and templates Measured on multiple uses across applications Consistency of model facilitates efforts Data Abstraction Layer Goals • Efficient Storage • Performant Retrieval • Understand DB Goals • Ease of Development • Understand code Microservices Stored Procedures Data Flow Diagram A data flow diagram is a representation of the movement of data through a system Transactional Data Warehouse Data IoT Data Relational Data Non-Relational Data 23 Use Case Diagram A use case diagram is a representation of a user's interaction with the system Transactional Data Warehouse Data IoT Data Relational Data Non-Relational Data Thank you. We appreciate your interest, and look forward to working with you in the future! Beth Wolfset [email protected] Twitter: @beth_wolfset www.bluemetal.com | (866) 252-0111