CS457/557 Introduction – Chapters 1-2

Relevance of DB
• DBs are a part of most decisions in an enterprise
  – Traditional DBs: operational
  – Data warehouses: decision support
  – NoSQL DBs: information

Databases
• Databases play a critical role in:
  – Business, medicine, industry, etc.
  – Everything?
• Databases can be:
  – Traditional, XML, object-relational, multimedia, real-time, Web
• What databases have you used recently?

Data vs. Databases
• Data
  – Recorded known facts with implicit meaning
• Database (DB)
  – Collection of related data
  – Logically coherent
  – Represents a mini-world
  – Designed and built for a specific purpose
  – Intended user group
  – Preconceived applications

DBMS
• Database Management System (DBMS)
  – Software
  – Create and maintain a DB
  – Define types of data
  – Store data on disk, controlled by the DBMS
  – Manipulate data

DBMS cont'd
• Why a DBMS?
  – Program independence
  – Data abstraction
  – Conceptual representation
  – Metadata
  – Shared data
  – Multiple views
  – Transaction processing
  – Higher overhead and increased complexity (Fig. 2.3)
• So why use a DBMS? OPTIMIZATION

Definitions
• Database system (DBS)
  – Data + DBMS
• DBS
  – Schema (metadata): DB description, schema diagram
  – Instance (actual data), Fig. 1.2: initially empty
• 3-schema architecture (Fig. 2.2)
  – External: views
  – Conceptual: structure of the DB, hides physical details
  – Internal: physical storage, access paths
Fig. 2.1

Data Model
• Describes the structure: records, types, relationships, constraints, basic operations
• A DBMS is based on a data model
• Types:
  – High-level (conceptual): ER, UML, OO
  – Low-level (physical): XML
  – Implementation (representational): combines conceptual and physical
  – Relational
  – NoSQL data models: column, key-value, document stores

DBMS Languages
• DDL: data definition language
• DML: data manipulation language
  – High-level, nonprocedural
  – Set-at-a-time
  – Interactive or embedded (in a host language)
• SQL is the most common/popular DB language

DBMS
• Software to create, query, and manipulate data in the database
• Based on a particular data model
• Allows for program independence
• Provides a language to define and manipulate data
• Contains metadata

Metadata
• Data about the data
• "Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource." (NISO)

Metadata
• Three categories of metadata (books as example):
  – Structural metadata: defines how objects are put together, for example, how pages are ordered to form chapters.
  – Administrative metadata: information to help manage a resource, such as when and how it was created, its type, and who has access.
  – Descriptive metadata: a resource for discovery and identification, including elements such as title, abstract, author, and keywords.

Metadata
• Structural
  – Student (Name, CWID, address, GPA, major)
• Administrative
  – Owner of data?
    • Account#, when created, modified
• Descriptive
  – Everything but the content

Metadata – According to the Guardian
• Metadata associated with emails:
  – Sender's name, email, and IP address
  – Recipient's name and email address
  – Date, time, and time zone
  – Unique identifier of email and related emails
  – Mail client login records with IP address
  – Mail client header formats
  – Subject of email

Metadata
• Metadata associated with mobile phones:
  – Phone number of every caller
  – Serial numbers of phones involved
  – Time of call
  – Duration of call
  – Location of each participant
  – Telephone calling card numbers

Metadata
• Metadata associated with Facebook:
  – Username and profile bio information, including birthday, hometown, work history, and interests
  – Username and unique identifier
  – User subscriptions
  – User location
  – User device
  – Activity date, time, and time zone

Metadata
• Metadata associated with web browsers:
  – Activity, including pages the user visits and when they were visited
  – User data and possibly user login details with auto-fill features
  – User IP address, internet service provider, device hardware details, operating system, and browser version
  – Cookies and cached data from websites

Metadata
• What about medical records?

Additional Characteristics
• Interfaces
• Actors
  – DBA
  – Designers
  – Users
    • Naïve or parametric (same info each time)
    • Casual (different info each time)
    • Sophisticated (implement own applications using databases)
    • Standalone (personal DB)

DB Classifications
• Single-user vs. multi-user
• Centralized vs. distributed
• Homogeneous vs. heterogeneous
• Federated DBMS, multidatabase system

Extending Traditional Databases
• Need for more complex databases
• Object-oriented databases
• Images, videos, scientific data
• Data mining (decision support systems), spatial data
• Data on the web for e-commerce
  – XML
• Non- or semi-structured data
• Databases for cloud computing

Application Packages
• Software packages work with database backends (>1 database)
• Web enabled
• Examples
  – Enterprise Resource Planning (ERP)
    • Integrate data and processes of the organization
    • Production, sales, distribution, marketing, finance, human resources, etc.
  – Customer Relationship Management (CRM)
    • Integrate customer information
    • Marketing and customer support

Information Retrieval (IR)
• Databases traditionally used for
  – Banking, insurance, retail, finance, manufacturing, payroll
• Information retrieval used for
  – Books, manuscripts, libraries
• Searching based on keywords
• Document processing: keywords, categorization, ranking documents

Information Retrieval (IR)
• With the advent of the web, IR is exciting again!
  – Web pages have active objects and change dynamically
  – New strategies needed
• Big Data
• NoSQL DB

Management Issues
• This course (457/557)
  – Design/model DBs
    • Weird course: theory + applications
  – Relational: query DBs, algebra, normalization
    • We will use Oracle, MySQL
  – Intro to: security, performance, transactions, NoSQL
• Grad course (609)
  – Redundancy
  – Integrity constraints and concurrency control (transactions)
  – Backup and recovery
  – In depth: performance, NoSQL
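A short program can make several of the earlier distinctions concrete: schema vs. instance ("initially empty"), DDL vs. DML (set-at-a-time, embedded in a host language), and the DBMS storing metadata. What follows is a minimal sketch, not the course's setup. It assumes Python's built-in `sqlite3` module as a stand-in DBMS, whereas the course itself uses Oracle and MySQL. The `Student` attributes follow the structural-metadata example above. The `demo` function, the column types, the GPA `CHECK` range and all sample rows are hypothetical. The sketch also previews a later topic: a transaction that is rolled back when an integrity constraint is violated.

```python
import sqlite3


def demo():
    """Schema vs. instance, DDL vs. DML, catalog metadata, and a transaction.

    SQLite is only a stand-in DBMS here (assumption); the course uses Oracle
    and MySQL. Column types, the GPA range, and all rows are hypothetical.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database

    # DDL: define the schema (metadata) before any data exists.
    with conn:
        conn.execute("""
            CREATE TABLE Student (
                CWID    TEXT PRIMARY KEY,
                Name    TEXT NOT NULL,
                Address TEXT,
                GPA     REAL CHECK (GPA BETWEEN 0.0 AND 4.0),
                Major   TEXT
            )""")

    # The schema exists, but the instance (actual data) is initially empty.
    empty_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]

    # The DBMS keeps the schema itself as data it can query (its catalog).
    columns = [row[1] for row in conn.execute("PRAGMA table_info(Student)")]

    # DML embedded in a host language (Python) populates the instance.
    with conn:
        conn.executemany(
            "INSERT INTO Student VALUES (?, ?, ?, ?, ?)",
            [("10001", "Ada", "12 Elm St", 3.9, "CS"),
             ("10002", "Ben", "34 Oak Ave", 3.1, "Math"),
             ("10003", "Cy", "56 Pine Rd", 2.7, "CS")])

    # Set-at-a-time, nonprocedural DML: one statement says WHICH rows to
    # change; the DBMS decides HOW to find and update them.
    with conn:
        updated = conn.execute(
            "UPDATE Student SET Major = 'Computer Science' "
            "WHERE Major = 'CS'").rowcount
    cs_names = [name for (name,) in conn.execute(
        "SELECT Name FROM Student WHERE Major = 'Computer Science' "
        "ORDER BY Name")]

    # Transaction: the second INSERT violates the GPA CHECK constraint, so
    # the context manager rolls back the whole unit, including the first row.
    rejected = False
    try:
        with conn:
            conn.execute("INSERT INTO Student VALUES "
                         "('10004', 'Dee', '78 Ash Ct', 3.5, 'Bio')")
            conn.execute("INSERT INTO Student VALUES "
                         "('10005', 'Eve', '90 Fir Ln', 5.0, 'Bio')")
    except sqlite3.IntegrityError:
        rejected = True
    final_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]

    conn.close()
    return empty_count, columns, updated, cs_names, rejected, final_count


if __name__ == "__main__":
    print(demo())
```

Running it prints `(0, ['CWID', 'Name', 'Address', 'GPA', 'Major'], 2, ['Ada', 'Cy'], True, 3)`. The table starts empty, and a single UPDATE changes both CS rows at once. The rejected transaction leaves the instance exactly as it was.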