ISY 4340 Class Introduction

G. Green
Foundations of Database Systems
Class Introduction
Agenda
• Introductions
• Seating Chart
• Course Overview
• Syllabus
• Case
• Database Development Overview
Foundations of Database Systems
• Objectives
  › Understand data-related activities of SDLC
  › Implement data modeling, database design, and database implementation techniques
    • CASE (Visio)
    • Database (SQL Server)
• Course Contents
  › Lectures, Examples, In-Class Exercises
  › Individual Assignments (3)
  › Team Project* (3 parts)
  › Quizzes (3)
  › Exams (2)
*Can request teammates; see syllabus for Team Preferences deadline
Research
• International and US
• Periodic Assessments
• Service Learning & Kolb’s Learning Cycle
• Some NOT graded; others are
Learning
• Participate:
  › Prepare (read and reread book, notes) for each class
  › Attend, listen, be attentive, engaged
  › Ask and answer questions, and add to discussion
  › Do each assignment completely and in a timely and professional manner
• Take PLENTY of notes in class:
  › Do NOT just rely on PowerPoint
• Explore:
  › Go beyond classroom material
Class Resources
• Syllabus/Schedule, Grades, Attendance:
  › http://canvas.baylor.edu
  › Schedule also contains links to all lecture slides, study guides, assignments and project write-ups
• Other Resources:
  › http://blogs.baylor.edu/gina_green/mis-4340-resources/
  › NOTE: the syllabus/schedule on this website will NOT contain the links described above
Syllabus…
Introduction to Databases
Chapter 1
Topics
• Chapter 1
  • The Database Environment
  • Database Development Process
• Chapter 9 (Pages 409 – 410)
  • Big Data
• Chapter 10 (Pages 444 – 445, 446 – 447)
  • Master Data Management
  • Data Federation
• Chapter 11 (Pages 464 – 472, 486, 499 – 506)
  • Database Personnel
  • Metadata Management (e.g., Data Dictionaries)
  • Backup Facilities
  • Overview of Tuning the Database for Performance
Evolution of Database Technologies
[Figure: timeline from the 1960’s through 2000+ showing Traditional Files, Hierarchical, Network, Relational, Object, Object-Relational, MDDB, XML, Federated, NoSQL, …]
Figure 1-3 Old file processing systems: Example
[Figure callout: Duplicate Data]
Traditional File Processing Environment
Disadvantages:
› Program-data dependence = “structural” & “data”
› Limited data sharing = “islands of automation”
› Duplication of data = “redundancy”
› Lengthy development times
› Excessive program maintenance
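Program-data dependence can be illustrated with a minimal sketch. It uses Python and the standard-library sqlite3 module as a stand-in for a full DBMS such as SQL Server; the fixed-width record layout, the customer table, and the function names are hypothetical and do not come from the course materials.

```python
import sqlite3

# Hypothetical fixed-width customer file: name (10 chars) + city (8 chars).
RECORDS = [
    "Ann Lee".ljust(10) + "Waco".ljust(8),
    "Bob Ray".ljust(10) + "Austin".ljust(8),
]

def city_from_flat_file(record: str) -> str:
    # File processing: the program hard-codes byte offsets, so any change
    # to the file layout (e.g., a wider name field) breaks this function.
    return record[10:18].strip()

def city_from_database(conn: sqlite3.Connection, name: str) -> str:
    # Database approach: the program names the column it wants and the
    # DBMS resolves where and how the value is stored.
    row = conn.execute(
        "SELECT city FROM customer WHERE name = ?", (name,)
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(r[0:10].strip(), r[10:18].strip()) for r in RECORDS],
)
# Adding a column changes the stored structure, but the query above
# keeps working without modification.
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")

print(city_from_flat_file(RECORDS[0]))
print(city_from_database(conn, "Bob Ray"))
```

The flat-file function would need to be rewritten if the layout changed, while the database query survives the added column; that difference is the maintenance burden the slide refers to.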
The Database Environment
Advantages of Databases
• Program-data independence
• Improved data sharing
• Minimal data redundancy
• Improved data accessibility/responsiveness
• Improved data consistency
• Faster application development
• Enforcement of standards
• Improved data quality
• Reduced program maintenance
Data and Database Administration
Chapter 11
Traditional Administration Definitions
• Data Administration: A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards
• Database Administration: A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery
Data People Involved in SDLC
• Data(base) Analysts/Designers
  › requirements elicitation, design
• Business (Intelligence) Analyst
  › BI requirements, design
• Data Architects
  › strategy, governance
• Data Stewards
  › quality, metadata, MDM
• Business Analytics Engineer
  › data analytics, statistics, mining
• Data Mining Engineer; Big Data Engineer; Data Scientist …
  › “big data” specialists
• Data Administrators
• Database Administrators
• (System) DBAs
  › implementation/maintenance
• Application DBAs
• Procedural DBAs
  › stored code
• e-DBAs
  › web-enabled DBMSs
• Data Warehouse Administrators
  › ETL, DW implementation
Growing Skillset
• Relational database design, implementation
• Database programming
• ETL (extract, transform, load)
• Data warehousing design (star schema) and implementation (MDDB)
• Data analysis, reporting, and mining techniques
• Cloud database implementations
• Statistical modeling with tools such as R, SAS, or SPSS
• Data visualization tools
• Technologies for structured and unstructured data
  • Hadoop (Hadoop is an Apache project to provide an open-source implementation of frameworks for reliable, scalable, distributed computing and data storage.)
  • NoSQL
  • "NewSQL"
***See Big Data University for (mostly) free self-study training
Data Quality and Integration
Chapter 10
Metadata Management
• System Catalog
  • Part of DBMS
  • "Active" dictionary
• Data Dictionary
  • Typically "passive"
  • Extension of catalog metadata
• Information Repository (e.g., IRDS)
  • Standards for data dictionaries
  • Integrates dictionaries
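As a rough illustration of an "active" system catalog, the sketch below asks a DBMS for its own metadata. It assumes Python's standard-library sqlite3 module in place of SQL Server, and the student table is a hypothetical example; SQLite exposes its catalog through the sqlite_master table and the table_info pragma, whereas other products use different catalog views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# The catalog is maintained by the DBMS itself ("active"): it was updated
# automatically by the CREATE TABLE statement above.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default_value, pk)
columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")
]

print(tables)
print(columns)
```

A "passive" data dictionary, by contrast, would be a separately maintained document or tool that someone has to update by hand when the schema changes.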
Master Data Management
• "Ensuring the currency, meaning, and quality of reference data within and across various subject areas" (pg 444)
• Identify
  • Common Data Subjects
  • Common Data Elements
  • Sources of "the truth"
• Cleanse
• Update applications to reference Master Data repository
• Ensures consistency of key data (not ALL data) throughout organization
Database Development Process
SDLC for this class

Systems Development Life Cycle    DB Activities in SDLC
Planning                          Enterprise Modeling*
Analysis                          DB Scope, Requirements (Conceptual Data Model)
Design                            DB Design (Logical DB Design)
                                  DB Design (Physical DB Design)
Implementation                    DB Implementation (Load, Test, Eval, Op)
                                  DB Maintenance*
Enterprise Data Modeling
• Determine organizational data requirements
• Build enterprise data model
  • outcome is a very high-level Entity-Relationship Diagram
  • see:
    • http://da.ks.gov/kito/ITPlans/data_maps06.ppt
    • http://www.tdan.com/view-articles/5205
Source: http://www.tdan.com/view-articles/5205
Conceptual Data Modeling
• Determine business rules
• Determine user data requirements
• Build conceptual data model
  › outcome is an Entity-Relationship Diagram (conceptual schema)
Logical Database Design
• Select database model
  › e.g., the Relational Model
• Transform conceptual (ERD) into logical (relational) data model
• Normalize data structures
  › Outcome is normalized, relational tables
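The normalization step can be sketched as follows. This is a minimal example under stated assumptions: it uses Python's standard-library sqlite3 module instead of SQL Server, and the order data, table names, and column names are hypothetical. It splits rows that repeat customer details into two related tables.

```python
import sqlite3

# Hypothetical unnormalized order data: customer details repeat on every row.
raw_orders = [
    (1, "C1", "Ann Lee", "Waco", 25.00),
    (2, "C1", "Ann Lee", "Waco", 40.00),
    (3, "C2", "Bob Ray", "Austin", 15.50),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id TEXT NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
""")

for order_id, cust_id, name, city, amount in raw_orders:
    # Each customer is stored once; repeated rows for the same key are skipped.
    conn.execute(
        "INSERT OR IGNORE INTO customer VALUES (?, ?, ?)", (cust_id, name, city)
    )
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?)", (order_id, cust_id, amount)
    )

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(customer_count, order_count)
```

Three input rows yield two customer rows and three order rows, so a customer's city now lives in exactly one place.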
Physical Database Design
• Select storage device(s)
• Select database product (e.g., SQL Server)
• Design fields, records, files (physical schema)
  › outcomes are detailed, physical definitions for:
    • fields (data dictionary)
    • records (space requirements for physical structures)*
    • files (access methods)
*Will not do in this class
Database Implementation
• Create database file/table structures
• Establish access rights
• Create views (external schema)
• Load test data
• Write/test programs that process data
• Install database (with production data) into production operations
  › outcomes are secured database tables loaded with data
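Several of these implementation steps can be sketched in a few lines. The example assumes Python's standard-library sqlite3 module rather than SQL Server, and the employee table, the employee_directory view, and the test rows are hypothetical. SQLite has no user accounts, so the access-rights step appears only as a comment; in a server DBMS it would typically be done with GRANT statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Create table structures.
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept    TEXT NOT NULL,
        salary  REAL NOT NULL
    );
    -- External schema: a view that hides the salary column.
    CREATE VIEW employee_directory AS
        SELECT emp_id, name, dept FROM employee;
""")
# Establish access rights: not supported by SQLite, so skipped in this sketch.

# Load test data.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Ann Lee", "Sales", 50000.0),
        (2, "Bob Ray", "IT", 60000.0),
    ],
)

# A program that processes data through the view sees only its columns.
cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
view_columns = [d[0] for d in cursor.description]
directory = cursor.fetchall()
print(view_columns)
print(directory)
```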
Database Maintenance
• Maintain database structures
• Storage/space management
• Performance, tuning
  • I/O Contention
  • CPU Usage
  • Application Tuning
• Data availability
• DBMS upgrades, "fixes"
• Backup, recovery …….
Database Maintenance, cont…
• Backup
  • Full
  • Incremental
  • Differential
• Business Continuity
  • Data Replication ("fallback")
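The three backup types differ in which changes they capture. The sketch below is a simplified model in Python, not a real backup tool: file names, day numbers, and the select_for_backup function are hypothetical. It assumes the usual definitions, where a differential backup copies everything changed since the last full backup and an incremental backup copies everything changed since the last backup of any type.

```python
from typing import Dict, List

def select_for_backup(
    modified_day: Dict[str, int],
    kind: str,
    last_full_day: int,
    last_backup_day: int,
) -> List[str]:
    """Return the files a backup of the given kind would copy."""
    if kind == "full":
        # Full: copy everything, regardless of when it changed.
        return sorted(modified_day)
    if kind == "differential":
        # Differential: changes since the last FULL backup.
        cutoff = last_full_day
    elif kind == "incremental":
        # Incremental: changes since the last backup of ANY type.
        cutoff = last_backup_day
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, day in modified_day.items() if day > cutoff)

# Hypothetical history: full backup on day 1, incremental backup on day 4.
modified_day = {"orders.dat": 2, "customers.dat": 5, "products.dat": 0}

full = select_for_backup(modified_day, "full", 1, 4)
differential = select_for_backup(modified_day, "differential", 1, 4)
incremental = select_for_backup(modified_day, "incremental", 1, 4)
print(full, differential, incremental)
```

In this model a restore needs the last full backup plus either the latest differential or every incremental taken since that full backup, which is the usual trade-off between backup size and recovery effort.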
Data and Database Administration
Chapter 11
Cloud Computing
• Business Model
  • Computing resources on demand
  • Need-based architectures
  • Internet-based delivery
  • Pay as you go
• History (VERY high-level and approximate)
  [Figure: timeline from the 50's through the 2000's showing Time-sharing, Utility Computing, Virtual Machines, Personal Computers, WWW, Grid Computing, Cloud Computing]
Cloud Computing Services
• Impacts to Data(base) Administration
• See textbook page 469
Summary
• Evolution of Data Management
• Disadvantages of file processing
• Components of a DBMS Environment
• Database Advantages
• Database Concepts
• Database Development:
• Overall SDLC
• Database Activities in the SDLC
• Data Models/Schemas
• What they represent
• People Involved in SDLC (esp. DB)
• Traditional job divisions and responsibilities
• Newer job titles