Download AQG Data Management Issues - Colorado State University

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Big data wikipedia , lookup

Clusterpoint wikipedia , lookup

Functional Database Model wikipedia , lookup

Object-relational impedance mismatch wikipedia , lookup

Database model wikipedia , lookup

Transcript
Shawn McClure
Software Engineer
CIRA, Colorado State University
[email protected]
970-491-8455
Projects:
Visibility Information Exchange Web System (VIEWS)
Interagency Monitoring of Protected Visual Environments (IMPROVE)
Air Toxics Data Archive (ATDA)
WRAP Technical Support System (TSS)
Background:
Systems Analyst
Database Architect
Programmer
“VIEWS-based” web sites...
VIEWS
ATDA
IMPROVE
VIEWS Data Presentation and Analysis
Why am I here?
To help ask a question:
Can we improve upon the data management system* used by the
Air Quality Group at CNL?
Some possible goals:
Understand the current system
Identify issues, goals, and requirements
Design a new system
Implement the new system
Maintain and evolve the system
* System: consists of the collection of manual and automatic processes by which
information is collected, managed, and disseminated, and by which work is done.
Some common problems with data management…
Data Redundancy and Confusion
Results when different people independently collect or copy the same piece of information
Can lead to conflicting naming and coding conventions
May result in confusion about which version of the data is most recent or most correct
Program – Data Dependence
Tight relationship between data and the programs required to update and maintain that data
A change in data requires a change in all the program that work with the data
Changes in data type, field length, etc. cause change in programs
Some common problems with data management: (cont’d)
Lack of Flexibility
Difficulty delivering ad hoc reports and/or responding to unanticipated information
requirements in a timely fashion
Handling ad hoc requests: Data is in the system, but is very expensive (in time and effort) to
assemble and organize
Poor Security
There is no mechanism for knowing who is accessing the data and how they're modifying it
Access to the data can be unsystematic and uncontrolled
Some common problems with data management: (cont’d)
Lack of Data Sharing and Availability
Finding data can be difficult
Retrieving data can be difficult
Because pieces of information in different files and different parts of the organization cannot
be related to one another, it is virtually impossible for information to be shared or accessed in
a timely manner
Information cannot flow freely across different functional areas or different parts of the
organization
Functions of a Data Management System
Getting data into the system
Working with data while it is in the system
Getting data out of the system
1.
2.
3.
1
3
Getting
Data In
Getting
Data Out
Data Management System
2
Working
With Data
A data management system in more detail…
Import
Validation
Source Data
Database
Rules
Storage
Source Data
Source Data
Transformation
Retrieval
Analysis
Presentation
Interpretation
Program
Logic
Back End
Import: Getting data into the system
Validation: Ensuring data accuracy
Storage: Managing data, backup, and archival
Transformation: Sorting, joining, aggregating
Front End
Retrieval: Getting the data out
Presentation: Displaying the data
Analysis: Making the data understandable
Interpretation: Making the data usable
Two Possible Approaches: Transactional Database and a Data Warehouse
Transactional Database
Transactional Database
·
·
·
Fully normalized, third normal form
Best for data import, validation, tracking, and
management
More difficult for end user interaction
Data Warehouse
Data Warehouse
·
·
·
De-normalized “star” schema
Best for end user interaction, querying, and
front-end applications
Harder to update and automatically maintain
data integrity
A Hybrid Approach: Two Interrelated Relational Database Systems
OLTP:
Data Warehouse Generation System:
Data Warehouse:
• Functions as the “back-end” database
• Fully relational and in 3rd normal form
• Used for data import, validation, and
management
• Technologies: Microsoft SQL Server
•
•
•
•
•
•
•
•
•
•
•
Extracts data from the OLTP
De-normalizes and transforms data
Loads data into the Data Warehouse
Builds table indexes
Archives “snapshots” of the database
Technologies: VB, stored procedures
Functions as the “front-end” database
Uses a de-normalized “star schema”
Used for querying and archiving data
Automatically generated from the OLTP
Technologies: Microsoft SQL Server
General Tasks and Associated Questions
Determine our goals: What are we trying to achieve?
Design a strategy: How are we going to achieve it?
Identify problem areas: What do we have to watch out for?
Prioritize tasks: What is most important for success?
Adjust scope: What set of goals is most realistic?
Recommend alternatives: How else could we do things?
Allocate resources: How do we support our efforts?
Realign expectations: How do we communicate any adjustments?
Define milestones: How do we know when we're done?
Some Issues and Concerns…
What components/aspects of the current CNL-AQG data management system(s) are we
interested in examining?
How do we determine our priorities? (i.e. What should come first, next, and later?)
How do we make any necessary “transitions” with minimal impact to current operations?
What new “learning curves” will be involved?
How will we need to “shift our paradigms”?
How do we maintain a new system?
How do we know when we’ve achieved what we want?
Thanks!
[email protected]