Download Decision Support Technology

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Clusterpoint wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Relational model wikipedia , lookup

Object-relational impedance mismatch wikipedia , lookup

Database model wikipedia , lookup

Transcript
Decision Support Technology
DSS Reference Architecture
Language
System
Presentation
System
Problem Processing
System
Knowledge
System
Outline
• Knowledge system technology
– data management
– data warehousing
– Data marts
– online analytic processing (OLAP)
• ROLAP, MOLAP, WOLAP
Evolution of DSS
• Transaction Processing Systems (TPS)
– Operational data stores and OLTP
– Batch reports, hard to find and analyze
information, inflexible and expensive, reprogram
every new request (circa 60’s)
• MIS
– Management reporting from transactions in TPS
– Still inflexible, not integrated with desktop tools
(circa 70’s)
• DSS
– Combine data with analytic models or expert rules
– Integration with desktop tools (80’s)
Evolution of DSS
• Data Warehousing
– Data integrated after (cleaning and
scrubbing) from multiple sources (both
internal and external to the organization)
– OLAP is the technology used to study the
data in terms of operations on a multidimensional data set
– Data warehousing also supports
processing of data by analytic methods
and permits data mining (90’s)
Applications






Retail - inventory management, promotions
Manufacturing - order shipment
Insurance – policy and claims tracking
Telecommunications - call analysis
Financial – account tracking
CRM/eCRM – customer profiling, clickstream
analysis
 Healthcare – disease management, patient and
physician profiling
Disease Management Programs
 Address health quality of population by identifying at-risk
members and applying prevention programs
 Challenge: identifying at-risk members
 Data warehouses can help with this effort
 Aetna U.S. Healthcare
 Members with certain ailments are flagged using an algorithm
that examines member’s diagnoses, procedures, laboratories,
and pharmaceuticals
 Data gathered from medical and pharmaceutical claims,
member and provider profiles (nearly 3 terabytes)
 Results: reduction in frequency of acute asthmatic episodes,
improvements in vaccination rates
Data Model
• A representation scheme with which to describe data: data
relationships, data semantics, and consistency constraints.
• Examples
– ER (entity relationship model)
– Relational model
– Object-oriented model
• References:
– http://www.smartdraw.com/resources/centers/software/erd.htm
(ER model)
– http://www.fgcu.edu/support/office2000/index.html (MS Office
product tutorials)
– http://mis.bus.sfu.ca/tutorials/MSAccess/tutorials.html (MS
Access tutorial)
E-R schema: A simple example
physician
m
evaluates
n
patient
-Modeling the structure of data, not the processing of it
ER Schema: a detailed example
Relational model
• Primary model used today for dataprocessing applications
• Database systems like Oracle,
Sybase, Informix, MS SQL server etc.
support this model
• Based on a well understood
theoretical model
An Example
Example of a relational schema:
Physician(doc_id, d_name, specialty)
Patient(p_id, p_name)
Evaluates(doc_id, p_id, date, diagnosis)
Essential features of the
relational model
• A relational model schema consists of
relations or tables
• Each table has a set of fields (columns) that
are related to one another
• One or more fields whose values determine
the value of other fields are called keys
• Tables are normalized in order to remove
redundancies
New Data Types
• Text, images, video, audio, time series,
spatial, ….
• Other, more exotic data types like
fingerprints (an IBM Extender) and face
recognition (an Informix DataBlade).
• Offered by IBM, Informix, Oracle
Object-oriented Databases
• Objects belong to object classes determined by
the structure (variables) and behavior (methods)
of the object - not easy to represent in a
relational database
• Methods that impact the state of the variables of
the object are encapsulated within the object and
facilitate interaction between objects
• eg. Object “bank account” modifies “balance” by
“amount”
• Potentially reduced development and
maintenance time
From Databases to Transaction
Processing
Source: Jim Gray,MSFT
Reality
Abstraction
DB
Transaction
DB'
Query
Change
Answer
The real state is represented by an abstraction, called the database, and the
transformation of the real state is mirrored by the execution of a program, called a
transaction, that transforms the database.
Examples
• Point of sale systems
– credit card transactions
• ATM machines
– all withdrawals and deposits
• E-commerce web sites
• Health Care
– medical records, billing
Databases for Decision Support
• Transaction Processing systems are
optimized for performance
• Data they capture are too detailed to be
of use for decision support purposes
• Online Analytical Processing (OLAP)
imposes very different demands on
databases than does Online
Transaction Processing (OLTP)
Data Analysis: An Example
1997
Reimbursements
Net Profits
1998
% Inc.
45.4 Mill
52.4 Mill
15.41
3.2 Mill
5.2 Mill
62.50
Data Analysis: A Hospital Example
1997
Reimbursements
Net Profits
1998
% Inc.
45.4 Mill
52.4 Mill
15.41
3.2 Mill
5.2 Mill
62.50
Reimb. by region
North
7.2 Mill
7.5 Mill
4.17
South
13.4 Mill
18.4 Mill
37.31
East
18.4 Mill
17.4 Mill
-5.43
West
6.4 Mill
9.1 Mill
42.18
Data Analysis: An Example
Reimb. By region
North
7.2 Mill
7.5 Mill
4.17
South
13.4 Mill
18.4 Mill
37.31
East
18.4 Mill
17.4 Mill
-5.43
West
6.4 Mill
9.1 Mill
42.18
Specialty A
2.65 Mill
2.70 Mill
1.90
Specialty B
6.45 Mill
7.10 Mill
10.08
Specialty C
3.1 Mill
4.0 Mill
29.03
Specialty D
1.2 Mill
4.60 Mill
283.33
Reimb. - South by specialty
Data Analysis: An Example
1997
Reimbursements
Net Profits
1998
% Inc.
45.4 Mill
52.4 Mill
15.41
3.2 Mill
5.2 Mill
62.50
Profits by region
North
0.88 Mill
0.50 Mill
-43.18
South
1.12 Mill
2.60 Mill
132.1
East
1.1 Mill
0.13 Mill
-88.18
West
0.1 Mill
1.97 Mill
1870
Data Analysis: An Example
1997
Reimbursements
Net Profits
1998
% Inc.
45.4 Mill
52.4 Mill
15.41
3.2 Mill
5.2 Mill
62.50
Profits-South by specialty
Specialty A
0.20 Mill
0.10 Mill
-50.00
Specialty B
0.60 Mill
1.20 Mill
100.0
Specialty C
0.22 Mill
0.40 Mill
81.82
Specialty D
0.10 Mill
0.90 Mill
800.0
Heterogeneous Database
Integration
Integration System
World
Wide
Web
Digital Libraries
Scientific Databases
Personal
Databases
• Collects and combines information from disparate sources
• Provides integrated view, and a uniform user interface
• Supports sharing of data between entities
Why look at data in this
way?
– What would be the demand for services (forecasting)?
– Who are our key customers/patients, and
– What are the margins/outcomes? (profitable
customers/satisfied patients)
– How do we market to them/treat them?
– What pricing/treatment strategy is desirable?
– What are their preferences?
– What type of customer/patient services are required?
– What services when packaged result in higher/better
sales/revenues/margins/outcomes, efficient workflow?
– Which promotion/patient education/counseling works or does not
work and why?
– What is the inventory/patient turnover?
– Which channel/technology is more effective/profitable?
– Why do margins/outcomes differ from one place to another or one
patient to another?
Customer Relationship
Management
DISCOVER
ANALYZE
Trends in market
Selling opportunities
Opp. For improvements
TAKE ACTION
Source: Pilot Software
Customer base
Profitability
Buying pattern
Support pattern
Productivity
Policies & Procedures
Marketing policies
Support procedures
Business Intelligence
Software applications, technologies, and
analytical methodologies that perform data
analysis
Often used as a broad term that includes OLAP,
data mining, query, reporting tools and
technologies
Business Intelligence Loop
Business Strategist
OLAP
Data Mining
Reports
Decision
Support
Data Storage
Data
Warehouse
Extraction, Transformation,
& Cleansing
CRM
Clinical IS
Pharmacy
Lab
Data Warehouse
• A decision support database that is maintained
separately from the organization’s operational
databases.
• A data warehouse is a
– subject-oriented,
– integrated,
– time-varying,
– non-volatile
collection of data that is used primarily in
organizational decision making (W.H. Inmon)
Why Separate Data Warehouse?
• Performance
 Operational databases are designed & tuned for known
transactions & workloads.
 Complex OLAP queries would degrade performance for
transactions.
 Special data organization, access & implementation
methods needed for multidimensional views & queries
 Function
– decision support requires historical data (up to 5 to 10
years of data)
– consolidation of data from many operational systems and
external sources
– data quality considerations (semantic and measurement
issues are resolved)
Local vs Central Data Repository
Local
2
Knowledge
Generation
Central
3
Forward Data
4
1
HIS
Data
Local Rules Base
Local Data
Data Warehouse
Repository
Central Data
Feedback
Option
Knowledge
Generation
Other
Data
Sources
Warehouse Database Schema
• ER design techniques not appropriate
• Design should reflect multidimensional
view
– Star Schema
– Snowflake Schema
– Fact Constellation Schema
Multidimensional Data Model
• Database is a set of facts (points) in a multidimensional space
• A fact has a measure dimension
– quantity that is analyzed, e.g., sale, budget
• A set of dimensions on which data is analyzed
– e.g. , store, product, date associated with a sale amount
• Dimensions form a sparsely populated coordinate system
• Each dimension has a set of attributes
– e.g., owner, city and county of store
• Attributes of a dimension may be related by partial order
– Hierarchy: e.g., street > county >city
– Lattice: e.g., date> month>year, date>week>year
Example: Patient profiling
 A healthcare organization needed a longitudinal view of
patients, including trends of services to patients
 Model
 Facts include Healthcare (e.g., diagnosis, procedure),
Financial (e.g., amount billed, number of claims), Resources
(e.g., number of bed-days, inpatient and outpatient visits)
 Dimensions include Time, Provider, Claim Type,
Demographics, Encounter Type, Diagnosis and Procedure,
Person, Organization
 Questions answered by system:
 Which individuals are eligible for services but not obtaining
them?
 Which individuals are registered for services, but not receiving
preventive healthcare?
Example of a Star Schema
Order
Product
Order No
ProductNO
Order Date
ProdName
Customer
Customer No
Customer Name
Customer
Address
City
Salesperson
SalespersonID
SalespersonName
City
Quota
Fact Table
ProdDescr
OrderNO
Category
SalespersonID
CategoryDescription
CustomerNO
UnitPrice
ProdNo
Date
DateKey
DateKey
CityName
Date
Quantity
Total Price
City
CityName
State
Country
Star Schema
• A single fact table and a single table for each
dimension
• Every fact points to one tuple in each of the
dimensions and has additional attributes
• Does not capture hierarchies directly
• Straightforward means of capturing a multiple
dimension data model using relations
Example of a Snowflake Schema
Order
Order No
Product
ProductNO
Order Date
ProdName
CategoryName
ProdDescr
CategoryDescr
Fact Table
Customer
Customer No
Customer Name
Customer
Address
City
Salesperson
OrderNO
SalespersonID
CustomerNO
Category
Category
UnitPrice
ProdNo
Date
DateKey
DateKey
CityName
Date
SalespersonID
Quantity
Month
City
SalespersonName
Total Price
CityName
City
Quota
Category
State
Country
Month
Month
Year
State
StateName
Country
Year
Year
Snowflake Schema
• Represent dimensional hierarchy directly
by normalizing the dimension tables
• Easy to maintain
• Saves storage, but may reduce
effectiveness of browsing (Kimball)
Fact Constellation
Sales
Fact Table
Store Key
Product Dimension
Shipping
Fact Table
Shipper Key
Product Key
Product Key
Store Key
Period Key
Product Desc
Product Key
Units
Period Key
Price
Units
Store Dimension
Store Key
Store Name
City
State
Region
Price
Fact Constellation
• Multiple fact tables share dimension tables.
• This schema is viewed as collection of
stars hence called galaxy schema or fact
constellation.
• Sophisticated applications require such
schema.
Data Warehouse vs. Data Marts
• Enterprise warehouse: collects all
information about subjects (customers,
products, sales, assets, personnel) that span
the entire organization.
– Requires extensive business modeling
– May take years to design and build
• Data Marts: departmental subsets that focus
on selected subjects: Marketing data mart:
customer, products, sales.
– Faster roll out, but complex integration in
the long run.
Virtual Warehouse
• Views over operational databases
– Materialize some summary views for efficient
query processing, easier to build, requisite
excess capacity on operational database
servers
Data Warehouse Architecture
(e.g.,
AS400)
Clinical
System
Payroll
System
Data
Mining Tools
Meta-Data
Oracle
Financials
on HP 9000
Billing System
OLAP servers
Transformation
Integration
Access,
Files
Other
Internal
Data
Data
Warehouse
(Industry Reports)
External
Data
Excel
Web
Other
Extraction, Transformation, &
Load (ETL)
 ETL is a set of tools and techniques
used to populate a data warehouse
 Extraction
 Extract data from sources (e.g., operational
DBMSs, file systems, Web pages)
 Transformation
 Clean data
 Convert from legacy/host format to
warehouse format (e.g., convert “surname”
to “last name”)
Extraction, Transformation, &
Load (ETL)
 Load
 Sort, summarize, consolidate, compute views, check
integrity, build indexes, partition
 Huge volumes of data to be loaded, yet small time window
(usually at night) when the warehouse can be taken off-line
 Techniques: batch, sequential load often too slow;
incremental, parallel loading techniques may be used
 Refresh
 Propagate updates from sources to the warehouse
 When to refresh - on every update, periodically (e.g., every
24 hours), or after “significant” events
 How to refresh – full extract from base tables vs. incremental
techniques
Metadata
 Types of Metadata
 Administrative metadata











source databases and their contents
gateway descriptions
warehouse schema, view & derived data definitions
dimensions, hierarchies
pre-defined queries and reports
data mart locations and contents
data partitions
data extraction, cleansing, transformation rules, defaults
data refresh and purging rules
user profiles, user groups
security: user authorization, access control
Metadata …
 Types (continued)
 Business data
 business terms and definitions
 ownership of data
 charging policies
 Operational metadata
 data lineage: history of migrated data and sequence of
transformations applied
 currency of data: active, archived, purged
 monitoring information: warehouse usage statistics, error
reports, audit trails
 Tools:
 Platinum Repository (Computer Associates)
 Meta Directory (Information Builders)…
The Complete Decision Support
System
(Source: Franconi)
Information Sources
Data Warehouse
Server
(Tier 1)
OLAP Servers
(Tier 2)
Clients
(Tier 3)
e.g., MOLAP
Semistructured
Sources
Data
Warehouse
extract
transform
load
refresh
etc.
Analysis
serve
Query/Reporting
serve
e.g., ROLAP
Operational
DB’s
serve
Data Marts
Data Mining
Three-Tier Architecture
• Warehouse database server
– Almost always a relational DBMS; rarely flat files
• OLAP servers
– Relational OLAP (ROLAP): extended relational DBMS that
maps operations on multidimensional data to standard
relational operations.
– Multidimensional OLAP (MOLAP): special purpose server
that directly implements multidimensional data and
operations.
• Clients
– Query and reporting tools.
– Analysis tools (excel)
– Data mining tools (e.g., trend analysis, prediction)