DATA WAREHOUSE /DATA MINING ROAD MAP
1. WHAT DOES THE USER WANT?
- Does the organization need a data warehouse/data mart? (Why a warehouse?)
- What are the business objectives? In government these may include fulfilling social sector projects, resource mobilization, financial issues, target-based projects, how the resources are utilized, etc.
- What is the scope of the DW? This is very important: not every system needs a DW.
- Where will the data come from? From Blocks, Districts, States, PSUs, Markets, etc.
- Perform the cost/benefit analysis. A DW is an expensive proposition, so the analysis must be careful and the investment must be justified.
- Estimate the project. How much time will it take to establish a DW system (project approval, release of funds, execution), and what will the project cost?
- Perform the risk assessment. What are the positive and negative aspects of the project? Is the rate of return justified?
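As a rough illustration of the cost/benefit and rate-of-return questions above, the sketch below computes a simple ROI and a payback period. The function names and all figures are hypothetical and chosen only for illustration; a real appraisal would normally also discount future benefits.

```python
def simple_roi(total_benefit: float, total_cost: float) -> float:
    """Return ROI as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


def payback_years(initial_cost: float, annual_net_benefit: float) -> float:
    """Years needed for the yearly net benefit to cover the initial cost."""
    if annual_net_benefit <= 0:
        raise ValueError("annual_net_benefit must be positive")
    return initial_cost / annual_net_benefit


# Hypothetical figures, not taken from any real project.
cost = 500_000.0
benefit_over_5_years = 800_000.0

roi = simple_roi(benefit_over_5_years, cost)
payback = payback_years(cost, benefit_over_5_years / 5)
print(f"ROI: {roi:.0%}, payback: {payback:.2f} years")
```

If the ROI is low or the payback period exceeds the expected life of the system, the DW is hard to justify on cost grounds alone.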
2. DETERMINE DBMS SERVER PLATFORM
- Which database servers do you already have? Are they good enough to handle a large number of databases and the proposed DWs/data marts?
- Determine cost, interoperability and staff training considerations.
- Determine the DBMS server platform based on Return on Investment (ROI).
3. DETERMINE HW PLATFORM
- Where should the data warehouse or data mart be housed? At the center and/or at state level?
- Which hardware platforms do you already have? Are they sufficient?
- Determine cost, interoperability and staff training considerations.
- Determine the hardware based on Return on Investment (ROI).
4. INFORMATION AND DATA MODELLING
Building an Information Model
- How do I use a data model to express relationships between the data? There are hundreds of departments, which means thousands of OLTP applications.
- Resolve access and usage issues.
- Determine the logical and physical design of the data warehouse or data mart.
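One common way to express such relationships is a star schema: a central fact table whose rows point at descriptive dimension tables. The source does not prescribe a modelling technique, so the sketch below is only an assumption, and every table and column name in it is hypothetical.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimensions.
SCHEMA = """
CREATE TABLE dim_district (
    district_id   INTEGER PRIMARY KEY,
    district_name TEXT NOT NULL,
    state_name    TEXT NOT NULL
);
CREATE TABLE dim_scheme (
    scheme_id   INTEGER PRIMARY KEY,
    scheme_name TEXT NOT NULL
);
CREATE TABLE fact_expenditure (
    district_id INTEGER NOT NULL REFERENCES dim_district(district_id),
    scheme_id   INTEGER NOT NULL REFERENCES dim_scheme(scheme_id),
    fiscal_year INTEGER NOT NULL,
    amount      REAL    NOT NULL
);
"""


def build_model(conn: sqlite3.Connection) -> None:
    """Create the physical tables for the logical model above."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
build_model(conn)
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

The logical design is the choice of facts and dimensions; the physical design is how those tables are created, indexed and stored on the chosen DBMS.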
5. CONSTRUCT THE META DATA REPOSITORY
How do I keep track of what means what, who can access it (fully or partially), and how and when it will be accessed?
- Build a repository.
- Determine the business user's view of the metadata repository.
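To make the idea of "what means what, and who can access it" concrete, here is a minimal in-memory sketch of a metadata repository. The class names, fields and role names are all hypothetical; a production repository would usually live in a database or a dedicated tool.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEntry:
    """Describes one warehouse data element (all fields are illustrative)."""
    name: str
    meaning: str
    source_system: str
    refresh_schedule: str
    full_access: set = field(default_factory=set)     # roles that see everything
    partial_access: set = field(default_factory=set)  # roles that see a subset


class MetadataRepository:
    def __init__(self) -> None:
        self._entries = {}

    def register(self, entry: MetadataEntry) -> None:
        self._entries[entry.name] = entry

    def describe(self, name: str) -> str:
        """Business user's view: the plain-language meaning of an element."""
        return self._entries[name].meaning

    def access_level(self, name: str, role: str) -> str:
        entry = self._entries[name]
        if role in entry.full_access:
            return "full"
        if role in entry.partial_access:
            return "partial"
        return "none"


repo = MetadataRepository()
repo.register(MetadataEntry(
    name="fact_expenditure.amount",
    meaning="Amount spent on a scheme in a district during a fiscal year",
    source_system="district accounts (hypothetical)",
    refresh_schedule="monthly",
    full_access={"finance_officer"},
    partial_access={"district_analyst"},
))
print(repo.access_level("fact_expenditure.amount", "district_analyst"))
```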
6. DATA ACQUISITION AND CLEANSING
How do I:
- Extract data from multiple sources across multiple database/OS/HW platforms?
- Cleanse the data to an acceptable level of quality (adding, deleting or correcting records as needed)?
- Scrub?
- Reconcile?
- Aggregate?
- Summarize?
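The cleansing, reconciliation and summarization steps listed above can be sketched with the Python standard library alone. The field names and sample rows are hypothetical, and the rejection rules (an unparseable number or a missing district) are just one possible definition of an "acceptable level" of data quality.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from different source systems.
raw_rows = [
    {"district": " Pune ", "year": "2001", "amount": "1200.50"},
    {"district": "pune", "year": "2001", "amount": "300"},
    {"district": "Nagpur", "year": "2001", "amount": "n/a"},  # bad amount
    {"district": "", "year": "2001", "amount": "50"},         # missing key
]


def cleanse(rows):
    """Scrub each row; return (clean_rows, rejected_rows)."""
    clean, rejected = [], []
    for row in rows:
        district = row["district"].strip().title()
        try:
            year = int(row["year"])
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        if not district:
            rejected.append(row)
            continue
        clean.append({"district": district, "year": year, "amount": amount})
    return clean, rejected


def summarize(rows):
    """Aggregate amounts per (district, year)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["district"], row["year"])] += row["amount"]
    return dict(totals)


clean, rejected = cleanse(raw_rows)
# Reconcile: every source row must be either loaded or explicitly rejected.
assert len(clean) + len(rejected) == len(raw_rows)
summary = summarize(clean)
print(summary, len(rejected))
```

Keeping the rejected rows, rather than silently dropping them, makes it possible to audit the load and to correct data at the source.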
7. DATA TRANSFORMATION, TRANSPORTATION & POPULATION
How do I build the data warehouse/data mart?
- Transform the data.
- Transport the data.
- Populate the data warehouse or data mart.
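A minimal sketch of the three steps above, using two in-memory SQLite databases to stand in for an OLTP source and the warehouse. The table names, column names and the cents-to-currency conversion are assumptions made for the example, not part of the original roadmap.

```python
import sqlite3


def transform(row):
    """Transform one source row into the warehouse layout."""
    district, year, amount_cents = row
    return (district.strip().upper(), int(year), amount_cents / 100.0)


def populate(source, target, batch_size=1000):
    """Transport rows from source to target in batches; return rows loaded."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS fact_expenditure "
        "(district TEXT, fiscal_year INTEGER, amount REAL)"
    )
    cursor = source.execute("SELECT district, year, amount_cents FROM payments")
    loaded = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        target.executemany(
            "INSERT INTO fact_expenditure VALUES (?, ?, ?)",
            [transform(r) for r in batch],
        )
        loaded += len(batch)
    target.commit()
    return loaded


# Hypothetical source system with untidy data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE payments (district TEXT, year TEXT, amount_cents INTEGER)")
source.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [(" pune", "2001", 120050), ("Nagpur ", "2001", 30000)],
)

target = sqlite3.connect(":memory:")
loaded = populate(source, target)
rows = target.execute(
    "SELECT district, fiscal_year, amount FROM fact_expenditure ORDER BY district"
).fetchall()
print(loaded, rows)
```

Loading in batches keeps memory use bounded when the source tables are large.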
8. DETERMINE MIDDLEWARE CONNECTIVITY
How do I connect the source data to the target data warehouse or data mart?
- Ongoing connection
- Direct data access
9. PROTOTYPING, QUERYING & REPORTING
How do I:
- Implement a prototype with user involvement?
- Develop applications?
- Use query and reporting tools?
10. DATA MINING
How do I find patterns in the data?
- How can those patterns be used for revenue growth?
- How can tools be used to further identify patterns in the data warehouse or data mart?
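As a small illustration of "finding patterns", the sketch below counts which pairs of items occur together in at least a minimum number of transactions. It is a plain co-occurrence count, not a full association-rule algorithm such as Apriori, and the item names and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # sorted(set(...)) removes duplicates and gives each pair a stable order
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical purchase records drawn from a data mart.
transactions = [
    ["rice", "oil", "sugar"],
    ["rice", "oil"],
    ["oil", "sugar"],
    ["rice", "oil", "salt"],
]

patterns = frequent_pairs(transactions, min_support=3)
print(patterns)
```

A pair that clears the threshold is a candidate for actions such as joint promotion, which is one way a pattern can feed revenue growth.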
11. OLAP
How can users analyze the data?
- How can the necessary number of dimensions be determined?
- How can users see the data represented in multiple dimensions?
- How can they use the OLAP tools?
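Viewing data "in multiple dimensions" amounts to aggregating a measure over a chosen subset of dimensions. The sketch below shows a roll-up over hypothetical state, district and year dimensions; real OLAP tools do the same thing at scale with pre-computed aggregates.

```python
from collections import defaultdict

# Hypothetical fact rows with three dimensions and one measure.
facts = [
    {"state": "MH", "district": "Pune", "year": 2001, "amount": 100.0},
    {"state": "MH", "district": "Nagpur", "year": 2001, "amount": 50.0},
    {"state": "MH", "district": "Pune", "year": 2002, "amount": 70.0},
    {"state": "KA", "district": "Mysore", "year": 2001, "amount": 30.0},
]


def roll_up(rows, dimensions, measure="amount"):
    """Sum the measure over every combination of the given dimensions."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)


by_state_year = roll_up(facts, ["state", "year"])  # two-dimensional view
by_state = roll_up(facts, ["state"])               # rolled up one level further
print(by_state_year)
print(by_state)
```

Each dimension added to the list gives a finer view (drill-down); each one removed gives a coarser view (roll-up), which is one practical way to judge how many dimensions users actually need.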
12. DEPLOYMENT & SYSTEM MANAGEMENT
How can I provide security, backup, recovery and the necessary capabilities for a production data warehouse or data mart?
- How can resources be allocated best?
- How can the growth of the data be accommodated?
- How can everything be kept running smoothly?
2.12 Data Warehouse "Roadmap" and Roll out Strategy
The following table provides the framework and the general "roadmap" for the
rollout of the Enterprise Data Warehouse.
Data Warehouse Rollout Map & Related Activities

| Warehouse Task | Activities | Objectives | Implementation Details |
|---|---|---|---|
| Create Project Plan | Develop new project plan for data warehouse increment(s) | Gain consensus as to project tasks and schedule | Develop Plan and Schedule |
| Agree to Scope from Technical Requirements | Identify and document scope data and functionality | Gain consensus on specific deliverables (data, functionality, structures) of enterprise project | Obtain Business Community Agreement |
| Data Acquisition | Identify, evaluate, and design elements pertinent to data acquisition | Document and agree on design components (design standards, data source mapping, ETT, load, refresh and purge modules, as well as data mart design) | Develop the Data Acquisition Plan, Development Standards and Component Implementation Strategies |
| Data Quality | Assess Data Quality, Develop Design & Build Standards | Define objectives and requirements for data (cleansing, error handling, audit & control), assess quality of source data, identify data management procedures, finalize tool selection | Develop Data Quality Components Design and Test Plans |
| Data Warehouse Technical Architecture | Identify, plan, and design capacity, hardware, and software components and test criteria | Ensure adequate capacity, identify process flow of data, plan integration of modules, data, access, and meta data | Plan Implementation and Test Environments for Warehouse Architecture Components |
| Warehouse Administration | Develop, Design and Build Standards for Warehouse Administration | Design and plan version control, data archiving, scheduling, usage, data governing, backup, restoration, query profiles, and security | Plan for Development and Test Environments for Warehouse Administration Components |
| Meta data Management | Identify and document Meta data Requirements, as well as design and build standards | Collect meta data needed by technical and user communities, identify required meta data tools, and develop standards to which meta data should adhere | Develop a Meta data Management Plan to manage and monitor meta data; Develop Design and Build Standards (or identify appropriate tool) |
| Data Access | Determine Access Requirements and develop design and build standards | Collect specific access requirements to support analysis capabilities, data manipulation functionality, and user interface. Develop, design and build standards to support query and reporting and query criteria, user security, and confirmation of data availability | Develop Access Plan and Determine access tools. Develop Standards and Confirm Data Availability |
| Design and Build | Design and Build Development Standards and Modules for all Components of Data Warehouse Solution | Design, build, load and test Data Acquisition Modules (ETT, load, refresh, purge, data mart); Design, develop, and generate or implement Data Quality Modules (cleansing, error handling, audit & control); Design, build, load and test Architecture Components (Multidimensional, Test and Production Databases); Design, build, load and test Warehouse Administration Modules (versioning, scheduling, backup, restoration, performance, security); Build or Implement Meta data Modules; Design, build, load and test Data Access Components (reports and query criteria, user security, user access and style specifications) | Design, Build, Populate and Test: Data Acquisition Modules; Data Quality Modules; Databases and other Schemas; Warehouse Administration Modules; Meta data Modules; Data Access Modules |
| Documentation | Define and Produce Documentation Standards, Procedures and Environment | Specify and develop documentation deliverables (User, Technical, Operational, Reference) | Develop Documentation Requirements, Standards, and Delivery Strategy and Produce Final Documentation |
| Testing | Define testing strategies, develop test procedures and perform testing specific to the scope of the Data Warehouse solution | Design, develop and implement testing plans and strategies for ETT, Performance, Interfaces, Integration, Volume, Query Profiles and other Components | Develop and Implement Testing Strategy, Plan, Models and Integration requirements |
| Training | Define and develop increment training requirements and plans | Identify and document training requirements for technical and end user staff, identify specific roles who should receive training, and create training databases | Develop Training Strategy, Requirements, and Class Material. Create Training Databases |
| Installation | Define and develop installation plan | Develop an installation plan to support the production, test and other maintenance environments for the data warehouse solution | Develop Sequential or repeatable (Step by Step) Installation Plan |
| Transition | Define transition strategy to production environment | Identify how the transition to production will occur, including planning for data acquisition, preparation of production database(s), developer preparations and other cut-over issues | Develop Cut-Over Plan, Implement Maintenance, Production and Regression Environments |
| Production Support | Measure and support the production systems | Evaluate and audit the system for performance, faults, use, growth, recovery and tuning issues | Develop Library for production support metrics, corrections, enhancements, results |
| End Phase | Prepare for and complete phase end activities | Secure acceptance of phase end deliverables, release resources, assess and audit deliverables | Prepare Phase End Report and perform Quality Assessment activities |
| Post Implementation Support | Evaluate implemented increment, and non-implemented requirements, data warehouse architecture and plans | Assess responsiveness of solution to stated need, assess performance of data warehouse, identify next increment | Document Evaluation of Data Warehouse Solution and Architecture, Identify Next Increment Opportunities, Assess Performance of Project to Plan |
| End DW Increment | Prepare for and complete final project activities | Secure final acceptance of project deliverables | Conduct Discovery Meetings with Business Community |
NOTE: Input for the above was taken from the Internet.