Download Left out topics – MIS Unit 5 5.6 Data 5.6.1 CRISP

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
Left out topics – MIS
Unit 5
5.6 Data
5.6.1 CRISP - Cross Industry Standard Process for Data Mining
The process of data mining

The sequence of the phases is not strict and moving back and forth between
different phases is always required.

The arrows in the process diagram indicate the most important and frequent
dependencies between phases.

The outer circle in the diagram symbolizes the cyclic nature of data mining itself.

A data mining process continues after a solution has been deployed.

The lessons learned during the process can trigger new, often more focused
business questions and subsequent data mining processes will benefit from the
experiences of previous ones.
Six major phases.
(1) Business Understanding

This initial phase focuses on understanding the project objectives and
requirements from a business perspective, and then converting this knowledge
into a data mining problem definition, and a preliminary plan designed to achieve
the objectives.
(2) Data Understanding

The data understanding phase starts with an initial data collection and proceeds
with activities in order to get familiar with the data, to identify data quality
problems, to discover first insights into the data, or to detect interesting subsets to
form hypotheses for hidden information.
(3) Data Preparation

The data preparation phase covers all activities to construct the final dataset (data
that will be fed into the modeling tool(s)) from the initial raw data. Data
preparation tasks are likely to be performed multiple times, and not in any
prescribed order. Tasks include table, record, and attribute selection as well as
transformation and cleaning of data for modeling tools.
(4) Modeling

In this phase, various modeling techniques are selected and applied, and their
parameters are calibrated to optimal values.

Typically, there are several techniques for the same data mining problem type.

Some techniques have specific requirements on the form of data. Therefore,
stepping back to the data preparation phase is often needed.
(5) Evaluation

At this stage in the project you have built a model (or models) that appear to have
high quality, from a data analysis perspective.

Before proceeding to final deployment of the model, it is important to more
thoroughly evaluate the model, and review the steps executed to construct the
model, to be certain it properly achieves the business objectives.

A key objective is to determine if there is some important business issue that has
not been sufficiently considered.

At the end of this phase, a decision on the use of the data mining results should be
reached.
(6) Deployment

Creation of the model is generally not the end of the project. Even if the purpose
of the model is to increase knowledge of the data, the knowledge gained will need
to be organized and presented in a way that the customer can use it.

Depending on the requirements, the deployment phase can be as simple as
generating a report or as complex as implementing a repeatable data mining
process.

In many cases it will be the customer, not the data analyst, who will carry out the
deployment steps.

However, even if the analyst will not carry out the deployment effort it is
important for the customer to understand up front the actions which will need to
be carried out in order to actually make use of the created models.
5.6.2 Data Warehouse Database

The central data warehouse database is the cornerstone of the data warehousing
environment.

This database is almost always implemented on the relational database management
system (RDBMS) technology.

However, this kind of implementation is often constrained by the fact that traditional
RDBMS products are optimized for transactional database processing. Certain data
warehouse attributes, such as very large database size, ad hoc query processing and the
need for flexible user view creation including aggregates, multi-table joins and drilldowns, have become drivers for different technological approaches to the data warehouse
database. These approaches include:

Parallel relational database designs for scalability that include shared-memory, shared
disk,
or
shared-nothing
configurations.
models
implemented
on
various
multiprocessor

An innovative approach to speed up a traditional RDBMS by using new index
structures to bypass relational table scans.

Multidimensional databases (MDDBs) that are based on proprietary database
technology; conversely, a dimensional data model can be implemented using a
familiar RDBMS. Multi-dimensional databases are designed to overcome any
limitations placed on the warehouse by the nature of the relational data model.
MDDBs enable on-line analytical processing (OLAP) tools that architecturally belong
to a group of data warehousing components jointly categorized as the data query,
reporting, and analysis and mining tools.

(1) Sourcing, Acquisition, Cleanup and Transformation Tools

A significant portion of the implementation effort is spent extracting data from
operational systems and putting it in a format suitable for informational applications that
run off the data warehouse.

The data sourcing, cleanup, transformation and migration tools perform all of the
conversions, summarizations, key changes, structural changes and condensations needed
to transform disparate data into information that can be used by the decision support tool.

They produce the programs and control statements, including the COBOL programs,
MVS job-control language (JCL), UNIX scripts, and SQL data definition language
(DDL) needed to move data into the data warehouse for multiple operational systems.

These tools also maintain the Meta data. The functionality includes:

Removing unwanted data from operational databases

Converting to common data names and definitions

Establishing defaults for missing data

Accommodating source data definition changes
The data sourcing, cleanup, extract, transformation and migration tools have to deal with some
significant issues including:

Database heterogeneity. DBMSs are very different in data models, data access language,
data navigation, operations, concurrency, integrity, recovery etc.

Data heterogeneity. This is the difference in the way data is defined and used in different
models - homonyms, synonyms, unit compatibility (U.S. vs metric), different attributes
for the same entity and different ways of modeling the same fact.

These tools can save a considerable amount of time and effort. However, significant
shortcomings do exist. For example, many available tools are generally useful for simpler
data extracts. Frequently, customized extract routines need to be developed for the more
complicated data extraction procedures.
(2) Meta data

Meta data is data about data that describes the data warehouse. It is used for building,
maintaining, managing and using the data warehouse. Meta data can be classified into:

Technical Meta data, which contains information about warehouse data for use by
warehouse designers and administrators when carrying out warehouse
development and management tasks.

Business Meta data, which contains information that gives users an easy-tounderstand perspective of the information stored in the data warehouse.

Equally important, Meta data provides interactive access to users to help understand
content and find data.

One of the issues dealing with Meta data relates to the fact that many data extraction tool
capabilities to gather Meta data remain fairly immature.

Therefore, there is often the need to create a Meta data interface for users, which may
involve some duplication of effort.

Meta data management is provided via a Meta data repository and accompanying
software.

Meta data repository management software, which typically runs on a workstation, can be
used to map the source data to the target database; generate code for data transformations;
integrate and transform the data; and control moving data to the warehouse.

As user's interactions with the data warehouse increase, their approaches to reviewing the
results of their requests for information can be expected to evolve from relatively simple
manual analysis for trends and exceptions to agent-driven initiation of the analysis based
on user-defined thresholds.

The definition of these thresholds, configuration parameters for the software agents using
them, and the information directory indicating where the appropriate sources for the
information can be found are all stored in the Meta data repository as well.
(3) Access Tools

The principal purpose of data warehousing is to provide information to business users for
strategic decision-making.

These users interact with the data warehouse using front-end tools.

Many of these tools require an information specialist, although many end users develop
expertise in the tools.

Tools fall into four main categories: query and reporting tools, application development
tools, online analytical processing tools, and data mining tools.

Query and Reporting tools can be divided into two groups: reporting tools and managed
query tools.

Reporting tools can be further divided into production reporting tools and report writers.

Production reporting tools let companies generate regular operational reports or support
high-volume batch jobs such as calculating and printing paychecks.

Report writers, on the other hand, are inexpensive desktop tools designed for end-users.

Managed query tools shield end users from the complexities of SQL and database
structures by inserting a Meta layer between users and the database.

These tools are designed for easy-to-use, point-and-click operations that either accept
SQL or generate SQL database queries.

Often, the analytical needs of the data warehouse user community exceed the built-in
capabilities of query and reporting tools.

In these cases, organizations will often rely on the tried-and-true approach of in-house
application
development
using
graphical
PowerBuilder, Visual Basic and Forte.
development
environments
such
as

These application development platforms integrate well with popular OLAP tools and
access all major database systems including Oracle, Sybase, and Informix.

OLAP tools are based on the concepts of dimensional data models and corresponding
databases, and allow users to analyze the data using elaborate, multidimensional views.

Typical
business
applications
include
product
performance
and
profitability,
effectiveness of a sales program or marketing campaign, sales forecasting and capacity
planning.

These tools assume that the data is organized in a multidimensional model.

A critical success factor for any business today is the ability to use information
effectively.

Data mining is the process of discovering meaningful new correlations, patterns and
trends by digging into large amounts of data stored in the warehouse using artificial
intelligence, statistical and mathematical techniques.
(4) Data Marts

The concept of a data mart is causing a lot of excitement and attracts much attention in
the data warehouse industry.

Mostly, data marts are presented as an alternative to a data warehouse that takes
significantly less time and money to build.

However, the term data mart means different things to different people.

A rigorous definition of this term is a data store that is subsidiary to a data warehouse of
integrated data.

The data mart is directed at a partition of data (often called a subject area) that is created
for the use of a dedicated group of users.

A data mart might, in fact, be a set of de-normalized, summarized, or aggregated data.
Sometimes, such a set could be placed on the data warehouse rather than a physically
separate store of data.

In most instances, however, the data mart is a physically separate store of data and is
resident on separate database server, often a local area network serving a dedicated user
group.

Sometimes the data mart simply comprises relational OLAP technology which creates
highly de-normalized dimensional model (e.g., star schema) implemented on a relational
database.

The resulting hyper cubes of data are used for analysis by groups of users with a common
interest in a limited portion of the database.

These types of data marts, called dependent data marts because their data is sourced from
the data warehouse, have a high value because no matter how they are deployed and how
many different enabling technologies are used, different users are all accessing the
information views derived from the single integrated version of the data.

Unfortunately, the misleading statements about the simplicity and low cost of data marts
sometimes result in organizations or vendors incorrectly positioning them as an
alternative to the data warehouse.

This viewpoint defines independent data marts that in fact, represent fragmented point
solutions to a range of business problems in the enterprise.

This type of implementation should be rarely deployed in the context of an overall
technology or applications architecture.

Indeed, it is missing the ingredient that is at the heart of the data warehousing concept -that of data integration.

Each independent data mart makes its own assumptions about how to consolidate the
data, and the data across several data marts may not be consistent.

Moreover, the concept of an independent data mart is dangerous -- as soon as the first
data mart is created, other organizations, groups, and subject areas within the enterprise
embark on the task of building their own data marts.

As a result, you create an environment where multiple operational systems feed multiple
non-integrated data marts that are often overlapping in data content, job scheduling,
connectivity and management.
(5) Data Warehouse Administration and Management

Data warehouses tend to be as much as 4 times as large as related operational databases,
reaching terabytes in size depending on how much history needs to be saved.

They are not synchronized in real time to the associated operational data but are updated
as often as once a day if the application requires it.

Almost all data warehouse products include gateways to transparently access multiple
enterprise data sources without having to rewrite applications to interpret and utilize the
data.

Furthermore, in a heterogeneous data warehouse environment, the various databases
reside on disparate systems, thus requiring inter-networking tools. The need to manage
this environment is obvious.

Managing data warehouses includes security and priority management; monitoring
updates from the multiple sources; data quality checks; managing and updating meta
data; auditing and reporting data warehouse usage and status; purging data; replicating,
sub setting and distributing data; backup and recovery and data warehouse storage
management.
(6) Information Delivery System

The information delivery component is used to enable the process of subscribing for data
warehouse information and having it delivered to one or more destinations according to
some user-specified scheduling algorithm.

The information delivery system distributes warehouse-stored data and other information
objects to other data warehouses and end-user products such as spreadsheets and local
databases.

Delivery of information may be based on time of day or on the completion of an external
event.

The rationale for the delivery systems component is based on the fact that once the data
warehouse is installed and operational, its users don't have to be aware of its location and
maintenance.

All they need is the report or an analytical view of data at a specific point in time.

With the proliferation of the Internet and the World Wide Web such a delivery system
may leverage the convenience of the Internet by delivering warehouse-enabled
information to thousands of end-users via the ubiquitous worldwide network.

The Web is changing the data warehousing landscape since at the very high level the
goals of both the Web and data warehousing are the same easy access for the information.

The value of data warehousing is maximized when the right information gets into the
hands of those individuals who need it, where they need it and they need it most.

However, many corporations have struggled with complex client/server systems to give
end users the access they need.

The issues become even more difficult to resolve when the users are physically remote
from the data warehouse location.

The Web removes a lot of these issues by giving users universal and relatively
inexpensive access to data.

Couple this access with the ability to deliver required information on demand and the
result is a web-enabled information delivery system that allows users dispersed across
continents to perform a sophisticated business-critical analysis and to engage in collective
decision-making.
5.3 ERP II

The traditional ERP is the main component in the e-ERP system.

But for the purposes of the collaboration, an e-ERP system is opened to inflow and
outflow of information.

The ERP applications are now web enabled meaning that the functions are available from
the internet, either on a corporate Intranet or through a corporate controlled Extranet.

E-commerce denotes the carrying out of commercial transactions be it with businesses or
with individual customers over the electronic medium, commonly the Internet.

This indeed requires an extensive infrastructure of which the main features are a
catalogue, online ordering facilities and status checking facilities. ERP system serves as
the transaction processing back end for the Internet based front end. Some of this
functionality may be built on legacy EDI modules.

E-procurement or Internet Procurement improves the efficiency of the procurement
process by automating and decentralizing the procurement process. The traditional
methods of sending Request for Quotes (RFQ) documents and obtaining invoices etc. are
carried out over the web through purchasing mechanisms such as auctions or other
electronic Marketplace functions, including catalogues.

With the introduction of computers to industry the first applications were automating
manual tasks such as book-keeping, invoicing and reordering. Gradually as the
sophistication of computers and models increased the enterprise systems begun to take a
more active role as a planning and control agent.

After the CIM (Computer Integrated Manufacturing) experiences in the eighties the role
of the enterprise systems has been redefines to a decision support tool.

Also the scope of the enterprise systems has evolved from manufacturing to the entire
enterprise (ERP).

Thus in the beginning of a potential new era for enterprise systems (ERP/II) we need to
explore the new concepts of the enterprise systems in detail.

The ERP/2 concept is a term designed by Gartner Group in 2000 (they also named the
ERP systems) as “an application and deployment strategy to integrate all things enterprise
centric”.

AMR Research defines ECM as: “Enterprise Commerce Management (ECM) is a
blueprint that enables clients to plan, manage, and maximize the critical applications,
business processes and technologies they need to support employees, customers, and
suppliers” (www.amrresearch.com/ECM).

The prime functionalities of the e-ERP systems are E-commerce, Supply Chain
Management (SCM), Customer Relationship Management (CRM), Business Intelligence
(BI), Advance Both Business Intelligence (BI) and Advance Planning and Scheduling
(APS) provide decision support.

While the BI function analyses the data collected in the corporate Data warehouse for
providing decision support, the APS system optimizes the production capacity based on
the forecasted orders, inventories and production capacity.

Supply Chain Management (SCM) systems provide information that assist in planning
and managing the production of goods.

For instance, SCM assists in answering questions such as where the good is to be
produced, from which the parts are to be procured and by when it is to be delivered.

The SCM systems build on APS for decision support.

Customer Relationship Management (CRM) systems facilitates the managing of a broad
set of functions that primarily include the categories of customer identification process
and customer service management.

The ERP system provides the CRM systems with the required back end information.

In addition to the above mentioned there are numerous new systems and emerging
concepts not touched upon here, e.g. Corporate Performance Management (CPM) and
Human Resource Management (HRM) systems Product Lifecycle Management (PLM)
including Product Data Management (PDM) enables enterprises to bring innovative and
profitable products to market more effectively, especially in the evolving e-business
environment. PLM enables enterprises to harness their innovation process through
effective management of the full product definition lifecycle in their extended enterprises.

Supplier Relationship Management (SRM) is the vendor side analogy to CRM aimed at
the management of the supplier base. SRM is rapidly transitioning from a competitive
advantage to a competitive necessity, and is an essential element for companies to
successfully compete in the evolving new world of e-business.

These new functions are provided through a combination of best-practices processes and
technologies such as product data management, collaboration, visualization, enterprise
applications integration, components supplier management, knowledge management, etc.
However we conclude that the generic functionality of e-ERP systems may be
conceptualized as illustrated in the framework in figure 1 above. What we have seen so
far is an evolution of the ERP systems explained from an information systems point of
view. To understand the development further we need to analyses the concepts of
logistics and supply chain management in order to be able to understand the driving force
behind the new business models that e-ERP enables.
4.2 Inspection and Walkthroughs
Difference between Inspections and Walkthroughs.
Inspection
Walkthrough
Formal
Informal
Initiated by the project team
Initiated by the author
Planned meeting with fixed roles assigned to all the
members involved
Unplanned.
Reader reads the product code. Everyone inspects it Author reads the product code and his team
and comes up with defects.
Recorder records the defects
Moderator has a role in making sure that the
discussions proceed on the productive lines
mate comes up with defects or suggestions
Author makes a note of defects and
suggestions offered by team mate
Informal, so there is no moderator
Walkthroughs:

Author presents their developed artifact to an audience of peers.

Peers question and comment on the artifact to identify as many defects as possible.

It involves no prior preparation by the audience. Usually involves minimal
documentation of either the process or any arising issues.

Defect tracking in walkthroughs is inconsistent.
Inspection:

Is used to verify the compliance of the product with specified standards and requirements.

It is done by examining, comparing the product with the designs, code, artefacts and any
other documentation available.

It needs proper planning and overviews are done on the planning to ensure that
inspections are held properly.

Lots of preparations are needed, meetings are held to do inspections and then on the basis
of the feedback of the inspection, rework is done.
Walkthrough:

Method of conducting informal group/individual review is called walkthrough, in which a
designer or programmer leads members of the development team and other interested
parties through a software product, and the participants ask questions and make
comments about possible errors, violation of development standards, and other problems
or may suggest improvement on the article, walkthrough can be pre planned or can be
conducted at need basis and generally people working on the work product are involved
in the walkthrough process.
o Leader: who conducts the walkthrough, handles administrative tasks, and ensures
orderly conduct
o Recorder: who notes all anomalies (potential defects), decisions, and action items
identified during the walkthrough meeting, normally generate minutes of meeting
at the end of walkthrough session.
o Author: who presents the software product in step-by-step manner at the walkthrough meeting, and is probably responsible for completing most action items.

Walkthrough Process:
o Author describes the artifact to be reviewed to reviewers during the meeting.
o Reviewers present comments, possible defects, and improvement suggestions to
the author.
o Recorder records all defect, suggestion during walkthrough meeting.
o Based on reviewer comments, author performs any necessary rework of the work
product if required.
o Recorder prepares minutes of meeting and sends the relevant stakeholders and
leader is normally to monitor overall walkthrough meeting activities as per the
defined company process or responsibilities for conducting the reviews, generally
performs monitoring activities, commitment against action items etc.

Inspection: An inspection is a formal, rigorous, in-depth group review designed to
identify problems as close to their point of origin as possible., Inspection is a recognized
industry best practice to improve the quality of a product and to improve productivity,
Inspections is a formal review and generally need is predefined at the start of the product
planning,
o Inspector Leader: The inspection leader shall be responsible for administrative
tasks pertaining to the inspection, shall be responsible for planning and
preparation, shall ensure that the inspection is conducted in an orderly manner and
meets its objectives, should be responsible for collecting inspection data
o Recorder: The recorder should record inspection data required for process
analysis. The inspection leader may be the recorder.
o Reader: The reader shall lead the inspection team through the software product in
a comprehensive and logical fashion, interpreting sections of the work product
and highlighting important aspects
o Author: The author shall be responsible for the software product meeting its
inspection entry criteria, for contributing to the inspection based on special
understanding of the software product, and for performing any rework required to
make the software product meet its inspection exit criteria.

Inspector: Inspectors shall identify and describe anomalies in the software product.
Inspectors shall be chosen to represent different viewpoints at the meeting (for example,
sponsor, requirements, design, code, safety, test, independent test, project management,
quality management, and hardware engineering).

Only those viewpoints pertinent to the inspection of the product should be present. Some
inspectors should be assigned specific review topics to ensure effective coverage.

All participants in the review are inspectors.

The author shall not act as inspection leader and should not act as reader or recorder.
Other roles may be shared among the team members.

Inspection Process: Following are review phases:
·
Planning
·
Overview
·
Preparation
·
Examination meeting
(1) Planning:
·
Inspection Leader perform following task in planning phase
·
Determine which work products need to be inspected
·
Determine if a work product that needs to be inspected is ready to be inspected
·
Identify the inspection team
·
Determine if an overview meeting is needed.
(2) Overview:
o Purpose of the overview meeting is to educate inspectors; meeting is led by
Inspector lead and is presented by author, overview is presented for the
inspection, this meeting normally acts as optional meeting, purpose to sync the
entire participant and the area to be inspected.
(3) Preparation:
o Objective of the preparation phase is to prepare for the inspection meeting by
critically reviewing the review materials and the work product, participant drill
down on the document distributed by the lead inspector and identify the defect
before the meeting.
(4) Examination meeting
o The objective of the inspection meeting is to identify final defect list in the work
product being inspected, based on the initial list of defects prepared by the
inspectors [identified at preparation phase and the new one found during the
inspection meeting.
o The Lead Auditor opens the meeting and describes the review objectives and area
to be inspected.
o Identify that all participants are well familiar with the content material, Reader
reads the meeting material and inspector finds out any inconsistence, possible
defects, and improvement suggestions to the author.
o Recorder records all the discussion during the inspection meeting, and mark
actions against the relevant stake holders.
o Lead Inspector may take decision that if there is need of follow up meeting.
o Author updates the relevant document if required on the basis of the inspection
meeting discussion.
(5) Rework and Follow-up:

Objective is to ensure that corrective action has been taken to correct problems
found during an inspection.