DATA MINING AND DATA WAREHOUSING
PRESENTED BY:
R.V. Ravi Kiran
Computer science engineering
Email id:[email protected]
Ph no:9440966469
P. Nagesh
Computer science engineering (3/4 B.tech)
Email id:[email protected]
Ph no: 9701706236
S.k.chaitanya
Computer science engineering (3/4 B.tech)
Email id:[email protected]
Ph no: 9491893712
Gayathri Vidya Parishad College Of Engineering
Visakhapatnam.
ABSTRACT:
Data Mining is a concept that is taking off in the commercial sector as a means of finding useful information out of gigabytes of data. While products for the commercial environment are starting to become available, tools for a scientific environment are much rarer (or even non-existent). Yet scientists have long had to search through reams of printouts and rooms full of tapes to find the gems that make up scientific discovery.
This paper will explore some of the ad hoc methods generally used for Data Mining in the scientific community, including such things as scientific visualization, and outline how some of the more recently developed products used in the commercial environment can be adapted to scientific Data Mining.
A Data Warehouse is a repository of data gathered from multiple sources and stored under a unified schema at a single site. In this paper, we will discuss Data Warehouse design using star and snowflake schemas. The Star schema is the one most frequently used, as it has more advantages over the other schemas. Snowflake schemas normalize dimensions to eliminate redundancy.
Both Data Mining and Data Warehousing are important in the present competitive market world, with applications such as Customer Retention, Marketing, Risk Assessment, Fraud Detection and others.
FIG.DATA ANALYSIS
INTRODUCTION
In today’s fiercely competitive marketplace, companies have an insatiable need for information. Customer data, financial data and Internet click-stream data are a powerful asset, provided they can be integrated and utilized to enhance customer experiences. The ability to access meaningful data, and to move and share data throughout an organization between departments, officers and business partners in a timely, efficient manner through the use of familiar query and analytical tools, is critical.
With the current trends in centralization of an organization’s data in large databases, particularly in a commercial environment, the process of extracting useful information has become more formalized and the term Data Mining has been coined for it. In one of the first papers on commercial Data Mining, Evangelos Simoudis of IBM defined it as:
“The process of extracting previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions”
This definition has a definite business flavor and much of IBM's development of Data Mining has been in this direction. In practice, Data Mining is a process which can take on different approaches depending on the type of data involved and the objectives desired. As this is still very much an evolving discipline, much work is being undertaken to determine standard processes for the varied environments. Further, as the context in which the data is gathered is often an important component, this must be factored into any analysis.
FIG. HOW DATA IS SHARED
DEF: A Database is a collection of non-redundant data which is sharable between
different applications.
WHAT IS DATAMINING?
FIG.DATAMINING PROCESS
Data Mining is defined as “the
non-trivial extraction of implicit, previously
unknown, potentially useful and understandable
knowledge from data”. Data Mining is the
process of finding correlations or patterns
among dozens of fields in large relational
databases.
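As a rough illustration of "finding correlations among fields", the sketch below scans every pair of numeric fields in a small table and reports the strongly correlated ones. The table contents, the field names and the 0.8 threshold are made-up assumptions for the example, and Pearson correlation is only one of many possible pattern measures.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlated_fields(table, threshold=0.8):
    """Return (field_a, field_b, r) for every field pair with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical customer records stored column-wise (illustrative values only).
customers = {
    "age": [23, 35, 47, 52, 61],
    "income": [21, 39, 55, 60, 72],
    "monthly_calls": [40, 12, 33, 8, 25],
}

pairs = correlated_fields(customers)
print(pairs)
```

With dozens of fields the same loop simply examines more pairs; a real tool would also need to handle missing values and non-numeric fields.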
LATEST TRENDS IN TECHNOLOGIES AND METHODS:
There are a number of Data Mining trends, in terms of technologies and methodologies, which are currently being developed and researched. The trends identified include:
SPATIAL AND GEOGRAPHIC DATA MINING:
“The extraction of implicit knowledge, spatial relationships or other patterns not explicitly stored in spatial databases” is known as spatial Data Mining. Its applications include remote sensing, medicine, navigation, and related uses.
DISTRIBUTED/COLLECTIVE DATAMINING:
FIG.DISTRIBUTED DATAMINING
Mining information that is located in different places, in different physical locations, is generally known as distributed Data Mining. Distributed Data Mining (DDM) offers a different approach to traditional analysis, by using a combination of localized data analysis together with a “global data model”.
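The combination of localized analysis and a "global data model" in distributed Data Mining can be sketched as follows. This is a minimal sketch under assumptions: the site names, the values and the choice of mean and variance as the "model" are hypothetical, chosen only to show that each site can ship a small summary instead of its raw data.

```python
from dataclasses import dataclass


@dataclass
class LocalSummary:
    count: int
    total: float
    total_sq: float


def analyze_locally(values):
    """Runs at each site: reduce the raw records to a small summary."""
    return LocalSummary(len(values), sum(values), sum(v * v for v in values))


def build_global_model(summaries):
    """Runs at the coordinator: merge site summaries into a global mean and variance."""
    n = sum(s.count for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Population variance from the merged sums (simple, but not the most
    # numerically stable formula for very large values).
    variance = total_sq / n - mean ** 2
    return {"count": n, "mean": mean, "variance": variance}


# Hypothetical measurements held at three physical locations.
sites = {
    "site_a": [10.0, 12.0, 14.0],
    "site_b": [20.0, 22.0],
    "site_c": [16.0],
}

summaries = [analyze_locally(values) for values in sites.values()]
global_model = build_global_model(summaries)
print(global_model)
```

Only three numbers per site cross the network here, yet the merged result equals what a single central analysis of all six values would give.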
FIG.SEQUENTIAL DATAMINING
HYPERTEXT & HYPERMEDIA DATAMINING:
Hypertext and Hypermedia Data Mining
can be characterized as mining data which
includes text, hyperlinks and text markups.
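A first step in mining such data is separating the three ingredients named above: text, hyperlinks and markup. The sketch below does this with the standard-library `html.parser` module; the class name `HypertextExtractor` and the sample page are assumptions made for the illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class HypertextExtractor(HTMLParser):
    """Collects words, link targets and tag counts from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.tags = Counter()
        self.words = []

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.words.extend(data.split())


# Hypothetical page used only for this example.
page = (
    "<html><body><h1>Data Mining</h1>"
    '<p>See <a href="warehouse.html">Data Warehousing</a> '
    'and <a href="trends.html">trends</a>.</p></body></html>'
)

extractor = HypertextExtractor()
extractor.feed(page)
extractor.close()
print(extractor.links, dict(extractor.tags), extractor.words)
```

The extracted link list could then feed a link-structure analysis, and the word list a text-mining step.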
FIG.SPATIAL DATAMINING
TIME SERIES/SEQUENCE DATAMINING:
Another important area in Data Mining
centers on the mining of time series and
sequence-based data. This involves the mining
of sequences of data. Sequential pattern mining
focuses on the identification of frequently occurring sequences.
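To make sequence mining concrete, the sketch below counts in how many sequences an item `a` occurs somewhere before an item `b`, and keeps the ordered pairs that reach a minimum support. It is a deliberately simplified stand-in for full sequential pattern mining algorithms; the click-stream sessions and the support threshold are made-up assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Return ordered pairs (a, b), with a occurring before b, found in at least
    min_support sequences, mapped to the number of supporting sequences."""
    support = Counter()
    for seq in sequences:
        seen = set()  # count each pair at most once per sequence
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] != seq[j]:
                seen.add((seq[i], seq[j]))
        support.update(seen)
    return {pair: n for pair, n in support.items() if n >= min_support}


# Hypothetical page-visit sessions, each ordered by time.
sessions = [
    ["home", "search", "product", "checkout"],
    ["home", "product", "checkout"],
    ["search", "product", "home"],
    ["home", "search", "checkout"],
]

patterns = frequent_ordered_pairs(sessions, min_support=3)
print(patterns)
```

Longer patterns can be grown from the frequent pairs, which is the general idea behind level-wise sequential pattern miners.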
FIG.DATAMINING
PHENOMENAL DATAMINING:
Phenomenal Data Mining focuses on the relationships between data and the phenomena which are inferred from the data. This has not worked well in data warehouse projects.
APPLICATIONS OF DATAMINING:
Data Mining draws on data that has been collected, stored and organized, for use in areas such as:
• Data Mining and customer relationship management (CRM) software for solving business decision problems
• Privacy of data in Insurance companies and Government agencies
• Fraud detection in Telecommunications and stock exchanges
• Medical diagnosis to detect abnormal patterns
• Airline reservation to maximize seat utilization
FIG.APPLICATIONS OF DATAMINING
WHAT IS DATA WAREHOUSING?
A Data Warehouse is a single, complete and consistent store of data obtained from a variety of different sources and made available to end users in a form they can understand and use in a business context.
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.
FIG.DATA WAREHOUSE
A Data Warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It contains historical data derived from transaction data.
Data Warehouse characteristics:
• Subject oriented
• Integrated
• Non-volatile
• Time-variant
FIG.DATAWAREHOUSING
SUBJECT ORIENTED:
The data in the warehouse is defined in business terms and is grouped under business-oriented subject headings such as customers, products, sales analysis reports and marketing campaigns. This is achieved through data modeling.
NON-VOLATILE:
Once loaded into the Data Warehouse, the data is not updated. It acts as a stable resource for consistent reporting and comparative analysis.
TIME-VARIANT:
All data in the Data Warehouse is time stamped at the time of entry into the warehouse, or when it is summarized within the warehouse, to act as a chronological record and to provide historical and trend analysis possibilities.
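The non-volatile and time-variant characteristics can be pictured as an append-only store in which every row carries a load timestamp. The sketch below is an illustration under assumptions, not a warehouse implementation: the class and field names (`AppendOnlyStore`, `WarehouseRow`, `loaded_at`) and the sample balances are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified afterwards
class WarehouseRow:
    customer_id: int
    balance: float
    loaded_at: datetime


class AppendOnlyStore:
    """Rows are only ever appended, never updated or deleted (non-volatile)."""

    def __init__(self):
        self._rows = []

    def load(self, customer_id, balance, loaded_at):
        self._rows.append(WarehouseRow(customer_id, balance, loaded_at))

    def history(self, customer_id):
        """All versions of a customer's row in chronological order (time-variant)."""
        rows = [r for r in self._rows if r.customer_id == customer_id]
        return sorted(rows, key=lambda r: r.loaded_at)

    def as_of(self, customer_id, moment):
        """The row that was current at the given moment, or None."""
        earlier = [r for r in self.history(customer_id) if r.loaded_at <= moment]
        return earlier[-1] if earlier else None


store = AppendOnlyStore()
store.load(1, 100.0, datetime(2024, 1, 1))
store.load(1, 250.0, datetime(2024, 2, 1))
store.load(2, 75.0, datetime(2024, 1, 15))

print(store.as_of(1, datetime(2024, 1, 20)))
```

Because a changed balance arrives as a new time-stamped row rather than an update, earlier reports can be reproduced and trends over time remain visible.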
INTEGRATED:
Data Warehouses must put data from disparate sources into a consistent format. They must resolve problems such as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
FIG.PROCESS OF DATA WAREHOUSING
ARCHITECTURE OF DATA WAREHOUSE:
Three common architectures in data warehouses are:
• Data Warehouse Architecture (Basic)
• Data Warehouse Architecture (with a Staging Area)
• Data Warehouse Architecture (with a Staging Area and Data Marts)
FIG.DATAWAREHOUSE ARCHITECTURE
DATA WAREHOUSE ARCHITECTURE (BASIC):
The metadata and raw data of a traditional online transaction processing (OLTP) system are present, as is an additional type of data, summary data. A summary in Oracle is called a materialized view.
DATA WAREHOUSE ARCHITECTURE WITH A STAGING AREA:
FIG.DATAWAREHOUSE WITH STAGING
Operational data needs to be cleaned and processed before it is put into the warehouse, and most data warehouses use a staging area for this purpose. A staging area simplifies building summaries and general warehouse management.
DATA WAREHOUSE ARCHITECTURE WITH STAGING AREA & DATA MARTS:
FIG.DATA WAREHOUSE WITH STAGING AREA & DATA MARTS
We may want to customize our warehouse's architecture for different groups within our organization. We can do this by adding data marts, which are systems designed for a particular line of business.
PROCESSES WITHIN A DATA WAREHOUSE:
• Extract and load the data
• Clean and transform data into a form that can cope with large data volumes and provide good query performance
• Backup and archive data
• Manage queries, and direct them to the appropriate data sources
SCHEMAS IN DATA WAREHOUSE:
A schema is a collection of database objects, including tables, views, indexes, and synonyms. Commonly used schemas are the Star schema and the Snowflake schema.
STAR SCHEMA:
The star schema is the simplest schema. The entity-relationship diagram of this schema resembles a star. The center of the star consists of a large fact table and the points of the star are the dimension tables. A Star schema is characterized by one or more fact tables and dimension tables. A star join is a primary key to foreign key join of the dimension tables to a fact table.
The main advantages of star schemas are:
• They provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
• They are widely supported by a large number of business intelligence tools.
SNOWFLAKE SCHEMA:
The Snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. The diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy.
CONCLUSION:
Data Mining is a new term and formalism for a process that has been undertaken by scientists for generations. The massive increase in the volume of data collected or generated for analysis with the use of computers has made it an essential tool. However, despite the more formal approach, Data Mining is something that scientists perform on an ad hoc basis and can easily adapt to. Many of the methods used for the analysis of the data were originally developed to process scientific data and are used unchanged.
A Data Warehouse usually contains historical data derived from transaction data, but it can include data from other sources. The determination of which schema model should be used for a Data Warehouse should be based upon the requirements and preferences of the Data Warehouse project team. Star schemas are widely supported by a large number of business intelligence tools, whereas Snowflake schemas normalize dimensions to eliminate redundancy.
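To make the contrast between the two schema styles concrete, the sketch below builds a tiny star schema and a snowflake variant of its product dimension in an in-memory SQLite database, then runs the same question against both. All table names, column names and values are hypothetical and chosen only for the illustration; SQLite stands in for whatever relational database a warehouse project would actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star schema: a fact table surrounded by denormalized dimension tables.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'pen', 'stationery'), (2, 'ink', 'stationery'), (3, 'lamp', 'lighting');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 5.0), (2, 11, 7.0), (3, 11, 20.0), (1, 11, 3.0);

    -- Snowflake variant: the product dimension is normalized, so each
    -- category name is stored only once.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_product_sf (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    INSERT INTO dim_category VALUES (100, 'stationery'), (200, 'lighting');
    INSERT INTO dim_product_sf VALUES (1, 'pen', 100), (2, 'ink', 100), (3, 'lamp', 200);
""")

# Star join: the fact table joined to each dimension on primary key = foreign key.
star_join = """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_date d ON f.date_id = d.date_id
    WHERE d.year = 2024
    GROUP BY p.category ORDER BY p.category
"""

# The snowflake query needs one extra join to reach the category name.
snowflake_join = """
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_sf p ON f.product_id = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    JOIN dim_date d ON f.date_id = d.date_id
    WHERE d.year = 2024
    GROUP BY c.category ORDER BY c.category
"""

star_result = conn.execute(star_join).fetchall()
snowflake_result = conn.execute(snowflake_join).fetchall()
print(star_result, snowflake_result)
```

Both queries return the same totals. The star version repeats the category text in every product row but needs fewer joins, while the snowflake version removes that redundancy at the cost of an additional join.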
As a final point, the biggest data collection of all, the Internet, is becoming more and more important, and while it holds useful information, extracting that from the terabytes being added daily is an enormous task. The techniques of Data Mining are applicable here more than in any other domain. However, making use of them takes time, effort and, above all, people with a knowledge of the field, to differentiate the true solutions from the infeasible ones.