PAPER PRESENTATION
ON
Presented by:
RAMSWAROOP SINGH T, BRANCH: CSE
K VINAY KUMAR, BRANCH: IT

CONTACT DETAILS
ROLL NO: 05C71A0547
CONTACT NO: 9966952101
EMAIL ID: [email protected]

ROLL NO: 05C71A1218
CONTACT NO: 9885522506
EMAIL ID: [email protected]
ELLENKI COLLEGE OF ENGG. & TECH., PATEL GUDA
INDEX
ABSTRACT
INTRODUCTION
WHAT IS DATA MINING?
WHAT IS DATA WAREHOUSING?
HOW DO DATA MINING AND DATA WAREHOUSING WORK TOGETHER?
APPLICATIONS
ADVANTAGES
DISADVANTAGES
CONCLUSION
REFERENCES
ABSTRACT
We live in the age of information. Data is the most valuable resource of an enterprise.
In today’s competitive global business environment, understanding and managing enterprise-wide information is crucial for making timely decisions and responding to changing business conditions. Many companies are realizing a business advantage by leveraging one of their key assets: business data. Day-to-day operational applications generate a tremendous amount of data. In addition, valuable data is available from external sources such as market research organizations, independent surveys and quality testing labs. Studies indicate that the amount of data in a given organization doubles every 5 years.
Data warehousing has emerged as an increasingly popular and powerful concept of applying information technology to turn these huge islands of data into meaningful information for better business.
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
This paper describes the practicalities and constraints of data mining and data warehousing, and their advances over earlier technologies.
INTRODUCTION
Data Warehousing
A data warehouse can be defined as any centralized data repository which can be queried for business benefit. Warehousing makes it possible to:
o Extract archived operational data
o Overcome inconsistencies between different legacy data formats
o Integrate data throughout an enterprise, regardless of location, format, or communication requirements
o Incorporate additional or expert information
Typical data warehousing implementations in organizations will allow users to ask and answer questions such as “How many sales were made, by territory, by sales person, between the months of May and June in 1999?” Data mining will allow business decision makers to ask and answer questions such as “Who is my core customer that purchases a particular product we sell?” or “Geographically, how well would a line of products sell in a particular region, and who would purchase them, given the sale of similar products in that region?”
Data Mining
Data mining is not an “intelligence” tool or framework. Business intelligence, typically drawn from an enterprise data warehouse, is used to analyze and uncover information about past performance on an aggregate level. Data warehousing and business intelligence provide a method for users to anticipate future trends from analyzing past patterns in organizational data. Data mining is more intuitive, allowing for increased insight beyond data warehousing. An implementation of data mining in an organization will serve as a guide to uncover inherent trends and tendencies in historical information, as well as allow for statistical predictions, groupings and classification of data.
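The warehouse-style question above (sales by territory and by salesperson for May and June 1999) is, at its core, a date filter followed by a grouped aggregate. The sketch below shows that shape in Python over a few in-memory rows; a real warehouse would typically answer it with an SQL GROUP BY query. The record layout, names and amounts are hypothetical and for illustration only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows: (sale_date, territory, salesperson, amount).
# None of these names or values come from a real warehouse schema.
SALES = [
    (date(1999, 5, 3), "North", "Asha", 120.0),
    (date(1999, 5, 17), "North", "Ravi", 80.0),
    (date(1999, 6, 2), "North", "Asha", 200.0),
    (date(1999, 6, 20), "South", "Meena", 150.0),
    (date(1999, 7, 1), "South", "Meena", 90.0),  # outside May-June, so ignored
]

def sales_by_territory_and_person(rows, start, end):
    """Count sales and total revenue per (territory, salesperson) for start <= date <= end."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for sale_date, territory, person, amount in rows:
        if start <= sale_date <= end:
            cell = summary[(territory, person)]
            cell["count"] += 1
            cell["total"] += amount
    return dict(summary)

report = sales_by_territory_and_person(SALES, date(1999, 5, 1), date(1999, 6, 30))
for (territory, person), cell in sorted(report.items()):
    print(territory, person, cell["count"], cell["total"])
```

Keeping the filter and the grouping key as explicit parameters makes it easy to ask the same question for other months or other dimensions.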
WHAT IS DATA MINING?
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.
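To make “finding correlations among fields” concrete, here is a minimal sketch that computes the Pearson correlation coefficient for every pair of numeric fields in a small table and reports the strongly related pairs. The field names, values and the 0.8 threshold are hypothetical; real data mining tools work on far larger tables and use richer pattern measures than a single pairwise coefficient.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant field carries no correlation information
    return cov / (spread_x * spread_y)

def strong_correlations(table, threshold=0.8):
    """Return (field_a, field_b, r) for every field pair with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found

# Hypothetical columns: one value per customer in each field.
customers = {
    "visits": [1, 2, 3, 4, 5],
    "spend": [12.0, 19.0, 31.0, 42.0, 48.0],
    "returns": [3, 1, 4, 1, 5],
}
pairs = strong_correlations(customers)
print(pairs)  # → [('spend', 'visits', 0.994)]
```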
WHAT IS DATA WAREHOUSING?
Dramatic advances in data capture,
allowing users to access this data freely.
processing power, data transmission, and
The data analysis software is what
storage
supports data mining.
capabilities
are
enabling
organizations to integrate their various
databases into data warehouses. Data
According to Bill Inman, author
warehousing is defined as a process of
of Building the Data Warehouse and the
centralized
and
guru who is widely considered to be the
retrieval. Data warehousing, like data
originator of the data warehousing
mining, is a relatively new term although
concept,
the concept itself has been around for
characteristics that describe a data
years. Data warehousing represents an
warehouse:
data
management
there
are
generally
four
ideal vision of maintaining a central
repository of all organizational data.
Centralization of data is needed to
maximize user access and analysis.
Dramatic technological advances are
making this vision a reality for many
companies.
And,
equally
dramatic
advances in data analysis software are

Subject-oriented:
data
are
organized according to subject
instead of application e.g. an
insurance company using a data
warehouse would organize their
data by customer, premium, and
claim, instead of by different

products (auto, life, etc.). The
from the operational environment
data organized by subject contain
into the data warehouse, they
only the information necessary
assume
for decision support processing.
convention e.g. gender data is
Integrated: When data resides in
transformed to "m" and "f".
many separate applications in the
operational
encoding

environment,
of
data
is
often

a
consistent
Time-variant:
The
coding
data
warehouse contains a place for
storing data that are five to 10
inconsistent. For instance, in one
years old, or older, to be used for
application, gender might be
comparisons,
coded as "m" and "f" in another
forecasting. These data are not
by 0 and 1. When data are moved
updated.
trends,
and
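The “Integrated” characteristic can be illustrated with the gender example from the text. In the sketch below, two hypothetical source applications use different codes, and the load step converts both to the warehouse convention of "m" and "f". The text does not say which of 0 and 1 stands for which value, so the mapping used here (0 to "m", 1 to "f") is purely an assumption, as are the application and field names.

```python
# ASSUMPTION: the source text does not state which numeric code means which
# gender; this mapping is hypothetical and exists only for illustration.
NUMERIC_TO_LETTER = {0: "m", 1: "f"}

def normalize_gender(value):
    """Convert a source-system gender code to the warehouse convention "m"/"f"."""
    if isinstance(value, str):
        letter = value.strip().lower()
        if letter in ("m", "f"):
            return letter
    elif isinstance(value, int) and value in NUMERIC_TO_LETTER:
        return NUMERIC_TO_LETTER[value]
    raise ValueError(f"unrecognized gender code: {value!r}")

def load_into_warehouse(sources):
    """Merge rows from several named applications into one consistently coded list."""
    warehouse = []
    for source_name, rows in sources.items():
        for row in rows:
            warehouse.append({
                "customer_id": row["customer_id"],
                "gender": normalize_gender(row["gender"]),
                "source": source_name,
            })
    return warehouse

# Hypothetical operational data: app_a uses letters, app_b uses 0/1.
rows = load_into_warehouse({
    "app_a": [{"customer_id": 101, "gender": "M"}],
    "app_b": [{"customer_id": 202, "gender": 1}],
})
print(rows)
```

Rejecting unknown codes with an error, rather than guessing, keeps inconsistent source data from silently entering the warehouse.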
An Overview of Data Mining Techniques
This overview provides a description of some of the most common data mining algorithms in use today. We have broken the discussion into two sections, each with a specific theme:
1) Classical Techniques such as statistics, neighborhoods and clustering, and
2) Next Generation Techniques such as trees, networks and rules.
Each section will describe a number of data mining algorithms at a high level, focusing on the "big picture" so that the reader will be able to understand how each algorithm fits into the landscape of data mining techniques.
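As an example of the classical clustering techniques mentioned above, the sketch below implements plain k-means on two-dimensional points: each point is assigned to its nearest centroid, and each centroid then moves to the mean of its members until nothing changes. The customer data, the choice of k = 2 and the naive initialization (the first k points become the starting centroids) are illustrative assumptions; practical tools use better initialization and many more features.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeatedly assigning each point to its
    nearest centroid and moving each centroid to the mean of its members."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers as (visits per month, average spend in hundreds).
customers = [(1, 2), (1.5, 1.8), (1, 1), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)
```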
HOW DO DATA MINING AND DATA WAREHOUSING WORK TOGETHER?
Extracting meaningful information from numerous databases and cross-referencing it to find patterns, trends and correlations that might otherwise be overlooked is called "data mining." Assembling the information in one place is called "data warehousing."
• All the information is stored in information repositories.
• The data warehouse takes in the cleaned and integrated data.
• The data taken in by the data warehouse is selected and transformed, and the useful data is passed on to data mining.
• The results of data mining are evaluated and presented.
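The four steps listed above can be sketched as a chain of small functions: clean and integrate the repositories into a warehouse, select and transform the attributes needed, mine them, then evaluate and present the result. Everything here is hypothetical: the two repositories, the field names, the choice of “count purchases per region and product” as the mining step, and the minimum-count rule used for evaluation.

```python
from collections import Counter

# Two hypothetical information repositories with inconsistent, partly incomplete rows.
STORE_DB = [
    {"cust": "C1", "region": "north ", "product": "Tea"},
    {"cust": "C2", "region": "North", "product": "tea"},
    {"cust": "C3", "region": "South", "product": "Coffee"},
    {"cust": None, "region": "South", "product": "Coffee"},  # incomplete, dropped
]
WEB_DB = [
    {"cust": "C4", "region": "NORTH", "product": "Tea"},
    {"cust": "C5", "region": "south", "product": "Tea"},
]

def clean_and_integrate(*repositories):
    """Warehouse step: drop incomplete rows and normalize text so sources agree."""
    merged = []
    for repo in repositories:
        for row in repo:
            if all(row.get(key) for key in ("cust", "region", "product")):
                merged.append({"cust": row["cust"],
                               "region": row["region"].strip().lower(),
                               "product": row["product"].strip().lower()})
    return merged

def select_and_transform(warehouse):
    """Keep only the attributes the mining task needs, as (region, product) pairs."""
    return [(r["region"], r["product"]) for r in warehouse]

def mine(pairs):
    """Mining step: how often each product is bought in each region."""
    return Counter(pairs)

def evaluate_and_present(counts, min_count=2):
    """Keep patterns seen at least min_count times and format them for a report."""
    return [f"{region}: {product} ({n} purchases)"
            for (region, product), n in counts.most_common() if n >= min_count]

warehouse = clean_and_integrate(STORE_DB, WEB_DB)
report = evaluate_and_present(mine(select_and_transform(warehouse)))
print(report)  # → ['north: tea (3 purchases)']
```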
APPLICATIONS
Data Warehousing
• Retrieve data - from a variety of heterogeneous operational databases
  o Data is transformed and delivered to the data warehouse/store based on a selected model (or mapping definition)
  o Metadata - information describing the model and definition of the source data elements
• Insulate data - i.e. the current operational information
  o Preserves the security and integrity of mission-critical OLTP applications
  o Gives access to the broadest possible base of data
• Data cleansing - removal of certain aspects of operational data, such as low-level transaction information, which slow down query times
• Transfer - processed data transferred to the data warehouse, a large database on a high-performance box
Data Mining
• Medicine - drug side effects, hospital cost analysis, genetic sequence analysis, prediction etc.
• Finance - stock market prediction, credit assessment, fraud detection etc.
• Marketing/sales - product analysis, buying patterns, sales prediction, target mailing, identifying 'unusual behavior' etc.
• Knowledge Acquisition
• Scientific discovery - superconductivity research, etc.
• Engineering - automotive diagnostic expert systems, fault detection etc.
ADVANTAGES:
• Enhances end-user access to a wide variety of data.
• Business decision makers can obtain various kinds of trend reports, e.g. the item with the most sales in a particular area / country for the last two years.
A data warehouse can be a significant enabler of commercial business applications, most notably Customer Relationship Management (CRM).
DISADVANTAGES:
Data mining systems rely on databases to supply the raw data for input, and this raises problems in that databases tend to be dynamic, incomplete, noisy, and large. Other problems arise as a result of the adequacy and relevance of the information stored.
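Noisy values, and the fraud-detection and “identifying unusual behavior” applications listed earlier, both come down to noticing records that sit far from the rest. One simple heuristic (not a method this paper prescribes) is the z-score: how many standard deviations a value lies from the mean. The spending figures and the threshold of 2.0 below are hypothetical.

```python
from statistics import mean, pstdev

def flag_unusual(values, threshold=2.0):
    """Return the indexes of values whose z-score magnitude exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical daily card spending for one customer; the last day looks unusual.
daily_spend = [10, 11, 9, 10, 12, 10, 11, 100]
print(flag_unusual(daily_spend))  # → [7]
```

With few values, a single extreme point also inflates the standard deviation, so the threshold has to be chosen with the sample size in mind.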
Limited Information
A database is often designed for purposes different from data mining, and sometimes the properties or attributes that would simplify the learning task are not present, nor can they be requested from the real world. Inconclusive data causes problems because if some attributes essential to knowledge about the application domain are not present in the data, it may be impossible to discover significant knowledge about a given domain. For example, one cannot diagnose malaria from a patient database if that database does not contain the red blood cell count of the patients.
Missing data can be treated by discovery systems in a number of ways, such as:
• Simply disregard missing values
• Omit the corresponding records
• Infer missing values from known values
• Treat missing data as a special value to be included additionally in the attribute domain
• Or average over the missing values using Bayesian techniques.
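The first four strategies in the list above can be sketched on a toy patient table in which one red blood cell count is missing. The table, its field names and values are hypothetical; filling gaps with the mean of the known values is only one simple way to infer missing values, and the Bayesian averaging approach is not shown.

```python
from statistics import mean

# Hypothetical patient records; None marks a missing red blood cell count.
patients = [
    {"id": 1, "rbc_count": 4.5},
    {"id": 2, "rbc_count": None},
    {"id": 3, "rbc_count": 5.1},
]

def disregard_missing(rows, attr):
    """Use only the known values of attr; the records themselves are kept."""
    return [r[attr] for r in rows if r.get(attr) is not None]

def omit_records(rows, attr):
    """Drop every record whose value for attr is missing."""
    return [r for r in rows if r.get(attr) is not None]

def infer_from_known(rows, attr):
    """Fill missing values of attr with the mean of the known values (new dicts)."""
    fill = mean(disregard_missing(rows, attr))
    return [{**r, attr: fill if r.get(attr) is None else r[attr]} for r in rows]

def as_special_value(rows, attr, token="UNKNOWN"):
    """Replace missing values with an extra symbol added to the attribute domain."""
    return [{**r, attr: token if r.get(attr) is None else r[attr]} for r in rows]

print(mean(disregard_missing(patients, "rbc_count")))
print(len(omit_records(patients, "rbc_count")))
print(infer_from_known(patients, "rbc_count")[1])
print(as_special_value(patients, "rbc_count")[1])
```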
FUTURE VIEWS
The future of data mining lies in predictive analytics. The technology innovations in data mining since 2000 have been truly Darwinian and show promise of consolidating and stabilizing around predictive analytics. Variations, novelties and new candidate features have been expressed in a proliferation of small start-ups that have been ruthlessly culled from the herd by a perfect storm of bad economic news. Nevertheless, the emerging market for predictive analytics has been sustained by professional services, service bureaus (rent a recommendation) and profitable applications in verticals such as retail, consumer finance, telecommunications, travel and leisure, and related analytic applications. Predictive analytics has successfully proliferated into applications to support customer recommendations, customer value and churn management, campaign optimization, and fraud detection. On the product side, success stories in demand planning, just-in-time inventory and market basket optimization are a staple of predictive analytics.
Predictive analytics should be used to get to know the customer, segment and predict customer behavior, and forecast product demand and related market dynamics. Be realistic about the required complex mixture of business acumen, statistical processing and information technology support, as well as the fragility of the resulting predictive model, but make no assumptions about the limits of predictive analytics. Breakthroughs often occur in the application of the tools and methods to new commercial opportunities.
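Market basket optimization, named above as a staple of predictive analytics, usually begins by measuring how often items are bought together. The sketch below computes support (the share of baskets containing both items) and confidence (how often the second item appears given the first) for item pairs, the basic quantities behind association rules such as “customers who buy bread also buy milk.” The baskets and the 0.5 minimum support are hypothetical, and real systems use algorithms such as Apriori or FP-growth to scale beyond pairs.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))  # rule a -> b
            rules.append((b, a, support, count / item_counts[b]))  # rule b -> a
    return sorted(rules)

# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "tea"],
]
for antecedent, consequent, support, confidence in pair_rules(baskets):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```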
CONCLUSION:
Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users’ ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute-force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems. Data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.
• Data mining has a lot of potential.
• Diversity in the field of application.
• Estimated market for data mining is $500 million.
REFERENCES:
1. Books Referred:
a. Jiawei Han, Data Mining: Concepts and Techniques.
b. Arun K. Pujari, Data Mining Techniques.
c. Efrem G. Mallach, Decision Support and Data Warehouse Systems.
2. Internet Sites Availed:
a. www.kluweronline.nl
b. www.internet2.com
c. www.the-datamine.com