Download File - Mr. Stives Classroom Web Page

Why use data mining?
Two main reasons to use data mining:
1. Too much data and too little information
2. There is a need to extract useful information from the data and to interpret the data
Faced with enormous volumes of data, human analysts with no special tools can no longer make
sense of it. Data mining, however, can automate the process of finding relationships and patterns
in raw data, and the results can either be used in an automated decision support system or
assessed by a human analyst. This is why data mining is used, especially in science and business,
where large amounts of data must be analyzed to discover trends that could not otherwise be
found.
If we know how to reveal valuable knowledge hidden in raw data, data might be one of our most
valuable assets. Data mining is the tool for extracting diamonds of knowledge from your
historical data and predicting the outcomes of future situations.
What can data mining do?
Data mining is primarily used today by companies with a strong consumer focus - retail,
financial, communication, and marketing organizations. It enables these companies to determine
relationships among "internal" factors such as price, product positioning, or staff skills, and
"external" factors such as economic indicators, competition, and customer demographics. It also
enables them to determine the impact on sales, customer satisfaction, and corporate profits.
Finally, it enables them to "drill down" into summary information to view detailed transactional
data.
With data mining, a retailer could use point-of-sale records of customer purchases to send
targeted promotions based on an individual's purchase history. By mining demographic data
from comment or warranty cards, the retailer could develop products and promotions to appeal to
specific customer segments.
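The purchase-history idea above can be illustrated with a minimal Python sketch. The records, the customer IDs, and the `favorite_categories` helper are all hypothetical; a real retailer would use far richer data and methods.

```python
from collections import Counter, defaultdict

# Hypothetical point-of-sale records: (customer_id, product_category).
SALES = [
    ("c1", "garden"),
    ("c1", "garden"),
    ("c1", "toys"),
    ("c2", "electronics"),
    ("c2", "toys"),
    ("c2", "electronics"),
]


def favorite_categories(sales):
    """Return each customer's most frequently purchased category."""
    counts = defaultdict(Counter)
    for customer_id, category in sales:
        counts[customer_id][category] += 1
    # most_common(1) yields a one-element list of (category, count).
    return {cid: c.most_common(1)[0][0] for cid, c in counts.items()}


if __name__ == "__main__":
    for customer_id, category in favorite_categories(SALES).items():
        print(f"Send {customer_id} a promotion for {category}")
```

Counting purchases per category is only one simple way to pick a promotion; it is used here because it is easy to follow.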
For example, Blockbuster Entertainment mines its video rental history database to recommend
rentals to individual customers. American Express can suggest products to its cardholders based
on analysis of their monthly expenditures.
Wal-Mart is pioneering massive data mining to transform its supplier relationships. Wal-Mart
captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously
transmits this data to its massive 7.5 terabyte Teradata data warehouse. Wal-Mart allows more
than 3,500 suppliers to access data on their products and perform data analyses. These suppliers
use this data to identify customer buying patterns at the store display level. They use this
information to manage local store inventory and identify new merchandising opportunities. In
1995, Wal-Mart computers processed over 1 million complex data queries.
The National Basketball Association (NBA) is exploring a data mining application that can be
used in conjunction with image recordings of basketball games. The Advanced Scout software
analyzes the movements of players to help coaches orchestrate plays and strategies. For example,
an analysis of the play-by-play sheet of the game played between the New York Knicks and the
Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position,
John Williams attempted four jump shots and made each one! Advanced Scout not only finds
this pattern, but explains that it is interesting because it differs considerably from the average
shooting percentage of 49.30% for the Cavaliers during that game.
By using the NBA universal clock, a coach can automatically bring up the video clips showing
each of the jump shots attempted by Williams with Price on the floor, without needing to comb
through hours of video footage. Those clips show a very successful pick-and-roll play in which
Price draws the Knicks' defense and then finds Williams for an open jump shot.
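The kind of comparison described for Advanced Scout, a subset of shots measured against the team average, might be sketched in Python as follows. This is not the actual Advanced Scout algorithm, which the article does not describe. The `min_gap` threshold and both function names are assumptions made for illustration; the four made shots and the 49.30% average come from the article.

```python
def shooting_pct(shots):
    """Fraction of attempts made; shots is a list of booleans."""
    return sum(shots) / len(shots)


def is_interesting(subset_shots, baseline_pct, min_gap=0.25):
    """Flag a subset whose percentage differs from the baseline by at least min_gap.

    min_gap is an assumed cutoff, not a value from the article.
    """
    return abs(shooting_pct(subset_shots) - baseline_pct) >= min_gap


# From the article: Williams attempted four jump shots and made each one.
williams_with_price = [True, True, True, True]
TEAM_BASELINE = 0.4930  # Cavaliers' average shooting percentage for that game

if __name__ == "__main__":
    pct = shooting_pct(williams_with_price)
    flagged = is_interesting(williams_with_price, TEAM_BASELINE)
    print(f"{pct:.2%} vs. baseline {TEAM_BASELINE:.2%}, interesting: {flagged}")
```

A real tool would also need to account for how few shots are in the subset, since four attempts is a very small sample.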
Data Mining: Issues
Issues Presented
One of the key issues raised by data mining technology is not a business or technological one,
but a social one. It is the issue of individual privacy. Data mining makes it possible to analyze
routine business transactions and glean a significant amount of information about individuals'
buying habits and preferences.
Another issue is that of data integrity. Clearly, data analysis can only be as good as the data that
is being analyzed. A key implementation challenge is integrating conflicting or redundant data
from different sources. For example, a bank may maintain credit card accounts on several
different databases. The addresses (or even the names) of a single cardholder may be different in
each. Software must translate data from one system to another and select the address most
recently entered.
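Selecting the most recently entered address could look something like the Python sketch below. The record layout, database names, addresses, and dates are all hypothetical and are there only to show the idea.

```python
from datetime import date

# Hypothetical records for one cardholder, held in separate databases.
RECORDS = [
    {"source": "db_a", "address": "12 Oak St", "entered": date(1994, 3, 1)},
    {"source": "db_b", "address": "98 Elm Ave", "entered": date(1995, 7, 15)},
    {"source": "db_c", "address": "12 Oak Street", "entered": date(1993, 11, 30)},
]


def most_recent_address(records):
    """Return the address from the record with the latest entry date."""
    latest = max(records, key=lambda r: r["entered"])
    return latest["address"]


if __name__ == "__main__":
    print(most_recent_address(RECORDS))
```

This assumes every database stores the date each address was entered; without that, the software would need some other rule for choosing among conflicting records.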
A hotly debated technical issue is whether it is better to set up a relational database structure or a
multidimensional one. In a relational structure, data is stored in tables, permitting ad hoc queries.
In a multidimensional structure, on the other hand, sets of cubes are arranged in arrays, with
subsets created according to category. While multidimensional structures facilitate
multidimensional data mining, relational structures thus far have performed better in client/server
environments. And, with the explosion of the Internet, the world is becoming one big
client/server environment.
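The difference between the two structures can be loosely illustrated in Python. The rows, column names, and both helper functions below are hypothetical, and plain lists and dictionaries stand in for a real database. The relational side filters rows on demand, while the multidimensional side totals sales in advance into cells keyed by category.

```python
from collections import defaultdict

# Relational style: one row per transaction (hypothetical data).
ROWS = [
    {"region": "east", "product": "video", "quarter": "Q1", "sales": 100},
    {"region": "east", "product": "video", "quarter": "Q2", "sales": 150},
    {"region": "west", "product": "video", "quarter": "Q1", "sales": 80},
    {"region": "west", "product": "games", "quarter": "Q1", "sales": 60},
]


def ad_hoc_query(rows, **conditions):
    """Relational style: return rows matching every column=value condition."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


def build_cube(rows, dims):
    """Multidimensional style: total sales per cell, keyed by the chosen dimensions."""
    cube = defaultdict(int)
    for r in rows:
        cube[tuple(r[d] for d in dims)] += r["sales"]
    return dict(cube)


if __name__ == "__main__":
    print(ad_hoc_query(ROWS, region="east"))
    print(build_cube(ROWS, ("region", "product")))
```

The table answers any question that can be phrased as a filter, while the cube answers only the questions it was built for, but does so with a single lookup.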
Finally, there is the issue of cost. While system hardware costs have dropped dramatically within
the past five years, data mining and data warehousing tend to be self-reinforcing. The more
powerful the data mining queries, the greater the utility of the information being gleaned from
the data, and the greater the pressure to increase the amount of data being collected and
maintained, which increases the pressure for faster, more powerful data mining queries. This
increases pressure for larger, faster systems, which are more expensive.
The immediate future
The complexity of data mining must be hidden from end-users before it can truly take center
stage in an organization. Business use cases can be designed, with tight constraints, around data
mining algorithms.
Directions: Open up Word and answer the following questions in your own words, based on the
article you have just read.
Save as: Why use Data Mining
Type the question and the answer. The font will be Times New Roman, size 12.
At the top of the page, make sure that you type YOUR NAME C#P_
The # is for the computer you sit at. The _ is for the period you are in.
When you finish, upload the Word document to Mr. Stives.
Questions
1. What are the two main reasons for Data Mining?
2. The companies that primarily use Data mining today have a strong consumer
focus in what?
3. Why does Blockbuster Entertainment use Data Mining?
4. Why does Wal-Mart use Data Mining?
5. Why is the NBA thinking about using Data Mining? What would the NBA
use it for?
6. What are the two key issues raised by data mining technology?
7. What is a relational database structure?
8. What is a multidimensional structure?
9. What is the issue with cost?
Save as: Why use Data Mining
Don’t Forget to Upload to Mr. Stives