Data Mining
By Jamia Yant
June 1st, 2012
Predictive Analytics and Customer Behavior
“Predictive analysis is the decision science that removes guesswork out of the decision-making
process and applies proven scientific guidelines to find right solution in the shortest time
possible.” (Kaith, 2011) There are seven steps to predictive analytics: spot the business
problem; explore various data sources; extract patterns from the data; build a sample model using
the data and the problem; clarify the data (find valuable factors and generate new variables);
construct a predictive model using sampling; and validate and deploy the model. By following
these steps, businesses can make fast decisions using vast amounts of data. There are three main
benefits of predictive analytics: minimizing risk, identifying fraud, and pursuing new sources of
revenue. Predicting the risks involved in loan and credit origination, detecting fraudulent
insurance claims, and forecasting the response to promotional offers and coupons are all examples
of these benefits. In short, predictive analytics reduces the cost of making mistakes. A predictive
model lets businesses test situations and scenarios that could take years to test in the real
world. Studying customer behavior gives businesses a competitive advantage and allows them to
stay ahead of the competition in their marketplace.
Associations Discovery and Customer Purchases
Association analysis is useful for discovering interesting relationships hidden in large
amounts of data. There are two things to remember when using association analysis with regard
to market data: discovering patterns from a large transaction data set can be computationally
expensive and some of the discovered patterns are potentially spurious because they may happen
simply by chance.
Association discovery finds rules about items that appear together in an event such as a
purchase transaction. Market-basket analysis is a well-known example of association discovery.
This technique underlies recommendation engines, which recommend products to customers
based on items they have already bought or shown interest in. This benefits the business by
letting it stage its products effectively and by showing which customers to target for
specific promotions or new products. (Two Crows Corp, 1999)
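As a rough illustration of how association rules are scored, the sketch below computes the three standard measures (support, confidence, and lift) for one-item rules over a handful of transactions. The transactions, thresholds, and function names are invented for this example and do not come from any of the cited sources. A lift close to 1 means the two items appear together about as often as chance alone would predict, which is one way to flag the spurious patterns mentioned above. The brute-force loop also hints at why the search becomes expensive on large transaction sets.

```python
from itertools import combinations

# Hypothetical market-basket data: each set is one purchase transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    ante = support(antecedent, transactions)
    cons = support(consequent, transactions)
    confidence = both / ante   # how often the rule holds when it applies
    lift = confidence / cons   # about 1.0 means no better than chance
    return both, confidence, lift

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Brute-force every one-item -> one-item rule that passes both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            s, c, l = rule_metrics({x}, {y}, transactions)
            if s >= min_support and c >= min_confidence:
                rules.append((x, y, s, c, l))
    return rules

for x, y, s, c, l in pair_rules(transactions):
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Real market-basket tools avoid checking every combination by pruning itemsets that are already too rare, but the measures they report are the same ones computed here.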
Web Mining to Discover Business Intelligence from Web Customers
Web data mining is the process of extracting structured information from unstructured or
semi-structured web data sources. Companies use web data mining as a tool to gather data
from different websites and collate it for analysis, or to build websites that bring together
information from different websites. This helps visitors get a lot of information in one
location instead of reading it on different websites. For business intelligence,
competitiveness in e-commerce markets and the vast number of options customers have
today have forced businesses to employ marketing strategies that are built largely on data
gathered through web mining. Web usage mining is critical for effective Web site management,
creating adaptive Web sites, business and support services, personalization, network traffic
flow analysis, and more. Business intelligence keeps a business informed of market trends,
alerts it to new avenues of generating revenue, and helps determine the status of the competition.
Clustering and Customer Information
Clustering analysis subdivides a market into distinct subsets of customers where any subset
may potentially be selected as a market target to be reached with a distinct marketing mix. This
type of analysis finds clusters of data objects that are similar in some sense to one another and
segments that data. (Oracle.com, 2008)
Businesses today collect information about which pages site users visit and the order in
which the pages are visited. When a business provides online ordering, customers must log
in to the site, which gives the company click information for each customer profile. By
using a clustering algorithm on this data, the business can find groups, or clusters, of customers
who have similar patterns or sequences of clicks. The business can then use these clusters to
analyze how users move through the Web site, to identify which pages are most closely related
to the sale of a particular product, and to predict which pages are most likely to be visited next.
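A minimal sketch of this idea follows, assuming each logged-in customer is summarized by how many times they viewed each of four pages. The customer names, page list, and counts are hypothetical, and k-means is used only as one common clustering algorithm; the sources cited here do not prescribe it. Counting page views also ignores the order of the clicks, so this is a simplification of the sequence clustering described above.

```python
import math

# Hypothetical click profiles: page-view counts per logged-in customer
# for the pages (home, product, cart, support).
profiles = {
    "cust_a": [5, 9, 4, 0],
    "cust_b": [4, 8, 5, 1],
    "cust_c": [6, 1, 0, 7],
    "cust_d": [5, 0, 1, 8],
    "cust_e": [3, 9, 3, 0],
}

def distance(p, q):
    """Euclidean distance between two click profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean(points):
    """Component-wise average of a list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each customer to the nearest centroid.
        labels = [min(range(k), key=lambda j: distance(p, centroids[j]))
                  for p in points]
        # Move each centroid to the average of its assigned customers.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

names = list(profiles)
labels, centroids = k_means([profiles[n] for n in names], k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

Each resulting cluster is a candidate customer segment: the centroid shows which pages that group visits most, which is the kind of information a business could use to decide what to promote to whom.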
Reliability of Data Mining Algorithms
Data mining algorithms leave room for error and misuse. An algorithm is only reliable if it
has gone through sufficient validation testing, and its results must be validated as well.
Not all patterns discovered with data mining algorithms are going to be valid.
It is possible for a pattern to be discovered in the test data but not in the general population of the
data. There are three measures for evaluating a data mining model: accuracy, reliability, and
usefulness. Accuracy measures how well the model correlates an outcome with the attributes in the
data that has been provided. Reliability focuses on how the mining model performs on different
sets of data. Usefulness examines various metrics that tell you whether the model provides useful
information. Users of an algorithm can also misuse it by asking the wrong question, failing to test
the reasonableness of the results, ignoring discrepancies in the data, ignoring simple explanations
and building overly complex models, overgeneralizing from the results, using insufficient or
inadequate data, or relying on a single data analysis tool.
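One common way to check accuracy and reliability together is k-fold cross-validation: the model is trained on part of the data and scored on a held-out part, and the process is repeated so that every record is held out once. The sketch below is only an illustration under stated assumptions. The data is synthetic, the "model" is a simple cut-off on a single numeric feature, and all function names are made up for this example. The average held-out score is a rough accuracy estimate, and the spread of scores across folds is a rough indicator of how the model behaves on different sets of data. Usefulness is not measured here.

```python
import random
import statistics

def make_data(n, seed):
    """Hypothetical labelled data: one feature; label 1 tends to have larger values."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        feature = rng.gauss(2.0 if label else 0.0, 1.0)
        rows.append((feature, label))
    return rows

def fit_threshold(rows):
    """'Train' by placing a cut-off midway between the two class means."""
    ones = [x for x, y in rows if y == 1]
    zeros = [x for x, y in rows if y == 0]
    return (statistics.mean(ones) + statistics.mean(zeros)) / 2

def accuracy(threshold, rows):
    """Fraction of rows whose label matches 'feature is above the cut-off'."""
    hits = sum((x > threshold) == bool(y) for x, y in rows)
    return hits / len(rows)

def cross_validate(rows, k=5):
    """Return the held-out accuracy for each of k folds."""
    scores = []
    for i in range(k):
        test = rows[i::k]                                        # every k-th row is held out
        train = [r for j, r in enumerate(rows) if j % k != i]    # the rest is used to fit
        scores.append(accuracy(fit_threshold(train), test))
    return scores

rows = make_data(200, seed=0)
scores = cross_validate(rows)
print("accuracy per fold:", [round(s, 2) for s in scores])
print("mean accuracy:", round(statistics.mean(scores), 2))
print("spread across folds:", round(statistics.pstdev(scores), 2))
```

A pattern that scores well only on the data it was fitted to, and poorly on the held-out folds, is the situation described above where a pattern exists in the test data but not in the general population.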
Privacy Concerns when Data Mining Personal Information
In order to perform data mining, information must be gathered and entered into the system. This
information can include private or confidential details that an individual did not release to a
third party. The data can also contain identifying information, so that once the data mining
is performed the individuals are no longer anonymous. This can be a problem with regard to privacy.
Privacy is the right of individuals to control information about themselves. With data mining
there are some valid concerns, which revolve around secondary use of personal information,
handling of misinformation, and granulated access to personal information. The data collected
could pose risks to the privacy of persons or organizations. These risks include, but are not
limited to, theft by fraud and actual or incorrect identification that could threaten a person’s
life, livelihood, or reputation. There are documented cases where individuals have obtained a
person’s address and then physically harmed them. Pedophiles thrive on this type of data, which,
as technology advances, leaves individuals vulnerable to all sorts of attacks. Hackers breaking
into large company databases have left many individuals the victims of identity fraud.
There are both mandatory and voluntary controls that ease some of these concerns. There
are legal restrictions on the use of information, and legal action can be taken when they are
violated, but it is a cumbersome and arduous process, and recovery can take a long time.
Sadly, the laws lag behind technology and are insufficient to protect individuals
on their own. The voluntary controls consist of technical, methodological, and policy approaches
that limit opportunities for inappropriate access while ensuring sound data and the desired
outcome. In some cases the consumer has no choice; we all have to give up certain information to
buy homes, vehicles, and other necessities of life. But one should be cautious about giving up
essential personal information unnecessarily.
Sample Businesses That Have Incorporated Predictive Analysis into Their Business Successfully
1. The Blue Cross and Blue Shield System (BCBS) is one organization that is already
deriving considerable benefits from predictive analytics. As an organization that
provides healthcare insurance to nearly one in three Americans, BCBS has amassed a
huge amount of claims-related data over the years. By applying predictive analytics
technologies to its vast trove of claims data, BCBS has been getting better at not only
identifying the risk factors that lead to several chronic diseases, but also identifying
individuals who are at heightened risk of getting such diseases. (Vijayan, 2011)
2. Memphis Police Department (MPD) has enhanced its crime fighting techniques with
IBM predictive analytics software and reduced serious crime by more than 30 percent,
including a 15 percent reduction in violent crimes since 2006. MPD is now able to
evaluate incident patterns throughout the city and forecast criminal "hot spots" to
proactively allocate resources and deploy personnel, resulting in improved force
effectiveness and increased public safety. (Armonk, 2010)
3. Target used predictive analytics to determine, based on past purchases, whether a woman
could be pregnant. Target assigns every customer a Guest ID number, tied to their
credit card, name, or email address, that becomes a bucket storing a history of
everything they’ve bought and any demographic information Target has collected
from them or bought from other sources. Using that, Target looked at historical buying
data for all the women who had signed up for Target baby registries in the past. It
successfully used this information to send out targeted coupons to improve the sales of
its maternity and baby products. (Hill, 2011)
References:

Two Crows Corporation (1999) Introduction to Data Mining and Knowledge Discovery,
http://www.twocrows.com/intro-dm.pdf

Angoss (2012) Predictive Analytics in the Cloud Solutions,
http://www.angoss.com/predictive-analytics-solutions/cloud-solutions

Oracle.com (2008) Oracle Data Mining Concepts,
http://docs.oracle.com/cd/B28359_01/datamine.111/b28129/clustering.htm

Tiwari, S. (2011) A Web Usage Mining Framework for Business Intelligence,
http://www.ijecct.org/v1n1/4.pdf

Kaith, R. (2011) Benefits of Predictive Analytics and Data Mining Services,
http://www.articlesnatch.com/Article/Benefits-Of-Predictive-Analytics-And-DataMining-Services/1394544#ixzz1wTRRkxKw

Vijayan, J. (2011) How Predictive Analytics can Deliver Strategic Benefits,
ComputerWorld.com, http://www.computerworld.com/s/article/9220131/How_predictive_analytics_can_deliver_strategic_benefits

Armonk (2010) IBM: Memphis Police Department Reduces Crime Rates with IBM
Predictive Analytics Software, http://www-03.ibm.com/press/us/en/pressrelease/32169.wss

Hill, K. (2011) How Target Figured Out a Teen Girl Was Pregnant Before Her Father
Did, Forbes.com, http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/