College of Computer Engineering & Science
Fall Semester 2012/2013
Artificial Intelligence COSC 4362
Homework #4
Data Mining
Sarah Al-Bassam
200800790
Background
Data mining is a field of computer science that attempts to discover patterns in
large data sets. It uses methods at the intersection of artificial intelligence, machine
learning, statistics, and database systems. The ultimate goal of the data mining
process is to extract information from a data set and convert it into a clear,
understandable structure for further use. Alternative names for data mining include
knowledge discovery in databases (KDD), knowledge extraction, pattern/data analysis,
data archeology, data dredging, information harvesting, and business intelligence [3].
The main task of data mining is to analyze large quantities of data
automatically or semi-automatically to extract previously unknown, interesting patterns,
such as groups of data records (cluster analysis), unusual records (anomaly detection),
and dependencies (association rule mining). This analysis typically relies on database
techniques such as spatial indexes. The extracted patterns can then be
seen as a summary of the input data and used in further analysis, for example in
machine learning and predictive analytics [2].
Data mining includes six classes of tasks. The first class is anomaly
detection, also called outlier, change, or deviation detection. It identifies
unusual data records, which may be interesting in themselves or may be data errors
that require further investigation. The second
class is association rule learning, or dependency modeling, which searches
for relationships between variables. To illustrate, a supermarket can
collect data on customer purchasing habits. Using association rule
learning, the supermarket can determine which products are regularly bought
together and use this information for marketing purposes. This is known as market
basket analysis. The third class is clustering, which discovers
groups and structures in the data that are related or similar, without using known
structures in the data. The fourth class is classification, which generalizes a
known structure so that it can be applied to new data. For example, an e-mail program attempts to
categorize an e-mail as "junk" or as "legitimate". The fifth class is regression,
which tries to find a function that models the data with the least error.
Finally, the sixth class is summarization, which provides a compact
representation of the data set, including visualization and report generation [3].
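The supermarket example can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the transactions are invented, the function name pair_rules and the 0.4 support threshold are hypothetical choices, and only pairs of items are considered rather than the larger item sets a full association rule miner would handle. Support is the fraction of baskets containing both items; confidence is the fraction of baskets containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

for antecedent, consequent, support, confidence in pair_rules(transactions):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

With this invented data, every basket that contains butter also contains bread, so the rule "butter -> bread" comes out with the highest confidence.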
Applications
Data mining applications are computer software programs that
extract and identify patterns from stored data. These applications usually provide a
software interface that interacts with a large database holding
important data. Data mining is widely used by companies and public bodies, for
example in marketing, detection of fraudulent activity, and scientific research [4].
There are many data mining applications for business use, for
example Customer Relationship Management (CRM). Such an application lets
marketing managers understand the behavior of their customers so that they can predict
the likely behavior of future clients. For example, a company that is considering a
price increase can use data mining to predict the number of
customers it will lose for a specific percentage increase in product
price [4].
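One simple way to make such a prediction is a regression fit. The sketch below is an assumption-laden illustration, not the method any particular CRM product uses: the historical figures are invented, the name fit_line is hypothetical, and a straight line is assumed to describe the relationship between price increase and customers lost.

```python
# Hypothetical historical observations: (price increase in %, customers lost).
history = [(2, 11), (4, 19), (6, 32), (8, 41), (10, 49)]

def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(history)
predicted_loss = slope * 5 + intercept  # estimate for a 5% increase
print(f"Estimated customers lost at a 5% increase: {predicted_loss:.1f}")
```

A real application would use far more data and would check whether a linear model is appropriate before trusting the estimate.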
Another application of data mining is text mining, the process
of deriving high-quality information from text. Examples include security applications,
such as the study of text encryption and decryption and the analysis of online text such as
news and blogs for national security purposes. Text mining is also used in
biomedical applications, such as PubGene, which combines biomedical text
mining and network visualization as an Internet service [3].
Web mining is another application of data mining that discovers patterns from
the web. It has three types: web usage mining, web content mining, and web
structure mining. Web usage mining extracts valuable information from
server logs, such as users' browsing history, which reveals what users are looking for
on the Internet. Web structure mining uses graph theory to
analyze the node and connection structure of a web site. Web content
mining is the process of extracting and integrating useful data and knowledge from
web page contents [3].
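As a small illustration of web usage mining, the sketch below counts successful page requests in a server log. Everything specific here is an assumption: the log lines are invented, they follow a simplified Common Log Format, and the names REQUEST and page_hits are hypothetical.

```python
import re
from collections import Counter

# Hypothetical access-log lines in a simplified Common Log Format.
LOG_LINES = [
    '10.0.0.1 - - [15/Dec/2012:10:00:01 +0000] "GET /products HTTP/1.1" 200 512',
    '10.0.0.2 - - [15/Dec/2012:10:00:05 +0000] "GET /products HTTP/1.1" 200 512',
    '10.0.0.1 - - [15/Dec/2012:10:01:10 +0000] "GET /cart HTTP/1.1" 200 128',
    '10.0.0.3 - - [15/Dec/2012:10:02:00 +0000] "GET /missing HTTP/1.1" 404 0',
    'malformed line',
]

# Groups: client address, HTTP method, requested path, status code.
REQUEST = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$'
)

def page_hits(lines):
    """Count successful (2xx) requests per path, skipping unparseable lines."""
    hits = Counter()
    for line in lines:
        match = REQUEST.match(line)
        if match and match.group(4).startswith("2"):
            hits[match.group(3)] += 1
    return hits

# Most requested pages first.
print(page_hits(LOG_LINES).most_common())
```

Counting hits per page is only the simplest kind of usage pattern; the same parsed records could also be grouped by client address to reconstruct individual browsing sessions.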
Advantages of Data Mining
Marketing / Retail
Data mining helps marketing companies build models based
on historical data to predict who will respond to a new
marketing campaign. As a result, marketers gain a clearer view of how to
sell profitable products to targeted customers with high satisfaction [1].
Finance / Banking
Data mining gives financial institutions information about loans
and credit reporting. By building a model from the data of previous customers
with common characteristics, banks and other financial institutions can
estimate which loans are good or bad and their level of risk. In addition, data mining
helps banks detect fraudulent credit card transactions, which protects credit
card owners from losses [1].
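Fraud detection is often treated as an anomaly detection problem. The sketch below flags unusually large or small amounts using a modified z-score based on the median absolute deviation. This is a generic statistical technique chosen for illustration, not a description of how any bank actually screens transactions; the amounts are invented, flag_unusual is a hypothetical name, and the 3.5 cutoff is a commonly used rule of thumb.

```python
from statistics import median

# Hypothetical card transaction amounts for one customer.
amounts = [42.0, 38.5, 51.0, 45.2, 40.1, 1250.0, 47.3]

def flag_unusual(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    center = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]

print(flag_unusual(amounts))
```

A flagged amount is only a candidate for review: an unusual transaction is not necessarily fraudulent, which is why such records require further investigation.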
Governments
Data mining supports government agencies by examining
records of financial transactions and building patterns that can identify money
laundering or other criminal activity [1].
Disadvantages of Data Mining
Privacy Issues
Personal privacy is a major concern for many individuals, especially
now that the Internet hosts so many social networks, e-commerce sites, forums, and blogs.
Many people worry that their personal information will be collected
and used in unethical ways. Businesses gather information about their
customers in several ways in order to understand their purchasing
behavior and trends. However, a business may one day be acquired by
another, and the personal information it holds might then be sold to others or
leaked [1].
Security Issues
Businesses normally hold information about their employees and
customers, such as social security numbers, birthdays, and payroll details. However,
this information is not always handled properly. There have been many
cases where hackers accessed and stole important customer data from big
corporations, for example Ford Motor Credit Company and Sony [1].
Misuse of Information / Inaccurate Information
Information collected through data mining can be misused by
unethical people or businesses to take advantage of
vulnerable people or to discriminate against a group of people. Moreover, data
mining techniques are not always accurate, and inaccurate information
used for decision making can have serious consequences [1].
References
[1] Advantages and Disadvantages of Data Mining. [Online]. Available:
http://www.zentut.com/data-mining/advantages-and-disadvantages-of-data-mining/
[Accessed: December 18, 2012].
[2] Data Mining. [Online]. Available:
http://www.slideshare.net/Tommy96/data-miningppt-4035580#btnNext
[Accessed: December 17, 2012].
[3] The Wikipedia website. [Online]. Available:
http://en.wikipedia.org
[Accessed: December 15, 2012].
[4] What are Data Mining Applications? [Online]. Available:
http://www.wisegeek.com/what-are-data-mining-applications.htm#did-you-know
[Accessed: December 14, 2012].