DATA MINING
An annotated bibliography
Thesis statement. Data mining means searching for patterns within large sets of
data, which opens up many possibilities for business managers and decision makers. By
analyzing those patterns, they can make better business decisions and steer their
businesses toward financial and entrepreneurial success.
Keywords: data mining, knowledge discovery in databases (KDD), data mining
technologies (DMT), decision support systems (DSS)
Association for Computing Machinery. (2007). SIGKDD. ACM Special Interest Group on
Knowledge Discovery and Data Mining. Retrieved February 12, 2007, from
http://www.acm.org/sigs/sigkdd/webcasts.php. [Authoritative website].
This website belongs to one of the special interest groups (SIGs) of the ACM, one of the first
academic societies to promote computational research. The site has a rather simple design. It
provides all the necessary information about the group and lists the people involved in it
together with their affiliations, most of which are academic organizations. The links work.
A newsletter is also available from the website, and its issues carry the latest news in the
field of data mining and knowledge discovery. The target audience is primarily professionals
in various computer science disciplines, but general audiences may also benefit from the
materials presented on this site. Overall, this is a very useful website for researching data
mining.
Bramer, M.A. (1999). Knowledge discovery and data mining. London, UK: The Institution of
Electrical Engineers. [Book]
Huge volumes of data are stored in data warehouses around the world. Unless they are
examined, these data could be lost and never used for the needs of humanity. This book addresses
issues of data mining in different research subjects: chemistry, medical diagnosis, electric
load prediction, and many others. Part 1 examines a broad spectrum of technical issues in
knowledge discovery and data mining; Part 2 contains articles on practical applications of
knowledge discovery and data mining in the fields of health-information analysis,
meteorology, chemistry, and the electricity-supply industry. This book can be helpful to
researchers in the fields it covers and to knowledge professionals in general. However, the
editor emphasizes that knowledge of the discipline of application is required in order to
conduct a successful data mining experiment.
Ganguly, A.R., Gupta, A., & Khan, S. (2006). Data mining and decision support for
business and science. In J. Wang (Ed.), Encyclopedia of data warehousing and
mining (Vol. I, pp. 233-238). Hershey, PA; London, UK: Idea Group Reference.
[Reference book]
This article introduces the field of data mining for business and science. The authors are
affiliated with research and academic institutions in the United States, such as the
University of Arizona and the University of South Florida, which suggests
that they have good background knowledge of data mining. The article
begins with an introduction, where readers can get acquainted with the subject of data
mining and the technologies and applications involved in data processing. The article
uses many abbreviations. Its main idea is presented in the
"Main thrust" section (a typical entry structure in this particular reference source), where
the authors discuss scientific and business applications, give an overview of emerging
technologies and previous approaches, and describe the common features of data mining for
science and business while noting the particularities that are specific to
each. This source refers to a great deal of scientific and
technical literature, including journals, books, and authoritative websites (NASA, for
example), and ends with an extensive list of references. The intended
audience is academic and business readers who are interested in data mining
applications and would like to find quick information that will direct them to further
resources on this subject. The article contains two tables that present analytical
information technologies (data mining and decision support systems) and examples of
their applications.
Guernsey, L. (2003, October 16). Digging for nuggets of wisdom. The New York Times, p.
G1. [Newspaper article].
Written by a journalist, this article is very informative and explains the uses of data
mining in various fields of science. Emphasizing that the amount of information
available on the web and in print is overwhelming and difficult to analyze, the author turns to
practitioners who have already figured out how to search through vast amounts of data. For
example, Dr. Liebman uses the statistical software SPSS for text mining, a technique
derived from the idea of data mining. The main idea of the article is that it is possible to deal
with the amount of information, but it takes an intelligent human being to make sense
of the results. The language of the article is popular and easily understood by general readers.
The New York Times is one of the most reputable newspapers in the country; hence the article can
be of use to many readers who have never heard of the concept of data or text mining. Those readers
will find the idea very practical and interesting, if not fascinating.
Hu, J., & Zhong, N. (2006). Organizing multiple data sources for developing intelligent
e-business portals. Data Mining and Knowledge Discovery, 12(2-3), 127-150.
[Scholarly article].
Both authors are affiliated with Maebashi Institute of Technology in Japan. This article
addresses applications of data mining in business enterprises. It is organized into separate
parts, beginning with an introduction to the subject: creating and managing e-portals that serve as
gateways to personalized information. The authors present a three-tier work-flow model.
Its levels are data-flow, mining-flow, and knowledge-flow. All three contribute to
the model of a multi-layered grid, which is essential for creating an e-portal. The article
proceeds logically from a literature review and previous experience to a discussion
of the main subject, supported by graphs, tables, and computations. It is a scholarly article
directed towards professionals in data mining and knowledge discovery. It is written in
technical language that is best understood by specialists. However, the average person can
make sense of the concept by reading the introduction. It is a useful article for those who are
involved in scientific research on data mining for business applications.
Kohavi, R., & Provost, F. (2001). Applications of data mining to electronic commerce. Data
Mining and Knowledge Discovery, 5, 5-10. [Secondary source].
This article is a rather critical analysis of the state of data mining in e-commerce. The authors
discuss many problems and issues in this particular field and are rather cautious about its
use. "High potential reward, accompanied by high risk" seems to be the main theme of
this article. Written in clear, understandable language, it could be useful to business
managers and information specialists with very broad interests. It is also a literature review that
tries to summarize what has been written before and what the current problems are. One theme seems
to be present in every reviewed paper: problem-specific knowledge and how to incorporate
this kind of knowledge into the knowledge discovery process. At the same time, it is a
philosophical essay rather than a technical article. It has no formulas, graphs, or charts, just
analysis and critical opinions, which sets it apart from the majority of articles
written by scientists. At the end, the authors express cautious optimism about future
studies of data mining in e-commerce, pointing out that there are many issues still to be solved.
This is a secondary source because it addresses previously done research instead of proposing
a new original method or idea.
Kutz, G.D. (2003). Data mining: Results and challenges for government program audits and
investigations. Testimony before the Subcommittee on Technology, Information Policy,
Intergovernmental Relations and the Census, Committee on Government Reform,
House of Representatives. Washington, D.C.: United States General Accounting
Office. Retrieved January 30, 2007, from http://www.gao.gov/new.items/d03591t.pdf.
[Government document]
This document covers issues of internal control within certain government agencies, such
as the Department of Defense (DOD). The use of government credit cards was tracked
using data mining techniques in order to scrutinize the vendors and the appropriateness of
expenses incurred by government employees. This process helped to uncover many cases of abuse and waste
of government funds and helped to improve control over travel spending. Even though
this document is written in official language, it is easy for a student or a lay
person to understand. A summary at the beginning of the document and conclusions at the end help readers
form a clearer picture of the problem and its solution. A list of related publications by the GAO is
available at the end of the paper. This source is certainly very helpful for those who would
like to learn about practical applications of data mining.
Lee, J.H., & Park, S.C. (2003). Agent and data mining based decision support system and its
adaptation to a new customer-centric electronic commerce. Expert Systems with
Applications, 25(4), 619-635. doi:10.1016/S0957-4174(03)00101-5. [Online journal]
Electronic commerce (e-commerce, EC) is a new, fast-developing way of conducting business.
In order to be competitive, manufacturing companies use the Internet not only for promotion
but also to buy and sell. It is crucial for manufacturers to learn about their potential buyers'
buying behaviors and preferences in order to market their products. This article is devoted to
a new customer-centric e-commerce model that uses a concept called process transparency.
“Transparency is a knowledge-based concept that implies participants have intelligence about
market around them” according to the authors. The data mining process was successfully
integrated into the proposed EC model to generate an optimal sampling method.
Mukherjee, S., Chen, Z., & Gangopadhyay, A. (2006). A privacy-preserving technique for
Euclidean distance-based mining algorithms using Fourier-related transforms.
[Electronic version]. VLDB Journal, 15, 293-315. [Primary source].
This article is a good example of a primary source. It is written by researchers from the
University of Maryland, Baltimore County. The authors propose their own algorithm for
improving data mining methods. This is an important issue, especially when dealing with
large amounts of data. The problem is that data is often stored in one place and analyzed in
another, with a third party responsible for the analysis. This means that the data
should be stripped of some personal characteristics in order to preserve the privacy of
customers' information. The authors developed their original idea using
already existing Fourier-related transforms. The article addresses professional
researchers in the field of data mining; hence its language is very technical and
oriented specifically to people working in the data mining field. There are many charts
and mathematical algorithms that support and illustrate the proposed method.
Published in an academic journal, this article is a good example of a primary source in the
sciences.
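The role of Fourier-related transforms in distance-based mining can be illustrated with a
small sketch. It is not the authors' algorithm. It only shows, under stated assumptions, why
such transforms are attractive: an orthonormal discrete cosine transform (one kind of
Fourier-related transform) leaves the Euclidean distance between two records unchanged,
and keeping only some of the coefficients hides part of the original values while still
giving a lower-bound estimate of that distance. The record values, the function names
dct_orthonormal and euclidean, and the choice to keep two coefficients are all hypothetical.

```python
import math


def dct_orthonormal(x):
    """Orthonormal DCT-II of a list of floats (plain Python, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        # These scale factors make the transform matrix orthogonal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Hypothetical customer records (made-up numbers, for illustration only).
record_a = [34.0, 52000.0, 3.0, 12.5]
record_b = [29.0, 48000.0, 1.0, 7.25]

ta = dct_orthonormal(record_a)
tb = dct_orthonormal(record_b)

# An orthogonal transform preserves Euclidean distance.
original_distance = euclidean(record_a, record_b)
transformed_distance = euclidean(ta, tb)

# Assumption: keep only the first two coefficients of each record.
# The truncated distance can never exceed the original one.
truncated_distance = euclidean(ta[:2], tb[:2])

print(original_distance, transformed_distance, truncated_distance)
```

In this sketch the first two printed values agree up to floating-point rounding, which is the
property a distance-based algorithm such as clustering relies on. How many coefficients to
keep, and which ones, is a design choice that trades privacy against accuracy.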
SPSS, Inc. (2007). SPSS. Data mining. Retrieved February 11, 2007, from
http://www.spss.com/data_mining/ [Authoritative website].
This website presents information about data mining provided by a company that introduced
pioneering software for statistical analysis (SPSS stands for Statistical Package for the Social
Sciences). SPSS is now considered one of the leading companies in data mining research.
Its product, Clementine, was one of the first data mining tools back in 1994. The website
is well organized, and the address and contact information are clearly shown. The list of related
business problems that can be addressed by SPSS products makes searching clear and
straightforward. The links work, and the only advertising present is within the links
to the company's products.