From Cognitive Science to Data Mining: The first
intelligence amplifier
Tom Khabaza
Abstract
This paper gives a brief account of two hypotheses: first, that data mining is
a kind of intelligence amplifier, and second, that machine learning algorithms
inspired by ideas from cognitive science contributed significantly to the field of
data mining.
1. Introduction: Intelligence Amplifiers and Data Mining
Intelligence Amplification (Ashby, 1956; Licklider, 1960; Engelbart, 1962)
refers to the idea that the products of Artificial Intelligence will be used initially,
not to create fully intelligent machines, but to amplify or increase the power of
human intelligence. Data mining (Berry and Linoff, 1997; Helberg, 2002) is one
such intelligence amplifier; data mining algorithms form the core of a process
which amplifies our ability to detect and act upon patterns in large quantities of
data.
Whether data mining is really the first intelligence amplifier is open to debate;
perhaps it is the first intelligence amplifier in widespread use. The purpose of
this claim is to emphasise that data mining enhances our mental abilities in a
way which is much closer to the idea of intelligence amplification than most
widespread uses of IT.
2. Historical Background: Poplog, Clementine and CRISP-DM
During the 1980s, the Poplog AI programming environment (du Boulay et al.,
1986), developed at Sussex University under the leadership of Aaron Sloman,
was sold in the non-academic market by Systems Designers Ltd, which later became SD-Scicon. A management buyout from SD-Scicon in 1989 created Integral Solutions Ltd (ISL), whose core business was initially Poplog. At this stage,
ISL's product range included two machine learning modules based on decision
trees and neural networks, and ISL's early business included a series of projects
which applied machine learning to extract useful patterns from customers' data,
that is, data mining projects (Fitzsimons et al., 1993). Based on the experience
of these projects, Colin Shearer invented the Clementine data mining workbench
(Khabaza and Shearer, 1995).
Email address: [email protected] (Tom Khabaza)
From Animals to Robots and Back
September 8, 2011
Despite being the first practitioner to execute ISL's commercial data mining
projects, I was initially sceptical about the prospects for data mining and the
Clementine workbench. Clearly the machine learning techniques used for data
mining could not in themselves solve business problems of any significance; how
then could data mining technology be of practical use?
The answer, which emerged from successive projects, lay in the data mining
process. Clementine had the then-unique property of making data mining algorithms (at that time synonymous with machine learning algorithms) accessible to
non-technologists. This meant that the process of understanding and preparing
the data, applying the algorithms, and interpreting and using the results, could
be executed by or in close collaboration with people whose primary knowledge
was in the business domain (Shearer and Khabaza, 1995). This in turn meant
that business knowledge and understanding could be closely integrated with data
mining technology in the process of business problem-solving, without falling
foul of the limitations of machine knowledge representation.
The design of Clementine, and the business-oriented data mining process
which it enabled, were highly influential, and could be said to have shaped modern data mining practice and tools. The business-oriented process was later standardised in the data mining methodology CRISP-DM (Chapman et al., 1999).
3. Data Mining
Data mining is the use of business knowledge to create new knowledge in
natural or artificial form by discovering and interpreting patterns in data. The
term "business" is used here to emphasise the use of data mining for practical purposes, but the definition would be equally correct if "business" were replaced with
"domain". At heart, data mining is a business process, and is used in a wide variety
of applications, including customer analytics, fraud detection, risk management
and law enforcement, and also in science and medicine.
Figure 1: CRISP-DM diagram.
The more recent term "Predictive Analytics" usually refers to complete solutions in which data mining is embedded. Data mining is distinguished from other
forms of data analysis by the use of data mining algorithms, also sometimes
called predictive modelling algorithms. "Knowledge in artificial form" refers to
the output of these algorithms, predictive models or data mining models, which
are used to increase information locally on the basis of generalisation, and are
often embedded in Predictive Analytics solutions.
The industry standard data mining methodology is called CRISP-DM (Chapman et al., 1999), which stands for CRoss-Industry Standard Process for Data Mining, and
is depicted in Figure 1.
CRISP-DM was created by a research consortium, based on consultation
with a wide circle of practising data miners; during this consultation process,
it was found that all practising data miners had independently discovered
approximately the same process for successful data mining.
CRISP-DM provides an accurate picture of how data mining is carried out,
but omits some key properties of the data mining process, and does
not explain why the process has the form that it does.
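As a rough illustration of the cyclic process depicted in Figure 1, the six CRISP-DM phases can be sketched as a small transition table. The phase names follow the CRISP-DM 1.0 guide; the names `Phase`, `NEXT_PHASES` and `is_valid_path` are hypothetical, introduced only for this sketch, and the table is a simplified reading of the standard diagram rather than part of the methodology itself.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Simplified transitions (an assumption of this sketch): the process is
# iterative, so several phases can lead back to an earlier one.
NEXT_PHASES = {
    Phase.BUSINESS_UNDERSTANDING: [Phase.DATA_UNDERSTANDING],
    Phase.DATA_UNDERSTANDING: [Phase.DATA_PREPARATION, Phase.BUSINESS_UNDERSTANDING],
    Phase.DATA_PREPARATION: [Phase.MODELING],
    Phase.MODELING: [Phase.EVALUATION, Phase.DATA_PREPARATION],
    Phase.EVALUATION: [Phase.DEPLOYMENT, Phase.BUSINESS_UNDERSTANDING],
    Phase.DEPLOYMENT: [],
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed step."""
    return all(b in NEXT_PHASES[a] for a, b in zip(path, path[1:]))
```

A project that loops between data preparation and modelling before evaluation is a valid path in this table, whereas jumping straight from business understanding to modelling is not.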
4. 9 Laws of Data Mining
Attempting to answer some nagging questions about data mining, I have recently published the 9 laws of data mining (Khabaza, 2010), listed below:
1. Business objectives are the origin of every data mining solution (Business
Goals Law)
2. Business knowledge is central to every step of the data mining process
(Business Knowledge Law)
3. Data preparation is more than half of every data mining process (Data
Preparation Law)
4. The right model for a given application can only be discovered by experiment, or "There is No Free Lunch for the Data Miner" (NFL-DM)
5. There are always patterns (Watkins' Law)
6. Data mining amplifies perception in the business domain (Insight Law)
7. Prediction increases information locally by generalisation (Prediction Law)
8. The value of data mining results is not determined by the accuracy or stability of predictive models (Value Law)
9. All patterns are subject to change (Law of Change)
These laws address many aspects of the data mining process, but in this paper
I will focus on the 6th law: "Data mining amplifies perception in the business
domain". This is also called the Insight Law because in data mining the creation
of new knowledge in natural form ("knowledge in the head") is often described as
producing "insight", this being one of the two types of result from data mining, the
other being predictive models.
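The 4th law (NFL-DM) in the list above can be illustrated with a minimal sketch: candidate models are built on the same training data and compared on held-out data, and the choice is made by that experiment rather than in advance. The two toy models, the data and all names below are hypothetical and chosen only for illustration; real projects would compare real algorithms against a business-relevant measure.

```python
def majority_model(train):
    """Baseline: always predict the most common training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def threshold_model(train):
    """Predict 1 when x is above the midpoint between the two class means.

    Assumes, for this sketch, that class 1 tends to have larger x values.
    """
    zeros = [x for x, label in train if label == 0]
    ones = [x for x, label in train if label == 1]
    cut = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: int(x > cut)


def accuracy(model, rows):
    """Fraction of held-out rows that the model labels correctly."""
    return sum(model(x) == label for x, label in rows) / len(rows)


# Hypothetical data: (monthly spend, responded to offer).
train = [(1.0, 0), (2.0, 0), (2.5, 0), (3.0, 0), (7.0, 1), (8.0, 1)]
holdout = [(1.5, 0), (2.0, 0), (6.0, 1), (9.0, 1)]

candidates = {"majority": majority_model, "threshold": threshold_model}
scores = {name: accuracy(build(train), holdout) for name, build in candidates.items()}
best = max(scores, key=scores.get)
```

On different data the ranking of the two candidates could reverse, which is the point of the law: the comparison has to be run each time.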
5. From Intelligence to Perception
How and why does the data mining process produce new knowledge? The
data mining process is essentially one of problem-solving; the business expert
works out how to achieve an objective in the business domain. Business problems are solved by humans, not by algorithms, so how does data mining play a
part in this?
The key issue addressed by data mining is that there may be useful information buried in data, where the required volume of data is too large for patterns
to be seen unaided. (Watkins' Law indicates that such information is always
present.) A conventional view of data mining would suggest that business goals
are translated into data mining goals, then the algorithms are applied to the data,
producing predictive models; these models are used to make predictions and
help guide business decision-making in such a way as to help achieve the business goal. However, this view omits two crucial factors: one is the pervasive
role of business knowledge (as per the 2nd law) and the other is the production
of insight, or new knowledge. It is on this second shortcoming that I will now
focus.
While data mining may indeed produce predictive models to aid decision-making, both the models themselves and the process that produces them can also
tell us new things about the business or domain. The process of understanding
and preparing the data means examining the data in a great deal of detail, and
new facts often emerge from this process; the data themselves have no intrinsic
meaning, but when interpreted in the light of business knowledge the data often
reveal important new information about the business, even before data mining
algorithms are applied. When predictive models are produced, these will also
often tell us important information about the business; this may be revealed by
the behaviour of the model, or by the model itself, such as the readable rules in a
decision-tree model, or by the relative importance of different input variables in
unreadable models. Again this information has no intrinsic importance, but can
be seen to be important when interpreted in the light of business knowledge.
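To make the idea of readable rules and relative input importance concrete, here is a minimal sketch using a one-split "decision stump", a drastic simplification of a decision tree. The transaction data, the field names `amount` and `hour`, and the helper names are all hypothetical; using the accuracy of the best single split as an importance score is an assumption of this sketch, not the measure used by any particular tool.

```python
def best_stump(rows, labels, feature):
    """Find the single threshold on one feature that best separates two classes.

    Returns (accuracy, cut, label_above), where label_above is the class
    predicted when the feature value exceeds the cut.
    """
    best_acc, best_cut, best_above = 0.0, None, None
    values = sorted({row[feature] for row in rows})
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2  # candidate threshold between adjacent values
        for above in (0, 1):
            preds = [above if row[feature] > cut else 1 - above for row in rows]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best_acc:
                best_acc, best_cut, best_above = acc, cut, above
    return best_acc, best_cut, best_above


def describe(feature, cut, above):
    """Render a stump as a rule that a domain expert can read."""
    return f"IF {feature} > {cut} THEN class = {above} ELSE class = {1 - above}"


# Hypothetical transactions; label 1 marks a transaction flagged as fraud.
rows = [
    {"amount": 20, "hour": 9},
    {"amount": 35, "hour": 22},
    {"amount": 50, "hour": 14},
    {"amount": 900, "hour": 10},
    {"amount": 1200, "hour": 23},
    {"amount": 40, "hour": 3},
]
labels = [0, 0, 0, 1, 1, 0]

stumps = {name: best_stump(rows, labels, name) for name in ("amount", "hour")}
# Rank inputs by how well their best single split explains the labels.
ranking = sorted(stumps, key=lambda name: stumps[name][0], reverse=True)
rule = describe(ranking[0], *stumps[ranking[0]][1:])
```

Neither the rule nor the ranking means anything by itself; as the text says, it becomes informative only when a person who knows the business reads it.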
It is a characteristic of these processes that they take place in the business
domain; every piece of data and every action has a business meaning. The data
miner works, not in the realm of bits, bytes and algorithms, but in the domain
of enquiry. The data mining process enables the data miner to see things which
would not be visible unaided. We know that perception is an active, knowledge-based process. The data miner sees things in the business domain by knowing
what they are looking at.
My first hypothesis in this paper is that data mining amplifies perception in
the following way: data mining algorithms can detect patterns in data which
are not visible to the naked eye, but the algorithms themselves have no domain
knowledge. The business expert has the business knowledge but cannot see the
patterns unaided. The data mining process (as described by CRISP-DM) enables
the business expert to incorporate the pattern discovery capabilities of the algorithms into their own perceptual process. There is nothing mysterious about this
(the process is mostly a codification of common sense), but it explains why data
miners have the experience of "seeing things in the data". It is because data mining
is like a perceptual process.
I have always wondered why machine learning algorithms (from the field
of AI) seem to work better for data mining than those originating in the field of
statistics. My second hypothesis in this paper is that machine learning algorithms
work well for data miners because they are designed to be part of a cognitive system. Machine learning systems tend to be based on intuitively plausible models
of knowledge. For the purposes of the data miner, it matters little whether these
models are correct descriptions of human cognition; what makes them helpful
for data miners is the plausible nature of the knowledge they create or the patterns they discover. This makes the algorithms easier to use as an extension of
one's own cognition.
6. Conclusion: The Impact of Cognitive Science
A bird's-eye view of the activities of data miners in organisations would not
immediately reveal anything to do with cognition. A data miner appears to (and
does in fact) work in the domain of application; they would seem like marketeers, or fraud detection operatives, or police intelligence officers, or geneticists,
or medics. They are exactly this, but augmented by having their perceptual abilities, within their domain of operation, enhanced by the ability to see meaningful
patterns in data. Data mining is acting, for data miners, as an intelligence amplifier.
This kind of intelligence amplifier does not provide the expanded human intellect envisioned by Ashby (Asaro, 2008); nevertheless, the expanded perceptual
abilities of data miners can be used to make the world a better place (e.g. Van,
2003; Piatetsky-Shapiro et al., 2003; Adderley and Musgrove, 1999; McCue,
2006; Chang and Shyue, 2009).
If my second hypothesis is correct, then this ability of data mining to enhance
the perception of domain workers is the result of the output of Cognitive Science
research. By focussing on cognition, we have produced tools which can become
part of cognition.
References
Adderley, R., Musgrove, P., 1999. Data mining at the West Midlands Police: A study of bogus
official burglaries. In: BCS Special Group Expert Systems. Springer-Verlag, London, pp. 191–203.
Asaro, P., 2008. From mechanisms of adaptation to intelligence amplifiers: The philosophy of
W. Ross Ashby. In: Husbands, P., Holland, O., Wheeler, M. (Eds.), The Mechanical Mind in
History. MIT Press.
Ashby, W., 1956. An Introduction to Cybernetics. Chapman and Hall.
Berry, M., Linoff, G., 1997. Data Mining Techniques: For Marketing, Sales and Customer Support. Wiley.
Chang, C., Shyue, S., 2009. A study on the application of data mining to disadvantaged social
class in Taiwan’s population census. Expert Systems with Applications 36, 510–518.
Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., Wirth, R., 1999.
CRISP-DM 1.0: Step-by-step data mining guide. http://www.crisp-dm.org.
du Boulay, J., Khabaza, T., Elsom-Cook, M., Taylor, J., 1986. Poplog and the learner: An artificial intelligence environment used in education. In: Directory of Computer Training 1986.
Badgemore Park Enterprises for Hoskyns Education.
Engelbart, D., Oct 1962. Augmenting human intellect: A conceptual framework. Tech. Rep.
Summary Report AFOSR-3233, Stanford Research Institute, Menlo Park, CA.
Fitzsimons, M., Khabaza, T., Shearer, C., November 1993. The application of rule induction and neural networks for television audience prediction. In: Proceedings of ESOMAR/EMAC/AFM Symposium on Information Based Decision Making in Marketing. Paris,
pp. 69–82.
Helberg, C., 2002. Data Mining with Confidence. SPSS Inc., Chicago.
Khabaza, T., 2010. Nine laws of data mining. www.khabaza.com/9laws.
Khabaza, T., Shearer, C., 1995. Data mining with Clementine. In: IEE Colloquium on Knowledge
Discovery in Databases in Digest No 1995/021(B). IEE, London.
Licklider, J., 1960. Man-computer symbiosis. IRE Transactions on Human Factors in Electronics
HFE-1, 4–11.
McCue, C., 2006. Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis. Butterworth-Heinemann.
Piatetsky-Shapiro, G., Khabaza, T., Ramaswamy, S., August 2003. Capturing best practice for
microarray gene expression analysis. In: SIGKDD 2003.
Shearer, C., Khabaza, T., 1995. Data mining by data owners. In: Intelligent Data Analysis.
Baden-Baden, Germany.
Van, J., 11th January 2003. SPSS tools unravel secrets of disease. Chicago Tribune.