CHAPTER 2
THE DATA MINING TECHNOLOGY
Data Mining
Data mining is the process of sorting through a large database or data warehouse and extracting
knowledge of interest to its users. The extracted knowledge may be represented as a concept, rule,
law, or model. The purpose of data mining is to help decision-makers find potential associations
in the data and uncover neglected elements that may be useful for identifying trends and
supporting decision-making. It has been described as “the nontrivial extraction of implicit, previously
unknown, and potentially useful information from data” [5] and “the science of extracting useful
information from large data sets or databases” [6].
Data mining identifies trends within data that go beyond simple analysis. Through the use of
sophisticated algorithms, non-statistician users can identify key attributes of real-life
problems such as intrusion detection, face recognition, image processing, business processes,
and other target applications. However, handing control of these processes from the statistician
to the machine may or may not yield useful results [1] unless one can ensure that the data on
which the operations are to be performed are complete in all respects. Figure 2.1 shows the
basic approach of data mining.
Figure 2.1: The transition from raw data to valuable knowledge.
Here are a few specific things that data mining might contribute to an intrusion detection project:
• Remove normal activity from alarm data to allow analysts to focus on real attacks
• Identify false alarm generators and “bad” sensor signatures
• Find anomalous activity that uncovers a real attack
• Identify long, ongoing patterns (different IP address, same activity)
To accomplish these tasks, data miners employ one or more of the following techniques:
• Data summarization with statistics, including finding outliers
• Visualization: presenting a graphical summary of the data
• Clustering of the data into natural categories
• Association rule discovery: defining normal activity and enabling the discovery of anomalies
• Classification: predicting the category to which a particular record belongs
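As an illustration of the first technique (data summarization with statistics, including finding outliers), the following is a minimal sketch in Python. It is not taken from the source: the function name find_outliers, the z-score threshold of 2.0, and the hourly alarm counts are all hypothetical, chosen only to show how a simple statistical summary can flag an unusual hour in alarm data.

```python
from statistics import mean, stdev


def find_outliers(counts, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    Hypothetical helper: a z-score is only one of many ways to define
    an outlier, and the threshold here is an assumed value.
    Requires at least two data points (a limitation of stdev).
    """
    mu = mean(counts)
    sigma = stdev(counts)  # sample standard deviation
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [
        (i, c)
        for i, c in enumerate(counts)
        if abs(c - mu) / sigma > threshold
    ]


# Made-up alarm counts per hour, for illustration only.
hourly_alarms = [12, 15, 11, 14, 13, 12, 95, 14, 13, 12]
outliers = find_outliers(hourly_alarms)
print(outliers)
```

With this made-up data, only the hour with 95 alarms lies more than two sample standard deviations from the mean, so it is the only one reported. In practice an analyst would likely tune the threshold, or use a more robust statistic such as the median, because a single extreme value inflates both the mean and the standard deviation.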