Uses of Data Mining in Information Operations
Data Mining
Pat Talbot
Ryan Sanders
Dennis Ellis
11/14/01
Background
• 1998 Scientific Data Mining at the JNIC (CRAD)
• Tools: MineSet 2.5 was used. A trade study ranked Clementine highest, but it was too expensive
• Training: Knowledge Discovery in Databases conference (KDD ’98)
• Database: Model Reuse Repository with 325 records, 12 fields of metadata
• Results: “Rules Visualizer” algorithm provided “prevalence” and “predictability”; clustering algorithm showed correlation of fields
• 1999 Wargame 2000 Performance Patterns (CRAD)
• Tools: MineSet 2.5 was again used
• Training: KDD ’99 tutorials and workshops
• Database: numerical data from Wargame 2000 performance benchmarks
• Results: insight into the speedup obtainable from a parallel discrete event simulation
• 2000 Strategic Offense/Defense Integration (IR&D)
• Tools: Weka freeware from the University of Waikato (New Zealand)
• Training: purchased the Weka text, Data Mining, by Ian Witten and Eibe Frank
• Databases: STRATCOM Force Readiness; heuristics for performance metrics
• Results: rule induction tree provided tabular understanding of structure
• 2001 Offense/Defense Integration (IR&D)
• Tools: Weka with GUI; also evaluated Oracle’s Darwin
• Database: Master Integrated Data Base (MIDB), 300 records, 14 fields (unclassified)
• Results: visual display of easy-to-understand IF-THEN rules in a tree structure; performance benchmarking of file sizes and execution times
Success Story
• Contractual Work
• Joint National Integration Center (1998 – 1999)
• Under the Technology Insertion Studies and Analysis (TISA) Delivery Order
• Results: “Rules Visualizer” algorithm provided “prevalence” and “predictability”; clustering algorithm showed correlation of fields
• Importance:
• Showed practical applications of the technology
• Provided training and attendance at data mining conferences
• Attracted the attention of the JNIC Chief Scientist who used it for analysis
Data Mining Techniques
Uses algorithmic* techniques for information extraction, matched to how deeply the information is buried:
• Shallow data: discover with SQL
• Multi-dimensional data: discover with OLAP
• Hidden data: discover with Weka, using
  - Rule Induction (preferred when an explanation is required)
  - Neural Networks
  - Regression Modeling
  - K Nearest Neighbor Clustering
  - Radial Basis Functions
• Deep data: discover only with clues
* Non-parametric statistics, machine learning, connectionist methods
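All of these learners sit behind a common interface in Weka, so one can be swapped for another on the same data. A minimal sketch of that idea (my addition, not code from the slides; it assumes a Weka 3.x jar on the classpath and a hypothetical force_readiness.arff export whose last attribute is the class):

// Compare two of the learners named above on one dataset.
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareLearners {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("force_readiness.arff"); // hypothetical file name
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] learners = { new J48(), new IBk(5) }; // rule induction, 5-nearest-neighbor
        for (Classifier c : learners) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(c, data, 10, new Random(1)); // 10-fold cross-validation
            System.out.printf("%s: %.1f%% correct%n",
                    c.getClass().getSimpleName(), eval.pctCorrect());
        }
    }
}

Rule induction (J48) is usually tried first here because, as noted above, its output can be explained.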
Data Mining Process
Figure: the end-to-end process.
Data sources (STRATCOM data stores, requirements database, simulation output, MIDB)
→ preprocessing & cleaning → target data
→ transformation & reduction → storage/retrieval mechanisms (database, datamart, warehouse; indexing by hypercube or multicube)
→ data mining → patterns & models
→ evaluation & visualization → knowledge → situation assessment
Data Mining Flow Diagram
Figure: Weka data mining flow.
External inputs (Force Readiness data file) → input processing → Weka data mining software (Quinlan rule induction tree and clustering algorithm), with defaults for algorithm choice, output parameters, and control parameters
→ Output: rule tree / clusters, showing
• underlying structure
• patterns
• hidden problems: consistency, missing data, corrupt data, outliers, exceptions, old data
→ Link to effects
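The input-processing stage is where most of the hidden problems above get caught. A minimal sketch of one such cleaning step (my illustration, assuming Weka 3.x; the slides do not show code, and the file name is hypothetical):

// Replace missing values with per-attribute means/modes before mining.
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

public class CleanInput {
    public static void main(String[] args) throws Exception {
        Instances raw = DataSource.read("force_readiness_raw.arff");
        ReplaceMissingValues fix = new ReplaceMissingValues();
        fix.setInputFormat(raw);                        // learn the attribute layout
        Instances cleaned = Filter.useFilter(raw, fix); // fill gaps with mean/mode
        System.out.println("Instances after cleaning: " + cleaned.numInstances());
    }
}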
Current Objectives
• Automated Assessment and Prediction of Threat Activity
• Discover patterns in threat data
• Predict future threat activity
• Use data mining to construct Dempster-Shafer Belief Networks (see the sketch after this list)
• Integrate with Conceptual Clustering, Data Fusion, and Terrain Reasoning
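For readers unfamiliar with Dempster-Shafer belief combination, here is a minimal sketch (my illustration; the slides give no numbers or code) of Dempster's rule over the two-hypothesis frame {OPR, NOP} used in the MIDB results later:

// Dempster's rule of combination for masses over {OPR}, {NOP}, and the
// full frame {OPR,NOP} (ignorance). All mass values are made up.
public class DempsterShafer {
    // masses indexed as 0 = {OPR}, 1 = {NOP}, 2 = {OPR,NOP}
    static double[] combine(double[] m1, double[] m2) {
        double[] m = new double[3];
        m[0] = m1[0]*m2[0] + m1[0]*m2[2] + m1[2]*m2[0]; // intersections yielding {OPR}
        m[1] = m1[1]*m2[1] + m1[1]*m2[2] + m1[2]*m2[1]; // intersections yielding {NOP}
        m[2] = m1[2]*m2[2];                             // only frame-with-frame stays unknown
        double conflict = m1[0]*m2[1] + m1[1]*m2[0];    // {OPR} vs {NOP}: empty intersection
        for (int i = 0; i < 3; i++) m[i] /= (1.0 - conflict); // renormalize away conflict
        return m;
    }
    public static void main(String[] args) {
        double[] minedRules = {0.6, 0.1, 0.3}; // evidence from data mining (illustrative)
        double[] secondSource = {0.5, 0.2, 0.3}; // independent evidence (illustrative)
        double[] fused = combine(minedRules, secondSource);
        System.out.printf("bel(OPR)=%.3f bel(NOP)=%.3f unknown=%.3f%n",
                fused[0], fused[1], fused[2]);
    }
}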
Figure: integration concept. Conceptual Clustering automatically derives new concepts from text (nouns, isa relations, attributes, outcomes) and clusters evidence into new patterns; Terrain Reasoning supplies geographic data on routes and threat movements; Data Mining produces IF-THEN rules; Data Fusion combines evidence into belief in hypotheses, likely activities, and hypothesis impact.
Rule Induction Tree Format
• Example: Rule induction tree for weather data
Database:

#  Outlook  Humidity  Winds  Temp  Deploy
1  Sunny    51%       True   78    Yes
2  Rainy    70%       False  98    No
3  .
.
n  .

(Annotation on Temp: “Had no effect!”)

Decision supported: weather influence on asset deployment (“Deploy Asset?”).
A leaf labeled “no (3.0/2.0)” reads: 3 met the rule, 2 did not.
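This weather example is the textbook dataset that ships with Weka, so a tree like the one sketched above is easy to regenerate. A minimal sketch (my addition, assuming a Weka 3.x install; the bundled file is data/weather.numeric.arff in recent releases, weather.arff in older ones):

// Induce and print a C4.5-style rule tree from Weka's bundled weather data.
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WeatherTree {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/weather.numeric.arff");
        data.setClassIndex(data.numAttributes() - 1); // class = the play/deploy decision
        J48 tree = new J48();                         // Weka's C4.5 implementation
        tree.buildClassifier(data);
        System.out.println(tree); // prints the IF-THEN tree with (n/m) leaf counts
    }
}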
Results: READI Rule Tree
Reference: Talbot, P., TRW Technical Review Journal, Spring/Summer 2001, pp. 92-93.
Figure: Quinlan C4.5 classifier rule tree for the READI data.
- Links are IFs
- White nodes are THENs
- Yellow nodes are C-ratings
Branch attributes include Overall Training (C-0 through C-3 training), Mission Capable, Authorized Platforms, Platforms Ready, Category (e.g., ICBM), Comments (edited), Spares, and Training. Leaves carry readiness ratings with occurrence counts (e.g., C-3 rating, 564 occurrences; C-5 rating, 16 occurrences), split on thresholds such as <=4/>4, <=9/>9, and <=2/>2.
Example Rule: In the “Overall Training” database, if C-1 training was received, then all but 4 squadrons are Mission Capable and all but 2 are then Platform Ready. If Platform Ready, 564 are then rated C-1.
Results: READI Clustering
STRATCOM Force Readiness Database
• 1136 instances
Figure: cluster visualization of readiness ratings (C=1, C=2, C=3) by category.
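The slides do not say which clusterer produced this plot, so as a stand-in, here is a minimal sketch using Weka's SimpleKMeans on a hypothetical readi.arff export of the 1136 records (my illustration, assuming Weka 3.x):

// Cluster readiness records and report how many instances land in each cluster.
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ReadinessClusters {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("readi.arff"); // hypothetical export
        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(3); // one cluster per C-rating band shown in the figure
        km.buildClusterer(data);
        int[] sizes = new int[3];
        for (int i = 0; i < data.numInstances(); i++)
            sizes[km.clusterInstance(data.instance(i))]++;
        for (int c = 0; c < 3; c++)
            System.out.println("cluster " + c + ": " + sizes[c] + " instances");
    }
}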
Results: Heuristics Rule Tree
Six Variable Damage Expectancy
Figure: rule tree for the six-variable damage expectancy data.
Branch attributes: Variable Type (SVbr, ARbr, ARrb, DMbr, DMrb, SVrb), Intent (Destroy, Defend, Deny, Preempt, Retaliate), Weapon Type (Conventional or Nuclear), and Arena (Strategic or Tactical). Leaves give damage expectancy percentages with occurrence and miss counts, e.g., 90% (14 occurrences, 2 misses), 70% (8 occurrences, 5 misses), 100% (2 occurrences), down to 10% (6 occurrences, 1 miss).

Example rule: IF Variable Type is DMbr AND Weapon Type is Conventional THEN DMbr = 80%, which occurs 8 times and does not occur 5 times.
Results: MIDB – 1
MIDB Data Table: difficult to see patterns!
10001001012345,TI5NA,80000,DBAKN12345,000001,KN,A,OPR,40000000N,128000000E,19970101235959,0290,SA-2
10001001012346,TI5NA,80000,DBAKN12345,000002,KN,A,OPR,39500000N,127500000E,19970101225959,0290,SA-2
10001001012347,TI5CA,80000,DBAKN12345,000003,KN,A,OPR,39400000N,127400000E,19970101215959,0290,SA-3
.
.
10001001012345,TI5NA,80000,DBAKN12345,000001,KN,A,OPR,40000000N,128000000E,19970101235959,0290,SA-2
Rules that determine if a threat Surface-to-Air site is operational (OPR) or not (NOP):
1) if SA-2 and lat <= 39.1 then 3 are NOP
2) if SA-2 and lat > 39.1 then 9 are OPR
3) if SA-3 and lat <= 38.5 then 3 are OPR
4) if SA-3 and lat > 38.5 then 9 are NOP
5) if SA-13 then 6 are NOP
MIDB rule tree: easy to see patterns!
Results: MIDB 300 Records
Rules that determine if a threat Surface-to-Air site is operational (OPR) or not (NOP):
1) if lat > 39.4 then 59 are OPR, 2 aren’t
2) if lat <= 39.4 and lon > 127.2 then 56 are NOP, 2 aren’t
3) if lon <= 127.2 and lat > 39.1 then 31 are OPR, 2 aren’t
4) if lon <= 127.2 and lat <= 39.1, and if:
   SA-2, then 30 are NOP
   SA-3 and lat <= 38.5, then 30 are OPR, 1 isn’t
   SA-3 and lat > 38.5, then 30 are NOP, 1 isn’t
   SA-13 and lat <= 36.5, then 2 are OPR
   SA-13 and lat > 36.5, then 60 are OPR, 6 aren’t
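These rules are already in executable form, one of the benefits claimed later. A minimal sketch encoding them directly (my illustration, not TRW code; the field positions are inferred from the sample records on the previous slide, where field 9 is latitude in millionths of a degree, field 10 is longitude, and the last field is equipment type):

// Apply the induced MIDB rules to one raw record.
public class MidbRules {
    static String classify(double lat, double lon, String type) {
        if (lat > 39.4) return "OPR";                                // rule 1
        if (lon > 127.2) return "NOP";                               // rule 2
        if (lat > 39.1) return "OPR";                                // rule 3
        if (type.equals("SA-2")) return "NOP";                       // rule 4, SA-2
        if (type.equals("SA-3")) return lat <= 38.5 ? "OPR" : "NOP"; // rule 4, SA-3
        if (type.equals("SA-13")) return "OPR";                      // rule 4, SA-13
        return "UNKNOWN";
    }
    public static void main(String[] args) {
        String rec = "10001001012345,TI5NA,80000,DBAKN12345,000001,KN,A,OPR,"
                   + "40000000N,128000000E,19970101235959,0290,SA-2";
        String[] f = rec.split(",");
        double lat = Double.parseDouble(f[8].substring(0, f[8].length() - 1)) / 1.0e6;
        double lon = Double.parseDouble(f[9].substring(0, f[9].length() - 1)) / 1.0e6;
        System.out.println("predicted=" + classify(lat, lon, f[12]) + " actual=" + f[7]);
    }
}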
Current Work: SUBDUE
Hierarchical Conceptual Clustering
• Structured and unstructured data
• Clusters attributes with graphs
• Hypothesis generation
Example output — Subclass 7: SA-13s (29) at (34.3 N, 129.3 E) are not operational
Applicability
Benefits:
• Automatically discovers patterns in data
• Resulting rules are easy to understand in “plain English”
• Quantifies rules in executable form
• Explicitly picks out corrupt data, outliers, and exceptions
• Graphical user interface allows easy understanding
Example Uses:
• Database validation
• 3-D sortie deconfliction
• Determine trends in activity
• Find hidden structure and dependencies
• Create or modify belief networks
Lessons Learned
• Data Sets: choose one that you understand – makes cleaning, formatting, default
parameter settings, and interpretation much easier.
• Background: knowledge of non-parametric statistics helps determine what patterns
are statistically significant
• Tools: many are just fancy GUIs with database query and plot functionality. Most are overpriced ($100K/seat for high-end tools for mining business data)
• Uses: a new one is discovered in every task, e.g., checking consistency and completeness of rules. May be useful for organizing textual evidence
• Algorithms: must provide understandable patterns; e.g., some algorithms do not!
• Integration: challenging to interface these inductive and abductive methods with
deductive methods like belief networks
Summary
• TRW has many technical people in Colorado with data mining experience
• Hands-on with commercial and academic tools
• Interesting and useful results have been produced
• Patterns in READI and MIDB using rule induction algorithms
• Outliers, corrupt data, and exceptions are flagged
• Novel uses, such as consistency and completeness of rule sets, demonstrated
• Lessons learned have been described
• Good starting point for future work
• The challenge is interfacing data mining algorithms with other methods