Intelligent Icons:
Integrating Lite-Weight Data Mining and
Visualization into GUI Operating Systems
Eamonn Keogh
Li Wei
Xiaopeng Xi
Stefano Lonardi
Jin Shieh
Computer Science & Engineering Dept.
University of California – Riverside
Time Series Data Mining Group
Scott Sirowy
Outline
• Overview
• An Example: DNA to Intelligent Icon
• Icon Generation Algorithm
• Experimental Evaluation
• Conclusion
Eamonn, patent this idea!
Christos Faloutsos
Dataset Kalpakis_ECG
Icons in a traditional browser
Dataset Kalpakis_ECG
Suppose I magically…
1) Color the icons to somehow reflect the contents of the file.
2) Position the icons based on their colors/patterns.
[Figure: icons for normal1.txt through normal18.txt, colored and positioned by content]
Let us start with visualizing
a special data type, DNA.
The DNA of two species…
Are they similar?
TGGCCGTGCTAGGCCCCACCCCTACCTTGC
GTCCCCGCAAGCTCATCTGCGCGAACCAGA
ACGCCCACCACCCTTGGGTTGAAATTAAGG
GGCGGTTGGCAGCTTCCCAGGCGCACGTA
CTGCGAATAAATAACTGTCCGCACAAGGAG
CCGACGATAGTCGACCCTCTCTAGTCACGA
CTACACACAGAACCTGTGCTAGACGCCATG
GATAAGCTAACACAAAAACATTTCCCACTAC
TGCTGCCCGCGGGCTACCGGCCACCCCTG
CTCAGCCTGGCGAAGCCGCCCTTCA
CCGTGCTAGGGCCACCTACCTTGGTCC
CCGCAAGCTCATCTGCGCGAACCAGAA
GCCACCACCTTGGGTTGAAATTAAGGA
GCGGTTGGCAGCTTCCAGGCGCACGTA
CTGCGAATAAATAACTGTCCGCACAAG
AGCCGACGATAAAGAAGAGAGTCGACC
CTCTAGTCACGACCTACACACAGAACC
GTGCTAGACGCCATGAGATAAGCTAAC
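[Figure: 2×2 grid with cells C and T (top row), A and G (bottom row), shaded by symbol frequency; values shown: 0.20, 0.24, 0.26, 0.30]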
CCGTGCTAGGGCCACCTACCTTGGTCCG
CCGCAAGCTCATCTGCGCGAACCAGAA
GCCACCACCTTGGGTTGAAATTAAGGAG
GCGGTTGGCAGCTTCCAGGCGCACGTA
CTGCGAATAAATAACTGTCCGCACAAGG
AGCCGACGATAAAGAAGAGAGTCGACC
CTCTAGTCACGACCTACACACAGAACCT
GTGCTAGACGCCATGAGATAAGCTAACA
[Figure: the grid subdivided recursively]
Level 1:
C T
A G
Level 2:
CC CT TC TT
CA CG TA TG
AC AT GC GT
AA AG GA GG
Level 3 (partial, as shown): CCC CCT CTC / CCA CCG CTA / CAC CAT / CAA
CCGTGCTAGGGCCACCTACCTTGGTCC
CCGCAAGCTCATCTGCGCGAACCAGAA
GCCACCACCTTGGGTTGAAATTAAGGA
GCGGTTGGCAGCTTCCAGGCGCACGT
CTGCGAATAAATAACTGTCCGCACAAG
AGCCGACGATAAAGAAGAGAGTCGAC
CTCTAGTCACGACCTACACACAGAACC
GTGCTAGACGCCATGAGATAAGCTAAC
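[Figure: level-2 grid partly filled with 2-mer frequencies (0.02 0.04 0.09 0.04 / 0.03 0.07 0.02 / 0.11 0.03), remaining cells labeled CA, AC, AT, AA, AG; color scale from 0 to 1]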
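The counting step behind these grids can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name kmer_bitmap is hypothetical, the quadrant order (C, T on top; A, G below) is read off the grids above, and dividing each count by the total number of k-mers is an assumption about how the cell values are normalized.

```python
from collections import Counter

# Position of each base inside its parent cell as (row, col).
# Assumed layout, read off the slides:  C T
#                                       A G
QUADRANT = {"C": (0, 0), "T": (0, 1), "A": (1, 0), "G": (1, 1)}


def kmer_bitmap(seq, k):
    """Return a 2^k x 2^k grid of relative k-mer frequencies (hypothetical helper)."""
    size = 2 ** k
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Skip words containing anything other than C, T, A, G (for example N).
    counts = Counter(w for w in words if all(ch in QUADRANT for ch in w))
    total = sum(counts.values())
    grid = [[0.0] * size for _ in range(size)]
    for word, n in counts.items():
        row = col = 0
        for ch in word:
            # Each further letter picks a quadrant inside the current cell.
            r, c = QUADRANT[ch]
            row, col = 2 * row + r, 2 * col + c
        grid[row][col] = n / total  # assumption: normalize by total count
    return grid
```

Calling kmer_bitmap(sequence, 2) on either DNA string above yields a 4×4 grid whose cells can be mapped to colors to draw the icon.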
CCGTGCTAGGCCCCACCCCTACCTTGC
GTCCCCGCAAGCTCATCTGCGCGAACC
GAACGCCCACCACCCTTGGGTTGAAAT
AAGGAGGCGGTTGGCAGCTTCCCAGG
CACGTACCTGCGAATAAATAACTGTCC
ACAAGGAGCCCGACGATAGTCGACCCT
TCTAGTCACGACCTACACACAGAACCT
TGCTAGACGCCATGAGATAAGCTAACA
OK. Given any DNA string I can make a colored bitmap. So what?
[Figure: intelligent icons for mammalian DNA files, including Indian rhinoceros.dna, white rhinoceros.dna, rhesus monkey.dna, pygmy chimpanzee.dna, chimpanzee.dna, Human.dna, orangutan.dna, Indian elephant.dna, African elephant.dna, hippopotamus.dna, sperm whale.dna and pygmy sperm whale.dna]
Note: Elephas maximus is the Indian elephant, Loxodonta africana is the African elephant, and Pan troglodytes is the chimpanzee.
Can we make Intelligent Icons for time series?
Yes, with SAX!
accbabcdbcabdbcadbacbdbdcadbaacb…
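[Figure: SAX symbols a, b, c, d mapped to a recursive grid]
Level 1:
a b
c d
Level 2:
aa ab ba bb
ac ad bc bd
ca cb da db
cc cd dc dd
Level 3 (partial, as shown): aaa aab aba / aac aad abc / aca acb acc
Time Series Bitmap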
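A SAX string like the one above can be produced roughly as follows. This is a sketch under stated assumptions: the function name sax and its parameters are hypothetical, and it uses the standard SAX recipe (z-normalize, average over equal-width segments, then cut at breakpoints that split a standard normal distribution into equiprobable regions), which the slides do not spell out.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Discretize a numeric series into a SAX word (hypothetical helper)."""
    n = len(series)
    if not 0 < n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    # Step 1: z-normalize; a constant series becomes all zeros.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Step 2: Piecewise Aggregate Approximation, one mean per segment.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [mean(z[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]
    # Step 3: breakpoints that cut N(0, 1) into equiprobable regions.
    a = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    # The symbol index is the number of breakpoints below the segment mean,
    # so the first letter of the alphabet marks the lowest region.
    return "".join(alphabet[sum(v > c for c in cuts)] for v in paa)
```

The resulting string can be counted into a grid in the same way as a DNA string, with a, b, c, d taking the place of the four bases.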
While they are all examples of EEGs, example_a.dat is from a normal trace, whereas the others contain examples of spike-wave discharges.
We can further enhance the time series bitmaps by arranging the thumbnails by “cluster”, instead of arranging by date, size, name, etc. We can achieve this with multidimensional scaling (MDS).
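One way to compute such a layout is classical (Torgerson) MDS on pairwise distances between bitmaps. The sketch below is an assumption rather than the authors' method: the slides say only “MDS”, the Euclidean distance between grids and the function names euclidean and classical_mds are illustrative choices, and power iteration stands in for a proper eigensolver to keep the example free of dependencies.

```python
import math


def euclidean(a, b):
    """Distance between two bitmaps given as equally sized 2-D grids."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def _matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def classical_mds(dist, dims=2, iters=500):
    """Embed n items in `dims` dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Fixed, generic start vector so the result is deterministic.
        v = [math.sin(i + 1.0 + d) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration for the dominant eigenvector
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, _matvec(b, v)))  # Rayleigh quotient
        if lam > 1e-9:
            for i in range(n):
                coords[i][d] = math.sqrt(lam) * v[i]
            # Deflate so the next pass finds the next eigenvector.
            b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)]
                 for i in range(n)]
    return coords
```

Each row of the result is an (x, y) position at which the corresponding icon could be drawn, so files with similar bitmaps end up near each other.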
[Figure: One Year of Italian Power Demand, plotted from January to December (vertical axis 0 to 300), with August labeled; icons for the monthly files Jan.txt, Feb.txt, March.txt, April.txt, May.txt, June.txt, July.txt, August.txt, Sept.txt, Oct.txt, Nov.txt and Dec.txt]
Text Example
Here are some papers that reference Eamonn Keogh's work…
Tree augmented naive Bayes ensembles…
Discriminative versus generative parameter…
Floating search algorithm for structure…
FEATURE SELECTION FOR THE NAÏVE…
A Heuristic Lazy Bayesian Rule…
Detection of surface defects on raw…
Combining Naive Bayes and nGram Language…
Learning Recursive Bayesian Multinets…
Naive Bayes with Higher Order Attributes…
Boosted Bayesian Network Classifiers…
Applying general Bayesian techniques…
An efficient data mining method for…
Decision tree Induction from Time series…
Indexing spatio temporal trajectories…
Averaged OneDependence Estimators…
Making Time series Classification More…
Learning Bayesian network classifiers…
LB Keogh Supports Exact Indexing of…
WARP accurate retrieval of shapes…
Augmenting Naive Bayes Classifiers with…
Warping the Time on Data Streams…
Efficiently and Accurately Comparing…
Estensione del Classificatore Naive Bayes…
Warp Metric Distance Aprimorando o Uso de…
Clustering Multidimensional Trajectories…
Lower Bounding of Dynamic Time Warping…
Efficient subsequence matching in time…
FTW fast similarity search…
Elastic Translation Invariant Matching…
Robust and fast similarity search…
FastDTW Toward Accurate Dynamic Time…
A PCA based similarity measure for…
Indexing multidimensional time-series…
Efficient subsequence matching for…
A novel technique for indexing…
Scaling and time warping in time series…
Warping indexes with envelope…
Rotation invariant distance measures for…
Text Example
• Cluster of “classification” papers
• Cluster of “warping” papers
• Paper on using “warping” to classify
• Classification paper in Italian
• “Warping” paper in Portuguese
Intelligent Icon Search
Icon Search
Paper Summary
• We show how to map DNA, time series and natural language into intelligent icons.
• We give a generic framework for mapping any kind of data into intelligent icons.
• We show the utility of intelligent icons for finding patterns (clusters, outliers, etc.).
Questions?