Intelligent Icons: Integrating Lite-Weight Data Mining and Visualization into GUI Operating Systems

Eamonn Keogh, Li Wei, Xiaopeng Xi, Stefano Lonardi, Jin Shieh
Computer Science & Engineering Dept., University of California – Riverside
Time Series Data Mining Group
Scott Sirowy

Outline
• Overview
• An Example: DNA to Intelligent Icon
• Icon Generation Algorithm
• Experimental Evaluation
• Conclusion

"Eamonn, patent this idea!"
Christos Faloutsos

Dataset Kalpakis_ECG
Icons in a traditional browser

Dataset Kalpakis_ECG
Suppose I magically:
1) Color the icons to somehow reflect the contents of the file.
2) Position the icons based on their colors/patterns.

[Figure: browser icons for the 18 files normal1.txt to normal18.txt]

Let us start with visualizing a special data type, DNA.

The DNA of two species… Are they similar?
TGGCCGTGCTAGGCCCCACCCCTACCTTGC
GTCCCCGCAAGCTCATCTGCGCGAACCAGA
ACGCCCACCACCCTTGGGTTGAAATTAAGG
GGCGGTTGGCAGCTTCCCAGGCGCACGTA
CTGCGAATAAATAACTGTCCGCACAAGGAG
CCGACGATAGTCGACCCTCTCTAGTCACGA
CTACACACAGAACCTGTGCTAGACGCCATG
GATAAGCTAACACAAAAACATTTCCCACTAC
TGCTGCCCGCGGGCTACCGGCCACCCCTG
CTCAGCCTGGCGAAGCCGCCCTTCA

CCGTGCTAGGGCCACCTACCTTGGTCC
CCGCAAGCTCATCTGCGCGAACCAGAA
GCCACCACCTTGGGTTGAAATTAAGGA
GCGGTTGGCAGCTTCCAGGCGCACGTA
CTGCGAATAAATAACTGTCCGCACAAG
AGCCGACGATAAAGAAGAGAGTCGACC
CTCTAGTCACGACCTACACACAGAACC
GTGCTAGACGCCATGAGATAAGCTAAC

[Figure: level-1 grid, one cell per base]
C T
A G
Cell values as extracted: 0.20 0.24 0.26 0.30

[Figure: a DNA sequence shown beside its level-1, level-2 and level-3 grids]
Level 2:
CC CT TC TT
CA CG TA TG
AC AT GC GT
AA AG GA GG
Level 3 (partial, as extracted): CCC CCT CTC CCA CCG CTA CAC CAT CAA

[Figure: level-2 grid filled with word frequencies and colored on a scale from 0 to 1; values as extracted: 0.02 0.04 0.09 0.04 0.03 0.07 0.02 0.11 0.03]

OK. Given any DNA string I can make a colored bitmap, so what?
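The grid construction on the slides above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names `cell_of` and `dna_bitmap` are hypothetical, the quadrant layout (C top-left, T top-right, A bottom-left, G bottom-right) is read off the level-1 and level-2 grids, and the cells hold relative frequencies of overlapping length-k words, which is an assumption about how the counts are normalized. Mapping the values to colors is left out.

```python
from collections import Counter

# Quadrant of each base as (row, col), following the slide's layout:
#   C T
#   A G
QUADRANT = {"C": (0, 0), "T": (0, 1), "A": (1, 0), "G": (1, 1)}


def cell_of(word):
    """Return the (row, col) grid cell of a DNA word.

    The first letter picks the top-level quadrant, the next letter picks
    the quadrant inside it, and so on.
    """
    row = col = 0
    for base in word:
        r, c = QUADRANT[base]
        row = row * 2 + r
        col = col * 2 + c
    return row, col


def dna_bitmap(seq, k):
    """Return a 2^k x 2^k grid of relative frequencies of length-k words."""
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    # Overlapping words of length k (assumption: a sliding window of step 1).
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip words containing anything other than A, C, G, T (for example N).
    words = [w for w in words if all(b in QUADRANT for b in w)]
    counts = Counter(words)
    total = len(words)
    for word, n in counts.items():
        row, col = cell_of(word)
        grid[row][col] = n / total
    return grid


if __name__ == "__main__":
    # First line of the second sequence on the slide, at level 2.
    for row in dna_bitmap("CCGTGCTAGGGCCACCTACCTTGGTCC", 2):
        print(" ".join(f"{v:.2f}" for v in row))
```

With this layout, `cell_of("CA")` is row 1, column 0 and `cell_of("TG")` is row 1, column 3, matching the level-2 grid above.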
[Figure: intelligent icons for the DNA files, arranged by similarity]
Indian rhinoceros.dna
white rhinoceros.dna
rhesus monkey.dna
pygmy chimpanzee.dna
Indian elephant.dna
sperm whale.dna
hippopotamus.dna
chimpanzee.dna
Human.dna
African elephant.dna
orangutan.dna
pygmy sperm whale.dna

Note: Elephas maximus is the Indian elephant, Loxodonta africana is the African elephant, and Pan troglodytes is the chimpanzee.

Can we make Intelligent Icons for time series? Yes, with SAX!

accbabcdbcabdbcadbacbdbdcadbaacb…

[Figure: Time Series Bitmap, built from the level-1, level-2 and level-3 grids]
Level 1:
a b
c d
Level 2:
aa ab ba bb
ac ad bc bd
ca cb da db
cc cd dc dd
Level 3 (partial, as extracted): aaa aab aba aac aad abc aca acb acc

While they are all examples of EEGs, example_a.dat is from a normal trace, whereas the others contain examples of spike-wave discharges.

We can further enhance the time series bitmaps by arranging the thumbnails by "cluster", instead of arranging them by date, size, name, etc. We can achieve this with multidimensional scaling (MDS).
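The time series steps above (SAX discretization, subword counting, and a pairwise distance that an MDS layout could take as input) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the names `sax`, `subword_freqs` and `bitmap_distance` are hypothetical, the whole series is converted into one SAX word rather than window by window, the series length is assumed to be a multiple of the number of segments, and the distance is plain Euclidean distance between relative-frequency vectors. The MDS step itself is not shown.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "abcd"
# Breakpoints cutting a standard normal into 4 equiprobable regions
# (the usual SAX table for an alphabet of size 4).
BREAKPOINTS = [-0.67, 0.0, 0.67]


def sax(series, n_segments):
    """Convert a numeric series into a SAX word of length n_segments."""
    n = len(series)
    if not 0 < n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    # Z-normalize; a constant series maps to all zeros.
    z = [(x - mean) / std if std else 0.0 for x in series]
    # Piecewise aggregate approximation: mean of each equal-width frame.
    # Trailing points are dropped if n is not a multiple of n_segments.
    frame = n // n_segments
    letters = []
    for s in range(n_segments):
        chunk = z[s * frame:(s + 1) * frame]
        avg = sum(chunk) / len(chunk)
        index = sum(avg > b for b in BREAKPOINTS)
        letters.append(ALPHABET[index])
    return "".join(letters)


def subword_freqs(word, k):
    """Relative frequency of every length-k subword, in a fixed order."""
    keys = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(word[i:i + k] for i in range(len(word) - k + 1))
    total = sum(counts[key] for key in keys)
    return [counts[key] / total if total else 0.0 for key in keys]


def bitmap_distance(word_a, word_b, k):
    """Euclidean distance between the level-k bitmaps of two SAX words."""
    fa = subword_freqs(word_a, k)
    fb = subword_freqs(word_b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


if __name__ == "__main__":
    ramp = sax(list(range(16)), 8)
    step = sax([0] * 8 + [10] * 8, 8)
    print(ramp, step, bitmap_distance(ramp, step, 2))
```

Computing `bitmap_distance` for every pair of files gives a distance matrix, which is the kind of input an MDS routine needs to place similar icons close together.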
[Figure: icon layout for the monthly files: August.txt, July.txt, June.txt, May.txt, Sept.txt, April.txt, Oct.txt, Feb.txt, March.txt, Nov.txt, Dec.txt, Jan.txt]

[Figure: One Year of Italian Power Demand; vertical axis from 0 to 300, horizontal axis from January to December, with August marked]

Text Example
Here are some papers that reference Eamonn Keogh's work…

Tree augmented naive Bayes ensembles…
Discriminative versus generative parameter…
Floating search algorithm for structure…
FEATURE SELECTION FOR THE NAÏVE…
A Heuristic Lazy Bayesian Rule…
Detection of surface defects on raw…
Combining Naive Bayes and nGram Language…
Learning Recursive Bayesian Multinets…
Naive Bayes with Higher Order Attributes…
Boosted Bayesian Network Classifiers…
Applying general Bayesian techniques…
An efficient data mining method for…
Decision tree Induction from Time series…
Indexing spatio temporal trajectories…
Averaged OneDependence Estimators…
Making Time series Classification More…
Learning Bayesian network classifiers…
LB Keogh Supports Exact Indexing of…
WARP accurate retrieval of shapes…
Augmenting Naive Bayes Classifiers with…
Warping the Time on Data Streams…
Efficiently and Accurately Comparing…
Estensione del Classificatore Naive Bayes…
Warp Metric Distance Aprimorando o Uso de…
Clustering Multidimensional Trajectories…
Lower Bounding of Dynamic Time Warping…
Efficient subsequence matching in time…
FTW fast similarity search…
Elastic Translation Invariant Matching…
Robust and fast similarity search…
FastDTW Toward Accurate Dynamic Time…
A PCA based similarity measure for…
Indexing multidimensional time-series…
Efficient subsequence matching for…
A novel technique for indexing…
Scaling and time warping in time series…
Warping indexes with envelope…
Rotation invariant distance measures for…

Text Example
[Figure: the paper icons arranged by similarity, with these annotations]
• Paper on using "warping" to classify
• Cluster of "classification" papers
• Cluster of "warping" papers
• Classification paper in Italian
• "Warping" paper in Portuguese

Intelligent Icon Search

Paper Summary
• We show how to map DNA, time series and natural language into intelligent icons.
• We give a generic framework for mapping any kind of data into intelligent icons.
• We show the utility of intelligent icons for finding patterns (clusters, outliers, etc.).

Questions?