Visual Analytics Research at Tufts
Remco Chang
Assistant Professor
Tufts University
Sections: Intro | VA | Apps | Dist Func | ATG | Wrap-up

Problem Statement
• The growth of data is exceeding our ability to analyze it.
• The amount of digital information generated in 2002, 2006, and 2010:
  – 2002: 22 EB (exabytes, 10^18 bytes)
  – 2006: 161 EB
  – 2010: 988 EB (almost 1 ZB)
1: Data courtesy of Dr. Joseph Kielman, DHS
2: Image courtesy of Dr. Maria Zemankova, NSF

Problem Statement
• The data is often complex, ambiguous, and noisy; analyzing it requires human understanding.
  – About 2 GB of digital information is produced per person per year
  – 95% of the Digital Universe's information is unstructured

Example: What Does Fraud Look Like?
• Financial institutions such as Bank of America are legally required to report all suspicious activities
• Data size: approximately 200,000 transactions per day (73 million transactions per year)
• Problems:
  – Automated approaches can only detect known patterns
  – Bad guys are smart: patterns are constantly changing
  – No single transaction appears fraudulent
  – Few experts: fraud detection is considered an "art"
  – Data is messy: the lack of international standards results in ambiguous data
• Current methods:
  – 10 analysts monitoring and analyzing all transactions
  – Using SQL queries and spreadsheet-like interfaces
  – Limited to a short time scale (2 weeks)

WireVis: Financial Fraud Analysis
• In collaboration with Bank of America
  – Looks for suspicious wire transactions
  – Currently beta-deployed at WireWatch
  – Visualizes 7 million transactions over 1 year
• Uses interaction to coordinate four perspectives:
  – Keywords to accounts
  – Keywords to keywords
  – Keywords/accounts over time
  – Account similarities
    (search by example)

WireVis: Financial Fraud Analysis
[Figure: four coordinated views: Heatmap View (accounts-to-keywords relationships), Search by Example (find similar accounts), Keyword Network (keyword relationships), Strings and Beads (relationships over time)]
R. Chang et al., Scalable and interactive visual analysis of financial wire transactions for fraud detection. Information Visualization, 2008.
R. Chang et al., WireVis: Visualization of categorical, time-varying data from financial transactions. IEEE VAST, 2007.

What is Visual Analytics?
• Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces [Thomas & Cook 2005]
• Since 2004, the field has grown significantly. Besides tens to hundreds of domestic and international partners, it now has an IEEE conference (IEEE VAST), an NSF program (FODAVA), and a forthcoming IEEE Transactions journal.

Individually Not Unique
[Diagram: the components of visual analytics, each labeled with the existing disciplines it draws on]
• Data Representation and Transformation: Data Mining, Machine Learning, Databases, Information Retrieval, etc.
• Production, Presentation, and Dissemination: Tech Transfer, Report Generation, etc.
• Analytical Reasoning and Interaction: Interaction Design, Cognitive Psychology, Intelligence Analysis, etc.
• Visual Representation: InfoVis, SciVis, Graphics, etc.
• Validation and Evaluation: Quality Assurance, User Studies (HCI), etc.

In Combinations of 2 or 3…
[Diagram: the same components, with the disciplines of Data Representation and Transformation (Data Mining, Machine Learning, Databases, Information Retrieval) and Visual Representation (InfoVis, SciVis, Graphics) shown]

In Combinations of 2 or 3…
[Diagram: the same components, with the disciplines of Production, Presentation, and Dissemination (Tech Transfer, Report Generation) and Analytical Reasoning and Interaction (Interaction Design, Cognitive Psychology, Intelligence Analysis) shown]

Extending Visual Analytics Principles
• Global Terrorism Database
  – Application of the investigative 5 W's
  – [Figure: views for Who, Where, What, and When, plus an Evidence Box and the Original Data]
  – R. Chang et al., Investigative Visual Analysis of Global Terrorism, Computer Graphics Forum, 2008.
• Bridge Maintenance
  – Exploring subjective inspection reports
  – R. Chang et al., An Interactive Visual Analytics System for Bridge Management, Computer Graphics Forum, 2010. To appear.
• Biomechanical Motion
  – Interactive motion comparison methods
  – R. Chang et al., Interactive Coordinated Multiple-View Visualization of Biomechanical Motion Data, IEEE Vis (TVCG), 2009.

Human + Computer: A Mixed-Initiative Perspective
• So far, our approach is mostly user-driven
• Human vs. Artificial Intelligence: Garry Kasparov vs. Deep Blue (1997)
  – The computer takes a "brute force" approach without analysis
  – "As for how many moves ahead a grandmaster sees," Kasparov concludes: "Just one, the best one"
• Artificial Intelligence vs. Augmented Intelligence: Hydra vs. Cyborgs (2005)
  – Grandmaster + 1 computer > Hydra (the equivalent of Deep Blue)
  – Amateur + 3 computers > Grandmaster + 1 computer [1]
• How can this success be repeated systematically?
  – Unsupervised machine learning + user
  – The user's interactions with the computer
[Diagram: Human, Translation, Computer]
[1] http://www.collisiondetection.net/mt/archives/2010/02/why_cyborgs_are.php

Examples of Human + Computer Computing
• CAPTCHA
  – reCAPTCHA
  – General crowdsourcing
• Adaptive / Intelligent User Interfaces (IUI)
• User-assisted clustering / searching

Simple Example
• Distance function, for two groups of points Y_1 and Y_2:
  $A_{\text{change}} = \{(x_i, x_j) \mid x_i \in Y_1, x_j \in Y_2 \text{ or } x_i \in Y_2, x_j \in Y_1\}$
  $A_{\text{other}} = \{(x_i, x_j) \mid (x_i, x_j) \notin A_{\text{change}}\}$
• The updated distance function is obtained by an arg min over D of an objective built from the distances $D(x_i, x_j)$ summed over $A_{\text{change}}$ and over $A_{\text{other}}$, together with terms involving the previous step's distances $D_{t-1}(x_i, x_j)$

Application 1: Find Important Features
• Data set: X, 178 × 13
• 3 classes
• Add 10 columns of random numbers as extra features
[Figure: 2D scatter plot of the data set]

1st Step: Success
Trying to separate the circled green dots from all blue dots
[Figure: two 2D scatter plots of the data set]

Result
• Recall the structure of the data set:
  – Original Wine data set: each instance has 13 feature values
  – 10 randomly generated feature values for every instance
• Weight vector (23 weights, one per feature):
  – Randomly generated features get low weights
  – 0.096 0.150 0.062 0 0.018 0.011 0.025 0.039 0.037 0.047 0.038 0.011 0 0.017 0 0.046 0 0 0 0 0.091 0.186 0.127

Visual Analytics for Political Science

Aggregate Temporal Graph
• 1000 simulations
• 60 time steps in each simulation
• Each time step is a node; each transition is an edge
• Time steps are merged if their states are the same

Aggregate Temporal Graph
[Figure: aggregate temporal graph]

Gateways and Terminals
Each of the yellow vertices is a Gateway to the vertex set {A}: every maximal path leaving a yellow vertex eventually passes through A.
Vertex G is a Gateway to each of the yellow vertices (the Terminals): every maximal path leaving G eventually passes through each of the yellow vertices.
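The ATG construction and the Gateway definition above can be made concrete with a short Python sketch. It assumes (an assumption, not stated on the slides) that a node is the pair (time step, state), so identical states are merged only within the same time step and every edge goes from step t to step t + 1; the graph is then acyclic, and every maximal path ends at a vertex with no outgoing edges. The function names build_atg and is_gateway are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache


def build_atg(simulations):
    """Aggregate simulation runs into one graph.

    Hypothetical assumption: a node is (time_step, state), so identical
    states are merged only at the same time step and edges always go from
    step t to step t + 1. Edge weights count how many runs took that edge.
    """
    edges = defaultdict(int)
    nodes = set()
    for states in simulations:
        for t, state in enumerate(states):
            nodes.add((t, state))
            if t + 1 < len(states):
                edges[((t, state), (t + 1, states[t + 1]))] += 1
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    return nodes, succ, dict(edges)


def is_gateway(v, targets, succ):
    """True if every maximal path leaving v passes through a vertex in targets."""
    targets = frozenset(targets)

    @lru_cache(maxsize=None)
    def escapes(u):
        # True if some maximal path starting at u avoids every target vertex.
        if u in targets:
            return False
        nexts = succ.get(u, ())
        if not nexts:
            return True  # u is a sink: this path ends without reaching a target
        return any(escapes(w) for w in nexts)

    return not escapes(v)
```

For example, with the toy runs ["s", "x", "A"], ["s", "y", "A"], and ["s", "y", "B"], the vertex (1, "x") is a Gateway to {(2, "A")}, while (1, "y") is not, because one of its paths ends at (2, "B"). Memoizing escapes keeps the check linear in the size of the graph, which matters when 1000 runs of 60 steps are merged.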
Applications of Aggregate Temporal Graphs
• A generalizable representation of problems whose parameter spaces are too large to explore as a whole, but which are composed of related individual parts that can be examined independently
• Collaborative analysis
  – Each analyst's trail is a simulation
  – Each configuration state is a node
• Web analytics
  – Each visit is a simulation
  – Each configuration of a page is a node

Conclusion
• Visual analytics is a growing new area that seeks to address some pressing needs
  – Too much (messy) data, too little time
[Diagram: Data Representation and Transformation; Visual Representation; Analytical Reasoning and Interaction; Production, Presentation, and Dissemination; Validation and Evaluation]
• By combining strengths and findings from existing disciplines, we have demonstrated that
  – There are some great benefits
  – But there are also some difficult challenges

Questions? Thank you!

Backup Slides

(2) Investigative GTD
[Figure: views for Who, Where, What, and When, plus an Evidence Box and the Original Data]
R. Chang et al., Investigative Visual Analysis of Global Terrorism, Computer Graphics Forum (EuroVis), 2008.

(2) Investigative GTD: Revealing Global Strategy
This group's attacks are not bounded by geographic location but by religious beliefs. Its attack patterns changed as the group developed.

(2) Investigative GTD: Discovering Unexpected Temporal Pattern
[Figure label: Domestic Group]
A geographically bounded entity in the Philippines. The ThemeRiver shows its rise and fall as an entity and its modus operandi.

What is in a User's Interactions?
[Diagram: the human sends input (keyboard, mouse, etc.) to the visualization, which returns output (images on the monitor)]
• Types of human-visualization interactions:
  – Word editing (input heavy, little output)
  – Browsing, watching a movie (output heavy, little input)
  – Visual analysis (closer to 50-50)

Discussion
• What interactivity is not good for:
  – Presentation
  – YMMV ("your mileage may vary")
    • Reproducibility: users behave differently each time
    • Evaluation is difficult due to opportunistic discoveries
  – Often sacrifices accuracy
    • iPCA: SVD takes time on large datasets; use iterative approximation algorithms such as onlineSVD
    • WireVis: clustering large datasets is slow; either precompute or use simpler "binning" methods

Discussion
• Interestingly,
  – It doesn't save you time…
  – And it doesn't make a user more accurate in performing a task
• However, there is empirical evidence that with interactivity:
  – Users are more engaged (they don't give up)
  – Users prefer these systems over static (query-based) systems
  – Users have a faster learning curve
• We need better measurements to determine the "benefits of interactivity"
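The iPCA remark rests on the fact that leading singular vectors can be approximated iteratively. As an illustration only, the sketch below uses plain power iteration (a swapped-in technique, not onlineSVD) to approximate the first principal axis of a data matrix with NumPy. Each step needs only two matrix-vector products, so the loop can be stopped early to trade accuracy for responsiveness. The function name leading_component and its parameters are hypothetical.

```python
import numpy as np


def leading_component(X, iters=100, tol=1e-10, seed=0):
    """Approximate the first principal axis of X by power iteration.

    Illustrative stand-in for iterative SVD approximations (plain power
    iteration, not onlineSVD). Lowering `iters` gives a cheaper, rougher answer.
    """
    Xc = X - X.mean(axis=0)  # center the columns, as PCA does
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(Xc.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        # Multiply by Xc^T Xc without ever forming that matrix explicitly.
        w = Xc.T @ (Xc @ v)
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            return w  # converged: the direction no longer changes
        v = w
    return v
```

Up to sign, the result matches the first right singular vector returned by np.linalg.svd on the centered data; how quickly it gets there depends on the gap between the two largest singular values.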