Usability and Integration
H. V. Jagadish

Many Sources of Data
• Text
• XML/semi-structured
• Experimental measurements
• Public databases
• Some data may have time/space variation
• Need to make sense of this big mess

Find Patterns in Data
• Conventional data mining seeks patterns that can be mathematically specified over (usually) global extents.
• Typically assume simple data structure.
• Need new approaches to find patterns in messy data.

Human in the Loop
• Hard for a machine to tell an interesting pattern apart from one that is not.
• Problem exacerbated when we seek smaller/localized patterns, or work with large vocabularies of possible patterns.
• Need human in the loop to make this judgment.

Computer-Assisted (Human) Analytics
• Patterns found by human and not by computer.
• Job of computer is to make patterns easy to find.
• So computer system must effectively support queries and display results.
• E.g. Visual Analytics

Organize Data for Analysis
• Join multiple complex temporal data streams into a “windowed” model suitable for efficient analysis. [Manish Singh]
• Permit organic change to schema as information needs evolve. [Eric Qian]
• Provide a spreadsheet interface for direct manipulation of complex and large data. Choose small sets of representatives effectively. [Ben Liu]

Access Data for Analysis
• Under-specified queries, particularly keyword queries. Derive “qunit” as response unit, mined from observed query logs. [Arnab Nandi]
• Visual manipulation algebra for analyzing large time-varying graphs with data on nodes and edges. [Anna Shaverdian]

Scientific Data Analysis
• Explain analysis results in terms of source data, even when the source may have been updated since. [Jing Zhang]
• Analyze gene expression microarray data, and electronic health record data, in light of known biomedical knowledge. [Fernando Farfan]
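
Illustration: windowed join of temporal streams
The slides mention joining temporal data streams into a "windowed" model but do not say how it is done. The sketch below is only one plausible reading, assuming fixed-width (tumbling) windows over two streams of (timestamp, value) pairs. The function name `window_join`, the stream names, and the sample readings are all hypothetical and chosen for illustration; they are not taken from the project described above.

```python
from collections import defaultdict


def window_join(stream_a, stream_b, width):
    """Bucket two (timestamp, value) streams into fixed-width windows.

    Assumption: a window is the half-open interval [k*width, (k+1)*width).
    Returns a dict keyed by window start time, keeping only windows in
    which both streams have at least one reading.
    """
    windows = defaultdict(lambda: ([], []))
    for t, value in stream_a:
        windows[int(t // width)][0].append(value)
    for t, value in stream_b:
        windows[int(t // width)][1].append(value)
    return {
        k * width: {"a": a, "b": b}
        for k, (a, b) in sorted(windows.items())
        if a and b  # drop windows seen by only one stream
    }


# Made-up sample readings: (time in seconds, measurement).
heart_rate = [(0.5, 72), (1.2, 75), (3.1, 80)]
blood_pressure = [(0.9, 118), (3.4, 121), (5.0, 119)]

joined = window_join(heart_rate, blood_pressure, width=2)
print(joined)
```

Once the streams are aligned per window, each window can be treated as one row for later analysis. Sliding or variable-width windows would need a different bucketing rule.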