Usability and Integration
H. V. Jagadish
Many Sources of Data
• Text
• XML/semi-structured
• Experimental measurements
• Public databases
• Some data may have time/space variation
• Need to make sense of this big mess
Find Patterns in Data
• Conventional data mining seeks patterns
that can be mathematically specified over
(usually) global extents.
• Typically assume simple data structure.
• Need new approaches to find patterns in
messy data.
Human in the Loop
• Hard for a machine to tell an interesting
pattern apart from one that is not.
• Problem exacerbated when we seek
smaller/localized patterns, or work with
large vocabularies of possible patterns.
• Need human in the loop to make this
judgment.
Computer-Assisted (Human) Analytics
• Patterns found by human and not by
computer.
• Job of computer is to make patterns easy
to find.
• So computer system must effectively
support queries and display results.
• E.g., Visual Analytics
Organize Data for Analysis
• Join multiple complex temporal data
streams into a “windowed” model suitable
for efficient analysis. [Manish Singh]
• Permit organic change to schema as
information needs evolve. [Eric Qian]
• Provide a spreadsheet interface for direct
manipulation of complex and large data.
Choose small sets of representatives
effectively. [Ben Liu]
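As a rough illustration of the "windowed" model in the first bullet, here is a minimal sketch that assumes fixed-width, non-overlapping windows and two streams of (timestamp, value) pairs. The function name window_join, the stream names, and the sample readings are hypothetical and only serve the illustration; the slides do not say how the windows are actually defined.

```python
from collections import defaultdict

def window_join(stream_a, stream_b, width):
    """Bucket two timestamped streams into fixed-width windows and pair them.

    Assumption: windows are non-overlapping and `width` time units long.
    """
    windows = defaultdict(lambda: {"a": [], "b": []})
    for ts, value in stream_a:
        windows[int(ts // width)]["a"].append(value)
    for ts, value in stream_b:
        windows[int(ts // width)]["b"].append(value)
    # Keep only the windows in which both streams have at least one reading,
    # keyed by the window's start time.
    return {
        idx * width: (w["a"], w["b"])
        for idx, w in sorted(windows.items())
        if w["a"] and w["b"]
    }

# Hypothetical sample data: (timestamp, reading) pairs.
heart_rate = [(0.5, 72), (1.2, 75), (5.1, 80)]
blood_pressure = [(0.9, 120), (6.0, 118), (12.3, 125)]
joined = window_join(heart_rate, blood_pressure, width=5)
```

Each entry of joined maps a window start time to the readings from both streams that fall in that window, so later analysis can work per window rather than per raw event.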
Access Data for Analysis
• Under-specified queries, particularly
keyword queries. Derive “qunit” as
response unit, mined from observed query
logs. [Arnab Nandi]
• Visual manipulation algebra for analyzing
large time-varying graphs with data on
nodes and edges. [Anna Shaverdian]
Scientific Data Analysis
• Explain analysis results in terms of source
data, even when the source may have
been updated since. [Jing Zhang]
• Analyze gene expression microarray data,
and electronic health record data, in light
of known biomedical knowledge.
[Fernando Farfan]