Download MIAS Mission - Multimodal Information Access and Synthesis

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
Multimodal Information Access
and
Synthesis
A DHS Institute of Discrete Science Center @ UIUC
Dan Roth
Department of Computer Science
University of Illinois at Urbana/Champaign
MIAS Mission

Most of the data today is unstructured


books, newspaper articles, journal publications, field reports, images, and
audio and video streams.
How to deal with the huge amount of unstructured data as if it was
organized in a database with a known schema.

how to locate, access, organize, analyze and synthesize unstructured data.
MIAS Mission:

develop the theories, algorithms, and tools for analysts to



access a variety of data formats and models
integrate them with existing resources
transform raw data into useful and understandable information.
MIAS
2
Multi-Modal Information Access & Synthesis
Task Perspective

In the next decade, intelligence analysts will need to



monitor a huge number of interesting events and entities
formulate and evaluate hypotheses with respect to them.
Analysts must interact, at the appropriate level of semantic
abstraction, with a system that can



synthesize, summarize and interpret vast amounts of multimodal
information,
integrate observed data with domain models and background
information in multiple formats,
propose hypotheses, and help verify them.
MIAS
3
Multi-Modal Information Access & Synthesis
Scenario

Consider an intelligence analyst researching a problem




Iranian nuclear program – generate a list of Iranian nuclear scientists,
affiliations, specialties, biographies, photos, and notable recent activities.
Medical treatment – what is known about it; who are the experts; what
do users say about it; what side effects have been reported
Current technologies have solved the problem of collecting and
storing huge amounts of information; it would be reasonable
to assume that the information she is after does exist;
However, multiple barriers exist on the way to a successful
completion of the analysis, each posing a significant research
challenge.
MIAS
4
Multi-Modal Information Access & Synthesis
Multimodal Information Access & Synthesis
Online Data
Sources
Saeed
Zakeri
Web pages
**
*
*
Text documents
News
Articles
Field reports
Specific
Web sites
Text
Repositories
Relational
databases
Surveillance
Videos
...
* *
*
**
Relational data
Images
MIAS
Discover unusual
events, entities, and
U. of
Tehran associations.
Elkhan
Continuous
Factory
monitoring of events,
entities & associations
visited
**
*
Rapid retrieval of all
info. about a
particular entity
loc = Northern Iran
name = Elkhan Factory
topics = fertilizer, enrichment
Semantic categories
Temporal categories
Subjectivity/Opinions
Infer Metadata:
Semantic entities
Focused Multimodal
Data Retrieval
**
*
Elkhan
Factory
Saeed
Zakeri
attended
Efficient search,
querying, question
answering, browsing,
mining,
Discover Relations
Between Semantic Entities
Semantic Disambiguation &
Integration across multiple
sources and modalities
Support Information
Analysis, Knowledge
Discovery, Monitoring
5
Multi-Modal Information Access & Synthesis
MIAS Processes

Focused data retrieval and integration


Semantic data enrichment




Identify real-world entities and relations among them
Relate them to existing institutional resources for information integration
Knowledge discovery and hypotheses generation and verification


Infer semantics from unstructured data and images;
Allow navigation and search across disparate data modalities; augment KBs
Entity identification and relations discovery


Identify and collect relevant data from multiple sources
Construct the rich semantic structure and hidden networks of entity linkages
Foundations


Machine learning, database and data mining, natural language processing,
inference and optimization and computer vision techniques
Required for and driven by the aforementioned problems.
MIAS
6
Multi-Modal Information Access & Synthesis
Integrated Mission: Research & Education


Develop diverse human resources to enhance the scientific
research, educational, and governmental workforce in MIAS
Educational and Outreach Initiatives:




Target students from small research programs & minority-serving
Expose them to the national labs
Open opportunities for bigger impact
A comprehensive education program designed to increase
participation in the study and practice of MIAS topics:



Provide substantive training for a new generation of experts in the field,
Serve as a tool for recruiting an experienced group of undergraduates
into graduate study in one of the broad fields of information science
Be an intellectual community center, where participants at all levels of
expertise come together in an enriched environment of collaboration.
MIAS
7
Multi-Modal Information Access & Synthesis
Data Science Summer Institute at UIUC
Intensive Course
1st DSSI: May 2007
Research Projects
in
The Math of Data
Sciences
1.
Probability and
Statistics
2.
Linear Algebra
3.
Data Structures
and Algorithms
4.
Optimization
5.
Learning &
Clustering
Led by co-PIs and
Grad Students
Huge Success
Topics:
8 weeks course
Mathematical Foundations
of Information Science
25 studentsResearch
1.
Virtual Web:
Focused Crawling
2.
Relations and
Entities
3.
Text and Images
Research
Faculty
from UIUC, Kansas
State,
Tutorial Tutorial Tutorial
Tutorial Tutorial
UTEP4
1
2UTSA,
3
5
4 weeks
4 weeks
Advanced MIAS Related Tutorials
Speaker Series
MIAS
8
Multi-Modal Information Access & Synthesis
Data Science Summer Institute at UIUC
Advanced MIAS Related Tutorials
Mathematical Foundations
of Information Science
Research
Research
Databases &
Information
Integration
Tutorial
1
Tutorial
3
4 weeks
Natural
Language
Processing &
Information
Extraction
MIAS
Tutorial
2
Tutorial
4
Machine
Learning &
Data Mining
Tutorial
5
4 weeks
Computer
Vision
Information
Retrieval &
Web
Information
Access
9
Multi-Modal Information Access & Synthesis
Data Science Summer Institute at UIUC
MIAS
10
Multi-Modal Information Access & Synthesis
Our Team


Leading researchers in intelligent information access & analysis
and its foundations
A large number of affiliates/consultants covering all areas of
interest to the MIAS center.
MIAS
11
Multi-Modal Information Access & Synthesis
Kevin C. Chang:
Data Integration and Retrieval
Deep Web
MetaQuerier: Large-scale
integration over deep Web





pioneered the “holistic
integration” paradigm
Widely published at SIGMOD, VLDB, ICDE
System building and demo at CIDR, SIGMOD, VLDB, …
PC/editor/organizers of SIGMOD, ICDE, WWW, SIGKDD special
issue, WIRI06, IIWeb06 workshops
Awards: NSF CAREER, IBM Faculty, NCSA Faculty VLDB00 Best
Paper Selection
MIAS
12
Multi-Modal Information Access & Synthesis
AnHai Doan:
Data Integration








Data integration
Matching Schemas, Ontologies, Entities
Integrating databases and text
ACM Doctoral Dissertation Award
2004
Monitoring People and Events
Sloan Fellowship 2006
Edited special issues on Data
Integration
Co-chaired workshops on data
integration, Web technologies,
machine learning
Co-writing a book titled “Data
Integration”
MIAS
13
Multi-Modal Information Access & Synthesis
David Forsyth:
Computer Vision and Learning
Linking Text and Images
Labeling images via (a lot of)
caption text





MIAS
Leading Computer Vision researcher,
Over 110 papers on vision, graphics,
learning applications
Program Chair, CVPR 2000, General
chair CVPR 2006, regular member of PC
in all major vision conferences
IEEE Technical Achievement Award,
2006
Lead author of main textbook, widely
adopted
14
Multi-Modal Information Access & Synthesis
Jiawei Han:
Data Mining







Mining patterns and knowledge discovery from
massive data
Research focus: Mining data streams, frequent
patterns, sequential patterns, graph patterns, and
their applications
Privacy preserving Data Mining
Developed many popular data mining
algorithms, e.g., FPgrowth, PrefixSpan,
gSpan, StarCubing, CrossMine,
RankingCube, and CrossClus
Over 300 research papers published in
conferences and journals
Editor-in-Chief, ACM Transactions on
Knowledge Discovery from Data
Textbook, “Data mining: Concepts and
Techniques,” adopted worldwide
MIAS
15
Multi-Modal Information Access & Synthesis
Cinda Heeren:
Data Mining, Education


MIAS Summer School Director
Mathematical Foundation of Data Science Discrete
UIUC CS Department Director of Diversity Programs

Lecturer for Discrete Math, Data Structures courses at UIUC

One of the leading teachers and educational leaders at
UIUC.

Research in Data Mining and Data bases

Speaker and regular presenter at conferences for young
women, including:




GAMES and WYSE summer camps
Expanding Your Horizons careers conference
Grace Hopper Celebration of Women in Computing 2004
- panel on best practices for recruiting women into
undergraduate programs in CS.
SIGCSE 2006 - workshop, “How to host your own Small
Regional Celebration of Women in Computing.”
MIAS
16
Multi-Modal Information Access & Synthesis
Summer School
ChengXiang Zhai:
Information Retrieval and Text Mining




Probabilistic Paradigms for Information Retrieval
Personalized/Context Dependent Search
Relation Identification and Data Integration
Leading expert in information retrieval and
search technologies

Recipient of the 2004 Presidential Early Career
Award for Scientists and Engineers (PECASE),
Main architect and key contributor of the
Lemur Toolkit (being used by many research
groups and IR companies around the world)

ACM SIGIR’04 best paper award


Selected services include Program Chair of ACM
CIKM 2004; HLT/NAACL 06; SIGIR’09
MIAS
17
Multi-Modal Information Access & Synthesis
Dan Roth:
Machine Learning, NLP









Semantic Analysis and Data enrichment
Entity and Relation Identification and Integration
Textual Entailment
Machine Learning Methods for NLP and IE
Leading Researcher in Machine Learning, NLP, AI
Developed Popular Machine Learning system and
machine learning based NLP tools used in
industry and NLP classes.
Program Chair, ACL’03, CoNLL’02; Regular
senior PC member in all major Machine
Learning, NLP and AI conferences
Associate Editor: Journal of Artificial Intelligence
Research; Machine Learning
Multiple papers awards
MIAS
18
Multi-Modal Information Access & Synthesis
Information Access & Synthesis Processes



Identify and collect relevant data from multiple sources
Semantic data enrichment,

Knowledge discovery and hypotheses generation and verification,
Machine
Construct the rich semantic structure and hidden networks of entity linkages
Learning &
Data Mining
Foundations

Machine learning, database and data mining, natural language processing,
Integrating
inference and optimization and computer vision techniques
Text &

Called for and driven by the aforementioned problems.
Images


Text
Processing&
Analysis
Infer semantics from unstructured data and images;

Allow navigation and search across disparate data modalities; augmentSemantic
KBs
Analysis &
Entity identification and relations discovery,
Information
Extraction

Identify real-world entities and relations among them

Relate them to existing institutional resources for information integration
Information
Integration


Tools
Focused data retrieval and integration,
MIAS
19
Multi-Modal Information Access & Synthesis