Massive Data Analysis Lab
(MassDAL)
S. Muthukrishnan
CS Dept
MassDAL
• Agenda: Gather, manage and process massive data logs: Web, IP/wireless traffic data, location trajectories of objects,
sensor readings of the physical world.
• Key Challenges:
– Scale: Beyond the traditional “human” scale. E.g., IP data at a
single router interface for an hour exceeds total yearly worldwide
credit card transactions!
– Data Collection: probes/sensors with associated data quality and
communication problems.
• Need breakthroughs in Mathematics, Algorithms, Systems
and Engineering, to meet these challenges.
• Potential: Major impact in Homeland Security, Telecom,
Transportation and Society-at-large.
State of MassDAL
• Mathematics and Computer Science.
– Algorithmic tools for embedding vectors, strings, trees and
other objects for “compact” representation.
– Algorithmic tools for analyzing data summaries for heavy
hitters, deviants, clustering, decision trees, etc.
– Invited talks at ACM, SIAM, European conferences in
Algorithms, Databases, Statistics, and Data Mining on
novel models and algorithms.
– Over a dozen research papers in the last 2 years on experience
with massive data analysis.
– Supported by NSF grants. Partners: MIT, DIMACS.
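To make the "heavy hitters" item above concrete, here is a minimal sketch of one standard way to find frequent items in a single pass with bounded memory, the Misra-Gries summary. This is not necessarily the method used in MassDAL's own tools; the function name `misra_gries`, the parameter `k`, and the toy stream are assumptions chosen for illustration.

```python
def misra_gries(stream, k):
    """Hypothetical helper: one-pass frequent-items summary.

    Keeps at most k - 1 counters. Any item occurring more than n / k
    times in a stream of length n is guaranteed to survive in the
    result; stored counts underestimate true counts by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Toy stream (illustrative data only): "a" dominates the log.
stream = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
candidates = misra_gries(stream, k=3)
print(candidates)
```

The summary returns candidates only; a second pass over the data (when one is available) would confirm their exact counts.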
State of MassDAL
• Science
– Developing wearable sensors for tracking location
of objects as well as “interactions” between
objects. Measuring behavioral data.
– Current partner: Telcordia. Their initial
investment: $300k/3 months (est). Potential partner
in the works: Los Alamos National Lab.
– Potential: Analysis of social networks for
Epidemiology and Homeland Security, and health
industry.
State of MassDAL
• Engineering.
– Consulting in analysis of wireless network logs.
AT&T Wireless, 3rd largest in the US, 20 million
customers. Terabytes/month. Fully operational, telco-grade!
– Incorporated novel algorithms in operational IP
network data analysis tools. Partner: Gigascope.
– Developed principled approach to data cleaning and
data quality monitoring for operational IP network.
Partner: PACMAN.
– Developed new burst-detection algorithms for text
streams. Partner: DIMACS, Monitoring message
streams.
Future
• See
http://cs.rutgers.edu/~muthu/massdal.html
Future of MassDAL
• Research: Need breakthrough research in mathematics,
systems, databases, algorithms, sensor networking.
• Expand data domains.
– Potential partners: Google, NJ auto insurance fraud data,
USPTO patent data, AWS location trajectories, etc.
• Build state-of-the-art facility at Rutgers.
– Secure, 24x7 data hosting and analysis infrastructure capable
of gathering and processing petabytes of data/month across
domains, data sources, etc. Unique in the world!
• Potential.
– Every wireless, telecom, internet service provider is looking to
farm out this crucial piece of their operations. Estimated
market for these services: hundreds of millions of US dollars per year.
Crucial for NJ State. Interest from multiple VCs now.