CMU SCS
Big (graph) data analytics
Christos Faloutsos
CMU
CONGRATULATIONS!
CMU SCS IC '14
C. Faloutsos
Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly detection
• Conclusions
Q+A
• Are you recruiting? How many?
– 1 or 2
• How many do you have?
– 6 (+5 postdocs)
• How frequently do you meet them?
– 1/week
• What is your advising style?
– results
• How do you feel about summer internships?
– Yes/Maybe (FB, MSR, IBM, ++)
Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly detection
• Conclusions
Motivation
• Data mining: ~ find patterns (rules, outliers)
• What do real graphs look like? Anomalies?
• Time series / Monitoring
Measles @ PA, NY, …
Graphs - why should we care?
• Food Web [Martinez ’91]
• ~1B users, $10-$100B revenue
• Internet Map [lumeta.com]
Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly detection
• Conclusions
NELL & concepts (=groups)
• Predicates (subject, verb, object) in knowledge base
– “Eric Clapton plays guitar”
– “Barack Obama is the president of U.S.”
• NELL (Never Ending Language Learner) data
– Mode sizes (figure labels): 48M, 26M, 26M
– Nonzeros = 144M
Vagelis Papalexakis (CMU-CS), Tom Mitchell (CMU/CS-MLD)
Answer: tensor factorization
• Recall: (SVD) matrix factorization: finds blocks
[Figure: N users x M products matrix ~ sum of three blocks; user groups ‘meat-eaters’, ‘vegetarians’, ‘kids’; product groups ‘steaks’, ‘plants’, ‘cookies’]
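A minimal sketch of how SVD "finds blocks", assuming a toy users x products matrix with three disjoint groups. The block sizes, the 0/1 entries and the variable names are illustrative assumptions, not data from the talk; the pairing of user groups with product groups is likewise only for illustration.

```python
import numpy as np

# Hypothetical toy data: three disjoint (users x products) blocks of ones.
# Sizes are chosen so that the three singular values are distinct.
blocks = [(4, 2), (3, 3), (2, 2)]  # e.g. meat-eaters/steaks, vegetarians/plants, kids/cookies
n_users = sum(u for u, _ in blocks)
n_products = sum(p for _, p in blocks)

A = np.zeros((n_users, n_products))
row = col = 0
for u, p in blocks:
    A[row:row + u, col:col + p] = 1.0
    row += u
    col += p

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3  # number of blocks we look for (assumed known here)
# Assign each user / product to the component where it has the largest weight.
user_group = np.abs(U[:, :k]).argmax(axis=1)
product_group = np.abs(Vt[:k, :]).argmax(axis=0)

print("top singular values:", s[:k])
print("user groups:   ", user_group)
print("product groups:", product_group)
```

On real data the blocks are noisy and overlapping, so the assignment step would typically need thresholding or clustering rather than a plain argmax.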
Answer: tensor factorization
• PARAFAC decomposition
[Figure: tensor with subject and object modes = sum of three components: ‘artists’ + ‘politicians’ + ‘athletes’]
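A minimal sketch of a PARAFAC (CP) decomposition computed with plain alternating least squares, assuming a small dense toy tensor. This is a textbook ALS loop, not the scalable method used for the NELL data; the function names, the toy tensor and the iteration count are assumptions for illustration.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product: result has shape (J*K, R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def cp_als(X, rank, n_iter=200, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    # Mode unfoldings, each consistent with the row ordering of khatri_rao above.
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        # Each step solves a linear least-squares problem for one factor.
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C


# Hypothetical toy tensor: two disjoint dense blocks, i.e. two "concepts".
X = np.zeros((6, 5, 4))
X[:3, :2, :2] = 1.0
X[3:, 2:, 2:] = 1.0

A, B, C = cp_als(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print("relative reconstruction error:", rel_err)
```

Each column triple (A[:, r], B[:, r], C[:, r]) plays the role of one group in the figure: the large entries in each column indicate which subjects, objects and third-mode items belong to that component.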
Answer: tensor factorization
• PARAFAC decomposition
• Results for who-calls-whom-when
– 4M x 15 days
[Figure: tensor with caller and callee modes = ?? + ?? + ??]
Concept Discovery
• Concept Discovery in Knowledge Base
NP1: Internet, file, data
NP2: Protocol, software, suite
Neuro-semantics
• Brain Scan Data*
– 9 persons
– 60 nouns
• Questions
– 218 questions
– ‘is it alive?’, ‘can you eat it?’
*Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science, 2008. Data@ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html
Patterns?
[Figure: data indexed by nouns (airplane, …, dog), questions, and voxels]
Small items -> Premotor cortex
Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014
Scalability
• Google: > 450,000 processors in clusters of ~2000
processors each [Barroso+, “Web Search for a Planet:
The Google Cluster Architecture” IEEE Micro 2003]
• Yahoo: 5Pb of data [Fayyad, KDD’07]
• Google-NY, Aug’14: ‘graph with 1T edges, 300B
nodes’
• Problem: machine failures, on a daily basis
• How to parallelize data mining tasks, then?
• A: map/reduce – Hadoop (open-source clone): http://hadoop.apache.org/
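A minimal single-machine sketch of the map/reduce pattern, assuming the task is counting node degrees from an edge list. It only mimics the three phases (map, shuffle, reduce) in plain Python and does not use the Hadoop API; all function names and the toy edge list are hypothetical.

```python
from collections import defaultdict


def map_edge(edge):
    """Map phase: emit (node, 1) for both endpoints of an edge."""
    u, v = edge
    yield (u, 1)
    yield (v, 1)


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_degree(node, counts):
    """Reduce phase: sum the counts for one node."""
    return node, sum(counts)


def degree_count(edges):
    mapped = (pair for edge in edges for pair in map_edge(edge))
    return dict(reduce_degree(node, counts) for node, counts in shuffle(mapped).items())


# Hypothetical toy graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = degree_count(edges)
print(degrees)
```

Because each map call looks at one edge and each reduce call looks at one key, both phases can be spread over many machines, and a failed task can simply be re-run on its own input.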
Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly/fraud detection
• Conclusions
App-store fraud
Opinion Fraud Detection in Online Reviews using Network Effects
Leman Akoglu, Rishi Chandy, Christos Faloutsos, ICWSM’13
(NSF grant, with Alex Beutel)
Problem
• Given
– user-product review network
– review sign (+/-)
• Classify
– objects into type-specific classes:
users: ‘honest’ / ‘fraudster’
products: ‘good’ / ‘bad’
reviews: ‘genuine’ / ‘fake’
No side data! (e.g., timestamp, review text)
Formulation: BP
[Figure, before / after: users (honest, honest), products (bad, good), review signs (–, +)]
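A minimal sketch of loopy belief propagation (BP) on a signed user-product review network, assuming two states per node and uniform priors. The compatibility values in `PSI`, the value of `EPS`, the function names and the toy edge list are illustrative assumptions and are not claimed to match the settings of the cited paper.

```python
import numpy as np

EPS = 0.1  # assumed, illustrative value
# Rows: user state (0 = honest, 1 = fraudster).
# Columns: product state (0 = good, 1 = bad).
PSI = {
    "+": np.array([[1 - EPS, EPS], [2 * EPS, 1 - 2 * EPS]]),
    "-": np.array([[EPS, 1 - EPS], [1 - 2 * EPS, 2 * EPS]]),
}
UNIFORM = np.array([0.5, 0.5])


def normalize(v):
    return v / v.sum()


def run_bp(edges, n_iter=20):
    """edges: list of (user, product, sign), with unique (user, product) pairs."""
    sign = {(u, p): s for u, p, s in edges}
    to_prod = {e: UNIFORM.copy() for e in sign}  # user -> product messages
    to_user = {e: UNIFORM.copy() for e in sign}  # product -> user messages
    for _ in range(n_iter):
        new_to_prod, new_to_user = {}, {}
        for (u, p), s in sign.items():
            # Message u -> p: combine what u hears from its other products.
            inc = UNIFORM.copy()
            for (u2, p2) in sign:
                if u2 == u and p2 != p:
                    inc = inc * to_user[(u2, p2)]
            new_to_prod[(u, p)] = normalize(inc @ PSI[s])
            # Message p -> u: combine what p hears from its other users.
            inc = UNIFORM.copy()
            for (u2, p2) in sign:
                if p2 == p and u2 != u:
                    inc = inc * to_prod[(u2, p2)]
            new_to_user[(u, p)] = normalize(PSI[s] @ inc)
        to_prod, to_user = new_to_prod, new_to_user
    # Beliefs: prior times all incoming messages, then normalize.
    user_belief, prod_belief = {}, {}
    for (u, p) in sign:
        user_belief[u] = user_belief.get(u, UNIFORM) * to_user[(u, p)]
        prod_belief[p] = prod_belief.get(p, UNIFORM) * to_prod[(u, p)]
    user_belief = {u: normalize(b) for u, b in user_belief.items()}
    prod_belief = {p: normalize(b) for p, b in prod_belief.items()}
    return user_belief, prod_belief


# Hypothetical toy network: u3 disagrees with u1 and u2 on both products.
edges = [
    ("u1", "p1", "+"), ("u2", "p1", "+"), ("u3", "p1", "-"),
    ("u1", "p2", "-"), ("u2", "p2", "-"), ("u3", "p2", "+"),
]
users, products = run_bp(edges)
for name, belief in sorted(users.items()):
    print(name, "P(fraudster) =", round(float(belief[1]), 3))
for name, belief in sorted(products.items()):
    print(name, "P(bad) =", round(float(belief[1]), 3))
```

Only the network and the review signs enter the computation, in line with the "no side data" setting; ranking users by their fraudster belief gives a list of candidates to inspect.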
[Figure: users vs. products, top scorers; + positive (4-5) rating, o negative (1-2) rating]
‘Fraud-bot’ member reviews
• Same developer!
• Duplicated text!
• Same day activity!
Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly/fraud detection
• Time series, monitoring / forecasting
• Conclusions
‘Tycho’ – epidemics analysis
Prof. Yasuko Matsubara
50 states x 46 diseases
Flu?
Measles?
August?
No periodicity?
https://www.tycho.pitt.edu/resources.php
from U. Pitt (epidemiology dept.)
Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos, FUNNEL: Automatic Mining of Spatially Coevolving Epidemics, KDD 2014, New York City, NY, USA, Aug. 24-27, 2014.
Open research questions
• Patterns/anomalies for time-evolving graphs (call graph, 3M people x 6 months)
• Spot fraudsters in social networks (e.g., Twitter ‘$10 -> 1000 followers’)
• How is the human brain wired?
Contact info
• www.cs.cmu.edu/~christos
• GHC 8019
• Ph#: x8.1457
• www.cs.cmu.edu/~christos/TALKS/1409-ic/
• FYI: Course: 15-826, Tu-Th 3:00-4:20