CMU SCS
Big (graph) data analytics
Christos Faloutsos
CMU

CONGRATULATIONS!

CMU SCS IC '14 C. Faloutsos

Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly detection
• Conclusions

Q+A
• Are you recruiting? How many? 1 or 2
• How many do you have? 6 (+5 pdocs)
• How frequently do you meet them? 1/week
• What is your advising style? results
• How do you feel about summer internships? Yes/Maybe (FB, MSR, IBM, ++)

Motivation
• Data mining: ~ find patterns (rules, outliers)
• What do real graphs look like? Anomalies?
• Time series / Monitoring
Figure label: Measles @ PA, NY, …

Graphs - why should we care?
Figure labels: Food Web [Martinez ’91]; ~1B users, $10-$100B revenue; Internet Map [lumeta.com]

NELL & concepts (=groups)
• Predicates (subject, verb, object) in knowledge base
  – “Eric Clapton plays guitar”
  – “Barack Obama is the president of U.S.”
• NELL (Never Ending Language Learner) data
  – Tensor sizes as labeled in the figure: (48M), (26M), (26M)
  – Nonzeros = 144M
Vagelis Papalexakis (CMU-CS); Tom Mitchell (CMU/CS-MLD)

Answer: tensor factorization
• Recall: (SVD) matrix factorization: finds blocks
Figure labels: N users x M products matrix ~ sum of blocks; user groups ‘meat-eaters’, ‘vegetarians’, ‘kids’; product groups ‘steaks’, ‘plants’, ‘cookies’
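
The "matrix factorization finds blocks" idea can be sketched on a toy users x products matrix. This is a minimal illustration, not the method used in the talk: it uses power iteration with deflation as a dependency-free stand-in for a library SVD routine, the function names `top_singular_triplet` and `find_blocks` are hypothetical, and the purchase matrix is synthetic.

```python
import math


def top_singular_triplet(A, iters=200):
    """Approximate the leading singular triplet (sigma, u, v) of A.

    Runs power iteration on A^T A, starting from a uniform vector.
    """
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        # u = A v, then w = A^T u, so w = (A^T A) v
        u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(A[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


def find_blocks(A, k, threshold=0.1):
    """Greedily extract k (row group, column group) blocks from A."""
    R = [row[:] for row in A]  # working copy that gets deflated
    blocks = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(R)
        # Rows/columns with a large entry in u/v belong to this block
        rows = [i for i, x in enumerate(u) if abs(x) > threshold]
        cols = [j for j, x in enumerate(v) if abs(x) > threshold]
        blocks.append((rows, cols))
        # Deflate: subtract the rank-1 piece sigma * u * v^T
        for i in range(len(R)):
            for j in range(len(R[0])):
                R[i][j] -= sigma * u[i] * v[j]
    return blocks


# Synthetic 9 users x 6 products matrix (made-up data):
# users 0-3 buy products 0-1, users 4-6 buy products 2-3,
# users 7-8 buy products 4-5.
A = [[0.0] * 6 for _ in range(9)]
planted = [(range(0, 4), (0, 1)), (range(4, 7), (2, 3)), (range(7, 9), (4, 5))]
for users, products in planted:
    for i in users:
        for j in products:
            A[i][j] = 1.0

blocks = find_blocks(A, k=3)
print(blocks)
```

On this input each extracted singular-vector pair lines up with one planted user group and its product group. The block sizes were chosen to give distinct singular values, which is what lets plain power iteration separate them; real data with overlapping or noisy blocks would need a proper SVD and a less naive thresholding rule.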

Answer: tensor factorization
• PARAFAC decomposition
Figure labels: subject, object; components: artists, politicians, athletes
• Results for who-calls-whom-when
  – 4M x 15 days
Figure labels: caller, callee; components: ??, ??, ??

Concept Discovery
• Concept Discovery in Knowledge Base
  – NP1: Internet, file, data
  – NP2: Protocol, software, suite

Neuro-semantics
• Brain Scan Data*
  – 9 persons
  – 60 nouns
• Questions
  – 218 questions
  – ‘is it alive?’, ‘can you eat it?’
• Patterns?
Figure labels: nouns (airplane, …, dog) x questions; nouns x voxels
*Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science, 2008.
Data @ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html

Neuro-semantics
• Small items -> Premotor cortex
Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014

Scalability
• Google: > 450,000 processors in clusters of ~2000 processors each [Barroso+, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003]
• Yahoo: 5Pb of data [Fayyad, KDD’07]
• Google-NY, Aug’14: ‘graph with 1T edges, 300B nodes’
• Problem: machine failures, on a daily basis
• How to parallelize data mining tasks, then?
• A: map/reduce – hadoop (open-source clone) http://hadoop.apache.org/

Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly/fraud detection
• Conclusions

App-store fraud
Opinion Fraud Detection in Online Reviews using Network Effects
Leman Akoglu, Rishi Chandy, Christos Faloutsos, ICWSM’13
(NSF grant, with Alex Beutel)

Problem
• Given
  – user-product review network
  – review sign (+/-)
• Classify
  – objects into type-specific classes:
    users: ‘honest’ / ‘fraudster’
    products: ‘good’ / ‘bad’
    reviews: ‘genuine’ / ‘fake’
No side data! (e.g., timestamp, review text)

Formulation: BP
Figure labels: User, honest, honest; Product, –, +, bad, good; Before, After

Top scorers
Figure labels: Users; Products; + positive (4-5) rating; o negative (1-2) rating

‘Fraud-bot’ member reviews
• Same developer!
• Duplicated text!
• Same day activity!

Outline
• Q+A
• Problem definition / Motivation
• Graphs, tensors and brains
• Anomaly/fraud detection
• Time series, monitoring / forecasting
• Conclusions

‘Tycho’ – epidemics analysis
Prof. Yasuko Matsubara
• 50 states x 46 diseases
• Flu? Measles? August? No periodicity?
• Data: https://www.tycho.pitt.edu/resources.php, from U. Pitt (epidemiology dept.)
Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos, FUNNEL: Automatic Mining of Spatially Coevolving Epidemics, KDD 2014, New York City, NY, USA, Aug. 24-27, 2014.

Open research questions
• Patterns/anomalies for time-evolving graphs (Call graph, 3M people x 6mo)
• Spot fraudsters in soc-net (e.g., Twitter ‘$10 -> 1000 followers’)
• How is the human brain wired?

Contact info
• www.cs.cmu.edu/~christos
• GHC 8019
• Ph#: x8.1457
• www.cs.cmu.edu/~christos/TALKS/1409-ic/
• FYI: Course: 15-826, Tu-Th 3:00-4:20
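
Supplementary sketch for the PARAFAC slides earlier in the talk: a PARAFAC decomposition writes a 3-way tensor as a sum of rank-1 terms, and each term groups together a set of subjects, verbs and objects. The toy below is only an illustration under stated assumptions. It extracts rank-1 terms greedily by higher-order power iteration and deflation, which is a simplification; practical PARAFAC solvers usually fit all components jointly (for example by alternating least squares). The names `rank_one` and `find_concepts` are hypothetical, and the (subject, verb, object) tensor is synthetic.

```python
import math


def norm(x):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in x))


def unit(x):
    """Return x scaled to unit Euclidean norm."""
    n = norm(x)
    return [t / n for t in x]


def rank_one(X, sweeps=50):
    """Approximate one rank-1 term lam * (a outer b outer c) of a dense tensor X."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    b = [1.0 / math.sqrt(J)] * J
    c = [1.0 / math.sqrt(K)] * K
    for _ in range(sweeps):
        # Update each factor in turn, holding the other two fixed
        a = unit([sum(X[i][j][k] * b[j] * c[k]
                      for j in range(J) for k in range(K)) for i in range(I)])
        b = unit([sum(X[i][j][k] * a[i] * c[k]
                      for i in range(I) for k in range(K)) for j in range(J)])
        c_raw = [sum(X[i][j][k] * a[i] * b[j]
                     for i in range(I) for j in range(J)) for k in range(K)]
        lam = norm(c_raw)
        c = [t / lam for t in c_raw]
    return lam, a, b, c


def find_concepts(X, n_concepts, threshold=0.1):
    """Greedily extract (subjects, verbs, objects) index groups from X."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    R = [[row[:] for row in slab] for slab in X]  # working copy, deflated in place
    concepts = []
    for _ in range(n_concepts):
        lam, a, b, c = rank_one(R)
        concepts.append(tuple(
            [idx for idx, t in enumerate(vec) if abs(t) > threshold]
            for vec in (a, b, c)))
        # Deflate: remove the rank-1 term just found
        for i in range(I):
            for j in range(J):
                for k in range(K):
                    R[i][j][k] -= lam * a[i] * b[j] * c[k]
    return concepts


# Synthetic 8 subjects x 4 verbs x 7 objects tensor with three planted,
# non-overlapping concepts (made-up data).
planted = [
    ((0, 1, 2), (0, 1), (0, 1, 2)),
    ((3, 4), (2,), (3, 4)),
    ((5, 6, 7), (3,), (5, 6)),
]
X = [[[0.0] * 7 for _ in range(4)] for _ in range(8)]
for subjects, verbs, objects in planted:
    for i in subjects:
        for j in verbs:
            for k in objects:
                X[i][j][k] = 1.0

concepts = find_concepts(X, n_concepts=3)
print(concepts)
```

Each recovered tuple lists the subject, verb and object indices that load on one component, which is the sense in which a component acts as a "concept". The order in which components come out depends on the starting vectors, so the groups are best compared as a set.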