Data Mining for the Interconnected and Mobile-Centric World
December 19, 2014
Jae-Gil Lee
Dept. of Knowledge Service Engineering, KAIST
KIISE (한국정보과학회) Winter Conference, Special Session

Brief Bio
• Career
  – Dec. 2010 to present: Assistant/Associate Professor, Dept. of Knowledge Service Engineering, KAIST
  – Sept. 2008 to Nov. 2010: Researcher, IBM Almaden Research Center
  – July 2006 to Aug. 2008: Postdoctoral Researcher, University of Illinois at Urbana-Champaign
• Research areas
  – Spatio-temporal data mining (trajectory and traffic data)
  – Social network and graph data mining
  – Big data analytics (MapReduce and Hadoop)
• Contact
  – E-mail: [email protected]
  – Homepage: http://dm.kaist.ac.kr/jaegil/

Mobile-Centric World (1/2)
• Various mobile / wearable devices
• Implications
  – Trajectory data
  – Human behavior data
  – …

Mobile-Centric World (2/2)
[Figures: Daily Routes; Activities (Pedometer)]

Interconnected World (1/2)
• Social networks
  – Facebook, Twitter, LinkedIn, …
  – Wiring the whole world
    • Most humans (85%) already have Internet access
• Implications

Interconnected World (2/2)
• Internet-of-Things [figure label: Connected]
• Implications
  – Will collect huge amounts of data
    • 50 billion connected devices by 2020 (source: Cisco)
    • 1 GB per mobile user per day by 2020 (source: Nokia)

My Research Directions
• Scaling up algorithms to cope with Big Data
• Improving the knowledge quality by combining multiple data sources
• Modeling human behaviors precisely from human activity data
[Figure labels: Trajectory Data; Social Network Data]

Multi-Layer Networks
• Consist of multiple layers of weighted graphs
  – Reflecting the multiple types of relationships between persons
  – e.g., AUCS data set
    [Figure panels: (a) Work. (b) Lunch. (c) Facebook. (d) Friend. (e) Coauthor.]
    • 61 employees of a university department
    • Five different aspects (layers)

Community Detection in ML Networks
• Should reflect the structures of all the layers
  – e.g., [Figure: A Multi-Layer Graph; Three Clusters]
Source: Xiaowen Dong, Pascal Frossard, Pierre Vandergheynst, Nikolai Nefedov: Clustering on Multi-Layer Graphs via Subspace Analysis on Grassmann Manifolds. IEEE Transactions on Signal Processing 62(4): 905-918 (2014)

Algorithm Insensitivity
[Figure: A Multi-Layer Graph → Differential Flattening for Multi-Layer Graphs → any community detection algorithm for single graphs (Algorithm 1, Algorithm 2, …, Algorithm n)]

Differential Flattening [Kim et al., under review]
• Making a flattened graph from a multi-layer graph with an optimal set of layer coefficients $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$
• Tries to maximize the clustering coefficient of the flattened graph
• Flattened edge weight: $\alpha_1 \cdot w_{ij}^{(1)} + \alpha_2 \cdot w_{ij}^{(2)} + \cdots + \alpha_m \cdot w_{ij}^{(m)}$, where the superscript indexes the layer

Location-Based Questions
• Informally defined as “search for a business or place of interest that is tied to a specific geographical location”
• Very popular, especially in mobile search, and typically subjective
  – About 10% of 10 million Bing mobile queries were identified as location-based questions
  – In a set of location-based questions, 63% were non-factual and the remaining 37% were factual
→ Mobile social search is the best way to process location-based questions

Glaucus: A Social Search Engine for Location-Based Questions [Choy et al., 2014]
1. Asking a question to Glaucus
2. Selecting proper experts
3. Routing the question to the experts
4. Returning an answer to the questioner
5. (Optional) Rating the answer
[Figure labels: 1: Query; 2: Selected Experts; 3: Query; 4: Answer; 5: Feedback; Glaucus; Social Search Engine; Answer; User Database; Crawling; Questioner; Users]

User Interface
• An Android app has been developed and is under (closed) beta testing
[Screenshots: Questioner; Answerer]

Data Collection
• Collects who visited where and when from geosocial networking services such as Foursquare
  – Users check in to a venue and may also leave a tip
  – Our crawler collects such information upon user approval

Expert Finding
[Figure: Location Aspect Model. Labels: Venue, Location, Category, Time, Misc. (each appearing twice); Similarity Calculation; Other Users; Questioner; Online; Friend?; Score; Top-k]

Mobile User Availability
• Motivation
• Study methodology
[Figure: Context (Smart Phone Log; External Information (Time, Date)) → 26 Features + Class Label → Classifier Training (Decision Tree, SVM, Random Forest, …) → Availability Classification Model → Prediction (26 Features → Availability?)]

User Behavior Collection
Category | Data type | Collection method
Context data | Battery (remaining level, charging status, charging mode) | Background collection
Context data | Calls (start time, duration, incoming/outgoing) | Background collection
Context data | Messages (time, incoming/outgoing) | Background collection
Context data | GPS (latitude, longitude) | Background collection
Context data | Smartphone device state (vibrate mode, silent mode, airplane mode, CPU usage, headphone mode, screen on) | Background collection
Context data | Surroundings (ambient light level, ambient noise level) | Background collection
Context data | WIFI (On/Off, SSID, signal strength) | Background collection
Context data | Cellular (On/Off, signal strength) | Background collection
Context data | Applications (application name, application running time) | Background collection
Availability data | Whether the user can respond at a given time | Direct input

Preliminary Results
• Accuracy
  – 10-fold cross validation
  – 10 users for 5 weeks

Model | Accuracy
Baseline (Always Available) | 0.53
Naïve Bayesian | 0.66
SVM | 0.64
KNN | 0.62
Decision Tree | 0.64
Adaboost | 0.61
Random Forest | 0.7

• Important features
  – 1st: Time, Day of Week
  – 2nd: Running Apps
  – 3rd: WIFI SSID, # of Apps (30 mins), Time of Day
[Chart: axis from 0 to 10]

Wearable Cameras
• Narrative Clip http://getnarrative.com/
  – 8 GB flash storage (4000 pictures)
  – 5.0 megapixel camera
  – GPS, accelerometer, magnetometer
  – JPEG 2560 x 1920
  – 2 pictures every minute
[Figure: Example of Pictures Taken]

Cognitive Therapy (인지치료, 人知治療)
• Effects
  – Stimulation-oriented treatments
    • e.g., SenseCam
• Study methodology
[Figure: Quizzes (“What happened today?”, “What happened first?”, each with photos 1 to 4) → Participation → User Feedback Analysis → Evolve the selection procedure (emphasizing wrong answers)]

System Architecture [Lee et al., Korean patent applied]
[Figure labels: Daily Photos; Image Feature Extraction; Face Detection; Location; Weather; General Rules; Personalized Rules; Personalized Rule Evolution; Photo Selection and Quiz Generation; Selected Photos; Quizzes; Feedback]
Source: 김대훈, 최민수, 김대훈, 이재길, “A Wearable Camera Image Data Processing System for the Treatment and Prevention of Dementia and Senile Forgetfulness” (in Korean), KIISE Winter Conference (한국정보과학회 동계학술대회), December 18, 2014.
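
The cognitive therapy slides say that the selection procedure evolves by emphasizing wrong answers, but they do not spell out the procedure. The following is a minimal sketch of one possible reading, not the authors' method: each photo gets a sampling weight that grows with the number of wrong quiz answers recorded for it, and quiz photos are drawn without replacement in proportion to those weights. The function names `selection_weights` and `select_photos`, the `boost` parameter, and the sample history are all hypothetical.

```python
import random


def selection_weights(wrong_counts, boost=1.0):
    """Map photo id -> sampling weight; more wrong answers gives a larger weight.

    wrong_counts: dict of photo id -> number of wrong quiz answers so far.
    boost: hypothetical knob for how strongly wrong answers are emphasized.
    """
    return {photo: 1.0 + boost * wrong for photo, wrong in wrong_counts.items()}


def select_photos(weights, k, rng):
    """Draw up to k distinct photos, each draw proportional to remaining weights."""
    remaining = dict(weights)
    chosen = []
    for _ in range(min(k, len(remaining))):
        photos = list(remaining)
        pick = rng.choices(photos, weights=[remaining[p] for p in photos], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # without replacement: a photo appears at most once
    return chosen


# Hypothetical feedback history: photo id -> wrong answers recorded so far.
history = {"photo_a": 0, "photo_b": 3, "photo_c": 1, "photo_d": 0}
weights = selection_weights(history, boost=2.0)
quiz_photos = select_photos(weights, k=2, rng=random.Random(0))
```

Keeping a base weight of 1.0 means photos that were always answered correctly can still reappear occasionally, which is one way to avoid quizzing only on failures; a real system would presumably tune this against user feedback.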

User Feedback Analysis
• OpenCV ⇒ Chroma Histogram
• Caffe ⇒ Entities, Background
• Face++ ⇒ Persons
• 기상청 (Korea Meteorological Administration) ⇒ Weather
• Location
• Date, Time
→ Association Rules (with Support and Confidence) + User Feedback (Answers)

Tools
• OpenCV: http://opencv.org/
• Caffe: http://caffe.berkeleyvision.org/
• Face++: http://www.faceplusplus.com/
• 기상청 Open API: http://www.kma.go.kr/weather/lifenindustry/sevice_rss.jsp

Concluding Remarks

References
• [Lee et al., 2015] Lee, J., Han, J., and Li, X., "A Unifying Framework of Mining Trajectory Patterns of Various Temporal Tightness," IEEE Trans. on Knowledge and Data Engineering (TKDE), accepted (SCI Core, impact factor: 1.815).
• [Lee et al., 2014] Lee, J. et al., "Joins on Encoded and Partitioned Data," In Proc. 40th Int'l Conf. on Very Large Data Bases (VLDB) / Proc. of the VLDB Endowment (PVLDB), Vol. 7, No. 13, pp. 1355-1366 (top conference, industrial track).
• [Choy et al., 2014] Choy, M., Lee, J., Gweon, G., and Kim, D., "Glaucus: Exploiting the Wisdom of Crowds for Location-Based Queries in Mobile Environments," In Proc. 8th Int'l AAAI Conf. on Weblogs and Social Media (ICWSM), Ann Arbor, Michigan, pp. 61-70, June 2014 (acceptance rate: 23.0%).
• [Lim et al., 2014] Lim, S., Ryu, S., Kwon, S., Jung, K., and Lee, J., "LinkSCAN*: Overlapping Community Detection Using the Link-Space Transformation," In Proc. 30th Int'l Conf. on Data Engineering (IEEE ICDE), Chicago, Illinois, pp. 292-303, Apr. 2014 (top conference, acceptance rate: 20.0%).
• [Moon et al., 2014] Moon, S., Lee, J., and Kang, M., "Scalable Community Detection from Networks by Computing Edge Betweenness on MapReduce," In Proc. 2014 Int'l Conf. on Big Data and Smart Computing (BigComp), Bangkok, Thailand, pp. 145-148, Jan. 2014. This paper received the Best Paper Award.
• [Sung et al., 2013] Sung, J., Lee, J., and Lee, U., "Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services," In Proc. 7th Int'l AAAI Conf. on Weblogs and Social Media (ICWSM), Cambridge, Massachusetts, pp. 602-610, July 2013 (acceptance rate: 20.6%). This paper received the Best Paper Award.
• [Lee et al., 2012] Lee, S., Ko, M., Han, K., and Lee, J., "On Finding Fine-Granularity User Communities by Profile Decomposition," In Proc. 2012 IEEE/ACM Int'l Conf. on Advances in Social Networks Analysis and Mining (ASONAM), Istanbul, Turkey, pp. 631-639, Aug. 2012 (full paper, acceptance rate: 16.0%).
• [Lee et al., 2011] Lee, J., Han, J., Li, X., and Cheng, H., "Mining Discriminative Patterns for Classifying Trajectories on Road Networks," IEEE Trans. on Knowledge and Data Engineering (TKDE), Vol. 25, No. 5, pp. 713-726, May 2011 (SCI Core, impact factor: 1.657, 5-year impact factor: 2.084).
• [Kim et al., 201?] Kim, J., Lee, J., and Lim, S., "Differential Flattening: A Novel Framework for Community Detection in Multi-Layer Graphs," submitted to ACM Trans. on Intelligent Systems and Technology (TIST).
• [Lee et al., 201?] Lee, J. and Kang, M., "Device of Data Processing for Preventing Dementia and Method of Data Processing for Preventing Dementia Using the Same," Korean Patent, Application No: 10-2014-0079108, Application Date: June 26, 2014.
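
The User Feedback Analysis slide names association rules with support and confidence but shows no computation. As a minimal sketch using the standard definitions (the support of a rule X ⇒ Y is the fraction of records containing both X and Y; the confidence is the number of records containing both divided by the number containing X), the example below scores one rule over a handful of made-up photo records. The function name `support_and_confidence` and the item labels such as `person:family` are hypothetical and are not taken from the system described in the slides.

```python
def support_and_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    records: list of sets of items.
    antecedent, consequent: iterables of items forming each side of the rule.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    # Records that contain every item of the antecedent.
    n_antecedent = sum(1 for r in records if antecedent <= r)
    # Records that contain every item of both sides.
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical records: extracted photo features plus the quiz outcome.
records = [
    {"person:family", "weather:sunny", "answer:correct"},
    {"person:family", "weather:rain", "answer:correct"},
    {"person:none", "weather:sunny", "answer:wrong"},
    {"person:family", "weather:sunny", "answer:wrong"},
]
support, confidence = support_and_confidence(
    records, {"person:family"}, {"answer:correct"}
)
```

Here two of the four records contain both sides of the rule, and three contain the antecedent, so the rule has support 2/4 and confidence 2/3.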