Data Mining for the Interconnected
and Mobile-Centric World
December 19, 2014
Jae-Gil Lee
Dept. of Knowledge Service Engineering
KAIST
Korean Institute of Information Scientists and Engineers (KIISE) Winter Conference, Special Session
Brief Bio
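• Career
– Dec. 2010 to present: Assistant/Associate Professor, Dept. of Knowledge Service Engineering, KAIST
– Sep. 2008 to Nov. 2010: Researcher, IBM Almaden Research Center
– Jul. 2006 to Aug. 2008: Postdoctoral Researcher, University of Illinois at Urbana-Champaign
• Research areas
– Spatio-temporal data mining (trajectory and traffic data)
– Social network and graph data mining
– Big data analytics (MapReduce and Hadoop)
• Contact
– E-mail: [email protected]
– Homepage: http://dm.kaist.ac.kr/jaegil/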
Mobile-Centric World (1/2)
• Various mobile / wearable devices
• Implications
– Trajectory data
– Human behavior data
– …
Mobile-Centric World (2/2)
Daily Routes
Activities (Pedometer)
Interconnected World (1/2)
• Social networks
– Facebook, Twitter, LinkedIn, …
– Wiring the whole world
• Most humans (85%) already have Internet access
• Implications
Interconnected World (2/2)
• Internet-of-Things
Connected
• Implications
– Will collect huge amounts of data
• 50 billion connected devices by 2020 ← CISCO
• 1GB per mobile user per day by 2020 ← Nokia
My Research Directions
• Data: Trajectory Data, Social Network Data
• Scaling up algorithms to cope with Big Data
• Improving the knowledge quality by combining multiple data sources
• Modeling human behaviors precisely from human activity data
Multi-Layer Networks
• Consist of multiple layers of weighted graphs
– Reflecting the multiple types of relationships between persons
– e.g., AUCS data set
• 61 employees of a university department
• Five different aspects (layers): (a) Work, (b) Lunch, (c) Facebook, (d) Friend, (e) Coauthor
Community Detection in ML Networks
• Should reflect the structures of all the layers
– e.g., [Figure: A Multi-Layer Graph and its Three Clusters]
Xiaowen Dong, Pascal Frossard, Pierre Vandergheynst, Nikolai Nefedov: Clustering on Multi-Layer Graphs via
Subspace Analysis on Grassmann Manifolds. IEEE Transactions on Signal Processing 62(4): 905-918 (2014)
Algorithm Insensitivity
[Figure: A Multi-Layer Graph is fed to a Community Detection Algorithm for Multi-Layer Graphs, which consists of Differential Flattening followed by any community detection algorithm for single graphs (Algorithm 1, Algorithm 2, …, Algorithm n)]
Differential Flattening
[Kim et al., under review]
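• Making a flattened graph from a multi-layer graph with an optimal set of layer coefficients $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$
– Flattened edge weight: $\alpha_1 \cdot w_{ij}^{1} + \alpha_2 \cdot w_{ij}^{2} + \cdots + \alpha_m \cdot w_{ij}^{m}$, where $w_{ij}^{k}$ is the weight of edge $(i, j)$ in layer $k$
– Tries to maximize the clustering coefficient of the flattened graph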
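The flattening step above can be illustrated with a minimal Python sketch. It is not the method of [Kim et al., under review], whose details are not given on the slide. The sketch assumes symmetric adjacency matrices, an edge threshold of 0.5 for the flattened graph, the average local clustering coefficient as the objective, and a coarse grid search over coefficients that sum to 1. The names `flatten`, `clustering_coefficient`, and `best_coefficients` are hypothetical.

```python
from itertools import product


def flatten(layers, alphas):
    """Flattened weight for each pair (i, j): sum over layers k of alpha_k * w_ij^k."""
    n = len(layers[0])
    return [
        [sum(a * layer[i][j] for a, layer in zip(alphas, layers)) for j in range(n)]
        for i in range(n)
    ]


def clustering_coefficient(weights, threshold=0.5):
    """Average local clustering coefficient after keeping edges with weight >= threshold.

    The threshold is an assumption of this sketch, not something stated on the slide.
    """
    n = len(weights)
    adj = [{j for j in range(n) if j != i and weights[i][j] >= threshold} for i in range(n)]
    total = 0.0
    for i in range(n):
        nbrs = sorted(adj[i])
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        closed = sum(1 for x in range(k) for y in range(x + 1, k) if nbrs[y] in adj[nbrs[x]])
        total += 2.0 * closed / (k * (k - 1))
    return total / n


def best_coefficients(layers, steps=4):
    """Coarse grid search over coefficients that are multiples of 1/steps and sum to 1."""
    best_alphas, best_score = None, -1.0
    for combo in product(range(steps + 1), repeat=len(layers)):
        if sum(combo) != steps:
            continue
        alphas = [c / steps for c in combo]
        score = clustering_coefficient(flatten(layers, alphas))
        if score > best_score:
            best_alphas, best_score = alphas, score
    return best_alphas, best_score


# Toy example: layer 1 holds a triangle (0, 1, 2); layer 2 holds a single edge (2, 3).
layer1 = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
layer2 = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
alphas, score = best_coefficients([layer1, layer2])
print(alphas, score)
```

In this toy case the search favours the layer that carries the triangle, since the extra edge from the second layer lowers the clustering coefficient of the flattened graph.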
Location-Based Questions
• Informally defined as “search for a business or place of
interest that is tied to a specific geographical location”
• Very popular especially in mobile search and typically
subjective
– About 10% of 10 million Bing mobile queries were identified as
location-based questions
– In a set of location-based questions, 63% were non-factual and the remaining 37% were factual
⇒ Mobile social search is the best way to process location-based questions
Glaucus: A Social Search Engine for
Location-Based Questions [Choy et al., 2014]
1. Asking a question to Glaucus
2. Selecting proper experts
3. Routing the question to the experts
4. Returning an answer to the questioner
5. (Optional) Rating the answer
[Figure: Questioner, Glaucus Social Search Engine, User Database (built by Crawling), and Users, with flows 1: Query, 2: Selected Experts, 3: Query, 4: Answer, 5: Feedback]
User Interface
• An Android app has been developed and is under
(closed) beta testing
[Screenshots: Questioner; Answerer]
Data Collection
• Being able to collect who visited where and when on
geosocial networking services such as Foursquare
– Users check in to a venue and may also leave a tip
– Our crawler collects such information upon user approval
Expert Finding
[Figure: Location Aspect Model with Venue, Location, Category, Time, and Misc. aspects for the Questioner and for Other Users; a Similarity Calculation produces per-user Scores, with an "Online Friend?" check; Top-k users are selected]
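The slide names the aspects and the top-k selection but not the scoring formula. As a rough illustration only, the sketch below swaps in a simple weighted aspect match instead of the model used in Glaucus. The `aspect_similarity` and `top_k_experts` functions, the `friend_bonus` parameter, and the sample profiles are all hypothetical.

```python
import heapq

ASPECTS = ("venue", "location", "category", "time", "misc")


def aspect_similarity(question, profile, weights):
    """Weighted share of aspects on which the question and a user's profile agree."""
    total = sum(weights[a] for a in ASPECTS)
    matched = sum(
        weights[a]
        for a in ASPECTS
        if question.get(a) is not None and question.get(a) == profile.get(a)
    )
    return matched / total


def top_k_experts(question, profiles, friends, weights, k=2, friend_bonus=0.1):
    """Score every candidate user and return the k best as (score, user) pairs."""
    scored = []
    for user, profile in profiles.items():
        score = aspect_similarity(question, profile, weights)
        if user in friends:  # the "Online Friend?" check, modelled here as a bonus
            score += friend_bonus
        scored.append((score, user))
    return heapq.nlargest(k, scored)


# Hypothetical data: equal aspect weights and three candidate users.
weights = {a: 1 for a in ASPECTS}
question = {"venue": "Cafe A", "location": "Daejeon", "category": "cafe", "time": "evening"}
profiles = {
    "alice": {"venue": "Cafe A", "location": "Daejeon", "category": "cafe", "time": "morning"},
    "bob": {"location": "Daejeon", "category": "cafe"},
    "carol": {"location": "Seoul", "category": "restaurant"},
}
experts = top_k_experts(question, profiles, friends={"bob"}, weights=weights)
print(experts)
```

Using `heapq.nlargest` avoids sorting every candidate when only a few experts are needed.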
Mobile User Availability
• Motivation
• Study methodology
[Figure: Training: Context from the Smart Phone Log and External Information (Time, Date) yields 26 Features plus a Class Label, which train a Classifier (Decision Tree, SVM, Random Forest, …). Prediction: 26 Features are passed to the Availability Classification Model, which outputs Availability]
User Behavior Collection
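• Smartphone context data (collected in the background)
– Battery: remaining level, charging status, charging mode
– Calls: start time, duration, incoming/outgoing
– Messages: time, incoming/outgoing
– GPS: latitude, longitude
– Device: vibration mode, silent mode, airplane mode, CPU usage, headphone mode, screen on
– Surroundings: ambient light level, ambient noise level
– WIFI: On/Off, SSID, signal strength
– Cellular: On/Off, signal strength
– Applications: application name, running time
• Availability data (entered directly by the user)
– Whether the user is able to respond at a given time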
Preliminary Results
• Accuracy
– 10-fold cross validation
– 10 users for 5 weeks
Model                          Accuracy
Baseline (Always Available)    0.53
Naïve Bayesian                 0.66
SVM                            0.64
KNN                            0.62
Decision Tree                  0.64
Adaboost                       0.61
Random Forest                  0.7
• Important features
– 1st: Time, Day of Week
– 2nd: Running Apps
– 3rd: WIFI SSID, # of Apps (30 mins), Time of Day
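The evaluation protocol (10-fold cross validation against an "always available" baseline) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data is synthetic with two made-up features instead of the 26 from the study, and a 1-nearest-neighbour classifier stands in for the models in the table. The function names are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(fit, features, labels, k=10):
    """Mean accuracy over k folds; `fit(X, y)` must return a predict(x) function."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


def always_available(X, y):
    """Baseline: predict "available" (1) regardless of the input."""
    return lambda x: 1


def nearest_neighbour(X, y):
    """1-nearest-neighbour on squared Euclidean distance (a stand-in classifier)."""
    def predict(x):
        best = min(range(len(X)), key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        return y[best]
    return predict


# Synthetic, well-separated data (an assumption): 60% of samples are "available".
rng = random.Random(1)
labels = [1 if i % 5 < 3 else 0 for i in range(100)]
features = [[y + rng.uniform(-0.2, 0.2), y + rng.uniform(-0.2, 0.2)] for y in labels]

baseline_acc = cross_validate(always_available, features, labels)
knn_acc = cross_validate(nearest_neighbour, features, labels)
print(baseline_acc, knn_acc)
```

The baseline accuracy equals the share of "available" labels, which is why it serves as the floor that a trained model needs to beat.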
Wearable Cameras
• Narrative Clip (http://getnarrative.com/)
– 8 GB flash storage (4000 pictures)
– 5.0 megapixel camera
– GPS, accelerometer, magnetometer
– JPEG 2560 x 1920
– 2 pictures every minute
[Figure: Example of Pictures Taken]
Cognitive Therapy
• Effects
– Stimulation-oriented
treatments
• e.g., SenseCam
• Study methodology
[Figure: study loop with Quizzes ("What happened today?", "What happened first?", each with photo choices 1 to 4), Participation, User Feedback Analysis, and Evolve the selection procedure (emphasizing wrong answers)]
System Architecture [Lee et al., Korean patent applied]
[Figure: components include Daily Photos; Image Feature Extraction; Face Detection; Location; Weather; General Rules; Personalized Rules; Photo Selection and Quiz Generation; Selected Photos; Quizzes; Feedback; Personalized Rule Evolution]
Daehoon Kim, Minsu Choi, Daehoon Kim, and Jae-Gil Lee, "A Wearable Camera Image Data Processing System for the Treatment and Prevention of Dementia and Senile Forgetfulness," KIISE Winter Conference, Dec. 18, 2014.
User Feedback Analysis
• Photo attributes
– OpenCV ⇒ Chroma Histogram
– Caffe ⇒ Entities, Background
– Face++ ⇒ Persons
– Korea Meteorological Administration (KMA) ⇒ Weather
– Location
– Date, Time
• Association Rules (with Support and Confidence) + User Feedback (Answers)
• Tools
– OpenCV: http://opencv.org/
– Caffe: http://caffe.berkeleyvision.org/
– Face++: http://www.faceplusplus.com/
– KMA Open API: http://www.kma.go.kr/weather/lifenindustry/sevice_rss.jsp
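Support and confidence, the two measures named for the association rules, can be computed as in the following minimal sketch. The attribute strings and the sample photo records are made up for illustration and do not come from the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule never applies, so report zero confidence
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical records: photo attributes combined with the quiz outcome for that photo.
photos = [
    {"person:family", "place:home", "answer:correct"},
    {"person:family", "place:park", "answer:correct"},
    {"person:stranger", "place:park", "answer:wrong"},
    {"person:family", "place:home", "answer:wrong"},
]

rule_support = support(photos, {"person:family", "answer:correct"})
rule_confidence = confidence(photos, {"person:family"}, {"answer:correct"})
print(rule_support, rule_confidence)
```

A rule such as "person:family ⇒ answer:correct" would then be kept or discarded by comparing these two values against minimum thresholds chosen by the analyst.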
Concluding Remarks
References
• [Lee et al., 2015] Lee, J., Han, J., and Li, X., "A Unifying Framework of Mining Trajectory Patterns of Various Temporal Tightness," IEEE Trans. on Knowledge and Data Engineering (TKDE), accepted (SCI Core, impact factor: 1.815).
• [Lee et al., 2014] Lee, J. et al., "Joins on Encoded and Partitioned Data," In Proc. 40th Int'l Conf. on Very Large Data Bases (VLDB) / Proc. of The VLDB Endowment (PVLDB), Vol. 7, No. 13, pp. 1355-1366 (top conference, industrial track).
• [Choy et al., 2014] Choy, M., Lee, J., Gweon, G., and Kim, D., "Glaucus: Exploiting the Wisdom of Crowds for Location-Based Queries in Mobile Environments," In Proc. 8th Int'l AAAI Conf. on Weblogs and Social Media (ICWSM), Ann Arbor, Michigan, pp. 61-70, June 2014 (acceptance rate: 23.0%).
• [Lim et al., 2014] Lim, S., Ryu, S., Kwon, S., Jung, K., and Lee, J., "LinkSCAN*: Overlapping Community Detection Using the Link-Space Transformation," In Proc. 30th Int'l Conf. on Data Engineering (IEEE ICDE), Chicago, Illinois, pp. 292-303, Apr. 2014 (top conference, acceptance rate: 20.0%).
• [Moon et al., 2014] Moon, S., Lee, J., and Kang, M., "Scalable Community Detection from Networks by Computing Edge Betweenness on MapReduce," In Proc. 2014 Int'l Conf. on Big Data and Smart Computing (BigComp), Bangkok, Thailand, pp. 145-148, Jan. 2014. This paper received the Best Paper Award.
• [Sung et al., 2013] Sung, J., Lee, J., and Lee, U., "Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services," In Proc. 7th Int'l AAAI Conf. on Weblogs and Social Media (ICWSM), Cambridge, Massachusetts, pp. 602-610, July 2013 (acceptance rate: 20.6%). This paper received the Best Paper Award.
• [Lee et al., 2012] Lee, S., Ko, M., Han, K., and Lee, J., "On Finding Fine-Granularity User Communities by Profile Decomposition," In Proc. 2012 IEEE/ACM Int'l Conf. on Advances in Social Networks Analysis and Mining (ASONAM), Istanbul, Turkey, pp. 631-639, Aug. 2012 (full paper, acceptance rate: 16.0%).
• [Lee et al., 2011] Lee, J., Han, J., Li, X., and Cheng, H., "Mining Discriminative Patterns for Classifying Trajectories on Road Networks," IEEE Trans. on Knowledge and Data Engineering (TKDE), Vol. 25, No. 5, pp. 713-726, May 2011 (SCI Core, impact factor: 1.657, 5-year impact factor: 2.084).
• [Kim et al., 201?] Kim, J., Lee, J., and Lim, S., "Differential Flattening: A Novel Framework for Community Detection in Multi-Layer Graphs," submitted to ACM Trans. on Intelligent Systems and Technology (TIST).
• [Lee et al., 201?] Lee, J. and Kang, M., "Device of Data Processing for Preventing Dementia and Method of Data Processing for Preventing Dementia Using the Same," Korean Patent, Application No: 10-2014-0079108, Application Date: June 26, 2014.