Preference Learning: Machine Learning Meets Preference Modeling

PREFERENCE LEARNING: MACHINE LEARNING MEETS PREFERENCE MODELING
Eyke Hüllermeier, Intelligent Systems Group, Department of Computer Science, University of Paderborn, Germany
[email protected]
MDAI 2015, SKÖVDE, 21-SEP-2015

PREFERENCES ARE UBIQUITOUS
Preferences play a key role in many applications of computer science and modern information technology: COMPUTATIONAL ADVERTISING, RECOMMENDER SYSTEMS, COMPUTER GAMES, AUTONOMOUS AGENTS, ELECTRONIC COMMERCE, ADAPTIVE USER INTERFACES, PERSONALIZED MEDICINE, ADAPTIVE RETRIEVAL SYSTEMS, SERVICE-ORIENTED COMPUTING 2

PREFERENCES ARE UBIQUITOUS
PERSONALIZED MEDICINE: medications or therapies specifically tailored for individual patients 3
COMMERCIAL INTEREST 4
PREFERENCE INFORMATION 5
PREFERENCE INFORMATION
[Figure: search result list, items marked CLICKED ON vs. NOT CLICKED ON]
- Preferences are not necessarily expressed explicitly, but can be extracted implicitly from people's behavior!
- Massive amounts of very noisy data! 6
PREFERENCE LEARNING
Fostered by the availability of large amounts of data, PREFERENCE LEARNING has recently emerged as a new subfield of machine learning, dealing with the learning of (predictive) preference models from observed, revealed or automatically extracted preference information. 7
TYPES OF PREFERENCES
- binary vs. graded (e.g., relevance judgements vs. ratings)
- absolute vs. relative (e.g., assessing single alternatives vs. comparing pairs)
- explicit vs. implicit (e.g., direct feedback vs. click-through data)
- structured vs. unstructured (e.g., ratings on a given scale vs. free text)
- single user vs. multiple users (e.g., document keywords vs. social tagging)
- single vs. multi-dimensional
- ...
A wide spectrum of learning problems! 8
PREFERENCE LEARNING TASKS
Preference learning problems are challenging, because
- the sought predictions are complex/structured (e.g., top-K rankings),
- supervision is weak (partial, noisy, ...; e.g., clickthrough data),
- performance metrics are hard to optimize (e.g., NDCG@K; see the NDCG@K sketch below),
- ... 9
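As a concrete illustration of such a metric, here is a minimal sketch of NDCG@K (normalized discounted cumulative gain at cutoff K) in Python; the graded relevance values and the log2 discount are common conventions and not taken from the slides.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items, in the order given."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG@k normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Illustrative ranking with graded relevance judgements (3 = highly relevant, 0 = irrelevant)
print(ndcg_at_k([3, 2, 0, 1, 2], k=3))
```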
PL IS AN ACTIVE FIELD
Tutorials:
- European Conf. on Machine Learning, 2010
- Int. Conf. Discovery Science, 2011
- Int. Conf. Algorithmic Decision Theory, 2011
- European Conf. on Artificial Intelligence, 2012
- Int. Conf. Algorithmic Learning Theory, 2014
Special Issue on Representing, Processing, and Learning Preferences: Theoretical and Practical Challenges (2011)
J. Fürnkranz & E. Hüllermeier (eds.), Preference Learning, Springer-Verlag, 2011
Special Issue on Preference Learning (forthcoming) 10
PL IS AN ACTIVE FIELD
- NIPS 2001: New Methods for Preference Elicitation
- NIPS 2002: Beyond Classification and Regression: Learning Rankings, Preferences, Equality Predicates, and Other Structures
- KI 2003: Preference Learning: Models, Methods, Applications
- NIPS 2004: Learning with Structured Outputs
- NIPS 2005: Workshop on Learning to Rank
- IJCAI 2005: Advances in Preference Handling
- SIGIR 07-10: Workshop on Learning to Rank for Information Retrieval
- ECML/PKDD 08-10: Workshop on Preference Learning
- NIPS 2009: Workshop on Advances in Ranking
- American Institute of Mathematics Workshop in Summer 2010: The Mathematics of Ranking
- NIPS 2011: Workshop on Choice Models and Preference Learning
- EURO 2009-12: Special Track on Preference Learning
- ECAI 2012: Workshop on Preference Learning: Problems and Applications in AI
- DA2PL 2012: From Decision Analysis to Preference Learning
- Dagstuhl Seminar on Preference Learning (2014)
- NIPS 2014: Analysis of Rank Data: Confluence of Social Choice, Operations Research, and Machine Learning
11
CONNECTIONS TO OTHER FIELDS
[Diagram: Preference Learning at the center, connected to Structured Output Prediction, Learning Monotone Models, Information Retrieval, Recommender Systems, Learning with Weak Supervision, Statistics, Graph Theory, Classification (ordinal, multilabel, ...), Economics & Decision Science, Social Choice, Optimization, Operations Research, Multiple Criteria Decision Making] 12
OUTLINE
PART 1: Introduction to preference learning
PART 2: Machine learning vs. MCDA
PART 3: Multi-criteria preference learning
PART 4: Preference-based online learning 13
MACHINE LEARNING
SUPERVISED LEARNING: Algorithms and methods for discovering dependencies and regularities in a domain of interest, expressed through appropriate models, from specific observations or examples.
[Diagram, MODEL INDUCTION: data/observations + background knowledge → learning algorithm (induction principle) → model]
... used for
- prediction, classification
- adaptation, control
- systems analysis 14
SUPERVISED LEARNING

            X1     X2   X3   X4    obs
TRAINING    0.34   0    10   174   ***
            1.45   0    32   277   *
            1.22   1    46   421   ****
            0.74   1    25   165   ***
            0.95   1    72   273   *****
            1.04   0    33   158   ***
TEST        0.85   0    45   194   ?
            0.57   1    65   403   ?
            1.32   1    26   634   ?
            ...    ...  ...  ...   ?
15
SUPERVISED LEARNING

            X1     X2   X3   X4    obs     pred
TRAINING    0.34   0    10   174   ***     **
            1.45   0    32   277   *       *
            1.22   1    46   421   ****    ****
            0.74   1    25   165   ***     ****
            0.95   1    72   273   *****   ****
            1.04   0    33   158   ***     **
TEST        0.85   0    45   194   ?       ***
            0.57   1    65   403   ?       *
            1.32   1    26   634   ?       *****
            ...    ...  ...  ...   ?       ...
16
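A minimal sketch of this supervised learning setup, assuming scikit-learn is available; the feature values mirror the table above, and the star ratings are encoded as integers 1-5 purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # any off-the-shelf learner would do

# Training data from the table above: features X1..X4 and observed star ratings (1-5)
X_train = np.array([
    [0.34, 0, 10, 174],
    [1.45, 0, 32, 277],
    [1.22, 1, 46, 421],
    [0.74, 1, 25, 165],
    [0.95, 1, 72, 273],
    [1.04, 0, 33, 158],
])
y_train = np.array([3, 1, 4, 3, 5, 3])  # *** = 3, * = 1, **** = 4, ...

# Test data: ratings are unknown ("?") and have to be predicted
X_test = np.array([
    [0.85, 0, 45, 194],
    [0.57, 1, 65, 403],
    [1.32, 1, 26, 634],
])

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(model.predict(X_test))  # predicted star ratings for the test rows
```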
MODEL INDUCTION IN ML
[Diagram: population → stochastic process (the data generating process, the "ground truth") → observed data → learning algorithm → estimated model]
- The model refers to an underlying population of individuals.
- Knowing the model allows for making good predictions on average.
- The dependency the model tries to capture is not deterministic (variability due to aggregation, "noisy" observations, etc.). 17
MODEL INDUCTION IN ML
- Assumptions about the "ground truth" allow for deriving theoretical results (given enough data, the learner is likely to get close to the target).
- Focus on predictive accuracy allows for simple empirical comparison. 18
PREFERENCE ELICITATION IN MCDA
[Diagram: decision analyst poses preference queries to the decision maker; revealed preferences are turned into a preference model]
- Single user, interactive process.
- Inconsistencies can be discovered and corrected.
- Constructive process: no "ground truth", no true vs. estimated model (construction vs. induction). 19
MACHINE LEARNING VS. MCDA
This comparison is obviously simplified ...
- Other settings in ML: semi- and unsupervised learning, active learning, online learning, reinforcement learning, ...
- Bayesian approaches to preference elicitation (e.g., Viappiani and Boutilier): stochastic setting, active learning using expected value of information.
- Machine learning with humans in the loop (Joachims): focusing on the interface between the human and a continuously learning system (beyond mere labeling).
- ... 20
OUTLINE
PART 1: Introduction to preference learning
PART 2: Machine learning vs. MCDA
PART 3: Multi-criteria preference learning
PART 4: Preference-based online learning 21
MULTI-CRITERIA PREFERENCE LEARNING
PREFERENCE MODELING: How to represent value and ranking functions, and what properties should they obey?
PREFERENCE LEARNING: How do value and ranking functions generalize, what is their predictive accuracy?
Models studied from both perspectives:
- Choquet integral [Fallah Tehrani et al., 2011, 2012, 2013]
- CP nets, GAI nets [Chevaleyre et al., 2010]
- Lexicographic orders [Bräuning and EH, 2012]
- Majority rule models, MR Sort, ... [Leroy et al., 2011; Sobrie et al., 2013]
- TOPSIS-like models [Agarwal et al., 2014] 22
MULTI-CRITERIA PREFERENCE LEARNING
Using established decision models in a machine learning context ... 23
AGGREGATION OF CRITERIA
criteria = attributes/features

  Math   CS   Statistics   English   score
  16     17   12           19        0.7
  10     12   18           9         0.4
  19     10   18           10        0.6
  ...    ...  ...          ...       ...
  8      18   10           18        0.5

Going beyond simple averaging ... 24
NON-ADDITIVE MEASURES
We require ... normalization and monotonicity, but not necessarily additivity. 25
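The defining formula was shown as an image on the slide; the standard definition of such a non-additive measure (capacity) on a set of criteria C, which the requirements above refer to, is reconstructed here.

```latex
% Capacity (non-additive measure) on a finite set of criteria C:
\mu : 2^{C} \to [0,1], \qquad
\mu(\emptyset) = 0, \quad \mu(C) = 1 \quad \text{(normalization)}, \qquad
A \subseteq B \subseteq C \;\Rightarrow\; \mu(A) \le \mu(B) \quad \text{(monotonicity)}.
% Additivity, i.e. \mu(A \cup B) = \mu(A) + \mu(B) for disjoint A, B, is not required.
```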
NON-ADDITIVE MEASURES
In a machine learning context: criteria = attributes/features 26
DISCRETE CHOQUET INTEGRAL
How to aggregate individual values, giving the right weight/importance to each subset of criteria? 27
DISCRETE CHOQUET INTEGRAL 28
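The slide presented the defining formula graphically; below is the standard definition of the discrete Choquet integral of a value vector x = (x_1, ..., x_m) with respect to a capacity μ, where (·) denotes a permutation sorting the values in increasing order (a reconstruction, not copied from the slide).

```latex
% Discrete Choquet integral of x = (x_1, ..., x_m) w.r.t. a capacity \mu,
% with x_{(1)} \le \ldots \le x_{(m)}, x_{(0)} := 0, and A_{(i)} := \{ c_{(i)}, \ldots, c_{(m)} \}:
C_\mu(x) \;=\; \sum_{i=1}^{m} \bigl( x_{(i)} - x_{(i-1)} \bigr)\, \mu\bigl( A_{(i)} \bigr)
```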
AGGREGATION OPERATORS
[Figure: spectrum from very strict (conjunctive) via averaging to very tolerant (disjunctive)]
Special cases:
- min
- max
- weighted average (additive measure)
- ordered weighted average (OWA) 29
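A minimal Python sketch of the discrete Choquet integral, illustrating how some of the special cases listed above arise from particular capacities; the example criteria, values and capacity constructions are made up for illustration.

```python
from itertools import combinations

def choquet(values, capacity):
    """Discrete Choquet integral of a dict {criterion: value in [0,1]}
    w.r.t. a capacity given as a dict {frozenset of criteria: weight in [0,1]}."""
    order = sorted(values, key=values.get)            # criteria sorted by increasing value
    total, prev = 0.0, 0.0
    for i, c in enumerate(order):
        upper = frozenset(order[i:])                  # criteria whose value is at least values[c]
        total += (values[c] - prev) * capacity[upper]
        prev = values[c]
    return total

criteria = ["Math", "CS"]
x = {"Math": 0.8, "CS": 0.3}

def make_capacity(weight_fn):
    """Build a capacity by assigning a weight to every subset of criteria."""
    return {frozenset(s): weight_fn(s)
            for r in range(len(criteria) + 1)
            for s in combinations(criteria, r)}

cap_min = make_capacity(lambda s: 1.0 if set(s) == set(criteria) else 0.0)  # Choquet = min
cap_max = make_capacity(lambda s: 1.0 if s else 0.0)                        # Choquet = max
cap_avg = make_capacity(lambda s: len(s) / len(criteria))                   # additive: plain average

print(choquet(x, cap_min), choquet(x, cap_max), choquet(x, cap_avg))  # ≈ 0.3  0.8  0.55
```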
DECISION BOUNDARY IN 2D
[Figure series, slides 30-35: decision boundaries in the (Math, CS) plane, with axis ticks 0, c, 1 on both axes; the first panel shows a linear decision boundary, the remaining panels show other boundary shapes.]
VC DIMENSION 36
GENERALIZATION
[Figure: a model fitted to data, output plotted against input] 37
GENERALIZATION
[Figure: output plotted against input] low training error but poor generalization ... 38
LEARNING WITH THE CHOQUET INTEGRAL
- We advocate the Choquet integral as a versatile tool in the context of (supervised) machine learning, especially for learning monotone models.
- Being used as a prediction function to combine several input features (criteria) into an output, the Choquet integral nicely combines
  - monotonicity
  - non-linearity
  - interpretability (importance, interaction) 39
LEARNING WITH THE CHOQUET INTEGRAL
[Diagram: local utility functions → AGGREGATION → global utility] 40
RANKING
TRAINING DATA: [a set of observed pairwise preferences between alternatives]
The goal is to find a Choquet integral whose utility degrees tend to agree with the observed pairwise preferences! 41
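In the notation of the cited papers, this kind of training data can be written roughly as follows (the symbols are a reconstruction, not copied from the slide).

```latex
% Training data: n observed pairwise preferences between alternatives,
% each alternative described by a vector of (normalized) criteria values.
\mathcal{D} \;=\; \bigl\{\, x^{(k)} \succ y^{(k)} \,\bigr\}_{k=1}^{n},
\qquad x^{(k)}, y^{(k)} \in [0,1]^{m}.
% Goal: a capacity \mu such that C_\mu(x^{(k)}) > C_\mu(y^{(k)}) for as many pairs as possible.
```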
ORDINAL CLASSIFICATION / SORTING
TRAINING DATA: [alternatives labeled with ordered classes]
The goal is to find a Choquet integral whose utility degrees tend to agree with the observed classifications! 42
THE BINARY CASE
TRAINING DATA: distinguishing between "good" and "bad"
The goal is to find a Choquet integral whose utility degrees tend to agree with the observed classifications! 43
MODEL IDENTIFICATION
Probabilistic modeling allows for the use of induction principles such as maximum likelihood (ML) estimation!
[Diagram: model (Choquet) → stochastic process → observed data]
... in contrast to (linear) programming techniques as used in preference elicitation. 44
STOCHASTIC MODELING 45
STOCHASTIC MODELING
LOGISTIC NOISY RESPONSE MODEL [formula, with annotations: precision of the model; utility threshold] 46
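The formula itself appears only as an image in the transcript; in the choquistic regression papers cited below, the logistic noisy response model has the following form, with γ > 0 the precision of the model and β the utility threshold.

```latex
% Logistic noisy response model (choquistic regression):
P(y = 1 \mid x) \;=\; \frac{1}{1 + \exp\!\bigl( -\gamma \,(\, C_\mu(x) - \beta \,) \bigr)},
\qquad \gamma > 0 \;\text{(precision)}, \quad \beta \;\text{(utility threshold)}.
```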
STOCHASTIC MODELING
[Figure: response curves for the limiting cases γ = 0 and γ = ∞] 47
STOCHASTIC MODELING 48
CHOQUISTIC REGRESSION
Logistic vs. choquistic: the linear part of logistic regression is replaced by the Choquet integral of (normalized) attribute values.
- A. Fallah Tehrani, W. Cheng, K. Dembczynski, EH. Learning Monotone Nonlinear Models using the Choquet Integral. Machine Learning, 89(1), 2012.
- A. Fallah Tehrani, W. Cheng, EH. Preference Learning using the Choquet Integral: The Case of Multipartite Ranking. IEEE Transactions on Fuzzy Systems, 2012.
- EH, A. Fallah Tehrani. Efficient Learning of Classifiers based on the 2-additive Choquet Integral. In: Computational Intelligence in Intelligent Data Analysis. Springer, 2012.
- A. Fallah Tehrani and EH. Ordinal Choquistic Regression. EUSFLAT 2013. 49
MAXIMUM LIKELIHOOD ESTIMATION 50
MAXIMUM LIKELIHOOD ESTIMATION
ML estimation leads to a constrained optimization problem:
- conditions on bias and scale parameter
- normalization and monotonicity of the non-additive measure
→ computationally expensive! 51
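Spelled out, the optimization problem has roughly the following form (reconstructed from the constraints named on the slide; the notation is not taken verbatim from it).

```latex
% ML estimation of (\mu, \gamma, \beta) from training data (x^{(i)}, y^{(i)}), i = 1, ..., n:
\max_{\mu,\, \gamma,\, \beta} \;\; \sum_{i=1}^{n} \log P\bigl( y^{(i)} \mid x^{(i)}; \mu, \gamma, \beta \bigr)
\quad \text{s.t.} \quad
\gamma > 0, \;\; 0 \le \beta \le 1 \;\; \text{(scale and bias)},
\qquad
\mu(\emptyset) = 0, \quad \mu(C) = 1, \quad \mu(A) \le \mu(B) \;\text{ for } A \subseteq B
\;\; \text{(normalization and monotonicity)}.
```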
EXPERIMENTAL EVALUATION
[Table/figure: classification results for 20%, 50% and 80% training data; labels: monotone classifier, nonlinear classifier]

OUTLINE
PART 1: Introduction to preference learning
PART 2: Machine learning vs. MCDA
PART 3: Multi-criteria preference learning
PART 4: Preference-based online learning 53
MULTI-ARMED BANDITS (slides 54-57)
"Pulling an arm" = choosing an option, e.g., putting an advertisement on a website or picking a traffic route from source to target; the choice of an option/strategy (arm) yields a random reward.
Partial information, online learning, sequential decision process.
MULTI-ARMED BANDITS (slides 58-61)
Immediate reward:   2.5   3.1   1.7   3.7  ...
Cumulative reward:  2.5   5.6   7.3  11.0  ...
- maximize cumulative reward → explore and exploit (tradeoff)
- find best option → pure exploration (effort vs. certainty)
PREFERENCE-BASED BANDITS
In many applications,
- the assignment of (numeric) rewards to single outcomes (and hence the assessment of individual options on an absolute scale) is difficult,
- while the qualitative comparison between pairs of outcomes (arms/options) is more feasible. 62
PREFERENCE-BASED BANDITS
[Figure: five arms, RETRIEVAL FUNCTION 1-5]
The result returned by the third retrieval function, for a given query, is preferred to the result returned by the first retrieval function.
Noisy preferences can be inferred from how a user clicks through an interleaved list of documents [Radlinski et al., 2008]. 63
PREFERENCE-BASED BANDITS
[Figure: five arms, PLAYER 1-5]
The third player has beaten the first player in a match. 64
PREFERENCE-BASED BANDITS
- This setting was first introduced as the dueling bandits problem (Yue and Joachims, 2009).
- More generally, we speak of preference-based multi-armed bandits (PB-MAB). 65
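A very simple preference-based sketch (not the algorithm of Yue and Joachims, just an illustration of the setting): random pairs of arms are duelled, empirical pairwise win rates are tracked, and the arm that beats the most opponents is reported as the winner. The pairwise win probabilities are made up for the example.

```python
import random

def simple_dueling_bandit(prob_win, horizon=2000, seed=0):
    """prob_win[i][j] = probability that arm i beats arm j in a single duel.
    Duel random pairs, track empirical win rates, and return the arm that
    beats the largest number of opponents (its Copeland score)."""
    rng = random.Random(seed)
    k = len(prob_win)
    wins = [[0] * k for _ in range(k)]
    duels = [[0] * k for _ in range(k)]
    for _ in range(horizon):
        i, j = rng.sample(range(k), 2)              # pick a random pair of arms to duel
        if rng.random() < prob_win[i][j]:
            wins[i][j] += 1
        else:
            wins[j][i] += 1
        duels[i][j] += 1
        duels[j][i] += 1
    def beats(i, j):
        return duels[i][j] > 0 and wins[i][j] / duels[i][j] > 0.5
    copeland = [sum(beats(i, j) for j in range(k) if j != i) for i in range(k)]
    return max(range(k), key=lambda i: copeland[i])

# Made-up pairwise win probabilities for 3 arms; arm 0 beats both others.
P = [[0.5, 0.7, 0.8],
     [0.3, 0.5, 0.6],
     [0.2, 0.4, 0.5]]
print(simple_dueling_bandit(P))   # expected output: 0
```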
PREFERENCE-BASED RANK ELICITATION 66
PREFERENCE-BASED RANK ELICITATION 67
SUMMARY & CONCLUSION
Preference learning is
- methodologically interesting,
- theoretically challenging,
- and practically useful, with many potential applications;
- interdisciplinary (connections to operations research, decision sciences, economics, social choice, recommender systems, information retrieval, ...).
Established methods exist, but the field is still developing (e.g., online preference learning, preference-based reinforcement learning, ...).
In particular, there are many links between preference learning and decision analysis, most of which are still to be explored! 68
SELECTED REFERENCES
- A. Fallah Tehrani, W. Cheng, E. Hüllermeier. Choquistic Regression: Generalizing Logistic Regression using the Choquet Integral. EUSFLAT 2011.
- A. Fallah Tehrani, W. Cheng, K. Dembczynski, E. Hüllermeier. Learning Monotone Nonlinear Models using the Choquet Integral. Machine Learning, 89(1):183-211, 2012.
- A. Fallah Tehrani, W. Cheng, K. Dembczynski, E. Hüllermeier. Learning Monotone Nonlinear Models using the Choquet Integral. ECML/PKDD 2011.
- A. Fallah Tehrani, W. Cheng, E. Hüllermeier. Preference Learning using the Choquet Integral: The Case of Multipartite Ranking. IEEE Transactions on Fuzzy Systems, 2012.
- E. Hüllermeier, A. Fallah Tehrani. On the VC Dimension of the Choquet Integral. IPMU 2012.
- R. Busa-Fekete, E. Hüllermeier. A Survey of Preference-based Online Learning with Bandit Algorithms. Proc. ALT 2014, Int. Conf. Algorithmic Learning Theory, Bled, 2014.
- R. Busa-Fekete, E. Hüllermeier, B. Szorenyi. Preference-based Rank Elicitation using Statistical Models: The Case of Mallows. ICML 2014.
- R. Busa-Fekete, B. Szorenyi, E. Hüllermeier. PAC Rank Elicitation through Adaptive Sampling of Stochastic Pairwise Preferences. AAAI 2014.
69