Data mining, interactive semantic structuring, and collaboration: A diversity-aware method for sense-making in search

Mathias Verbeke, Bettina Berendt, Siegfried Nijssen
Dept. Computer Science, KU Leuven

Agenda
- Motivation: Diversity, Diversity-aware tools, (our) Context
- Main part: Measures of diversity, Tool
- Outlook

Motivation (1): Diversity is ...
- Speaking different languages (etc.) -> localisation / internationalisation
- Having different abilities -> accessibility
- Liking different things -> collaborative filtering
- Structuring the world in different ways -> ?

Motivation (2): Diversity-aware applications ...
- Must have a (formal) notion of diversity
- Can follow a
  – “personalization approach” -> adapt to the user’s value on the diversity variable(s); transparently? Is this paternalistic?
  – “customization approach” -> show the space of diversity; allow choice / semi-automatic!

(Our) Context
1. Diversity and Web usage: language, culture
2. Family of tools focussing on interactive sense-making helped by data mining
  – PORPOISE: global and local analysis of news and blogs + their relations
  – STORIES: finding + visualisation of “stories” in news
  – CiteseerCluster: literature search + sense-making
  – Damilicious: CiteseerCluster + re-use/transfer of semantics + diversity

Measuring grouping diversity
Diversity = 1 - similarity = 1 - normalized mutual information (NMI)
[Figure: example groupings, labelled “By colour &”, “NMI = 0” and “NMI = 0.35”]

Measuring user diversity
“How similarly do two users group documents?”
- For each query q, consider their groupings gr: [equation]
- For various queries: aggregate [equation]

... and now: the application domain ...

... that’s only the 1st step!

Workflow
1. Query
2. Automatic clustering
3. Manual regrouping
4. Re-use
  1. Learn + present way(s) of grouping
  2. Transfer the constructed concepts

Concepts
- Extension: the instances in a group
- Intension
  – Ideally: “squares vs. circles”
  – Pragmatically: defined via a classifier

Step 1: Retrieve
- CiteseerX via OAI
- Output: set of
  – document IDs
  – document details
  – their texts

Step 2: Cluster
- “the classic bibliometric solution” -> CiteseerCluster:
  – Similarity measure: co-citation, bibliographic coupling, word or LSA similarity, combinations
  – Clustering algorithm: k-means, hierarchical
- Damilicious: phrases -> Lingo
- How to choose the “best”?
  – Experiments: Lingo better than k-means at reconstruction and extension-over-time

Step 3 (a): Re-organise & work on document groups

Step 3 (b): Visualising document groups

Steps 4+5: Re-use
- Basic idea:
  1. learn a classifier from the final grouping (Lingo phrases)
  2. apply the classifier to a new search result
  -> “re-use semantics”
- Whose grouping?
  – One’s own
  – Somebody else’s
- Which search result?
  – “the same” (same query, structuring by somebody else)
  – “more of the same” (same query, later time -> more documents)
  – “related” (... Measured how? ...)
  – arbitrary

Visualising user diversity (1)
Simulated users with different strategies
- U0: did not change anything (“System”)
- U1: tried to produce a better fit of the document groups to the cluster intensions; 5 regroupings
- U2: attempted to move everything that did not fit well into the remainder group “Other topics”, & better fit; 10 regroupings
- U3: attempted to move everything from “Other topics” into matching real groups; 5 regroupings
- U4: regrouping by author and institution; 5 regroupings
- 5*5 matrix of diversities gdiv(A,B,q) -> multidimensional scaling

Visualising user diversity (2)
[Figure: panels “Web mining”, “Data mining”, “RFID”, and “aggregated using gdiv(A,B)”]

Evaluating the application
- Clustering only: Does it generate meaningful document groups?
  – yes (tradition in bibliometrics)
  – but: data?
  – Small expert evaluation of CiteseerCluster
- Clustering & regrouping
  – End-user experiment with CiteseerCluster
  – 5-person formative user study of Damilicious

Summary and (some) open questions
- Damilicious: a tool that helps users in sense-making, exploring diversity, and re-using semantics
- Diversity measures when queries and result sets are different?
- How best to present diversity?
  – How to integrate into an environment supporting user and community contexts (e.g., Niederée et al. 2005)?
- Incentives to use the functionalities?
- How to find the best balance between similarity and diversity?
- Which measures of grouping diversity are most meaningful?
  – Extensional?
  – Intensional? Structure-based? Hybrid? (cf. ontology matching)
- Which other sources of user diversity?

Thanks!
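
Sketch: grouping diversity as 1 - NMI
The slides define diversity as 1 - normalized mutual information and mention a per-query measure gdiv(A,B,q) and an aggregate gdiv(A,B), but the exact formulas are not preserved here. The following is therefore only a minimal Python sketch under stated assumptions: NMI is normalised by the mean of the two entropies (other normalisations exist), only documents grouped by both users are compared, and the aggregation over queries is a plain average. The function names gdiv_query and gdiv and the data layout (document ID -> group label) are illustrative, not taken from Damilicious.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a list of group labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI, normalised by the mean of both entropies (an assumption)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both groupings put everything into one group
    return mutual_information(a, b) / ((h_a + h_b) / 2)


def gdiv_query(grouping_a, grouping_b):
    """Diversity of two users' groupings for one query: 1 - NMI.

    Each grouping maps document ID -> group label; only documents
    present in both groupings are compared (an assumption).
    """
    docs = sorted(set(grouping_a) & set(grouping_b))
    a = [grouping_a[d] for d in docs]
    b = [grouping_b[d] for d in docs]
    return 1.0 - nmi(a, b)


def gdiv(groupings_a, groupings_b):
    """Aggregate over shared queries by averaging (an assumption)."""
    queries = sorted(set(groupings_a) & set(groupings_b))
    per_query = [gdiv_query(groupings_a[q], groupings_b[q]) for q in queries]
    return sum(per_query) / len(per_query)


# Hypothetical groupings by two users for one query.
user_a = {"rfid": {"d1": "tags", "d2": "tags", "d3": "privacy", "d4": "privacy"}}
user_b = {"rfid": {"d1": "x", "d2": "y", "d3": "x", "d4": "y"}}
print(gdiv(user_a, user_b))
```

Because NMI compares the partitions and not the label names, two users who form the same groups under different names get a diversity of 0. A users-by-users matrix of such values is the kind of input that multidimensional scaling takes.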
