How to teach LOD?
Bettina Berendt
Dept. Computer Science, KU Leuven

Who am I?
Privacy, Discrimination

Research: One persisting question

Research: One specific question – How do blogs and tweets spread, change, create news?

Workshop series (with, a.o., Markus L.-R.)
[…] synergy between semantics and semantic-web technology on the one hand, and the analysis and mining of usage data on the other hand. […] First, semantics can be used to enhance the analysis of usage data. Second, usage data analysis can enhance semantic resources as well as Semantic Web applications; traces of users can be used to evaluate, adapt or personalize Semantic Web applications. The emerging Web of Data demands a re-evaluation of existing evaluation techniques: the Linked Data community is recognizing that it needs to move beyond triple counts because real value of Web data needs to be measured by real use.

Another persisting question: Data Mining for Information Literacy

Data Mining for Information Literacy: How?

“Knowledge and the Web” course: Curricular context
Based on experiences in HU Master specialisation “Wirtschaftsinformatik” (business informatics)
• Students: Wirtschaftsinformatik, Computer Science, miscellaneous
2007-2012: KUL Master specialisation “Databases”
• Students: mostly Computer Science students NOT specialising in databases
2013+: KUL Master specialisation “Artificial Intelligence” (+ Master AI)
• Students: we'll see!
6 ECTS
Student numbers over the years: between ~6 and ~20

… & a big thanks to the teaching assistants!
Ilija Subašić
Thomas Peetz

Concept of the course
3 blocks:
• Web data, integrating Web data
• Mining Web data
• Applications and implications
Lecture + exercise session + mini-workshop at the end
One invited talk
Evaluation based on homeworks
• Progression from “exercise” to “self-defined project”

Lecture 2012
Lecture 2013 & more depth about the other topics

Homeworks
1. Modelling
2. Populating
3. Integrating
4. (optional) Data mining basics
5. (a non-graded exercise) Reporting on data mining projects / Reading data mining papers
6. Your own project

Semantic Web / LOD intro topics
• The Semantic Web: Motivation and overview
• Very brief recap of XML (& why it's not semantic)
• RDF and RDFS
• OWL and ontologies
• Linked (Open) Data (LOD)
• Storing, accessing and combining SW data

Inference topics
• Introduction / motivation; kinds of reasoning
• Properties of Properties (cf. the Pizza Tutorial)
• Class descriptions, cardinality, & value constraints
• Does this type of knowledge exist in LOD?
• Common problems in using OWL reasoning

Schema / ontology matching topics
• Core ideas of federated databases
• The match problem & what info to use for matching
• (Semi-)automated matching: Example CUPID
• (Semi-)automated matching: Example iMAP
• Ontology matching, with Example BLOOMS
• Evaluating matching
• Involving the user: Explanations; mass collaboration

Identity, inconsistency, provenance topics
• Introduction: The promise and risks of openness
• Identity crises: owl:sameAs
• Inconsistencies and provenance

Privacy topics (1): Preparatory questions
• What does privacy mean for you concretely? Can you remember situations where it was important for you to show yourself in a different way than you are? Do you expect such situations in the future?
• Privacy also involves the possibility of lying. Is this possibility a right? Give concrete examples and discuss them.
• Think of a case where someone would want to not disclose some information and where you would think "this is not right". Does this person claim their privacy? Would your desired outcome be a privacy violation?
• Who do you think should be watched most closely when it comes to handling personal information: the government? companies? anyone else? Why?
• So what does privacy mean for databases and data mining? What problems would you like to see addressed?
(Questions from/inspired by Martens, B., Dierick, G., & Noot, W. (2008). Ethiek en weerbarheid in de informatiesamenleving, Uitgeverij LannooCampus, Leuven & Academic Service, Den Haag, p. 75)

Privacy topics (2): Lecture agenda
• Three types of privacy … and how the law respects them
• Societal conventions that allow for secrecy
• Surveillance, democracy, and …
• Whose privacy? and: when privacy is traded off against other goods
• “Data aggregation and record linkage”
• Trackers and anti-trackers

Homework 1: Modelling

Homework 2: Populating

Homework 3: Integrating

Homework 4 (optional): Basic data mining

Reading exercise (1)
(from Justin Zobel: Writing for Computer Science)

Reading exercise (2)
4. Now consider the guidelines for structuring a data-mining exercise from the CRISP-DM model and manual. A good description of a data-mining project will contain sections on each of the main phases in CRISP-DM.
5. Please identify and highlight passages in the paper you have read that correspond to those phases.

Homework 6: Your own project (1)
This homework is your final project for this course. It will take you through much of what you learned throughout the semester, and result in a small yet genuine data mining project. With the proposal you have sent us, and the feedback you got during the discussion, you by now have a clear idea of what you will be doing. If you run into any problems that you cannot solve in a reasonable amount of time, please contact us as soon as possible. This homework consists of two parts.

Homework 6: Your own project (2)
Sharing your data
The first three homework sets [… a reminder of what this was …]
Any scientific work is only as good as its reproducibility. If you report the results of data mining without disclosing the data used, you are asking the reader for blind faith. In order to make data mining meaningful, its data sources must be available for follow-up work. Your first task is to do precisely this.
Describe the ontology that you have built, and specifically the subset of it that you are using for this project, in terms of its purpose, its schema, and basic statistics about its entities. Important questions include the following.
Note: You may copy the answers to these questions from your previous homework sets if they are there already. We mention them again here so you can critically check and, if applicable, extend what you have written earlier.
• Where exactly did the data originate from? Are there any problems with these sources? (Example: Do the creators of the source follow a political agenda, only listing Muslims as terrorists? Are you even allowed to redistribute the data?)
• What is the overall schema of the ontology?
• How did you map and match the ontologies/schemas you found? Which strategies did you use? Which problems arose?
• Which attributes are guaranteed to exist for members of the most important classes? Which attributes may exist, but are not always present? How many individuals have these attributes?
• Which decisions did you make for selecting the subset of data you are working with for Homework 6 from the “full” ontology you built? For example, did you select classes, instances, attributes? Did you aggregate attributes? If so, how?
[…]

Homework 6: Your own project (3)
Data mining
In the second step, you perform the project that you have prepared so far. A good report will include the following:
• A very clear description of the research question you seek to answer.
• A good motivation for this research question.
• A critical review of the data, especially of whether you can expect it to contain the answer to your research question.
• A precise description of your experiments and their validation, with a motivation for the chosen setup. The reader must be able to obtain the exact same result, so they need to know every single parameter.
• A discussion of the results, given the data review and the experiments discussion.
• A conclusion that gives an answer to the research question.
• A list of things you would have liked to do, but didn't due to time constraints.
Do not forget to carefully evaluate the results of the experiments, using whatever metric is applicable (significance, confidence, accuracy, precision/recall, etc.) in order to supplement the qualitative assessment of the experiments.

Homework 6: Example topics from 2012 (Terrorism) and 2011 (Twitter)
• Relation between oil and war
• The relation between a politician, his country, and terrorist attacks
• Predict attack type and victim type for new organizations
• Converting tweets from mobile speed controls into a historical overview on a map
• Where should I go on vacation based on recent tweets?
• Seasonal sentiment analysis in tweets (data sets: Libya, Syria)

What's good: students …
• like the course
• … are surprised
• participate very actively
• get hands-on experience
• are creative!
• reflect on data and on methods
• obtain insights
  o E.g. from goal: predicting who's a terrorist, to goal: finding correlations between a country's military expenditure, level of schooling, and incidence of terrorism

What's not so good / challenges (1)
Prerequisites
• To be able to interpret the results properly, students would need
  o Proper background in statistics
    2013+: better, given students with more DM background?
  o Background knowledge about the application area
    Idea for 2013+: tailor the invited talk more closely to the project
• To be able to make more of Semantic Web reasoning, students would need
  o More background in logics
    Idea for 2013+: interface more closely with the parallel logics course
Didactical method: Capacity limitations and “cue-based learning”?!
• Breadth vs. depth …
• Practical learning tends to overtake theory learning
• Difficult to integrate background reading with project (easier for Twitter than for terrorism)

What's not so good / challenges (2)
Die Mühen der Ebene (“the difficulties on the ground”) in data handling and analysis
• Sparse data and a lack of empirical regularities are frustrating
• Preference for mashups vs. data integration?!
• Laborious data preparation is boring and time-intensive

Outlook: Next possible student-project field?
ParlBench
• An LOD of Dutch parliamentary proceedings (Tarasova & Marx, Proc. USEWOD/BerSys 2013)
• See also (Juric, Hollink, & Houben, Proc. DeRIVE 2012)
OR
• Use a similar, but not yet semantified, Flemish dataset
Dutch language: + and –

Outlook: curricular changes in 2013+, 2014+(?)
FROM mandatory course in the specialization “Databases”, taken largely by a non-database, heterogeneous audience
TO optional course in the specialization “Artificial Intelligence”, presumably taken by a more homogeneous, largely AI audience
The 6-ECTS course can also be taken, as a 4-ECTS course, by students in the Master of Artificial Intelligence, with
• the Web mining option (focus on modelling and mining)
• the Web data fusion option (focus on modelling and integrating)
To be supplemented by a data course in the Master Digital Humanities (currently under review)
• Chance of joint projects in which expertise can be pooled

Outlook: sharing