Bibliomining (書目探勘): Book Recommendation Applications

Patterns of Behaviors (1/2)
Individuals

Patterns of Behaviors (2/2)
Groups

Library Decisions
• Quantity vs. quality
  – Intuitive and scientific
• Flexible and systematic
• To help librarians make decisions
  – Garbage in, garbage out
    • Data warehouse
    • Data mining

Data Warehouse (1/3)
• What is a data warehouse?
  – A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process
  – A construct integrating multiple data sources
  – A database isolated from transaction databases

Data Warehouse (2/3)
[Figure: star schema and multidimensional database (data cube). A fact table (Time key, Cat. key, Col. key, Sum) is linked to three dimension tables: time (Time key, day, month, semester (First, Second), year), category (Cat. key, Book), and college (Col. key, Depart.).]

Data Warehouse (3/3)
• OLAP (On-Line Analytical Processing) on a cube with college, category, and quarter axes
  – Slice: College = Management college; Category = all; Year = all (2003~2006)
  – Dice: College = Management college; Category = 800; Quarter = 1
  – Roll-up: quarter to year
  – Drill-down: college to department

Bibliomining (1)
• What is bibliomining?
  – 書目探勘
    • Data mining for libraries
  – Scott (2003)
    • The application of statistical and pattern-recognition tools to large amounts of data associated with library systems in order to aid decision-making or justify services
    • The combination of data mining, bibliometrics, statistics, and reporting tools used to extract patterns of behavior-based artifacts from library systems
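
As an illustration of the OLAP operations listed under Data Warehouse (3/3), the following is a minimal sketch in plain Python. The fact rows, the `FACTS` and `DIMS` names, and the `select` and `aggregate` helpers are all hypothetical and invented for this example; a real library would run these queries in an OLAP tool or database rather than over an in-memory list.

```python
from collections import defaultdict

# Hypothetical fact rows: (college, department, category, year, quarter, loans).
# The values are invented for illustration only.
FACTS = [
    ("Management", "Finance", "800", 2003, 1, 12),
    ("Management", "Finance", "300", 2003, 1, 7),
    ("Management", "Marketing", "800", 2003, 2, 5),
    ("Engineering", "Civil", "800", 2003, 1, 9),
    ("Engineering", "Civil", "500", 2004, 3, 4),
]
DIMS = ("college", "department", "category", "year", "quarter")


def select(rows, **fixed):
    """Keep rows matching the fixed dimension values.

    Fixing one dimension is a slice; fixing several is a dice.
    """
    idx = {name: DIMS.index(name) for name in fixed}
    return [r for r in rows if all(r[idx[n]] == v for n, v in fixed.items())]


def aggregate(rows, by):
    """Sum loans grouped by the dimensions named in `by`.

    Grouping by a coarser level is a roll-up; adding a finer level
    (for example department under college) is a drill-down.
    """
    idx = [DIMS.index(name) for name in by]
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[i] for i in idx)] += r[-1]
    return dict(totals)


# Slice: College = Management, all categories, all years.
management = select(FACTS, college="Management")

# Dice: College = Management, Category = 800, Quarter = 1.
mgmt_800_q1 = select(FACTS, college="Management", category="800", quarter=1)

# Roll-up to college level, then drill down to departments.
by_college = aggregate(FACTS, by=("college",))
by_department = aggregate(FACTS, by=("college", "department"))
```

The two helpers are kept separate on purpose: selection narrows the cube, while aggregation changes the level of detail, which mirrors how the slide distinguishes slice/dice from roll-up/drill-down.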

Bibliomining (2)
• The Bibliomining Process
  – Determining areas of focus (1/6)
    • Directed data mining
      – Problem-focused
        » Predict the chance that patrons will return materials once they are one week late, in order to prioritize calling lists
    • Undirected data mining
      – Getting a better idea of a general topical area
        » How are different departments and types of patrons using the electronic journals?
  – Identifying internal and external data sources (2/6)
    • Garbage in, garbage out
    • Internal data (in the library system)
      – Patrons database, Web server logs, circulation database
    • External data

Bibliomining (3)
  – Collecting, cleaning and anonymizing data into a data warehouse (3/6)
    • Integrating different data sources
    • Protecting patrons' privacy
      – During the extraction and cleaning process
      – Save needed demographic information
    • Building the data warehouse

Circulation Records
Book ID      Subject               Patron ID
QA76.9       Computer Science      392-33
PS159.G8     American Literature   575-49
HF5415.125   Marketing             392-33

Patrons Database
Patron ID   Name       Class     Dept.
373-34      John       Grad      Psych
392-33      James      Ugrad     Math
575-49      Richards   Faculty   English

Data Warehouse
Book ID      Subject               Patron Class   Patron Dept.
QA76.9       Computer Science      Ugrad          Math
PS159.G8     American Literature   Faculty        English
HF5415.125   Marketing             Ugrad          Math

Bibliomining (4)
  – Selecting the appropriate analysis tools (4/6)
    • MIS, DSS, OLAP, data mining packages
      – SQL Server Enterprise
      – SPSS Clementine
      – IBM DBminer
      – Weka (free data mining tool)
        » http://www.cs.waikato.ac.nz/ml/weka/
    • What is the appropriate method?
  – Discovering the patterns and creating reports (5/6)
  – Evaluating and implementing the results (6/6)
    • Sample data to build the model
    • Test data to validate the model
      – Librarians examine the models
        » Familiar with the library context

Bibliomining (5)
• Critical Problems of Librarians
  – Doing what?
    • Specific areas?
    • Decision making?
  – What data sources are required?
    • Who knows what?
    • In place?
      – The cleverest housewife cannot cook a meal without rice.
      – Nobody can accomplish anything without the necessary means.
      – One cannot make a silk purse out of a sow's ear.
  – Advanced database techniques?
    • Database literature?
    • Technique-free or transparent?

Bibliomining (6)
• Bibliomining User Behaviors (High School Case)
  [Figure: data warehouse schema for the high school case, with the following tables]
  – Patron records: patron ID, student ID, name, class, gender
  – Borrowing summary (fact table): serial number, patron ID, number of loans, time
  – Academic performance records (dimension table, from the Academic Affairs Office database): student ID, course code, academic grade, instructor ID
  – Borrowing records (dimension table): serial number, book number, patron ID, loan time
  – Special achievement records (dimension table, from the Student Affairs Office database): student ID, residential area, special achievements
  – Collection records (dimension table): book number, call number, title, author, publisher, publication year, accession date
  – Patron characteristics (dimension table)

Bibliomining (7)
  – Library Operation Management
    • Peak service hours (clustering)
      – Manpower scheduling
    • Borrowing preference (clustering)
      – Class 3 in morning hours
      – Classes 7 and 9 in afternoon hours
  – Promotion activities
    • Book recommendation
      – For different reading groups
        » Grades (year levels), accomplishments (academic results), levels (groups)

Bibliomining (8)
  – Collection Management
    • Association of book classes selected
      – Class 8, then class 3, for the specific reading group
    • Courses have effects on students' borrowing behaviors
      – Coordinated teaching (team teaching) with librarians
  – Personalization
    • Preferences, trends, and associations from personal borrowing records

Bibliomining (9)

Bibliomining (10)
• University or Public Libraries
  – Library Operation Management
  – Collection Management
  – Personalization
  – Budget Allocation
  – Digital Resources Management
  – ……

Privacy (1)
• What happens?
  – Patrons' privacy
    • Leaks of personal data
    • Data mining can lead to privacy violations
    • Misuse of data
  – Security
    • Patrons' data protection
      – Keep no circulation data?!
      – Other choices?!

Privacy (2)
• Privacy violation example
  – Past circulation data (non-return or late return) → data mining tools → data mining results (association rule, class, cluster)
  – Example result: grade 3, male, Education College, prefers classes 3 and 5; if borrowing over 5 books this time
  – In library use only!!
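
The anonymizing step of the bibliomining process (3/6) can be sketched as a join that keeps only the demographic fields the analysis needs. The rows below are the ones shown in the Circulation Records and Patrons Database example; the `build_warehouse` function, the dictionary layout, and the choice to skip loans with no matching patron are assumptions made for this illustration, not part of the original slides.

```python
# Rows taken from the example tables in the slides.
circulation = [
    {"book_id": "QA76.9", "subject": "Computer Science", "patron_id": "392-33"},
    {"book_id": "PS159.G8", "subject": "American Literature", "patron_id": "575-49"},
    {"book_id": "HF5415.125", "subject": "Marketing", "patron_id": "392-33"},
]
patrons = {
    "373-34": {"name": "John", "class": "Grad", "dept": "Psych"},
    "392-33": {"name": "James", "class": "Ugrad", "dept": "Math"},
    "575-49": {"name": "Richards", "class": "Faculty", "dept": "English"},
}


def build_warehouse(circulation, patrons):
    """Join loans to patrons, keeping demographics but not identity.

    Hypothetical helper: patron ID and name are dropped; only patron
    class and department are carried into the warehouse rows.
    """
    rows = []
    for loan in circulation:
        patron = patrons.get(loan["patron_id"])
        if patron is None:
            continue  # assumption: ignore loans with no matching patron
        rows.append(
            {
                "book_id": loan["book_id"],
                "subject": loan["subject"],
                "patron_class": patron["class"],
                "patron_dept": patron["dept"],
            }
        )
    return rows


warehouse = build_warehouse(circulation, patrons)
for row in warehouse:
    print(row)
```

Dropping the ID and name is only a first step: when a class and department combination covers very few patrons, the remaining fields may still point to an individual, which is the kind of risk the privacy violation example describes.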

Privacy (3)
• Resolving strategies
  – What principles apply?
    • OECD (Organisation for Economic Co-operation and Development)
      – Guidelines on the Protection of Privacy and Transborder Flows of Personal Data
        » Collection limitation principle
        » Data quality principle
        » Purpose specification principle
        » Use limitation principle
        » Security safeguards principle
        » Openness principle
        » Individual participation principle
        » Accountability principle

Privacy (4)
  – Management solution
    • Statements
    • Process re-engineering
    • Education

Privacy (5)
• Privacy Literacy
  – Information ethics (PAPA)
    • Privacy
    • Accuracy
    • Property
    • Accessibility
  – Another big story!!

Are You Ready for Bibliomining? (1)
• Bibliomining is a management issue
  – What purpose?
    • Another solution?!
  – What information is required?
    • Make sure?!
  – What data sources are needed to generate the information?
    • Enough?!
  – Are the data sources ready?

Are You Ready for Bibliomining? (2)
• What databases and tools are in place?
  – Database literacy ready?
  – Tools are OK?
• What methods are appropriate?
  – Classification, clustering, association rules, …
• Are there library specialists to validate the results?
  – Who confirms and checks the results?
• Next?!

Reference
• Bibliomining information center
  – http://www.bibliomining.com/
• Bibliomining (書目探勘)
  – 文華圖書館管理資訊