Bibliomining (書目探勘)
Book Recommendation Applications
Patterns of Behaviors (1/2)
Individuals
Patterns of Behaviors (2/2)
Groups
Library Decisions
• Quantity vs. Quality
– Intuitive and scientific
• Flexible and systematic
• To help librarians make decisions
– Garbage in, garbage out
• Data warehouse
• Data mining
Data Warehouse (1/3)
• What is a data warehouse?
– A subject-oriented, integrated, time-variant,
and nonvolatile collection of data in support of
management’s decision making process
– A construct integrating multiple data sources
– A database isolated from transaction
databases
Data Warehouse (2/3)
[Figure: star schema of a multidimensional database (data cube)]
• Fact table: Time Key, Cat. Key, Col. Key, Book Sum
• Dimension table (time): Time Key, day, month, semester (first, second), year
• Dimension table (category): Cat. Key
• Dimension table (college): Col. Key, Depart.
Data Warehouse (3/3)
• OLAP (On-Line Analytical Processing)
[Figure: data cube operations (slice, dice, roll-up, drill-down) over the dimensions category, college/department, and quarter/year]
– Example selection: College: Management college; Category: all; Year: all (2003~2006)
– Example selection: College: Management college; Category: 800; Quarter: 1
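The four cube operations can be sketched in a few lines of plain Python. This is only an illustration, not a real OLAP engine: the fact rows, the loan counts, and the helper names `select` and `aggregate` are invented for the example. Only the dimension names (college, department, category, year, quarter) come from the slide.

```python
from collections import defaultdict

# Hypothetical fact rows: (college, department, category, year, quarter, loans).
FACTS = [
    ("Management", "Finance",    "800", 2003, 1, 12),
    ("Management", "Finance",    "300", 2003, 2,  7),
    ("Management", "Accounting", "800", 2004, 1,  5),
    ("Education",  "Psychology", "800", 2003, 1,  9),
    ("Education",  "Psychology", "500", 2004, 3,  4),
]
DIMS = ("college", "department", "category", "year", "quarter")


def select(rows, **conditions):
    """Keep rows whose dimensions match every condition (slice or dice)."""
    return [
        r for r in rows
        if all(r[DIMS.index(d)] == v for d, v in conditions.items())
    ]


def aggregate(rows, keep):
    """Sum the loan counts, grouped by the dimensions named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


# Slice: fix one dimension (college = Management) and keep the others.
management = select(FACTS, college="Management")

# Dice: fix several dimensions at once.
diced = select(FACTS, college="Management", category="800", quarter=1)

# Roll-up: aggregate quarters into years and departments into colleges.
by_college_year = aggregate(FACTS, ["college", "year"])

# Drill-down: break one college back down into its departments.
by_department = aggregate(management, ["department"])
```

In a real warehouse the same operations would normally be expressed as queries against the fact and dimension tables; the in-memory lists here only make the idea concrete.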
Bibliomining (1)
• What is Bibliomining?
– In Chinese: 書目探勘 (literally, "bibliographic mining")
• Data mining for libraries
– Scott Nicholson (2003)
• the application of statistical and pattern-recognition
tools to large amounts of data associated with library
systems in order to aid decision-making or justify
services
• the combination of data mining, bibliometrics,
statistics, and reporting tools used to extract
patterns of behavior-based artifacts from library
systems.
Bibliomining (2)
• The Bibliomining Process
– Determining areas of focus (1/6)
• Directed data mining
– Problem-focused
» Predict the likelihood that a patron will return an item once it is
one week overdue, in order to prioritize call lists
• Undirected data mining
– Gain a better idea of a general topical area
» How are different departments and types of patrons using the
electronic journals?
– Identifying internal and external data sources (2/6)
• Garbage-in garbage-out
• Internal data (in library system)
– Patrons database, Web server logs, circulation database
• External data
Bibliomining (3)
– Collecting, cleaning and anonymizing data
into a data warehouse (3/6)
• Integrating different data sources
• Protecting patrons’ privacy
– During the extraction and cleaning process
– Keep only the demographic information that is needed
• Building the data warehouse
Circulation Records
Book ID      Subject              Patron ID
QA76.9       Computer Science     392-33
PS159.G8     American Literature  575-49
HF5415.125   Marketing            392-33

Patrons Database
Patron ID    Name      Class    Dept.
373-34       John      Grad     Psych
392-33       James     Ugrad    Math
575-49       Richards  Faculty  English

Data Warehouse
Book ID      Subject              Patron Class  Patron Dept.
QA76.9       Computer Science     Ugrad         Math
PS159.G8     American Literature  Faculty       English
HF5415.125   Marketing            Ugrad         Math
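The step from the two source tables to the warehouse table can be sketched as a join followed by dropping the identifying fields. The data values are the ones on the slide. The function name `build_warehouse` and the dictionary keys are assumptions made for this example, not part of any library system.

```python
# Source tables, with the values shown on the slide.
circulation = [
    {"book_id": "QA76.9", "subject": "Computer Science", "patron_id": "392-33"},
    {"book_id": "PS159.G8", "subject": "American Literature", "patron_id": "575-49"},
    {"book_id": "HF5415.125", "subject": "Marketing", "patron_id": "392-33"},
]
patrons = {
    "373-34": {"name": "John", "patron_class": "Grad", "dept": "Psych"},
    "392-33": {"name": "James", "patron_class": "Ugrad", "dept": "Math"},
    "575-49": {"name": "Richards", "patron_class": "Faculty", "dept": "English"},
}


def build_warehouse(circulation, patrons):
    """Join each loan to its patron, keeping demographics but not identity."""
    rows = []
    for loan in circulation:
        patron = patrons[loan["patron_id"]]
        rows.append({
            "book_id": loan["book_id"],
            "subject": loan["subject"],
            # Only the class and department survive; ID and name are dropped.
            "patron_class": patron["patron_class"],
            "patron_dept": patron["dept"],
        })
    return rows


warehouse = build_warehouse(circulation, patrons)
```

Dropping the patron ID and name during extraction means the warehouse can still answer questions such as "which departments borrow marketing books" without storing who borrowed what.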
Bibliomining (4)
– Selecting the appropriate analysis tools (4/6)
• MIS, DSS, OLAP, Data mining package
– SQL Server Enterprise
– SPSS Clementine
– IBM DBminer
– Weka (free data mining tool)
» http://www.cs.waikato.ac.nz/ml/weka/
• What is the appropriate method?
– Discovering the patterns and creating reports (5/6)
– Evaluating and implementing the results (6/6)
• Sample data to build the model
• Test data to validate the model
– Librarians examine the models
» Familiar with the library context
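The split between sample data and test data can be illustrated with a deliberately simple model. Everything in this sketch is an assumption made for the example: the records are invented, the "model" is just the majority outcome per patron class, and the split takes the first 75% of the records in order, whereas in practice a random split is more usual.

```python
from collections import Counter, defaultdict

# Hypothetical records: (patron_class, returned_after_reminder).
RECORDS = [
    ("Ugrad", True), ("Ugrad", True), ("Ugrad", False), ("Grad", True),
    ("Grad", True), ("Faculty", False), ("Faculty", False), ("Faculty", True),
    ("Ugrad", True), ("Grad", True), ("Faculty", False), ("Ugrad", False),
]


def split(records, sample_fraction=0.75):
    """Split records into sample data (model building) and test data."""
    cut = int(len(records) * sample_fraction)
    return records[:cut], records[cut:]


def build_model(sample):
    """Learn the majority outcome per patron class from the sample only."""
    votes = defaultdict(Counter)
    for patron_class, returned in sample:
        votes[patron_class][returned] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def accuracy(model, test, default=True):
    """Fraction of test records whose outcome the model predicts correctly."""
    hits = sum(model.get(c, default) == outcome for c, outcome in test)
    return hits / len(test)


sample, test = split(RECORDS)
model = build_model(sample)
score = accuracy(model, test)
```

The score only says how well the model fits held-out data. Whether the pattern makes sense in the library context is still a question for the librarians who examine the model.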
Bibliomining (5)
• Critical Problems of Librarians
– Doing what?
• Specific areas?
• Decision making?
– What data sources are required?
• Who knows what?
• In place?
– Even the cleverest housewife cannot cook a meal without rice.
– Nobody can accomplish anything without the necessary means.
– One cannot make a silk purse out of a sow's ear.
– Advanced database techniques?
• Database literature?
• Technique-free or transparent?
Bibliomining (6)
• Bibliomining User Behaviors (High School Case)
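[Figure: star schema for the high school case]
– Patron records: patron ID, student ID, name, class, gender
– Fact table, Borrowing summary: serial number, patron ID, number of loans, time
– Dimension table, Academic performance records (Academic Affairs Office database): student ID, course code, grade, teacher ID
– Dimension table, Loan records: serial number, book ID, patron ID, loan time
– Dimension table, Special achievement records (Student Affairs Office database): student ID, residential area, special achievements
– Dimension table, Collection records: book ID, call number, title, author, publisher, publication year, acquisition date
– Dimension table, Patron characteristics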
Bibliomining (7)
– Library Operation Management
• Peak service hours (clustering)
– manpower scheduling
• Borrowing preference (clustering)
– Class 3 in morning hours
– Classes 7 and 9 in afternoon hours
– Promotion activities
• Book recommendation
– For different reading groups
» Grade level, academic performance, track
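The slide names clustering as the technique for finding peak service hours but does not say which algorithm was used. As one possible illustration, the sketch below runs a plain one-dimensional k-means over the hour of day of each loan. The loan hours, the function name `kmeans_1d`, and the choice of two starting centers are all assumptions made for the example.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Plain k-means on one dimension, starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return centers, clusters


# Hypothetical hour of day (24-hour clock) for each loan transaction.
loan_hours = [8, 9, 9, 10, 10, 10, 11, 14, 15, 15, 16, 16, 16, 17]

# Start from the earliest and latest hours and look for two busy periods.
centers, clusters = kmeans_1d(loan_hours, [min(loan_hours), max(loan_hours)])
```

With this data the two centers settle in the mid-morning and mid-afternoon, which is the kind of result that could feed manpower scheduling. Real circulation logs would need more care in choosing the number of clusters.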
17/47
Bibliomining (8)
– Collection Management
• Associations among the book classes borrowed
– Class 8, then class 3, for a specific reading group
• Courses affect students’ borrowing behavior
– Coordinated teaching (team teaching) with librarians
– Personalization
• Preference, trend, associations from personal
borrowing records
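An association between book classes, such as the class 8 and class 3 pattern mentioned for collection management, is usually measured by support and confidence. The sketch below computes both for a single rule. The borrowing histories are invented, the function names are assumptions, and the calculation ignores the order in which the classes were borrowed.

```python
# Hypothetical borrowing histories: the set of book classes each patron borrowed.
HISTORIES = [
    {"8", "3"},
    {"8", "3", "5"},
    {"8"},
    {"3", "7"},
    {"8", "3", "9"},
]


def count(itemset, histories):
    """Number of histories that contain every class in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= history for history in histories)


def support(itemset, histories):
    """Fraction of all histories that contain the whole itemset."""
    return count(itemset, histories) / len(histories)


def confidence(antecedent, consequent, histories):
    """Share of histories with the antecedent that also hold the consequent."""
    both = count(set(antecedent) | set(consequent), histories)
    return both / count(antecedent, histories)


# Rule under test: patrons who borrow class 8 also borrow class 3.
rule_support = support({"8", "3"}, HISTORIES)
rule_confidence = confidence({"8"}, {"3"}, HISTORIES)
```

A rule is normally kept only if both values clear thresholds chosen by the analyst. A high confidence on very low support describes a handful of patrons rather than a reading group.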
18/47
Bibliomining (9)
19/47
Bibliomining (10)
• University or Public Libraries
– Library Operation Management
– Collection Management
– Personalization
– Budget Allocation
– Digital Resources Management
– ……
20/47
Privacy (1)
• What happens?
– Patrons’ privacy
• Leaks of personal data
• Data mining can lead to privacy violations
• Misuse of data
– Security
• Protection of patrons’ data
– Keep no circulation data?!
– Other choices?!
Privacy (2)
• Privacy violation example
[Figure: past circulation data (non- or late return) fed into data mining tools, producing data mining results (association rule, class, cluster)]
– Resulting profile: Grade 3; Male; Prefers classes 3 and 5; Education College; If borrowing over 5 books this time
– Outcome: In library use only!!
Privacy (3)
• Resolving strategies
– What principles applied?
• OECD (Organisation for Economic Co-operation and Development)
– Guidelines on the protection of the privacy and transborder
flows of personal data
» Collection limitation principle
» Data quality principle
» Purpose specification principle
» Use limitation principle
» Security safeguards principle
» Openness principle
» Individual participation principle
» Accountability principle
23/47
Privacy (4)
– Management solution
• Statements
• Process re-engineering
• Education
24/47
Privacy (5)
• Privacy Literacy
– Information ethics (PAPA)
• Privacy
• Accuracy
• Property
• Accessibility
– Another big story!!
Are You Ready for Bibliomining? (1)
• Bibliomining is a management issue
– What purpose?
• Another solution?!
– What information is required?
• Make sure?!
– What data sources are needed to generate
the information?
• Enough?!
– Are the data sources ready?
26/47
Are You Ready for Bibliomining? (2)
• What databases and tools are in place?
– Is database literacy in place?
– Are the tools adequate?
• What methods are appropriate?
– Classification, clustering, association rules, …
• Are there library specialists to validate the
results?
– Who confirms and checks the results?
• Next?!
References
• Bibliomining information center
– http://www.bibliomining.com/
• 書目探勘 (Bibliomining)
– 文華圖書館管理資訊