College of Science & Technology
Dept. of Computer Science & IT
BSc of Information Technology

Data Mining
Chapter 2_1: Data Preparation and Preprocessing
Case Study
2013

Prepared by: Mahmoud Rafeek Al-Farra
www.cst.ps/staff/mfarra

Course Outline

- Introduction
- Data Preparation and Preprocessing
- Association Rules
- Classification Methods
- Evaluation
- Clustering Methods
- Mid Exam
- Knowledge Representation
- Special Case Study: Document Clustering
- Discussion of Case Studies by Students

Consider the following instances

The documents before preprocessing are the following:

Documents 1 and 2:
Palestine freedom requires all Muslims. All Muslims must pray five times every day. Palestinians and Muslims are persecuted by United Nations. Freedom for Palestine. Palestine is a holy land for all Muslims. The legal right of Palestine for Muslims. I am proud to be Muslim.

Document 3:
Support our legal rights to Palestine. I am proud to be from Palestine.

After the preprocessing

After the documents pass through the preprocessing steps, many words are removed (e.g. our, to, am, the, five, and so on). Others are stemmed to their roots (e.g. Muslims is stemmed to Muslim, persecuted to persecute, and so on).

After the preprocessing steps, the three documents are as follows:

Documents 1 and 2:
Palestin freedom requir all Muslim. All Muslim pray. Palestin Muslim persecut unit nation. Freedom Palestin. Palestin holy land all Muslim. Legal right Palestin Muslim. Proud Muslim.

Document 3:
Support legal right Palestin. Proud Palestin.

Then ... representation

One possible way:

        item1  item2  item3  item4
Doc1      0      1      1      1
Doc2      1      1      1      1
Doc3      1      1      0      0
Doc4      0      1      1      0

Our application then treats each document as a vector.
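
The preprocessing and representation steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tool used in the case study: the stop-word list and the `toy_stem` suffix stripper are assumptions made up for this example, so their output differs from the stems shown on the slides (for instance, `palestine` is left unchanged rather than reduced to `Palestin`). It also assumes that a 1 in the table marks a term that occurs in a document and a 0 marks one that does not. The first input text is Document 3; the other two are single sentences taken from the documents above.

```python
import re

# Hypothetical stop-word list for this sketch (not the list used in the slides).
STOP_WORDS = {"i", "am", "our", "to", "be", "from", "the", "of", "for"}


def toy_stem(word):
    """Strip a trailing 'ed' or 's'. A toy rule, not a real stemmer such as Porter's."""
    for suffix in ("ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split into alphabetic tokens, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


def binary_vectors(token_lists):
    """Return (vocabulary, vectors); each vector has 1 where the term occurs."""
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = [[1 if term in tokens else 0 for term in vocabulary]
               for tokens in token_lists]
    return vocabulary, vectors


texts = [
    "Support our legal rights to Palestine. I am proud to be from Palestine.",
    "The legal right of Palestine for Muslims.",
    "I am proud to be Muslim.",
]

processed = [preprocess(t) for t in texts]
vocabulary, vectors = binary_vectors(processed)

print(vocabulary)
for row in vectors:
    print(row)
```

Each row printed at the end plays the role of one Doc row in the table, and each vocabulary entry plays the role of one item column. A real application would normally replace `toy_stem` with an established stemmer and use a fuller stop-word list.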