International Journal of Computer Trends and Technology (IJCTT) – volume 13 number 2 – Jul 2014

Mining Multidimensional Data Using Constraint Frequent Pattern in Medical Dataset

B.Gayathri #1, Dr.E.Ramaraj *2
# M.Phil Scholar, * Professor
#,* Computer Science and Engineering, Alagappa University, Karaikudi, Tamil Nadu, India

Abstract: Frequent pattern mining has been an active field of data mining research over the last decade. A large body of literature has emerged and tremendous progress has been made. This paper aims to find the best mining method for patients with blood sugar problems and heart disease by defining certain conditions. Recent work has explored these techniques using approaches such as hybrid data mining algorithms. This paper proposes a rule-based model to check the accuracy of applying rules to individual records. It also analyses the results generated by the Apriori and Frequent Pattern growth algorithms.

Keywords: Data Mining, Condition, Frequent pattern.

Introduction: Since the introduction of the Apriori algorithm, frequent patterns have played an important role in data mining research. The Apriori algorithm applies an iterative approach in which k-itemsets are used to produce (k+1)-itemsets. To do so, Apriori needs several database scans, which enlarges the search space. To reduce this overhead, an extended tree structure called the frequent pattern tree (FP-tree) and an algorithm for frequent pattern extraction known as frequent pattern growth (FP-growth) were developed. This algorithm reduces the number of database scans from k to two. In the first scan, the frequency of each item is counted and the frequent items of length one are extracted. During the second scan, the frequent items in each transaction are sorted in descending order of frequency and inserted into the frequent pattern tree [3]. The clustered frequent pattern algorithm separates the transaction occurrences and the data into two attributes.
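The two scans described in the introduction can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `FPNode` and `build_fp_tree`, the tie-breaking rule (alphabetical order among equally frequent items), and the toy transactions are all assumptions made for illustration, and the header-list node links are omitted for brevity.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree in two passes over `transactions`."""
    # First scan: count how many transactions contain each item
    # and keep only the items that reach the minimum support.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Second scan: keep the frequent items of each transaction, sort them
    # by descending frequency (ties broken by name, an assumption),
    # and insert the resulting path into the tree.
    for t in transactions:
        path = sorted((i for i in set(t) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1  # shared prefixes only bump the counter
            node = child
    return root, frequent


# Toy transactions; the item names are invented for illustration only.
transactions = [
    ["high_bp", "high_sugar", "high_chol"],
    ["high_bp", "high_sugar"],
    ["high_bp", "high_chol"],
    ["high_sugar", "low_pulse"],
]
root, frequent = build_fp_tree(transactions, min_support=2)
```

Transactions that share a prefix share a path in the tree, which is why only the counters grow when the same ordered items are inserted again.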
Apriori Algorithm

The Apriori algorithm is simple to implement and mines all frequent itemsets in a database. It makes many passes over the database to find the frequent itemsets, with k-itemsets used to generate (k+1)-itemsets. The support of each k-itemset must be greater than or equal to the minimum support value. In the first step, the algorithm scans the database to find the frequency of the 1-itemsets, which contain only one item, by counting each item in the database. The frequent 1-itemsets are used to find the frequent 2-itemsets, and so on, until no more frequent k-itemsets can be found.

Frequent Pattern – Growth Algorithm

The frequent pattern growth algorithm is an efficient algorithm for producing the frequent itemsets without generating candidate itemsets. It is built on the divide-and-conquer method and needs only two database scans. In the first scan, it picks the items that satisfy the minimum support threshold and arranges them in sorted order; this list is called the header list. In the second scan, it constructs the frequent pattern tree, which is a compact tree. The items of each transaction are inserted according to their frequency order in the sorted list. When an inserted item already exists in the tree in the same order, its counter is incremented by 1 and the count is noted alongside the item. All the links are maintained using pointers from the entries in the header list.

Mining Multidimensional Data Using Constraint Frequent Pattern Algorithm:

This paper investigates mining multidimensional data stored in a database. It handles information on sugar and heart disease patients. The predicate values are set in advance, based on ranges of cholesterol, blood sugar, blood pressure, age, and heart beat. The records that match these predicates are extracted from the database. From these records, the final results are generated by clustering the retrieved records into two main categories; in this paper, the categories are the male and female genders. A new algorithm is created to form these clusters using the Frequent Pattern growth algorithm.

Methods used for mining multidimensional data

Algorithm For Mining Multidimensional Data:

Data mining uses two main algorithms for mining data in a database: the Apriori algorithm and the FP-Growth (Frequent Pattern Growth) algorithm. The newly created algorithm involves the following steps.

Input: A transaction database DB and a minimum support threshold ξ.
Output: FP-tree, the frequent-pattern tree of DB.
Method: The FP-tree is constructed as follows.

1. Scan the transaction database DB once. Collect F (the set of frequent items) and the support of each frequent item.
2. Sort F in support-descending order as FList (the list of frequent items).
3. Create the root of an FP-tree (Frequent Pattern Tree), T, and label it as “null”. For each transaction in DB, do the following.
4. Select the frequent items in the transaction and sort them according to the order of FList.
5. Let the sorted frequent item list in the transaction be [p | P], where p is the first element and P is the remaining list.
6. Insert [p | P] into the tree T as follows.
7. If the tree T has a child for the current item, denote that child as N.
8. If N.item = p.item, increment N’s count by 1; otherwise create a new node N with its count initialized to 1, its parent link linked to T, and its node-link linked to the nodes with the same item name via the node-link structure.
9. If P is nonempty, repeat the above procedure recursively.
10. At this point, a certain set of records has been retrieved.
11. Then set a second constraint (in this paper, the constraint is based on gender).
12. Set a minimum support value (β) and cluster the records.
13. Retrieve the records that satisfy the condition and the threshold value set for β.
14. Repeat steps 8 and 9 to obtain the final records needed. Finally, only the expected records, those whose gender is male or female, are retrieved; the other spurious ("fake") records are removed from the final result.

Work Flow Of Existing Frequent Pattern Growth Algorithm:

[Figure 1: Existing Frequent Pattern Tree Structure. Diagram labels: Data Set; Set Constraints To Cluster Dataset; Output]

Work Flow Of Constraint Frequent Pattern Algorithm:

[Figure 2: Constraint frequent Pattern Tree. Diagram labels: Data Set; Set Constraints To Cluster Dataset; A1, A2, ..., An; Records Retrieved]

In the existing Frequent Pattern algorithm, a single constraint is set and the records are retrieved. The newly created Constraint Frequent Pattern algorithm first retrieves the records from the database according to the constraint set earlier. Then, based on the resulting records, a second condition is set and exactly the records that are needed are retrieved. This reduces the memory space that would otherwise be occupied by the spurious records.

ISSN: 2231-2803 http://www.ijcttjournal.org

Data Extraction:

The multidimensional dataset contains patient details such as id, name, sex, cholesterol, blood sugar, blood pressure, and heart beat range. In total, 72 attributes were used and one thousand records were taken for analysis. All these values were extracted from the dataset and stored in the database for future use. The information about the data comprises the name of the dataset, the type of the dataset, the number of rows and columns, and the total number of data items present in the dataset. After the data is loaded, the records related to the predicates are extracted from the database.
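The two-stage retrieval described above (extract the records that match the predicates, then cluster them by a second constraint with a minimum support β) can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper does not give its threshold values, so the ranges in `PREDICATES`, the function names, and the sample records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical predicate ranges; the paper does not state its thresholds.
PREDICATES = {
    "cholesterol": (240, 600),
    "blood_sugar": (126, 500),
    "age": (40, 90),
}


def satisfies(record, predicates):
    """True when every predicate attribute lies inside its [low, high] range."""
    return all(low <= record[attr] <= high
               for attr, (low, high) in predicates.items())


def constraint_filter(records, predicates, second_attr, beta):
    """Keep records matching the predicates, cluster them by `second_attr`,
    and return only the clusters that hold at least `beta` records."""
    matched = [r for r in records if satisfies(r, predicates)]
    clusters = defaultdict(list)
    for r in matched:
        clusters[r[second_attr]].append(r)
    return {value: group for value, group in clusters.items()
            if len(group) >= beta}


# Invented sample records, for illustration only.
records = [
    {"id": 1, "sex": "male", "age": 55, "cholesterol": 260, "blood_sugar": 140},
    {"id": 2, "sex": "female", "age": 61, "cholesterol": 250, "blood_sugar": 150},
    {"id": 3, "sex": "male", "age": 47, "cholesterol": 245, "blood_sugar": 130},
    {"id": 4, "sex": "female", "age": 30, "cholesterol": 180, "blood_sugar": 90},
    {"id": 5, "sex": "male", "age": 66, "cholesterol": 300, "blood_sugar": 100},
]
result = constraint_filter(records, PREDICATES, second_attr="sex", beta=1)
ids_by_sex = {sex: [r["id"] for r in group] for sex, group in result.items()}
print(ids_by_sex)
```

Records 4 and 5 fail at least one predicate and never reach the clustering stage, which mirrors the stated goal of keeping unneeded records out of the final result. Raising `beta` to 2 would additionally drop any gender cluster with fewer than two matching records.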
Clustering Data:

Once the data have been extracted from the database, they are clustered based on the conditions defined earlier. In this paper, the prediction is based on age, heart beat range, blood pressure range, and blood sugar range. The records that match the defined criteria are extracted and stored separately for future use. This paper also compares the final results generated by the Constraint Frequent Pattern algorithm with the results generated by the Apriori and Frequent Pattern Growth algorithms, as follows.

Generating Results using Apriori Algorithm:

The following results are generated using the Apriori algorithm from the prediction on cholesterol; the sequence starts from 1. The prediction yields the cholesterol level of each patient and the corresponding id.

Generating Results With Frequent Pattern Algorithm:

The following chart represents the number of records returned by the frequent pattern growth algorithm.

Generating Results Using Constraint Frequent Pattern Algorithm:

Finally, the modified Frequent Pattern algorithm again clusters the results generated by the existing Frequent Pattern algorithm. It clusters the results using the gender attribute.

Conclusion:

The proposed work reduces the unwanted records in the final calculation and reduces the work of the user. It retrieves only the records that are relevant. This paper presents a comparative study of multiple results.

References:

[1] Rui Xu and Donald Wunsch II, “Survey of Clustering Algorithms,” IEEE Transactions on Neural Networks, Vol. 16, No. 3, May 2005.
[2] Vikas Gupta and Devanand, “A Survey on Data Mining: Tools, Techniques, Applications, Trends and Issues,” International Journal of Scientific & Engineering Research, Volume 4, Issue 3, March 2013, ISSN 2229-5518.
[3] K. Vijayalakshmi, “Survey of Data Mining in Socio-Academic Perspective,” International Journal of Scientific & Technology Research, Volume 2, Issue 9, September 2013, ISSN 2277-8616.
[4] Khalid Raza, “Application of Data Mining in Bioinformatics,” Indian Journal of Computer Science and Engineering, Vol. 1, No. 2, pp. 114-118.
[5] Wendy Foslien, Valerie Guralnik, and Karen Zita Haigh, “Data Mining for Space Applications,” SpaceOps 2004 Conference.
[6] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques.
[7] Nan Jiang and Le Gruenwald, “Research Issues in Data Stream Association Rule Mining,” The University of Oklahoma, School of Computer Science, Norman, OK 73019, USA.
[8] Mohd Maqsood Ali, “Role of Data Mining in Education Sector,” International Journal of Computer Science and Mobile Computing.
[9] M. H. Dunham, “Data Mining: Introductory and Advanced Topics,” Prentice Hall, 2002.
[10] en.wikipedia.org/Data_mining