International Journal of Computer Trends and Technology (IJCTT) – volume 13 number 2 – Jul 2014
Mining Multidimensional Data Using
Constraint Frequent Pattern in Medical
Dataset
B. Gayathri #1, Dr. E. Ramaraj *2
# M.Phil Scholar, * Professor
#,* Computer Science and Engineering, Alagappa University, Karaikudi, Tamil Nadu, India
Abstract:
Frequent pattern mining has been an active field of data mining
research over the last decade. A large body of literature has emerged
and tremendous progress has been made. This paper aims to find the
best mining method for patients with sugar and heart diseases by
defining certain conditions. Recent research has explored these
techniques through approaches such as hybrid data processing
algorithms. This paper proposes a rule-based model to check the
accuracy of applying rules to individual records. It also analyses
the results generated by the Apriori and Frequent Pattern growth
algorithms.
Keywords: Data Mining, Condition, Frequent pattern.
Introduction:
Since the introduction of the Apriori algorithm, frequent pattern
mining has played an important role in data mining research.
The Apriori algorithm applies an iterative approach in which
k-itemsets are used to produce (k+1)-itemsets. To do so, Apriori
needs several database scans, which increases the search space.
To reduce this overhead, an extended tree structure called the
frequent pattern tree (FP-tree) and an algorithm for frequent
pattern extraction known as frequent pattern growth (FP-growth)
were developed. This algorithm reduces the number of database
scans from k to two. In the first scan, the frequency of each item
is calculated and the frequent items of length one are extracted.
During the second scan, the frequent items in each transaction are
sorted in descending order of frequency and inserted into the
frequent pattern tree [3].
The clustered frequent pattern algorithm separates the
transactional occurrences and the data into two attributes.
In its first step, Apriori counts each item in the database. The
frequency of the 1-itemsets is then used to find the frequent
2-itemsets, and so on until no more frequent k-itemsets are found.
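The level-wise search described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name apriori, the use of an absolute support count, and the example item labels are all assumptions made for this sketch.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    min_support is an absolute count here (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # First scan: count every single item to obtain the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 1
    while current:
        # Join frequent k-itemsets into candidate (k+1)-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k + 1:
                # Prune: every k-subset of a candidate must be frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(union, k)):
                    candidates.add(union)
        # One further database scan per level to count the candidates.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transactions; the item labels are illustrative only.
example_transactions = [
    {"high_chol", "high_bp", "high_sugar"},
    {"high_chol", "high_bp"},
    {"high_chol", "high_sugar"},
    {"high_bp"},
]
example_frequent = apriori(example_transactions, min_support=2)
```

Each pass of the while loop corresponds to one additional database scan, which is the cost that the frequent pattern growth approach is meant to avoid.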
Frequent Pattern Growth Algorithm
The frequent pattern growth algorithm is an efficient algorithm for
producing the frequent itemsets without generating candidate
itemsets. It is built on the divide and conquer method and needs
only two database scans. In the first scan, it picks the items
which satisfy the minimum support threshold and arranges them in
sorted order. This list is called the header list. In the second
scan, it constructs a frequent pattern tree, which is a compact
tree. Items are inserted according to their frequency order in the
sorted list. When an inserted item already exists in the tree in
the same order, its counter is incremented by 1 and the item is
written together with its count, separated by a comma. All the
links are maintained using a pointer from the item's entry in the
header list.
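As a rough illustration of the first scan, the sketch below builds the header list and reorders a transaction by it. The function names build_header_list and order_transaction are hypothetical, the support threshold is taken as an absolute count, and ties are broken alphabetically only to make the order deterministic.

```python
from collections import Counter


def build_header_list(transactions, min_support):
    """First scan: count items, keep the frequent ones, sort by support."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Descending support; ties broken by item name (an assumption).
    return sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))


def order_transaction(transaction, header_list):
    """Drop infrequent items and sort the rest in header-list order."""
    rank = {item: pos for pos, (item, _) in enumerate(header_list)}
    return sorted((i for i in set(transaction) if i in rank), key=rank.get)


# Hypothetical transactions used only to exercise the functions.
example_header = build_header_list(
    [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]], min_support=2
)
example_ordered = order_transaction(["c", "d", "a"], example_header)
```

The reordered transactions are what the second scan inserts into the tree, so that transactions sharing frequent items also share a prefix path.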
Mining Multidimensional Data Using Constraint Frequent
Pattern Algorithm:
This paper investigates the mining of multidimensional data
stored in a database. It handles information related to sugar and
heart disease patients. The predicate values are set in advance
based on cholesterol, blood sugar, blood pressure, age, and heart
beat range values. The records which match these predicates are
extracted from the database. From these records, the final results
are generated by clustering the retrieved records into two main
categories. In this paper, those categories are the male and female
genders. A new algorithm is created to form these clusters using
the Frequent Pattern growth algorithm.
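The predicate-based extraction described in this section might look like the sketch below. The field names, the threshold values, and the requirement that a record satisfy every predicate are assumptions chosen for illustration; the paper does not state its actual ranges, and the numbers are not clinical guidance.

```python
# Hypothetical predicates: each maps a field name to a test on its value.
PREDICATES = {
    "cholesterol": lambda v: v >= 240,
    "blood_sugar": lambda v: v >= 126,
    "blood_pressure": lambda v: v >= 140,
    "age": lambda v: v >= 40,
    "heart_beat": lambda v: v > 100,
}


def matches(record, predicates):
    """True when the record satisfies every predicate (an assumption)."""
    return all(test(record[field]) for field, test in predicates.items())


def select_records(records, predicates):
    """Return the records that match all predicates, in input order."""
    return [r for r in records if matches(r, predicates)]


# Two made-up patient records; only the first satisfies all predicates.
example_records = [
    {"id": 1, "sex": "male", "cholesterol": 260, "blood_sugar": 150,
     "blood_pressure": 150, "age": 55, "heart_beat": 110},
    {"id": 2, "sex": "female", "cholesterol": 180, "blood_sugar": 90,
     "blood_pressure": 120, "age": 35, "heart_beat": 72},
]
example_selected = select_records(example_records, PREDICATES)
```

Keeping the predicates in a dictionary makes it easy to change a range without touching the selection logic.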
Methods used for mining multidimensional data
Data mining uses two main algorithms for mining data in a
database: the Apriori algorithm and the FP-Growth (Frequent Pattern
Growth) algorithm.
Apriori Algorithm
The Apriori algorithm is simple to execute and makes it easy to
mine all frequent itemsets in a database. It makes many passes over
the database to find frequent itemsets, where k-itemsets are used
to generate (k+1)-itemsets. The support of each k-itemset must be
greater than or equal to the minimum support value. In the first
step, the algorithm scans the database to find the frequency of the
1-itemsets, which contain only one item, by counting each item in
the database.
Algorithm For Mining Multidimensional Data:
The following steps are involved in the newly created algorithm.
Input: A transaction database DB and a minimum support
threshold ξ.
Output: FP-tree, the frequent-pattern tree of DB.
Method: The FP-tree is constructed as follows.
1. Scan the transaction database DB once. Collect F (the set
of frequent items) and the support of each frequent item.
2. Sort F in support-descending order as FList (the list of
frequent items).
3. Create the root of an FP-tree (Frequent Pattern tree), T,
and label it as "null".
For each transaction in DB, do the following.
4. Select the frequent items in the transaction and sort them
according to the order of FList.
5. Let the sorted frequent item list in the transaction be
[p|P], where p is the first element and P is the remaining list.
6. Now insert [p|P] into the tree T.
7. If the tree T has a child for the item p, denote that child
as N.
8. If N.item = p.item, then increment N's count by 1; else
create a new node N with its count initialized to 1, its
parent link linked to T, and its node-link linked to the
nodes with the same item-name via the node-link structure.
9. If P is nonempty, repeat the above procedure recursively
for P under N.
10. At this point, certain records have been retrieved.
11. Then set a further constraint (for this paper, a constraint
based on gender was created).
12. Set a minimum support value (β) and cluster the records.
13. Retrieve the records which satisfy the condition and the
threshold value set for β. Then follow steps 8 and 9
repeatedly to get the final records needed.
14. Finally, only the expected records, whose gender is male or
female, are retrieved. Other fake records are removed from
the final result.
ISSN: 2231-2803
http://www.ijcttjournal.org
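The tree-building steps listed above (1 to 9) can be sketched as follows. This is an illustrative reading of those steps, not the authors' code: the class FPNode, the function names, the absolute support count, and the choice to thread each new node onto the front of its node-link chain are assumptions of this sketch.

```python
from collections import Counter


class FPNode:
    """One node of the tree: an item, a count, and its links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent      # parent link
        self.children = {}        # item -> child FPNode
        self.node_link = None     # next node carrying the same item


def insert_tree(items, node, header):
    """Insert the sorted list [p|P] below node (steps 5 to 9)."""
    if not items:
        return
    p, rest = items[0], items[1:]
    child = node.children.get(p)
    if child is None:
        child = FPNode(p, node)
        node.children[p] = child
        # Put the new node at the head of the node-link chain for p.
        child.node_link = header.get(p)
        header[p] = child
    child.count += 1
    insert_tree(rest, child, header)


def build_fp_tree(transactions, min_support):
    """Two scans: build FList, then insert every ordered transaction."""
    counts = Counter(i for t in transactions for i in set(t))
    flist = sorted((i for i, c in counts.items() if c >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: pos for pos, item in enumerate(flist)}
    root = FPNode(None, None)     # the "null" root
    header = {}
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        insert_tree(ordered, root, header)
    return root, header, flist


def support_via_links(header, item):
    """Sum the counts along the node-link chain of one item."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.node_link
    return total
```

Following the node-link chain of an item recovers its total support without rescanning the database, which is what makes the header list useful during mining.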
Work Flow Of Existing Frequent Pattern Growth Algorithm:
[Figure 1: Existing Frequent Pattern Tree Structure. Diagram labels: Data Set; item nodes A, B, ...; Output]
Work Flow Of Constraint Frequent Pattern Algorithm:
[Figure 2: Constraint Frequent Pattern Tree. Diagram labels: Data Set; Set Constraints To Cluster Dataset; A1, A2, ..., An; Records Retrieved; Set Constraints To Cluster Dataset; A1, A2, ..., An; Output]
In the existing Frequent Pattern algorithm, a single constraint is
set and the records are retrieved.
In the newly created Constraint Frequent Pattern algorithm, the
records are first retrieved from the database according to the
constraint set earlier. A second condition is then set on the
resulting records, and exactly the records that are needed are
retrieved. This reduces the memory space that would otherwise be
occupied by the fake records.
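The second condition can be illustrated with the sketch below, which clusters already-retrieved records by gender and discards everything else. The function name second_stage and the field name "sex" are hypothetical. Two interpretations are assumed here because the paper does not spell them out: β is read as a minimum cluster size, and "fake records" are read as records whose gender value is neither male nor female.

```python
def second_stage(records, field="sex", allowed=("male", "female"), beta=1):
    """Cluster records by an allowed field value; keep clusters >= beta."""
    clusters = {value: [] for value in allowed}
    for record in records:
        value = str(record.get(field, "")).strip().lower()
        if value in clusters:
            clusters[value].append(record)
        # Records with any other value are dropped (treated as fake).
    return {v: rs for v, rs in clusters.items() if len(rs) >= beta}


# Made-up first-stage output used only to exercise the function.
example_stage_one = [
    {"id": 1, "sex": "Male"},
    {"id": 2, "sex": "female"},
    {"id": 3, "sex": ""},
    {"id": 4, "sex": "male"},
    {"id": 5},
]
example_clusters = second_stage(example_stage_one, beta=1)
```

Because the second stage only sees the output of the first, it works on a much smaller set than the full database, which is the memory saving the text refers to.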
Data Extraction:
The multidimensional dataset contains patient details such as id,
name, sex, cholesterol, blood sugar, blood pressure, and heart beat
range. In total, 72 attributes were used and one thousand records
were taken for analysis. All these values were extracted from the
dataset and stored in a database for future use. The information
about the data consists of the name of the dataset, the type of the
dataset, the number of rows and columns, and the total number of
data values present in the dataset. After the data is loaded, the
records related to the predicates are extracted from the database.
Generating Results Using Constraint Frequent Pattern
Algorithm:
Finally, the modified Frequent Pattern algorithm clusters again
the results generated by the existing Frequent Pattern algorithm.
It clusters the results using the gender attribute.
Clustering Data:
Once the data are extracted from the database, they are clustered
based on the conditions defined earlier. In this paper, the
prediction is based on age, heart beat range, blood pressure range,
and blood sugar range. The records which match the defined criteria
are extracted and stored separately for future use.
This paper also compares the final results generated by the
Constraint Frequent Pattern algorithm with the results generated by
the Apriori and Frequent Pattern Growth algorithms. They are as
follows.
Generating Results Using Apriori Algorithm:
The following results are generated using the Apriori algorithm,
with the prediction of the cholesterol sequence starting from 1.
The prediction yields the cholesterol level of each patient and
the corresponding patient id.
Generating Results With Frequent Pattern Algorithm:
The following chart represents the number of records returned
by the frequent pattern growth algorithm.
Conclusion:
The proposed work reduces the unwanted records used in the final
calculation and thereby reduces the work of the user. It retrieves
only the records which are effective. This paper presents a
comparative study of multiple results.
References:
[1] Rui Xu and Donald Wunsch II, "Survey of Clustering Algorithms,"
IEEE Transactions on Neural Networks, Vol. 16, No. 3, May 2005.
[2] Vikas Gupta and Prof. Devanand, "A Survey on Data Mining: Tools,
Techniques, Applications, Trends and Issues," International Journal of
Scientific & Engineering Research, Volume 4, Issue 3, March 2013,
ISSN 2229-5518.
[3] Prof. K. Vijayalakshmi, "Survey of Data Mining in Socio-Academic
Perspective," International Journal of Scientific & Technology
Research, Volume 2, Issue 9, September 2013, ISSN 2277-8616.
[4] Khalid Raza, Centre for Theoretical Physics, Jamia Millia Islamia,
New Delhi-110025, India, "Application of Data Mining in
Bioinformatics," Indian Journal of Computer Science and Engineering,
Vol. 1, No. 2, 114-118.
[5] Wendy Foslien, Valerie Guralnik, and Karen Zita Haigh, Honeywell
Laboratories, 3660 Technology Drive, Minneapolis, MN 55418, "Data
Mining for Space Applications," SpaceOps 2004 Conference.
[6] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and
Techniques."
[7] Nan Jiang and Le Gruenwald, The University of Oklahoma, School of
Computer Science, Norman, OK 73019, USA, "Research Issues in Data
Stream Association Rule Mining."
[8] Dr. Mohd Maqsood Ali, Asst. Professor and Head of Marketing
Department, Jazan Community College, Jazan University, Jazan, Kingdom
of Saudi Arabia, "Role of Data Mining in Education Sector,"
International Journal of Computer Science and Mobile Computing.
[9] M. H. Dunham, "Data Mining: Introductory and Advanced Topics,"
Prentice Hall, 2002.
[10] en.wikipedia.org/Data_mining