International Journal of Computer Application
Available online on http://www.rspublication.com/ijca/ijca_index.htm
Issue 4, Volume 4 (July-August 2014)
ISSN: 2250-1797
Scheduler and Power Control Performance in a Data Mining
Using Rule Generation Algorithm
R. Kalidasa Krishnaswami 1, Dr. Ashish Chaturvedi 2
1 Research Scholar, Department of Computer Science and Engineering,
Himalayan University, Arunachal Pradesh, India
2 Professor and Associate Director, Arni School of Computer Science and Applications,
Arni University, Indora (Kathgarh), Himachal Pradesh, India
Abstract— Data mining, the discovery of knowledge in data, integrates concepts and
developments from statistics, machine learning, data visualization and database theory.
This fusion of very different disciplines has been motivated, among other factors, by the
significant increase in data volume in all spheres of human activity and, in this particular
case, by the need for more evidence on which to base criminal intelligence policies that are
better adjusted to the data available in the different sources. Statistics and data mining
have proved complementary: statistics raises hypotheses to be validated against the
available data, whereas data mining discovers patterns in the available data, which domain
experts then interpret.
Index Terms— Decision tree, Data mining, Association Rules, Data Clustering
I. INTRODUCTION
Data mining is an area of scientific study that holds high expectations for the research
community, mainly because of what it is expected to transfer to society. For more than 50
years, countless articles on the subject have been published in conferences and journals.
However, it remains a fertile and promising field with many research challenges. This article
provides an introduction to knowledge discovery and data mining. It describes the major
potential that data mining offers and lists the main methodologies used. It also highlights
different application domains and the main challenges and research trends. Data mining is a
very important part of modern business. Mining large-scale transactional databases is
considered a very important research subject because of its obvious commercial potential [11].
It is also a major challenge because of its complexity. New techniques have been developed
over the last decade to solve classification [12], prediction [13] and recommendation [14, 15]
problems in these kinds of databases. Finding and evaluating valuable and meaningful patterns
requires huge quantities of data. Any business can produce large amounts of data simply by
recording all customer events in a database, a practice called data warehousing. Large
quantities of data are also typically generated by dotcom web servers [16, 17, 18, 19], which
monitor the packages requested by web site visitors. This information is logged in
special-purpose web log files.
Many studies have been conducted on Web usage mining. Some focus on mining association
rules and navigation patterns in user access paths [20, 21]. A session is viewed as a
transaction, and association rule mining algorithms are employed to find frequent paths
that are followed by many users. Others build data cubes from
Web server logs for OLAP and data mining [21, 23]. Statistics on pages, IP domains,
geographical location of users, and access time are calculated from sessions. Others
cluster users based on their access patterns [24]. There is also research on data preparation
and query languages for Web usage mining, and on Web personalization based on Web usage
mining. Nina et al. [25] suggest a complete framework for pattern discovery in Web usage
mining.
Web site creators must have clear knowledge of user profiles and site objectives, as well as
detailed information on how users will browse the Web site. The creators can examine
visitor behavior by means of Web analysis and identify patterns in visitor activity.
This Web analysis involves transforming and interpreting the Web log records to identify
hidden information or predictive patterns through the data mining and knowledge discovery
process. Coupled with Web warehousing, the result gives a comprehensive view.
Fig 1. Data Mining and Web mining Process
To analyze the usage of the Web, many researchers have proposed Web mining, and
especially Web usage mining [27, 28]. Web usage mining is the mining of Web usage data. In
most Web usage mining studies, Web server logs are used as the primary data source,
although client-level and proxy-level logs may also be used. A Web server log collects a
large amount of information about user activities on the Web site by recording the requests
for pages on the server.
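To make the role of the server log concrete, the following is a minimal sketch of turning log lines into sessions that can later be mined as transactions. It assumes the log uses the Common Log Format and that a session is one host's requests with gaps of at most 30 minutes; the format, the timeout, the sample lines and the function names are all illustrative assumptions, not details given in the paper.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed Common Log Format:
# host ident user [timestamp] "METHOD path PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def parse_line(line):
    """Return (host, timestamp, path) for a successful GET request, else None."""
    match = LOG_PATTERN.match(line)
    if match is None or match["method"] != "GET" or not match["status"].startswith("2"):
        return None
    timestamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return match["host"], timestamp, match["path"]

def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group page requests into sessions: same host, gaps no longer than `timeout`."""
    by_host = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            host, timestamp, path = record
            by_host[host].append((timestamp, path))

    sessions = []
    for hits in by_host.values():
        hits.sort()  # chronological order within one host
        current, last_seen = [], None
        for timestamp, path in hits:
            if last_seen is not None and timestamp - last_seen > timeout:
                sessions.append(current)  # gap too long: close the session
                current = []
            current.append(path)
            last_seen = timestamp
        if current:
            sessions.append(current)
    return sessions

# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Jul/2014:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [10/Jul/2014:10:02:10 +0000] "GET /products.html HTTP/1.1" 200 2210',
    '10.0.0.2 - - [10/Jul/2014:10:03:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [10/Jul/2014:11:30:00 +0000] "GET /contact.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Jul/2014:10:04:30 +0000] "GET /missing.html HTTP/1.1" 404 0',
]

sessions = sessionize(SAMPLE_LOG)
```

Each resulting session is a list of requested pages. Identifying visitors by host alone is a simplification; proxies and shared addresses would need extra handling in practice.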
Personalized Web page recommendation is strictly limited by the nature of web logs, the
intrinsic complexity of the problem and the demand for high efficiency. Because a highly
rated Web site usually has a large number of meaningful visitor clusters and profiles, the
model-based or distance-based techniques of existing Web usage mining methods tend either
to make very strong and simple assumptions or to become highly complex and slow. The
author designed a heuristic majority intelligence technique that adjusts easily to changing
navigational patterns and identifies them explicitly, at low cost, ahead of navigation.
II. EXPLORATORY ANALYSIS STATISTICAL PROBLEMS
The exploratory analysis showed the presence of all the problems discussed above. All
tests conducted on the pattern of missing data reject the hypothesis of randomness, so that
pattern cannot be ignored. Given this evidence, together with other circumstances
(anomalous means and dispersion, and very high correlations (7) among the variables and
with other ratios), we decided to remove the three affected variables from the analysis. In
addition, we applied a series of logical filters to refine the information and tested for
normality, for linearity in the relationships between variables, for sector effects and for
outliers. These hypotheses are clearly violated, as the literature on accounting ratios
predicts.
Fig 2 The Spatial Data Mining Process
The four components explain more than 85% of the variability of the original variables
(ratios). With these new variables we will investigate the accounting characteristics that
differentiate the most and least profitable firms in Andalusia. This is therefore a
polar-extremes approach, characteristic of certain discriminant analyses, but extended in
this work by combining the analysis of accounting profitability ratios with the idea of a
thick frontier, taken from efficiency studies in financial economics. This gives better
discrimination between the more and less efficient (profitable) companies while isolating
the possible industry (sector) effect as far as possible, which is a clear advantage of this
approach (10).
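The 85% figure is a cumulative explained-variance ratio. As a minimal sketch, assuming the components come from a principal component analysis (the text does not name the extraction method), the number of components to keep can be read off the eigenvalues. The eigenvalues and the function name below are hypothetical and chosen only for illustration.

```python
def components_needed(eigenvalues, threshold=0.85):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)

# Hypothetical eigenvalues of a covariance matrix (not taken from the paper).
eigenvalues = [4.2, 2.1, 1.3, 1.0, 0.6, 0.4, 0.3, 0.1]
kept = components_needed(eigenvalues)
```

With these illustrative values the first three components explain about 76% of the variance and the first four about 86%, so four components are kept.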
III. METHODOLOGY
The data referred to in this definition are events recorded in a database. Patterns are
expressions in a specific language, used to designate subsets of the events in the database.
This means that these expressions depend on the technique or method used to analyze the
data. For example, if the classification technique is agglomerative, the patterns may be
groups (clusters) or partitions. Patterns should be potentially useful in the sense that they
can be used to make decisions. Patterns should also be "understandable" by human beings,
which means that they must be simple and easy to interpret intuitively in informal human
language. The term "interesting" in the definition means a combination of validity, novelty,
usefulness and simplicity, and is connected with the value of the discovered pattern.
Fig 3. Types of Web Data
In a logical sense, data represent propositions whose logical value is true. The occurrence
of a datum is equivalent to the statement that something has actually happened: that a
device or a person observed and recorded a certain value at a specific time and place, and
that this is true. Today, organizations routinely record almost everything that occurs:
money flows, facts relating to people's lives, material flows and decisions, images, sounds.
The world is represented, with increasing realism, in databases. As a result, not only does
the volume of data grow exponentially, but the volume of data to be analyzed grows in the
same proportion.
      a1  a2  a3  a4  D
x1    1   0   2   1   1
x2    1   0   2   0   1
x3    1   2   0   0   2
x4    1   2   2   1   0
x5    2   1   0   0   2
x6    2   1   1   0   2
x7    2   1   2   1   1
Table 1. The Dataset
A distinctive feature of data mining is that it is performed on large databases. These
should not be confused with very large files; they are real data systems, organized and
managed according to principles and methods. Currently, the most widely used database
management systems are based on the relational model proposed by Codd. This model makes
it possible not only to define the structure of relational databases but also to program
queries against both local databases and geographically distributed databases.
IV. RESULT
The support of a rule is the frequency with which the rule occurs in the training data. For
example, if the training data includes 100 transactions and 25 of them contain Pepsi, then
the support of Pepsi is 0.25. The item sets within the generated rules are considered
frequent item sets; Pepsi is frequent in the previous example. Further constraints, such as
confidence and lift, can be applied to the generated rules as measures of interestingness.
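The measures named above can be illustrated with a short sketch. It uses the standard textbook definitions of support, confidence and lift; the transaction data are hypothetical and are built only to reproduce the 25-in-100 Pepsi example, and the function names are illustrative.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): supp(A and C) / supp(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base

def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    baseline = support(transactions, consequent)
    if baseline == 0:
        return 0.0
    return confidence(transactions, antecedent, consequent) / baseline

# Hypothetical training data: 100 transactions, 25 of which contain Pepsi.
transactions = (
    [frozenset({"Pepsi", "Chips"})] * 20
    + [frozenset({"Pepsi"})] * 5
    + [frozenset({"Chips"})] * 20
    + [frozenset({"Bread"})] * 55
)

pepsi_support = support(transactions, {"Pepsi"})
rule_confidence = confidence(transactions, {"Pepsi"}, {"Chips"})
rule_lift = lift(transactions, {"Pepsi"}, {"Chips"})
```

In this toy data the rule Pepsi -> Chips holds in 20 of the 25 Pepsi transactions, and its lift is above 1, which suggests that the two items occur together more often than independence would predict.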
In association rule discovery, the frequent item set generation process is the part that
most needs improvement in efficiency. When the training data has thousands of transactions,
finding frequent item sets can take a lot of time. Therefore, most research on association
rule discovery has aimed to improve the frequent item set generation process. Specifically,
most attention has been given to improving part one of the three-part process. Several
algorithms have been developed to mine association rules quickly.
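The paper does not say which algorithm it relies on. As an illustration of why frequent item set generation dominates the cost, here is a minimal sketch of one well-known approach, Apriori, which grows candidates level by level and prunes any candidate with an infrequent subset. The function name and the sample baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting `min_support`."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # One pass over the data per level: count each candidate's occurrences.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    singletons = {frozenset([item]) for t in transactions for item in t}
    level = frequent_among(singletons)
    frequent = dict(level)
    size = 2
    while level:
        previous = set(level)
        # Join step: combine frequent (size-1)-itemsets into size-k candidates.
        candidates = {a | b for a in previous for b in previous if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, size - 1))
        }
        level = frequent_among(candidates)
        frequent.update(level)
        size += 1
    return frequent

# Hypothetical market baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent_itemsets = apriori(baskets, min_support=0.5)
```

Every level requires a full scan of the transactions, which is why this stage becomes slow on large training sets and why much research targets it.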
[Chart: "MI - Splitting Criteria Methods Performance". Horizontal axis: splitting criteria
methods (Information Gain, Gain Ratio, Chi-squared, Gini Gain, Distance Measure). Vertical
axis: percentage, 0% to 80%. Series: minimum, maximum, average and standard deviation of
the correctly classified percentage.]
Fig 4. MI - Splitting Criteria Performance Analysis
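Three of the splitting criteria compared in Fig 4 can be sketched with their standard textbook definitions. This is an illustration only: the paper does not give its own formulas, and the attribute values, class labels and function names below are hypothetical toy data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(values, labels):
    """Group the class labels by the attribute value of each record."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())

def information_gain(values, labels):
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

def gini_gain(values, labels):
    n = len(labels)
    remainder = sum(len(g) / n * gini(g) for g in partition(values, labels))
    return gini(labels) - remainder

# Toy data: the attribute separates the two classes perfectly.
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]

ig = information_gain(outlook, play)
gr = gain_ratio(outlook, play)
gg = gini_gain(outlook, play)
```

A decision tree learner would evaluate one such score per candidate attribute and split on the attribute with the highest value.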
Fig 5. An Association Rule of Data Mining
We also propose new metrics to evaluate the utility of the proposed approaches and we
compare these approaches. The experimental evaluations demonstrate that the proposed
techniques are effective at removing direct and/or indirect discrimination biases in the original
data set while preserving data quality.
V. DISCUSSION AND CONCLUSION
Temporal databases are naturally good sources for knowledge discovery. Encoding a
temporal database has led to better utilization of memory. Temporal databases can also
be used to view how discovered knowledge changes over time. This is a major concept of
this thesis. It helps organizations make better decisions and function in a well-organized
manner.
This paper presents the impact of the family of algorithms on an encoded database for
association rule mining. Each of the algorithms has a different impact and produces effective
results. Applying the algorithms to a temporal database that has been encoded offers more
opportunities for advantageous results in terms of lower computation time and space. The
concept of assigning priorities always leads to more
advantages than other concepts which treat all items uniformly. Similarly, prioritizing
the items and then encoding the temporal database has minimized computation time and
memory use. Importantly, these results are obtained while respecting the time constraints,
that is, within the specified valid time interval. Today, all applications are time based in
order to satisfy real-time constraints. Hence, applying Priority Based Temporal Mining,
which involves the important concepts of priority, encoding, valid time interval and
temporal mining, to real-time applications that have response time constraints will clearly
improve the performance measures. In the Minimum Weighted Fuzzy Approach, temporal
databases, a good source for knowledge discovery, have been developed.
With the rapid development of Web applications and the great flux of information
available on the Internet, the World Wide Web has recently become very popular and has
given us a powerful platform to disseminate, retrieve and analyze information. Although
progress in web-based data management research has produced many useful Web
applications and services, such as Web search engines, users still face the problem of
information overload because of the significant and rapid growth in the amount of
information and the number of users. In particular, Web users often have difficulty finding
desirable and accurate information on the Web because of the two resulting problems of
low precision and low recall. Thus, the emergence of the Web has posed many challenges to
Web researchers in web-based information management and retrieval.
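The two retrieval problems mentioned above have standard definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that are retrieved. A minimal sketch follows; the page identifiers and the function name are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search result and ground truth, for illustration only.
retrieved_pages = {"p1", "p2", "p3", "p4"}
relevant_pages = {"p2", "p4", "p7", "p8", "p9"}

precision, recall = precision_recall(retrieved_pages, relevant_pages)
```

Here only two of the four returned pages are relevant, and only two of the five relevant pages are returned, so both precision and recall are low.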
REFERENCES
[1] Böttcher, M., Spott, M., Nauck, D., and Kruse, R., 2009, Mining changing customer
segments in dynamic markets. Expert Systems with Applications, Vol. 36, No. 1, pp.
155-164. DOI= http://dx.doi.org/10.1016/j.eswa.2007.09.006.
[2] Wu, C., Huang, Y. and Chen, J. “Privacy Preserving Association Rules by Using
Greedy Approach”, in Proceedings of World Congress on Computer Science and
Information Engineering, pp. 61-65, 2009.
[3] Wang, S. “Maintenance of Sanitizing Informative Association Rules”, International
Journal on Expert Systems with Applications, Vol. 36, pp. 4006-4012, 2009.
[4] Gkoulalas-Divanis, A. and Verykios, V. S. “Exact Knowledge Hiding through
Database Extension”, IEEE Transactions on Knowledge and Data Engineering, Vol.
21, No. 5, pp. 699-713, 2009.
[5] Cui, Y., Wong, W. K. and Cheung, D. W. “Privacy Preserving
Clustering with High Accuracy and Low Time Complexity”, in Proceedings of
International Conference on Database Systems for Advanced Applications, pp. 456-470, 2009.
[6] Tang, Q. Y., & Zhang, C. X. (2013). Data Processing System (DPS) software with
experimental design, statistical analysis and data mining developed for use in
entomological research. Insect Science, 20(2), 254-260.
[7] Touw, W. G., Bayjanov, J. R., Overmars, L., Backus, L., Boekhorst, J., Wels, M., &
van Hijum, S. A. (2013). Data mining in the Life Sciences with Random Forest: a
walk in the park or lost in the jungle? Briefings in Bioinformatics, 14(3), 315-326.
[8] Iqbal, F., Binsalleeh, H., Fung, B., & Debbabi, M. (2013). A unified data mining
solution for authorship analysis in anonymous textual communications. Information
Sciences, 231, 98-112.
[9] Santos, I., Brezo, F., Ugarte-Pedrero, X., & Bringas, P. G. (2013). Opcode sequences
as representation of executables for data-mining-based unknown malware detection.
Information Sciences, 231, 64-82.
[10] Hajian, S., & Domingo-Ferrer, J. (2013). A methodology for direct and
indirect discrimination prevention in data mining. IEEE Transactions on Knowledge
and Data Engineering, 25(7), 1445-1459.
[11] Jian, L., Wang, C., Liu, Y., Liang, S., Yi, W., & Shi, Y. (2013). Parallel data
mining techniques on graphics processing unit with compute unified device
architecture (CUDA). The Journal of Supercomputing, 64(3), 942-967.
[12] Rutkowski, L., Pietruczuk, L., Duda, P., & Jaworski, M. (2013). Decision
trees for mining data streams based on the McDiarmid's bound. IEEE Transactions on
Knowledge and Data Engineering, 25(6), 1272-1279.
[13] Zorrilla, M., & García-Saiz, D. (2013). A service oriented architecture to
provide data mining services for non-expert data miners. Decision Support
Systems, 55(1), 399-411.
[14] D’Haen, J., Van den Poel, D., & Thorleuchter, D. (2013). Predicting customer
profitability during acquisition: Finding the optimal combination of data source and
data mining technique. Expert Systems with Applications, 40(6), 2007-2012.
[15] He, W. (2013). Examining students’ online interaction in a live video
streaming environment using data mining and text mining. Computers in Human
Behavior, 29(1), 90-102.
[16] Brescia, M., & Longo, G. (2013). Astroinformatics, data mining and the future
of astronomical research. Nuclear Instruments and Methods in Physics Research
Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 720,
92-94.
[17] Wu, F., Zhan, J., Yan, H., Shi, C., & Huang, J. (2013). Land cover mapping
based on multisource spatial data mining approach for climate simulation: a case
study in the farming-pastoral ecotone of North China. Advances in Meteorology,
2013.