International Journal of Computer Application
Available online on http://www.rspublication.com/ijca/ijca_index.htm
Issue 4, Volume 4 (July-August 2014), ISSN: 2250-1797

Scheduler and Power Control Performance in a Data Mining Using Rule Generation Algorithm

R. Kalidasa Krishnaswami (1), Dr. Ashish Chaturvedi (2)
(1) Research Scholar, Department of Computer Science and Engineering, Himalayan University, Arunachal Pradesh, India
(2) Professor and Associate Director, Arni School of Computer Science and Applications, Arni University, Indora (Kathgarh), Himachal Pradesh, India

Abstract: Data mining, the discovery of knowledge in data, integrates developments and concepts from statistics, machine learning, data visualization and database theory. This fusion of very different disciplines has been motivated, among other things, by the significant increase in data volume in all spheres of human activity and, in this particular case, by the need for a greater amount of evidence to establish criminal intelligence policies better adjusted to the data available in the different supports. Statistics and data mining have proved complementary: while statistics raises hypotheses to be validated against the available data, data mining discovers patterns in the available data, which are then interpreted by domain experts.

Index Terms: Decision tree, Data mining, Association Rules, Data Clustering

I. INTRODUCTION

Data mining is an area of scientific study that raises high expectations in the research community, mainly because of the expected transfer of its results to society. For more than 50 years, countless articles on the subject have been published in conferences and journals. However, it remains a fertile and promising field with many research challenges. This article provides an introduction to knowledge discovery and data mining. It describes the major potential that data mining offers and lists the main methodologies used.
It also highlights different application domains and the main challenges and research trends.

Data mining is a very important part of modern business. Mining large-scale transactional databases is considered a very important research subject because of its obvious commercial potential [11]. It is also a major challenge due to its complexity. New techniques have been developed over the last decade to solve classification [12], prediction [13] and recommendation [14], [15] problems in these kinds of databases. Finding and evaluating valuable and meaningful patterns requires huge quantities of data. Any business can produce large amounts of data simply by recording all customer events in a database, a practice called data warehousing. Large quantities of data are also typically generated by dot-com web servers [16], [17], [18], [19], which monitor the requested packets sent to web site visitors. This information is logged in special-purpose web log files.

Many studies have been conducted in Web usage mining. Some focus on the mining of association rules and navigation patterns in the user access paths [20], [21]. A session is viewed as a transaction in association rule mining, and algorithms for association rule mining are employed to find frequent paths that are followed by many users. Others build data cubes from Web server logs for OLAP and data mining [21], [23]. Statistics on pages, IP domains, geographical location of users, and access time are calculated from sessions. Others cluster users based on their access patterns [24]. There is also research on data preparation and query languages for Web usage mining, and on Web personalization based on Web usage mining.
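As a minimal sketch of how a session can be treated as a transaction, the following Python example groups log records into sessions and counts page-to-page paths that occur in several sessions. The log records, the 30-minute idle timeout, and the function names sessionize and frequent_paths are illustrative assumptions and are not taken from the cited studies.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, already-parsed log records: (visitor id, timestamp, requested page).
LOG = [
    ("10.0.0.1", "2014-07-01 10:00:00", "/home"),
    ("10.0.0.1", "2014-07-01 10:02:00", "/products"),
    ("10.0.0.1", "2014-07-01 10:05:00", "/cart"),
    ("10.0.0.2", "2014-07-01 10:01:00", "/home"),
    ("10.0.0.2", "2014-07-01 10:03:00", "/products"),
    ("10.0.0.1", "2014-07-01 12:00:00", "/home"),
]


def sessionize(records, timeout=timedelta(minutes=30)):
    """Split each visitor's requests into sessions separated by idle gaps.

    The 30-minute timeout is an assumed heuristic, not a value from the paper.
    """
    by_visitor = defaultdict(list)
    for visitor, stamp, page in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_visitor[visitor].append((when, page))

    sessions = []
    for visitor in sorted(by_visitor):
        current, last_seen = [], None
        for when, page in sorted(by_visitor[visitor]):
            # A long idle gap closes the current session and starts a new one.
            if last_seen is not None and when - last_seen > timeout:
                sessions.append(current)
                current = []
            current.append(page)
            last_seen = when
        if current:
            sessions.append(current)
    return sessions


def frequent_paths(sessions, length=2, min_count=2):
    """Count contiguous page sequences, at most once per session (transaction)."""
    counts = defaultdict(int)
    for pages in sessions:
        seen = {tuple(pages[i:i + length]) for i in range(len(pages) - length + 1)}
        for path in seen:
            counts[path] += 1
    return {path: n for path, n in counts.items() if n >= min_count}


sessions = sessionize(LOG)
paths = frequent_paths(sessions)
print(sessions)
print(paths)
```

Each session plays the role of one transaction, so a path is counted once per session no matter how often the visitor repeats it within that session.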
Nina et al. [25] propose a complete approach to pattern discovery in Web usage mining. Web site creators must have clear knowledge of the users' profiles and the site's objectives, as well as detailed information about the way users will browse the Web site. The creators can examine visitor behavior by means of Web analysis and identify patterns in the visitors' activities. This Web analysis includes the transformation and interpretation of the Web log records to identify hidden data or predictive patterns through the data mining and knowledge discovery process. Coupled with Web warehousing, the result provides a comprehensive view.

Fig 1. Data Mining and Web mining Process

To analyze the usage of the Web, Web mining, and especially Web usage mining, has been proposed by many researchers [27], [28]. Web usage mining is the mining of Web usage data. In most Web usage mining studies, Web server logs are used as the primary data source, although client and proxy level logs may also be used. A Web server log collects a large amount of information about user activities on the Web site by keeping information about the page requests made to the server. Personalized Web page recommendation is strongly constrained by the nature of web logs, the intrinsic complexity of the problem and the high efficiency requirements. Because a popular Web site usually has a large number of meaningful visitor clusters and profiles, the model-based or distance-based techniques of existing Web usage mining methods are likely either to make very strong and simplistic assumptions or to become highly complex and slow. The author designed a heuristic majority intelligence technique that adjusts easily to changing navigational patterns and, at low cost, identifies them explicitly ahead of navigation.

II. EXPLORATORY ANALYSIS STATISTICAL PROBLEMS

The exploratory analysis has shown the presence of all the problems discussed above.
All tests conducted on the pattern of missing data reject the hypothesis of randomness, so that pattern cannot be ignored. Given this evidence, together with other circumstances (anomalous means and dispersion, and very high correlations (7) with each other and with other ratios), we decided to remove the three affected variables from the analysis. In addition, we applied a series of logical filters to refine the information, and we tested for normality, for linearity in the relationship between variables, for sector effects and for outliers. These hypotheses were clearly violated, as the literature on accounting ratios predicts.

Fig 2. The Spatial Data Mining Process

The four components explain more than 85% of the variability of the original variables (ratios). With these new variables we investigate the accounting characteristics that differentiate the most and least profitable firms of Andalusia. This is therefore a polar-extremes approach, characteristic of certain discriminant analyses, but it is extended in this work by incorporating into the analysis of accounting profitability ratios the idea of a thick frontier, taken from efficiency studies in financial economics. This yields better discrimination between the more and less efficient (profitable) companies while isolating, as far as possible, the industry (sector) effect, which is a clear advantage of this approach (10).

III. METHODOLOGY

The data referred to in this definition are events recorded in a database. Patterns are expressions of a specific language, used to designate subsets of the events in the database.
This means that these expressions depend on the technique or method used to analyze the data. For example, if the classification technique is agglomerative, the patterns may be groups (clusters) or partitions. Patterns should be potentially useful in the sense that they can be used to make decisions. Patterns should also be "understandable" by human beings, which means that they must be simple and easy to interpret, and expressible in intuitive, informal human language. The term "interesting" in the definition means a combination of validity, novelty, usefulness and simplicity, and is connected with the value of the discovered pattern.

Fig 3. Types of Web Data

In a logical sense, data represent propositions whose logical value is true. The occurrence of a datum is equivalent to the statement that something has really happened: a device or a person observed and recorded a certain value at a specific time and place, and that record is true. Today, organizations routinely record almost everything that occurs: money flows, facts relating to the lives of people, material flows and decisions, images, sounds. The world is represented, with increasing realism, in databases. The result is that not only does the volume of data grow exponentially, but the volume of data to analyze increases in the same proportion.

a a a a D x 11 20 32 41 1 x1 1 0 2 0 1 x2 1 2 0 0 2 x3 1 2 2 1 0 x4 2 1 0 0 2 x5 2 1 1 0 2 x6 2 1 2 1 1 7
Table 1. The Dataset

A distinctive characteristic of data mining is that it is performed on large databases. These must not be confused with very large files; they are true data systems, organized and managed according to principles and methods.
Currently, the most widely used database management systems are based on Codd's relational model. This model makes it possible not only to define the structure of relational databases but also to program queries on both local and geographically distributed databases.

IV. RESULT

The support of a rule is the frequency with which the rule occurs in the training data. For example, if the training data includes 100 transactions and 25 of them contain Pepsi, then the support of Pepsi is 0.25. The item sets within the generated rules are considered to be frequent item sets; Pepsi is frequent in the previous example. Further constraints such as Confidence and Lift can be applied to the rules that are generated. Such constraints serve as measures of interestingness in association rule discovery.

The frequent item set generation process is the part that most needs improvement in efficiency. When the training data has thousands of transactions, finding frequent item sets can take a lot of time. Therefore, most research related to association rule discovery has aimed to improve the frequent item set generation process. Specifically, most attention has been given to improving part one of the three-part process. Several algorithms have been developed to mine association rules quickly.
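The support, Confidence and Lift measures described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy transactions are invented so that 25 of 100 contain Pepsi, as in the example; the function names are hypothetical; and the frequent item set search is a brute-force enumeration, not one of the fast mining algorithms the text refers to.

```python
from itertools import combinations

# Hypothetical toy data: 100 transactions, 25 of which contain Pepsi.
TRANSACTIONS = (
    [{"Pepsi", "Chips"}] * 20
    + [{"Pepsi"}] * 5
    + [{"Chips"}] * 20
    + [{"Bread"}] * 55
)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the support of the consequent.

    Values above 1 suggest the two sides occur together more often than
    they would if they were independent.
    """
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search: test every candidate item set up to max_size."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                found[candidate] = s
    return found


print(support({"Pepsi"}, TRANSACTIONS))  # 0.25, matching the Pepsi example
print(confidence({"Pepsi"}, {"Chips"}, TRANSACTIONS))
print(lift({"Pepsi"}, {"Chips"}, TRANSACTIONS))
print(frequent_itemsets(TRANSACTIONS, min_support=0.2))
```

The brute-force search tests every combination of items, which is why the cost grows quickly with the number of items and transactions and why efficient frequent item set generation receives so much research attention.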
[Figure: MI - Splitting Criteria Methods Performance. Bar chart of the Min, Max, Avg and St. Dev. Correctly Classified Percentage (y-axis: Percentage %) for each splitting criteria method (x-axis): Information Gain, Gain Ratio, Chi-squared, Gini Gain, Distance Measure.]

Fig 4. MI - Splitting Criteria Performance Analysis

Fig 5. An Association Rule of Data Mining

We also propose new metrics to evaluate the utility of the proposed approaches, and we compare these approaches. The experimental evaluations demonstrate that the proposed techniques are effective at removing direct and/or indirect discrimination biases in the original data set while preserving data quality.

V. DISCUSSION AND CONCLUSION

Temporal databases are naturally good sources for knowledge discovery. Encoding a temporal database has led to better utilization of memory. Temporal databases can also be used to view how discovered knowledge changes through time. This is a major concept of this thesis. It helps organizations make better decisions and function in a well-formulated manner. This paper presents the impact of the family of algorithms on an encoded database for association rule mining. Each of the algorithms has a different impact and produces effective results. Applying the algorithms to an encoded temporal database offers more opportunities for advantageous results in terms of lower computation time and space. The concept of assigning priorities always leads to more advantages than other concepts that treat all items uniformly. Similarly, prioritizing the items and then encoding the temporal database has minimized time and memory consumption. A very important fact is that these results are obtained while including the time constraints, that is, within the specified valid time interval. Today, all applications are time based in order to satisfy real-time constraints. Hence, applying Priority Based Temporal Mining, which involves the important concepts of priority, encoding, valid time interval and temporal mining, to real-time applications that have response time constraints will improve the performance measures markedly. A Minimum Weighted Fuzzy Approach has also been developed for temporal databases, which are a good source for knowledge discovery.

With the rapid development of Web applications and the great flux of information available on the Internet, the World Wide Web has recently become very popular and has given us a powerful platform to disseminate, retrieve and analyze information. Although progress in web-based data management research has resulted in many useful Web applications and services, such as Web search engines, users still face the problems of information overload and drowning because of the significant and rapid growth in the amount of information and the number of users. In particular, Web users usually have difficulty finding desirable and accurate information on the Web because of two problems, low precision and low recall, caused by the above reasons. Thus, the emergence of the Web has posed many challenges to Web researchers in web-based information management and retrieval.
REFERENCES

[1] Böttcher, M., Spott, M., Nauck, D., and Kruse, R., 2009, Mining changing customer segments in dynamic markets. Expert Systems with Applications, Vol. 36, No. 1, pp. 155-164. DOI: http://dx.doi.org/10.1016/j.eswa.2007.09.006.
[2] Wu, C., Huang, Y. and Chen, J. “Privacy Preserving Association Rules by Using Greedy Approach”, in Proceedings of World Congress on Computer Science and Information Engineering, pp. 61-65, 2009.
[3] Wang, S. “Maintenance of Sanitizing Informative Association Rules”, International Journal on Expert Systems with Applications, Vol. 36, pp. 4006-4012, 2009.
[4] Gkoulalas-Divanis, A. and Verykios, V. S. “Exact Knowledge Hiding through Database Extension”, IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 5, pp. 699-713, 2009.
[5] Cui, Y., Wong, W. K. and Cheung, D. W. “Privacy Preserving Clustering with High Accuracy and Low Time Complexity”, in Proceedings of International Conference on Database Systems for Advanced Applications, pp. 456-470, 2009.
[6] Tang, Q. Y., & Zhang, C. X. (2013). Data Processing System (DPS) software with experimental design, statistical analysis and data mining developed for use in entomological research. Insect Science, 20(2), 254-260.
[7] Touw, W. G., Bayjanov, J. R., Overmars, L., Backus, L., Boekhorst, J., Wels, M., & van Hijum, S. A. (2013). Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle? Briefings in Bioinformatics, 14(3), 315-326.
[8] Iqbal, F., Binsalleeh, H., Fung, B., & Debbabi, M. (2013). A unified data mining solution for authorship analysis in anonymous textual communications. Information Sciences, 231, 98-112.
[9] Santos, I., Brezo, F., Ugarte-Pedrero, X., & Bringas, P. G. (2013). Opcode sequences as representation of executables for data-mining-based unknown malware detection. Information Sciences, 231, 64-82.
[10] Hajian, S., & Domingo-Ferrer, J. (2013). A methodology for direct and indirect discrimination prevention in data mining. IEEE Transactions on Knowledge and Data Engineering, 25(7), 1445-1459.
[11] Jian, L., Wang, C., Liu, Y., Liang, S., Yi, W., & Shi, Y. (2013). Parallel data mining techniques on graphics processing unit with compute unified device architecture (CUDA). The Journal of Supercomputing, 64(3), 942-967.
[12] Rutkowski, L., Pietruczuk, L., Duda, P., & Jaworski, M. (2013). Decision trees for mining data streams based on the McDiarmid's bound. IEEE Transactions on Knowledge and Data Engineering, 25(6), 1272-1279.
[13] Zorrilla, M., & García-Saiz, D. (2013). A service oriented architecture to provide data mining services for non-expert data miners. Decision Support Systems, 55(1), 399-411.
[14] D’Haen, J., Van den Poel, D., & Thorleuchter, D. (2013). Predicting customer profitability during acquisition: Finding the optimal combination of data source and data mining technique. Expert Systems with Applications, 40(6), 2007-2012.
[15] He, W. (2013). Examining students’ online interaction in a live video streaming environment using data mining and text mining. Computers in Human Behavior, 29(1), 90-102.
[16] Brescia, M., & Longo, G. (2013). Astroinformatics, data mining and the future of astronomical research. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 720, 92-94.
[17] Wu, F., Zhan, J., Yan, H., Shi, C., & Huang, J. (2013). Land cover mapping based on multisource spatial data mining approach for climate simulation: a case study in the farming-pastoral ecotone of North China. Advances in Meteorology, 2013.