DATA MINING

BY
K.ALHAF MALIK
Mail ID: [email protected]
Contact No: 9791922448
&
M.VASANTHA KRISHNAN
III YEAR B.E COMPUTER SCIENCE & ENGINEERING
SYED AMMAL ENGINEERING COLLEGE, RAMANATHAPURAM.

ABSTRACT

This paper presents an overview of the different processes and techniques involved in data mining and its applications in higher education. It presents a model that applies the advantages of data mining in a higher education institution with the help of a case study.

Key words: enrollment management, motivate, traditional issues, mining patterns, individual behavior

Sections:
1 Introduction
2 Data mining in higher education
3 Data mining process
4 Data mining techniques
5 Implementation of data mining in higher education: a study
6 Conclusion

One of the biggest challenges that higher education faces today is predicting the paths of students. For example, institutions need to know a student's choice to enroll in particular course programs, and the number of students who will need assistance in order to graduate. Are some students more likely to transfer than others? In addition to this challenge, traditional issues such as enrollment management continue to motivate higher education institutions to search for better solutions.

One way to address these issues effectively is through the analysis and presentation of data, or more specifically, data mining. The patterns found by data mining are used to build data mining models, which are then used to predict individual behavior with high accuracy. As a result of this insight, institutions are able to allocate resources and staff more effectively.

INTRODUCTION

Data mining can be defined as "a decision support process in which we search for patterns of information in data." This search may be done just by the user, i.e. just by performing queries, in which case it is quite hard and in most cases not comprehensive enough to reveal intricate patterns. Data mining uses sophisticated statistical analysis and modeling techniques to uncover such patterns and relationships hidden in organizational databases, patterns that ordinary methods might miss. Once found, the information needs to be presented in a suitable form, with graphs, reports, etc.

DATA MINING IN HIGHER EDUCATION

Data mining is a powerful tool for academic intervention. Through data mining, a university could, for example, predict with 85 percent accuracy the number of students who will or will not graduate. The university could use this information to concentrate academic assistance on those students most at risk. In order to understand how and why data mining works, it is important to understand a few fundamental concepts, which are explained below.

DATA MINING PROCESS

From a process-oriented view, there are three classes of data mining activity: discovery, predictive modeling and forensic analysis, as shown in the figure below.

[Figure: the three classes of data mining activity]

Discovery is the process of looking in a database to find hidden patterns without a predetermined idea or hypothesis about what the patterns may be. In other words, the program takes the initiative in finding what the interesting patterns are, without the user thinking of the relevant questions first.

In predictive modeling, patterns discovered from the database are used to predict the future. Predictive modeling thus allows the user to submit records with some unknown field values, and the system will guess the unknown values based on previous patterns discovered from the database. While discovery finds patterns in data, predictive modeling applies the patterns to guess values for new data items.

Forensic analysis is the process of applying the extracted patterns to find anomalous or unusual data elements. To discover the unusual, we first find what the norm is, and then we detect those items that deviate from the norm by more than a given threshold. Discovery helps us find "usual knowledge," but forensic analysis looks for unusual and specific cases.

DATA MINING TECHNIQUES

Data mining has three major components: Clustering or Classification, Association Rules and Sequence Analysis. Neural networks are also one of the key ingredients of modern data mining.

Classification:

Clustering techniques analyze a set of data and generate a set of grouping rules that can be used to classify future data. The mining tool automatically identifies the clusters by studying the patterns in the training data. Once the clusters are generated, classification can be used to identify the particular cluster to which an input belongs. For example, one may classify diseases and provide the symptoms that describe each class or subclass.

Association:

An association rule is a rule that implies certain association relationships among a set of objects in a database. In this process we discover a set of association rules at multiple levels of abstraction from the relevant set(s) of data in a database.

Sequential Analysis:

In sequential analysis, we seek to discover patterns that occur in sequence. This deals with data that appear in separate transactions (as opposed to data that appear in the same transaction, as in the case of association), e.g. a shopper buys item A in the first week of the month and then buys item B in the second week.

Neural Nets and Decision Trees:

For any given problem, the nature of the data will affect the techniques you choose. Consequently, you will need a variety of tools and technologies to find the best possible model. Classification models are among the most common, so the more popular ways of building them are explained here. Classification typically involves at least one of two workhorse statistical techniques: logistic regression (a generalization of linear regression) and discriminant analysis. However, as data mining becomes more common, neural nets and decision trees are also getting more consideration. Although complex in their own way, these methods require less statistical sophistication on the part of the user.

Neural nets use many parameters (the nodes in the hidden layer) to build a model that takes and combines a set of inputs to predict a continuous or categorical variable. The value from each hidden node is a function of the weighted sum of the values from all the preceding nodes that feed into it. The process of building a model involves finding the connecting weights that produce the most accurate results by "training" the neural net with data. The most common training method is back-propagation, in which the output result is compared with known correct values. After each comparison, the weights are adjusted and a new result is computed. After enough passes through the training data, the neural net typically becomes a very good predictor.

IMPLEMENTATION OF DATA MINING IN HIGHER EDUCATION: A STUDY

Introduction

An application was built for an educational institution that wanted to explore hidden trends in its data. The institution wanted to improve its service levels by identifying student behavior in different fields. The data that was mined holds information on the students who take the most credit hours, the students who are most likely to return for more classes, the students who are the "persisters" of the university/college, and the types of courses that attract more students. The model below was broadly followed for building the mining prototype.

[Figure: process model followed for building the mining prototype]

Building the model above is a continuous process incorporating several feedback loops and considerable interaction among the components. At each stage, there are various checks to ensure that the model is in fact meeting the required objectives.

1. PROBLEM SELECTION:

To make the best use of data mining, one must make a clear statement of the objectives. You may wish, for example, to increase the response to a direct mail campaign. Different goals, such as "increasing the response rate" and "increasing the value of a response," will require very different models. An effective problem statement will also include a way to measure the results of your knowledge discovery project. The points that were selected for mining in the higher education institution are:

- To identify the characteristics of the students who take the most credit hours. These characteristics were the subjects most preferred, the timing most comfortable for the students, the period of the year, and boarding and lodging facilities.
- To find the relationship among the fields based on student behavior.

2. SOLUTION SELECTION:

This step required a number of iterations to converge on a final approach to solution selection. For the first problem, neural networks can be used to find out the impact of various attributes on the students' course selection behavior. Factor analysis and then rule induction were applied to get the final result as a set of rules. For the second problem, association rules were used to find out the relationship among the fields, based on student behavior.

3. DATA SELECTION AND PREPARATION:

This step is the most time consuming. The data preparation steps may consume between 50 and 85 percent of the time and effort of the whole knowledge discovery process. Here the course details were summarized to capture the essential elements of each student's course selection and persistence behavior using the following fields:

- The three most common courses preferred by students.
- The professors identified as handling those subjects.
- The three most common subjects in which the students redeem.
- The subjects chosen when redeeming.
- For each of these, the number of classes or lectures attended regularly by the maximum number of students.
- Consolidation of course-based information based on the number of subjects chosen by students in various fields.

The following issues in data quality were found:

- An insufficient number of attributes had been captured about the students and courses, making the mining algorithms inefficient.
- In many of the fields that were captured, much of the data was incomplete or inaccurate.

4. BUILD MODEL:

Discovery analysis and predictive modeling were identified as the appropriate activities to build the model. Association rules and factor analysis techniques were used for discovery analysis, and neural nets, rule induction and decision trees were used for predictive modeling.

The following tools were used to mine the data: Clementine, Business Miner and Intelligent Miner. Clementine was used for applying neural networks, rule induction and association. Business Miner was used for building decision trees, and Intelligent Miner for doing factor analysis and finding associations.

5. RESULTS:

Category Type:

The different categories to which the students belong were part time, discretionary, regular, residential, lateral, NRI etc. The most important characteristic in which frequent students differed from their counterparts was category type. NRI and discretionary students were found to take a lower than average number of courses, whereas more courses were taken by regulars and laterals.

Applying Type:

88% of the students applied for their courses by post, and nearly 70% of them joined the college or university each year, which was significantly higher than for the other applying types.

Subject Type:

The most common and popular subjects, like math, physics and history, were taken mostly by students with average marks in their previous years. A significant number of students with very good marks and a few average students took difficult subjects like zoology, biotechnology and thermodynamics, but 15% of the average students dropped out. The below-average students took very few courses, and those were common rather than easy. Some of the below-average students also took popular subjects, which they eventually completed over a longer span of time than the others.

6. MODEL MONITORING:

After using the model, one should measure how well it has worked. For example, suppose you build a model that identifies people who are likely to leave your long distance telephone service for another (known as churn). You know the rate of churn prior to using the model, and you can predict what the churn rate will be after you design interventions intended to keep good customers. Notice that it is not the model alone but the actions taken based on the model that will determine its success. The results obtained from the educational institution application were checked against the original database and were found to be significant.

CONCLUSION

Data mining offers great promise in helping organizations uncover hidden patterns in their data. With this ability to uncover hidden patterns in large databases, community colleges and universities can build models that predict, with a high degree of accuracy, the behavior of population clusters. By acting on these predictive models, educational institutions can effectively address issues such as transfers and retention. However, data mining tools must be guided by users who understand the business, the data, and the general nature of the analytical methods involved. Realistic expectations can yield rewarding results across a wide range of applications, from improving revenues to reducing costs.

Building models is only one step in knowledge discovery. It is vital to collect and prepare the data properly and to check models against the real world. The "best" model is often found after building models of several different types and by trying out various technologies or algorithms.

The data mining area is still relatively young, and tools that support the whole of the data mining process in an easy-to-use fashion are rare. One of the most important issues facing researchers is the use of techniques against very large data sets. The mining techniques are based on artificial intelligence, where they are generally executed against small sets of data that can fit in memory. In data mining applications, however, these techniques must be applied to data held in very large databases. Approaches to this include the use of parallelism and the development of new database-oriented techniques. Much work is still required before data mining can be successfully applied to large data sets. Only then will the true potential of data mining be realized.
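
As an illustration of the association technique described earlier and used in the case study to relate fields of student behavior, the following minimal Python sketch derives simple one-to-one rules from course selections. It is not the method or tooling used in the study, which relied on Clementine and Intelligent Miner. The course data, the function names `support` and `pair_rules`, and the two thresholds are all hypothetical, and the standard measures of support (how often items occur together) and confidence (how often the consequent appears when the antecedent does) are assumed as the rule-quality criteria.

```python
from itertools import combinations

# Hypothetical course selections, one set per student (invented for illustration;
# these are NOT data from the case study).
TRANSACTIONS = [
    {"math", "physics"},
    {"math", "physics", "history"},
    {"math", "history"},
    {"physics", "zoology"},
    {"math", "physics"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for single-item rules.

    The thresholds are assumed values chosen only for this example.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        both = support({a, b}, transactions)
        if both < min_support:
            continue  # the pair occurs too rarely to be worth reporting
        # Test the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both, confidence))
    return rules


if __name__ == "__main__":
    for lhs, rhs, sup, conf in pair_rules(TRANSACTIONS):
        print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

On this toy data the sketch reports that students who take history also take math, and that math and physics tend to be chosen together. A production tool would also consider rules with several items on either side and would need to scale to databases that do not fit in memory, which is the issue raised in the conclusion.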