International Journal On Advanced Computer Theory And Engineering (IJACTE)
_______________________________________________________________________________________________
Educational Data Mining: A New Approach to the Education Systems
Patil Sameer G.¹, Barahate Sachin R.²
¹Department of Computer Engineering, Tasgaonkar College of Engg. & Management, Mumbai, India
²Dept. of Information Technology, Yadavrao, Padmbhushan Vasatdada Pratishtan College of Engg., Mumbai, India
Abstract: Data mining techniques are analytical tools that can be used to extract meaningful knowledge from large data sets. Data mining is an interdisciplinary field that brings together multiple disciplines. It has enormous applications in business and industry, and one of its newest areas of application is the education sector.
One of the biggest challenges that higher education systems face today is improving the quality of education and of managerial decisions. Managerial decision making becomes more intricate as the complexity of educational entities increases. Educational institutes therefore seek more efficient technology to support decision-making procedures and to formulate better management plans. This can be achieved by exploiting valuable implicit knowledge that is currently unknown. This knowledge is hidden in educational data sets and can be extracted through data mining techniques.
Artificial Neural Networks (ANNs) are one of the promising data mining tools. They can perform better than many traditional data mining techniques, so they can be used to narrow the knowledge gap that exists in higher learning institutes. Educational data mining through artificial neural networks can help enhance traditional educational procedures.
This research presents the capabilities of data mining from the perspective of the higher education system by proposing a systematic roadmap for higher learning institutions to enhance their current decision processes. It also aims to apply data mining techniques to discover new explicit knowledge for improving the quality of education.
I. INTRODUCTION:
There is valuable information hidden in data. Because data is generated much faster than it can be processed and made sense of, this information often remains buried and untapped. It becomes virtually impossible for individuals or groups with limited resources, especially technological resources, to find this information and gain insight from the data.
The term "data mining" is more popular than the longer term "knowledge discovery in databases" (KDD). Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories. A field at the intersection of computer science and statistics, data mining attempts to discover patterns in large data sets using methods from artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
Furthermore, finding patterns and relationships can also support the prediction of future outcomes. The importance of data mining has been established for business applications, criminal investigations, biomedicine and, more recently, counter-terrorism. Most retailers, for example, employ data mining practices to uncover customer buying patterns; Amazon.com uses purchase history to make product recommendations to shoppers. Data mining can be applied wherever abundant data is available and needs to be analyzed.
II. REVIEW OF LITERATURE:
Data mining is a field influenced by many disciplines, including databases, information retrieval, statistics, algorithms, and machine learning. Data mining can be either predictive or descriptive [7]. A predictive model makes predictions about values of data using known results found from different data; tasks such as classification, prediction, and regression fall under predictive data mining. A descriptive model identifies patterns or relationships in data; tasks such as clustering and association rules fall under descriptive data mining [7].
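Association rules, one of the descriptive tasks named above, can be illustrated with a small sketch that computes support and confidence over hypothetical student records. The attribute names, the thresholds, and the brute-force pair search are assumptions for illustration; practical miners such as Apriori prune candidate itemsets instead of enumerating every pair.

```python
from itertools import combinations

# Hypothetical student records: each set lists attributes observed for one student.
RECORDS = [
    {"attends_regularly", "submits_assignments", "passes"},
    {"attends_regularly", "passes"},
    {"submits_assignments", "passes"},
    {"attends_regularly", "submits_assignments", "passes"},
    {"submits_assignments"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(antecedent, consequent, records):
    """support(A union C) / support(A): how often C holds when A holds."""
    return support(antecedent | consequent, records) / support(antecedent, records)


def pair_rules(records, min_support, min_confidence):
    """Brute-force search over one-item -> one-item rules (no Apriori pruning)."""
    items = sorted(set().union(*records))
    found = []
    for a, b in combinations(items, 2):
        for ante, cons in (({a}, {b}), ({b}, {a})):
            s = support(ante | cons, records)
            if s >= min_support:
                c = confidence(ante, cons, records)
                if c >= min_confidence:
                    found.append((ante, cons, s, c))
    return found


if __name__ == "__main__":
    # prints: ['attends_regularly'] -> ['passes']  support=0.60  confidence=1.00
    for ante, cons, s, c in pair_rules(RECORDS, min_support=0.5, min_confidence=0.8):
        print(f"{sorted(ante)} -> {sorted(cons)}  support={s:.2f}  confidence={c:.2f}")
```

No class label is predicted here: the output only describes a regularity in the existing records, which is what separates this descriptive task from the predictive ones.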
Data mining encompasses tools and techniques for extracting, or mining, knowledge from large amounts of data. Many other terms carry a similar or slightly different meaning, such as knowledge mining from databases, knowledge extraction, data pattern analysis, data archaeology, and data dredging; another popularly used term is "Knowledge Discovery in Databases" (KDD).
Classification is one of the most common data mining functions, and it is applicable in almost all fields. Classification maps data into predefined groups or classes. It is often called supervised learning because the classes are determined before the data is examined. Data mining provides several algorithms for the classification function, but a classification algorithm must satisfy certain criteria: a good one should have good predictive accuracy, fast working speed, robustness, scalability, and so on. Traditional algorithms may not work as required for a given data set. Hence, there is a need to introduce new techniques into the field of data mining. The Artificial Neural Network (ANN) is one such technique that can be applied to data mining functions [6].
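Classification as supervised learning can be made concrete with a categorical Naive Bayes classifier, one of the traditional algorithms named in Phase 3 of the proposed system. The sketch below is illustrative only: it assumes discrete attribute values, uses add-one (Laplace) smoothing, and trains on invented records rather than data from this study. The function names `train_naive_bayes` and `predict_naive_bayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled records: (attendance, internal_marks) -> result.
ROWS = [("high", "good"), ("high", "good"), ("high", "average"),
        ("low", "poor"), ("low", "average"), ("low", "poor")]
LABELS = ["pass", "pass", "pass", "fail", "fail", "fail"]


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    domains = defaultdict(set)           # attribute index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior under the independence assumption."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(class)
        for i, value in enumerate(row):
            # Add-one smoothing so an unseen value does not zero out the product.
            numerator = value_counts[(label, i)][value] + 1
            denominator = count + len(domains[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train_naive_bayes(ROWS, LABELS)
    print(predict_naive_bayes(model, ("low", "average")))  # prints: fail
```

Working in log space avoids numeric underflow when many attribute probabilities are multiplied, and the smoothing keeps an unseen value such as `"medium"` from forcing a zero probability.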
In this work, I focus on the data classification function, using traditional data mining techniques as well as artificial neural network techniques. I will take some classification datasets, apply both traditional and ANN techniques to them, and finally compare the results of the two classifications. This comparison will show which approach classifies more accurately and will help in selecting the better algorithm for classification.
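The planned comparison can be sketched as follows. A decision stump (a one-level decision tree) stands in for the traditional technique and a one-hidden-layer network trained by back-propagation stands in for the ANN; the synthetic two-score data set, the function names, and the hyperparameters (`hidden`, `lr`, `epochs`, `seed`) are assumptions for illustration, not the author's experimental setup. As the proposed system prescribes, the model with the better held-out accuracy is selected.

```python
import math
import random

def make_data(n, seed):
    """Hypothetical records: two scores in [0, 1]; label 1 ('pass') when their sum exceeds 1."""
    rng = random.Random(seed)
    rows = [(rng.random(), rng.random()) for _ in range(n)]
    return rows, [int(a + b > 1.0) for a, b in rows]

def accuracy(predict, rows, labels):
    """Fraction of records whose predicted label equals the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)

def train_stump(rows, labels):
    """Traditional baseline: decision stump keeping the feature, threshold and
    polarity with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for above in (0, 1):
                errors = sum((above if r[f] > t else 1 - above) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return lambda r: above if r[f] > t else 1 - above

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_backprop(rows, labels, hidden=4, lr=0.5, epochs=1000, seed=0):
    """ANN: one hidden layer of sigmoid units trained by back-propagation
    (full-batch gradient descent on half the mean squared error)."""
    rng = random.Random(seed)
    n, n_in = len(rows), len(rows[0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in zip(w1, b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)

    def loss():
        return sum((forward(x)[1] - y) ** 2 for x, y in zip(rows, labels)) / (2 * n)

    initial = loss()
    for _ in range(epochs):
        gw1 = [[0.0] * n_in for _ in range(hidden)]
        gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(rows, labels):
            h, out = forward(x)
            d_out = (out - y) * out * (1 - out)  # error signal at the output unit
            gb2 += d_out
            for j in range(hidden):
                gw2[j] += d_out * h[j]
                d_h = d_out * w2[j] * h[j] * (1 - h[j])  # error propagated back to hidden unit j
                gb1[j] += d_h
                for i in range(n_in):
                    gw1[j][i] += d_h * x[i]
        b2 -= lr * gb2 / n  # apply the averaged gradients once per epoch
        for j in range(hidden):
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                w1[j][i] -= lr * gw1[j][i] / n
    return (lambda x: int(forward(x)[1] >= 0.5)), initial, loss()

def select_model(candidates, rows, labels):
    """Score every candidate on held-out data and return the most accurate one."""
    scores = {name: accuracy(p, rows, labels) for name, p in candidates.items()}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    train_rows, train_labels = make_data(40, seed=1)
    test_rows, test_labels = make_data(20, seed=2)
    ann, loss_before, loss_after = train_backprop(train_rows, train_labels)
    best, scores = select_model(
        {"decision stump": train_stump(train_rows, train_labels), "backprop ANN": ann},
        test_rows, test_labels)
    print(scores, "->", best)
```

The models are scored on records they were not trained on; training accuracy alone would favour whichever model memorises its training set best.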
III. PROBLEM STATEMENT:
Data mining is the process of discovering hidden messages, patterns and knowledge within large amounts of data and of making predictions of outcomes or behaviors. Many application areas, such as banking, the retail industry and marketing, fraud detection, computer auditing, biomedical and DNA analysis, telecommunications, and the financial industry, have already been advanced through the robust techniques of data mining. Another application area that can take advantage of data mining techniques is the educational system, especially in higher learning institutes.
One of the most important concerns in the higher education system is its quality objectives. Higher learning institutes face many problems that keep them from achieving these objectives. Several of these problems stem from a knowledge gap: the lack of significant knowledge in the main educational processes, such as counseling, planning, registration, evaluation, and marketing.
The main idea is that the hidden patterns, associations, and anomalies discovered by data mining techniques can help bridge this knowledge gap in higher learning institutions. The knowledge discovered by data mining would enable higher learning institutions to make better decisions, plan more effectively when directing students, predict individual behaviors with higher accuracy, and allocate resources and staff more effectively. This improves the effectiveness and efficiency of the processes and thus helps maintain the quality of education.
IV. PROPOSED SYSTEM:
For educational data mining, we have to choose from the available data mining algorithms depending on the type of task to be done. For the best results, selecting and implementing the appropriate data mining algorithm is the main task, and this process is not straightforward. Because data mining is an interdisciplinary field, it provides multiple algorithms for the same task, but the accuracy of each algorithm may vary. Hence, the algorithm for a given data mining task must be selected carefully. The idea of the proposed system is therefore to show that data mining using an Artificial Neural Network can be more accurate than some traditional data mining techniques. In addition, this research classifies engineering students on criteria based on their learning environments and provides the information needed to improve their performance.
V. PHASES OF THE PROPOSED SYSTEM
Figure 5.1: Selection of Data Mining Technique
1. State the Problem and Collect Data:
Most data-based modeling studies are performed in a particular application domain. Hence, domain-specific knowledge and experience are usually necessary to come up with a meaningful problem statement. Data collection is concerned with how the data are generated and collected. In general, there are two distinct possibilities. The first is when the data-generation process is under the control of an expert (modeler): this approach is known as a designed experiment. The second possibility is when the expert cannot influence the data-generation process: this is known as the observational approach. An observational setting, namely random data generation, is assumed in most data-mining applications. It is also important to make sure that the data used for estimating a model and the data used later for testing and applying the model come from the same, unknown, sampling distribution. If this is not the case, the estimated model cannot be successfully used in a final application of the results.
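When all the data come from a single random collection process, one common way to keep the estimation and testing data on the same distribution is a stratified random split. The sketch below assumes every record has a class label; `stratified_split`, `test_fraction`, and `seed` are hypothetical names. Stratification preserves class proportions only; it cannot correct for test data gathered under different conditions from the training data.

```python
import random
from collections import defaultdict


def stratified_split(rows, labels, test_fraction=0.3, seed=42):
    """Shuffle each class separately and send the same fraction of every class
    to the test set, so both subsets keep the original class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    train, test = [], []
    for label, members in by_class.items():
        members = members[:]  # copy so the caller's data is left untouched
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test += [(r, label) for r in members[:n_test]]
        train += [(r, label) for r in members[n_test:]]
    return train, test


if __name__ == "__main__":
    rows = list(range(100))                      # hypothetical record ids
    labels = ["pass"] * 70 + ["fail"] * 30
    train, test = stratified_split(rows, labels)
    print(len(train), len(test))                 # prints: 70 30
```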
2. Data Preprocessing:
The data collected from industry and other sources is complex and contains noisy, missing, and inconsistent values. The data is preprocessed to improve its quality and make it fit for the data mining task. The data used are transformed into appropriate formats to support meaningful analysis, and additional attributes are derived using the acquired knowledge to support the mining process. Generally, a good preprocessing method provides an optimal representation for a data-mining technique by incorporating a priori knowledge in the form of application-specific scaling and encoding.
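The preprocessing steps described above (handling missing values, fixing inconsistencies, scaling, and encoding) can be sketched on a hypothetical student table. Mean imputation, label normalization, min-max scaling, and one-hot encoding are common choices assumed here for illustration; the paper does not state which methods it uses, and the column values are invented.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def normalize_labels(values):
    """Fix inconsistent spellings such as ' it' vs 'IT' by trimming and upper-casing."""
    return [v.strip().upper() for v in values]


def min_max_scale(values):
    """Rescale numbers linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator vectors over sorted categories."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


if __name__ == "__main__":
    # Hypothetical columns: internal marks (one missing) and department codes.
    marks = min_max_scale(impute_mean([60, None, 80, 100]))
    categories, branch = one_hot(normalize_labels(["IT", " ce", "it ", "CE"]))
    print(marks)                 # prints: [0.0, 0.5, 0.5, 1.0]
    print(categories, branch)    # prints: ['CE', 'IT'] [[0, 1], [1, 0], [0, 1], [1, 0]]
```

Imputation runs before scaling here so that the filled-in value falls inside the observed range rather than distorting the minimum or maximum.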
3. Apply Algorithm:
Apply a traditional data mining algorithm and the back-propagation neural network algorithm to the preprocessed data, and evaluate the accuracy of each algorithm. Accuracy is treated here as the most important factor in evaluating a data mining model, so the model that gives the better accuracy is selected. Data classification techniques are used in this model. Naive Bayesian classification and decision tree classifiers are examples of traditional data mining algorithms, while the back-propagation algorithm is an ANN technique.
4. Evaluate Algorithm:
Classification and prediction methods can be compared and evaluated according to the following criteria:
• Predictive Accuracy: This refers to the ability of the model to correctly predict the class label of new or previously unseen data.
• Speed: This refers to the computation costs involved in generating and using the model.
• Robustness: This is the ability of the model to make correct predictions given noisy data or data with missing values.
• Scalability: This refers to the ability of the learned model to perform efficiently on large amounts of data.
• Interpretability: This refers to the level of understanding and insight that is provided by the learned model.
VI. CONCLUSION
This work is an effort to enhance the traditional educational process via a strategic roadmap of data mining functionalities. The proposed EDM model is used to analyze current work on data mining in education and to identify existing gaps and directions for further work. The main contribution of this work is a discussion of how various data mining techniques can be applied to educational data and what new explicit knowledge or models can be discovered. These models can be either predictive or descriptive. The rules obtained from each model can be translated into plain text for setting new strategies and plans to improve managerial decision making.
REFERENCES:
[1] Barahate Sachin, Shelake Vijay, "A Survey and Future of Data Mining in Educational Field", IEEE, 978-1-4673-0471-9, 2012.
[2] Sonal Kadu, Sheetal Dhande, "Effective Data Mining Through Neural Networks", IJARCSSE, Volume 2, Issue 3, March 2012, ISSN: 2277-128X.
[3] Yashpal Singh, Alok Singh Chauhan, "Neural Networks in Data Mining", Journal of Theoretical and Applied Information Technology, 2009.
[4] S. Haykin, "Neural Networks", Prentice Hall International Inc., 1999.
[5] Hongjun Lu, Rudy Setiono, Huan Liu, "Effective Data Mining Using Neural Networks", IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6.
[6] Svein Nordbotten, "Data Mining with Neural Networks", Svein Nordbotten and Associates, Bergen, 2006.
[7] M. H. Dunham, S. Sridhar, "Data Mining: Introductory and Advanced Topics", Pearson Education, 2007, ISBN 81-7758-785-4.
[8] J. Han, M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann, 2001.
[9] Ian H. Witten, Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques", Academic Press, 2000.
[10] Bharati M. Ramageri, "Data Mining Techniques and Applications", IJCSE, Vol. 1, No. 4, 301-305.
[11] "EducationalDataMining.org", 2012. Retrieved 2012-09-16.
[12] R. Baker, K. Yacef, "The State of Educational Data Mining in 2009: A Review and Future Visions", Journal of Educational Data Mining, Volume 1, Issue 1, 3-17.
[13] C. Romero, S. Ventura, E. Garcia, "Data Mining in Course Management Systems: MOODLE Case Study and Tutorial", Computers and Education, 51(1), 368-384.
[14] C. Romero, S. Ventura, "Educational Data Mining: A Review of the State of the Art", IEEE Transactions on Systems, 40(6), 601-618, 2010.
[15] Umesh Kumar Pandey, Brijesh Kumar Bhardwaj, Saurabh Pal, "Data Mining as a Torch Bearer in Education Sector", Technical Journal of LBSIMDS.
[16] Q. A. Al-Radaideh, E. M. Al-Shawakfa, M. I. Al-Najjar, "Mining Student Data Using Decision Trees", International Arab Conference on Information Technology, 2006.

ISSN (Print): 2319-2526, Volume 5, Issue 1, 2016