DATA MINING
BY
K.ALHAF MALIK
Mail ID: [email protected]
Contact No: 9791922448
&
M.VASANTHA KRISHNAN
III YEAR B.E
COMPUTER SCIENCE & ENGINEERING
SYED AMMAL ENGINEERING COLLEGE,
RAMANATHAPURAM.
ABSTRACT

This paper presents an overview of the processes and techniques involved in data mining and its applications in higher education. It presents a model that applies the advantages of data mining in a higher educational institution, with the help of a case study.

Key words: enrollment management, motivate, traditional issues, mining patterns, individual behavior

Sections: 1 Introduction 2 Data mining in higher education 3 Data mining process 4 Data mining techniques 5 Implementation of data mining in higher education - a study 6 Conclusion

One of the biggest challenges that higher education faces today is predicting the paths of students. For example, institutions need to know which course programs a student will choose to enroll in, and how many students will need assistance in order to graduate. Are some students more likely to transfer than others? In addition to this challenge, traditional issues such as enrollment management continue to motivate higher education institutions to search for better solutions.

One way to address these issues effectively is through the analysis and presentation of data, or more specifically data mining. Patterns found by data mining are used to build data mining models, which in turn are used to predict individual behavior with high accuracy. As a result of this insight, institutions are able to allocate resources and staff more effectively.
INTRODUCTION

Data mining can be defined as "a decision support process in which we search for patterns of information in data." This search may be done by the user alone, i.e. just by performing queries, in which case it is quite hard and in most cases not comprehensive enough to reveal intricate patterns. Data mining uses sophisticated statistical analysis and modeling techniques to uncover such patterns and relationships hidden in organizational databases, patterns that ordinary methods might miss. Once found, the information needs to be presented in a suitable form, with graphs, reports, etc.

DATA MINING IN HIGHER EDUCATION

Data mining is a powerful tool for academic intervention. Through data mining, a university could, for example, predict with 85 percent accuracy the number of students who will or will not graduate. The university could use this information to concentrate academic assistance on those students most at risk. In order to understand how and why data mining works, it is important to understand a few fundamental concepts, which are explained below.

DATA MINING PROCESS

From a process-oriented view, there are three classes of data mining activity: discovery, predictive modeling and forensic analysis, as shown in the figure below.

Discovery is the process of looking in a database to find hidden patterns without a predetermined idea or hypothesis about what the patterns may be. In other words, the program takes the initiative in finding what the interesting patterns are, without the user thinking of the relevant questions first.
In predictive modeling, patterns discovered from the database are used to predict the future. Predictive modeling thus allows the user to submit records with some unknown field values, and the system will guess the unknown values based on previous patterns discovered from the database. While discovery finds patterns in data, predictive modeling applies the patterns to guess values for new data items.

Forensic analysis is the process of applying the extracted patterns to find anomalous or unusual data elements. To discover the unusual, we first find what the norm is, and then we detect those items that deviate from the usual by more than a given threshold. Discovery helps us find "usual knowledge," but forensic analysis looks for unusual and specific cases.

DATA MINING TECHNIQUES

Data mining has three major components: Clustering or Classification, Association Rules and Sequence Analysis. Neural networks are also one of the key ingredients of modern data mining.

Classification:

Clustering techniques analyze a set of data and generate a set of grouping rules that can be used to classify future data. The mining tool automatically identifies the clusters by studying the patterns in the training data. Once the clusters are generated, classification can be used to identify the particular cluster to which an input belongs. For example, one may classify diseases and provide the symptoms that describe each class or subclass.

Association:

An association rule is a rule that implies certain association relationships among a set of objects in a database. In this process we discover a set of association rules at multiple levels of abstraction from the relevant set(s) of data in a database.
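
As a rough illustration of what an association rule measures, the sketch below scores one candidate rule with the two standard measures, support and confidence. The paper does not say which measures were used, so treating them as the scoring method is an assumption, and the enrollment data and the rule "math -> physics" are hypothetical.

```python
# Hypothetical data: each set holds the subjects one student enrolled in.
enrollments = [
    {"math", "physics"},
    {"math", "physics", "history"},
    {"math", "history"},
    {"zoology"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


# Score the hypothetical rule "students who take math also take physics".
rule_support = support({"math", "physics"}, enrollments)
rule_confidence = confidence({"math"}, {"physics"}, enrollments)
print(f"math -> physics: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A mining tool would evaluate many candidate rules this way and keep only those above chosen support and confidence thresholds.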
Sequential Analysis:

In sequential analysis, we seek to discover patterns that occur in sequence. This deals with data that appear in separate transactions (as opposed to data that appear in the same transaction, as in the case of association), e.g. a shopper buys item A in the first week of the month and then buys item B in the second week.

Neural Nets and Decision Trees:

For any given problem, the nature of the data will affect the techniques you choose. Consequently, you'll need a variety of tools and technologies to find the best possible model. Classification models are among the most common, so the more popular ways of building them are explained here. Classification typically involves at least one of two workhorse statistical techniques: logistic regression (a generalization of linear regression) and discriminant analysis. However, as data mining becomes more common, neural nets and decision trees are also getting more consideration. Although complex in their own way, these methods require less statistical sophistication on the part of the user.

Neural nets use many parameters (the nodes in the hidden layer) to build a model that takes and combines a set of inputs to predict a continuous or categorical variable.
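
A minimal sketch of how a network with one hidden layer combines its inputs into a prediction: each node applies a function to a weighted sum of the values feeding into it. The sigmoid function, the weights and biases, and the two student features are all assumptions chosen for illustration; they are not taken from the study.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a network with a single hidden layer."""
    # Each hidden node: weighted sum of all inputs plus a bias, then sigmoid.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output node combines the hidden-node values in the same way.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical inputs for one student: credit hours and attendance, scaled to 0..1.
student = [0.8, 0.6]
hidden_weights = [[0.5, -0.3], [0.2, 0.9]]  # one row of weights per hidden node
hidden_biases = [0.0, -0.1]
output_weights = [1.2, -0.7]
output_bias = 0.05

score = forward(student, hidden_weights, hidden_biases, output_weights, output_bias)
print(f"predicted score: {score:.3f}")
```

Here the weights are fixed by hand; in practice they would be found by training the network on known examples.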
The value from each hidden node is a function of the weighted sum of the values from all the preceding nodes that feed into it. The process of building a model involves finding the connecting weights that produce the most accurate results by "training" the neural net with data. The most common training method is back-propagation, in which the output result is compared with known correct values. After each comparison, the weights are adjusted and a new result is computed. After enough passes through the training data, the neural net typically becomes a very good predictor.

IMPLEMENTATION OF DATA MINING IN HIGHER EDUCATION - A STUDY

Introduction

An application was built for an educational institution that wanted to explore hidden trends in its data. The institution wanted to improve its service levels by identifying student behavior in different fields. The data that was mined contains information on the students who take the most credit hours, the students who are most likely to return for more classes, the students who are "persisters" of the university/college, and the types of courses that attract more students. The model below was broadly followed for building the mining prototype.

Building the above model is a continuous process incorporating several feedback loops and considerable interaction among the components. At each stage, there are various checks to ensure that the model is in fact meeting the required objectives.

1. PROBLEM SELECTION:

To make the best use of data mining, one must make a clear statement of the objectives. For example, you may wish to increase response to a direct mail campaign. Different goals, such as "increasing the response rate" and "increasing the value of a response," will require very different models. An effective problem statement will also include a way to measure the results of your knowledge discovery project. The points that were selected for mining in the higher education institution are:

- To identify the characteristics of the students who take the most credit hours. These characteristics were subjects most preferred, timing most comfortable to the students, period of year, and boarding and lodging facilities.
- To find the relationship among the fields based on student behavior.

2. SOLUTION SELECTION:

This step required a number of iterations to converge on a final approach. For the first problem, neural networks can be used to find out the impact of various attributes on the students' course selection behavior. Factor analysis and rule induction were then applied to get the final result as a set of rules. For the second problem, association rules were used to find out the relationship among the fields, based on student behavior.

3. DATA SELECTION AND PREPARATION:

This step is the most time consuming. The data preparation
steps may consume between 50 and 85 percent of the time and effort of the whole knowledge discovery process. Here the course details were summarized to capture the essential elements of each student's course selection and persistence behavior using the following fields:

- The three most common courses preferred by students.
- The professors handling those subjects.
- The three most common subjects in which the students redeem.
- The subjects chosen when redeeming.
- For each of these, we also analyse the number of classes or lectures attended regularly by the maximum number of students.
- Consolidation of course-based information based on the number of subjects chosen by students in various fields.

The following issues in data quality were found:

- An insufficient number of attributes had been captured about the students and courses, making the mining algorithms inefficient.
- In many of the fields that were captured, much of the data was incomplete or inaccurate.

4. BUILD MODEL:

Discovery analysis and predictive modeling were identified as the appropriate activities to build the model. Association rules and factor analysis techniques were used for discovery analysis, and neural nets, rule induction and decision trees were used for predictive modeling.

The following tools were used to mine the data: Clementine, Business Miner and Intelligent Miner. Clementine was used for applying neural networks, rule induction and association. Business Miner was used for building decision trees, and Intelligent Miner for doing factor analysis and finding associations.
5. RESULTS:

Category Type:

The different categories to which the students belong were discretionary, regular, part time, residential, lateral, NRI, etc. The most important characteristic in which frequent students differed from their counterparts was category type. NRI and discretionary students were found to take a lower than average number of courses, whereas more courses were taken by regulars and laterals.

Applying Type:

88% of the students applied for their courses through post, and nearly 70% of them joined the college or university per year, which was significantly higher than for the other applying types.

Subject Type:

The most common and popular subjects, like math, physics and history, were taken mostly by students with average marks in their previous years. A significant number of students with very good marks and a few average students took difficult subjects like zoology, biotechnology and thermodynamics, but it was seen that 15% of the average students dropped out. The below-average students took very few courses, which were more common than easy. Some of the below-average students also took popular subjects, which they eventually completed in a longer span of time than the others.

6. MODEL MONITORING:

After using the model, one should measure how well it has worked. For example, suppose you build a model that identifies people who are likely to leave your long distance telephone service for another (known as churn). You know the rate of churn prior to using the model, and you can predict what the churn rate will be after you design interventions intended to keep good customers. Notice that it's not the model alone but the actions taken based on the model that will determine its success. The
results obtained from the educational institution application were checked against the original database and were found to be significant.

CONCLUSION

Data mining offers great promise in helping organizations uncover hidden patterns in their data. With this ability to uncover hidden patterns in large databases, community colleges and universities can build models that predict, with a high degree of accuracy, the behavior of population clusters. By acting on these predictive models, educational institutions can effectively address issues such as transfers and retention. However, data mining tools must be guided by users who understand the business, the data, and the general nature of the analytical methods involved. Realistic expectations can yield rewarding results across a wide range of applications, from improving revenues to reducing costs.

Building models is only one step in knowledge discovery. It's vital to collect and prepare the data properly and to check models against the real world. The "best" model is often found after building models of several different types and by trying out various technologies or algorithms.

The data mining area is still relatively young, and tools that support the whole of the data mining process in an easy-to-use fashion are rare. However, one of the most important issues facing researchers is the use of techniques against very large data sets. All the mining techniques are based on Artificial Intelligence, where they are generally executed against small sets of data that can fit in memory. However, in data mining applications these techniques must be applied to data held in very large databases. Approaches to this include the use of parallelism and the development of new database-oriented techniques. However, much work is required before data mining can be successfully applied to large data sets. Only then will the true potential of data mining be realized.