CN6121 – Group Coursework
Comparative Study of Various Techniques in AI
Carol Chachati & Myriam Marlin
U1312503 & U1630248
School of Architecture, Computing and Engineering, University of East London
[email protected], [email protected]
1. Introduction
1.1 Objectives:
• To compare supervised and unsupervised learning
• To do the coursework in pairs
• Each person in the group must do a type of learning: one person supervised, the other unsupervised
• Each person must use the same dataset to implement their algorithm, depending on which type of learning they are using. The dataset used in the implementation should consist of at least 100 instances, or more than 200 for a dataset with only a few attributes
• Each person must discuss why they chose the particular AI techniques, and why they have been used for comparison
• Explain why a particular software was used to implement the algorithm
• Write about how the implementation of the algorithm was done and how the data was represented
• Present the results obtained from the simulations of both types of learning, with appropriate screenshots
• Give a detailed analysis of the results, such as comparing and contrasting the results obtained from the simulations of the two types of AI techniques
• Conclude the comparison
1.2 Overview
Abstract: Supervised learning and unsupervised learning are two different approaches used by a machine to build a model that can make predictions for new input data. Supervised learning uses labelled features, while unsupervised learning uses unlabelled features. With supervised learning you know what outcome you want, and the output values are discrete, either one class or the other (1 or 0). With unsupervised learning the features are not labelled, and you rely on the algorithm to give you an outcome based on the feature data you provide. The decision of which one is better is usually based on a number of factors and depends on the input data, the amount of it, its quality, and the inclusion or absence of features. (Brownlee, 2016) (Stanford University, no date)
Regression and Classification:
Supervised learning techniques usually come under either regression or classification type problems. Classification problems involve categories, so the output is either 0 or 1, while regression involves getting an output value based on some features; rather than categories, the output can be any value in the range of the output data type. For example, with input features such as the number of rooms, the output is a house price, which is continuous. (Cord and Cunningham, 2008) (Ng, A., no date)
Supervised Learning:
Supervised learning is when the training data has labelled features, the output you want is discrete, and you know what outcome you would like. Supervised learning is used for classification problems and regression problems. An algorithm that uses supervised learning creates a model from the training data, from which predictions of the outputs for new data can be made. (Cord and Cunningham, 2008) (Stanford University, no date) (ur Réhman, 2016)
The supervised learning algorithm that will be used is Logistic Regression, by Carol.
Coefficients:
The coefficients of the variables are the basis of how the prediction is made, as in Linear Regression, on which Logistic Regression is based. The coefficients are worked out by the algorithm, and they show how much impact each variable has on the final output prediction. (Altman, Gill and McDonald, 2004)
Cost Function:
The estimated error loss between the predicted output and the prediction you actually wanted for that data. (Ng, A., no date)
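As a sketch, the standard cross-entropy cost used for binary logistic regression (not reproduced in the original document) can be written, for $m$ training examples with true labels $y_i \in \{0, 1\}$ and predicted probabilities $h_\beta(x_i)$, as:

$$J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log h_\beta(x_i) + (1 - y_i) \log\left(1 - h_\beta(x_i)\right) \right]$$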
The unsupervised learning algorithm that will be used is K-means clustering, by Myriam.
Unsupervised learning:
Unlike supervised learning, unsupervised learning refers to machine learning where only the input data is known, as there are no defined output variables. The purpose of unsupervised learning is to model the distribution of the data or replicate its underlying structure. This type of learning enables algorithms to find and produce the structure in the data, or the distribution of it, in order to learn more about the data itself (Brownlee, 2016).
Unsupervised learning problems can be divided into two groups:
Association and dimensionality reduction problems
This refers to discovering rules which describe large parts of the data. For example, an algorithm designed to find rules that describe a large part of the data collected from customers could predict how people shop: customers who purchased product X are likely to buy product Y as well (Lee & Verleysen, 2010).
Clustering problems
This refers to finding a way to discover the inherent groups in the data, for example an algorithm designed to group customers based on their previous orders (Hartigan & Wong, 1979).
Popular types of unsupervised learning algorithms:
• Apriori and Eclat are used to discover association rules in learning problems
• K-means, Expectation Maximisation (EM), k-Medians and Hierarchical Clustering are used for clustering problems
• Quadratic Discriminant Analysis (QDA), Principal Component Analysis (PCA) and Sammon Mapping are some examples of algorithms used for dimensionality reduction problems. Dimensionality reduction algorithms can also be adapted to supervised learning methods (Roweis & Saul, 2000) (Agrawal & Ramakrishnan, n.d.).
3. Methodology:
3.1 Chosen Techniques:
Carol Chachati C.C: Logistic Regression
Logistic Regression finds the probability of one category being more likely to be the outcome than another category, based on the inputted independent variables. The number of categories is limited to 2; a classification can be either one category or the other. (BioMedware, 2014) The algorithm is based on linear regression, adapted for classification problems, as shown in Figure 1.
Figure 1 Logistic Regression based on Linear
Regression (Sammut and Webb, 2011)
In Figure 2, you can see that the B values are the coefficients of the x values, which are the independent variable values. (Sammut and Webb, 2011)
Figure 2 Logistic Regression based on Linear Regression (Sammut and Webb, 2011)
The logistic regression model differs from the linear regression model in that it is not for regression problems but for classification problems. It uses the logs of the input data, creating a linear graph from them. The letter P at the front stands for the probability of 1 or 0, such as heart disease or not heart disease. This is different from the function used in linear regression, as in Figure 1, which gives a probable value based on the data: for example, if a house has a certain number of rooms, the prediction is how much the house costs for each individual house, and there is more variance in the predicted values than just being binary. (Stanford University, no date) (Sammut and Webb, 2011) The equation used for the logistic regression model is shown in Figure 3.
Figure 3 Mathematical Logistic Regression equation (BioMedware, 2014)
In this equation, P is the probability, given as the logit of the odds of an event, i.e. the odds of whether one category or the other is what will happen, shown as odds(event) in Figure 4 (faculty.cas.usf.edu, 2016). The regression coefficients give the x variable values their weight in the probability of the event outcome. The x values represent all the independent variables, up to the last x variable, used in calculating the probability of the event occurring. All of this is totalled, as shown by the summation sign, with M signifying the set of data inputted. (BioMedware, 2014) (Statistics Solutions, 2016)
Figure 4 Logistic Regression equation in words (Statistics Solutions, 2016)
Maximum Likelihood:
Logistic Regression works on the idea of maximum likelihood, by which a predicted value can vary between 0 and 1 and the best way of producing it is calculated, as shown by the logit form of the Logistic Regression model in Figure 5. (Altman, Gill and McDonald, 2004)
Figure 5 Logit form of Logistic Regression (Altman, Gill and McDonald, 2004)
The first derivation done by maximum likelihood is shown in Figure 6, in which maximum likelihood tries to find the best prediction by finding the best B coefficients. The better the coefficients are estimated, the better the output prediction; if the coefficients are not worked out properly, the model will not accurately predict whether the output is 0 or 1. (Altman, Gill and McDonald, 2004)
Figure 6 Derivation by maximum likelihood (Altman, Gill and McDonald, 2004)
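Since the figures themselves are not reproduced here, the standard equations they correspond to can be sketched as follows, assuming coefficients $\beta_0, \beta_1, \dots, \beta_M$ and independent variables $x_1, \dots, x_M$:

$$P = \frac{1}{1 + e^{-(\beta_0 + \sum_{m=1}^{M} \beta_m x_m)}} \qquad \ln\left(\frac{P}{1-P}\right) = \beta_0 + \sum_{m=1}^{M} \beta_m x_m$$

Maximum likelihood then chooses the $\beta$ values that maximise the log-likelihood over the $n$ training examples:

$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \ln P_i + (1 - y_i) \ln(1 - P_i) \right]$$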
Decision Boundary:
Since predictions vary between 0 and 1, a decision boundary has to be decided for the data. For example, if the decision boundary value is 0.5, anything with a value equal to or below 0.5 is classed as 0 and anything with a value higher than 0.5 is classed as 1. Since dataset values and problems vary, it is important to decide on a suitable decision boundary value, so that when the classification is analysed, testing data that has been classified well can be given a better accuracy output than if the decision boundary were different, and vice versa. (Stanford University, no date)
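As a minimal sketch in R (the variable names prob and pred_class are illustrative, not from the original code):

pred_class <- ifelse(prob > 0.5, 1, 0)  # class 1 above the 0.5 boundary, class 0 otherwise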
Convergence and the data:
In order for data predictions to be good, Logistic Regression must converge. Convergence is when the algorithm reaches a global maximum and predicted values are predicted around this only, which is a good sign that the algorithm was able to find the best way of predicting the odds of the classifier in the model. If a global maximum is not found and too many local maximum points are found by the algorithm instead (although this should not happen, since the log output should be concave, i.e. a bell-shaped graph), values may be predicted more ambiguously. Convergence can also fail because of the way the data is split into training and testing data. If the data is not randomised, or data splitting or cross-validation is not used, the model will not be able to support a variety of different data values. (Altman, Gill and McDonald, 2004)
C.C – end
Myriam Marlin M.M: K-means clustering
The K-means clustering approach is one of the most commonly used methods for cluster analysis. It is a simple algorithm designed to solve the clustering problem by classifying a given dataset into a number of clusters, k, fixed a priori. The principle of the algorithm is to assign each data point to the closest of the k centroids. Once all the data points are assigned, the algorithm recalculates the new centroids based on the average of the data points of each cluster; for example, centroids can be p-length mean vectors, where p represents the number of variables. The assignment and recalculation steps are repeated until no observations are reassigned.
K-means clustering advantages
K-means clustering has many advantages:
• It is easy to understand as well as robust
• It is fast and efficient
• It provides the best results when datasets are distinct and well separated from one another
K-means clustering is designed to minimise a squared error function, also known as an objective function, given below:
$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2$$
where $\left\| x_i^{(j)} - c_j \right\|$ represents the Euclidean distance, which is the chosen distance between the cluster centre $c_j$ and the data point $x_i^{(j)}$, and $n$ represents the number of data points assigned to their cluster centres.
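To make the procedure concrete, here is a minimal, illustrative R sketch of how this objective is minimised (this is not the coursework code; the built-in kmeans function used later does this internally, with better handling of edge cases):

# Naive K-means (Lloyd's algorithm): X is a numeric matrix, k the number of clusters
naive_kmeans <- function(X, k, iters = 100) {
  centroids <- X[sample(nrow(X), k), , drop = FALSE]  # random initial centres
  for (it in 1:iters) {
    # assign each point to its nearest centroid (squared Euclidean distance)
    d <- sapply(1:k, function(j)
      rowSums((X - matrix(centroids[j, ], nrow(X), ncol(X), byrow = TRUE))^2))
    cluster <- max.col(-d)
    # recompute each centroid as the mean of its assigned points
    # (empty clusters are not handled in this sketch)
    for (j in 1:k) centroids[j, ] <- colMeans(X[cluster == j, , drop = FALSE])
  }
  list(cluster = cluster, centers = centroids)
}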
M.M – end
3.2 Reasons for comparing algorithms:
The coursework asks us to choose two AI techniques for comparison, one supervised and one unsupervised. Logistic regression was chosen as the supervised technique based on the algorithms available in Weka for the dataset; it had a higher prediction accuracy in Weka than some of the other algorithms available to try. The technique is also suitable for binary classification problems that have several independent variables.
The K-means clustering algorithm was selected for the comparison because it is also a technique suitable for binary classification, and because of the toolbox we can use to apply it. The algorithm can be run in RStudio. Unlike Weka, the R programming language, like Python and MATLAB, can import libraries and machine learning packages. Weka has its own ML package and is less flexible than other tools and software used for data exploration and statistical analysis. Like Python, R provides the freedom to transform, clean and explore datasets, as well as offering ways to tweak and tune the algorithms. Even though Weka is an education-oriented tool, it is less capable for data science and offers less room to improve coding skills. Moreover, R offers comprehensive facilities to handle the methodology available for partitioning clustering.
4. Simulations:
4.1 Introduction to dataset
The dataset selected to test the two algorithms used to compare supervised and unsupervised machine learning was found on the University of California Irvine (UCI) repository, a website that contains a large database of datasets. The dataset we used for both simulations is available at: http://archive.ics.uci.edu/ml/datasets/Statlog+(Heart).
It is a heart disease dataset which contains 270 instances and 14 variables. The class is the presence (2) or absence (1) of heart disease. The data types are numerical values, and the attributes are a mix of Real, Ordered, Binary and Nominal types.
The attribute information is as follows:
1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholestoral in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values
0,1,2)
8. maximum heart rate achieved
9. exercise induced angina
10. oldpeak = ST depression induced by
exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) coloured by
fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 =
reversible defect
4.2 Reasons why the particular software or tool was chosen:
RStudio and R were used instead of Weka and MATLAB. This is because R is easy to understand, is used in industry, provides an easy-to-use interface, and is much quicker to figure out. MATLAB, while it has good classification facilities, has seemingly overly flexible encoding possibilities which are harder to put together. As for Weka, it was hard to understand the classification output, and while there is a command line interface for Weka, ready information on how to encode algorithms using it was scarcer than for R and MATLAB. There were also many terminologies to understand in order to interpret the output. Among the standard algorithms that could be tried out on the data, multinomial logistic regression was available as well as logistic regression, but if the encoding did not produce the same values as the classification of these standard algorithms, it would come down to which classification calculations were used, and it was difficult to find this out. Algorithms could fit the dataset but not necessarily make sense for it: multinomial logistic regression, while available, normally takes more than 2 categories, and there are only 2 categories in the heart disease dataset, so it did not seem right to use it.
4.3 Input encoding / input representation:
C.C: Logistic Regression
First the data from the UCI repository is converted to CSV format in Excel, and it can then be read into R. The class is made into a factor so that R can split the class, which is the response variable, into categories it will understand: for the program, 0 is for absent and 1 for present. 'contrasts' is used to confirm this. (Alice, 2015)
@attribute class {1 2} absent or present
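A minimal sketch of this step in R (the file name heart.csv and the data frame name heart are illustrative):

heart <- read.csv("heart.csv")   # the converted dataset
heart$class <- factor(heart$class, levels = c(1, 2), labels = c("Absent", "Present"))
contrasts(heart$class)           # confirm the dummy coding: Absent = 0, Present = 1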
Code reference (Alice, 2015) (Rai, 2015)
The contrast output in Figure A shows how the levels are coded: Present is represented as 1 in R, while absence of heart disease is represented as 0.
Figure A contrast output (Chachati, 2017)
Code reference (Alice, 2015) (Rai, 2015)
Coefficient analysis: any variables with stars next to them have a significant effect on predictability, so any variable with no star can be taken out of the model later on. (Alice, 2015) (Rai, 2015)
summary(themodel) output
Code reference (Alice, 2015) (Rai, 2015)
The data is randomised; seeding enables the same training and testing sets every time you run the script, with a split of 70% for building the model and 30% used for testing. (Alice, 2015) (Rai, 2015)
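A minimal sketch of this step (the object names follow the earlier sketch and are illustrative):

set.seed(88)                                   # any fixed seed gives a reproducible split
idx <- sample(nrow(heart), size = round(0.7 * nrow(heart)))
tdata    <- heart[idx, ]                       # 70% training data
testdata <- heart[-idx, ]                      # 30% testing data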
Using themodel <- glm(class ~ ., data = tdata, family = binomial), all 13 independent variables are included. 'glm' is the function used to do logistic regression, along with choosing the 'binomial' family for binary logistic regression. A prediction model is created. You can see in the 'glm' function that I took out some of the 13 independent variables, leaving in 8, since the coefficients in the earlier summary of the model showed that the left-out variables did not have much significance in predicting the response outcome. (Alice, 2015) (Rai, 2015)
With just 8 variables specified:
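The figure with the reduced call is not reproduced here, and the text does not list exactly which 8 variables were kept, so the formula below is illustrative only:

themodel <- glm(class ~ sex + chest_pain + resting_BP + fasting_blood_sugar +
                  max_heart_rate + exercise_angina + major_vessels + thal,
                data = tdata, family = binomial)
summary(themodel)   # re-check the coefficient significance stars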
You can see that most of the variables now have a significant impact on the final prediction response.
C.C – end
M.M: K-means clustering
Once the dataset is obtained, the first thing to do is to prepare it so it can be used in R. To do so, the dataset is extracted in Notepad++ and saved as a text file, which is then opened in Excel to add the column labels before saving it in Comma Separated Values (CSV) format. The dataset can then be imported into RStudio using the import dataset button in the environment panel.
Fig 1. Importing dataset to Rstudio
In order to obtain the same result every time a function is applied to the dataset, we use the function set.seed(1234), which fixes the seed of the random number generator in R so that results can be reproduced.
As the K-means algorithm is designed to group the observations into clusters, we first need to remove the class column from the original dataset. To do so, we used code to create a copy of the dataset with the class column set to NULL; when the head of the dataset is displayed on the console, the class column is removed.
Fig 2. Remove the class column from dataset in Rstudio
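A minimal sketch of this step (heart1 as the working copy, consistent with the commands below):

heart1 <- heart          # copy the original data frame
heart1$class <- NULL     # drop the class column
head(heart1)             # confirm the class column is gone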
The data's variables then need to be rescaled for better comparability. To do so, the commands below are used in R:
# Prepare data
heart1 <- na.omit(heart1)  # delete missing data
heart1 <- scale(heart1)    # standardise variables
As the use of the K-means algorithm in R requires us to specify how many clusters need to be extracted, we can use a method called the "elbow" method, which consists of looking for a significant elbow (bend) in the plot of the within-cluster sum of squared errors (SSE), using the function below:
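The function itself appears as a figure in the original document; a common R sketch of this elbow computation (an assumption, not necessarily the original code) is:

# within-groups sum of squares for k = 1..15 clusters
wss <- (nrow(heart1) - 1) * sum(apply(heart1, 2, var))
for (k in 2:15) wss[k] <- sum(kmeans(heart1, centers = k)$withinss)
plot(1:15, wss, type = "b",
     xlab = "Number of clusters", ylab = "Within groups sum of squares")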
The result obtained when the function was run shows that the number of clusters for the dataset is 2, as shown below.
Fig 3. Result of plot to suggest the number of clusters for k-means in Rstudio
We can also use the two values in the class, 1 (absent) and 2 (present), to confirm setting the number of clusters to two. We can now apply the kmeans function to the dataset as shown below.
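A minimal sketch of the call, consistent with the object names used in the original commands (heart1 for the prepared data, results for the fitted object):

set.seed(1234)                         # reproducible initial centres
results <- kmeans(heart1, centers = 2)
results                                # print sizes, means and the clustering vector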
Fig 4. K-means results in Rstudio
The returned results display the cluster sizes (230, 40) as well as the means calculated for each variable of the dataset. The clustering vector shows the cluster of each observation present in the dataset; e.g. observation number 5 is placed in cluster 2 based on most of its data being close to the cluster means calculated for each variable of cluster 2.
The result also displays the within-cluster sum of squares by cluster, which represents the sum of the squared distances of each data point in a cluster to that cluster's mean. The available components are also displayed in the results, and they can be used to carry out further analysis, such as seeing the results per size or per cluster by using the commands illustrated in Fig 5.
Fig 5. Results based on size and cluster components in Rstudio
5. Results obtained:
C.C
5.1 Logistic Regression results
For the logistic regression, these are the results obtained.
Prediction output: the matrix and the misclassification error.
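A minimal sketch of how these were produced (names follow the earlier sketches and are illustrative):

prob <- predict(themodel, newdata = testdata, type = "response")  # predicted probabilities
pred <- ifelse(prob > 0.5, "Present", "Absent")                   # apply the decision boundary
table(pred, testdata$class)                                       # prediction matrix
mean(pred != testdata$class)                                      # misclassification error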
Code reference (Alice, 2015) (Rai, 2015)
In the class output, 2 (i.e. Present) is the level coded as 1 in R, while absence of heart disease is represented as 0. For the matrix description: 34 were correctly classed as absent and 5 were incorrectly classed as present, while 6 were incorrectly classed as absent and 29 were correctly classed as present. (DBD, 2017)
The misclassification error is then calculated. It shows an output of 0.1486486, which is the proportion that was incorrectly classified. This means there was an accuracy of 0.8513514, i.e. approximately 85%, which means the logistic regression worked well.
C.C – end
M.M
5.2 K-means clustering results
To provide a deeper analysis of the result, we produced a table that compares the results based on the clusters to the class variable defined in the original dataset. This table allows us to analyse the performance of the classifier (classification model) on the dataset containing the true values; it is also known as a confusion matrix.
The confusion matrix allows us to easily compare the following terminology:
• True positives (TP) represent the patients that we predicted as having heart disease where the actual data shows that the patients do have the disease.
• True negatives (TN) represent the patients that we predicted as heart disease free and who are actually disease free.
• False positives (FP) represent the patients we predicted as having heart disease who are actually disease free. This is also referred to as a Type I error.
• False negatives (FN) represent the patients we predicted as disease free who have actually been diagnosed with heart disease (Data School, 2017).
Fig 6. Confusion matrix table from Data School.
Based on the confusion matrix table in Fig. 6, we have two predicted classes in the dataset: one for the patients that do not have heart disease (absent) and one for the patients diagnosed with heart disease (present). The classifier covers a total of 270 observations related to patients tested for heart disease. Now let us create the confusion matrix as follows:
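A sketch of the command (using the object names from the original plotting commands, with heart holding the class column and results the kmeans output):

table(results$cluster, heart$class)  # rows: clusters 1-2; columns: classes 1 (absent), 2 (present)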
Fig 7. Confusion matrix in Rstudio
The confusion matrix shows that 23 patients were classified in cluster two (present) when they did not have the disease, and 103 patients with the disease were assigned to cluster 1 (absent). Based on the results, the confusion matrix provides a list of rates that are usually computed, such as the accuracy and the misclassification rate, as follows:
Accuracy, which represents how often the classifier was correct: (TP+TN)/Total = (17+127)/270 = 0.53
Misclassification rate, which represents how often the classification was wrong: (FP+FN)/Total = (103+23)/270 = 0.47
Based on the calculated rates, the accuracy is barely better than chance and the misclassification rate is too high for the size of the dataset. Overall, the performance of the classifier is poor.
Next, we can also produce a graph using the plot command to give us a visual comparison of the results. For this purpose, we will compare a plot of the resting blood pressure variable against the serum cholesterol variable coloured by the class attribute of the original dataset with the same plot coloured by the clustering results. To do so, we used the commands below:
> plot(heart[c("resting_BP","serum_cholestoral")], col = heart$class)
> plot(heart[c("resting_BP","serum_cholestoral")], col = results$cluster)
Fig 8. Plots of the serum cholesterol vs resting BP. Top graph is based on the class of the original dataset; bottom graph is based on the cluster results.
The graph comparison allows us to notice that the results are very similar in shape; however, some observations overlap and were assigned to the wrong cluster. These discrepancies could be caused by the Euclidean distance calculation having weighted the underlying factors unequally. Moreover, cluster centres that are chosen randomly can lead to the wrong result.
6. Critical Analysis of results
6.1 Comparison of results:
The k-means worked on most of the dataset at once, while logistic regression built a model from a training set and then made predictions on a testing set. Logistic Regression turned out better: its misclassification error was about 15% (approximately 85% accuracy), while the k-means misclassification rate was over 40%. Both algorithms had their pros and cons: k-means had overlapping observations in the classification, which contributed to error, while Logistic Regression's error depended on the coefficient values. Across accuracy, misclassification rate, TP rate, FP rate, specificity and precision, Logistic Regression was better.
Logistic regression confusion matrix (test set, 74 instances):

                  actual absent    actual present    total
1 (absent)        TN = 34          FN = 6            40
2 (present)       FP = 5           TP = 29           34
total             39               35                74

Accuracy = (TP+TN)/total = (29+34)/74 = 0.85
Misclassification rate = (FP+FN)/total = (5+6)/74 = 0.15
TP rate = TP/actual yes = 29/35 = 0.83
FP rate = FP/actual no = 5/39 = 0.13
Specificity = TN/actual no = 34/39 = 0.87
Precision = TP/predicted yes = 29/34 = 0.85
Prevalence = actual yes/total = 35/74 = 0.47
(Data, 2014)

K-means confusion matrix (full dataset, 270 instances):

                  actual absent    actual present    total
1 (absent)        TN = 127         FN = 103          230
2 (present)       FP = 23          TP = 17           40
total             150              120               270

Accuracy = (TP+TN)/total = (17+127)/270 = 0.53
Misclassification rate = (FP+FN)/total = (23+103)/270 = 0.47
TP rate = TP/actual yes = 17/120 = 0.14
FP rate = FP/actual no = 23/150 = 0.15
Specificity = TN/actual no = 127/150 = 0.85
Precision = TP/predicted yes = 17/40 = 0.43
Prevalence = actual yes/total = 120/270 = 0.44
(Data, 2014)
Overall, the k-means algorithm applied to the dataset for clustering analysis did not provide us with satisfactory results, as the performance of the classifier is poor: the accuracy rate obtained through the confusion matrix results is low, and the misclassification rate is too high to conclude that the k-means technique was a success. Moreover, the comparison of the rates obtained through the k-means algorithm and the logistic regression algorithm shows that the latter had a better accuracy and misclassification rate than the k-means algorithm.
M.M – end
7. Conclusion
C.C
The two algorithms classified the data in different ways: while one used the whole dataset, the other split the data into training and testing sets and made predictions. If I were to do the coursework again, I would do more tests including and excluding variables, along with a comparison of doing the same for k-means. This would give a better overview of which algorithm worked better for the dataset. I would also have tried other things, such as cross-validation and using different types of classifiers to test the model, and the same for k-means.
C.C – end
8. References:
Agrawal, R. & Ramakrishnan, S., n.d. Fast Algorithms for Mining Association Rules, San Jose: IBM Almaden Research Center.
Alice, M. (2015) How to perform a logistic regression in R. Available at: https://www.r-bloggers.com/how-to-perform-a-logistic-regression-in-r/ (Accessed: 6 January 2017).
Altman, M., Gill, J. and McDonald, M.
(2004). Numerical issues in statistical
computing for the social scientist. 1st ed.
Hoboken, NJ: Wiley, pp.238-248.
BioMedware, (2014). A general equation for logistic regression. [image] Available at: https://www.biomedware.com/files/documentation/spacestat/Statistics/Multivariate_Modeling/Regression/logistic_gen2.jpg [Accessed 30 Dec. 2016].
Hartigan, J. A. & Wong, M. A., 1979.
Algorithm AS 136: A K-Means Clustering
Algorithm. Journal of the Royal Statistical
Society. Series C (Applied Statistics), 28(1),
pp. 100-108.
BioMedware, (2014). BioMedware SpaceStat Help - About Aspatial Logistic Regression. [online] Biomedware.com. Available at: https://www.biomedware.com/files/documentation/spacestat/Statistics/Multivariate_Modeling/Regression/About_Aspatial_Logistic_Regression.htm [Accessed 30 Dec. 2016].
Brownlee, J. (2016) Supervised and Unsupervised machine learning Algorithms. Available at: http://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/ (Accessed: 30 December 2016).
Cord, M. and Cunningham, P. (2008) Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval. Berlin: Springer-Verlag Berlin Heidelberg. (Chapters 2 & 3)
Data School (2017) Simple guide to confusion matrix terminology. [Online] Available at: http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/ (Accessed: 3 January 2017).
DBD, U. (2017). Confusion Matrix. [online] Www2.cs.uregina.ca. Available at: http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html [Accessed 5 Jan. 2017].
faculty.cas.usf.edu (2016) Logistic regression. Available at: http://faculty.cas.usf.edu/mbrannick/regression/Logistic.html (Accessed: 5 January 2017).
Lee , J. A. & Verleysen, M., 2010.
Unsupervised Dimensionality Reduction:
Overview and Recent Advances. Barcelona,
WCCI 2010 IEEE World Congress on
Computational Intelligence.
Rai, B. (2015). Logistic Regression with R: Categorical Response Variable at Two Levels. [video] Available at: https://www.youtube.com/watch?v=xrAg3FLQ0ZI&feature=youtu.be [Accessed 1 Jan. 2017].
Roweis, S. & Saul, L. K., 2000. Nonlinear
Dimensionality Reduction by Locally Linear
Embedding, New York: Science.
Sammut, C. and Webb, G. (2011).
Encyclopedia of machine learning. 1st ed.
New York: Springer, p.631.
Statistics Solutions (2016) Logistic regression. Available at: http://www.statisticssolutions.com/regression-analysis-logistic-regression/ (Accessed: 5 January 2017).
Stanford University (no date) Supervised Learning [MOOC]. Available at: https://www.coursera.org/learn/machine-learning/supplement/NKVJ0/supervised-learning (Accessed: 9 November 2016).
Stanford University (no date) Unsupervised Learning [MOOC]. Available at: https://www.coursera.org/learn/machine-learning/lecture/olRZo/unsupervised-learning (Accessed: 9 November 2016).
ur Réhman, D.S. (2016) Data Driven Machine Learning (Part 1). Available at: https://moodle.uel.ac.uk/pluginfile.php/789808/mod_resource/content/3/machine%20learning%20intro-lec%201.pdf (Accessed: 9 November 2016).