A Short and Simple Introduction
to Linear Discriminants
(with almost no math)
Jennifer Listgarten, November 2002.
Introduction
• Linear discriminants are a family of mathematical
models that allow us to classify data (e.g. microarray
data) into preset groups (e.g. cancer vs. non-cancer,
metastatic vs. non-metastatic, responds well to a drug
vs. responds poorly).
• 'Discriminant' simply means that the model has the
ability to discriminate between two classes.
• The meaning of the word 'linear' will become
clearer later.
Motivation I
• We previously spoke at length about common
clustering methods for microarray data
(unsupervised learning).
• Supervised techniques, which learn from labeled
examples, are often much more powerful and useful.
• Linear discriminants are among the oldest and best
studied supervised techniques, both in traditional
statistics and in machine learning.
Motivation II
• Linear discriminants are widely used today in many
application domains, including the modeling of
various types of biological data.
• Many classes or sub-classes of techniques are
actually linear discriminants (e.g. artificial neural
networks, the Fisher discriminant, support vector
machines, and many more).
• They provide a very general framework upon which
much has been built, i.e. one that can be extended into
very sophisticated, robust techniques.
E.g. Classifying Cancer Patients vs.
Healthy Patients from Microarray
Patient_X = (gene_1, gene_2, gene_3, …, gene_N)
[Figure: patient expression vectors, labeled cancerous or healthy.]
N (the number of dimensions) is normally larger
than 2, so we can't visualize the data.
E.g. Classifying Cancer Patients vs.
Healthy Patients from Microarray
For simplicity, pretend that we are only looking at the
expression levels of 2 genes.
[Figure: scatter plot of Gene_1 vs. Gene_2 expression level
(-5 = down-regulated to 5 = up-regulated); the cancerous and
healthy samples form two separate clusters.]
E.g. Classifying Cancer Patients vs.
Healthy Patients from Microarray
Question: How can we build a classifier for this data?
[Figure: the same Gene_1 vs. Gene_2 scatter plot of
cancerous and healthy samples.]
E.g. Classifying Cancer Patients vs.
Healthy Patients from Microarray
Simple Classification Rule:
IF gene_1 < 0 AND gene_2 < 0
THEN person = healthy
IF gene_1 > 0 AND gene_2 > 0
THEN person = cancerous
[Figure: the Gene_1 vs. Gene_2 scatter plot; the healthy
cluster sits in the lower-left quadrant and the cancerous
cluster in the upper-right.]
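As a concrete illustration, here is a minimal Python sketch of this two-gene rule; the function name and the example expression values are made up for illustration:

def classify(gene_1, gene_2):
    """Toy two-gene threshold rule from the slide above."""
    if gene_1 < 0 and gene_2 < 0:
        return "healthy"
    if gene_1 > 0 and gene_2 > 0:
        return "cancerous"
    return "unclassified"  # the rule says nothing about the other two quadrants

print(classify(-2.1, -3.0))  # healthy
print(classify(1.5, 4.2))    # cancerous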
E.g. Classifying Cancer Patients vs.
Healthy Patients from Microarray
Simple Classification Rule:
IF gene_1 < 0 AND gene_2 < 0 AND
…
gene_5000 < Y
THEN person = healthy
IF gene_1 > 0 AND gene_2 > 0 AND
…
gene_5000 > W
THEN person = cancerous
If we move away from our simple example with 2 genes to a
realistic case with, say, 5000 genes, then:
1. What will these rules look like?
2. How will we find them?
It gets a little complicated and unwieldy…
E.g. Classifying Cancer Patients vs.
Healthy Patients from Microarray
Reformulate the previous rule.
SIMPLE RULE:
• If a data point lies to the 'left' of the line, then 'healthy'.
• If a data point lies to the 'right' of the line, then 'cancerous'.
[Figure: the Gene_1 vs. Gene_2 scatter plot with a straight
line separating the healthy cluster from the cancerous one.]
It is easier to generalize this line to 5000 genes than it
is a list of rules. It is also easier to solve mathematically.
More Than 2 Genes (Dimensions)?
Easy to Extend
• Line in 2D: x1*C1 + x2*C2 = T
• If we had 3 genes, and needed to build a 'line' in
3-dimensional space, then we would be seeking a plane.
Plane in 3D: x1*C1 + x2*C2 + x3*C3 = T
• If we were looking in more than 3 dimensions, the
'plane' is called a hyperplane. A hyperplane is simply a
generalization of a plane to dimensions higher than 3.
Hyperplane in N dimensions:
x1*C1 + x2*C2 + x3*C3 + … + xN*CN = T
[Figure: the separating line on the Gene_1 vs. Gene_2
scatter plot.]
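A minimal numpy sketch of this generalization; the coefficient values C and the patient vector below are arbitrary, illustrative numbers, and the same code works unchanged for N = 5000 genes:

import numpy as np

C = np.array([0.8, -0.3, 1.1, 0.5])   # one coefficient per gene (illustrative values)
T = 0.0                                # threshold

def side_of_hyperplane(x, C, T):
    """Which side of the hyperplane x1*C1 + ... + xN*CN = T does x lie on?"""
    return "cancerous" if np.dot(x, C) > T else "healthy"

patient = np.array([1.2, -0.4, 2.0, 0.1])  # made-up expression levels for 4 genes
print(side_of_hyperplane(patient, C, T))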
E.g. Classifying Cancer Patients vs.
Healthy Patients from Microarray
Why is it called 'linear'?
The rule for 'which side of the line is the point on'
looks, mathematically, like:
gene1*C1 + gene2*C2 > T  then cancerous
gene1*C1 + gene2*C2 < T  then healthy
It is linear in the input (the gene expression levels).
[Figure: the scatter plot with the separating line; points
with gene1*C1 + gene2*C2 > T fall on the cancerous side,
points with gene1*C1 + gene2*C2 < T on the healthy side.]
Linear vs. Non-Linear
gene1*C1 + gene2*C2 > T
gene1*C1 + gene2*C2 < T
→ linear
gene1^2*C1 + gene2*C2 > T
gene1^2*C1 + gene2*C2 < T
→ non-linear (quadratic in gene1)
1/[1 + exp(-(gene1*C1 + gene2*C2 + T))] > 0.5
1/[1 + exp(-(gene1*C1 + gene2*C2 + T))] < 0.5
→ 'logistic' linear discriminant (the sigmoid takes values
in (0, 1), so the threshold is 0.5; the decision boundary
is still linear in the inputs)
gene1*gene2*C > T
gene1*gene2*C < T
→ non-linear (a product of the inputs)
Mathematically, linear problems are generally much easier
to solve than non-linear problems.
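To make the distinction concrete, here is a small Python sketch of the four decision functions above; the coefficient values are arbitrary, chosen only for illustration:

import math

C1, C2, C, T = 1.0, -0.5, 2.0, 0.3   # illustrative coefficients

def linear(g1, g2):
    return g1 * C1 + g2 * C2 > T

def quadratic(g1, g2):               # non-linear: g1 enters squared
    return g1**2 * C1 + g2 * C2 > T

def logistic(g1, g2):                # sigmoid of a linear function: boundary is still linear
    return 1.0 / (1.0 + math.exp(-(g1 * C1 + g2 * C2 + T))) > 0.5

def product(g1, g2):                 # non-linear: the inputs multiply each other
    return g1 * g2 * C > T

for f in (linear, quadratic, logistic, product):
    print(f.__name__, f(1.2, -0.7))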
Back to our Linear Discriminant
There are actually many (infinitely many) lines that
'properly' divide the points.
[Figure: the scatter plot with several different separating
lines drawn through the gap between the two classes.]
Which is the correct one?
One solution (the one SVMs use):
1. Find a line that has all the data points on the proper side.
2. Of all lines that satisfy (1), find the one that maximizes
the 'margin' (the smallest distance between any point and
the line).
3. This is called 'constrained optimization' in mathematics.
[Figure: three panels comparing a largest-margin line, a
smaller-margin line, and the margin itself.]
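A minimal sketch of this maximum-margin idea, assuming scikit-learn is available; the toy two-gene expression values below are made up:

import numpy as np
from sklearn.svm import SVC

# Toy 2-gene expression data: negative quadrant = healthy, positive = cancerous.
X = np.array([[-3, -2], [-2, -4], [-1, -1],   # healthy
              [2, 3], [3, 1], [4, 4]])        # cancerous
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear")   # linear kernel -> a maximum-margin separating line
clf.fit(X, y)

print(clf.coef_, clf.intercept_)    # the line's coefficients C and threshold
print(clf.predict([[-2.5, -1.5]]))  # -> [-1], i.e. healthy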
Obtaining Different ‘Lines’:
Objective Functions
• In general, the line that you end up with depends on
some criterion, defined by the 'objective function'
(for the SVM, the margin).
• An objective function is chosen by the modeler, and
varies depending on exactly what the modeler is trying
to achieve or thinks will work well (e.g. the margin,
posterior probabilities, sum-of-squares error, a small
weight vector).
• The function usually has a theoretical foundation
(e.g. risk minimization; maximum likelihood under
zero-mean Gaussian noise; Gaussian processes).
What if the data looked like this?
[Figure: the Gene_1 vs. Gene_2 scatter plot, but with the
cancerous and healthy points partially intermingled.]
How could we build a suitable line that divides the data nicely?
It depends…
• Is it just a few points that are small 'outliers'?
• Or is the data simply not amenable to this kind of
classification?
• Not linearly separable data: inherently, the data cannot
be separated by any one line.
• Almost linearly separable data: a few outliers – we can
probably still find a 'good' line.
• Linearly separable data: can make a great classifier.
[Figure: three scatter plots of healthy vs. cancerous
points illustrating the three cases.]
Not linearly separable data.
• If we allow the model to have more than one line (or
hyperplane), then maybe we can still form a nice model.
• Much more complicated.
• This is one thing that neural networks allow us to do:
combine linear discriminants together to form a single
classifier (no longer a linear classifier).
• No time to delve further during this talk.
[Figure: a scatter plot in which two lines together
separate the interleaved cancerous and healthy clusters;
inherently, the data cannot be separated by any one line.]
Not linearly separable data.
Now what?
Even with many lines it would be extremely difficult to
build a good classifier.
[Figure: a scatter plot in which the two classes are
thoroughly intermixed.]
Sometimes Need to Transform the Data
[Figure: on the left, data that is not linearly separable
(one class forming a ring around the other); after a
transformation to polar coordinates, plotted against the
distance from the center (the radius), the data becomes
linearly separable.]
Need to transform the coordinates: e.g. polar coordinates,
principal-components coordinates, or a kernel transformation
into a higher-dimensional space (support vector machines).
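A small numpy sketch of the polar-coordinates idea on synthetic ring-shaped data; the data generation is invented for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: one class near the origin, the other on a surrounding ring.
angles = rng.uniform(0, 2 * np.pi, 50)
inner = rng.normal(0, 0.3, (50, 2))                      # class A: small radius
outer = np.c_[3 * np.cos(angles), 3 * np.sin(angles)]    # class B: radius ~ 3
outer += rng.normal(0, 0.2, (50, 2))

# In (x, y) coordinates no single line separates the classes, but the
# radius alone does: a single threshold on r is a linear discriminant.
r_inner = np.linalg.norm(inner, axis=1)
r_outer = np.linalg.norm(outer, axis=1)
threshold = 1.5
print((r_inner < threshold).all(), (r_outer > threshold).all())  # True True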
Caveats
• May need to find a subset of the genes under which the
data is linearly separable; choosing that subset is called
feature selection.
• Feature selection is what we call, in computer science, an
NP-complete problem, which means, in layman's terms, that it
is computationally infeasible to solve exactly for realistic
problem sizes. Feature selection is an open research problem.
• There is a spate of techniques that give you approximate
solutions to feature selection.
• Feature selection is mandatory in microarray expression
experiments because there is so much noisy, irrelevant data.
• Also, microarray data contains much missing data, which
introduces further difficulties.
Other Biological Applications
• Gene finding in DNA: the input is part of a DNA strand,
the output is whether or not the nucleotide at the centre
is inside a gene.
• Sequence-based gene classification: the input is a gene
sequence, the output is a functional class.
• Protein secondary-structure prediction: the input is a
sequence of amino acids, the output is the local secondary
structure.
• Protein localization in the cell: the input is an amino
acid sequence, the output is the position in the cell
(e.g. nucleus, membrane, etc.)
Taken from Introduction to Support Vector Machines and
Applications to Computational Biology, Jean-Philippe Vert.
Wrap-Up
• An intuitive feel for linear discriminants.
• A widely applicable technique – for many problems in
Polyomx and many other areas.
• Difficulties: missing data, feature selection.
• We have used linear discriminants for our SNP data and
microarray data.
If interested in knowing more, a great book is:
Neural Networks for Pattern Recognition, Christopher Bishop, 1995.
Finding the Equation of the Linear Discriminant
(How a Single Layer Neural Network Might Do It)
The discriminant function:
$$y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$$
The decision regions are $y(\mathbf{x}) > 0$ and $y(\mathbf{x}) < 0$;
the decision boundary is $y(\mathbf{x}) = 0$.
E.g. the sum-of-squares error function (more typical for regression):
$$E(\mathbf{w}) = \sum_{n=1}^{N} \big( (\mathbf{w}^T \mathbf{x}_n + w_0) - t_n \big)^2, \qquad t_n \in \{-1, 1\}$$
Minimize the objective function:
1. Exact solution via matrix algebra, since here $E$ is convex.
2. Iterative algorithms (gradient descent, conjugate gradient,
Newton's method, etc.) for cases where $E$ may not be convex.
$$\nabla E(\mathbf{w}) = \left( \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_K} \right) = \mathbf{0}$$
Can regularize by adding $\|\mathbf{w}\|^2$ to $E$.
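A minimal numpy sketch of option 1 above, the exact matrix-algebra (least-squares) solution, on made-up toy data:

import numpy as np

# Toy 2-gene data with targets t in {-1, +1}.
X = np.array([[-3.0, -2.0], [-2.0, -4.0], [-1.0, -1.0],
              [2.0, 3.0], [3.0, 1.0], [4.0, 4.0]])
t = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

# Absorb the bias w0 by appending a constant-1 input.
X1 = np.c_[X, np.ones(len(X))]

# Minimizing the sum-of-squares error E(w) is an ordinary
# least-squares problem; lstsq solves it exactly.
w, *_ = np.linalg.lstsq(X1, t, rcond=None)

y = X1 @ w                        # discriminant values y(x)
print(np.sign(y) == np.sign(t))   # all points land on the correct side here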
Finding the Equation of the Linear Discriminant
(How an SVM Would Do It)
The discriminant function:
$$y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$$
with decision regions $y(\mathbf{x}) > 0$, $y(\mathbf{x}) < 0$ and
decision boundary $y(\mathbf{x}) = 0$.
The signed distance from a point $\mathbf{x}$ to the boundary,
which defines the margin, is given by:
$$\frac{\mathbf{w}^T \mathbf{x}}{\|\mathbf{w}\|} + \frac{w_0}{\|\mathbf{w}\|}$$
Minimize $\|\mathbf{w}\|^2$ subject to the following constraints:
$$t_i (\mathbf{w}^T \mathbf{x}_i + w_0) - 1 \geq 0, \qquad i = 1, \ldots, N, \qquad t_i \in \{-1, 1\}$$
Use Lagrange multipliers:
$$L(\mathbf{w}, w_0, \boldsymbol{\lambda}) = \|\mathbf{w}\|^2 - \sum_{i=1}^{N} \lambda_i \big[ t_i (\mathbf{w}^T \mathbf{x}_i + w_0) - 1 \big]$$
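A minimal sketch of this constrained optimization using scipy's generic SLSQP solver (real SVM packages use specialized quadratic-programming solvers; the toy data is made up):

import numpy as np
from scipy.optimize import minimize

X = np.array([[-3.0, -2.0], [-2.0, -4.0], [-1.0, -1.0],
              [2.0, 3.0], [3.0, 1.0], [4.0, 4.0]])
t = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

# Parameter vector p = (w1, w2, w0).
def objective(p):
    return p[0] ** 2 + p[1] ** 2          # ||w||^2 (w0 is not penalized)

constraints = [
    {"type": "ineq", "fun": (lambda p, x=x, ti=ti: ti * (p[:2] @ x + p[2]) - 1)}
    for x, ti in zip(X, t)
]  # t_i (w^T x_i + w0) - 1 >= 0 for every training point

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
w, w0 = res.x[:2], res.x[2]
print(w, w0)                               # the maximum-margin line's parameters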