Conventional Data Mining
Techniques II
Computers are useless. They can only
give you answers. –Pablo Picasso
PowerPoint permissions
Cengage Learning Australia hereby permits the usage and posting of our copyright controlled PowerPoint slide content for all
courses wherein the associated text has been adopted. PowerPoint slides may be placed on course management systems that operate
under a controlled environment (accessed restricted to enrolled students, instructors and content administrators). Cengage Learning
Australia does not require a copyright clearance form for the usage of PowerPoint slides as outlined above.
A B M Shawkat Ali
Copyright © 2007 Cengage Learning Australia Pty Limited
My Request
“A good listener is not only popular everywhere,
but after a while he gets to know something”
- Wilson Mizner
Association Rule Mining
Objectives
On completion of this lecture you should know:
• Features of association rule mining.
• Apriori: Most popular association rule mining
algorithm.
• Association rules evaluation.
• Association rule mining using WEKA.
• Strengths and weaknesses of association rule
mining.
• Applications of association rule mining.
Association rules
• Affinity Analysis
• Market Basket Analysis: Which products go
together in a basket?
– Uses: determine marketing strategy, plan
promotions, shelf layout.
• Looks like production rules, but more than one
attribute may appear in the consequent.
– IF customers purchase milk THEN they
purchase bread AND sugar
Transaction data
Table 7.1 Transaction data

Transaction ID   Itemset or Basket
01               {'webcam', 'laptop', 'printer'}
02               {'laptop', 'printer', 'scanner'}
03               {'desktop', 'printer', 'scanner'}
04               {'desktop', 'printer', 'webcam'}
Concepts of association rules
Rule for Support:
• The minimum percentage of instances in the
database that contain all items listed in a
given association rule.
Example
• 5,000 transactions contain milk and
bread in a set of 50,000
• Support => 5,000 / 50,000 = 10%
Concepts of association rules
Rule for Confidence:
Given a rule of the form "If A then B", confidence is the conditional probability that B is true when A is known to be true.
Example
• IF customers purchase milk THEN they also purchase bread:
– In a set of 50,000 transactions, 10,000 contain milk, and 5,000 of these also contain bread.
– Confidence => 5,000 / 10,000 = 50%
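As a minimal sketch (not from the textbook), support and confidence can also be computed directly from a list of transactions; the small basket list below is made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Conditional probability of the consequent given the antecedent."""
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

transactions = [
    {'milk', 'bread'}, {'milk'}, {'bread', 'sugar'}, {'milk', 'bread', 'sugar'},
]
print(support(transactions, {'milk', 'bread'}))        # 0.5
print(confidence(transactions, {'milk'}, {'bread'}))   # 2/3, about 0.67
```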
Parameters of ARM
1. To find all items that appear frequently in transactions. The level of frequency of appearance is determined by a pre-specified minimum support count. Any item or set of items that occurs less frequently than this minimum support level is not included in the analysis.
2. To find strong associations among the frequent items. The strength of an association is quantified by the confidence. Any association below a pre-specified level of confidence is not used to generate rules.
Relevance of ARM
• On Thursdays, grocery store consumers often
purchase diapers and beer together.
• Customers who buy a new car are very likely to
purchase vehicle extended warranty.
• When a new hardware store opens, one of the
most commonly sold items is toilet fittings.
Functions of ARM
• Finding the set of items that has significant
impact on the business.
• Collating information from numerous
transactions on these items from many
disparate sources.
• Generating rules on significant items from
counts in transactions.
Single-dimensional association
rules
Table 7.2 Boolean form of the transaction data

Transaction id   'webcam'   'laptop'   'printer'   'scanner'   'desktop'
01               1          1          1           0           0
02               0          1          1           1           0
03               0          0          1           1           1
04               1          0          1           0           1
Multidimensional association
rules
General considerations
• We are interested in association rules that show
a lift in product sales where the lift is the result of
the product’s association with one or more other
products.
• We are also interested in association rules that
show a lower than expected confidence for a
particular association.
Itemset                                                  Support in %
'webcam'                                                 50%
'laptop'                                                 50%
'printer'                                                100%
'scanner'                                                50%
'desktop'                                                50%
{'webcam', 'laptop'}                                     25%
{'webcam', 'printer'}                                    50%
{'webcam', 'scanner'}                                    0%
{'webcam', 'desktop'}                                    25%
{'laptop', 'printer'}                                    50%
{'laptop', 'scanner'}                                    25%
{'laptop', 'desktop'}                                    0%
{'printer', 'scanner'}                                   50%
{'printer', 'desktop'}                                   50%
{'scanner', 'desktop'}                                   25%
{'webcam', 'laptop', 'printer'}                          25%
{'webcam', 'laptop', 'scanner'}                          0%
{'webcam', 'laptop', 'desktop'}                          0%
{'laptop', 'printer', 'scanner'}                         25%
{'laptop', 'printer', 'desktop'}                         0%
{'printer', 'scanner', 'desktop'}                        25%
{'webcam', 'laptop', 'printer', 'scanner', 'desktop'}    0%
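A short sketch that reproduces the support percentages above from the Table 7.1 baskets; the function names are illustrative only.

```python
from itertools import combinations

baskets = [
    {'webcam', 'laptop', 'printer'},
    {'laptop', 'printer', 'scanner'},
    {'desktop', 'printer', 'scanner'},
    {'desktop', 'printer', 'webcam'},
]

def support_pct(itemset):
    """Percentage of baskets containing every item of the itemset."""
    return 100 * sum(1 for b in baskets if set(itemset) <= b) / len(baskets)

items = sorted({i for b in baskets for i in b})
for size in (1, 2, 3):
    for itemset in combinations(items, size):
        print(set(itemset), f"{support_pct(itemset):.0f}%")
# e.g. {'printer'} 100%, {'laptop', 'printer'} 50%, {'webcam', 'scanner'} 0%
```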
Enumeration tree
Figure 7.1 Enumeration tree of the transaction items of Table 7.1 (W = webcam, L = laptop, P = printer, S = scanner, D = desktop). At each level down the tree the number of branches from the left-most node reduces by 1, starting with 5 branches and ending with 1 branch, which is typical.
Association models
$^nC_k$ = the number of combinations of n things taken k at a time.
Two other parameters
• Improvement: $\mathrm{IMP} = \dfrac{\mathrm{Support}(\text{Antecedent \& Consequent})}{\mathrm{Support}(\text{Antecedent}) \times \mathrm{Support}(\text{Consequent})}$
• Share: $\mathrm{SH}(X_i, G) = \dfrac{\mathrm{LMV}(X_i, G)}{\mathrm{TMV}}$, where LMV = local measure value and TMV = total measure value.
IMP and SH measure
Table 7.4 Market transaction data

Transaction ID   'Yogurt' (A)   'Cheese' (B)   'Rice' (C)   'Corn' (D)
T1               2              0              5            10
T2               0              3              0            5
T3               5              2              20           0
T4               3              10             0            12
T5               0              0              10           13
Table 7.5 Share measurement

Itemset   'Yogurt' (A)    'Cheese' (B)    'Rice' (C)      'Corn' (D)      Itemset total
          LMV    SH       LMV    SH       LMV    SH       LMV    SH       LMV    SH
A         10     0.10     -      -        -      -        -      -        10     0.10
B         -      -        15     0.15     -      -        -      -        15     0.15
C         -      -        -      -        35     0.35     -      -        35     0.35
D         -      -        -      -        -      -        40     0.40     40     0.40
AB        8      0.08     12     0.12     -      -        -      -        20     0.20
AC        7      0.07     -      -        25     0.25     -      -        32     0.32
AD        5      0.05     -      -        -      -        22     0.22     27     0.27
BC        -      -        2      0.02     20     0.20     -      -        22     0.22
BD        -      -        13     0.13     -      -        17     0.17     30     0.30
CD        -      -        -      -        15     0.15     23     0.23     38     0.38
ABC       5      0.05     2      0.02     20     0.20     -      -        27     0.27
ABD       3      0.03     10     0.10     -      -        12     0.12     25     0.25
ACD       2      0.02     -      -        5      0.05     10     0.10     17     0.17
BCD       -      -        0      0.00     0      0.00     0      0.00     0      0.00
ABCD      0      0.00     0      0.00     0      0.00     0      0.00     0      0.00
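A sketch of the share calculation using the Table 7.4 quantities (item letters A-D as in the table); the dictionary layout and function name are illustrative only.

```python
# LMV(itemset) = total quantity of the itemset's items, summed over the
# transactions that contain every item of the itemset; TMV = grand total.
quantities = {                      # transaction -> {item: quantity}
    'T1': {'A': 2, 'C': 5, 'D': 10},
    'T2': {'B': 3, 'D': 5},
    'T3': {'A': 5, 'B': 2, 'C': 20},
    'T4': {'A': 3, 'B': 10, 'D': 12},
    'T5': {'C': 10, 'D': 13},
}
TMV = sum(q for t in quantities.values() for q in t.values())   # 100

def share(itemset):
    lmv = sum(sum(t[i] for i in itemset)
              for t in quantities.values() if set(itemset).issubset(t))
    return lmv, lmv / TMV

print(share({'C', 'D'}))   # (38, 0.38), as in Table 7.5
print(share({'A', 'B'}))   # (20, 0.2)
```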
Taxonomies
• Low-support products are lumped into bigger
categories and high-support products are broken
up into subgroups.
• Examples are: Different kinds of potato chips
can be lumped with other munchies into snacks,
and ice cream can be broken down into different
flavours.
Large Datasets
• The number of itemset combinations that can be generated from the transactions in an ordinary supermarket can run into the billions or trillions. The amount of computation required for Association Rule Mining can therefore stretch any computer.
APRIORI algorithm
1. All singleton itemsets are candidates in the first
pass. Any item that has a support value of less
than a specified minimum is eliminated.
2. Selected singleton itemsets are combined to
form two-member candidate itemsets. Again,
only the candidates above the pre-specified
support value are retained.
(cont.)
3. The next pass creates three-member
candidate itemsets and the process is
repeated. The process stops only when all
large itemsets are accounted for.
4. Association Rules for the largest itemsets are
created first and then rules for the subsets are
created recursively.
Database D contains four transactions: 01: {2, 3}; 02: {1, 3, 5}; 03: {1, 2, 4}; 04: {2, 3}.
Scan D to count each candidate 1-itemset: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 1.
Select the large (frequent) 1-itemsets, those with a support count of at least 2: {1}: 2, {2}: 3, {3}: 3.
Create the candidate 2-itemsets {1 2}, {1 3}, {2 3} and scan D again: {1 2}: 1, {1 3}: 1, {2 3}: 2.
Select the large 2-itemset: {2 3} with support count 2.
Figure 7.2 Graphical demonstration of the working of the Apriori algorithm
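A minimal Apriori sketch (an illustrative reconstruction of the candidate-generation passes, not the Weka implementation) run on the small database of Figure 7.2 with a minimum support count of 2.

```python
from itertools import combinations

transactions = [{2, 3}, {1, 3, 5}, {1, 2, 4}, {2, 3}]
MIN_COUNT = 2

def frequent(candidates):
    """Keep only candidates whose support count reaches MIN_COUNT."""
    counts = {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}
    return {c: n for c, n in counts.items() if n >= MIN_COUNT}

# Pass 1: frequent single items
items = sorted({i for t in transactions for i in t})
level = frequent([(i,) for i in items])          # {(1,): 2, (2,): 3, (3,): 3}
k = 2
while level:
    print(f"frequent {k - 1}-itemsets:", level)
    # Pass k: candidates are k-item combinations of the surviving items
    survivors = sorted({i for c in level for i in c})
    level = frequent(list(combinations(survivors, k)))
    k += 1
# Pass 2 keeps only {2, 3} with count 2, matching the figure.
```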
APRIORI in Weka
Figure 7.3 Weka environment with market-basket.arff data file
Step 2
Figure 7.4 Spend98 attribute information visualisation.
Step 3
Figure 7.5 Target attributes selection through Weka
Step 4
Figure 7.6 Discretisation filter selection
Step 5
Figure 7.7 Parameter selections for discretisation.
Step 6
Figure 7.8 Discretisation activation
Discretised data visualisation
Figure 7.9 Discretised data visualisation
Step 7
Figure 7.10 Apriori algorithm selection from Weka for ARM
Step 8
Figure 7.11 Apriori output
Associator output
1. 'Dairy'='(-inf-1088.666667]' 'Deli'='(-inf-1169.666667]' 847 ==> 'Bakery'='(-inf-1316.666667]' 833    conf:(0.98)
Strengths and weaknesses
• Easy Interpretation
• Easy Start
• Flexible Data Formats
• Simplicity
• Exponential Growth in Computations
• Lumping
• Rule Selection
Applications of ARM
• Store Layout Changes
• Cross/Up Selling
• Disaster Weather Forecasting
• Remote Sensing
• Gene Expression Profiling
Recap
• What is association rule mining?
• Apriori: Most popular association rule mining
algorithm.
• Applications of association rule mining.
The Clustering Task
Objectives
On completion of this lecture you should know:
• Unsupervised clustering technique
• Measures for clustering performance
• Clustering algorithms
• Clustering task demonstration using WEKA
• Applications, strengths and weaknesses of the algorithms
Clustering: Unsupervised learning
• Clustering is a very common technique
that appears in many different settings
(not necessarily in a data mining context)
– Grouping “similar products” together to
improve the efficiency of a production line
– Packing “similar items” into a basket
– Grouping “similar customers” together
– Grouping “similar stocks” together
A simple clustering example
Table 8.1 A simple unsupervised problem

Sl. No.   Subject Code   Marks
1         COIT21002      85
2         COIS11021      78
3         COIS32111      75
4         COIT43210      83
Cluster representation
Figure 8.1 Basic clustering for the data of Table 8.1. The X-axis is the serial number and the Y-axis is the marks
How many clusters can you form?
Figure 8.2 Simple playing card data: the Ace, King, Queen and Jack from each of the four suits
Distance measure
• The similarity is usually captured by a distance
measure.
• The originally proposed measure of distance is the Euclidean distance.
$X = (x_1, x_2, \ldots, x_n), \quad Y = (y_1, y_2, \ldots, y_n)$
$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
Figure 8.3 Euclidean distance D between two points A and B
Other distance measures
• City-block (Manhattan) distance: $\sum_i |x_i - y_i|$
• Chebychev distance: $\max_i |x_i - y_i|$
• Power distance: $\left( \sum_i |x_i - y_i|^p \right)^{1/r}$, which becomes the Minkowski distance when p = r.
Distance measure for categorical data
• Percent disagreement: $(\text{number of } x_i \neq y_i) / n$
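A small sketch of the distance measures above; the example vectors are made up for illustration.

```python
def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def manhattan(x, y):                        # city-block distance
    return sum(abs(a - b) for a, b in zip(x, y))

def chebychev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def power(x, y, p, r):                      # Minkowski distance when p == r
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / r)

def percent_disagreement(x, y):             # for categorical attributes
    return sum(a != b for a, b in zip(x, y)) / len(x)

print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)), chebychev((0, 0), (3, 4)))
# 5.0 7 4
```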
Types of clustering
• Hierarchical Clustering
– Agglomerative
– Divisive
Agglomerative clustering
1. Place each instance into a separate partition.
2. Until all instances are part of a single cluster:
a. Determine the two most similar clusters.
b. Merge the clusters chosen into a single
cluster.
3. Choose a clustering formed by one of the
step 2 iterations as a final result.
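A compact sketch of steps 1 and 2 above, assuming 2-D points, Euclidean distance and a single-linkage notion of "most similar" (closest pair of points); the sample points are made up for illustration.

```python
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def agglomerate(points, target_clusters):
    clusters = [[p] for p in points]                 # step 1: one point each
    while len(clusters) > target_clusters:           # step 2
        best = None
        for i in range(len(clusters)):               # a: find the most similar pair
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)               # b: merge the chosen pair
    return clusters

print(agglomerate([(1, 1), (1, 2), (5, 5), (6, 5), (9, 1)], 2))
# [[(1, 1), (1, 2), (5, 5), (6, 5)], [(9, 1)]]
```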
Dendrogram
A dendrogram over instances 1, 2, 3, ..., 28, 29, 30: agglomerative clustering merges clusters from the bottom (individual instances) upwards, while divisive clustering splits downwards from the single all-inclusive cluster.
Example 8.1
Figure 8.5 Hypothetical data points for agglomerative clustering
Example 8.1 cont.
Step 1
C = {{P1}, {P2}, {P3}, {P4}, {P5}, {P6}, {P7}, {P8}}
Step 2
C1 = {{P2}, {P3}, {P4}, {P5}, {P6}, {P1 P7}, {P8}}
Step 3
C2 = {{P3}, {P2 P4}, {P5}, {P6}, {P1 P7}, {P8}}
Example 8.1 cont.
Step 4
C3 = {{P2 P3 P4}, {P5}, {P6}, {P1 P7}, {P8}}
Step 5
C4 = {{P2 P3 P4}, {P5}, {P1 P6 P7}, {P8}}
Step 6
C5 = {{P2 P3 P4}, {P5}, {P1 P6 P7 P8}}
Step 7
C6 = {{P2 P3 P4}, {P1 P5 P6 P7 P8}}
Agglomerative clustering: An
example
Figure 8.6 Hierarchical clustering of the data points of Example 8.1
Dendrogram of the example
(Dendrogram leaves, left to right: P3, P4, P2, P8, P7, P1, P6, P5)
Figure 8.7 The dendrogram of the data points of Example 8.1
Types of clustering cont.
• Non-Hierarchical Clustering
– Partitioning methods
– Density-based methods
– Probability-based methods
Partitioning methods
The K-Means Algorithm:
1. Choose a value for K, the total number of
clusters.
2. Randomly choose K points as cluster centers.
3. Assign the remaining instances to their closest
cluster center.
4. Calculate a new cluster center for each cluster.
5. Repeat steps 3 and 4 until the cluster centers do not change.
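A minimal sketch of the listed steps, assuming 2-D points and Euclidean distance (an illustration only, not Weka's SimpleKMeans; the initial centres here are simply the first K points rather than a random choice).

```python
def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def kmeans(points, k, centres=None):
    centres = list(centres or points[:k])              # steps 1-2: pick K centres
    while True:
        clusters = [[] for _ in range(k)]
        for p in points:                                # step 3: nearest centre
            i = min(range(k), key=lambda c: dist(p, centres[c]))
            clusters[i].append(p)
        new = [tuple(sum(xs) / len(xs) for xs in zip(*c)) if c else centres[i]
               for i, c in enumerate(clusters)]         # step 4: new centres
        if new == centres:                              # step 5: stop when stable
            return clusters, centres
        centres = new

clusters, centres = kmeans([(1, 1), (1, 2), (5, 5), (6, 5)], 2)
print(centres)    # [(1.0, 1.5), (5.5, 5.0)]
```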
General considerations of the K-means algorithm
• Requires real-valued data.
• We must pre-select the number of clusters present in the data.
• Works best when the clusters in the data are of approximately equal size.
• Attribute significance cannot be determined.
• Lacks explanation capabilities.
Example 8.2
Let us consider the dataset of Example 8.1 to
find two clusters using the k-means algorithm.
Step 1. Arbitrarily, let us choose two cluster centers
to be the data points P5 (5, 2) and P7 (1, 2). Their
relative positions can be seen in Figure 8.6.
We could have started with any two other points; for this small dataset the initial selection does not affect the final result, although in general K-means can be sensitive to the choice of initial centres.
Step 2. Let us find the Euclidean distances of all the
data points from these two cluster centers.
Step 2. (Cont.)
Step 3. The new cluster centres are:
Step 4. The distances of all data points from these
new cluster centres are:
Step 4. (cont.)
Step 5. By the closest centre criteria P5 should be
moved from C2 to C1, and the new clusters are C1 =
{P1, P5, P6, P7, P8} and C2 = {P2, P3, P4}.
The new cluster centres are:
Step 6. We may repeat the computations of Step
4 and we will find that no data point will switch
clusters. Therefore, the iteration stops and the
final clusters are C1 = {P1, P5, P6, P7, P8} and C2 =
{P2, P3, P4}.
Density-based methods
Figure 8.8 (a) Three irregular shaped clusters (b) Influence curve of a point
Probability-based methods
• Expectation Maximization (EM) uses a Gaussian mixture model.
• Guess initial values for all the parameters, then repeat the following two steps until a termination criterion is achieved:
• Use the probability density function to compute the cluster probability for each instance.
• Use the probability scores assigned to the instances in the previous step to re-estimate the parameters.
$P(C_k \mid X_i) = \dfrac{P(C_k)\, P(X_i \mid C_k)}{P(X_i)}$
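An illustrative sketch of one "expectation" step for a single numeric attribute value x, using the Bayes rule above; the Gaussian parameters and prior probabilities are made-up values, not from the lecture.

```python
import math

def gaussian(x, mean, std):
    return math.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def cluster_probabilities(x, priors, means, stds):
    joint = [p * gaussian(x, m, s) for p, m, s in zip(priors, means, stds)]
    total = sum(joint)                    # P(x), by the law of total probability
    return [j / total for j in joint]     # P(Ck | x) for each cluster

print(cluster_probabilities(4.0, priors=[0.5, 0.5], means=[2.0, 6.0], stds=[1.0, 1.0]))
# [0.5, 0.5] because x = 4.0 lies midway between the two cluster means
```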
Clustering through Weka
Step 1.
Figure 8.9 Weka environment with credit-g.arff data
Step 2.
Figure 8.10 SimpleKMeans algorithm and its parameter selection
Step 3.
Figure 8.11 K-means clustering performance
Step 3. (cont.)
Figure 8.12 Weka result window
Cluster visualisation
Figure 8.13 Cluster visualisation
Individual cluster information
Figure 8.14 Cluster0 instances information
Step 4.
Figure 8.15 Cluster 1 instance information
Kohonen neural network
Figure 8.16 A Kohonen network with two input nodes and nine output nodes
Kohonen self-organising maps:
Contain only an input layer and an output layer, but no hidden layer.
The number of nodes in the output layer that finally capture all the instances determines the number of clusters in the data.
Example 8.3
The network has two input nodes and two output nodes. The input values are 0.3 and 0.6; the weights from the two inputs to Output 1 are 0.1 and 0.2, and the weights to Output 2 are 0.4 and 0.5.
Figure 8.17 Connections between input and output nodes of a neural network
Example 8.3 Cont.
The scoring for any output node k is done using the formula:
$\sqrt{\sum_i (I_i - W_{ik})^2}$
For Output 1: $\sqrt{(0.3 - 0.1)^2 + (0.6 - 0.2)^2} = 0.447$
For Output 2: $\sqrt{(0.3 - 0.4)^2 + (0.6 - 0.5)^2} = 0.141$
Example 8.3 cont.
$w_{ij}(\text{new}) = w_{ij}(\text{current}) + \Delta w_{ij}$
where $\Delta w_{ij} = r (n_i - w_{ij})$ and r is the learning rate, $0 < r \leq 1$.
Example 8.3 cont.
Assuming that the learning rate is 0.3, we get:
$\Delta W_{12} = 0.3(0.3 - 0.4) = -0.03$
$\Delta W_{22} = 0.3(0.6 - 0.5) = 0.03$
$W_{12}(\text{new}) = 0.4 - 0.03 = 0.37$
$W_{22}(\text{new}) = 0.5 + 0.03 = 0.53$
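A sketch reproducing the Example 8.3 arithmetic, assuming (as reconstructed from Figure 8.17) weights of 0.1 and 0.2 into Output 1 and 0.4 and 0.5 into Output 2; the dictionary keys are illustrative names only.

```python
inputs = [0.3, 0.6]
weights = {'output1': [0.1, 0.2], 'output2': [0.4, 0.5]}
r = 0.3                                        # learning rate

def score(w):
    return sum((n - wi) ** 2 for n, wi in zip(inputs, w)) ** 0.5

scores = {k: score(w) for k, w in weights.items()}
print(scores)                                  # output1: ~0.447, output2: ~0.141

winner = min(scores, key=scores.get)           # output2 captures the instance
weights[winner] = [wi + r * (n - wi) for n, wi in zip(inputs, weights[winner])]
print(weights[winner])                         # approximately [0.37, 0.53]
```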
Cluster validation
• t-test
• χ²-test
• Validity in Test Cases
Strengths and weaknesses
• Unsupervised Learning
• Diverse Data Types
• Easy to Apply
• Similarity Measures
• Model Parameters
• Interpretation
Applications of clustering
algorithms
• Biology
• Marketing research
• Library Science
• City Planning
• Disaster Studies
• Worldwide Web
• Social Network Analysis
• Image Segmentation
Recap
• What is clustering?
• K-means: Most popular clustering algorithm
• Applications of clustering techniques
The Estimation Task
Objectives
On completion of this lecture you should be able to:
• Assess the numeric value of a variable from other related variables.
• Predict the behaviour of one variable from the behaviour of related variables.
• Discuss the reliability of different methods of estimation and perform a comparative study.
What is estimation?
Finding the numeric value of an unknown attribute
from observations made on other related
attributes. The unknown attribute is called the
dependent (or response or output) attribute (or
variable) and the known related attributes are
called the independent (or explanatory or input)
attributes (or variables).
Scatter Plots and Correlation
Table 9.1 Weekly closing stock prices (in dollars) at the Australian Stock Exchange

Week ending   ASX     BHP     RIO
1-1-2006      33.70   23.35   68.80
8-1-2006      34.95   23.73   70.50
15-1-2006     34.14   24.66   74.00
22-1-2006     34.72   26.05   76.10
29-1-2006     34.61   25.53   74.75
5-2-2006      34.28   24.75   74.40
12-2-2006     33.24   23.88   71.65
19-2-2006     33.14   24.55   72.20
26-2-2006     31.08   24.34   70.35
5-3-2006      31.72   23.37   67.50
12-3-2006     33.30   24.70   71.25
19-3-2006     32.60   25.92   75.23
26-3-2006     32.70   28.00   78.85
2-4-2006      33.20   29.50   83.70
9-4-2006      32.70   29.75   82.32
16-4-2006     32.50   30.68   83.06
Figure 9.1a Computer screen-shots of Microsoft Excel spreadsheets to
demonstrate plotting of scatter plot
Figure 9.1b
Figure 9.1c
Figure 9.1d
Figure 9.1e Computer screen-shots of Microsoft Excel spreadsheets to demonstrate plotting of a scatter plot: RIO share price ($) plotted against BHP share price ($)
Correlation coefficient
$r = \dfrac{\text{Covariance between the two variables}}{(\text{Standard deviation of one variable})(\text{Standard deviation of the other variable})}$
$r = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \cdot \sum (Y_i - \bar{Y})^2}}$
Scatter plots of X and Y variables
and their correlation coefficients
Figure 9.2 Scatter plots of X and Y variables and their correlation
coefficients
CORREL xls function
Figure 9.3 Microsoft Excel command for the correlation coefficient
Example 9.2
Date        Rainfall (mm/day)   Streamflow (mm/day)
23-6-1983   0.00                0.10
24-6-1983   1.64                0.07
25-6-1983   20.03               0.24
26-6-1983   9.20                0.33
27-6-1983   75.37               3.03
28-6-1983   50.13               15.20
29-6-1983   9.81                9.66
30-6-1983   1.02                4.01
1-7-1983    0.00                2.05
2-7-1983    0.00                1.32
Example 9.2 cont.
The computations can be done neatly in tabular
form as given in the next slide:
(a) For the mean values:
$\bar{X} = \dfrac{\sum X_i}{n} = \dfrac{167.2}{10} = 16.72, \qquad \bar{Y} = \dfrac{\sum Y_i}{n} = \dfrac{36.01}{10} = 3.601$
Example 9.2 cont.
Therefore, the correlation coefficient is
$r = \dfrac{495.08}{\sqrt{(5983.89)(226.06)}} = 0.43$
Example 9.2 cont.
Therefore, the correlation coefficient is
$r = \dfrac{1039.06}{\sqrt{(5673.24)(212.45)}} = 0.95$
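A sketch that recomputes r directly from the Example 9.2 rainfall and streamflow data; it reproduces the first value above (about 0.43). The function name is illustrative only.

```python
rain = [0.00, 1.64, 20.03, 9.20, 75.37, 50.13, 9.81, 1.02, 0.00, 0.00]
flow = [0.10, 0.07, 0.24, 0.33, 3.03, 15.20, 9.66, 4.01, 2.05, 1.32]

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

print(round(correlation(rain, flow), 2))   # 0.43
```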
Linear regression
analysis
Linear means that every exponent (power) of x must be one; an exponent cannot be a fraction or a value greater than 1, and there cannot be a product term of variables either.
$f(x_1, x_2, x_3, \ldots, x_n) = a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + c$
Fitting a straight line
y = mx + c
Suppose the line passes through two points A and B, where A is (x1, y1) and B is (x2, y2). Then
$\dfrac{y - y_1}{y_1 - y_2} = \dfrac{x - x_1}{x_1 - x_2}$
$y = \left(\dfrac{y_1 - y_2}{x_1 - x_2}\right) x + \dfrac{x_1 y_2 - x_2 y_1}{x_1 - x_2}$    (Eq. 9.3)
Example 9.3
Problem: The number of public servants claiming compensation for stress has been steadily rising in Australia. The number of successful claims in 1989-90 was 800, while in 1994-95 the figure was 1900. How many claims are expected in the year 2006-2007 if the growth continues steadily? If each claim costs an average of $24,000, what should be Comcare's budget allocation in the year 2006-2007 for stress-related compensation?
Example 9.3 cont.
Therefore, using equation (9.3) we get:
$\dfrac{Y - 1900}{1900 - 800} = \dfrac{X - 1995}{1995 - 1990}$
Solving, we have Y = 220X − 437,000. If we now let X = 2007, we get the expected number of claims in the year 2006-2007.
So the number of claims in the year 2006-2007 is expected to be 220(2007) − 437,000 = 4,540.
At $24,000 per claim, Comcare's budget should be $108,960,000.
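A sketch of the Example 9.3 arithmetic using the two-point form of equation (9.3); the variable names are illustrative only.

```python
x1, y1 = 1990, 800      # successful claims in 1989-90
x2, y2 = 1995, 1900     # successful claims in 1994-95

slope = (y1 - y2) / (x1 - x2)                  # 220.0
intercept = (x1 * y2 - x2 * y1) / (x1 - x2)    # -437000.0
claims_2007 = slope * 2007 + intercept         # 4540.0 expected claims
budget = claims_2007 * 24_000                  # 108960000.0 dollars
print(claims_2007, budget)
```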
Simple linear regression
Figure 9.6 Schematic representation of the simple linear regression model
Least squares criteria
$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum [Y_i - (b_0 + b_1 X_i)]^2$
$b_1 = \dfrac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$
$\bar{Y}$ = average of all the Y values = $\dfrac{\sum Y_i}{n}$; $\bar{X}$ = average of all the X values = $\dfrac{\sum X_i}{n}$
$S_{xy}$ = sum of cross-product deviations = $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \dfrac{(\sum X_i)(\sum Y_i)}{n}$
$S_{xx}$ = sum of the squared deviations for X = $\sum (X_i - \bar{X})^2 = \sum X_i^2 - \dfrac{(\sum X_i)^2}{n}$
Example 9.5
Table 9.2 Unisuper membership by States

State    No. of Inst., X   Membership, Y   X²     Y²               XY
NSW      17                5 987           289    3.58442×10^7     101 779
QLD      11                5 950           121    3.54025×10^7     65 450
SA       10                3 588           100    1.28737×10^7     35 880
TAS      3                 1 356           9      1.83873×10^6     4 068
VIC      41                14 127          1681   1.99572×10^8     579 207
WA       9                 4 847           81     2.34934×10^7     43 623
Others   11                3 893           121    1.51554×10^7     42 823
Total    102               39 748          2402   3.241799×10^8    872 830
Example 9.5 cont.
$\bar{Y} = \dfrac{\sum Y_i}{n} = \dfrac{39\,748}{7} = 5\,678, \qquad \bar{X} = \dfrac{\sum X_i}{n} = \dfrac{102}{7} = 14.57$
$S_{xy} = 872\,830 - \dfrac{(39\,748)(102)}{7} = 293\,645$
$S_{xx} = 2\,402 - \dfrac{(102)^2}{7} = 915.7$
$b_1 = S_{xy}/S_{xx} = 293\,645 / 915.7 = 320.7$
$b_0 = \bar{Y} - b_1 \bar{X} = 5\,678 - (320.7)(14.57) = 1\,005$
Therefore, the regression equation is Y = 320.7X + 1005.
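A sketch that recomputes the Example 9.5 fit from the X and Y columns of Table 9.2; exact arithmetic gives an intercept of about 1006, while the slide's 1005 comes from rounded intermediate values.

```python
X = [17, 11, 10, 3, 41, 9, 11]                     # number of institutions
Y = [5987, 5950, 3588, 1356, 14127, 4847, 3893]    # membership
n = len(X)

sxy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n   # about 293 645
sxx = sum(x * x for x in X) - sum(X) ** 2 / n                  # about 915.7
b1 = sxy / sxx
b0 = sum(Y) / n - b1 * sum(X) / n
print(round(b1, 1), round(b0))   # 320.7 1006
```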
Multiple linear regression
with Excel
Type "regression" under Help and then go to the LINEST function. Highlight the 'District office building data' example, copy it with Ctrl+C and paste it with Ctrl+V into your spreadsheet.
Multiple regression
$Y = \beta_0 + \beta_1 X_1^{a} + \beta_2 X_2^{b} + \cdots + \gamma_1 X_1^{c} X_2^{d} + \gamma_2 X_2^{e} X_3^{f} + \cdots$
where Y is the dependent variable; X1, X2, ... are independent variables; β0, β1, ... (and γ1, γ2, ...) are regression coefficients; and a, b, ... are exponents.
Example 9.6
Period    No. of Private Houses   Average weekly earnings ($)   No. of persons in workforce (millions)   Variable home loan rate (%)
1986-87   83 973                  428                           5.6889                                   15.50
1987-88   100 069                 454                           5.8227                                   13.50
1988-89   128 231                 487                           6.0333                                   17.00
1989-90   96 390                  521                           6.1922                                   16.50
1990-91   87 038                  555                           6.0933                                   13.00
1991-92   100 572                 581                           5.8846                                   10.50
1992-93   113 708                 591                           5.8372                                   9.50
1993-94   123 228                 609                           5.9293                                   8.75
1994-95   111 966                 634                           6.1190                                   10.50

= LINEST(A2:A10, B2:D10, TRUE, TRUE)
Example 9.6 cont.
The Ctrl and Shift keys must be kept depressed while striking the Enter key to get tabular output.
Figure 9.7 Demonstration of use of LINEST function
Hence, from the printout, the regression equation is the following:
H = 155914.8 + 232.2498 × E − 36463.4 × W + 3204.0441 × I
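A sketch that fits the same model by ordinary least squares (assuming numpy is available); the coefficients should agree, up to rounding, with the LINEST output quoted above. The variable names are illustrative, following the equation's H, E, W and I.

```python
import numpy as np

H = [83973, 100069, 128231, 96390, 87038, 100572, 113708, 123228, 111966]
E = [428, 454, 487, 521, 555, 581, 591, 609, 634]       # average weekly earnings
W = [5.6889, 5.8227, 6.0333, 6.1922, 6.0933, 5.8846, 5.8372, 5.9293, 6.1190]
I = [15.50, 13.50, 17.00, 16.50, 13.00, 10.50, 9.50, 8.75, 10.50]

A = np.column_stack([np.ones(len(H)), E, W, I])          # intercept, E, W, I
coef, *_ = np.linalg.lstsq(A, np.array(H, dtype=float), rcond=None)
print(coef)   # intercept, then the coefficients of E, W and I
```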
Coefficient of determination
If the fit is perfect, the R2 value will be one and
if there is no relationship at all, the R2 value
will be zero.
$R^2 = \dfrac{\text{Variation explained}}{\text{Total variation}} = 1 - \dfrac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$
Logistic regression
A regression equation cannot model discrete values. We get a better reflection of reality if we replace the actual value with its probability. The ratio of the probabilities of occurrence and non-occurrence brings us close to the actual value.
Transforming the linear
regression model
Logistic regression is a nonlinear regression
technique that associates a conditional
probability with each data instance.
The logistic regression model
$p(y = 1 \mid x) = \dfrac{e^{ax + c}}{1 + e^{ax + c}}$
where e is the base of natural logarithms (often denoted exp), and ax + c is the right-hand side of the linear regression equation, with a and x in vector form.
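A sketch of the model above with illustrative (made-up) values a = 1 and c = 0, showing how the probability varies with x.

```python
import math

def logistic(x, a=1.0, c=0.0):
    z = a * x + c
    return math.exp(z) / (1 + math.exp(z))

for x in (-10, -2, 0, 2, 10):
    print(x, round(logistic(x), 3))
# rises from near 0 to near 1, tracing the S-shaped curve of Figure 9.8
```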
Logistic regression cont.
The logistic regression equation plotted as P(y = 1 | X) against X values from −10 to 10 gives an S-shaped curve that rises from near 0 to near 1.
Figure 9.8 Graphical representation of the logistic regression equation
Regression in Weka
Figure 9.10 Selection of logistic function
Output from logistic regression
Figure 9.12 Output from logistic regression
Visualisation option of the results
Figure 9.13 Visualisation option of the results
Visual impression of data and
clusters
Figure 9.14 Visual impression of data and clusters
Particular instance information
Figure 9.15 Information about a particular instance
Strengths and weaknesses
• Regression analysis is a powerful tool for linear relationships, but most real-world problems are nonlinear. The output is therefore often approximate rather than exact, yet still useful.
• Regression techniques assume that the errors are normally distributed and that the instances are independent of each other. This is not the case with many real problems.
Applications of regression
algorithms
• Financial Markets
• Medical Science
• Retail Industry
• Environment
• Social Science
Recap
• What is estimation?
• How to solve the estimation problem?
• Applications of regression analysis.