Random Variable
• Quantitative (numeric): discrete or continuous; interval or ratio scale
• Qualitative (categorical): nominal or ordinal scale
SUMMARIZING NUMERIC DATA
• Simple Frequency Table
• Grouped Frequency Table
• Histogram
• Frequency Polygon
• Cumulative Frequency Distribution
Measures of Central Location
• Arithmetic Mean
• Median
• Mode
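Mean for grouped data:
population mean: $\mu = \dfrac{\sum fm}{N}$
sample mean: $\bar{x} = \dfrac{\sum fm}{n}$
(f = class frequency, m = class midpoint)
Median for grouped data:
$\text{median} = L + c\,\dfrac{\frac{n}{2} - CF}{f_{med}}$
(L = lower limit of the median class, c = class width, CF = cumulative frequency of the preceding class, f_med = frequency of the median class)
Mode for grouped data:
$\text{mode} = L + c\,\dfrac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}}$
(L = lower limit of the modal class, f_m = frequency of the modal class, f_{m-1} and f_{m+1} = frequencies of the classes before and after it)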
Measures of Dispersion (Variability)
• Range
• Variance and Standard Deviation
• Coefficient of Variation
• Non-central Locations: Inter-fractile Ranges
Standard Deviation
$s = \sqrt{\dfrac{n\sum x^2 - (\sum x)^2}{n(n-1)}}$ (ungrouped data)
$s = \sqrt{\dfrac{n\sum fm^2 - (\sum fm)^2}{n(n-1)}}$ (grouped data)
Coefficient of variation:
$CV = \dfrac{s}{\bar{x}}(100)\%$
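Empirical Rule:
• about 68% of observations lie within ±1s of the mean
• about 95% of observations lie within ±2s of the mean
• about 99.7% of observations lie within ±3s of the mean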
The Relative Positions of the Mean, Median, and Mode:
• Symmetric distribution (zero skewness): Mean = Median = Mode
• Positively skewed: Mean > Median > Mode
• Negatively skewed: Mean < Median < Mode
Non-Central Location Measures (Fractiles or Quantiles)
• Quartiles
• Sextiles
• Octiles
• Deciles
• Percentiles
Calculating Quartiles for Grouped Data
The jth quartile for grouped data is given by:
$Q_j = L + c\,\dfrac{\frac{jn}{4} - CF}{f_{Q_j}}$
n = sample size
L = lower limit of the jth quartile class
c = class width
CF = less-than cumulative frequency of the immediately preceding class
f_{Qj} = frequency of the jth quartile class
Example
A sample of 20 randomly-selected hospitals in the US revealed the following
daily charges (in $) for a semiprivate room.
153 159 142 146 141 140 130 148 142 163
134 151 122 167 137 152 143 168 159 141
1.1 Using class intervals of width 10 units, construct a less-than cumulative
frequency distribution of the above data. Let 120 units be the lower limit
of the smallest class.
1.2 Draw a less-than ogive and use it to estimate the 80th percentile.
1.3 For the grouped data of question 1.1 above, calculate:
1.3.1 The mean, median and mode
1.3.2 The interquartile range.
1.3.3 The coefficient of variation. Interpret the result obtained.
Solution
1.1
Class        Freq, f   <cum freq, F
120 < 130    1         1
130 < 140    3         4
140 < 150    8         12
150 < 160    5         17
160 < 170    3         20
             ∑ = 20
1.2
[Figure: less-than ogive; x-axis: upper class limit (100 to 180), y-axis: cumulative frequency (0 to 25)]
Reading across from a cumulative frequency of 0.8 × 20 = 16 gives: 80th percentile = 158
1.3.1
Class        Freq, f   <cum freq, F   midpt, m   fm
120 < 130    1         1              125        125
130 < 140    3         4              135        405
140 < 150    8         12             145        1160
150 < 160    5         17             155        775
160 < 170    3         20             165        495
             ∑ = 20                              ∑ = 2960
$\bar{x} = \dfrac{\sum fm}{n} = \dfrac{2960}{20} = 148$
$x_{med} = L_{med} + c\,\dfrac{\frac{n}{2} - CF}{f_{med}} = 140 + 10 \times \dfrac{10 - 4}{8} = 147.5$
$x_{mode} = L_{mode} + c\,\dfrac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}} = 140 + 10 \times \dfrac{8 - 3}{16 - 3 - 5} = 146.3$
1.3.2
$Q_3 = 150 + \dfrac{15 - 12}{5} \times 10 = 156$
$Q_1 = 140 + \dfrac{5 - 4}{8} \times 10 = 141.3$
IQR = Q3 - Q1 = 156 - 141.3 = 14.7
1.3.3
Class        Midpt, m   fm     fm²
120 < 130    125        125    15625
130 < 140    135        405    54675
140 < 150    145        1160   168200
150 < 160    155        775    120125
160 < 170    165        495    81675
                        ∑ = 2960   ∑ = 440300
CV = standard deviation/mean
$s = \sqrt{\dfrac{\sum fm^2 - (\sum fm)^2/n}{n - 1}} = \sqrt{\dfrac{440300 - 2960^2/20}{19}} = 10.8$
CV = 10.8/148 ≈ 0.073 ≡ 7.3% → a low CV, so the data are closely clustered around the mean.
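The grouped-data formulas used in this example can be checked with a short script. This is only a sketch: the helper name `fractile` and the variable names are illustrative and not part of the slides, and the class table is the one from solution 1.1.

```python
from math import sqrt

# Class table from solution 1.1: (lower class limit, frequency); width c = 10.
classes = [(120, 1), (130, 3), (140, 8), (150, 5), (160, 3)]
c = 10
n = sum(f for _, f in classes)


def fractile(position):
    """Interpolate the value below which `position` observations fall
    (hypothetical helper implementing L + c * (position - CF) / f)."""
    cum = 0
    for lower, f in classes:
        if cum + f >= position:
            return lower + c * (position - cum) / f
        cum += f
    raise ValueError("position exceeds n")


freqs = [f for _, f in classes]
mids = [lower + c / 2 for lower, _ in classes]
sum_fm = sum(f * m for f, m in zip(freqs, mids))
sum_fm2 = sum(f * m * m for f, m in zip(freqs, mids))

mean = sum_fm / n
median = fractile(n / 2)
q1, q3 = fractile(n / 4), fractile(3 * n / 4)

# Mode: interpolate inside the class with the highest frequency.
i = freqs.index(max(freqs))
f_prev = freqs[i - 1] if i > 0 else 0
f_next = freqs[i + 1] if i + 1 < len(freqs) else 0
mode = classes[i][0] + c * (freqs[i] - f_prev) / (2 * freqs[i] - f_prev - f_next)

s = sqrt((n * sum_fm2 - sum_fm ** 2) / (n * (n - 1)))
cv = 100 * s / mean

print(mean, median, round(mode, 1), q1, q3, round(s, 1), round(cv, 1))
```

Unrounded, the script gives Q1 = 141.25 and mode = 146.25, which the slides report to one decimal place.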
BASIC PROBABILITY CONCEPTS
• Random Experiment
• Sample Space
• Event
• Collectively Exhaustive Events
• Dependent Events
• Independent Events
• Marginal Probability
• Joint Probability: P(A∩B) = P(B∩A)
• Conditional Probability: P(A|B) = P(A∩B)/P(B); P(B|A) = P(A∩B)/P(A)
Complement Rule:
P(A') = 1 - P(A) or P(A) = 1 - P(A')
Special Multiplication Rule (independent events):
P(A and B) = P(A)P(B) = P(B)P(A)
General Multiplication Rule:
P(A and B) = P(A∩B) = P(A)P(B|A)
or
P(A and B) = P(A∩B) = P(B)P(A|B)
Special Addition Rule (mutually exclusive events):
P(A or B) = P(A) + P(B)
General Addition Rule:
P(A or B) = P(A) + P(B) - P(A and B)
Example
A company manufactures a total of 8000 motorcycles a month
in three plants A, B and C. Of these, plant A manufactures
4000, and plant B manufactures 3000. At plant A, 85 out of
100 motorcycles are of standard quality or better. At plant B,
65 out of 100 motorcycles are of standard quality or better
and at plant C, 60 out of 100 motorcycles are of standard
quality or better. The quality controller randomly selects a
motorcycle and finds it to be of substandard quality. Calculate
the probability that it has come from plant B.
Solution
P(B | substandard) = (no. of substandard items from B) / (total no. of substandard items)
Plant C manufactures 8000 - 4000 - 3000 = 1000 motorcycles.
No. of substandard items from A = 4000 × (100 - 85)/100 = 40 × 15 = 600
No. of substandard items from B = 3000 × (100 - 65)/100 = 30 × 35 = 1050
No. of substandard items from C = 1000 × (100 - 60)/100 = 10 × 40 = 400
Total number of substandard items = 600 + 1050 + 400 = 2050
P(B | substandard) = 1050/2050 = 0.512
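The same counting argument can be written as a small script. It is a sketch only; the dictionary names are illustrative and the figures are taken from the example.

```python
# Monthly output per plant and share of standard-or-better units (from the example).
output = {"A": 4000, "B": 3000, "C": 8000 - 4000 - 3000}
good_rate = {"A": 0.85, "B": 0.65, "C": 0.60}

# Expected number of substandard motorcycles produced by each plant.
substandard = {plant: output[plant] * (1 - good_rate[plant]) for plant in output}
total = sum(substandard.values())

# Conditional probability of each plant given a substandard motorcycle.
posterior = {plant: substandard[plant] / total for plant in output}
print(round(posterior["B"], 3))
```

Dividing each plant's substandard count by the total is the conditional probability formula P(B | substandard) = P(B ∩ substandard)/P(substandard) applied to counts.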
PROBABILITY DISTRIBUTIONS
• Properties
• Discrete distributions
• Normal distributions
Binomial Probability Distribution
$P(x) = \dfrac{n!}{x!(n-x)!}\,\pi^x (1-\pi)^{n-x}$
Example
According to a leading newspaper, the largest cellular
phone service in the US has about 36 million subscribers
out of a total of 180 million cell phone users. If six cell
phone users are randomly selected, what is the probability
that at least two of them subscribe to this service?
n = 6, π = 36/180 = 0.2
$P(x) = \dfrac{n!}{x!(n-x)!}\,\pi^x (1-\pi)^{n-x}$
P(x ≥ 2) = 1 - P(0) - P(1)
$P(0) = \dfrac{6!}{0!(6-0)!}(0.2)^0 (1-0.2)^6 = 0.262$
$P(1) = \dfrac{6!}{1!(6-1)!}(0.2)^1 (1-0.2)^5 = 0.393$
P(x ≥ 2) = 1 - 0.262 - 0.393 = 0.345
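A quick numerical check of this calculation, using only the standard library. The function name `binom_pmf` is illustrative.

```python
from math import comb


def binom_pmf(x, n, pi):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)


n, pi = 6, 36 / 180
# "At least two" is the complement of "zero or one".
p_at_least_2 = 1 - binom_pmf(0, n, pi) - binom_pmf(1, n, pi)
print(round(p_at_least_2, 3))
```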
Poisson Probability Distribution
$P(x) = \dfrac{\lambda^x}{x!}\,e^{-\lambda}$
Example
Customers arrive randomly and independently at a service point
at an average rate of 30 per hour.
1. Calculate the probability that exactly 20 customers arrive at
the service point during any given hour.
2. Calculate the probability that
during any 5 minute period at least 3 customers arrive at the
service point.
Solution
1. λ = 30/hr
$P(20) = \dfrac{30^{20}}{20!}\,e^{-30} = 0.0134$
2. λ = 30 per 60 min = 2.5 per 5 min
P(x ≥ 3) = 1 - P(0) - P(1) - P(2)
$P(0) = \dfrac{2.5^0}{0!}\,e^{-2.5} = 0.0821$
$P(1) = \dfrac{2.5^1}{1!}\,e^{-2.5} = 0.2052$
$P(2) = \dfrac{2.5^2}{2!}\,e^{-2.5} = 0.2565$
→ P(x ≥ 3) = 1 - 0.0821 - 0.2052 - 0.2565 = 0.456
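Both parts of this example can be evaluated directly from the Poisson formula. The sketch below uses only the standard library; `poisson_pmf` is an illustrative name. Summing the three terms for part 2 gives a value of about 0.456.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Poisson probability of exactly x arrivals when the mean is lam."""
    return lam ** x * exp(-lam) / factorial(x)


# Part 1: exactly 20 arrivals in an hour with a mean of 30 per hour.
p_20 = poisson_pmf(20, 30)

# Part 2: rescale the rate to a 5-minute window, then use the complement.
lam_5min = 30 * 5 / 60
p_at_least_3 = 1 - sum(poisson_pmf(x, lam_5min) for x in range(3))

print(round(p_20, 4), round(p_at_least_3, 3))
```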
Normal probability distribution
Standard normal or z-distribution: $z = \dfrac{x - \mu}{\sigma}$, with $\mu = 0$, $\sigma^2 = 1$
[Figure: normal distribution curve, f(x) against x]
• Normal curve is symmetrical
• Theoretically, curve extends to infinity
• Mean, median, and mode are equal
Standard normal table (area between 0 and z)
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
Example
Six hundred candidates wrote an entrance test for admission to
a management course. The marks obtained by the candidates
were found to be normally distributed with a mean of 132
marks and a standard deviation of 18 marks.
1. How many candidates scored between 140 and 160 marks?
2. If the top 60 performers were given confirmed admission,
calculate the minimum mark (to the nearest integer) above
which a candidate would be guaranteed admission?
Solution
1. $z = \dfrac{x - \mu}{\sigma}$
z1 = (140 - 132)/18 = 0.4444 → P1 ≈ 0.172
z2 = (160 - 132)/18 = 1.5556 → P2 ≈ 0.440
→ P(140 < X < 160) ≈ 0.440 - 0.172 = 0.268
→ 0.268 × 600 candidates ≈ 161 candidates
2. Let xc denote the minimum mark.
The top 60 of 600 candidates is 60/600 = 0.1 = 10%.
P(0 < z < zc) = 0.50 - 0.10 = 0.40 → zc = 1.28
$z_c = \dfrac{x_c - \mu}{\sigma} \Rightarrow \dfrac{x_c - 132}{18} = 1.28 \Rightarrow x_c \approx 155$
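As a cross-check on the table look-ups, the same two answers can be obtained with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are illustrative.

```python
from statistics import NormalDist

marks = NormalDist(mu=132, sigma=18)

# Part 1: share of candidates between 140 and 160, scaled to 600 candidates.
p_between = marks.cdf(160) - marks.cdf(140)
expected_count = 600 * p_between

# Part 2: the mark exceeded by only the top 60 of 600 (upper 10%).
cutoff = marks.inv_cdf(1 - 60 / 600)

print(round(p_between, 3), round(expected_count), round(cutoff))
```

Because `cdf` works with exact z values instead of table entries rounded to two decimals, the intermediate probabilities can differ slightly from the table-based ones in the third decimal place.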
HYPOTHESIS TESTING
• What is a Hypothesis?
• What is Hypothesis Testing?
Basic Terms
• Null hypothesis
• Alternative hypothesis
• Level of significance
• Type I error
• Type II error
• Critical value
• Test statistic
• Rejection area
• Acceptance area
• One-tailed test
• Two-tailed test
Five-Step Procedure for Hypothesis Testing
Step 1: State the null and alternative hypotheses
Step 2: Determine the critical value associated with the
level of significance
Step 3: Identify and calculate the test statistic
Step 4: Formulate and apply the decision rule
Step 5: Draw a conclusion
Testing a Single Population Mean
Large sample (n > 30)
Test statistic: $z_{test} = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
Small sample (n ≤ 30)
Test statistic: $t_{test} = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
t-distribution table (columns: upper-tail area p; rows: degrees of freedom)
df\p  0.4       0.25      0.1       0.05      0.025     0.01      0.005     0.0005
1     0.32492   1         3.077684  6.313752  12.7062   31.82052  63.65674  636.6192
2     0.288675  0.816497  1.885618  2.919986  4.30265   6.96456   9.92484   31.5991
3     0.276671  0.764892  1.637744  2.353363  3.18245   4.5407    5.84091   12.924
4     0.270722  0.740697  1.533206  2.131847  2.77645   3.74695   4.60409   8.6103
5     0.267181  0.726687  1.475884  2.015048  2.57058   3.36493   4.03214   6.8688
6     0.264835  0.717558  1.439756  1.94318   2.44691   3.14267   3.70743   5.9588
7     0.263167  0.711142  1.414924  1.894579  2.36462   2.99795   3.49948   5.4079
8     0.261921  0.706387  1.396815  1.859548  2.306     2.89646   3.35539   5.0413
9     0.260955  0.702722  1.383029  1.833113  2.26216   2.82144   3.24984   4.7809
10    0.260185  0.699812  1.372184  1.812461  2.22814   2.76377   3.16927   4.5869
11    0.259556  0.697445  1.36343   1.795885  2.20099   2.71808   3.10581   4.437
12    0.259033  0.695483  1.356217  1.782288  2.17881   2.681     3.05454   4.3178
13    0.258591  0.693829  1.350171  1.770933  2.16037   2.65031   3.01228   4.2208
14    0.258213  0.692417  1.34503   1.76131   2.14479   2.62449   2.97684   4.1405
15    0.257885  0.691197  1.340606  1.75305   2.13145   2.60248   2.94671   4.0728
16    0.257599  0.690132  1.336757  1.745884  2.11991   2.58349   2.92078   4.015
17    0.257347  0.689195  1.333379  1.739607  2.10982   2.56693   2.89823   3.9651
18    0.257123  0.688364  1.330391  1.734064  2.10092   2.55238   2.87844   3.9216
19    0.256923  0.687621  1.327728  1.729133  2.09302   2.53948   2.86093   3.8834
20    0.256743  0.686954  1.325341  1.724718  2.08596   2.52798   2.84534   3.8495
21    0.25658   0.686352  1.323188  1.720743  2.07961   2.51765   2.83136   3.8193
22    0.256432  0.685805  1.321237  1.717144  2.07387   2.50832   2.81876   3.7921
23    0.256297  0.685306  1.31946   1.713872  2.06866   2.49987   2.80734   3.7676
24    0.256173  0.68485   1.317836  1.710882  2.0639    2.49216   2.79694   3.7454
25    0.25606   0.68443   1.316345  1.708141  2.05954   2.48511   2.78744   3.7251
26    0.255955  0.684043  1.314972  1.705618  2.05553   2.47863   2.77871   3.7066
27    0.255858  0.683685  1.313703  1.703288  2.05183   2.47266   2.77068   3.6896
28    0.255768  0.683353  1.312527  1.701131  2.04841   2.46714   2.76326   3.6739
Testing a Single Population Proportion:
Large sample (n > 30)
Test statistic: $z_{test} = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}}$
Small sample (n ≤ 30)
Test statistic: $t_{test} = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}}$
Tests Involving Two Sample Means
Small sample sizes
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
Degrees of freedom = $n_1 + n_2 - 2$
Example
Students are trained using two different formats
for an accounting program. A random sample of
10 students is trained using format 1, and the
number of errors in a prototype examination is
as follows: 11 8 8 3 7 5 9 5 1 3.
Another random sample of 12 students was trained
using format 2, and the errors in the same
examination were:
10 11 9 7 2 11 12 3 6 7 8 12.
Investigate at the 10% level of significance whether
there is a difference in the means of the
samples.
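Solution
$H_0: \mu_1 = \mu_2$ (The two review formats are effectively equal)
$H_1: \mu_1 \neq \mu_2$ (The two review formats are effectively not equal)
$\bar{x}_1 = 6.000$ and $\bar{x}_2 = 8.167$
$s_1 = 3.127$ and $s_2 = 3.326$
Pooled variance = $\dfrac{(10 - 1)(3.127)^2 + (12 - 1)(3.326)^2}{10 + 12 - 2} = 10.484$
$t = \dfrac{6.000 - 8.167}{\sqrt{10.484\left(\dfrac{1}{10} + \dfrac{1}{12}\right)}} = -1.563$
$df = 10 + 12 - 2 = 20$
$t_{crit} = \pm 1.725$
Since $t$ falls in the acceptance region, we do not reject $H_0$
and conclude that there is no significant difference in the mean
errors.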
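The pooled-variance t statistic can be recomputed from the raw error counts. This is a sketch using the standard library; the critical value is typed in from the t-table (df = 20, upper-tail area 0.05) and is not computed.

```python
from math import sqrt
from statistics import mean, variance

format_1 = [11, 8, 8, 3, 7, 5, 9, 5, 1, 3]
format_2 = [10, 11, 9, 7, 2, 11, 12, 3, 6, 7, 8, 12]
n1, n2 = len(format_1), len(format_2)

# Pooled sample variance, weighting each sample variance by its degrees of freedom.
pooled_var = ((n1 - 1) * variance(format_1) + (n2 - 1) * variance(format_2)) / (n1 + n2 - 2)

# Test statistic under H0: mu1 - mu2 = 0.
t_stat = (mean(format_1) - mean(format_2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

t_crit = 1.725  # two-tailed test at the 10% level, df = 20, read from the t-table
reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 3), reject_h0)
```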
Tests Involving Two Sample Means
Large sample sizes (n1, n2 > 30)
$z_{test} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
Example
A union representing workers at a large industrial concern accused
management that discriminatory wages were paid to the workers
in two production facilities, A and B. It claimed that workers in
facility A were being paid differently than those in facility B. The
company investigates the claim by examining the pay of 70
workers from each production facility. The results were as
follows.
                Facility A   Facility B
Mean salary     $455.00      $463.00
Std deviation   $10.00       $13.00
What conclusion did the company reach? Investigate at the
5% level of significance.
Solution
$H_0: \mu_A = \mu_B$
$H_1: \mu_A \neq \mu_B$ → two-tailed test
nA, nB > 30 → z test. α = 5% → zcrit = ±1.96
$z_{test} = \dfrac{\bar{x}_A - \bar{x}_B}{\sqrt{s_A^2/n_A + s_B^2/n_B}} = \dfrac{455 - 463}{\sqrt{100/70 + 169/70}} = -4.081$
Since |ztest| > |zcrit|, reject H0
→ Sufficient statistical evidence to suggest a significant
difference in the salaries.
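A minimal check of this z test, with the critical value taken from `statistics.NormalDist` instead of the table. The function name `two_sample_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """z statistic for the difference of two means under H0: mu1 = mu2."""
    return (mean1 - mean2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


z_stat = two_sample_z(455, 10, 70, 463, 13, 70)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed, alpha = 5%
reject_h0 = abs(z_stat) > z_crit
print(round(z_stat, 3), round(z_crit, 2), reject_h0)
```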
Tests Involving Two Sample Proportions
$z_{test} = \dfrac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
$q = 1 - p$
Example
Surveys were conducted in two major cities “A” and “B” to
ascertain viewer habits regarding a popular television channel.
In city “A”, 1000 people were interviewed and 680 said they
viewed the channel. In city “B”, 600 people were interviewed
and 444 said they viewed the channel. Investigate, at the 5%
level of significance, whether there is a significant difference
between the viewing habits in the two cities.
$H_0: \pi_A = \pi_B$
$H_1: \pi_A \neq \pi_B$ → two-tailed test; α = 5% → zcrit = ±1.96
$p = \dfrac{p_A n_A + p_B n_B}{n_A + n_B} = \dfrac{680 + 444}{1000 + 600} = 0.7025$, q = 1 - p = 0.2975
$z_{test} = \dfrac{p_A - p_B}{\sqrt{pq(1/n_A + 1/n_B)}} = \dfrac{680/1000 - 444/600}{\sqrt{0.7025 \times 0.2975\,(1/1000 + 1/600)}} = -2.54$
Since |ztest| > |zcrit|, reject H0 at the 5% level of significance.
→ Sufficient statistical evidence to suggest a significant difference in the viewing
habits.
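The pooled-proportion test can be sketched as follows. The function name `two_proportion_z` is illustrative; the inputs are the survey counts from the example.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: pi1 = pi2, using the pooled proportion p."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    q = 1 - p
    return (p1 - p2) / sqrt(p * q * (1 / n1 + 1 / n2))


z_stat = two_proportion_z(680, 1000, 444, 600)
reject_h0 = abs(z_stat) > 1.96  # two-tailed critical value at the 5% level
print(round(z_stat, 2), reject_h0)
```

The statistic is negative because city A's sample proportion (0.68) is below city B's (0.74); only its magnitude matters for the two-tailed decision.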
Chi-square Applications
Major Characteristics:
• positively skewed
• non-negative
• family of chi-square distributions
H0: There is no difference between the
observed and expected frequencies.
H1: There is a difference between the
observed and the expected frequencies.
Test statistic:
$\chi^2_{stat} = \sum \dfrac{(f_o - f_e)^2}{f_e}$
The critical value is a chi-square value with (k-1)
degrees of freedom, where k is the number of
categories.
Chi-square table (columns: upper-tail area; rows: degrees of freedom)
df\area  0.995    0.99     0.975    0.95     0.90     0.75     0.5      0.25     0.10     0.05     0.025    0.01     0.005
1        0.00004  0.00016  0.00098  0.00393  0.01579  0.10153  0.45494  1.3233   2.70554  3.84146  5.02389  6.6349   7.87944
2        0.01003  0.0201   0.05064  0.10259  0.21072  0.57536  1.38629  2.77259  4.60517  5.99146  7.37776  9.21034  10.5966
3        0.07172  0.11483  0.2158   0.35185  0.58437  1.21253  2.36597  4.10834  6.25139  7.81473  9.3484   11.3449  12.8382
4        0.20699  0.29711  0.48442  0.71072  1.06362  1.92256  3.35669  5.38527  7.77944  9.48773  11.1433  13.2767  14.8603
5        0.41174  0.5543   0.83121  1.14548  1.61031  2.6746   4.35146  6.62568  9.23636  11.0705  12.8325  15.0863  16.7496
6        0.67573  0.87209  1.23734  1.63538  2.20413  3.4546   5.34812  7.8408   10.6446  12.5916  14.4494  16.8119  18.5476
7        0.98926  1.23904  1.68987  2.16735  2.83311  4.25485  6.34581  9.03715  12.017   14.0671  16.0128  18.4753  20.2777
8        1.34441  1.6465   2.17973  2.73264  3.48954  5.07064  7.34412  10.2189  13.3616  15.5073  17.5346  20.0902  21.955
9        1.73493  2.0879   2.70039  3.32511  4.16816  5.89883  8.34283  11.3888  14.6837  16.919   19.0228  21.666   23.5894
10       2.15586  2.55821  3.24697  3.9403   4.86518  6.7372   9.34182  12.5489  15.9872  18.307   20.4832  23.2093  25.1882
11       2.60322  3.05348  3.81575  4.57481  5.57778  7.58414  10.341   13.7007  17.275   19.6751  21.9201  24.725   26.7569
12       3.07382  3.57057  4.40379  5.22603  6.3038   8.43842  11.3403  14.8454  18.5494  21.0261  23.3367  26.217   28.2995
13       3.56503  4.10692  5.00875  5.89186  7.0415   9.29907  12.3398  15.9839  19.8119  22.362   24.7356  27.6883  29.8195
14       4.07467  4.66043  5.62873  6.57063  7.78953  10.1653  13.3393  17.1169  21.0641  23.6848  26.119   29.1412  31.3194
15       4.60092  5.22935  6.26214  7.26094  8.54676  11.0365  14.3389  18.2451  22.3071  24.9958  27.4884  30.5779  32.8013
16       5.14221  5.81221  6.90766  7.96165  9.31224  11.9122  15.3385  19.3689  23.5418  26.2962  28.8454  31.9999  34.2672
17       5.69722  6.40776  7.56419  8.67176  10.0852  12.7919  16.3382  20.4887  24.769   27.5871  30.191   33.4087  35.7185
18       6.2648   7.01491  8.23075  9.39046  10.8649  13.6753  17.3379  21.6049  25.9894  28.8693  31.5264  34.8053  37.1565
19       6.84397  7.63273  8.90652  10.117   11.6509  14.562   18.3377  22.7178  27.2036  30.1435  32.8523  36.1909  38.5823
20       7.43384  8.2604   9.59078  10.8508  12.4426  15.4518  19.3374  23.8277  28.412   31.4104  34.1696  37.5662  39.9969
Example
A certain drug is claimed to be effective in curing the common cold. In a clinical
trial involving 500 patients having the common cold, 250 were given the drug and
the rest were given sugar pills. The patients’ reactions to the treatment are recorded
in the table below.
              Helped   Harmed   No Effect   Total
Drug          150      30       70          250
Sugar Pills   130      40       80          250
Total         280      70       150         500
On the basis of the above data, can it be concluded, at the 5% significance level,
that there is a significant difference in the effect of the drug and sugar pills?
H0: No significant difference in effect of drug and sugar pills.
H1: There is a significant difference in effect of drug and sugar pills.
α = 0.05, df = (2-1)(3-1) = 2 → $\chi^2_{crit} = 5.991$
Expected frequency: fe = (row total × column total)/grand total, e.g. 250 × 280/500 = 140.
                         fo    fe    fo - fe   (fo - fe)²/fe
Drug, helped             150   140   10        0.7143
Drug, harmed             30    35    -5        0.7143
Drug, no effect          70    75    -5        0.3333
Sugar pills, helped      130   140   -10       0.7143
Sugar pills, harmed      40    35    5         0.7143
Sugar pills, no effect   80    75    5         0.3333
                                               ∑ = 3.524
$\chi^2_{calc} = 3.524 < \chi^2_{crit} = 5.991$
Hence do not reject H0 at α = 0.05.
→ insufficient statistical evidence to suggest that there is a
significant difference between drug and sugar pills.
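The expected frequencies and the chi-square statistic for this table can be recomputed in a few lines. A sketch using plain Python lists; the critical value 5.991 is typed in from the chi-square table (df = 2, upper-tail area 0.05).

```python
# Observed frequencies: rows = Drug, Sugar Pills; columns = Helped, Harmed, No Effect.
observed = [[150, 30, 70], [130, 40, 80]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2_crit = 5.991  # from the chi-square table, df = 2, area 0.05
print(round(chi2, 3), df, chi2 > chi2_crit)
```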
LINEAR REGRESSION
AND CORRELATION
• Correlation analysis
• Scatterplot
• Correlation coefficient
• Dependent and independent variables
• The coefficient of determination
• Linear regression equation
Correlation Coefficient Formula:
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
The coefficient of determination = $r^2$
The regression equation: Y' = a + bX
Y' = average predicted value of Y for any X.
a = Y-intercept = estimated Y value when X = 0
b = slope of the line.
$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
$a = \dfrac{\sum y - b\sum x}{n}$
Example
The following data relates to the training periods and
average weekly sales of seven randomly selected
salesmen in a large company.
Salesman                    A    B    C    D    E    F    G
Training (hours)            20   5    10   13   12   8    15
Ave weekly sales ($'000)    44   22   35   32   27   26   35
1. Calculate the correlation coefficient. Comment on the
value obtained.
2. Determine the coefficient of determination and interpret
the value obtained.
3. Assuming a linear relation between the variables in the
given data, obtain the regression equation connecting the
variables.
4. Estimate the weekly sales of a salesman who had 22h of
training. Is the result reliable? Explain.
Solution
1. Let x denote training period (in hours) and let y
denote sales (in $'000)
     x     y     x²     y²     xy
     20    44    400    1936   880
     5     22    25     484    110
     10    35    100    1225   350
     13    32    169    1024   416
     12    27    144    729    324
     8     26    64     676    208
     15    35    225    1225   525
∑    83    221   1127   7299   2813
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \dfrac{7 \times 2813 - 83 \times 221}{\sqrt{(7 \times 1127 - 83^2)(7 \times 7299 - 221^2)}} = 0.9$
→ strong positive linear relationship between x and y
2. r² = 0.81
→ 81% of the variation in Y is due to variation in X.
The remaining 19% is due to other factors.
3. $b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{7 \times 2813 - 83 \times 221}{7 \times 1127 - 83^2} = 1.35$
$a = \bar{y} - b\bar{x}$ = 221/7 - 1.35 × 83/7 = 15.56
→ y = 15.56 + 1.35x
4. When x = 22 hours,
y = 15.56 + 1.35 × 22 = 45.3, i.e. 45.3 × $1000 = $45300
No. The regression equation is valid only in the domain
5 ≤ x ≤ 20, and x = 22 lies outside it.
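The column sums, correlation coefficient and regression coefficients can be reproduced with the short script below. It is a sketch with illustrative variable names. Note that it carries the unrounded slope (1.348) into the intercept, so it gives a ≈ 15.59 and a prediction of about 45.2 at x = 22, against 15.56 and 45.3 when the slope is first rounded to 1.35.

```python
from math import sqrt

hours = [20, 5, 10, 13, 12, 8, 15]   # x: training period
sales = [44, 22, 35, 32, 27, 26, 35]  # y: average weekly sales ($'000)
n = len(hours)

sx, sy = sum(hours), sum(sales)
sxx = sum(x * x for x in hours)
syy = sum(y * y for y in sales)
sxy = sum(x * y for x, y in zip(hours, sales))

# Correlation coefficient and least-squares coefficients from the sums.
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n

prediction_22h = a + b * 22  # extrapolation beyond the observed range 5..20
print(round(r, 2), round(r * r, 2), round(b, 3), round(a, 2), round(prediction_22h, 1))
```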
TIME SERIES AND FORECASTING
Components
• The Secular Trend (T)
• The Cyclical Variation (C)
• The Seasonal Variation (S)
• The Irregular Variation (I)
Multiplicative Model: Y = T × C × S × I
The linear trend equation:
T = a + bt
Seasonal Indices
• Moving average
• Centred moving average
• Ratio to centred moving average
• Adjusted seasonal average
• Deseasonalizing a series
Example
The following table gives the quarterly healthcare claims (in R
millions) against all healthcare claims for the period 2008 to 2010.
Year   Q1     Q2     Q3     Q4
2008   14.0   15.6   21.5   18.3
2009   13.1   14.7   24.8   19.4
2010   14.4   17.3   25.6   15.8
1. Represent the above data in a time series plot.
2. Calculate the quarterly seasonal indices for healthcare claims
using the ratio-to-moving-average method. Interpret the
results.
3. Derive a trend line using the method of least squares.
4. Estimate the seasonally-adjusted trend value of healthcare
claims for the third quarter of 2011.
1.
[Figure: time series plot "Quarterly Healthcare Claims (in Rm) for the period 2008 - 2010"; x-axis: quarters Q1 to Q4 for 2008, 2009 and 2010; y-axis: Claims (Rm), 10.0 to 30.0]
2.
(Each 4MA value falls between the quarter on its row and the next quarter.)
Season     Data (Rm)   4MA (Rm)   Centred 4MA (Rm)   Unadj. SI (%)
2008 Q1    14.0        -          -                  -
     Q2    15.6        17.350     -                  -
     Q3    21.5        17.125     17.238             124.7
     Q4    18.3        16.900     17.013             107.6
2009 Q1    13.1        17.725     17.313             75.7
     Q2    14.7        18.000     17.863             82.3
     Q3    24.8        18.325     18.163             136.5
     Q4    19.4        18.975     18.650             104.0
2010 Q1    14.4        19.175     19.075             75.5
     Q2    17.3        18.275     18.725             92.4
     Q3    25.6        -          -                  -
     Q4    15.8        -          -                  -

          Q1     Q2     Q3      Q4
2008      -      -      124.7   107.6
2009      75.7   82.3   136.6   104.0
2010      75.6   92.4   -       -
Mean SI   75.7   87.4   130.7   105.8
Adj. SI   75.7   87.5   130.9   106.0
The annual seasonal influences are as follows:
Q1: substantial decrease of 24.3%
Q2: decrease of 12.5%
Q3: substantial increase of 30.9%
Q4: increase of 6.0%
3.
t     T      t²    tT
1     14.0   1     14.0
2     15.6   4     31.2
3     21.5   9     64.5
4     18.3   16    73.2
5     13.1   25    65.5
6     14.7   36    88.2
7     24.8   49    173.6
8     19.4   64    155.2
9     14.4   81    129.6
10    17.3   100   173.0
11    25.6   121   281.6
12    15.8   144   189.6
∑t = 78   ∑T = 214.5   ∑t² = 650   ∑tT = 1439.2
T(t) = 15.9 + 0.31t
4.
Adj. Estimate for Q3 of 2011 (t = 15):
Y(2011, Q3) = T(15) × 1.309 = (15.9 + 0.31 × 15) × 1.309
= 26.9 ≡ R26.9m
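The ratio-to-moving-average steps and the least-squares trend can be sketched in a few lines. Variable names are illustrative. Because the script keeps full precision throughout, its seasonal indices and intercept can differ from the rounded slide values in the last decimal place, while the forecast still rounds to 26.9.

```python
claims = [14.0, 15.6, 21.5, 18.3, 13.1, 14.7, 24.8, 19.4, 14.4, 17.3, 25.6, 15.8]

# 4-quarter moving averages, then centre them by averaging adjacent pairs.
ma4 = [sum(claims[i:i + 4]) / 4 for i in range(len(claims) - 3)]
centred = [(first + second) / 2 for first, second in zip(ma4, ma4[1:])]

# centred[k] lines up with claims[k + 2]; the first one belongs to 2008 Q3.
ratios = {quarter: [] for quarter in range(4)}
for k, cma in enumerate(centred):
    ratios[(k + 2) % 4].append(100 * claims[k + 2] / cma)

mean_si = [sum(ratios[q]) / len(ratios[q]) for q in range(4)]
adj_si = [si * 400 / sum(mean_si) for si in mean_si]  # rescale to total 400

# Least-squares linear trend T = a + b*t over t = 1..12.
t = list(range(1, len(claims) + 1))
n = len(t)
b = (n * sum(ti * yi for ti, yi in zip(t, claims)) - sum(t) * sum(claims)) / (
    n * sum(ti * ti for ti in t) - sum(t) ** 2
)
a = (sum(claims) - b * sum(t)) / n

# 2011 Q3 is period t = 15; apply the Q3 seasonal index (position 2).
forecast = (a + b * 15) * adj_si[2] / 100
print([round(si, 1) for si in adj_si], round(a, 2), round(b, 2), round(forecast, 1))
```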
STATISTICAL DECISION THEORY
Components to Decision-Making Situation
• Decision alternatives or acts
• States of nature
• Payoffs
Decision Making Without Probabilities
• Maximin Strategy
• Maximax Strategy
• Minimax Regret Strategy
Decision Making with Probabilities
• Payoff table
• Expected Payoff or Expected Monetary Value (EMV)
Decision Trees
• Decision nodes
• Event nodes
• Tree Structure
• EMV calculations
Example
A large corporation arranged to use an ocean liner as a
floating hotel for its annual convention. The shipping company
had to make a decision whether or not to lease the ship. If
leased, the company would get a flat fee and an additional
percentage of profits from the convention, which could attract
as many as 50000 people. The company's analysts estimated
that if the ship were leased there would be a 50% chance of
realizing a profit of $700000, a 30% chance of making a profit
of $800000, a 15% chance of making a profit of $900000 and a
5% chance of making a profit of $1m.
If the ship were not leased, it could be used for its usual
voyage over the convention duration. In this case there would
be a 90% probability of making a profit of $750000 and a 10%
probability that profits would be $780000.
The company has one additional option. If the ship were
leased, and it became clear within the first few days of the
convention that the profits were going to be in the $700000
range, the company could choose to promote the convention on
its own by offering participants discounts on the ocean liner's
cruises. The company's analysts believe that if this action were
chosen there would be a 60% chance that profits would
increase to $740000 and a 40% chance that the promotion
would fail, lowering profits to $680000.
4.1 Draw a decision tree to depict the above problem.
4.2 What decision should the shipping company take?
Show all working.
4.1
[Figure: decision tree]
Do not lease → event node A
    0.9 → $750000
    0.1 → $780000
Lease → event node B
    0.5 → decision node C
        Do not promote → $700000
        Promote → event node D
            0.6 → $740000
            0.4 → $680000
    0.3 → $800000
    0.15 → $900000
    0.05 → $1000000
4.2
EMV = max[EMV(A), EMV(B)]
EMV(A) = $780000 x 0.1 + $750000 x 0.9 = $753000
EMV(B) = $1000000x0.05 + $900000x0.15 + $800000x0.3 + 0.5xEMV(C)
= $425000+0.5xEMV(C)
EMV(C) = max[$700000, EMV(D)]
= max[$700000, $680000x0.4 + $740000x0.6] = $716000 → promote
Hence EMV (B) = $425000 + $716000x0.5 = $783000 → EMV = $783000
Decision: Lease and then promote the convention if profits from lease are in the
$700000 range.
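The roll-back of the tree can be written out explicitly. This is a sketch; the helper `emv` and the variable names are illustrative, and the node labels A to D follow the worked solution.

```python
def emv(outcomes):
    """Expected monetary value of a list of (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)


# Work backwards from the right-hand side of the tree.
emv_d = emv([(0.6, 740_000), (0.4, 680_000)])  # event node D: promote
emv_c = max(700_000, emv_d)                     # decision node C: promote or not
emv_b = emv([(0.5, emv_c), (0.3, 800_000), (0.15, 900_000), (0.05, 1_000_000)])
emv_a = emv([(0.9, 750_000), (0.1, 780_000)])   # event node A: do not lease

decision = "lease" if emv_b > emv_a else "do not lease"
print(round(emv_a), round(emv_b), decision)
```

Taking `max` at node C reflects that the company chooses the better branch there, while the event nodes are averaged over their probabilities.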