Slides 3-7 Proportion Inference

BA 275
Quantitative Business Methods
Agenda
 Multiple Linear Regression: Dummy Variables
 Statistical Inference: Population Proportion
   Confidence Interval Estimation
   Hypothesis Testing
Using Dummy Variables
Median Value of Homes ($000)   Location of ATM   Amount Withdrawn ($000)
X1                             X2                Y
225                            yes               120
170                            no                99
153                            yes               91
132                            no                82
237                            yes               124
187                            yes               104
245                            yes               127
125                            yes               80
215                            yes               115
170                            no                97
223                            no                117
147                            no                86
197                            yes               109
167                            no                94
210                            no                112
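Dummy Variable for LOCATION
$$X_2 = \begin{cases} 1 \text{ (yes)}, & \text{if located in a shopping center} \\ 0 \text{ (no)}, & \text{if located outside a shopping center} \end{cases}$$
$$Y = \beta_0 + \beta_1(\text{Value}) + \beta_2(\text{Location}) + \varepsilon$$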
Fitted Model
Multiple Regression Analysis
Dependent variable: Withdrawal

Parameter   Estimate    Standard Error   T Statistic   P-Value
CONSTANT    29.6823     1.33374          22.2549       0.0000
Location    1.22822     0.545909         2.24986       0.0440
Value       0.393129    0.0073456        53.5189       0.0000

Analysis of Variance
Source          Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Model           3278.42          2    1639.21       1642.59   0.0000
Residual        11.9753          12   0.997943
Total (Corr.)   3290.4           14

R-squared = 99.6361 percent
R-squared (adjusted for d.f.) = 99.5754 percent
Standard Error of Est. = 0.998971
Mean absolute error = 0.623333
Durbin-Watson statistic = 1.93022
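The estimates in this output can be reproduced from the 15 observations on the "Using Dummy Variables" slide. The following is a minimal sketch in plain Python, assuming the data as transcribed there. It uses the fact that, for a model with one 0/1 dummy and a common slope, least squares yields the pooled within-group slope plus a separate intercept for each group. The function and variable names are illustrative and do not come from the slides.

```python
# (median home value in $000, ATM in shopping center?, withdrawal in $000)
data = [
    (225, "yes", 120), (170, "no", 99), (153, "yes", 91), (132, "no", 82),
    (237, "yes", 124), (187, "yes", 104), (245, "yes", 127), (125, "yes", 80),
    (215, "yes", 115), (170, "no", 97), (223, "no", 117), (147, "no", 86),
    (197, "yes", 109), (167, "no", 94), (210, "no", 112),
]


def fit_common_slope(rows):
    """Least-squares fit of Y = b0 + b1*Value + b2*Location (Location is 0/1)."""
    groups = {"no": [], "yes": []}
    for value, location, withdrawal in rows:
        groups[location].append((value, withdrawal))

    means = {}
    sxx = sxy = 0.0
    for label, pts in groups.items():
        x_bar = sum(x for x, _ in pts) / len(pts)
        y_bar = sum(y for _, y in pts) / len(pts)
        means[label] = (x_bar, y_bar)
        # Pool the within-group sums of squares and cross-products
        sxx += sum((x - x_bar) ** 2 for x, _ in pts)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pts)

    b1 = sxy / sxx                                      # common slope (Value)
    b0 = means["no"][1] - b1 * means["no"][0]           # intercept when Location = 0
    b2 = (means["yes"][1] - b1 * means["yes"][0]) - b0  # intercept shift (Location)
    return b0, b1, b2


b0, b1, b2 = fit_common_slope(data)
print(f"CONSTANT = {b0:.4f}, Value = {b1:.6f}, Location = {b2:.5f}")
```

The printed values should agree with the Estimate column to the displayed precision; standard errors and the ANOVA table are not computed in this sketch.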
Questions
 Write down the fitted model.
 Is the assumed model reliable? Why?
 What is the value of R²? The adjusted R²? To select a model, why do we prefer the adjusted R² to R²?
 Predict the amount of money withdrawn from a neighborhood in which the median value of homes is $200,000 for an ATM that is located in a shopping center.
 If the median value of homes increases by $2,000, then the amount of money withdrawn from an ATM located in a shopping center is expected to increase by ______.
 If the median value of homes is $200,000, then the amount of money withdrawn from an ATM located in a shopping center is ???; and the amount of money withdrawn from an ATM located outside a shopping center is ???. What is the difference?
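The prediction questions reduce to arithmetic on the estimates in the Fitted Model output. A small sketch, assuming those three estimates and that Value is measured in $000 as on the data slide; `predict_withdrawal` is a hypothetical helper name.

```python
# Estimates copied from the "Fitted Model" output (all amounts in $000)
B0, B_VALUE, B_LOCATION = 29.6823, 0.393129, 1.22822


def predict_withdrawal(value_k, in_shopping_center):
    """Predicted withdrawal ($000) for a median home value given in $000."""
    location = 1 if in_shopping_center else 0
    return B0 + B_VALUE * value_k + B_LOCATION * location


inside = predict_withdrawal(200, True)
outside = predict_withdrawal(200, False)
print(f"inside: {inside:.3f}, outside: {outside:.3f}, gap: {inside - outside:.5f}")

# Expected change for a $2,000 increase, i.e. 2 units of $000
increase = B_VALUE * 2
print(f"increase for +$2,000 in home value: {increase:.4f} ($000)")
```

Because this model has no interaction term, the gap between the two locations equals the Location coefficient at every home value.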
Two Lines with the Same Slope but Different Intercepts
$$Y = \beta_0 + \beta_1(\text{Value}) + \beta_2(\text{Location}) + \varepsilon$$
[Figure: Plot of Fitted Model. Withdrawal (vertical axis) against Value (horizontal axis), one line for each Location (0, 1).]
Two Lines with Different Intercepts and Slopes
$$Y = \beta_0 + \beta_1(\text{Value}) + \beta_2(\text{Location}) + \beta_3(\text{Value} \times \text{Location}) + \varepsilon$$
[Figure: Plot of Fitted Model. Withdrawal (vertical axis) against Value (horizontal axis), one line for each Location (0, 1).]
Analyzing Categorical Data
 Do you own an iPod? ___Yes ___No
 Do you own an Xbox? ___Yes ___No
 Which of the following 4 soft drinks gives you the highest satisfaction?
   Type A ___   Type B ___   Type C ___   Type D ___
 Your gender: ____Male ____Female
 Your nationality: _____
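Central Limit Theorem
 In the case of the sample mean
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right), \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
 In the case of the sample proportion
$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right), \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$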
Formulas for Proportion
100(1-α)% confidence interval estimator
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Hypothesis testing H0: p = p0
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
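The critical values used with these formulas are usually read from a normal table. As a sketch, they can also be obtained from the standard library's `statistics.NormalDist`; the helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist


def z_critical(alpha, two_sided=True):
    """Standard normal critical value: z_{alpha/2} if two-sided, else z_alpha."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


# Two-sided values for confidence intervals at common confidence levels
for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%} confidence: z = {z_critical(1 - confidence):.3f}")

# One-sided value for an upper-tail test at alpha = 5%
print(f"upper-tail test, alpha = 5%: z = {z_critical(0.05, two_sided=False):.3f}")
```

The confidence interval uses the two-sided value, while a one-sided alternative such as Ha: p > p0 uses the one-sided value.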
Example 1
 A sample of 35 student information sheets
shows that 9 intend to concentrate in
Finance. Give a 99% confidence interval for
the proportion of students in the population
that intend to concentrate in Finance.
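Margin of Error (m)
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad \bar{X} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}, \qquad \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
In each interval, the term after ± is the margin of error (how good is your point estimate?).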
Example 2
 A marketing manager for a start-up firm in
Michigan wishes to discover the proportion of
teenagers in Japan who own an iPod. If the
manager wants a confidence interval of width
0.1, how many teenagers must be sampled?
Use a conservative estimate of p. Assume
that the confidence level is to be 95%.
Accuracy Gained by Increasing the Sample Size
$$n = \left(\frac{1.96\sqrt{(0.5)(0.5)}}{m}\right)^2$$

Margin of Error (m)   Sample Size (n)
7%                    196
6%                    266
5%                    384
4%                    600
3%                    1067
2%                    2401
1%                    9604
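The sample sizes for each margin of error follow directly from the formula on this slide. A minimal sketch, assuming z = 1.96 (95% confidence) and the conservative p = 0.5; `required_sample_size` is a hypothetical helper, and rounding up is a choice made here so that the margin is guaranteed.

```python
import math


def required_sample_size(margin, z=1.96, p=0.5):
    """Return (unrounded n, n rounded up) for a target margin of error."""
    raw = (z ** 2) * p * (1 - p) / margin ** 2
    # Round away floating-point noise first, then round up to a whole person
    return raw, math.ceil(round(raw, 6))


for m in (0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01):
    raw, n = required_sample_size(m)
    print(f"m = {m:.0%}: n = {raw:.2f} -> {n}")
```

The table on the slide lists these values without rounding up, whereas the answer key to Example 2 rounds 384.16 up to 385; rounding up is the safer convention when the margin must not be exceeded.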
Hypothesis Testing
4 out of 5 dentists recommend Oral-B.
Scenario 1: “Hmm, I thought it was higher.”
Scenario 2: “No, it cannot be. It should be lower.”
Scenario 3: “Really? I don’t think so.”
Example 3
 Three politicians are attempting to win the Democratic nomination for senator.
 The result of a survey of 1000 Democrats is summarized below.
 Do we have enough evidence to indicate that Candidate A will receive more than 50% of the vote? Assume α = 5%. (Use the rejection region and the p-value approaches.)

Candidate A   Candidate B   Candidate C
550           300           150
Example 4
 In recent years over 70% of first-year college students responding to a national survey have identified “being well-off financially” as an important personal goal. A state university finds that 153 of a random sample of 200 of its first-year students say that this goal is important.
 Do we have evidence to support that more than 70% of first-year students would identify being well-off as an important personal goal? (Use the rejection region and the p-value approaches.) Assume α = 5%.
Example 5
 A financial analyst wanted to determine the mean annual return on mutual funds. In a random sample of 15 returns she found a sample mean of 12.9% with a (sample) standard deviation of 3%.
 Is there evidence to claim that the mean annual return on mutual funds is greater than 12%? Assume α = 5%.
Answer Key to Examples 1 – 3
Example 1.
$$\frac{9}{35} \pm 2.576\sqrt{\frac{\frac{9}{35}\left(1-\frac{9}{35}\right)}{35}}$$
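The answer key leaves the Example 1 interval as an expression. As a sketch, it can be evaluated with the standard library; `proportion_ci` is an illustrative name, and the critical value is computed rather than taken from the table, so the result may differ from a hand calculation in the last digit.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Example 1: 9 of 35 students intend to concentrate in Finance, 99% confidence
low, high = proportion_ci(9, 35, 0.99)
print(f"99% CI for p: ({low:.4f}, {high:.4f})")
```

With only 35 observations the interval is wide, roughly 0.07 to 0.45.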
Example 2. Margin of error = m = 0.10 / 2 = 0.05.
$$n = \left(\frac{1.96\sqrt{0.5 \times 0.5}}{0.05}\right)^2 = 384.16 \approx 385.$$
Example 3. H0: p = 0.50 vs. Ha: p > 0.50. Given α = 5%, the rejection region is: Reject H0 if z > 1.645. The sample proportion is $\hat{p} = 550/1000 = 0.55$ and the z statistic
$$z = \frac{0.55 - 0.5}{\sqrt{\frac{0.5 \times 0.5}{1000}}} = 3.16$$
is in the rejection region. Hence, reject H0. The p-value = P( z > 3.16 ) = 1 – 0.9992 = 0.0008 < α. Again, reject H0.
Answer Key to Examples 4 – 5
Example 4. H0: p = 0.70 vs. Ha: p > 0.70. Given α = 5%, the rejection region is: Reject H0 if z > 1.645. The sample proportion is $\hat{p} = 153/200 = 0.765$ and the z statistic
$$z = \frac{0.765 - 0.7}{\sqrt{\frac{0.7 \times 0.3}{200}}} = 2.00$$
is in the rejection region. Hence, reject H0. The p-value = P( z > 2.00 ) = 1 – 0.9772 = 0.0228 < α. Again, reject H0.
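The upper-tail tests in Examples 3 and 4 can be checked with a few lines of Python. This is a sketch using the counts given in the examples; `proportion_z_test_upper` is a hypothetical helper, and the p-value comes from `statistics.NormalDist` instead of the normal table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test_upper(successes, n, p0):
    """z statistic and p-value for H0: p = p0 vs. Ha: p > p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)  # upper-tail area
    return z, p_value


ALPHA = 0.05
z3, p3 = proportion_z_test_upper(550, 1000, 0.50)  # Example 3
z4, p4 = proportion_z_test_upper(153, 200, 0.70)   # Example 4
for label, z, p_value in (("Example 3", z3, p3), ("Example 4", z4, p4)):
    decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
    print(f"{label}: z = {z:.3f}, p-value = {p_value:.4f}, {decision}")
```

Unrounded, Example 4 gives a z statistic of about 2.006, so the computed p-value is slightly smaller than the table-based 0.0228; the conclusion is the same.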
Example 5. H0: μ = 0.12 vs. Ha: μ > 0.12. Given a small sample with unknown population standard deviation, we use the T table (Table D) to set up the rejection region. With α = 5% and degrees of freedom n – 1 = 14, the rejection region is: Reject H0 if t > 1.761. The t statistic
$$t = \frac{0.129 - 0.12}{0.03/\sqrt{15}} = 1.16$$
is outside the rejection region and thus, we fail to reject H0. By using Table D, we found that we would have rejected H0 if α = 15% but would not if α = 10% or less. This implies that the p-value of the test is between 10% and 15%.
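The t statistic for Example 5 can be verified the same way. This sketch computes only the statistic, because the Python standard library has no t distribution; the critical value 1.761 is the table value quoted in the answer key, and `one_sample_t` is an illustrative name.

```python
from math import sqrt


def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for H0: mu = mu0, using the sample standard deviation s."""
    return (sample_mean - mu0) / (s / sqrt(n))


T_CRITICAL = 1.761  # Table D value for alpha = 5% and 14 d.f., from the answer key

t_stat = one_sample_t(0.129, 0.12, 0.03, 15)
reject = t_stat > T_CRITICAL
print(f"t = {t_stat:.2f}, reject H0: {reject}")
```

If SciPy is available, `scipy.stats.t.sf(t_stat, 14)` would give the upper-tail p-value directly, which should fall in the 10% to 15% range found from the table.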