Math 131 Labs
Working with the Albuquerque home data. The qualitative variable is NE (V1) = located in the northeast
sector of the city (1) or not (0). The quantitative variables are PRICE (V2) = selling price ($hundreds)
and SQFT (V3) = square feet of living space.
Lab 1: Sample the data.
Select a random sample of 40 from the 117 cases.
Calc -> Random Data -> Sample from Columns. Sample 40 rows from columns: Select all the columns
(PRICE SQFT AGE FEATS NE CUST COR TAX)
Store samples in: type C1 C2 C3 C4 C5 C6 C7 C8.
PRICE  SQFT  AGE  FEATS  NE  CUST  COR  TAX
945    1580  9    3      0   0     0    810
739    970   4    4      0   0     1    541
729    1007  19   6      1   0     0    513
660    1159  *    0      0   0     0    225
876    1156  *    1      1   0     0    *
1560   1920  1    5      1   1     0    1161
755    1275  *    5      1   0     0    *
750    1030  *    1      1   0     0    486
995    1500  15   4      1   0     0    743
899    1464  *    2      1   1     0    566
2050   2650  13   7      1   1     0    1639
780    1080  *    3      0   1     0    600
1050   1920  8    4      0   0     0    944
710    1083  22   4      1   0     0    504
2150   2664  6    5      1   1     0    1193
1045   1630  6    4      0   0     0    750
872    1229  6    3      0   0     0    721
1020   1478  53   3      1   0     1    626
540    1142  *    0      0   0     0    223
1449   1710  1    3      1   1     0    1010
875    1173  6    4      1   0     0    456
2150   2848  4    6      1   1     0    1487
730    1027  *    3      1   0     0    427
1270   1880  8    6      1   0     0    930
1109   1740  4    3      0   0     0    816
749    1733  43   6      1   0     0    656
805    1258  7    4      1   0     1    821
975    1500  7    3      0   1     1    700
1030   1540  6    2      0   0     1    826
600    1198  *    4      0   0     0    *
1250   2180  17   4      1   0     1    1141
1160   1720  5    4      0   0     0    867
810    1365  *    2      1   0     0    673
759    997   4    4      1   0     0    461
670    1350  *    2      1   0     0    622
725    1140  *    3      1   0     1    490
700    1505  *    2      0   0     1    591
975    1430  *    3      1   0     0    752
1695   2931  28   3      1   0     1    1142
1170   1928  18   8      1   1     0    600
An asterisk (*) marks a missing value.
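As a cross-check outside Minitab, the sampling step of Lab 1 can be sketched in Python. This is only an illustration, not part of the original lab: it assumes the 117 cases are identified by row numbers 1 to 117 and that the sample is drawn without replacement. The seed is arbitrary, so the rows drawn here will not match the 40 rows listed in this lab.

```python
import random

# Assumption: the 117 cases are represented only by their row numbers 1..117.
population_rows = list(range(1, 118))

# Arbitrary seed so that the draw can be repeated.
rng = random.Random(131)

# Draw 40 distinct rows (sampling without replacement).
sample_rows = rng.sample(population_rows, k=40)

print(sorted(sample_rows))
```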
Lab 2: Organize the data into frequency distributions and graphs.
A. Preparing frequency and relative frequency distributions for the qualitative data.
Stat -> Tables -> Tally Individual Variables. Choose Counts and Percents. The result presented in the
Session Window is:
Tally for Discrete Variables: NE
NE  Count  Percent
0   14     35.00
1   26     65.00
N=  40
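The same tally can be reproduced with a few lines of Python. This is a sketch for checking the counts, not the Minitab procedure; the list `ne` is the NE column of the 40 sampled homes from Lab 1.

```python
from collections import Counter

# NE column of the 40 sampled homes (1 = northeast sector, 0 = not).
ne = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1,
      1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]

counts = Counter(ne)
n = len(ne)

# Print count and percent for each category.
for category in sorted(counts):
    percent = 100 * counts[category] / n
    print(category, counts[category], f"{percent:.2f}")
```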
B. Preparing a Pie Chart
Graph -> Pie Chart Choose Chart Raw Data. Choose NE as the Categorical Variable. Click Labels
Select Slice Labels, check off Category name and Percent. Click OK twice.
[Figure: Pie Chart of NE. Slices labeled by category: 0 = 35.0%, 1 = 65.0%.]
C. Prepare a Pareto Chart (Bar Chart)
Graph -> Bar Chart. Bars represent Counts of Unique Values. Choose Simple Table. Click OK. Choose
NE as the Categorical variable.
[Figure: Chart of NE. Bar chart with Count on the y-axis and NE category (0, 1) on the x-axis.]
D. Both charts give a good idea of the relative size of each category.
E. The NE sector has more sales (26 of the 40 sampled homes).
Part 2: Quantitative Data
Relative Frequency Histogram for PRICE
Graph -> Histogram. Select Simple and click OK. Select PRICE. Click on Scale and, under the Y-scale
Type tab, choose Percent and click OK. To include the values on the graph, click on the Data Labels tab
and select Use y-value labels. Click OK twice.
[Figure: Histogram of PRICE. Y-axis: Percent; x-axis: PRICE. Bar labels as extracted (not necessarily in
bar order): 42.5, 20, 12.5, 10, 5, 5, 2.5, 2.5, 0.]
Part 3: Quantitative Data
Frequency Histogram for SQFT
Graph -> Histogram. Select Simple and click OK. Select SQFT. To include the values on the graph, click
on the Data Labels tab and select Use y-value labels. Click on Scale and, under the Y-scale Type tab,
choose Frequency and click OK.
[Figure: Histogram of SQFT. Y-axis: Frequency; x-axis: SQFT, 1000 to 3000. Bar labels: 7, 11, 8, 5, 4, 1,
0, 3, 1 for the bars centered at 1000, 1250, ..., 3000.]
C. Prepare a frequency polygon using class midpoints (Note I just let Minitab choose the points)
Graph -> Histogram Select Simple Click OK, Select SQFT. To change the title to indicate Polygon, click
on the Labels button, and under the Titles/Footnotes tab specify the title to ‘Polygon of SQFT’. To include
the values on the graph, click on the Data Labels tab and select Use y-value labels. To make a polygon
instead of a histogram, click on the Data view button. On the Data Display tab, remove check mark from
Bars and place check mark on Symbols. Under the Smoother Tab, choose Lowess for Smoother, make
the Degree of smoothing 0 and the Number of steps 1. Then click OK twice.
[Figure: Polygon of SQFT. The frequencies of the SQFT histogram plotted as connected points. Y-axis:
Frequency; x-axis: SQFT, 1000 to 3000.]
D. Prepare an ogive using upper class boundaries
Place SQFT data in column C1 of a new Worksheet then sort the data to make it easier to find frequencies
in each class.
Data -> Sort -> Sort column C1 by Column C1. Choose Store sorted data in original column.
970, 997, 1007, 1027, 1030, 1080, 1083, 1140, 1142, 1156, 1159, 1173, 1198, 1229, 1258, 1275, 1350, 1365,
1430, 1464, 1478, 1500, 1500, 1505, 1540, 1580, 1630, 1710, 1720, 1733, 1740, 1880, 1920, 1920, 1928, 2180,
2650, 2664, 2848, 2931
Dividing the range by the number of classes:
\[ \frac{2931 - 970}{9} = \frac{1961}{9} \approx 217.9 \]
Rounding up gives a class width of 218.
Class        Boundaries       Midpoint  Freq  Rel freq  Cum freq  Cum rel freq
970 - 1187   969.5 - 1187.5   1078.5    12    0.30      12        0.30
1188 - 1405  1187.5 - 1405.5  1296.5    6     0.15      18        0.45
1406 - 1623  1405.5 - 1623.5  1514.5    8     0.20      26        0.65
1624 - 1841  1623.5 - 1841.5  1732.5    5     0.125     31        0.775
1842 - 2059  1841.5 - 2059.5  1950.5    4     0.10      35        0.875
2060 - 2277  2059.5 - 2277.5  2168.5    1     0.025     36        0.90
2278 - 2495  2277.5 - 2495.5  2386.5    0     0         36        0.90
2496 - 2713  2495.5 - 2713.5  2604.5    2     0.05      38        0.95
2714 - 2931  2713.5 - 2931.5  2822.5    2     0.05      40        1.00
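The class frequencies can be double-checked with a short Python sketch. It is not part of the Minitab procedure; it assumes the nine classes start at 970 with a width of 218, as computed for the ogive, and uses the sorted SQFT values listed for this part.

```python
from itertools import accumulate

# Sorted SQFT values of the 40 sampled homes.
sqft = [970, 997, 1007, 1027, 1030, 1080, 1083, 1140, 1142, 1156,
        1159, 1173, 1198, 1229, 1258, 1275, 1350, 1365, 1430, 1464,
        1478, 1500, 1500, 1505, 1540, 1580, 1630, 1710, 1720, 1733,
        1740, 1880, 1920, 1920, 1928, 2180, 2650, 2664, 2848, 2931]

lower_limit, width, classes = 970, 218, 9

# Count how many values fall in each class.
freq = [0] * classes
for x in sqft:
    freq[(x - lower_limit) // width] += 1

n = len(sqft)
cum_freq = list(accumulate(freq))

# Print class limits, midpoint, frequency and the (cumulative) relative frequencies.
for i in range(classes):
    lo = lower_limit + i * width
    hi = lo + width - 1
    print(f"{lo}-{hi}", (lo + hi) / 2, freq[i], freq[i] / n,
          cum_freq[i], cum_freq[i] / n)
```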
To use Minitab to plot the Ogive, in a new worksheet make the Class Boundaries column C1 and the Cum
rel freq column C2 as follows:
C1      C2
969.5   0.000
1187.5  0.300
1405.5  0.450
1623.5  0.650
1841.5  0.775
2059.5  0.875
2277.5  0.900
2495.5  0.900
2713.5  0.950
2931.5  1.000
Then select Graph -> Scatterplot -> With Connect Line. Select C2 for the Y-variable and C1 for the
X-variable. Click on the Data View button and be sure that both Symbols and Connect line are selected.
By choosing both Symbols and Connect line, Minitab will connect the dots at each data point on the graph.
Click on Labels and title the ogive ‘Ogive of SQFT’. To label the points, click the Data Labels tab and
choose Use y-value labels. Click OK. After the graph is created, it should be edited to show each upper
class boundary. Right-click on the X-axis of the graph and select Edit X scale. Enter the Position of
ticks as 969.5:2931.5/218. This tells Minitab that the tick marks should start at 969.5 and go up to
2931.5 in steps of 218.
The results are:
[Figure: Ogive of SQFT. Y-axis: Cum rel freq; x-axis: Class Boundaries (969.5, 1187.5, ..., 2931.5). Point
labels: 0.000, 0.300, 0.450, 0.650, 0.775, 0.875, 0.900, 0.900, 0.950, 1.000.]
E. The graphs show that homes with about 1200 square feet of living space are the most common.
Lab 3: Finding descriptive statistics for the quantitative data.
A-D. Stat -> Basic Statistics -> Display Descriptive Statistics. Choose PRICE, click on Statistics and
choose Mean, Median, Range and Standard deviation. Note Minitab does not provide Mode.
Descriptive Statistics: PRICE
Variable  N   N*  Mean    StDev  Median  Range
PRICE     40  0   1019.5  405.9  887.5   1610.0
E. List the five-number summary and make the box-and-whisker plot.
Stat -> Basic Statistics -> Display Descriptive Statistics. Choose PRICE, click on Statistics and choose
Minimum, Maximum, First Quartile, Median, Third Quartile and Interquartile Range.
Descriptive Statistics: PRICE
Variable  N   N*  Minimum  Q1     Median  Q3      Maximum  IQR
PRICE     40  0   540.0    741.5  887.5   1147.3  2150.0   405.8
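The descriptive statistics and the five-number summary for PRICE can be checked with Python's standard `statistics` module. This is a sketch, not the Minitab computation itself. The quartiles use `method="exclusive"`, which interpolates at position (n + 1)p; for this sample that choice reproduces the Q1 and Q3 shown in the Minitab output, but treating it as Minitab's rule in general is an assumption.

```python
import statistics

# PRICE column ($hundreds) of the 40 sampled homes.
price = [945, 739, 729, 660, 876, 1560, 755, 750, 995, 899,
         2050, 780, 1050, 710, 2150, 1045, 872, 1020, 540, 1449,
         875, 2150, 730, 1270, 1109, 749, 805, 975, 1030, 600,
         1250, 1160, 810, 759, 670, 725, 700, 975, 1695, 1170]

mean = statistics.mean(price)
median = statistics.median(price)
stdev = statistics.stdev(price)          # sample standard deviation (n - 1)
data_range = max(price) - min(price)

# Quartiles by interpolation at position (n + 1) * p.
q1, q2, q3 = statistics.quantiles(price, n=4, method="exclusive")
iqr = q3 - q1

print(mean, stdev, median, data_range)
print(min(price), q1, q2, q3, max(price), iqr)
```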
Click on Graph->Boxplot and select Simple boxplot. Click on OK. Select PRICE for the Graph variable.
To view a horizontal boxplot (rather than a vertical one) click on Scale and select Transpose value and
category scales. Click on OK twice.
[Figure: Boxplot of PRICE (horizontal). Axis: PRICE, 500 to 2250.]
Calc -> Standardize. Specify PRICE as the input column. Store results in C9 (an empty column). Choose
Subtract mean and divide by std dev.
The results for each PRICE are given. The highest PRICE (2150) has a standard score of 2.78490. The mean
has a standard score of 0, and the lowest PRICE (540) has a standard score of -1.18130.
Part 2:
A. Estimating the mean for the SQFT variable using the technique from the text, p 62:
\[ \bar{x} = \frac{\sum (x \cdot f)}{n} = \frac{1000 \cdot 7 + 1250 \cdot 11 + 1500 \cdot 8 + 1750 \cdot 5 + 2000 \cdot 4 + 2250 \cdot 1 + 2750 \cdot 3 + 3000 \cdot 1}{40} = 1575 \]
From p 78: (Note on p 78 the x’s are specific values so the formula is exact; on p 79 the x’s are the
midpoints of classes, so the formula is an approximation)
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} \]
Substituting x = 1000, 1250, ..., 3000 and \( \bar{x} = 1575 \) gives \( s \approx 531.7 \).
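The grouped-data estimates can be verified numerically. The sketch below applies the two formulas from the text to the histogram midpoints and frequencies used in this part; the variable names are only illustrative.

```python
import math

# Histogram midpoints and their frequencies for SQFT.
midpoints = [1000, 1250, 1500, 1750, 2000, 2250, 2750, 3000]
freqs = [7, 11, 8, 5, 4, 1, 3, 1]

n = sum(freqs)

# Grouped mean: sum of x * f divided by n.
grouped_mean = sum(x * f for x, f in zip(midpoints, freqs)) / n

# Grouped sample standard deviation: sqrt(sum of (x - mean)^2 * f over n - 1).
grouped_sd = math.sqrt(
    sum((x - grouped_mean) ** 2 * f for x, f in zip(midpoints, freqs)) / (n - 1)
)

print(n, grouped_mean, round(grouped_sd, 1))
```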
B. Chebychev’s Theorem: \( P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} \).
For the SQFT, the mean is about 1575 and the standard deviation is about 532. Thus 2 standard deviations
on either side of the mean goes from 511 to 2639. With k = 2, Chebychev’s Theorem says that at least
1 - 1/4 = 75% of the data should lie between these two values. If we look at the histogram for SQFT in
Lab 2 Part 3 we see that this is easily the case.
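Instead of reading the histogram, the Chebychev bound can be checked directly against the 40 SQFT values. This is a small sketch using the rounded estimates from the text (mean 1575, standard deviation 532) and k = 2.

```python
# SQFT values of the 40 sampled homes (sorted).
sqft = [970, 997, 1007, 1027, 1030, 1080, 1083, 1140, 1142, 1156,
        1159, 1173, 1198, 1229, 1258, 1275, 1350, 1365, 1430, 1464,
        1478, 1500, 1500, 1505, 1540, 1580, 1630, 1710, 1720, 1733,
        1740, 1880, 1920, 1920, 1928, 2180, 2650, 2664, 2848, 2931]

mean, sd, k = 1575, 532, 2
low, high = mean - k * sd, mean + k * sd

# Share of the data within k standard deviations of the mean.
inside = sum(low <= x <= high for x in sqft)
share = inside / len(sqft)

# Chebychev's lower bound for that share.
bound = 1 - 1 / k**2

print(low, high, inside, share, bound)
```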
Lab 4 Finding simple probabilities, conditional probabilities and using the Multiplication and Addition
Rules.
Part 1
PRICE                                         NE Corner  Not Northeast Corner  Total
Median Price (887.5, i.e. $88,750) and Under  13         7                     20
Higher than the Median                        13         7                     20
Total                                         26         14                    40
Part 2
A. Probability that Price is less than or equal to the median = 0.5
B. Probability that Price is greater than the median = 0.5
C. Probability of being in the NE Corner = 0.65
D. Probability of being in NE Corner and less than or equal to the median = 0.325
Part 3
A. Probability that Price is less than or equal to the median given home is in NE corner = 0.5
B. Let A be NE Corner and B be <= median: P(B|A) = P(B). So the events are independent.
C. Probability of being in NE Corner given Price is higher than the median = 0.65
D. P(A|B') = P(A), so the events are independent.
Part 4
Probability both homes are in NE Corner = P(A)*P(A|A) =(26/40)*(25/39) = 0.65*0.641 = 0.4167
Part 5
A. P(A or B) = P(A) + P(B) – P(AB) = 0.65 + 0.5 – 0.325 = 0.825
B. P(B) + P(B’) = 1
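C. Two pairs of mutually exclusive events: being in row 1 and being in row 2 of the table (price at or
below the median versus above it), and being in column 1 and being in column 2 (NE Corner versus not).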
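The probabilities in Parts 2 to 5 all follow from the four cell counts of the table in Part 1. The sketch below recomputes them with exact fractions; the event names A (NE Corner) and B (price at or below the median) follow Part 3, and the dictionary keys are only illustrative labels.

```python
from fractions import Fraction

# Cell counts from the Part 1 table: (location, price group) -> count.
counts = {("NE", "low"): 13, ("NE", "high"): 13,
          ("notNE", "low"): 7, ("notNE", "high"): 7}
total = sum(counts.values())

p_a = Fraction(counts[("NE", "low")] + counts[("NE", "high")], total)    # P(A)
p_b = Fraction(counts[("NE", "low")] + counts[("notNE", "low")], total)  # P(B)
p_a_and_b = Fraction(counts[("NE", "low")], total)                       # P(A and B)

# Conditional probability and the independence check P(A and B) = P(A) P(B).
p_b_given_a = p_a_and_b / p_a
independent = p_a_and_b == p_a * p_b

# Addition rule.
p_a_or_b = p_a + p_b - p_a_and_b

# Multiplication rule for two homes drawn without replacement, both in NE.
p_two_ne = Fraction(26, 40) * Fraction(25, 39)

print(float(p_a), float(p_b), float(p_a_and_b), float(p_b_given_a))
print(independent, float(p_a_or_b), round(float(p_two_ne), 4))
```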
Lab 5 Standardizing data, computing probabilities using the standard normal distribution, and finding
values given probabilities.
The mean and standard deviation for the variable SQFT were estimated using the techniques on p 62 and
79 to be \( \bar{x} \approx 1575 \) and \( s \approx 531.7 \).
Using Minitab to get the exact values:
Stat -> Basic Statistics -> Display Descriptive Statistics. Select SQFT, click Statistics, Choose Mean and
Standard Deviation
Results for: ALBHOMEPRICESLAB.MTW
Descriptive Statistics: SQFT
Variable  Mean    StDev
SQFT      1552.3  513.8
The values are slightly different because the technique used in Lab 3 Part 2 is an estimate based on the
midpoints in the ranges.
Sort the SQFT data
Data -> Sort -> Sort Columns: Select SQFT, Sort by: Select SQFT, Store sorted data in New
worksheet. Click OK
To compute the z-score in Minitab:
Calc -> Standardize. Input column is SQFT. Store results in C2. Click on Subtract mean and divide
by std.dev., click OK
Line Number  SQFT      z-score
1            L = 970   -1.13312
20           M = 1464  -0.17174
40           H = 2931  2.68318
(The 20th value in the sorted SQFT list is 1464, which is the value that gives z = -0.17174.)
Data that goes with the following z-scores, using \( z = \frac{SQFT - 1552.3}{513.8} \):
z-score  SQFT
-2.50    None that small
3.20     None that large
0        1540 is nearest 0
0.5      1740 is nearest 0.5 (z = 0.37; the next candidate, 1880, has z = 0.64)
Normal curve: arrows locate L = 970, M = 1464, and H = 2931 on it, based on p in table 4 Appendix B for the
corresponding z-score:
L = 970, z = -1.13312 (p = 0.13); M = 1464, z = -0.17174 (p = 0.43); H = 2931, z = 2.68318 (p = .996)
Probability   In terms of z                Calculation                 Result
P(L < X < H)  P(-1.13312 < Z < 2.68318)    1 - .1292 - (1 - .9963)     0.8671
P(L < X < M)  P(-1.13312 < Z < -0.17174)   .4325 - .1292               0.3033
P(M < X < H)  P(-0.17174 < Z < 2.68318)    1 - .4325 - (1 - .9963)     0.5638
P(X < L)      P(Z < -1.13312)              .1292                       0.1292
P(X > M)      P(Z > -0.17174)              1 - .4325                   0.5675
P(X < H)      P(Z < 2.68318)               .9963                       0.9963
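The table lookups can be cross-checked with `statistics.NormalDist` from the Python standard library. This sketch starts from the three z-scores reported by Minitab; because it uses the unrounded z-scores rather than table values rounded to two decimals, the results differ slightly in the third decimal place from probabilities read off table 4.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# z-scores of the lowest, middle (line 20) and highest SQFT values.
z_low, z_mid, z_high = -1.13312, -0.17174, 2.68318

p_low = std_normal.cdf(z_low)    # P(Z < z_low)
p_mid = std_normal.cdf(z_mid)    # P(Z < z_mid)
p_high = std_normal.cdf(z_high)  # P(Z < z_high)

probabilities = {
    "P(L < X < H)": p_high - p_low,
    "P(L < X < M)": p_mid - p_low,
    "P(M < X < H)": p_high - p_mid,
    "P(X < L)": p_low,
    "P(X > M)": 1 - p_mid,
    "P(X < H)": p_high,
}

for label, value in probabilities.items():
    print(label, round(value, 4))
```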
Sketch of Normal curve with the area for the lowest 10% marked by a line, determined by finding
\( z_{.10} \) in table 4 Appendix B and translating to x as shown.
\[ z_{.10} = -1.28, \qquad x = z\sigma + \mu = -1.28 \cdot 513.8 + 1552.3 = 894.6 \]
10% of my population are below 894.6
Sketch of Normal curve with the area for the highest 5% shaded, determined by finding \( z_{.95} \) in
table 4 Appendix B and translating to x as shown.
\[ z_{.95} = 1.645, \qquad x = z\sigma + \mu = 1.645 \cdot 513.8 + 1552.3 = 2397.5 \]
5% of my population are above 2397.5
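Going from a probability back to a value is an inverse-CDF lookup, which can be sketched with `statistics.NormalDist`. The model below assumes SQFT is normal with the Minitab mean and standard deviation (1552.3 and 513.8); the cutoffs differ a little from the hand calculation because the table value z = -1.28 is rounded.

```python
from statistics import NormalDist

# Assumed normal model for SQFT, using the Minitab mean and standard deviation.
sqft_model = NormalDist(mu=1552.3, sigma=513.8)

# Value below which the lowest 10% of homes fall.
lowest_10_cutoff = sqft_model.inv_cdf(0.10)

# Value above which the highest 5% of homes fall.
highest_5_cutoff = sqft_model.inv_cdf(0.95)

print(round(lowest_10_cutoff, 1), round(highest_5_cutoff, 1))
```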
Lab 6: Define a binomial experiment and find binomial probabilities by three different methods
A. Success is NE
B. Failure is not NE
C. P(NE) = 0.65
D. P(not NE) = 0.35
Part 2:
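Can use a tree diagram as on page 176 or the Binomial Probability Formula, also on p 176.
Tree (each path through the three trials and its probability, with p = P(NE) and q = P(Not NE)):
NE, NE, NE: \( p^3 \)
NE, NE, Not NE: \( p^2 q \)
NE, Not NE, NE: \( p^2 q \)
NE, Not NE, Not NE: \( p q^2 \)
Not NE, NE, NE: \( p^2 q \)
Not NE, NE, Not NE: \( p q^2 \)
Not NE, Not NE, NE: \( p q^2 \)
Not NE, Not NE, Not NE: \( q^3 \)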
Binomial Probability Formula
\[ P(x) = \binom{n}{x} p^x q^{n - x} \]
Either way, the results are:
x  P(x)               Result for p = 0.65
0  \( q^3 \)          0.042875
1  \( 3 p q^2 \)      0.238875
2  \( 3 p^2 q \)      0.443625
3  \( p^3 \)          0.274625
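The four probabilities can be reproduced from the Binomial Probability Formula in a few lines. This is a sketch with an illustrative helper name (`binomial_pmf`), using n = 3 trials and p = 0.65 as above.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Binomial Probability Formula: C(n, x) * p^x * q^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 3, 0.65
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(x, round(prob, 6))
```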
Part 3
A: Create a table with n = 15 trials:
Calc -> Probability Distributions -> Binomial -> Select Probability and enter 15 for the Number of Trials
and .65 for the Probability of Success and the Input Column
Probability Density Function
Binomial with n = 15 and p = 0.65
x   P( X = x )
0   0.000000
1   0.000004
2   0.000052
3   0.000422
4   0.002353
5   0.009612
6   0.029751
7   0.071037
8   0.131926
9   0.190560
10  0.212339
11  0.179247
12  0.110962
13  0.047555
14  0.012617
15  0.001562
B.
P(X = 10) = 0.212339
P(X <= 3) = 0.000478
P(X >= 7) = 0.957805
P(2 <= X <= 9) = 0.435713
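The answers in part B are sums over the n = 15 table, which can be checked as follows. This is a sketch, not the Minitab calculation; `binomial_pmf` is an illustrative helper implementing the Binomial Probability Formula.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Binomial Probability Formula: C(n, x) * p^x * q^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 15, 0.65
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

p_eq_10 = pmf[10]            # P(X = 10)
p_le_3 = sum(pmf[:4])        # P(X <= 3)
p_ge_7 = sum(pmf[7:])        # P(X >= 7)
p_2_to_9 = sum(pmf[2:10])    # P(2 <= X <= 9)

print(round(p_eq_10, 6), round(p_le_3, 6), round(p_ge_7, 6), round(p_2_to_9, 6))
```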
Part 4. Normal Approximation to Binomial Distribution
A. The Normal Approximation can be used if np >= 5 and nq >= 5. Here np = 200*.65 = 130 and nq =
200*.35 = 70, so it can be used.
B. Want P(80 <= X <= 120). Approximate with a normal distribution with mean and standard deviation as
follows:
\[ \mu = np = 130, \qquad \sigma = \sqrt{npq} = \sqrt{45.5} \approx 6.745 \]
Noting that \( z = \frac{x - \mu}{\sigma} \) and applying the continuity correction:
P(79.5 <= X <= 120.5) = P(-7.49 <= z <= -1.41) = 0.0793.
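The normal approximation can be evaluated without a table. The sketch below uses the same mean, standard deviation and continuity correction as part B; it keeps more decimals in z than the table does, so the result agrees with 0.0793 only to about three decimal places.

```python
import math
from statistics import NormalDist

n, p = 200, 0.65
q = 1 - p

# Conditions for using the normal approximation.
can_approximate = n * p >= 5 and n * q >= 5

mu = n * p
sigma = math.sqrt(n * p * q)
approx = NormalDist(mu, sigma)

# P(80 <= X <= 120) with the continuity correction (79.5 to 120.5).
prob = approx.cdf(120.5) - approx.cdf(79.5)

print(can_approximate, mu, round(sigma, 3), round(prob, 4))
```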
Lab 7. Forming Confidence Intervals and performing hypothesis testing, determining the appropriate
sample size for a selected error tolerance and forming a CI for a certain proportion where the population
value is known.
Part 1 Confidence Intervals
A. Since we want a 95% CI, we find z such that 0.025 is to the right (and left) in the standard normal
distribution. From table 4 in Appendix B we see that \( z_c = 1.96 \).
\[ \left( \bar{x} - z_c \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_c \frac{\sigma}{\sqrt{n}} \right) = \left( 1552.3 - 1.96 \cdot \frac{513.8}{6.32},\ 1552.3 + 1.96 \cdot \frac{513.8}{6.32} \right) = (1393, 1712) \]
Note that the error of the estimate is:
\[ E = z_c \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{513.8}{6.32} = 159.3 \]
So I’m 95% confident that the actual mean of SQFT is between the values 1393 and 1712.
B. 99% CI: we find z such that 0.005 is to the right (and left) in the standard normal distribution. From
table 4 in Appendix B we see that \( z_c = 2.575 \). So substituting 2.575 for \( z_c \) in the CI equation
gives (1343, 1762). In this case E = 209.3.
C. The 99% CI is wider than the 95% CI because a wider range is needed to be more confident that it
contains the actual mean.
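Both intervals can be recomputed from the summary statistics. The sketch below is not the Minitab procedure: the helper `z_interval` is an illustrative name, and it takes the critical value from the inverse normal CDF with the tail area split equally between the two sides, so z is about 1.960 for 95% and about 2.576 for 99%.

```python
import math
from statistics import NormalDist

def z_interval(mean, sd, n, confidence):
    """Two-sided z interval: mean -/+ z_c * sd / sqrt(n). Returns (low, high, E)."""
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    error = z_c * sd / math.sqrt(n)
    return mean - error, mean + error, error

# Summary statistics for SQFT from Minitab.
x_bar, s, n = 1552.3, 513.8, 40

ci95 = z_interval(x_bar, s, n, 0.95)
ci99 = z_interval(x_bar, s, n, 0.99)

print([round(v, 1) for v in ci95])
print([round(v, 1) for v in ci99])
```

The 99% interval comes out wider than the 95% interval, since its margin of error uses the larger critical value.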
Part 2: Sample Size Determination
A-B. Solving the formula \( E = z_c \frac{\sigma}{\sqrt{n}} \) for n gives
\( n = \left( \frac{z_c \sigma}{E} \right)^2 \).
Let E = 100, so the CI = (1452, 1652). To obtain this with 95% confidence, we need a sample of size:
\[ n = \left( \frac{1.96 \cdot 513.8}{100} \right)^2 \approx 101.4, \text{ which rounds up to } 102. \]
C. So, with a sample size of 102 I will be 95% confident that my sample mean for SQFT will be within 100
of the actual mean.
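The sample-size formula is easy to wrap in a small function. This is a sketch with an illustrative name (`required_sample_size`); it always rounds up, since rounding down would give a margin of error larger than the target.

```python
import math

def required_sample_size(z_c, sigma, error):
    """Smallest n with z_c * sigma / sqrt(n) <= error."""
    return math.ceil((z_c * sigma / error) ** 2)

# 95% confidence, sigma estimated by s = 513.8, target error E = 100.
n_needed = required_sample_size(1.96, 513.8, 100)

print(n_needed)
```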
Part 3. CI for a Proportion
A. The probability of getting a 2 on the roll of a die is \( p = \frac{1}{6} \approx .1667 \).
B. Using Minitab to simulate rolling a die 200 times, and seeing how many 2’s we get
Calc -> Random data -> Integer. Generate 200 rows of data and specify column to store results in.
Enter a Minimum value of 1 and a Maximum value of 6 Click OK.
To see the counts:
Stat -> Tables -> Tally. Specify the column. Check Counts and Percents. Click OK.
Tally for Discrete Variables: C4
C4  Count  Percent
1   30     15.00
2   34     17.00
3   41     20.50
4   35     17.50
5   33     16.50
6   27     13.50
C. 95% CI for proportion.
\[ z_c = 1.96, \qquad E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96 \sqrt{\frac{.17 \cdot .83}{200}} = 0.052 \]
From the table above, 2 came up 17% of the time, so our 95% CI is (.17 - .052, .17 + .052) =
(0.118, 0.222).
D. The CI contains the true value
E. Since our confidence is 95% we’d expect 95 out of 100 students to create CI’s that contain the true
value.
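The interval in part C and the check in part D can be reproduced from the tally of the 200 simulated rolls. The sketch below uses the counts from the Minitab tally as given and does not re-run the simulation.

```python
import math

# Counts of each face from the Minitab tally of 200 simulated rolls.
counts = {1: 30, 2: 34, 3: 41, 4: 35, 5: 33, 6: 27}

n = sum(counts.values())
p_hat = counts[2] / n
q_hat = 1 - p_hat

# Margin of error and 95% confidence interval for the proportion of 2's.
error = 1.96 * math.sqrt(p_hat * q_hat / n)
ci = (p_hat - error, p_hat + error)

# Does the interval contain the true probability 1/6?
contains_true_value = ci[0] <= 1 / 6 <= ci[1]

print(n, p_hat, round(error, 3), [round(v, 3) for v in ci], contains_true_value)
```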
Part 4: Hypothesis Testing on the Mean
A-C,E Null Hypothesis: Mean SQFT is <= 1500. Alternative: mean is > 1500.
\[ H_0: \mu \le 1500 \qquad H_1: \mu > 1500 \]
Level of significance: \( \alpha = 0.05 \). The alternative hypothesis contains a > inequality. Therefore
we use a right-tailed test (p 326). The critical value is \( z_{.05} = 1.645 \). So if the z value that we
get is greater than this we reject the null hypothesis. We can also find the probability of getting a z
value larger than what we got (the P-value), and if it is less than 0.05 we reject the null hypothesis.
\[ n = 40, \quad \bar{x} = 1552.3, \quad s = 513.8, \quad \sigma_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{513.8}{6.32} = 81.3 \]
\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{1552.3 - 1500}{513.8 / 6.32} = 0.643 \]
Since this is less than the critical value, we cannot reject the null hypothesis.
We can also look at the P-value. It is \( P(z > 0.643) = 0.2611 \). This is greater than the level of
significance (0.05), so we cannot reject the null hypothesis.
D. If we lower the level of significance from 0.05 to 0.01, the critical value becomes more extreme (2.33
instead of 1.645), so it is even harder to get a z greater than the critical value. Equivalently, the
P-value must be even smaller (below 0.01) for us to reject the null hypothesis.
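The test statistic, the P-value and the decision at both significance levels can be computed in a few lines. This is a sketch of the right-tailed z test described above, using the summary statistics for SQFT; the critical values come from the inverse normal CDF rather than from the table.

```python
import math
from statistics import NormalDist

n, x_bar, s, mu_0 = 40, 1552.3, 513.8, 1500

std_error = s / math.sqrt(n)
z = (x_bar - mu_0) / std_error

# Right-tailed P-value: probability of a z at least this large under H0.
p_value = 1 - NormalDist().cdf(z)

decisions = {}
for alpha in (0.05, 0.01):
    critical = NormalDist().inv_cdf(1 - alpha)
    decisions[alpha] = "reject H0" if z > critical else "fail to reject H0"
    print(alpha, round(critical, 3), decisions[alpha])

print(round(std_error, 2), round(z, 2), round(p_value, 3))
```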
Using Minitab to obtain results with 95% confidence level
Stat -> Basic Statistics -> 1 Sample z: Summarized data, Sample size = 40, Mean = 1552.3, Standard
deviation = 513.8, Test mean = 1500. Click Options, Confidence level = 0.95, Alternative greater than.
Click OK, OK
Results:
One-Sample Z
Test of mu = 1500 vs > 1500
The assumed standard deviation = 513.8
N   Mean     SE Mean  95% Lower Bound  Z     P
40  1552.30  81.24    1418.67          0.64  0.260
Note, the 95% Lower Bound is the value of X at the .05 level of the normal distribution with mean =
1552.30 and standard deviation = 81.24. I.e.
\[ x = z_{.05} \sigma_{\bar{x}} + \mu = -1.645 \cdot 81.24 + 1552.30 = 1418.67 \]
Using Minitab to obtain results with 99% confidence level
Procedure is the same as above except use confidence level = 0.99.
Results
One-Sample Z
Test of mu = 1500 vs > 1500
The assumed standard deviation = 513.8
N   Mean     SE Mean  99% Lower Bound  Z     P
40  1552.30  81.24    1363.31          0.64  0.260
Note, the 99% Lower Bound is the value of X at the .01 level of the normal distribution with mean =
1552.30 and standard deviation = 81.24. I.e.
\[ x = z_{.01} \sigma_{\bar{x}} + \mu = -2.326 \cdot 81.24 + 1552.30 \approx 1363.3 \]
Lab 8: Chi-Square
PLAN A: Chi-Square of M&M’s.
\[ \alpha = 0.05, \qquad d.f. = 5 \]
\( \chi^2_{.05} = 11.071 \) (Table 6 Appendix B for chi-square with 5 d.f.). So the rejection region is
\( \chi^2 > 11.071 \).
The data in the observed column is from some left-over M&M’s that I have, so it’s not an official Mars
sample.
Color   Observed  Expected        (O - E)^2 / E
Brown   0         .13*89 = 11.57  11.5700
Yellow  6         .14*89 = 12.46  3.3492
Red     10        .13*89 = 11.57  0.2130
Orange  40        .20*89 = 17.8   27.6876
Green   9         .16*89 = 14.24  1.9282
Blue    24        .24*89 = 21.36  0.3263
Total   89                        \( \chi^2 = 45.0744 \)
Since 45.0744 > 11.071 we can reject the null hypothesis for the distribution of M&M’s
Note we can also check if the P-value is less than the level of significance. From Table 6 Appendix B
we see that the largest number for 5 d.f. is 16.750 and this has a probability of 0.005. Since 45.0744 >
16.750, the P-value is less than 0.005, which is much less than 0.05, and we reject the null hypothesis.
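The goodness-of-fit statistic is a short sum that can be checked in Python. The sketch below uses the observed counts and the expected color proportions from the table above, and compares the result with the critical value 11.071 quoted from Table 6; it does not compute a P-value.

```python
colors = ["Brown", "Yellow", "Red", "Orange", "Green", "Blue"]
observed = [0, 6, 10, 40, 9, 24]
proportions = [0.13, 0.14, 0.13, 0.20, 0.16, 0.24]  # expected color shares

n = sum(observed)
expected = [p * n for p in proportions]

# Chi-square contribution of each color: (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

critical_value = 11.071  # Table 6 Appendix B, 5 d.f., alpha = 0.05
reject_null = chi_square > critical_value

for color, o, e, c in zip(colors, observed, expected, contributions):
    print(color, o, round(e, 2), round(c, 4))
print(round(chi_square, 4), reject_null)
```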
Note we can use Minitab to compute the Chi Square for Goodness of Fit. We enter the data and then use
the Calculator as explained in the class notes.
Sum of C4 = 45.0744
Then we use Calc -> Probability Distributions -> Chi Square, select Cumulative Probability and enter 5
Degrees of Freedom. Enter the value of the test statistic (45.0744) for the Input Constant. Click OK.
The results are:
Cumulative Distribution Function
Chi-Square with 5 DF
x        P( X <= x )
45.0744  1.00000
So the P-value is 1 - P(X <= x), which is 0 to five decimal places.
PLAN B:
PRICE                           NE Corner  Not Northeast Corner  Total
Median Price (887.5) and Under  13         7                     20
Higher than the Median          13         7                     20
Total                           26         14                    40
If PRICE and NE Corner are independent, we’d expect
Freq(PRICE <= Median and NE Corner) = Freq(PRICE <= Median)*Freq(NE Corner)/Total
The following table shows the expected frequencies if the variables are independent
PRICE                    NE Corner      Not Northeast Corner
Median Price and Under   20*26/40 = 13  20*14/40 = 7
Higher than the Median   20*26/40 = 13  20*14/40 = 7
Note that the expected is identical to the observed, so we know they are independent even without applying
the Chi Square test. However, we will do it anyway.
Null Hypothesis: PRICE is independent of NE Corner
Alternative Hypothesis: PRICE and NE Corner are dependent
\[ \alpha = 0.05, \qquad d.f. = (r - 1)(c - 1) = 1 \]
\( \chi^2_{.05} = 3.841 \). So the rejection region is \( \chi^2 > 3.841 \).
\[ \chi^2 = \sum \frac{(O - E)^2}{E} = 0 \]
Since 0 < 3.841 we cannot reject the null hypothesis.
Note another way to solve is to compare the P-value to \( \alpha \).
The P-value is the probability of getting a value at least as extreme as what we got. In our case the
P-value is 1. So this also shows that we can’t reject the null hypothesis.
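The independence test can be sketched from the observed 2 x 2 table alone. The expected counts are row total times column total over the grand total, as above. The P-value line relies on a fact that holds only for 1 degree of freedom: a chi-square variable with 1 d.f. is the square of a standard normal, so its upper tail is erfc(sqrt(x / 2)). That shortcut is an addition for illustration and does not apply to larger tables.

```python
import math

# Observed counts: rows = price group, columns = (NE Corner, Not NE Corner).
observed = [[13, 7],
            [13, 7]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Upper-tail probability for a chi-square with 1 d.f. (valid for 1 d.f. only).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(expected, chi_square, p_value, chi_square > 3.841)
```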
We can use Minitab to compute the Chi Square Independence Test. We enter the data and then choose Stat
-> Tables -> Chi-square Test, On the Chi-square Test dialog box, enter the appropriate columns for the
Columns containing the table and click OK
The results are:
Results for: Worksheet 4
Chi-Square Test: NE, Not NE
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
Row    NE     Not NE  Total
1      13     7       20
       13.00  7.00
       0.000  0.000
2      13     7       20
       13.00  7.00
       0.000  0.000
Total  26     14      40
Chi-Sq = 0.000, DF = 1, P-Value = 1.000
Lab 9: Linear correlation, regression and prediction
1. Let SQFT be the explanatory variable and PRICE be the response variable. Using Minitab to construct a
scatter plot:
Graph -> Scatterplot -> Simple Click OK Enter PRICE for the Y variable and SQFT for the X variable.
Click OK.
The results are:
[Figure: Scatterplot of PRICE vs SQFT. Y-axis: PRICE, 500 to 2250; x-axis: SQFT, 1000 to 3000.]
2. The scatterplot indicates that the variables are correlated.
3. Using Minitab to find the correlation coefficient
Stat -> Basic Statistics -> Correlation. Choose PRICE and SQFT for the Variables and click OK
The results are:
Results for: ALBHOMEPRICESLAB.MTW
Scatterplot of PRICE vs SQFT
Correlations: PRICE, SQFT
Pearson correlation of PRICE and SQFT = 0.910
P-Value = 0.000
4. Hypothesis test
Two tailed test
\[ H_0: \rho = 0 \qquad H_a: \rho \ne 0 \qquad \alpha = 0.05 \]
Comparing the P-value above to the level of significance, \( \alpha \), we conclude that the correlation
is significant.
We can also use Table 11, Appendix B. This shows that for a sample of size 40, if \( |r| > 0.312 \),
reject \( H_0 \). Since \( r = 0.910 \), we reject \( H_0 \).
5. Using Minitab to determine the regression equation
Stat -> Regression -> Regression. Choose the Predictor, SQFT and the response, PRICE, click on Results
and choose Regression equation, table of coefficients, s, R-squared and basic analysis of variance. Click
OK, then click on Storage and choose Residuals. Click OK, OK.
The results are:
Regression Analysis: PRICE versus SQFT
The regression equation is
PRICE = - 96.3 + 0.719 SQFT
Predictor  Coef     SE Coef  T      P
Constant   -96.35   86.77    -1.11  0.274
SQFT       0.71887  0.05314  13.53  0.000
S = 170.511   R-Sq = 82.8%   R-Sq(adj) = 82.4%
Analysis of Variance
Source          DF  SS       MS       F       P
Regression      1   5321578  5321578  183.04  0.000
Residual Error  38  1104810  29074
Total           39  6426388
From the above, we see that the regression equation is:
PRICE = - 96.3 + 0.719 SQFT.
Since the correlation is significant, we can use this equation for predictive purposes.
6. The slope is 0.719. Since PRICE is in hundreds of dollars, this means that the price increases by
about $71.90 per square foot.
7. The y-intercept is -96.3. This means that if SQFT = 0, PRICE = -$9,630. Since we can’t have a house
with 0 SQFT, this is meaningless.
8.
SQFT  PRICE
1000  622.7 ($62,270)
2000  1341.7 ($134,170)
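The correlation coefficient, the regression line and the two predictions can be recomputed from the sampled data with the usual least-squares formulas. This is a sketch, not Minitab's routine: it pairs the SQFT and PRICE columns row by row as listed in Lab 1, and the prediction uses the unrounded coefficients, so it can differ from the hand calculation by a few tenths.

```python
import math

# SQFT and PRICE ($hundreds) for the 40 sampled homes, paired row by row.
sqft = [1580, 970, 1007, 1159, 1156, 1920, 1275, 1030, 1500, 1464,
        2650, 1080, 1920, 1083, 2664, 1630, 1229, 1478, 1142, 1710,
        1173, 2848, 1027, 1880, 1740, 1733, 1258, 1500, 1540, 1198,
        2180, 1720, 1365, 997, 1350, 1140, 1505, 1430, 2931, 1928]
price = [945, 739, 729, 660, 876, 1560, 755, 750, 995, 899,
         2050, 780, 1050, 710, 2150, 1045, 872, 1020, 540, 1449,
         875, 2150, 730, 1270, 1109, 749, 805, 975, 1030, 600,
         1250, 1160, 810, 759, 670, 725, 700, 975, 1695, 1170]

n = len(sqft)
mean_x = sum(sqft) / n
mean_y = sum(price) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - mean_x) ** 2 for x in sqft)
s_yy = sum((y - mean_y) ** 2 for y in price)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, price))

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
r = s_xy / math.sqrt(s_xx * s_yy)

def predict_price(square_feet):
    """Predicted PRICE in $hundreds from the fitted line."""
    return intercept + slope * square_feet

print(round(intercept, 2), round(slope, 5), round(r, 3))
print(round(predict_price(1000), 1), round(predict_price(2000), 1))
```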
Using Minitab to draw the regression line:
Stat -> Regression -> Fitted line plot, choose PRICE for the Response (Y) and SQFT for the
PREDICTOR (X), select Linear for the Type of Regression Model, and click OK.
[Figure: Fitted Line Plot. PRICE = - 96.35 + 0.7189 SQFT; S = 170.511, R-Sq = 82.8%, R-Sq(adj) = 82.4%.
Y-axis: PRICE, 500 to 2250; x-axis: SQFT, 1000 to 3000.]
9. I predict that a 2000 square foot house will sell for about $134,170.
10. Summary: Price increases about $71.90 per square foot.
Lab 10. Summary
Part 1
V2 is PRICE
Mean = 1019.5 Median = 887.5
Median is better because it is not affected by abnormally large values.
Part 2
Lab 4 indicated that there is no relation between being in the NE corner and the PRICE.
Since I don’t know about Albuquerque, I didn’t know whether there would be a relationship.
Part 3
95% CI for SQFT is (1393, 1712).
99% CI for SQFT is (1343, 1762).
The 99% CI is wider than the 95% CI because a wider range is needed to be more confident that it contains
the actual mean.
Part 4.
There is a significant relationship between SQFT and PRICE. The high correlation (or low P-value)
indicates this.
PRICE increases about $71.9 per SQFT.
Part 5.
This population is very good because it demonstrated most of the concepts well.
Lab 9, correlation and regression was very instructive because it demonstrated the relationship between
SQFT and PRICE.