Math 131 Labs

Working with Albuquerque Home data.
The Qualitative Data is NE (V1) = Located in northeast sector of city (1) or not (0).
The Quantitative Data are PRICE (V2) = Selling price ($hundreds) and SQFT (V3) = Square feet of living space.

Lab 1: Sample the data.
Select a random sample of 40 from the 117 cases.
Calc -> Random Data -> Sample from Columns. Sample 40 rows from columns: select all the columns (PRICE SQFT AGE FEATS NE CUST COR TAX). Store samples in: type C1 C2 C3 C4 C5 C6 C7 C8.

The sampled values, listed column by column in row order (* marks a missing value):
PRICE: 945 739 729 660 876 1560 755 750 995 899 2050 780 1050 710 2150 1045 872 1020 540 1449 875 2150 730 1270 1109 749 805 975 1030 600 1250 1160 810 759 670 725 700 975 1695 1170
SQFT: 1580 970 1007 1159 1156 1920 1275 1030 1500 1464 2650 1080 1920 1083 2664 1630 1229 1478 1142 1710 1173 2848 1027 1880 1740 1733 1258 1500 1540 1198 2180 1720 1365 997 1350 1140 1505 1430 2931 1928
AGE: 9 4 19 * * 1 * * 15 * 13 * 8 22 6 6 6 53 * 1 6 4 * 8 4 43 7 7 6 * 17 5 * 4 * * * * 28 18
FEATS: 3 4 6 0 1 5 5 1 4 2 7 3 4 4 5 4 3 3 0 3 4 6 3 6 3 6 4 3 2 4 4 4 2 4 2 3 2 3 3 8
NE: 0 0 1 0 1 1 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 0 1 1 0 0 0 1 0 1 1 1 1 0 1 1 1
CUST: 0 0 0 0 0 1 0 0 0 1 1 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
COR: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 0
TAX: 810 541 513 225 * 1161 * 486 743 566 1639 600 944 504 1193 750 721 626 223 1010 456 1487 427 930 816 656 821 700 826 * 1141 867 673 461 622 490 591 752 1142 600

Lab 2: Organize the data into frequency distributions and graphs.
A. Preparing a frequency and relative frequency distribution for the qualitative data.
Stat -> Tables -> Tally Individual Variables. Choose Count and Percent. The result presented in the Session Window is:

Tally for Discrete Variables: NE
NE    Count   Percent
0        14     35.00
1        26     65.00
N=       40

B. Preparing a Pie Chart
Graph -> Pie Chart. Choose Chart Raw Data. Choose NE as the Categorical Variable.
Click Labels, select Slice Labels, and check off Category name and Percent. Click OK twice.

[Figure: Pie Chart of NE. Category 0 = 35.0%, category 1 = 65.0%.]

C. Prepare a Pareto Chart (Bar Chart)
Graph -> Bar Chart. Bars represent Counts of Unique Values. Choose Simple Table. Click OK. Choose NE as the Categorical variable.

[Figure: Chart of NE. Count (y-axis) by NE category (x-axis).]

D. Both charts give a good idea of the relative size of each category.
E. The NE sector has more sales (26 of the 40 sampled homes).

Part 2: Quantitative Data
Relative Frequency Histogram for PRICE
Graph -> Histogram. Select Simple, click OK, and select PRICE. Click on Scale and, under the tab Y-scale Type, choose Percent. Click OK. To include the values on the graph, click on the Data Labels tab and select Use y-value labels. Click OK, OK.

[Figure: Histogram of PRICE. Percent (y-axis) vs. PRICE (x-axis); the tallest bar is labeled 42.5%.]

Part 3: Quantitative Data
Frequency Histogram for SQFT
Graph -> Histogram. Select Simple, click OK, and select SQFT. To include the values on the graph, click on the Data Labels tab and select Use y-value labels. Click on Scale and, under the tab Y-scale Type, choose Frequency. Click OK.

[Figure: Histogram of SQFT. Frequency (y-axis) vs. SQFT (x-axis); the tallest bar is labeled 11.]

C. Prepare a frequency polygon using class midpoints (Note: I just let Minitab choose the points.)
Graph -> Histogram. Select Simple, click OK, and select SQFT. To change the title to indicate Polygon, click on the Labels button and, under the Titles/Footnotes tab, set the title to ‘Polygon of SQFT’. To include the values on the graph, click on the Data Labels tab and select Use y-value labels. To make a polygon instead of a histogram, click on the Data View button. On the Data Display tab, remove the check mark from Bars and place a check mark on Symbols. Under the Smoother tab, choose Lowess for Smoother, and make the Degree of smoothing 0 and the Number of steps 1. Then click OK twice.

[Figure: Polygon of SQFT. Frequency (y-axis) vs. SQFT (x-axis); same frequencies as the SQFT histogram.]

D.
Prepare an ogive using upper class boundaries.
Place the SQFT data in column C1 of a new Worksheet, then sort the data to make it easier to find the frequencies in each class. Data -> Sort -> Sort column C1 by Column C1. Choose Store sorted data in original column.

970, 997, 1007, 1027, 1030, 1080, 1083, 1140, 1142, 1156, 1159, 1173, 1198, 1229, 1258, 1275, 1350, 1365, 1430, 1464, 1478, 1500, 1500, 1505, 1540, 1580, 1630, 1710, 1720, 1733, 1740, 1880, 1920, 1920, 1928, 2180, 2650, 2664, 2848, 2931

Dividing the range by the number of classes: $(2931 - 970)/9 = 1961/9 \approx 217.9$. Rounding up gives a class width of 218.

Class         Class Boundaries    Midpoint   Freq   Rel freq   Cum freq   Cum rel freq
 970 - 1187    969.5 - 1187.5       1079      12     0.30        12         0.30
1188 - 1405   1187.5 - 1405.5       1297       6     0.15        18         0.45
1406 - 1623   1405.5 - 1623.5       1515       8     0.20        26         0.65
1624 - 1841   1623.5 - 1841.5       1733       5     0.125       31         0.775
1842 - 2059   1841.5 - 2059.5       1951       4     0.10        35         0.875
2060 - 2277   2059.5 - 2277.5       2169       1     0.025       36         0.90
2278 - 2495   2277.5 - 2495.5       2387       0     0           36         0.90
2496 - 2713   2495.5 - 2713.5       2605       2     0.05        38         0.95
2714 - 2931   2713.5 - 2931.5       2823       2     0.05        40         1.00

To use Minitab to plot the ogive, in a new worksheet make the Class Boundaries column C1 and the Cum rel freq column C2 as follows:

C1        C2
 969.5    0.000
1187.5    0.300
1405.5    0.450
1623.5    0.650
1841.5    0.775
2059.5    0.875
2277.5    0.900
2495.5    0.900
2713.5    0.950
2931.5    1.000

Then select Graph -> Scatterplot -> With Connect Line. Select C2 for the Y-variable and C1 for the X-variable. Click on the Data View button and be sure that both Symbols and Connect line are selected. With both selected, Minitab will connect the dots at each data point on the graph. Click on Labels and title the ogive ‘Ogive of SQFT’. To label the points, click the Data Labels tab and choose Use y-value labels. Click OK.
After the graph is created, it should be edited to show each upper class boundary. Right-click on the X-axis of the graph and select Edit X scale. Enter the Position of ticks as 969.5: 2931.5/218. This tells Minitab that the tick marks should start at 969.5 and go up in steps of 218.
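The class tally above can be cross-checked outside Minitab. The following is a minimal Python sketch, not part of the original lab: the function name frequency_table and its return layout are illustrative assumptions. It rebuilds the nine classes from the 40 SQFT values and prints the frequencies and cumulative relative frequencies used for the ogive.

```python
# SQFT values of the 40 sampled homes (from Lab 1).
sqft = [
    1580, 970, 1007, 1159, 1156, 1920, 1275, 1030, 1500, 1464,
    2650, 1080, 1920, 1083, 2664, 1630, 1229, 1478, 1142, 1710,
    1173, 2848, 1027, 1880, 1740, 1733, 1258, 1500, 1540, 1198,
    2180, 1720, 1365, 997, 1350, 1140, 1505, 1430, 2931, 1928,
]


def frequency_table(data, n_classes):
    """Hypothetical helper: return (width, rows) for integer data.

    Each row is (lower_limit, upper_limit, freq, cum_freq, cum_rel_freq).
    """
    lo, hi = min(data), max(data)
    # Class width = range / number of classes, rounded up (1961 / 9 -> 218).
    width = -(-(hi - lo) // n_classes)
    freqs = [0] * n_classes
    for x in data:
        # Clamp so the maximum always falls in the last class.
        idx = min((x - lo) // width, n_classes - 1)
        freqs[idx] += 1
    rows, cum = [], 0
    for i, f in enumerate(freqs):
        cum += f
        lower = lo + i * width
        rows.append((lower, lower + width - 1, f, cum, cum / len(data)))
    return width, rows


width, rows = frequency_table(sqft, 9)
for lower, upper, f, cum, cum_rel in rows:
    print(f"{lower:5d} - {upper:5d}  freq={f:2d}  cum={cum:2d}  cum_rel={cum_rel:.3f}")
```

Plotting the upper class boundaries (upper limit + 0.5) against the last column gives the same points that are entered into C1 and C2.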
The results are:

[Figure: Ogive of SQFT. Cum rel freq (y-axis) vs. Class Boundaries (x-axis); points labeled 0.000, 0.300, 0.450, 0.650, 0.775, 0.875, 0.900, 0.900, 0.950, 1.000 at boundaries 969.5 through 2931.5.]

E. The graphs show that homes of around 1200 square feet are the most common in the sample.

Lab 3: Finding descriptive statistics for the quantitative data.
A-D. Stat -> Basic Statistics -> Display Descriptive Statistics. Choose PRICE, click on Statistics, and choose Mean, Median, Range and Standard deviation. Note that Minitab does not provide the Mode.

Descriptive Statistics: PRICE
Variable    N   N*     Mean   StDev   Median    Range
PRICE      40    0   1019.5   405.9    887.5   1610.0

E. List the five-number summary and make the box-and-whisker plot.
Stat -> Basic Statistics -> Display Descriptive Statistics. Choose PRICE, click on Statistics, and choose Minimum, Maximum, First Quartile, Median, Third Quartile and Interquartile Range.

Descriptive Statistics: PRICE
Variable    N   N*   Minimum      Q1   Median       Q3   Maximum     IQR
PRICE      40    0     540.0   741.5    887.5   1147.3    2150.0   405.8

Click on Graph -> Boxplot and select Simple boxplot. Click on OK. Select PRICE for the Graph variable. To view a horizontal boxplot (rather than a vertical one), click on Scale and select Transpose value and category scales. Click on OK twice.

[Figure: Boxplot of PRICE, horizontal, PRICE axis from 500 to 2250.]

Calc -> Standardize. Specify PRICE as the input column. Store results in C9 (an empty column). Choose Subtract mean and divide by std dev. The results for each PRICE are given. The highest PRICE (2150) has a standard score of 2.78490. The mean has a standard score of 0, and the lowest PRICE (540) has a standard score of -1.18130.

Part 2: A.
Estimating the mean for the SQFT variable using the technique from the text, p 62:

$\bar{x} = \frac{\sum (x \cdot f)}{n} = \frac{1000 \cdot 7 + 1250 \cdot 11 + 1500 \cdot 8 + 1750 \cdot 5 + 2000 \cdot 4 + 2250 \cdot 1 + 2750 \cdot 3 + 3000 \cdot 1}{40} = 1575$

From p 78 (Note: on p 78 the x's are specific values, so the formula is exact; on p 79 the x's are the midpoints of classes, so the formula is an approximation):

$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$

Substituting x = 1000, 1250, ..., 3000 and $\bar{x} = 1575$ gives $s = 531.7$.

B. Chebychev's Theorem: $P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}$.
For SQFT, the mean is about 1575 and the standard deviation is about 532. Thus 2 standard deviations on either side of the mean goes from 511 to 2639. With k = 2, Chebychev's Theorem says that at least 75% of the data should lie between these two values. If we look at the histogram for SQFT in Lab 2 Part 3, we see that this is easily the case.

Lab 4: Finding simple probabilities, conditional probabilities and using the Multiplication and Addition Rules.

Part 1
                                                 NE Corner   Not Northeast Corner   Total
Median Price (887.5, i.e. $88,750) and Under         13               7              20
Higher than the Median                               13               7              20
Total                                                26              14              40

Part 2
A. Probability that Price is less than or equal to the median = 20/40 = 0.5
B. Probability that Price is greater than the median = 20/40 = 0.5
C. Probability of being in the NE Corner = 26/40 = 0.65
D. Probability of being in NE Corner and less than or equal to the median = 13/40 = 0.325

Part 3
A. Probability that Price is less than or equal to the median given home is in NE corner = 13/26 = 0.5
B. Let A be NE Corner and B be <= median: P(B|A) = P(B), so the events are independent.
C. Probability of being in NE Corner given Price is higher than the median = 13/20 = 0.65
D. With B' the event that Price is higher than the median, P(A|B') = P(A), so the events are independent.

Part 4
Probability that two homes chosen without replacement are both in the NE Corner = P(first in NE) * P(second in NE | first in NE) = (26/40)*(25/39) = 0.65*0.641 = 0.4167

Part 5
A. P(A or B) = P(A) + P(B) - P(A and B) = 0.65 + 0.5 - 0.325 = 0.825
B. P(B) + P(B') = 1
C. Two mutually exclusive events are being in row 1 and being in row 2; likewise being in column 1 and being in column 2.
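The Lab 4 probabilities can be reproduced directly from the four cell counts. This is a minimal Python sketch, not part of the original lab; the dictionary keys and variable names are illustrative assumptions. Exact fractions are used so that the independence check P(A and B) = P(A)P(B) is not affected by rounding.

```python
from fractions import Fraction

# Cell counts from the Part 1 table: (price group, sector) -> number of homes.
counts = {
    ("low", "NE"): 13, ("low", "other"): 7,
    ("high", "NE"): 13, ("high", "other"): 7,
}
n = sum(counts.values())  # 40 homes in the sample

ne = counts[("low", "NE")] + counts[("high", "NE")]      # 26 homes in NE
low = counts[("low", "NE")] + counts[("low", "other")]   # 20 homes at or below the median

p_ne = Fraction(ne, n)                                   # Part 2 C
p_low = Fraction(low, n)                                 # Part 2 A
p_low_and_ne = Fraction(counts[("low", "NE")], n)        # Part 2 D
p_low_given_ne = p_low_and_ne / p_ne                     # Part 3 A
independent = p_low_and_ne == p_low * p_ne               # Part 3 B
p_ne_or_low = p_ne + p_low - p_low_and_ne                # Part 5 A (Addition Rule)
# Part 4: two homes drawn without replacement, both in NE (Multiplication Rule).
p_two_ne = Fraction(ne, n) * Fraction(ne - 1, n - 1)

print(float(p_ne), float(p_low), float(p_low_and_ne))
print(float(p_low_given_ne), independent)
print(float(p_ne_or_low), round(float(p_two_ne), 4))
```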
Lab 5: Standardizing data, computing probabilities using the standard normal distribution, and finding values given probabilities.

The mean and standard deviation for the variable SQFT were estimated using the techniques on p 62 and 79 to be $\bar{x} = 1575$, $s = 531.7$.

Using Minitab to get the exact values: Stat -> Basic Statistics -> Display Descriptive Statistics. Select SQFT, click Statistics, and choose Mean and Standard Deviation.

Results for: ALBHOMEPRICESLAB.MTW
Descriptive Statistics: SQFT
Variable     Mean   StDev
SQFT       1552.3   513.8

The values are slightly different because the technique used in Lab 3 Part 2 is an estimate based on the midpoints of the classes.

Sort the SQFT data: Data -> Sort -> Sort Columns: select SQFT; Sort by: select SQFT; Store sorted data in New worksheet. Click OK.
To compute the z-score in Minitab: Calc -> Standardize. Input column is SQFT. Store results in C2. Click on Subtract mean and divide by std. dev., then click OK.

Line Number   SQFT        z-score
 1            L = 970     -1.13312
20            M = 1464    -0.17174
40            H = 2931     2.68318

Data that goes with the following z-scores:

z-score   SQFT
-2.50     None that small
 3.20     None that large
 0        1540 is nearest 0
 0.5      1740 is nearest 0.5

Normal curve for SQFT (mean 1552.3, standard deviation 513.8). Arrows locate L = 970, M = 1464, and H = 2931 on it, based on p in Table 4 Appendix B for the corresponding z-score: L = 970, z = -1.13312 (p = 0.13); M = 1464, z = -0.17174 (p = 0.43); H = 2931, z = 2.68318 (p = .996).

P(L < X < H) = P(-1.13312 < Z < 2.68318) = .9963 - .1292 = 0.8671
P(L < X < M) = P(-1.13312 < Z < -0.17174) = .4325 - .1292 = 0.3033
P(M < X < H) = P(-0.17174 < Z < 2.68318) = .9963 - .4325 = 0.5638
P(X < L) = P(Z < -1.13312) = 0.1292
P(X > M) = P(Z > -0.17174) = 1 - .4325 = 0.5675
P(X < H) = P(Z < 2.68318) = 0.9963

Sketch of Normal curve with the area for the lowest 10% marked by a line, determined by finding $z_{.10}$ in Table 4 Appendix B and translating to x as shown:
$z_{.10} = -1.28$, $x = \mu + z\sigma = 1552.3 - 1.28 \cdot 513.8 = 894.6$.
10% of my population are below 894.6.

Sketch of Normal curve with the area for the highest 5% shaded, determined by finding $z_{.95}$ in Table 4 Appendix B and translating to x as shown:
$z_{.95} = 1.645$, $x = \mu + z\sigma = 1552.3 + 1.645 \cdot 513.8 = 2397.5$.
5% of my population are above 2397.5.

Lab 6: Define a binomial experiment and find binomial probabilities by three different methods.
A. Success is NE
B. Failure is not NE
C. P(NE) = 0.65
D. P(not NE) = 0.35

Part 2: Can use a tree diagram as on page 176 or the Binomial Probability Formula, also on p 176.

Tree (three trials, with p = P(NE) and q = P(not NE)):
NE, NE, NE                  $p^3$
NE, NE, Not NE              $p^2 q$
NE, Not NE, NE              $p^2 q$
NE, Not NE, Not NE          $p q^2$
Not NE, NE, NE              $p^2 q$
Not NE, NE, Not NE          $p q^2$
Not NE, Not NE, NE          $p q^2$
Not NE, Not NE, Not NE      $q^3$

Binomial Probability Formula: $P(x) = \binom{n}{x} p^x q^{n-x}$

Either way, the results are:

x    P(x)        Result for p = 0.65
0    $q^3$       0.042875
1    $3pq^2$     0.238875
2    $3p^2q$     0.443625
3    $p^3$       0.274625

Part 3
A. Create a table with n = 15 trials: Calc -> Probability Distributions -> Binomial -> select Probability and enter 15 for the Number of Trials, .65 for the Probability of Success, and the Input Column.

Probability Density Function
Binomial with n = 15 and p = 0.65
 x   P( X = x )
 0   0.000000
 1   0.000004
 2   0.000052
 3   0.000422
 4   0.002353
 5   0.009612
 6   0.029751
 7   0.071037
 8   0.131926
 9   0.190560
10   0.212339
11   0.179247
12   0.110962
13   0.047555
14   0.012617
15   0.001562

B. P(X = 10) = 0.212339
P(X <= 3) = 0.000478
P(X >= 7) = 0.957805
P(2 <= X <= 9) = 0.435713

Part 4: Normal Approximation to Binomial Distribution
A. The Normal Approximation can be used if np >= 5 and nq >= 5. Here, with n = 200, np = 200*.65 = 130 and nq = 200*.35 = 70, so it can be used.
B. Want P(80 <= X <= 120). Approximate with a normal distribution with mean and standard deviation as follows:
$\mu = np = 130$, $\sigma = \sqrt{npq} = 6.745$
Noting that $z = \frac{x - \mu}{\sigma}$ and applying the continuity correction, P(79.5 <= X <= 120.5) = P(-7.49 <= z <= -1.41) = 0.0793.

Lab 7: Forming Confidence Intervals and performing hypothesis testing, determining the appropriate sample size for a selected error tolerance and forming a CI for a certain proportion where the population value is known.
Part 1: Confidence Intervals
A. Since we want a 95% CI, we find z such that 0.025 is to the right (and left) in the standard normal distribution. From Table 4 in Appendix B we see that $z_c = 1.96$.

$\left(\bar{x} - z_c \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_c \frac{\sigma}{\sqrt{n}}\right) = \left(1552.3 - 1.96 \cdot \frac{513.8}{6.32},\ 1552.3 + 1.96 \cdot \frac{513.8}{6.32}\right) = (1393, 1712)$

Note that the error of the estimate is: $E = z_c \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{513.8}{6.32} = 159.3$

So I'm 95% confident that the actual mean of SQFT is between the values 1393 and 1712.
B. 99% CI: we find z such that 0.005 is to the right (and left) in the standard normal distribution. From Table 4 in Appendix B we see that $z_c = 2.575$. Substituting 2.575 for $z_c$ in the CI equation gives (1343, 1762). In this case E = 209.3.
C. The 99% CI is wider than the 95% CI because a higher level of confidence requires a wider range.

Part 2: Sample Size Determination
A-B. Solving the formula $E = z_c \frac{\sigma}{\sqrt{n}}$ for n gives $n = \left(\frac{z_c \sigma}{E}\right)^2$.
Let E = 100, so the CI = (1452, 1652). To obtain this with 95% confidence, we need a sample of size:
$n = \left(\frac{1.96 \cdot 513.8}{100}\right)^2 \approx 101.4$, which rounds up to 102.
C. So, with a sample size of 102 I will be 95% confident that my sample mean for SQFT will be within 100 of the actual mean.

Part 3: CI for a Proportion
A. The probability of getting a 2 on the roll of a die is $p = \frac{1}{6} \approx .1667$.
B. Using Minitab to simulate rolling a die 200 times and seeing how many 2's we get:
Calc -> Random Data -> Integer. Generate 200 rows of data and specify the column to store the results in. Enter a Minimum value of 1 and a Maximum value of 6. Click OK.
To see the counts: Stat -> Tables -> Tally. Specify the column. Check Counts and Percents. Click OK.

Tally for Discrete Variables: C4
C4   Count   Percent
1       30     15.00
2       34     17.00
3       41     20.50
4       35     17.50
5       33     16.50
6       27     13.50

C. 95% CI for proportion:
$z_c = 1.96$, $E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96 \sqrt{\frac{.17 \cdot .83}{200}} = 0.052$
From the table above, 2 came up 17% of the time, so our 95% CI is (.17 - .052, .17 + .052) = (0.118, 0.222).
D. The CI contains the true value (0.1667).
E.
Since our confidence is 95%, we'd expect about 95 out of 100 students to create CI's that contain the true value.

Part 4: Hypothesis Testing on the Mean
A-C, E. Null Hypothesis: mean SQFT is <= 1500. Alternative Hypothesis: mean SQFT is > 1500.
$H_0: \mu \le 1500$
$H_1: \mu > 1500$
Level of significance: $\alpha = 0.05$.
The Alternative Hypothesis contains a > inequality. Therefore we use a right-tailed test (p 326). The critical value is $z_{.05} = 1.645$. So if the z value that we get is greater than this, we reject the null hypothesis. We can also find the probability of getting a z value larger than what we got (the P-value), and if it is less than 0.05 we reject the null hypothesis.

$n = 40$, $\bar{x} = 1552.3$, $s = 513.8$, $\sigma_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{513.8}{6.32} = 81.3$

$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{1552.3 - 1500}{513.8/6.32} = 0.643$

Since this is less than the critical value, we cannot reject the null hypothesis. We can also look at the P-value. It is $P(z > 0.643) = 0.2611$. This is greater than the level of significance (0.05), so we cannot reject the null hypothesis.
D. If we lower the level of significance from 0.05 to 0.01, the critical value becomes more extreme, so it is even harder to get z greater than the critical value. Equivalently, the P-value would have to be even smaller to reject the null hypothesis.

Using Minitab to obtain results with 95% confidence level:
Stat -> Basic Statistics -> 1 Sample z: Summarized data, Sample size = 40, Mean = 1552.3, Standard deviation = 513.8, Test mean = 1500. Click Options, Confidence level = 0.95, Alternative greater than. Click OK, OK.

Results:
One-Sample Z
Test of mu = 1500 vs > 1500
The assumed standard deviation = 513.8

 N      Mean   SE Mean   95% Lower Bound      Z       P
40   1552.30     81.24           1418.67   0.64   0.260

Note: the 95% Lower Bound is the value of X at the .05 level of the normal distribution with mean = 1552.30 and standard deviation = 81.24, i.e.
$\bar{x} - z_{.05}\,\sigma_{\bar{x}} = 1552.30 - 1.645 \cdot 81.24 \approx 1418.7$

Using Minitab to obtain results with 99% confidence level:
Procedure is the same as above except use confidence level = 0.99.
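The right-tailed z test and the one-sided lower bounds can also be checked outside Minitab. This is a minimal Python sketch, assuming Python 3.8 or later for statistics.NormalDist; the function name right_tailed_z_test is an illustrative assumption, not a Minitab or textbook routine. It uses the unrounded critical values, so the bounds can differ slightly from hand calculations done with 1.645 or 2.33.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_z_test(xbar, mu0, sigma, n, confidence):
    """Hypothetical helper for H0: mu <= mu0 vs H1: mu > mu0 with known sigma.

    Returns (standard error, z statistic, P-value, one-sided lower bound).
    """
    std_normal = NormalDist()              # mean 0, standard deviation 1
    se = sigma / sqrt(n)                   # standard error of the mean
    z = (xbar - mu0) / se                  # test statistic
    p_value = 1.0 - std_normal.cdf(z)      # area to the right of z
    # One-sided lower confidence bound: xbar - z_c * se.
    lower_bound = xbar - std_normal.inv_cdf(confidence) * se
    return se, z, p_value, lower_bound


se, z, p_value, lower95 = right_tailed_z_test(1552.3, 1500, 513.8, 40, 0.95)
_, _, _, lower99 = right_tailed_z_test(1552.3, 1500, 513.8, 40, 0.99)

print(f"SE = {se:.2f}, z = {z:.2f}, P = {p_value:.3f}")
print(f"95% lower bound = {lower95:.2f}, 99% lower bound = {lower99:.2f}")
print("reject H0 at 0.05" if p_value < 0.05 else "cannot reject H0 at 0.05")
```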
Results for the 99% confidence level:
One-Sample Z
Test of mu = 1500 vs > 1500
The assumed standard deviation = 513.8

 N      Mean   SE Mean   99% Lower Bound      Z       P
40   1552.30     81.24           1363.31   0.64   0.260

Note: the 99% Lower Bound is the value of X at the .01 level of the normal distribution with mean = 1552.30 and standard deviation = 81.24, i.e.
$\bar{x} - z_{.01}\,\sigma_{\bar{x}} = 1552.30 - 2.33 \cdot 81.24 \approx 1363.0$, which agrees with Minitab's 1363.31 up to the rounding of $z_{.01}$.

Lab 8: Chi-Square
PLAN A: Chi-Square of M&M's.
$\alpha = 0.05$, d.f. = 5, $\chi^2_{.05} = 11.071$ (Table 6 Appendix B for chi-square with 5 d.f.)
So the rejection region is $\chi^2 > 11.071$.
The data in the Observed column is from some left-over M&M's that I have, so it's not an official Mars sample.

Color     Observed   Expected           $(O - E)^2 / E$
Brown         0      .13*89 = 11.57     11.5700
Yellow        6      .14*89 = 12.46      3.3492
Red          10      .13*89 = 11.57      0.2130
Orange       40      .20*89 = 17.8      27.6876
Green         9      .16*89 = 14.24      1.9282
Blue         24      .24*89 = 21.36      0.3263
Total        89                         45.0744

Since 45.0744 > 11.071, we can reject the null hypothesis for the distribution of M&M's.
Note that we can also check whether the P-value is less than the level of significance. From Table 6 Appendix B we see that the largest number for 5 d.f. is 16.750, and this has a probability of 0.005. Since 45.0744 > 16.750, the P-value is less than 0.005, which is much less than 0.05, and we reject the null hypothesis.
Note that we can use Minitab to compute the Chi-Square for Goodness of Fit. We enter the data and then use the Calculator as explained in the class notes.

Sum of C4
Sum of C4 = 45.0744

Then we use Calc -> Probability Distributions -> Chi Square, select Cumulative Probability and enter 5 Degrees of Freedom. Enter the value of the test statistic (45.0744) for the Input Constant.
Click OK. The results are:

Cumulative Distribution Function
Chi-Square with 5 DF
      x   P( X <= x )
45.0744       1.00000

The P-value is 1 - P(X <= x), which is 0 to five decimal places.

PLAN B: PRICE
                                                 NE Corner   Not Northeast Corner   Total
Median Price (887.5, i.e. $88,750) and Under         13               7              20
Higher than the Median                               13               7              20
Total                                                26              14              40

If PRICE and NE Corner are independent, we'd expect
Freq(PRICE <= Median and NE Corner) = Freq(PRICE <= Median)*Freq(NE Corner)/Total
The following table shows the expected frequencies if the variables are independent:

                           NE Corner         Not Northeast Corner
Median Price and Under     20*26/40 = 13     20*14/40 = 7
Higher than the Median     20*26/40 = 13     20*14/40 = 7

Note that the expected frequencies are identical to the observed ones, so the sample shows no evidence of dependence even before applying the Chi-Square test. However, we will do it anyway.
Null Hypothesis: PRICE is independent of NE Corner
Alternative Hypothesis: PRICE and NE Corner are dependent
$\alpha = 0.05$, d.f. = $(r - 1)(c - 1) = 1$, $\chi^2_{.05} = 3.841$. So the rejection region is $\chi^2 > 3.841$.

$\chi^2 = \sum \frac{(O - E)^2}{E} = 0$

Since 0 < 3.841, we cannot reject the null hypothesis.
Note that another way to solve this is to compare the P-value to $\alpha$. The P-value is the probability of getting a more extreme value than what we got. In our case the P-value is 1, so this also shows that we can't reject the null hypothesis.
We can use Minitab to compute the Chi-Square Independence Test. We enter the data and then choose Stat -> Tables -> Chi-square Test. On the Chi-square Test dialog box, enter the appropriate columns for the Columns containing the table and click OK. The results are:

Results for: Worksheet 4
Chi-Square Test: NE, Not NE
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

            NE   Not NE   Total
    1       13        7      20
         13.00     7.00
         0.000    0.000

    2       13        7      20
         13.00     7.00
         0.000    0.000

Total       26       14      40

Chi-Sq = 0.000, DF = 1, P-Value = 1.000

Lab 9: Linear correlation, regression and prediction
1. Let SQFT be the explanatory variable and PRICE be the response variable. Using Minitab to construct a scatter plot: Graph -> Scatterplot -> Simple. Click OK. Enter PRICE for the Y variable and SQFT for the X variable. Click OK.
The results are:

[Figure: Scatterplot of PRICE vs SQFT. PRICE (y-axis, 500 to 2250) vs. SQFT (x-axis, 1000 to 3000).]

2. The scatterplot indicates that the variables are correlated.
3. Using Minitab to find the correlation coefficient: Stat -> Basic Statistics -> Correlation. Choose PRICE and SQFT for the Variables and click OK. The results are:

Results for: ALBHOMEPRICESLAB.MTW
Scatterplot of PRICE vs SQFT
Correlations: PRICE, SQFT
Pearson correlation of PRICE and SQFT = 0.910
P-Value = 0.000

4. Hypothesis test (two-tailed test):
$H_0: \rho = 0$
$H_a: \rho \ne 0$
$\alpha = 0.05$
Comparing the P-value above to the level of significance, $\alpha$, we conclude that the correlation is significant. We can also use Table 11, Appendix B. This shows that for a sample of size 40, if $|r| > 0.312$, reject $H_0$. Since $r = 0.910$, we reject $H_0$.
5. Using Minitab to determine the regression equation: Stat -> Regression -> Regression. Choose the Predictor, SQFT, and the Response, PRICE. Click on Results and choose Regression equation, table of coefficients, s, R-squared and basic analysis of variance. Click OK, then click on Storage and choose Residuals. Click OK, OK. The results are:

Regression Analysis: PRICE versus SQFT
The regression equation is
PRICE = - 96.3 + 0.719 SQFT

Predictor      Coef      SE Coef       T       P
Constant     -96.35        86.77   -1.11   0.274
SQFT        0.71887      0.05314   13.53   0.000

S = 170.511   R-Sq = 82.8%   R-Sq(adj) = 82.4%

Analysis of Variance
Source           DF        SS        MS        F       P
Regression        1   5321578   5321578   183.04   0.000
Residual Error   38   1104810     29074
Total            39   6426388

From the above, we see that the regression equation is: PRICE = - 96.3 + 0.719 SQFT. Since the correlation is significant, we can use this equation for predictive purposes.
6. The slope is 0.719. Since PRICE is in hundreds of dollars, this means that the price increases $71.9 per square foot.
7. The y-intercept is -96.3. This means that if SQFT = 0, PRICE = -$9630. Since we can't have a house with 0 SQFT, this is meaningless.
8.
SQFT    PRICE
1000     622.7 ($62,270)
2000    1341.7 ($134,170)

Using Minitab to draw the regression line: Stat -> Regression -> Fitted Line Plot. Choose PRICE for the Response (Y) and SQFT for the Predictor (X), select Linear for the Type of Regression Model, and click OK.

[Figure: Fitted Line Plot, PRICE = - 96.35 + 0.7189 SQFT; S = 170.511, R-Sq = 82.8%, R-Sq(adj) = 82.4%. PRICE (y-axis) vs. SQFT (x-axis).]

9. I predict that a 2000 square foot house will sell for about $134,170.
10. Summary: Price increases about $71.9 per square foot.

Lab 10: Summary
Part 1
V2 is PRICE. Mean = 1019.5, Median = 887.5. The median is the better measure of center because it is not affected by abnormally large values.
Part 2
Lab 4 indicated that there is no relation between being in the NE corner and the PRICE. Since I don't know about Albuquerque, I didn't know whether there would be a relationship.
Part 3
95% CI for SQFT is (1393, 1712). 99% CI for SQFT is (1343, 1762). The 99% CI is wider than the 95% CI because a higher level of confidence requires a wider range.
Part 4
There is a significant relationship between SQFT and PRICE. The high correlation (or low P-value) indicates this. PRICE increases about $71.9 per SQFT.
Part 5
This data set is very good because it demonstrated most of the concepts well. Lab 9, correlation and regression, was very instructive because it demonstrated the relationship between SQFT and PRICE.
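As a cross-check on the Lab 9 predictions (items 8 and 9), the fitted equation can be evaluated in a few lines. This is a minimal Python sketch using the rounded coefficients reported above; the function names and the range check are illustrative assumptions added here, not part of the Minitab output. The check refuses to extrapolate outside the sampled SQFT range (970 to 2931), which is the same reason the y-intercept in item 7 has no practical meaning.

```python
# Rounded coefficients from the reported regression equation:
# PRICE = -96.3 + 0.719 SQFT, with PRICE in hundreds of dollars.
INTERCEPT = -96.3
SLOPE = 0.719

# Smallest and largest SQFT in the sample of 40 homes.
SQFT_MIN, SQFT_MAX = 970, 2931


def predict_price(sqft):
    """Predicted PRICE in hundreds of dollars for a given living area."""
    if not SQFT_MIN <= sqft <= SQFT_MAX:
        # Hypothetical guard: the line is only supported by data in this range.
        raise ValueError(f"SQFT {sqft} is outside the sampled range")
    return INTERCEPT + SLOPE * sqft


def in_dollars(price_hundreds):
    """Convert a PRICE value in hundreds of dollars to dollars."""
    return 100 * price_hundreds


for sqft in (1000, 2000):
    price = predict_price(sqft)
    print(f"SQFT {sqft}: PRICE {price:.1f} (${in_dollars(price):,.0f})")
```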