Five-Number Summary

1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value

Five-Number Summary (apartment rent data)

Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615

425  440  450  465  480  510  575
430  440  450  470  485  515  575
430  440  450  470  490  525  580
435  445  450  472  490  525  590
435  445  450  475  490  525  600
435  445  460  475  500  535  600
435  445  460  475  500  549  600
435  445  460  480  500  550  600
440  450  465  480  500  570  615
440  450  465  480  510  570  615

Box Plot

• A box is drawn with its ends located at the first and third quartiles.
• A vertical line is drawn in the box at the location of the median (second quartile).

[Figure: box plot on a rent axis running from 375 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525 marked.]

• Limits are located (not drawn) using the interquartile range (IQR), where IQR = Q3 - Q1 = 525 - 445 = 80.
• Data outside these limits are considered outliers.
• The location of each outlier is shown with the symbol *.
• The lower limit is located 1.5(IQR) below Q1.
  Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
• The upper limit is located 1.5(IQR) above Q3.
  Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
• Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

[Figure: the same box plot with whiskers added. Smallest value inside limits = 425; largest value inside limits = 615.]

Measures of Association Between Two Variables

• Covariance
• Correlation coefficient

Covariance

• Covariance is a measure of linear association between two variables.
• Positive values indicate a positive correlation between the variables.
• Negative values indicate a negative correlation between the variables.
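Before moving on to the covariance formulas, the five-number summary and box plot limits for the apartment rent data can be reproduced with a short Python sketch. The slides do not say which quartile rule they use, so the `percentile` helper below is an assumption: it takes position i = (p/100)n, rounds up when i is not an integer, and averages the values at positions i and i + 1 when it is. The function names are illustrative only.

```python
import math

# The 70 monthly rents from the table above
rents = sorted([
    425, 440, 450, 465, 480, 510, 575,
    430, 440, 450, 470, 485, 515, 575,
    430, 440, 450, 470, 490, 525, 580,
    435, 445, 450, 472, 490, 525, 590,
    435, 445, 450, 475, 490, 525, 600,
    435, 445, 460, 475, 500, 535, 600,
    435, 445, 460, 475, 500, 549, 600,
    435, 445, 460, 480, 500, 550, 600,
    440, 450, 465, 480, 500, 570, 615,
    440, 450, 465, 480, 510, 570, 615,
])

def percentile(sorted_values, p):
    """Assumed rule: position i = (p/100)*n; round up if fractional,
    otherwise average the values at positions i and i + 1 (1-based)."""
    n = len(sorted_values)
    i = p * n / 100
    if i != int(i):
        return sorted_values[math.ceil(i) - 1]
    i = int(i)
    return (sorted_values[i - 1] + sorted_values[i]) / 2

q1 = percentile(rents, 25)
median = percentile(rents, 50)
q3 = percentile(rents, 75)
five_number = (rents[0], q1, median, q3, rents[-1])

# Box plot limits: 1.5 * IQR beyond the quartiles
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [r for r in rents if r < lower_limit or r > upper_limit]

print(five_number)               # (425, 445, 475.0, 525, 615)
print(lower_limit, upper_limit)  # 325.0 645.0
print(outliers)                  # []
```

Other quartile conventions (for example, interpolation-based ones) can give slightly different Q1 and Q3 values, which would shift the limits accordingly.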
To compute a covariance for variables x and y:

\[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \]   For populations

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]   For samples

Mortgage Interest Rates and Monthly Home Sales, 1980-2004

[Figure: scatter plot of Mortgage Interest Rate (Percent) against Monthly Home Sales (thousands), n = 299, divided into quadrants I through IV by the lines \( \bar{x} = 60.3 \) and \( \bar{y} = 9.02 \).]

If the majority of the sample points are located in quadrants II and IV, you have a negative correlation between the variables, as we do in this case. Thus the covariance will have a negative sign.

The (Pearson) Correlation Coefficient

A covariance will tell you if 2 variables are positively or negatively correlated, but it will not tell you the degree of correlation. Moreover, the covariance is sensitive to the unit of measurement. The correlation coefficient does not suffer from these defects.

\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \]   For populations

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]   For samples

Note that: \( -1 \le \rho_{xy} \le 1 \) and \( -1 \le r_{xy} \le 1 \)

[Figure: Correlation Coefficient = 1. Distance Traveled in 5 Hours (Miles) plotted against Average Speed (MPH).]

[Figure: Correlation Coefficient = -1. "I have 7 hours per week for exercise": Time Spent Swimming (Hours) plotted against Time Spent Jogging (Hours).]

Example: Golf Stats

A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving Distance (yds.)    Average 18-Hole Score
277.6                              69
259.5                              71
269.1                              70
267.0                              70
255.6                              71
272.9                              69

Using Excel to Compute the Covariance and Correlation Coefficient

Formula Worksheet

     A                B
1    Average Drive    18-Hole Score
2    277.6            69
3    259.5            71
4    269.1            70
5    267.0            70
6    255.6            71
7    272.9            69

Columns D and E:
Pop. Covariance      =COVAR(A2:A7,B2:B7)
Samp. Correlation    =CORREL(A2:A7,B2:B7)
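The golf example can also be worked directly from the covariance and correlation formulas. The following is a minimal Python sketch, not a replacement for the Excel worksheet; the function names and the yards-to-feet conversion are illustrative assumptions. It computes the population covariance (dividing by N), the sample covariance (dividing by n - 1), and the sample correlation, then rescales the driving distances to show that the covariance depends on the unit of measurement while the correlation does not.

```python
import math

drive_yd = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance
score = [69, 71, 70, 70, 71, 69]                        # average 18-hole score

def covariance(x, y, sample=True):
    """Sample covariance (divide by n - 1) or population covariance (divide by N)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross / (n - 1 if sample else n)

def correlation(x, y):
    """Sample correlation: s_xy / (s_x * s_y)."""
    s_x = math.sqrt(covariance(x, x))
    s_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)

pop_cov = covariance(drive_yd, score, sample=False)
samp_cov = covariance(drive_yd, score)
r = correlation(drive_yd, score)
print(round(pop_cov, 2), round(samp_cov, 2), round(r, 4))  # -5.9 -7.08 -0.9631

# Same distances in feet: covariance scales by 3, correlation is unchanged
drive_ft = [3 * d for d in drive_yd]
pop_cov_ft = covariance(drive_ft, score, sample=False)
r_ft = correlation(drive_ft, score)
print(round(pop_cov_ft, 2), round(r_ft, 4))  # -17.7 -0.9631
```

The correlation is the same whether the sample or the population versions of the covariance and standard deviations are used, because the divisor cancels in the ratio.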
Using Excel to Compute the Covariance and Correlation Coefficient

Value Worksheet

Pop. Covariance      -5.9
Samp. Correlation    -0.9631

The Weighted Mean and Working with Grouped Data

• Weighted mean
• Mean for grouped data
• Variance for grouped data
• Standard deviation for grouped data

GPA Example

A grade point average is a weighted mean. That is, 4-hour courses are weighted more than 3-hour courses when computing a GPA.

The Weighted Mean

\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]

where \( w_i \) is the weight attached to observation i.

Example: Raw Materials Purchase

Purchase    Cost per Pound ($)    Number of Pounds
1           3.00                  1200
2           3.40                  500
3           2.80                  2750
4           2.90                  1000
5           3.25                  800

Let x1 = 3.00, x2 = 3.40, x3 = 2.80, x4 = 2.90, and x5 = 3.25
Let w1 = 1200, w2 = 500, w3 = 2750, w4 = 1000, and w5 = 800

Thus:

\[ \bar{x} = \frac{1200(3.00) + 500(3.40) + 2750(2.80) + 1000(2.90) + 800(3.25)}{1200 + 500 + 2750 + 1000 + 800} = \frac{18{,}500}{6250} = 2.96 \]

Grouped Data

• The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for grouped data.
• To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
• We compute a weighted mean of the class midpoints using the class frequencies as weights.
• Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

Mean for Grouped Data

\[ \mu = \frac{\sum f_i M_i}{N} \]   For populations

\[ \bar{x} = \frac{\sum f_i M_i}{n} \]   For samples

where \( f_i \) is the frequency of class i and \( M_i \) is the midpoint of class i.

Example: Apartment Rents

Given below is the previous sample of monthly rents for 70 studio apartments, presented here as grouped data in the form of a frequency distribution.
Rent ($)    Frequency
420-439     8
440-459     17
460-479     12
480-499     8
500-519     7
520-539     4
540-559     2
560-579     4
580-599     2
600-619     6

Sample Mean for Grouped Data

Rent ($)    f_i    M_i      f_i M_i
420-439     8      429.5    3436.0
440-459     17     449.5    7641.5
460-479     12     469.5    5634.0
480-499     8      489.5    3916.0
500-519     7      509.5    3566.5
520-539     4      529.5    2118.0
540-559     2      549.5    1099.0
560-579     4      569.5    2278.0
580-599     2      589.5    1179.0
600-619     6      609.5    3657.0
Total       70              34525.0

\[ \bar{x} = \frac{34{,}525}{70} = 493.21 \]

This approximation differs by $2.41 from the actual sample mean of $490.80.

Variance for Grouped Data

\[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]   For populations

\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]   For samples

Sample Variance for Grouped Data

Rent ($)    f_i    M_i      M_i - x̄    (M_i - x̄)^2    f_i (M_i - x̄)^2
420-439     8      429.5    -63.7      4058.96        32471.71
440-459     17     449.5    -43.7      1910.56        32479.59
460-479     12     469.5    -23.7      562.16         6745.97
480-499     8      489.5    -3.7       13.76          110.11
500-519     7      509.5    16.3       265.36         1857.55
520-539     4      529.5    36.3       1316.96        5267.86
540-559     2      549.5    56.3       3168.56        6337.13
560-579     4      569.5    76.3       5820.16        23280.66
580-599     2      589.5    96.3       9271.76        18543.53
600-619     6      609.5    116.3      13523.36       81140.18
Total       70                                        208234.29

• Sample Variance

\[ s^2 = \frac{208{,}234.29}{70 - 1} = 3{,}017.89 \]

• Sample Standard Deviation

\[ s = \sqrt{3{,}017.89} = 54.94 \]

This approximation differs by only $.20 from the actual standard deviation of $54.74.
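Because the grouped-data mean is a weighted mean of class midpoints with the frequencies as weights, one small helper covers both the raw materials example and the apartment rent example. The sketch below is a minimal Python illustration; the function and variable names are assumptions made for the example. It carries the unrounded mean through the variance calculation, whereas the table above rounds the mean to 493.21 first, so individual rows can differ in the last decimal place even though the totals agree to two decimals.

```python
import math

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Raw materials purchase: cost per pound weighted by pounds bought
cost = [3.00, 3.40, 2.80, 2.90, 3.25]
pounds = [1200, 500, 2750, 1000, 800]
avg_cost = weighted_mean(cost, pounds)

# Apartment rents as grouped data: (lower, upper) class limits and frequencies
classes = [(420, 439), (440, 459), (460, 479), (480, 499), (500, 519),
           (520, 539), (540, 559), (560, 579), (580, 599), (600, 619)]
freq = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]

midpoints = [(lo + hi) / 2 for lo, hi in classes]
n = sum(freq)

# Grouped mean: weighted mean of midpoints, frequencies as weights
grouped_mean = weighted_mean(midpoints, freq)

# Grouped sample variance: frequency-weighted squared deviations over n - 1
sum_sq = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freq, midpoints))
grouped_var = sum_sq / (n - 1)
grouped_sd = math.sqrt(grouped_var)

print(round(avg_cost, 2))      # 2.96
print(round(grouped_mean, 2))  # 493.21
print(round(grouped_var, 2))   # 3017.89
print(round(grouped_sd, 2))    # 54.94
```

For a population rather than a sample, the only change would be dividing `sum_sq` by `n` instead of `n - 1`.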