Practice Questions for Exam 1

1. A used car lot evaluates its cars on a number of features as they arrive on the lot in order to determine their worth. Among the features looked at are miles per gallon (MPG), make and model (e.g., Toyota Camry), number of cylinders, horsepower, weight, and year made. List these variables and state whether each is quantitative or categorical.
   Quantitative: MPG, number of cylinders, horsepower, weight, year
   Categorical: make, model

2. High temperatures for 35 major US cities were collected for January 23, 2006, and put into the stemplot below (leaf unit = 1.0).

   Depth  Stem | Leaves
      1     3  | 4
      5     3  | 6689
     11     4  | 001122
     17     4  | 566678
    (3)     5  | 113      <- the 18th value is 51
     15     5  | 678899
      9     6  | 04
      7     6  | 8
      6     7  | 12
      4     7  | 889
      1     8  | 3

   a) What are the minimum, median, and maximum for this dataset?
      Min: 34 × 1.0 = 34
      Position of the median: \((n + 1)/2 = (35 + 1)/2 = 18\), so M = 51 × 1.0 = 51
      Max: 83 × 1.0 = 83
   b) Find the range for this dataset.
      Range = max - min = 83 - 34 = 49

3. Test scores for a class of 15 economics students were as follows:
   86 95 78 93 34 58 65 68 72 98 92 84 73 84 91
   Sorted: 34 58 65 68 72 73 78 84 84 86 91 92 93 95 98
   a) Find the mean, median, and mode.
      \(\bar{x} = 1171/15 \approx 78.07\)
      Position of the median: \((15 + 1)/2 = 8\), so M = 84
      Mode = 84
   b) Find the IQR, range, standard deviation, and variance.
      IQR = Q3 - Q1 = 92 - 68 = 24
      Range = 98 - 34 = 64
      \(s \approx 17.07\), \(s^2 \approx 291.50\)
   c) Create a stem-and-leaf plot.
      3 | 4
      4 |
      5 | 8
      6 | 58
      7 | 238
      8 | 446
      9 | 12358
   d) Create a boxplot.
      [Figure: boxplot on an axis from 30 to 100; five-number summary 34, 68, 84, 92, 98]

4. As they left the movie theater in Gainesville, 17 people were asked how long they had to wait in line for their tickets.
   a) What is the population of interest?
      All Gainesville residents who go to the movie theater
   b) What is the sample?
      The 17 people who were asked
   c) What is the variable being measured?
      Time spent waiting in line
   d) Is this variable discrete quantitative, continuous quantitative, or categorical?
      Continuous quantitative
   The results of the question above were as follows:
   [Figure: histogram of Minutes (0 to 18) vs. Frequency (0 to 6)]
   e) Describe the shape, center, and spread of this histogram.
      Shape: skewed right
      Center: about 4 to 6 minutes
      Spread: 0 to 18 minutes
   f) Are there any outliers?
      The value at 18 minutes might be considered a moderate outlier.

5. Last year a small accounting firm paid each of its five clerks $22,000, each of its two junior accountants $50,000, and the firm's owner $270,000.
   a) What is the mean salary paid at the firm?
      \(\bar{x} = 480{,}000/8 = 60{,}000\), so the mean salary is $60,000.
   b) How many employees earn less than the mean?
      7 of the 8 employees earn less than the mean.
      Salaries: 22,000 22,000 22,000 22,000 22,000 50,000 50,000 270,000
   c) What is the median salary?
      With 8 salaries, M is the average of the 4th and 5th values: (22,000 + 22,000)/2 = $22,000.
   d) What does this tell us about the mean and the median?
      The mean is affected by outliers (here, the owner's $270,000 salary pulls it up), but the median is not.

6. Two students took the same English course. Their grades were based on five compositions:

                Comp. 1  Comp. 2  Comp. 3  Comp. 4  Comp. 5
   Student A      78       52       84       95       92
   Student B      84       63       80       92       89

   a) Find the mean and standard deviation for each student.
      Student A: \(\bar{x} = 401/5 = 80.2\), \(s \approx 17.12\)
      Student B: \(\bar{x} = 408/5 = 81.6\), \(s \approx 11.37\)
   b) Which student has more variability in their scores?
      Student A, because A's standard deviation is larger.

7. The following are the golf scores of 12 members of a women's golf team in tournament play:
   89 90 87 95 86 81 102 105 83 88 91 79
   a) Display the distribution with a stemplot and describe its main features.
       7 | 9
       8 | 136789
       9 | 015
      10 | 25
      Center: upper 80s
      Spread: 79 to 105
      Shape: roughly bell-shaped
      Outliers: none
   b) Compute the mean, variance, and standard deviation of these golf scores.
      Mean: \(\bar{x} = 1076/12 \approx 89.67\)
      Standard deviation: \(s \approx 7.83\)
      Variance: \(s^2 \approx 61.33\)
   c) Then compute the median, the quartiles, and the IQR.
      Sorted: 79 81 83 86 87 88 | 89 90 91 95 102 105
      Q1 = (83 + 86)/2 = 84.5
      M = (88 + 89)/2 = 88.5
      Q3 = (91 + 95)/2 = 93
      IQR = Q3 - Q1 = 93 - 84.5 = 8.5
   d) Are there any outliers?
      No. The 1.5 × IQR fences are 84.5 - 12.75 = 71.75 and 93 + 12.75 = 105.75, and every score lies between them.

8.
Colleges and universities are requiring an increasing amount of information about applicants before making acceptance and financial aid decisions. Classify each of the following types of data required on a college application as discrete quantitative, continuous quantitative, or categorical.
   a) High school GPA: continuous quantitative
   b) Gender of applicant: categorical
   c) Parents' income: continuous quantitative
   d) High school class rank: discrete quantitative

9. Answer the following questions:
   a) What is the primary disadvantage of using the range to compare the variability of data sets?
      The range depends only on the two most extreme values, so it is heavily influenced by outliers.
   b) Can the variance of a data set ever be negative?
      No. The variance is built from squared deviations, and squares are never negative.
   c) The variable of interest is height, measured in inches. What is the unit of the standard deviation? What is the unit of the variance?
      Standard deviation: inches
      Variance: square inches
   d) Give an example of a dataset whose standard deviation equals 0.
      Any dataset in which every value is the same, e.g., 5 5 5 5 5

10. Describe the following scatterplots.
   a) Elevation (in meters) vs. mean annual temperature (in degrees Centigrade)
      Negative, linear, strong, no outliers
   b) Year vs. average gas price, 1976 to 2004
      [Figure: scatterplot of Avg. Gas Price (0.50 to 2.00) vs. Year (1980 to 2005)]
      Positive, linear, moderate, no outliers
   c) Age (in months) vs. score on a cognitive abilities test
      [Figure: scatterplot of Age (20 to 36) vs. Score (15 to 35)]
      Positive, linear, strong, 2 potential outliers

11. A least-squares regression line was fit to the Year vs. Gas Price data shown above. The result was the following:
   [Figure: fitted line plot, Avg. Gas Price = -44.78 + 0.02310 Year; S = 0.210031, R-Sq = 47.3%, R-Sq(adj) = 45.3%; slope and y-intercept labeled]
   a) Identify the explanatory and response variables.
      Explanatory (x): year
      Response (y): average gas price
   b)
      What is \(R^2\) and what is its interpretation?
      \(R^2 = 47.3\%\): 47.3% of the variation in average gas price is explained by year.
   c) What is the slope and what is its interpretation?
      Slope = 0.02310: on average, the average gas price increases by 0.02310 dollars (about 2.3 cents) per year.
   d) What is the y-intercept and what is its interpretation?
      y-intercept = -44.78. Do not interpret it: there are no data near x = 0.
   e) What would you predict the gas price to be in 2030? Is this prediction reliable?
      \(\hat{y} = -44.78 + 0.02310(2030) \approx 2.113\), or about $2.11
      Not reliable: 2030 is too far outside the range of our data (extrapolation).

12. Eleven members of a golf team play two rounds apiece. A bystander wants to predict the round 2 score from the round 1 score. Their scores are as follows:

   Round 1 (x):  89 90 87 95 86 81 105 83 88 91 79
   Round 2 (y):  94 85 89 89 81 76  89 87 91 88 80

   \(\bar{x} = 88.55\), \(\bar{y} = 86.27\), \(s_x = 7.13\), \(s_y = 5.31\)

   a) The least-squares regression line minimizes the sum of the squared residuals, that is, the sum of the squared vertical distances from the points to the line.
   b) Identify the explanatory and response variables.
      Explanatory variable: round 1 score
      Response variable: round 2 score
   c) What is \(R^2\) and what is its interpretation?
      \(R^2 = r^2 = (0.549)^2 \approx 0.301\), or 30.1%
      (Square r, then convert the decimal to a percent by moving the decimal point two places to the right.)
      30.1% of the variation in round 2 scores can be explained by round 1 scores.
   d) What is the slope and what is its interpretation?
      \(b = r\frac{s_y}{s_x} = 0.549 \cdot \frac{5.31}{7.13} \approx 0.41\)
      On average, the round 2 score increases by 0.41 points for each 1-point increase in the round 1 score.
   e) What is the y-intercept and what is its interpretation?
      \(a = \bar{y} - b\bar{x} = 86.27 - 0.41(88.55) \approx 49.96\)
      There are no round 1 scores near zero, so do not interpret it.
   f) What is the least-squares regression equation?
      \(\hat{y} = a + bx\), so the LSR line is \(\hat{y} = 49.96 + 0.41x\)
   g) Find the residual for golfer 7 (round 1 = 105, round 2 = 89).
      Observed x = 105, observed y = 89
      \(\hat{y} = 49.96 + 0.41(105) = 93.01\)
      Residual = observed y - predicted y = 89 - 93.01 = -4.01
   h)
      When this point (golfer 7) is removed, the value of r changes to 0.661. Is that point an influential outlier?
      Yes, because removing it changes r substantially (from 0.549 to 0.661).

13. The following plot shows how much water (in cubic kilometers) was released by the Mississippi River in the years 1954 to 1980.
   [Figure: fitted line plot, Water Release = -14837 + 7.808 Year; S = 120.066, R-Sq = 21.7%, R-Sq(adj) = 18.6%]
   a) What is the correlation between year and water release?
      1) Change R-Sq to a decimal: 0.217
      2) Take the square root: \(\sqrt{0.217} \approx 0.466\)
      3) Determine the sign from the slope (positive), so \(r \approx 0.466\)
   b) In 1973, a major flood occurred. That year, the river discharged 880 cubic kilometers of water. Find the residual for this point.
      Observed x = 1973, observed y = 880
      \(\hat{y} = -14837 + 7.808(1973) = 568.184\)
      Residual = observed y - predicted y = 880 - 568.184 = 311.816
   If we remove the observation from 1973, the new plot is:
   [Figure: fitted line plot, Water Release = -12468 + 6.598 Year; S = 103.614, R-Sq = 21.3%, R-Sq(adj) = 18.0%]
   c) How did the LSR equation change? How did \(R^2\) change?
      The slope went down (7.808 to 6.598), the y-intercept went up (-14837 to -12468), and \(R^2\) went down a little (21.7% to 21.3%).
   d) Was 1973 an influential outlier?
      No. The line did not change very much, and \(R^2\) changed only a little.

14. For each of the descriptions below, determine what type of mistake is taking place: extrapolation or misuse of cause and effect.
   a) There is a high correlation between being a newspaper subscriber and having a high income. Should I subscribe to the newspaper if I want to make more money?
      Misuse of cause and effect: high correlation does not imply causation.
   b) Fifteen-year-old Abby is an aspiring professional golfer. For the last 5 years, she has recorded her average score at a local course. Using this information, she predicts what her average score will be when she is 25.
      Extrapolation: age 25 is too far outside the range of the observed data.
   c) In any given city, the number of churches and the number of bars are highly correlated.
      Does church attendance cause drinking?
      Misuse of cause and effect: high correlation does not imply causation. There could be a lurking variable, such as the population of the city: a larger population leads to both more churches and more bars.

15. A student who waits on tables at a Chinese restaurant in a college neighborhood records the cost of meals and the tip left by single diners. The student wants to predict the tip from the price of the meal.
   r = 0.954
   Meal (x):  $4.50  $5.79  $6.24  $4.62  $6.35    \(\bar{x} = 5.5\), \(s_x = 0.884\)
   Tip (y):   $0.50  $0.75  $0.85  $0.60  $1.00    \(\bar{y} = 0.74\), \(s_y = 0.1981\)
   a) Compute the least-squares regression line for these data.
      \(b = r\frac{s_y}{s_x} = 0.954 \cdot \frac{0.1981}{0.884} \approx 0.2138\)
      \(a = \bar{y} - b\bar{x} = 0.74 - 0.2138(5.5) \approx -0.4359\)
      LSR line: \(\hat{y} = -0.4359 + 0.2138x\)
   b) Make a scatterplot of the data and draw the regression line on your plot.
      \(\hat{y}(4) = -0.4359 + 0.2138(4) = 0.4193\)
      \(\hat{y}(6) = -0.4359 + 0.2138(6) = 0.8469\)
      Now plot (4, 0.4193) and (6, 0.8469), and draw a line through the two points.
      [Figure: scatterplot of tip (0.25 to 1.00) vs. cost of meal in $ (4 to 6.5)]
   c) The next diner orders a meal costing $4.89. Use your regression line to predict the tip.
      \(\hat{y} = -0.4359 + 0.2138(4.89) \approx 0.61\), so the predicted tip is about 61 cents.

16. Below are boxplots of the number of calories in cereals, grouped by whether the boxes are on the bottom, middle, or top shelf. Shelf 1 is the bottom shelf, shelf 2 is the middle shelf, and shelf 3 is the top shelf.
   [Figure: boxplot of calories (50 to 175) vs. shelf (1, 2, 3)]
   1) What is the median number of calories for boxes on the top shelf?
      About 110 calories
   2) Which shelf has the largest spread?
      The top shelf
   3) What is the shape of the distribution of calories for boxes on the top shelf? (right-skewed, left-skewed, or symmetric)
      Symmetric. (A boxplot can show that a distribution is symmetric, but it cannot show that the distribution is normal.)

17. Answer the following questions.
   a) What is the strongest value of correlation?
      1 or -1
   b) Which measure of center is most influenced by outliers?
      Mean
   c) Which measure of spread is most influenced by outliers?
      Range
   d)
      What percentage of the data is less than Q1?
      25%
   e) What percentage of the data is less than Q3?
      75%
   f) What is an influential outlier?
      A point that lies outside the trend of most of the data and whose removal changes the slope of the regression line drastically and changes \(R^2\) substantially.
   g) What type of graph would you use to explore the relationship between two quantitative variables?
      Scatterplot
   h) What type of graph would you use to explore the relationship between a quantitative variable and a categorical variable?
      Side-by-side boxplots
   i) What type of graph would you use to explore the relationship between two categorical variables?
      Contingency table

18. Below are boxplots of foal weights by gender (M = males, F = females). Answer the following questions about the plot.
   [Figure: boxplot of Weight (90 to 130) vs. Gender (F, M)]
   a) What is the median weight of the male foals?
      About 115
   b) Approximate the IQR for the female foals.
      IQR = Q3 - Q1 ≈ 128 - 95 = 33
   c) Compare the centers of the weights of the male and female foals. Use a complete sentence.
      The median weight of the female foals is slightly lower than the median weight of the male foals.
   d) Compare the spreads of the weights of the male and female foals. Use a complete sentence.
      The IQR and the range for the female foals are both larger than those for the male foals.
   e) Compare the shapes of the two distributions. Use a complete sentence.
      The distribution for the female foals is fairly symmetric, but the distribution for the male foals is slightly right-skewed.
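As an optional check that is not part of the original answer key, the hand calculations in questions 3, 5, 7, 12, 13, and 15 can be reproduced with a short Python 3 sketch using only the standard `math` and `statistics` modules. The helper names `quartiles` and `least_squares` are hypothetical, introduced only for this illustration. `quartiles` assumes the hand method used in these answers (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd); statistical software may use a different quartile rule and report slightly different values.

```python
import math
from statistics import mean, median, stdev, variance


def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumed hand method: when n is odd, the overall median is left out
    of both halves. Software packages may use other quartile rules.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)


def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x, fit from raw data."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Question 3: economics test scores
scores = [86, 95, 78, 93, 34, 58, 65, 68, 72, 98, 92, 84, 73, 84, 91]
q1, q3 = quartiles(scores)
print(f"Q3: mean={mean(scores):.2f} median={median(scores)} IQR={q3 - q1}")
print(f"    s={stdev(scores):.2f} s^2={variance(scores):.2f}")

# Question 5: salaries, mean vs. median
salaries = [22_000] * 5 + [50_000] * 2 + [270_000]
below = sum(s < mean(salaries) for s in salaries)
print(f"Q5: mean={mean(salaries):.0f} median={median(salaries):.0f} below mean={below}")

# Question 7: golf scores (even n, so both halves have 6 values)
golf = [89, 90, 87, 95, 86, 81, 102, 105, 83, 88, 91, 79]
g1, g3 = quartiles(golf)
print(f"Q7: median={median(golf)} Q1={g1} Q3={g3} IQR={g3 - g1}")

# Question 12: predict round 2 from round 1; residual for golfer 7 (x=105, y=89)
round1 = [89, 90, 87, 95, 86, 81, 105, 83, 88, 91, 79]
round2 = [94, 85, 89, 89, 81, 76, 89, 87, 91, 88, 80]
a12, b12 = least_squares(round1, round2)
resid7 = 89 - (a12 + b12 * 105)
print(f"Q12: y-hat = {a12:.2f} + {b12:.2f}x, residual(golfer 7) = {resid7:.2f}")

# Question 13: r from R-Sq = 21.7%, with the sign taken from the slope 7.808
r13 = math.copysign(math.sqrt(0.217), 7.808)
resid1973 = 880 - (-14837 + 7.808 * 1973)
print(f"Q13: r={r13:.3f} residual(1973)={resid1973:.3f}")

# Question 15: predict the tip from the meal cost
meal = [4.50, 5.79, 6.24, 4.62, 6.35]
tip = [0.50, 0.75, 0.85, 0.60, 1.00]
a15, b15 = least_squares(meal, tip)
print(f"Q15: predicted tip for a $4.89 meal = ${a15 + b15 * 4.89:.2f}")
```

Because `least_squares` fits the lines for questions 12 and 15 from the raw data rather than from rounded summary statistics, its slope and intercept can differ in the last decimal places from a hand calculation that rounds r, \(s_x\), \(s_y\), or the slope along the way.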