Name:
Practice Test 1

Instructions: Do ALL of the following questions. Each question is worth 10 points (a total of 120 points). Budget your time wisely: do not spend so much time on low-point questions that you cannot finish high-point questions. It is probably best to look over all questions before you start. Make sure you have enough time to do all the calculation questions. Note: The actual test will only be 100 points, not 120.

Formulae:
Class width = (Largest data value - Smallest data value) / number of classes
CV = σ / μ
z = (x - μ) / σ
Position of Q1: L = .25 * n
Position of Q3: L = .75 * n
Position of pth percentile: L = (p / 100) * n
Finding the percentile of a given data value: Percentile = 100 * [(number of values less than x) + 0.5] / n
IQR = Q3 - Q1
LOB = Q1 - 1.5 * IQR
UOB = Q3 + 1.5 * IQR
ŷ = a + bx
residual = y - ŷ
Chebyshev’s Inequality: 1 - 1/K^2

1. A restaurant wants to use a sample to gather information from its patrons.
A) What is a sample? Why use samples?
B) Describe (not just name, but describe) 2 different ways the restaurant could properly conduct a sample.
C) Describe one catastrophically bad way the restaurant could conduct a sample. Why would it be catastrophic?

2. A study is conducted to examine the relationship between the consumption of red wine and the occurrence of heart disease among people over the age of 50. A cohort of 250 people is selected, and the subjects record their red wine consumption over the next 10 years. The heart health of each subject is also monitored.
A) What type of study is this?
B) A negative correlation is found between “the quantity of red wine consumed” and a measured presence of heart disease. What does this mean?
C) One reason this relationship exists could be that consuming red wine causes a change to a person’s heart health. Offer another explanation of why this relationship might exist, using the idea of confounding factors.

3. The Nike Corporation commissions a study that looks at ownership of athletic footwear.
A survey is done looking at a treatment variable of “number of athletic shoes” owned by minors and an outcome of “hours spent in physical activity each day”. The study finds a statistically significant positive correlation between the number of athletic shoes a child owns and the amount of time they spend in physical activity each day.
A) What does it mean that there is a “positive correlation” between the number of athletic shoes and the physical activity of a child?
B) What types of variables are “number of athletic shoes” and “time spent in physical activity”?
C) Why is there serious potential for bias in this study?
D) The study concludes that if the government bought every child two extra pairs of athletic shoes, it would increase the amount of physical activity by 1.3 hours per day. Explain why this conclusion might be incorrect.

4. Financial economists have shown that if you buy a portfolio of stocks, say 20 different stocks, instead of just one, the mean return of your financial investment will stay the same, but the standard deviation will decrease significantly.
A) What does it mean that the standard deviation of your return on investments will decrease?
B) Assume the mean return to stocks is 7%, the standard deviation of a portfolio of stocks is 5%, and the frequency histogram of returns is approximately bell-shaped. What fraction of returns would be within 1 standard deviation of the mean? What are the returns that lie 1 standard deviation from the mean?
C) What is the return that lies 2 standard deviations above the mean? Using the empirical rule, how often would we expect to see returns 2 standard deviations above the mean?

5. Formulae
A) Give the mathematical formula for a population mean, and explain it.
B) Give the mathematical formula for a sample standard deviation, and explain it.

6.
Cumulative Frequency Distribution
The following is the relative cumulative frequency distribution of how many units students who have been at Cabrillo for at least 1 year (called “returning students”) have successfully completed. Fictional data.

Number of units successfully completed    Relative Cumulative Frequency
0-5                                       3%
6-11                                      10%
12-17                                     21%
18-23                                     34%
24-29                                     48%
30-35                                     64%
36-41                                     77%
42-47                                     87%
48-53                                     92%
54-59                                     95%
60+                                       ?

A) What is the value for 60+ units?
B) What fraction (in percentage) of returning students have completed at least 30 units?
C) What fraction of returning students have completed less than 24 units?
D) What fraction of returning students have completed from 42 through 47 units?

7. The following are the HDL cholesterol levels of 40 randomly selected women (the data are sorted; read down each column):

27  37  57  66
28  40  58  70
30  45  61  72
32  47  62  73
34  48  63  73
36  49  63  74
37  53  64  80
37  53  64  80
37  54  64  81
37  56  65  84

A) What is the median level of cholesterol?
B) What is the location of the first quartile?
C) What is the location of the 33rd percentile?
D) What is the value of the 33rd percentile?
E) A cholesterol level of 72 is what percentile?
F) Assume the population mean is 55, and the population variance is 270. What is the z score of the maximum HDL cholesterol observation in the sample (= 84)?

8.
[Figure: frequency histogram. Horizontal axis: Murder Rate of a sample of major cities (per 100,000), marked every 5 up to 50. Vertical axis: frequency, marked every 4 up to 20.]
Use the frequency histogram above, which shows the results of a sample of the murder rate (per 100,000 people) of major US cities.
A) How many cities were sampled?
B) What is the relative frequency of cities with a murder rate of 20 to less than 25 murders?
C) What is the cumulative frequency of a murder rate from 15 to less than 20?
D) How many cities had a murder rate of at least 25?
E) How many cities had a murder rate less than 15?
F) Describe the shape of the histogram.

Data for 9 & 10: Student Scores Data.
The following are the final course scores for a sample of students taking statistics (fictional data):
69, 65, 54, 56, 76, 60, 81, 87, 68, 84, 87, 60, 43, 68, 74, 77, 68, 70, 73, 79, 72, 62, 59, 29, 43, 55, 54, 11, 73, 59, 71, 63, 66, 79, 52, 73, 62, 75, 91, 84, 63, 90, 93, 37, 77, 66, 73, 65, 67, 77

9. Calculating basic summary statistics
A) Calculate all measures of center for the student scores:
B) Calculate the standard deviation and variance for the student scores:
C) Find the Five-Number Summary of the student scores:
D) Draw a modified boxplot of the student scores:
[Number line for the boxplot: 0 to 100, marked every 10]

10. (continued from 9) Draw a histogram of the scores with a class width of 10 and starting at zero. Label the top of each column with the frequency. What shape does the histogram have?
[Horizontal axis for the histogram: 0 to 100, marked every 10]

Data for 11 & 12: Student Scores 2.
The next questions use the following sampled data:
Stat Score is the score the student received in the statistics class.
Age is the age of the student.
Algebra Score is the score the student received in their intermediate algebra class.
All data is fictitious. Important: keep track of which variable is the dependent and which is the explanatory.

Stat Score    Age    Algebra Score
43            20     50
54            22     53
56            33     58
59            37     64
62            29     69
66            26     71
69            33     68
77            20     79
77            27     74
79            31     75
79            18     85
81            21     77
84            23     87
93            19     96

11. Regression 1
A) Regress Stat Score (Y) on Age (X). Write the estimated regression equation (report the calculated a & b in equation form).
B) Interpret the X-coefficient.
C) Interpret the Y-axis intercept.
D) Predict the Statistics Score if a student is 15 years old.
E) Predict the Statistics Score if a student is 55 years old.

12. Regression 2
A) Regress Stat Score (Y) on Algebra Score (X) and write the regression equation:
B) Interpret the Y-axis intercept.
C) Calculate and interpret the X-axis intercept.
D) If a student wants to earn a score of 100 in Statistics, what will they need to achieve in Algebra?
E) Draw the estimated regression line on the following graph:
[Blank graph. Vertical axis: Stat Score, 0 to 100, marked every 10. Horizontal axis: Algebra Score, 0 to 100, marked every 10.]

END OF TEST

Definitions
Be able to do two things:
1) Define the following terms.
2) Explain why each is important or why we want to know it.
Note: There are other aspects of these words/concepts that you will need to know that are not part of the definition (such as how to calculate it) and are not part of this worksheet.

Population
Sample
Simple Random Sample
Types of Samples:
  Sample of Convenience
  Stratified
  Cluster
  Systematic
  Voluntary Response
Statistic
Parameter
Qualitative Variables
Nominal Variables
Ordinal Variables
Quantitative Variables
Discrete Variables
Continuous Variables
Randomized Experiment
Observational Study
Prospective
Retrospective
Cross-section
Outcome
Treatment
Double-Blind
Confounding
Bias
Frequency
Frequency Distribution
Relative Frequency Distribution
Cumulative Frequency Distribution
Relative Cumulative Frequency Distribution
Mean
Median
Mode
Mid-Range
Range
Variance
Standard Deviation
Empirical Rule
Chebyshev’s Inequality
z-score
First Quartile
Third Quartile
Percentile
Positive Association (or relation)
Negative Association
Correlation Coefficient
Regression Line
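
Optional self-check: several rules from the formula sheet at the start of this test (class width, z-score, IQR with the lower and upper outlier bounds, Chebyshev's bound, and the regression prediction with its residual) can be sketched in Python for practice. This is only an illustrative sketch. The function names are made up for this example, and the numbers in the demonstration are hypothetical values, not data from the test.

```python
# Illustrative sketch of formula-sheet rules.
# All function names and demo numbers are hypothetical, not test data.

def class_width(largest, smallest, num_classes):
    """Class width = (largest - smallest) / number of classes."""
    return (largest - smallest) / num_classes


def z_score(x, mean, std_dev):
    """z = (x - mean) / standard deviation."""
    return (x - mean) / std_dev


def outlier_bounds(q1, q3):
    """Return (LOB, UOB) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR), with IQR = Q3 - Q1."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def chebyshev_min_fraction(k):
    """Minimum fraction of values within k standard deviations: 1 - 1/k^2."""
    return 1 - 1 / k**2


def predict(a, b, x):
    """Regression prediction: y-hat = a + b*x."""
    return a + b * x


def residual(y, y_hat):
    """Residual = observed y minus predicted y-hat."""
    return y - y_hat


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show how each rule is applied.
    print(class_width(90, 10, 8))         # 10.0
    print(z_score(70, 50, 10))            # 2.0
    print(outlier_bounds(20, 40))         # (-10.0, 70.0)
    print(chebyshev_min_fraction(2))      # 0.75
    print(residual(30, predict(5, 2, 10)))  # 5
```

Each function mirrors one line of the formula sheet, so a hand calculation can be checked by substituting your own values for the hypothetical ones.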