Statistics & Data Analysis
Course Number: B01.1305
Course Section: 31
Meeting Time: Wednesday 6-8:50 pm

Midterm Review

Midterm Format
• Open book and open notes
• No solution guides or other resources are permitted
• A scientific calculator will be required
• All questions will be short answer
• Entire class period is available for exam

Professor S. D. Balkin -- March 12, 2003

Exam Coverage

Chapter 1
• Understand reasons for statistics

Chapter 2
• Distinguish between qualitative and quantitative variables
• Describe and interpret plots of data
• Understand and calculate measures of center
• Understand and calculate measures of variation

Chapter 3
• Understand different sources of probabilities
• Understand and use basic principles of probability
  • Addition
  • Complements
  • Multiplication
• Calculate conditional and unconditional probabilities
• Understand, use and determine statistical independence
• Be able to construct and interpret probability tables and trees

Chapter 4
• Understand probability distributions
• Calculate the expected value and standard deviation of a probability distribution

Chapter 5: Some Special Probability Distributions
• Calculate probability of an event using
  • Counting methods
  • Binomial distribution
  • Normal distribution

Chapter 6: Random Samples and Sampling Distributions
• Understand and identify sources of sample bias
• Understand difference between the distribution of a summary statistic and distribution of a population
• Identify the sampling distribution of the sample mean
• Understand the use of the Central Limit Theorem
• Interpret a normal probability plot

Chapter 7: Point and Interval Estimation
• Understand unbiased and efficient estimators
• Calculate and interpret confidence intervals
  • For population mean with standard deviation known
  • For population proportion
  • For population mean with standard deviation unknown
• Determine sample sizes for a given confidence level and tolerance width
• Understand t-distribution
• Understand key assumptions underlying confidence interval methods

Practice Problems with Answers in Book
2.26
3.35, 3.36, 3.46, 3.47, 3.48, 3.53, 3.54, 3.55, 3.59, 3.60, 3.63, 3.64, 3.65, 3.66, 3.67, 3.68
4.35, 4.36
5.37, 5.38, 5.40, 5.41
6.29, 6.35, 6.36, 6.37
7.41, 7.42, 7.47, 7.48, 7.58, 7.59, 7.60, 7.76, 7.77

Interpretation Review
• Mode: value or category with the highest frequency in the data
• Median: middle value when the data are arranged from lowest to highest
• Mean: sum of measurements divided by the number of measurements
• Variance: average of the squared deviations from the mean
• Empirical Rule:
• IQR: 75th percentile – 25th percentile
• Random Variable: quantitative result from an experiment that is subject to random variability
• Expected Value: probability-weighted average of possible values
• Permutations: number of sequences of r symbols taken k at a time
• Combinations: number of subsets of r symbols taken k at a time
• Central Limit Theorem: For any population, the sampling distribution of the sample mean is approximately normal if the sample size is sufficiently large.
• Interval estimate: states the range within which a population parameter probably lies
• 95% Confidence interval: About 95% of similarly constructed intervals will contain the parameter being estimated

Question #1
Fortune magazine publishes a list of the world's billionaires each year. The 1992 list includes 233 individuals. Describe this distribution of wealth.
Why do you think the distribution is the way it is (Hint: is this a representative sample)?

[Figure: Histogram of wealth; x-axis: wealth (0 to 40), y-axis: Frequency (0 to 200)]

Question #2
As a marketing consultant, you observed 50 consecutive shoppers at a grocery store, and recorded how much money each shopper spent in the store.
(a) Create and interpret a histogram of these data.
(b) Create and interpret a stem-and-leaf plot of these data.
(c) Create and interpret a boxplot of these data.
(d) Provide your client with an executive summary of your analysis.

Question #3
A narcotics enforcement unit works with customs officers at an airport that serves international travelers on a route that has plausible links to the drug trade. This enforcement unit has developed a smuggler profile that it uses to initiate full searches of people who meet the profile. These profiles typically require meeting a number of conditions such as (a) male under 40, (b) traveling alone, (c) loose clothing, and so on. Fully 100% of the travelers who meet the profile were searched, and 10% of those who did not meet the profile were searched. After collecting considerable data, these figures resulted:
• Percentage of people who meet the profile: 4%
• Percentage of people who meet the profile and then are found to have illegal drugs: 35%
• Percentage of people who do not meet the profile and then are found to have illegal drugs: 3%
(a) Based on these figures, what percentage of travelers on this particular route is carrying illegal drugs?
(b) What percentage of the drug-carrying travelers will be captured by this procedure? Assume that all drug carriers who are searched will be captured.
(c) Given that a traveler is carrying illegal drugs (whether captured or not), what is the probability that this person will meet the profile?

Question #4
A restaurant has collected data on its customers’ orders and has estimated probabilities about what happens after the main course. It was found that 20% of the customers had dessert only, 40% had coffee only, and 30% had both dessert and coffee.
(a) Draw a probability tree for this situation.
(b) Find the probability of the event “had coffee.”
(c) Find the probability of the event “did not have dessert.”
(d) What percentage of customers will have “neither coffee nor dessert”?
(e) What percentage of customers will have “coffee OR dessert”?
(f) Are the events “had coffee” and “had dessert” mutually exclusive? How do you know?
(g) Given that a customer had coffee, what is the probability that the same customer had dessert?
(h) Are “had dessert” and “had coffee” independent events? How do you know?
(i) Find the conditional probability of having dessert GIVEN that the customer did not have coffee.
(j) Find the conditional probability of having dessert GIVEN that the customer did have coffee.
(k) Based on your analyses above, who is more likely to order dessert, a customer who orders coffee, or one who does not?

Question #5
Acorn is the acronym for Association of Community Organizations for Reform Now. These data were presented by Acorn to a Joint Congressional Hearing on discrimination in lending. Acorn concluded, "Banks generally have exhibited a pervasive pattern of lending practices that have the effect, intended or not, of racial discrimination. Wide disparities in rejection rates for minority and white applicants, even in comparable income groups, were found in all SMA's, and at nearly every institution studied."
The data provided are as follows:
Data: bankdata.txt
Number of cases: 20
Variable Names:
• Name of bank
• MIN = refusal rate for minority applicants
• WHITE = refusal rate for white applicants
• HIMIN = refusal rate for high income minority applicants
• HIWHITE = refusal rate for high income white applicants
Using the data provided and the methods learned in class, write a short argument in support of or disputing Acorn’s claim that banks have exhibited racial discrimination. Use both graphics and text to help make your case.

Question #6
Research on insider traders who were arrested revealed that 38% of them committed some other white-collar crime. What is the probability that of the last 100 arrested insider traders, 30 committed another crime?

Question #7
Here is a table of American households classified by education and income.

                              Income Class (thousands of dollars)
Education                      <15      15-34     35-49     50+
< 4 years of high school     11,668     7,217     1,909    1,180
4 years of high school        8,088    12,417     5,776    4,279
1-3 years of college          2,626     5,263     3,230    3,173
4+ years of college           1,597     5,189     4,334    7,888

(a) What is the probability that a randomly selected household has an income of at least $50,000?
(b) What is the conditional probability that a household earns over $50,000 given that the householder completed at least 4 years of college?
(c) What is the conditional probability that the householder completed at least 4 years of college given that the household income is at least $50,000?
(d) Are the random variables household income and years of education independent? Why or why not?

Question #8
Identify a situation relating to your work or business interests in which statistical sampling might be (or has been) helpful.
(a) Describe the population and indicate how a sample could be chosen.
(b) Identify a population parameter of interest and indicate how a sample statistic could shed light on this unknown.
(c) Explain the concept of the sampling distribution of this statistic for your particular example.

Question #9
Suppose an investment has the following probabilities associated with levels of profit:

PROFIT    PROBABILITY
$300      0.05
$200      0.25
$100      0.35
$0        0.20
-$50      0.10
-$100     0.05

(a) Find the expected return (value) and the risk (standard deviation) of this investment.
(b) Draw / create the probability distribution (bar chart) of the profit for this investment. Indicate the mean and standard deviation.
(c) If you had a choice between this investment or a $100.00 gift, which would you prefer and why?
(d) Suppose you had to decide between this investment and one with an expected return of $150.00 and standard deviation (risk) of $200. What would your decision depend on?

Question #10
A city decides to determine the mean expenditures per tourist per visit. A random sample of 100 finds that the average expenditure is $800. The standard deviation of expenditures for all tourists is $120.
A) What is the standard deviation of the mean, given that the standard deviation of the whole population is $120 and the number of people sampled is 100?
B) What is a 95% confidence interval for the value of the expenditures per tourist? Provide an interpretation.
C) If the city wants to determine the average expenditure within plus or minus $20, how many people does it need to sample?

Question #11
In border towns such as Detroit and Buffalo, Canadian coins frequently end up in business cash registers.
Canadian denominations are identical to U.S. denominations, and the coins are virtually identical in size, color, and weight. At present, the exchange rate favors the U.S., and banks encourage their customers to sort out the Canadian coins. A Buffalo bank has been monitoring the deposits of one of its large customers, a supermarket. The bank has recorded on 45 days the face value of Canadian coins per $100 deposited. For these 45 days, the average amount was $3.46, with a standard deviation of $0.52. Give a 95% confidence interval for the population mean.
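The interval-estimation arithmetic in Questions #10 and #11 can be checked with a short script. The following is a minimal sketch, not an official solution: the helper name `mean_interval` is hypothetical, and the t critical value is an assumption read from a standard t table (97.5th percentile, 44 degrees of freedom) rather than computed. Question #10 uses z because the population standard deviation is given; Question #11 uses t because only the sample standard deviation is available.

```python
import math
from statistics import NormalDist


def mean_interval(xbar, sd, n, critical):
    """Return (standard error, lower limit, upper limit) for a mean.

    Hypothetical helper for illustration: the interval is
    xbar +/- critical * sd / sqrt(n).
    """
    se = sd / math.sqrt(n)
    margin = critical * se
    return se, xbar - margin, xbar + margin


# Two-sided 95% critical value from the standard normal distribution.
z95 = NormalDist().inv_cdf(0.975)

# Question #10: population standard deviation known ($120), so use z.
se10, lo10, hi10 = mean_interval(xbar=800, sd=120, n=100, critical=z95)

# Question #10 C: smallest n whose 95% margin of error is at most $20,
# from n >= (z * sigma / E)^2, rounded up to a whole person.
n_needed = math.ceil((z95 * 120 / 20) ** 2)

# Question #11: only the sample standard deviation is known, so use t.
# ASSUMPTION: 2.015 is the t-table value for 97.5% with 45 - 1 = 44 df.
T_975_DF44 = 2.015
se11, lo11, hi11 = mean_interval(xbar=3.46, sd=0.52, n=45, critical=T_975_DF44)

print(f"Q10 standard error of the mean: {se10:.2f}")
print(f"Q10 95% CI: ({lo10:.2f}, {hi10:.2f})")
print(f"Q10 sample size for +/- $20: {n_needed}")
print(f"Q11 95% CI: ({lo11:.3f}, {hi11:.3f})")
```

Under these assumptions the script gives a standard error of $12 and an interval of roughly $776 to $824 for Question #10, a required sample of 139 tourists, and an interval of roughly $3.30 to $3.62 for Question #11. The interpretation follows the review slide: about 95% of intervals constructed this way would contain the population mean.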