STA-2023 Supplementary Exercises (Revised: Oct. 2013)

Chapter 2: Descriptive Statistics

1. Consider the annual dividend yields (in dollars) for a random sample of 30 stocks:
20.5 15.4 16.9 13.4 8.8 19.5 12.7 7.8 14.3 22.1
15.6 5.4 23.3 19.2 20.8 24.1 17.0 11.8 9.2 12.6
9.9 28.6 18.4 16.8 15.9 27.8 21.9 15.2 12.0 5.1
a) Complete the following frequency table:
   Class Boundaries    Frequency    Rel. Freq. (3 dec.)    Perc. Freq.
   4.95 – 8.95
   8.95 – 12.95
   12.95 – 16.95
   16.95 – 20.95
   20.95 – 24.95
   24.95 – 28.95
b) How many stocks had a dividend yield of more than $20.95?
c) What percent of stocks had a dividend yield of less than $12.95?
d) What are the boundaries of the modal class?
e) Construct a percent frequency histogram.
f) Use your calculator’s statistical functions to compute the sample mean and sample standard deviation (round to 2 decimals).
g) Determine the Five Number Summary.
h) Identify the outliers, if any, in this data set. Justify your answer. Hint: Calculate the lower and upper fences.

2. The PFA-100 is a medical device that measures the platelet function of subjects by simulating the clotting process of the blood. It requires only a small blood sample and the results are reported in less than five minutes as “closure times”. The following data are the closure times (in seconds) for 25 healthy individuals:
105 92 80 94 113 97 95 79 106 88 83 99 95
91 117 76 114 102 93 95 84 108 93 106 87
a) Construct a Stem & Leaf diagram for this data set. Specify the units for the stems and leaves.
b) What were the shortest and longest closure times?
c) What was the most frequent closure time?
d) How many subjects had a closure time of less than 90 seconds?
e) What percent of these 25 subjects had closure times of at least 100 seconds?
f) Use your calculator’s statistical functions to compute the sample mean and sample standard deviation (round to the nearest integer).

3. Troponin-I is an enzyme related to muscle activity.
An immunochemistry test based on Troponin-I showed the following results (ng/dL) for 30 subjects with chest pain admitted to the emergency room of a hospital:
3.1 4.3 2.5 1.7 5.2 3.4 2.0 3.6 6.5 4.8
1.8 2.9 1.6 3.1 5.4 3.9 3.8 2.3 4.3 4.5
3.7 6.9 2.2 1.5 1.8 5.7 2.8 5.0 2.8 2.6
a) Construct a stem & leaf plot for this data set. Specify the measurement units for the stems and leaves.
b) What were the lowest and highest Troponin-I measurements?
c) What is the class interval with the highest frequency?
d) How many subjects had a Troponin-I level of 5.0 ng/dL or above?
e) What percent of subjects had a Troponin-I level below 3.0 ng/dL?
f) Obtain the median based on the stem & leaf plot.

4. Determine the Five Number Summary for the weights of 21 male FIU students. Identify any outliers that may exist using the lower and upper fences.
121, 173, 157, 165, 170, 161, 142, 171, 184, 115, 172, 159, 187, 166, 158, 163, 145, 196, 172, 130, 171

5. The blood type for twenty adult subjects was recorded (see data below).
a) Construct a frequency table and bar graph for this categorical data set.
b) What is the most frequent blood type?
c) How many subjects have a blood type different from “O”?
d) What percent of subjects have either blood type A or B?

6. Twenty-one biology majors conducted an experiment during a Bio lab class. Each student measured their own rock’s density by weighing the rock (in grams) and then dropping the rock into a cylinder with a known amount of water to record the volume, in milliliters, of the rock (see data below, Table 1).
a) Use your calculator’s statistical capabilities to find the mean and standard deviation.
b) Obtain the Five Number Summary.
c) Determine the presence of outliers by calculating the upper and lower fences. Justify your answer.

7. Sixteen FIU students determined their own Body Mass Index (BMI) (see data below, Table 1).
Weight and height measurements were used by each student to calculate his/her BMI.
a) Use your calculator’s statistical capabilities to find the mean and standard deviation.
b) Obtain the Five Number Summary.
c) Determine the presence of outliers by calculating the upper and lower fences. Justify your answer.

8. Each of 23 biology scholars blew carbon dioxide into a tube containing the phenol red indicator to turn the solution from red to yellow (i.e. basic to acidic). Each student then added their own piece of Elodea to the solution, closed the tube, covered it with foil and let it sit for 10 minutes. Once the 10 minutes had passed, the pH of the solution was recorded, the foil removed and the tube placed directly in the light for 10 minutes. Once the 10 minutes had elapsed, the pH was recorded again (see data below, Table 1).
a) Obtain the Five Number Summary for the pH measurements before and after the light effect.
b) Determine the presence of outliers for each data set by calculating the upper and lower fences. Justify your answer.
c) Use your calculator’s statistical capabilities to find the mean and standard deviation.
d) Do you observe a substantive difference in the location of the center before and after the light effect? Explain.

9. The pulse rates before and after exercise were determined for 15 FIU science majors (see data below, Table 1).
a) Obtain the Five Number Summary for the pulse rates before and after exercise.
b) Determine the presence of outliers for each data set by calculating the upper and lower fences. Justify your answer.
c) Do you observe a substantive difference in the location of the center before and after exercise based on the medians?
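The five-number summary and fence calculations requested in exercises 4 and 6 to 9 can be sketched in Python. This is a minimal sketch, assuming the median-of-halves quartile rule (the middle value is left out of both halves when n is odd); calculators and software may use other quartile conventions and give slightly different Q1 and Q3. The function names are illustrative, and the data are the 21 weights from exercise 4.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule (an assumed convention)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # values below the median position
    upper = xs[half + n % 2:]   # values above it; the middle value is dropped when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def fences(q1, q3):
    """Lower and upper fences at 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Weights of 21 male FIU students (exercise 4)
weights = [121, 173, 157, 165, 170, 161, 142, 171, 184, 115, 172,
           159, 187, 166, 158, 163, 145, 196, 172, 130, 171]

summary = five_number_summary(weights)
lo, hi = fences(summary[1], summary[3])
outliers = [x for x in weights if x < lo or x > hi]

print(summary)   # (115, 151.0, 165, 172.0, 196)
print(lo, hi)    # 119.5 203.5
print(outliers)  # [115]
```

Any value outside the interval from the lower fence to the upper fence is flagged as an outlier; here only 115 falls below the lower fence.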
Table 1
BMI: 21.6 23.0 19.2 20.4 27.4 21.6 20.1 27.1 23.6 23.0 19.3 27.1 22.6 27.3 22.1 24.7
Rock Dens: 1.58 1.80 1.81 1.50 1.09 2.11 1.20 1.80 2.06 1.90 2.15 2.43 1.75 1.70 2.02 3.85 1.15 3.10 1.60 2.69 1.40
pH Before: 5.04 5.10 5.07 5.23 5.25 5.60 5.34 4.93 5.33 4.95 5.65 5.61 5.54 5.19 5.49 5.28 5.01 5.01 4.90 4.87 4.69 4.91 5.33
pH After: 6.15 6.28 6.35 6.47 6.55 6.13 5.68 6.23 6.26 6.09 6.44 6.25 6.40 6.54 5.99 6.04 6.21 5.88 6.02 6.23 6.21 6.04 6.07
Pulse B (before): 88 92 96 72 80 78 74 72 88 57 62 83 63 67 67
Pulse A (after): 184 192 168 124 140 172 130 136 176 147 126 172 127 119 119
Blood Type: A O B B A AB O A A B AB A AB B B A AB O A B

Chapter 3: Probability

1. Consider the random experiment consisting of rolling a fair die twice (remember the 36-outcome sample space).
(a) List all sample points included in the following events:
A: the sum of the two rolls is less than four
B: the sum of the two rolls is greater than nine
C: observing the same result for both rolls
D: the sum of the two rolls is an even number
(b) Find P(A), P(B^c), P(B ∩ C), P(A ∪ D)

2. A survey was conducted on 200 cars at a given Miami dealership. The survey involved two questions:
Type of Transmission: Automatic (A), Manual (M)
Car Model: Sedan, 4 doors (F); Coupe, 2 doors (T); Hatchback, 3 doors (H)
The results of the survey are summarized in the following contingency table, where the rows represent the type of transmission and the columns the car model:
                   F      T      H
   Automatic (A)   64     46     36
   Manual (M)      16     10     28
a) If a car is chosen at random among the 200 cars included in the study, find the probability of the following events:
Choosing an Automatic car
Choosing a Sedan car
Choosing a car that is both Automatic and Sedan
Choosing a car that is either Automatic or Sedan
b) Find the following conditional probabilities: P(M | H), P(H | M)
c) Are the events Manual transmission (M) and Hatchback model (H) mutually exclusive? Explain.
d) Are the events Manual transmission (M) and Hatchback model (H) independent? Justify your answer numerically.

3.
A group of 200 files in a medical clinic classifies the patients by gender and by type of diabetes (I or II). The table gives the number of patients in each classification:
              F      M
   Type I     70     50
   Type II    40     40
If one patient is chosen at random from the 200 files,
a) What is the probability that the patient has Type I diabetes? (5)
b) What is the probability that the patient is male?
c) What is the probability that the patient chosen is male or has Type I diabetes?
d) What is the probability that the patient is male, given that the patient has Type I diabetes? (10)

Chapter 4: Probability Distributions
Part I: Discrete random variables

1. The random variable “X” designates the number of weekly breakdowns for a photocopy machine. The table below describes the probability distribution for “X”:
   X       0      1      2      3      4
   P(X)    .20    .40    .25    .10    .05
a) Find the probability of the following events for any randomly chosen week and interpret the result for each part:
Observing fewer than 2 machine breakdowns
Observing at most 2 machine breakdowns
Observing at least 3 machine breakdowns
Observing exactly 4 machine breakdowns
b) Compute and interpret the mean of “X”.
c) How many machine breakdowns are expected over a 25-week period? Justify your answer numerically.
d) Compute the standard deviation of “X”. Interpret the result.
e) Draw a point probability graph for this problem.
f) Find P(X < μ – σ).
g) Calculate the probability that “X” exceeds two standard deviations above the mean.

2. Assume that 30% (hypothetical figure) of all public state university students belong to ethnic minorities.
a) Find the probability of observing the following events among a random sample of 15 public state university students:
Exactly four belong to ethnic minorities (use the binomial formula and your calculator)
At most 6 belong to ethnic minorities (use the binomial table)
At least 8 belong to ethnic minorities (use the binomial table)
Less than 12 but more than 8 belong to ethnic minorities (use the binomial table)
b) What is the expected number of students in this sample that belong to ethnic minorities? Justify your answer numerically.

3. A local newspaper claims that 60% of the items advertised in its classifieds section are sold within a week of the first appearance of the ad.
a) If a random sample of 25 advertised items from last month is selected, find the probability of observing the following events:
Less than 20 were sold within a week
More than 10 were sold within a week
Between 10 and 20 inclusive were sold within a week
b) What is the expected number of items sold within a week among random samples of 25 items? Justify your answer numerically.

4. Suppose the number of errors on income tax forms processed by an accounting firm averages 2.5 per month, and assume that the errors occur at random and independently of one another. What is the probability that during the first month of the next tax season the following events will be observed for this accounting firm?
No errors (use the Poisson formula)
At most two errors (use the Poisson formula)
Between 1 and 3 errors inclusive (use the Poisson formula)

5. Students arrive at a given booth of the GC food court at the average rate of 3.2 people per minute.
Assuming that student arrivals follow a Poisson process,
a) Find the probability of observing the following events in a given period of one minute:
Exactly 3 students arrive (use the Poisson formula)
At least 2 students arrive (use the Poisson formula)
Less than 3 students arrive (use the Poisson formula)
b) Find the probability of observing the same events in a two-minute period.

Chapter 4: Probability Distributions
Part II: Continuous random variables

1. The IQ score of the 5th grade children population in a school district has a normal distribution with a mean of 105 and a standard deviation of 10.
a) Draw the normal curve of IQ scores for this population.
b) If a child is chosen at random from this population, find the probability of observing the following events. For each part, graph the probability as the associated area under the normal curve of IQ scores.
An IQ score less than 90
An IQ score of 130 or higher
An IQ score between 84 and 128
c) What is the IQ score (rounded to the nearest whole number) that has 90% of the defined population below it? Show the graph supporting your solution.

2. A big tech company measures the emotional intelligence of job applicants with a particular test. Last year the observed mean score was 85 points, with a standard deviation of 6. If the distribution of scores was Normal (bell shaped),
a) What percent of candidates exceeded 80 points?
b) What percent of candidates scored between 75 and 90 points?
c) A candidate accepted for a given position fell at the 96th percentile of the classification. What was the candidate’s score (round to one decimal)?

3. The time it takes to drain a fully charged battery in a typical smart phone has a mean of 8.2 hours with a standard deviation of 1.1 hours.
Assuming that this time is normally distributed,
a) Find the probability of observing the following events for a fully charged smart phone:
The battery is drained in under 7 hours
The battery is drained after more than 9 hours
The battery is drained in between 7 and 9 hours
b) Determine the number of hours (rounded to one decimal) that limits the top 25% of batteries.

4. The fuel consumption of a Boeing 787 aircraft in cruising mode averages 3210 gallons per hour. Assume that the consumption is normally distributed with a standard deviation of 180 gallons per hour. If a Boeing 787 is in cruising mode, what is the probability (as a percentage) that the fuel consumption is
a) Less than 3400 gallons per hour?
b) More than 3000 gallons per hour?
c) Between 3100 and 3200 gallons per hour?

Chapter 6: Estimation with Confidence Intervals

1. Assume that the data from Chapter 2 exercises #6-9 come from simple random samples. Estimate the population mean for each problem using a 95% confidence level. What assumption is required for the validity of the procedure used?

2. A survey was conducted on home gardening in Miami-Dade County. To that end, a random sample of 65 households with backyard gardens was selected. The sample mean size of the gardens was 579 square feet with a sample standard deviation of 148 square feet.
a) Determine a point estimate for the mean size of all Miami-Dade backyard home gardens.
b) Find a 90% confidence interval (rounded to the nearest integer) for the population mean and interpret the result in the context of the problem.
c) Find a 99% confidence interval (rounded to the nearest integer) for μ. Compare the width of this interval estimate to the result from part (b).
d) Discuss the precision of the two previous interval estimates.

3. A poll asked people in a given city about their intention to vote for a given candidate in a political race. Among a random sample of 613 registered voters, 324 showed support for this candidate.
a) Determine a point estimate (rounded to 3 decimals) for “p”, the proportion of all registered voters in this city that support the given candidate.
b) Find a 98% confidence interval (rounded to 3 decimals) for “p” and interpret the result in the context of the problem.
c) If the sample size is increased, what will be the effect on the precision of our interval estimate? Explain.

4. Consider a study investigating the physiological changes that accompany laughter. Ninety-six randomly selected students from some college watched film clips designed to evoke laughter. During the laughing period, the researchers measured the heart rate (beats per minute) of each subject, with the following results: sample mean = 75.3 and sample std. dev. = 6.2. Obtain a 97% confidence interval (rounded to one decimal) for the population mean of all students at this college.

5. Suppose that a random sample of twenty runners among the world class category of marathon runners was chosen. Their best time over the last year was obtained, resulting in a sample mean of 134.6 minutes with a sample standard deviation of 5.4 minutes.
a) Obtain a 95% confidence interval (rounded to one decimal) for the population mean of all runners in this class.
b) What will be the impact on the interval’s precision if the sample size is cut in half? Explain.

6. The proportion of students who use a cell phone on college campuses across the country has increased tremendously over the past few years. Four hundred students were selected at random from a large university, with 350 of them indicating the use of cell phones on campus. Find a 94% confidence interval (rounded to 3 decimals) for “p”, the proportion of students who use a cell phone on this university campus, and interpret the result in the context of the problem.

Chapter 7: Hypothesis Testing based on a single sample

For the following problems conduct the five-step procedure of hypothesis testing discussed in class.

1.
Using sample data from exercise #1, chapter 2, test the hypothesis that the mean yield of all income producing stocks is higher than $15. Use a significance level α = 0.05.

2. Using sample data from exercise #2, chapter 2, test the hypothesis that the mean PFA closure time of the healthy population is less than 100. Use a significance level α = 0.01.

3. Using sample data from exercise #3, chapter 2, test the hypothesis that the mean Troponin-I level of the chest pain patient population differs from 3.0 ng/dL. Use a significance level α = 0.10.

4. Using sample data from exercise #7, chapter 2, test the hypothesis that the mean BMI of the FIU student population is less than 25. Use a significance level α = 0.05.

5. Using sample data from exercise #3, chapter 6, test the hypothesis that a majority of registered voters in this city support the given candidate. Use a significance level α = 0.01.

6. The average retail price for tomatoes in Miami-Dade County last year was $1.45 per pound. A recent study considered a sample of 14 different supermarkets in this county that gave an average of $1.54 with a standard deviation of 9 cents.
a) Do these data provide sufficient evidence at the 5% significance level to conclude that the current mean price for tomatoes in Miami-Dade is higher than last year?
b) Determine the p-value for this test and interpret the result.

7. A recent poll of 603 randomly selected youngsters from a given city revealed that 62 of them are currently drug users. (Note: figures are hypothetical.)
a) Do these data provide sufficient evidence at the 1% significance level to conclude that the current percentage of drug users among the given youngster population is different from 11%?
b) Calculate and interpret the p-value for this test.

8. A study claims that the nationwide average annual tuition for private high schools is less than $7000. A random sample of 55 private high schools had an average annual tuition of $6625 and a sample standard deviation of $1210.
Test the claim at a 10% level of significance. State the conclusion and find the p-value for the test.

9. A random sample of 175 students is taken from a large university on the West Coast to estimate the proportion of students whose parents bought a car for them when they left for college. When interviewed, 91 students in the sample responded that their parents bought them a car.
a) Do these data provide sufficient evidence at the 4% significance level to conclude that the proportion of all students from this West Coast university whose parents bought a car for them when they left for college is less than 55%?
b) Calculate and interpret the p-value for this test.

Answers to selected supplementary exercises

Chapter 2: Descriptive Statistics

1. b) 6 stocks; c) 33.3%; d) $12.95 - $16.95; f) x̄ = $16.07, s = $6.06; g) Min = 5.10, Q1 = 12.00, Q2 = 15.75, Q3 = 20.50, Max = 28.60; h) No outliers
2. b) Min = 76 sec, Max = 117 sec; c) Mode = 95 sec; d) 7 subjects; e) 32%; f) x̄ = 95.7 sec, s = 11.2 sec
3. b) Min = 1.5, Max = 6.9; c) Modal class: 2.0 – 2.9; d) 6 subjects; e) 43.3%; f) Median = 3.25
4. a) Min = 115, Q1 = 151, Q2 = 165, Q3 = 172, Max = 196; b) Fences: LF = 119.5, UF = 203.5; c) One outlier at the lower end = 115 pounds
5. b) Blood type A; c) 17 subjects; d) 65%
6. a) x̄ = 1.94, s = 0.66; b) Min = 1.09, Q1 = 1.54, Q2 = 1.80, Q3 = 2.13, Max = 3.85; c) Fences: LF = 0.65, UF = 3.02; Two outliers at the upper end: 3.10 and 3.85
7. a) x̄ = 23.1, s = 2.9; b) Min = 19.2, Q1 = 21.0, Q2 = 22.8, Q3 = 25.9, Max = 27.4; c) Fences: LF = 13.65, UF = 33.25; No outliers

Chapter 3: Probability

1. a) A = { (1,1), (1,2), (2,1) }
B = { (4,6), (5,5), (5,6), (6,4), (6,5), (6,6) }
C = { (1,1), (2,2), (3,3), (4,4), (5,5), (6,6) }
b) P(A) = 3/36 = .083, P(B^c) = 30/36 = .833, P(B ∩ C) = 2/36 = .056, P(A ∪ D) = 20/36 = .556

2.
a) P(A) = .73, P(F) = .40, P(A ∩ F) = .32, P(A ∪ F) = .81; b) P(M | H) = .44, P(H | M) = .52; c) No, because there are 28 cars at the intersection; d) No, because P(M | H) ≠ P(M) [.44 ≠ .27]
3. a) P(I) = .60; b) P(M) = .45; c) P(M ∪ I) = 0.80; d) P(M | I) = 0.42

Chapter 4, Part I: Discrete random variables

1. a) P(X < 2) = .60, P(X ≤ 2) = .85, P(X ≥ 3) = .15, P(X = 4) = .05; b) μ = E(x) = 1.4; c) 35 machine breakdowns; d) σ = √1.14 = 1.07; f) P(X < μ – σ) = P(X < 0.33) = P(X = 0) = 0.20; g) P(X > μ + 2σ) = P(X > 3.54) = P(X = 4) = 0.05
2. a) P(X = 4) = .219, P(X ≤ 6) = .869, P(X ≥ 8) = .050, P(8 < X < 12) = .015; b) E(x) = 4.5
3. a) P(X < 20) = .971, P(X > 10) = .966, P(10 ≤ X ≤ 20) = .978; b) E(x) = 15
4. P(X = 0) = .082, P(X ≤ 2) = .543, P(1 ≤ X ≤ 3) = .676
5. a) P(X = 3) = .223, P(X ≥ 2) = .829, P(X < 3) = .380; b) P(X = 3) = .074, P(X ≥ 2) = .987, P(X < 3) = .047

Chapter 4, Part II: Continuous random variables

1. b) P(X < 90) = .0668, P(X ≥ 130) = .0062, P(84 ≤ X ≤ 128) = .9714; c) IQ score = 118
2. a) P(X > 80) = 79.67%; b) P(75 ≤ X ≤ 90) = 74.92%; c) Test score = 95.5
3. a) P(X < 7) = .1379, P(X > 9) = .2327, P(7 ≤ X ≤ 9) = 0.6294; b) Drain time = 8.9 hours
4. a) P(X < 3400) = 85.54%; b) P(X > 3000) = 87.90%; c) P(3100 ≤ X ≤ 3200) = 20.52%

Chapter 6: Estimation

2. a) Point estimate = x̄ = 579 ft²; b) 90% C.I. for μ: (549 , 609) ft²; c) 99% C.I. for μ: (532 , 626) ft²; d) For a given sample size, as the confidence level increases the interval estimate becomes wider and less precise due to a larger margin of error
3. a) Point estimate = 0.529 or 52.9%; b) Interval estimate for “p”: (0.482 , 0.576) or (48.2% , 57.6%); c) For a given confidence level, as the sample size increases the margin of error decreases, making the interval estimate narrower and more precise
4. Confidence limits for μ: 75.3 +/- 1.4 or (73.9 , 76.7)
5.
a) 95% confidence limits for μ: 134.6 +/- 2.5 or (132.1 , 137.1); b) For a given confidence level, as the sample size decreases the margin of error increases, making the interval estimate wider and less precise
6. Confidence limits for “p”: 0.875 +/- 0.031 or (0.844 , 0.906)

Chapter 7: Hypothesis testing

1. Ho: μ ≤ 15; Ha: μ > 15; RR = { TS > 1.645 }; TS = 0.97. Decision: Fail to reject Ho. Conclusion: Insufficient evidence to conclude that the mean yield of all income producing stocks is higher than $15.
2. Ho: μ ≥ 100; Ha: μ < 100; RR = { TS < -2.492 }; TS = -1.93. Decision: Fail to reject Ho. Conclusion: Insufficient evidence to conclude that the mean PFA closure time of the healthy population is less than 100 seconds.
3. Ho: μ = 3.0; Ha: μ ≠ 3.0; RR = { TS > 1.645 or TS < -1.645 }; TS = 1.92. Decision: Reject Ho. Conclusion: Sufficient evidence to conclude that the mean Troponin-I level of the chest pain patient population differs from 3.0 ng/dL.
4. Ho: μ ≥ 25; Ha: μ < 25; RR = { TS < -1.753 }; TS = -2.66. Decision: Reject Ho. Conclusion: Sufficient evidence to conclude that the mean BMI of the FIU student population is less than 25.
5. Ho: p ≤ 0.50; Ha: p > 0.50; RR = { TS > 2.33 }; TS = 1.43. Decision: Fail to reject Ho. Conclusion: Insufficient evidence to conclude that the majority of all registered voters in this city support the given candidate.
6. Ho: μ ≤ 1.45; Ha: μ > 1.45; RR = { TS > 1.771 }; TS = 3.75. Decision: Reject Ho. Conclusion: Sufficient evidence to conclude that the current mean price of tomatoes in Miami-Dade is higher than last year.
7. Ho: p = 0.11; Ha: p ≠ 0.11; RR = { TS < -2.575 or TS > 2.575 }; TS = -0.55. Decision: Fail to reject Ho. Conclusion: Insufficient evidence to conclude that the population proportion of drug users differs from 11%.
8. Ho: μ ≥ 7000; Ha: μ < 7000; RR = { TS < -1.28 }; TS = -2.30. Decision: Reject Ho. Conclusion: Sufficient evidence to conclude that the nationwide average annual tuition for private high schools is less than $7000.
9.
Ho: p ≥ 0.55; Ha: p < 0.55; RR = { TS < -1.75 }; TS = -0.80. Decision: Fail to reject Ho. Conclusion: Insufficient evidence to conclude that the proportion of all students from this West Coast University whose parents bought a car when they left for college is less than 55%.
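
The test statistic in answer 9 can be checked with a short Python sketch. It assumes the usual large-sample z statistic for one proportion, with the standard error computed under Ho, and a left-tailed p-value taken from the standard normal distribution. The function names are illustrative and not part of the course material.

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard normal CDF, P(Z < z), computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def one_proportion_z(successes, n, p0):
    """z statistic for a one-proportion test, using the standard error under Ho: p = p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1.0 - p0) / n)
    return (p_hat - p0) / se

# Chapter 7, exercise 9: 91 of 175 students, Ho: p >= 0.55 vs Ha: p < 0.55
z = one_proportion_z(91, 175, 0.55)
p_value = normal_cdf(z)    # left-tailed test, so the p-value is P(Z < z)

print(round(z, 2))         # -0.8
print(round(p_value, 2))   # 0.21
```

Since the p-value of about 0.21 is larger than the 4% significance level, the sketch agrees with the decision to fail to reject Ho.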