EVEN Homework Lesson 3 ANSWERS
p. 496 #34, 35, 37, 41, 43, 46, 47

34.) (a) Population: undergraduates at a large university. Parameter: the true proportion of students who would be willing to report cheating.
(b) Random: The sample was an SRS. Normal: (172)(0.11) and (172)(0.89) are both greater than or equal to 10. Independent: Since this is a large university, the sample is most likely less than 10% of the undergraduate student population.
(c) (0.049, 0.172)
(d) We are 99% confident that the interval from 0.049 to 0.172 captures the true proportion of students who would be willing to report cheating.

46.) 356

Unit 8 Lessons 4 and 5
ESTIMATING A POPULATION MEAN
SECTION 8.3

OBJECTIVES
Construct and interpret a confidence interval for a population mean.
Determine the sample size required to obtain a level C confidence interval for a population mean with a specified margin of error.
Carry out the steps in constructing a confidence interval for a population mean: define the parameter; check conditions; perform calculations; interpret results in context.
Determine sample statistics from a confidence interval.
Understand why each of the three inference conditions (Random, Normal, and Independent) is important.

One-Sample z Interval for a Population Mean
Statistic ± (critical value) • (standard deviation of statistic)
Let's rewrite this specifically for the population mean:
$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
This applies to an SRS of size n from a large population with unknown mean µ and known standard deviation σ. As long as the Normal and Independent conditions are met, z* is the critical value for the standard Normal curve with area C between -z* and z*. This interval is sometimes called a one-sample z interval for a population mean.

Choosing the Sample Size
When you start planning a study, you might be unsure how large a sample you need, but you might have an idea of the margin of error you want to stay below.
Recall the confidence interval for estimating a population's mean:
$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
The part after the ± sign, $z^* \frac{\sigma}{\sqrt{n}}$, represents your margin of error.
If we want to keep it within a specific margin of error ME, we can set up an inequality that looks like this:
$z^* \frac{\sigma}{\sqrt{n}} \le ME$

Choosing the Sample Size
$z^* \frac{\sigma}{\sqrt{n}} \le ME$
z*: This critical value is determined by the confidence level (given to you in the problem).
n: This is what we want to solve for!
σ: Wait, how do we know the population's standard deviation? We will have to guess σ. Get a reasonable value by using a standard deviation from a pilot study or from past experience with similar studies.

EXAMPLE
Administrators at your school want to estimate how much time students spend on homework, on average, during a typical week. They want to estimate µ at the 90% confidence level with a margin of error of at most 30 minutes. A pilot study indicated that the standard deviation of time spent on homework per week is about 154 minutes. How many students need to be surveyed to estimate the mean number of minutes spent on homework per week with 90% confidence and a margin of error of at most 30 minutes?
$1.645 \frac{154}{\sqrt{n}} \le 30$
$\frac{1.645(154)}{30} \le \sqrt{n}$
$n \ge \left( \frac{1.645(154)}{30} \right)^2 \approx 71.3$
Always round up to the next whole number: the administrators need to survey at least 72 students.

What if the standard deviation is unknown???
Most of the time, if you don't know the population mean, you're not going to know the population standard deviation either!
Recall that if the sampling distribution of $\bar{x}$ is close to Normal, we can find probabilities involving $\bar{x}$ by standardizing:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Since we typically don't know the standard deviation σ, we can use the standard deviation of our sample, $s_x$, in its place:
$\frac{\bar{x} - \mu}{s_x / \sqrt{n}}$

ACTIVITY: BINGO!
If we are doing inference about a population mean µ, what happens when we use the sample standard deviation $s_x$ to estimate the population standard deviation σ?
Before we look into this, let's look at a population we know more about. Let's start with a Normal population with mean µ = 100 and standard deviation σ = 5.
We are going to take an SRS of size 4 from the population.
We are going to compute the sample mean.
We are going to standardize the value of the sample mean using the "known" value σ = 5.
In your calculator, this is what you will type:
randNorm(100, 5, 4)→L1 : 1-Var Stats L1 : ($\bar{x}$ - 100)/(5/√4)
Where to find each piece:
randNorm( : MATH, PRB, option 6
→ : STO>
L1 : LIST (2nd STAT), L1
: (the colon) : ALPHA .
1-Var Stats : STAT, CALC, option 1
$\bar{x}$ : VARS, 5: Statistics, option 2

ACTIVITY: BINGO!
Hit Enter 100 times and say BINGO every time your z-score is above 3 or below -3.
Write down the value every time you say BINGO.
According to the 68-95-99.7 rule, about how often should a "Bingo!" occur?

ACTIVITY: BINGO!
Now let's see what happens when you standardize the value of $\bar{x}$ using the sample's standard deviation $s_x$ instead of the "known" σ.
In your calculator, this is what you will type:
ENTRY (2nd ENTER) … this pulls up what you previously typed.
You want to change the standard deviation of 5 to the sample's standard deviation $s_x$. To do this: using the arrow keys, scroll to the last command and put your cursor on the 5, then hit VARS, 5: Statistics, option 3: Sx.
AGAIN, hit Enter 100 times and say BINGO every time your standardized value is above 3 or below -3.
Write down the value every time you say BINGO.

ACTIVITY: BINGO!
What difference did you notice between using the POPULATION'S standard deviation σ and using the SAMPLE'S standard deviation $s_x$?
There were more Bingos the 2nd time, meaning that more standardized values fell more than 3 standard deviations from the mean: far more than the 0.3% that the 68-95-99.7 rule predicts.
What does this mean?!?

t distribution
When we used our sample's standard deviation, what happened? More values were outside 3 standard deviations!
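The Bingo activity can also be run as a simulation instead of pressing Enter 100 times. The following is a minimal Python sketch, not part of the original lesson: the function name `bingo_rates`, the 20,000 trials, and the fixed seed are illustrative assumptions. It draws samples of size 4 from a Normal population with µ = 100 and σ = 5, standardizes each sample mean once with the known σ and once with the sample's own $s_x$, and reports how often each version lands beyond ±3.

```python
import random
import statistics


def bingo_rates(trials=20000, n=4, mu=100.0, sigma=5.0, seed=1):
    """Return the fraction of standardized sample means beyond +/-3,
    first using the known sigma, then using each sample's own s_x."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    z_bingos = 0
    t_bingos = 0
    for _ in range(trials):
        # Same idea as randNorm(100, 5, 4) on the calculator
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        s_x = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
        z = (xbar - mu) / (sigma / n ** 0.5)  # standardize with the known sigma
        t = (xbar - mu) / (s_x / n ** 0.5)    # standardize with s_x instead
        if abs(z) > 3:
            z_bingos += 1
        if abs(t) > 3:
            t_bingos += 1
    return z_bingos / trials, t_bingos / trials


z_rate, t_rate = bingo_rates()
print(f"Bingo rate using known sigma: {z_rate:.4f}")
print(f"Bingo rate using sample s_x:  {t_rate:.4f}")
```

With the known σ, the Bingo rate should land near the 3-in-1000 that the 68-95-99.7 rule predicts. With $s_x$ and samples this small, it should come out many times larger, which is the pattern the activity is designed to reveal.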
This represents a NEW distribution, the t distribution:
$t = \frac{\bar{x} - \mu}{s_x / \sqrt{n}}$
It has a different shape than the standard Normal curve: still symmetric with a single peak, but with much more area in the tails.
See page 504 in your textbook for a picture of the shape differences.

William S. Gosset (1876-1937)
This distribution was discovered when William S. Gosset worked for the Guinness Brewery. His goal in life was to make better beer. He used his new t procedures to find the best combination of barley and hops, which got him the job of head brewer.
Gosset used the pen name "Student" when publishing this mathematical work, so the t distribution is often referred to as "Student's t".

Degrees of Freedom
The statistic t has the SAME interpretation as any standardized statistic: it says how far $\bar{x}$ is from its mean in standard deviation units.
There is a different t distribution for each sample size. Because of this, we identify a particular t distribution by its number of degrees of freedom (df):
df = n - 1 (subtract 1 from the sample size)
The notation for a t distribution with a particular number of degrees of freedom is $t_{n-1}$.

The t distributions, Degrees of Freedom
Draw an SRS of size n from a large population that has a Normal distribution with mean µ and standard deviation σ. The statistic
$t = \frac{\bar{x} - \mu}{s_x / \sqrt{n}}$
has the t distribution with degrees of freedom df = n - 1. The statistic will have approximately a $t_{n-1}$ distribution as long as the sampling distribution of $\bar{x}$ is close to Normal.

More about the Degrees of Freedom
The density curves of the t distributions are similar in shape to the standard Normal curve.
The spread of the t distributions is a bit greater than that of the standard Normal distribution.
As the degrees of freedom increase, the t density curve approaches the standard Normal curve more closely. This is because the larger the sample size, the closer your $s_x$ gets to σ.

Using TABLE B
Table B shows the critical values t* for the t distributions.
The left column represents the degrees of freedom.
Common confidence levels are given at the bottom of the table.
By looking down any column, you can check that the t critical values approach the Normal critical values z* as the degrees of freedom increase.

EXAMPLE - Using TABLE B
Suppose you wanted to construct a 90% confidence interval for the mean µ of a Normal population based on an SRS of size 10. What critical value t* should you use?
Using the line for df = 10 - 1 = 9 and the column with a tail probability of .05 (10%/2), the desired critical value is t* = 1.833.

YOUR TURN
Use Table B to find the critical value t* that you would use for a 98% confidence interval for a population mean µ based on n = 22 observations.
Using the line for df = 22 - 1 = 21 and the column with a tail probability of .01 (2%/2), the desired critical value is t* = 2.518.

In your calculator
For TI-84: DISTRIBUTION (2nd VARS), option 4: invT(
In the parentheses, type in the area to the left of the desired critical value, then the degrees of freedom.
From the last problem, the area to the left of t* is 1 - .01 = .99, so you would type in invT(.99,21) to get 2.518. (Typing invT(.01,21) gives -2.518, the same value with the opposite sign.)

CONSTRUCTING A CONFIDENCE INTERVAL FOR µ
First, check your conditions!
RANDOM: The data come from a random sample of size n from the population of interest or from a randomized experiment.
NORMAL: The population has a Normal distribution or the sample size is large (n≥30).
INDEPENDENT: 10% rule: The sample size is no more than 1/10 of the population.

The One-Sample t Interval for a Population Mean
Estimate ± (critical value)(standard deviation of statistic)
$\bar{x} \pm t^* \frac{s_x}{\sqrt{n}}$
Remember, technically we call $\frac{s_x}{\sqrt{n}}$ the standard error of the sample mean, which describes how far $\bar{x}$ will be from µ on average in repeated SRSs of size n.

EXAMPLE
As part of their final project in AP Statistics, Christina and Rachel randomly selected 18 rolls of a generic brand of toilet paper to measure how well this brand could absorb water.
To do this, they poured ¼ cup of water onto a hard surface and counted how many squares it took to completely absorb the water. Here are the results from their 18 rolls:
29 20 25 29 21 24 27 25 24 29 24 27 28 21 25 26 22 23
Construct and interpret a 99% confidence interval for µ = the mean number of squares of generic toilet paper needed to absorb ¼ cup of water.

EXAMPLE
STATE: We want to estimate µ = the mean number of squares of generic toilet paper needed to absorb ¼ cup of water with 99% confidence.

EXAMPLE
PLAN: We will construct a one-sample t interval, provided the following conditions are met:
RANDOM: The students selected the rolls of generic toilet paper at random.
NORMAL: Since the sample size is small (n=18) and we aren't told that the population is Normally distributed, we need to check whether it is reasonable to believe that the population has a Normal distribution. (Draw a dotplot to see that there are no outliers and that the data roughly follow a Normal distribution.)
INDEPENDENT: Since we are sampling without replacement, we must check the 10% condition. It is reasonable to believe that there are at least 10(18) = 180 rolls of generic toilet paper.

EXAMPLE
DO: The sample mean for these data is $\bar{x} = 24.94$ and the sample standard deviation is $s_x = 2.86$. Since there are 18 - 1 = 17 degrees of freedom and we want 99% confidence, we will use a critical value of t* = 2.898 (from Table B).
$\bar{x} \pm t^* \frac{s_x}{\sqrt{n}} = 24.94 \pm 2.898 \frac{2.86}{\sqrt{18}} = 24.94 \pm 1.95 = (22.99, 26.89)$

EXAMPLE
CONCLUDE: We are 99% confident that the interval from 22.99 squares to 26.89 squares captures the true mean number of squares of generic toilet paper needed to absorb ¼ cup of water.

Using t Procedures Wisely
What happens when a condition for using a t procedure is violated?? If your result is still pretty accurate, then we call that procedure ROBUST.
According to the Merriam-Webster Dictionary, one definition of robust is "having or showing vigor, strength, or firmness"; another is "capable of performing without failure under a wide range of conditions."
In statistics, an inference procedure is robust if the probability calculations involved in that procedure remain fairly accurate when a condition for using the procedure is violated.

Using t Procedures Wisely
Good news for us! The t procedures are quite robust against non-Normality of the population EXCEPT when outliers or strong skewness are present.
Larger samples improve the accuracy of critical values from the t distributions when the population is not Normal because:
The sampling distribution of $\bar{x}$ is close to Normal if the sample size is large enough (CLT!).
As the sample size n grows, the sample standard deviation $s_x$ becomes an accurate estimate of σ whether or not the population has a Normal distribution.

Using t Procedures Wisely
Always plot the data to check that it's roughly Normal and, more importantly, that there are no outliers or major skewness.
And it's definitely MORE important to make sure that the data come from a RANDOM sample or experiment than to be picky about how Normal the data look.
Use these guidelines, based on the sample size n:
If n<15: Use t procedures if the data appear close to Normal (roughly symmetric, single peak, no outliers). If the data are clearly skewed or there are outliers, DO NOT USE t.
If n≥15: The t procedures can be used except in the presence of outliers or strong skewness.
Large samples (n≥30): The t procedures can be used even for clearly skewed distributions when the sample is large!
Don't use t in these situations either:
If your sample data give a biased estimate for some reason.
If you have data for your entire population of interest.

EXAMPLE - can we use t?
Determine whether we can safely use a one-sample t interval to estimate the population mean in each of the following settings:
A.)
Below is a histogram of the heights of all the students in a class.
NO. We have data for every student in the class, so we do NOT need inference. Remember, you only use inference when you are ESTIMATING something about the population (because that proportion or mean is unknown!).

EXAMPLE - can we use t?
Determine whether we can safely use a one-sample t interval to estimate the population mean in each of the following settings:
B.) The dotplot below shows expenditure costs for 6 of the employees at a company.
NO. This is a sample of 6 (less than 15!), so we can only use t procedures if the data appear close to Normal. They do not appear that way.

EXAMPLE - can we use t?
Determine whether we can safely use a one-sample t interval to estimate the population mean in each of the following settings:
C.) The boxplot below shows the SAT Math scores for a random sample of 20 students at your high school.
YES. The sample size is 20 (greater than 15!). Although the data are slightly skewed, there doesn't seem to be strong skewness or any outliers.
For more examples like these, see p. 513 in the book.

In your calculator:
One-sample t intervals for µ on your calculator: STAT, TESTS, option 8: TInterval…
If the problem gives you actual data (make sure your data are in L1): choose the Data option, List: L1, Freq: 1, C-Level: (type in your confidence level as a decimal, not a percent), then highlight Calculate and press ENTER.
If the problem just gives you summary statistics: choose the Stats option and type in your sample mean, sample standard deviation, sample size n, and C-Level (as a decimal, not a percent), then highlight Calculate and press ENTER.

Calculators aren't always right!
Be careful, because sometimes there are so-called "parallel solutions" where your calculator and your own calculations give you two different answers. That's why you always need to show your work and explanation on exams!

EXAMPLE - for practice!
The principal at a large high school claims that students spend at least 10 hours per week doing homework, on average. To investigate this claim, an AP Statistics class selected a random sample of 250 students from their school and asked them how long they spent doing homework during the last week. The sample mean was 10.2 hours and the sample standard deviation was 4.2 hours.
(a) Construct and interpret a 95% confidence interval for the mean time that students at this school spent doing homework in the last week.
(b) Based on your interval in part (a), what can you conclude about the principal's claim?

EXAMPLE
STATE: We want to estimate µ = the mean time spent doing homework in the last week for students at this school with 95% confidence.

EXAMPLE
PLAN: We will construct a one-sample t interval, provided the following conditions are met:
RANDOM: The students were randomly selected.
NORMAL: We are not told whether the population is Normal. However, since the sample size is large (n=250), we are safe using t procedures.
INDEPENDENT: Since we are sampling without replacement, we must check the 10% condition. It is reasonable to believe that there are at least 10(250) = 2500 students, since it is a large high school.

EXAMPLE
DO: The sample mean for these data is $\bar{x} = 10.2$ and the sample standard deviation is $s_x = 4.2$. There are 250 - 1 = 249 degrees of freedom and we want 95% confidence. Table B does not list df = 249, so we use the next smaller value it does list, df = 100, which gives a critical value of t* = 1.984.
$\bar{x} \pm t^* \frac{s_x}{\sqrt{n}} = 10.2 \pm 1.984 \frac{4.2}{\sqrt{250}} = 10.2 \pm 0.53 = (9.67, 10.73)$

EXAMPLE
CONCLUDE: We are 95% confident that the interval from 9.67 hours to 10.73 hours captures the true mean number of hours that students at this school spent doing homework in the last week.
(b) Since the interval of plausible values for µ includes values less than 10, the interval does not provide convincing evidence to support the principal's claim that students spend at least 10 hours on homework per week, on average.

Lesson 4: Homework problems
Read textbook pages: p. 499-511
Complete exercises: p. 498 #49-52; p. 518 #55, 57, 59, 60, 63
Check answers to odd problems

Lesson 5: Homework problems
Read textbook pages: p. 511-517
Complete exercises: p. 519 #65-67, 71, 73-78
Check answers to odd problems
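For checking hand calculations, the two formulas used in these lessons can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original lesson: the function names `sample_size_for_margin` and `t_interval` are made up here, and the critical values are passed in by hand from the worked examples (z* = 1.645, t* = 2.898, t* = 1.984) rather than computed.

```python
import math


def sample_size_for_margin(z_star, sigma_guess, max_margin):
    """Smallest whole n satisfying z* * sigma / sqrt(n) <= max_margin."""
    n_min = (z_star * sigma_guess / max_margin) ** 2
    return math.ceil(n_min)  # always round up to the next whole subject


def t_interval(xbar, s_x, n, t_star):
    """One-sample t interval: xbar +/- t* * s_x / sqrt(n)."""
    margin = t_star * s_x / math.sqrt(n)
    return xbar - margin, xbar + margin


# Homework-time sample size: 90% confidence, pilot-study sigma of 154 minutes
students = sample_size_for_margin(1.645, 154, 30)

# Toilet paper example: n = 18, df = 17, 99% confidence
tp_low, tp_high = t_interval(24.94, 2.86, 18, 2.898)

# Homework practice example: n = 250, 95% confidence, t* from the df = 100 row
hw_low, hw_high = t_interval(10.2, 4.2, 250, 1.984)

print(f"Students needed for a 30-minute margin: {students}")
print(f"Toilet paper interval: ({tp_low:.2f}, {tp_high:.2f})")
print(f"Homework interval: ({hw_low:.2f}, {hw_high:.2f})")
```

Run as written, this gives 72 students for a 30-minute margin, and the intervals (22.99, 26.89) and (9.67, 10.73). Calling `sample_size_for_margin(1.645, 154, 15)` gives 286, which shows that cutting the margin of error in half roughly quadruples the required sample size.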