The mean and the std. dev. of the sample mean
• Select a SRS of size n from a population and measure a variable X on each individual in the sample.
• The data consist of observations on n r.v.'s X1, X2, …, Xn.
• If the population is large we can consider X1, X2, …, Xn to be independent.
• The sample mean of a SRS of size n is $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$.
• If the population has mean μ and std. dev. σ, what is the mean of the total T = X1+X2+···+Xn?
Answer: $\mu_T = \mu_{X_1+X_2+\cdots+X_n} = n\mu$
week9 1

Mean of the sample mean $\bar{X}$?
$\mu_{\bar{X}} = \frac{1}{n}\,\mu_{X_1+X_2+\cdots+X_n} = \frac{1}{n}\,n\mu = \mu$
Variance of the total T?
$\sigma_T^2 = \sigma^2_{X_1+X_2+\cdots+X_n} = n\sigma^2$
Variance of the sample mean $\bar{X}$?
$\sigma^2_{\bar{X}} = \frac{1}{n^2}\,\sigma^2_{X_1+X_2+\cdots+X_n} = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n}$
week9 2

Sampling distribution of a sample mean
• If a population has the N(μ, σ) distribution, then the sample mean $\bar{X}$ of n independent observations has the $N(\mu, \sigma/\sqrt{n})$ distribution.
• Example
A bottling company uses a filling machine to fill plastic bottles with a popular cola. The bottles are supposed to contain 300 milliliters (ml). In fact, the contents vary according to a normal distribution with mean 298 ml and standard deviation 3 ml.
(a) What is the probability that an individual bottle contains less than 295 ml?
(b) What is the probability that the mean contents of the bottles in a six-pack is less than 295 ml?
week9 3

The central limit theorem
• Draw a SRS of size n from a population with mean μ and std. dev. σ. When n is large, the sampling distribution of the sample mean $\bar{X}$ is approximately normal with mean μ and std. dev. $\sigma/\sqrt{n}$.
• Note: The normal approximation for the sample proportion and counts is an important example of the central limit theorem.
• Note: The total T = X1+X2+···+Xn is approximately normal with mean nμ and std. dev. $\sqrt{n}\,\sigma$.
week9 4

Example (Question 24 Final Dec 98)
Suppose that the weights of airline passengers are known to have a distribution with a mean of 75 kg and a std. dev. of 10 kg. A certain plane has a passenger weight capacity of 7700 kg. What is the probability that a flight of 100 passengers will exceed the capacity?
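One way to work the airline example is sketched below, assuming the central limit theorem approximation for the total is adequate at n = 100 and that passenger weights are independent. The function name `prob_total_exceeds` is hypothetical and only the Python standard library is used.

```python
from math import sqrt
from statistics import NormalDist

def prob_total_exceeds(capacity, n, mu, sigma):
    """Approximate P(T > capacity) for the total T = X1 + ... + Xn via the CLT."""
    mean_total = n * mu          # mean of the total is n * mu
    sd_total = sqrt(n) * sigma   # std. dev. of the total is sqrt(n) * sigma
    return 1 - NormalDist(mean_total, sd_total).cdf(capacity)

# Airline example: 100 passengers, mean 75 kg, std. dev. 10 kg, capacity 7700 kg
p = prob_total_exceeds(capacity=7700, n=100, mu=75, sigma=10)
print(f"P(total weight > 7700 kg) is approximately {p:.4f}")
```

Here the total has mean 7500 kg and std. dev. 100 kg, so the capacity sits 2 standard deviations above the mean and the probability is about 0.023.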
week9 5

Example
In a certain University, the course STA100 has tutorials of size 40. The course STA200 has tutorials of size 25, and the course STA300 has tutorials of size 15. Each course has 5 tutorials per year. Students are enrolled by computer one by one into tutorials. Assume that each student being enrolled by computer may be considered a random selection from a very big group of people wherein there is a 50-50 male to female sex ratio. Which of the following statements is true?
A) Over the years STA100 will have more tutorials with 2/3 females (or more).
B) Over the years STA200 will have more tutorials with 2/3 females (or more).
C) Over the years STA300 will have more tutorials with 2/3 females (or more).
D) Over the years, each course will have about the same number of tutorials with 2/3 females (or more).
E) No course will have tutorials with 2/3 females (or more).
week9 6

Question
State whether the following statements are true or false.
(i) As the sample size increases, the mean of the sampling distribution of the sample mean $\bar{X}$ decreases.
(ii) As the sample size increases, the standard deviation of the sampling distribution of the sample mean $\bar{X}$ decreases.
(iii) The mean $\bar{X}$ of a random sample of size 4 from a negatively skewed distribution is approximately normally distributed.
(iv) The distribution of the proportion of successes X in a sufficiently large sample is approximately normal with mean p and standard deviation $\sqrt{np(1-p)}$, where p is the population proportion and n is the sample size.
(v) If $\bar{X}$ is the mean of a simple random sample of size 9 from the N(500, 18) distribution, then $\bar{X}$ has a normal distribution with mean 500 and variance 36.
week9 7

Question
State whether the following statements are true or false.
o A large sample from a skewed population will have an approximately normal shaped histogram.
o The mean of a population will be normally distributed if the population is quite large.
o The average blood cholesterol level recorded in a SRS of 100 students from a large population will be approximately normally distributed.
o The proportion of people with incomes over $200 000, in a SRS of 10 people selected from all Canadian income tax filers, will be approximately normal.
week9 8

Exercise
A parking lot is patrolled twice a day (morning and afternoon). In the morning, the chance that any particular spot has an illegally parked car is 0.02. If the spot contained a car that was ticketed in the morning, the probability the spot is also ticketed in the afternoon is 0.1. If the spot was not ticketed in the morning, there is a 0.005 chance the spot is ticketed in the afternoon.
a) Suppose tickets cost $10. What is the expected value of the tickets for a single spot in the parking lot?
b) Suppose the lot contains 400 spots. What is the distribution of the value of the tickets for a day?
c) What is the probability that more than $200 worth of tickets are written in a day?
week9 9

Exercises
1. Z ~ N(0, 1). Find P(-1.96 < Z < 1.96).
2. Z ~ N(0, 1). Find the value of c such that P(-c < Z < c) = 0.95.
3. Z ~ N(0, 1). Find the value of c such that P(-c < Z < c) = 0.90.
4. X ~ N(500, 15). Find the values of c and d such that P(c < X < d) = 0.95.
week9 10

5. X ~ N(μ, σ). Find the values of c and d (in terms of μ and σ) such that P(c < X < d) = 0.95.
6. X ~ N(μ, σ). Find the values of c and d (in terms of μ and σ) such that P(c < X < d) = 0.90.
7. X ~ N(500, 15). Let $\bar{X}$ be the mean of a random sample of size 9. Find the values of c and d such that $P(c < \bar{X} < d) = 0.95$.
8. X ~ N(μ, σ). Let $\bar{X}$ be the mean of a random sample of size n. Find the values of c and d such that $P(c < \bar{X} < d) = 0.95$.
week9 11

Point Estimates and CI
• A basic tool in statistical inference is a point estimate of the population parameter. However, an estimate without an indication of its variability is of little value.
• Example:
Parameter    Estimate     Std. Error
μ            $\bar{X}$
σ²           S²
p            p̂
• A level C confidence interval for a parameter is an interval computed from sample data by a method that has probability C of producing an interval containing the true value of the parameter.
week9 12

Confidence interval for the population mean
• Choose a SRS of size n from a population having unknown mean μ and known std. dev. σ. A level C confidence interval for μ is an interval of the form
$\left(\bar{x} - z^*\frac{\sigma}{\sqrt{n}},\ \bar{x} + z^*\frac{\sigma}{\sqrt{n}}\right)$, that is, $\bar{x} \pm z^*\frac{\sigma}{\sqrt{n}}$
• Here z* is the value on the standard normal curve with area C between −z* and z*. The interval is exact when the population distribution is normal and approximately correct for large n in other cases.
• In general CIs have the form: Estimate ± margin of error
• In the above case, margin of error = $m = z^*\frac{\sigma}{\sqrt{n}}$
week9 13

• Note, in the above formula for the CI for the population mean, $\sigma/\sqrt{n}$ is the std. dev. of the sample mean $\bar{X}$ (this is also known as the std. error of the sample mean $\bar{X}$) and the interval can also be written as $\bar{x} \pm z^* \cdot \text{Std.Error}(\bar{X})$.
• The width of any CI is L = 2m, i.e. twice the margin of error.
• Here are three ways to reduce the margin of error (and the width of the CI):
  Use a lower level of confidence (smaller C).
  Increase the sample size n.
  Reduce σ (usually not possible).
week9 14

Sample size for desired margin of error
• The CI for the population mean will have a specified margin of error m when the sample size is
$n = \left(\frac{z^*\sigma}{m}\right)^2$
• Example: A limnologist wishes to estimate the mean phosphate content per unit volume of lake water. It is known from previous studies that the std. dev. has a fairly stable value of 4 mg. How many water samples must the limnologist analyze to be 90% certain that the error of estimation does not exceed 0.8 mg?
week9 15

Example
• You want to rent an unfurnished one-bedroom apartment for next semester. The mean monthly rent for a random sample of 10 apartments advertised in the local newspaper is $580. Assume that the std. dev. is $90. Find a 95% CI for the mean monthly rent for unfurnished one-bedroom apartments available for rent in this community.
• How large a sample of one-bedroom apartments would be needed to estimate the mean µ within ±$20 with 90% confidence?
week9 16

Exercise
• Data on the Degree of Reading Power (DRP) scores for 44 students are recorded. Suppose that the SD of the population of DRP scores is known to be σ = 11. A 95% CI for the population mean score is given in the MINITAB output below.

DRP Scores
40 26 39 14 42 18 25 43 46 27 19
47 19 26 35 34 15 44 40 38 31 46
52 25 35 35 33 29 34 41 49 28 52
47 35 48 22 33 41 51 27 14 54 45

Z Confidence Intervals
The assumed sigma = 11.0
Variable    N    Mean    StDev   SE Mean   95.0 % CI
DRP Scor    44   35.09   11.19   1.66      (31.84, 38.34)

• MINITAB Command: Stat > Basic Statistics > 1 Sample Z and select ‘Confidence interval’
week9 17

Exercise
A random sample of 85 students in Chicago city high schools takes a course designed to improve SAT scores. Based on these students, a 90% CI for the mean improvement in SAT scores for all Chicago high school students is computed as (72.3, 91.4) points. Which of the following statements are true?
a) 90% of the students in the sample improved their scores by between 72.3 and 91.4 points.
b) 90% of the students in the population improved their scores by between 72.3 and 91.4 points.
c) A 95% CI will contain the value 72.3.
d) The margin of error of the 90% CI above is 9.55.
e) A 90% CI based on a sample of 340 (85 × 4) students will have margin of error 9.55/4.
week9 18

Statistical Tests
• A significance test is a formal procedure for comparing observed data with a hypothesis whose truth we want to assess. The hypothesis is a statement about the parameters in a population or model.
• Null hypothesis
The statement being tested in a test of significance is called the null hypothesis. The test of significance is designed to assess the strength of the evidence against the null hypothesis. Usually the null hypothesis is a statement of “no effect” or “no difference”.
• We abbreviate “null hypothesis” as H0.
week9 19

Example
Each of the following situations requires a significance test about a population mean μ. State the appropriate null hypothesis H0 and alternative hypothesis Ha in each case.
(a) The mean area of the several thousand apartments in a new development is advertised to be 1250 square feet. A tenant group thinks that the apartments are smaller than advertised. They hire an engineer to measure a sample of apartments to test their suspicion.
(b) Larry's car averages 32 miles per gallon on the highway. He now switches to a new motor oil that is advertised as increasing gas mileage. After driving 3000 highway miles with the new oil, he wants to determine if his gas mileage actually has increased.
(c) The diameter of a spindle in a small motor is supposed to be 5 millimeters. If the spindle is either too small or too large, the motor will not perform properly. The manufacturer measures the diameter in a sample of motors to determine whether the mean diameter has moved away from the target.
week9 20

Test Statistic
• The test is based on a statistic that estimates the parameter that appears in the hypotheses. Usually this is the same estimate we would use in a confidence interval for the parameter. When H0 is true, we expect the estimate to take a value near the parameter value specified in H0.
• Values of the estimate far from the parameter value specified by H0 give evidence against H0. The alternative hypothesis determines which directions count against H0.
• A test statistic measures compatibility between the null hypothesis and the data.
• We use it for the probability calculation that we need for our test of significance.
• It is a random variable with a distribution that we know.
week9 21

Example
• An air freight company wishes to test whether or not the mean weight of parcels shipped on a particular route exceeds 10 pounds. A random sample of 49 shipping orders was examined and found to have an average weight of 11 pounds. Assume that the std. dev.
of the weights (σ) is 2.8 pounds.
• The null and alternative hypotheses in this problem are: H0: μ = 10 ; Ha: μ > 10.
• The test statistic for this problem is the standardized version of $\bar{X}$:
$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, with μ0 = 10 the value specified by H0.
• Decision: ?
week9 22

P-value and Significance level
• The probability, computed under the assumption that H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed is called the P-value of the test. The smaller the P-value, the stronger the evidence against H0 provided by the data.
• The decisive value of P is called the significance level. It is denoted by α.
• Statistical significance
If the P-value is as small or smaller than α, we reject H0 and say that the data are statistically significant at level α.
• The P-value is the smallest level α at which the data are significant.
week9 23

Z Test for a population mean (σ known)
• To test the hypothesis H0: µ = µ0 based on a SRS of size n from a population with unknown mean µ and known std. dev. σ, compute the test statistic
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
• In terms of a standard Normal variable Z, the P-value for the test of H0 against
  Ha: µ > µ0 is P(Z ≥ z)
  Ha: µ < µ0 is P(Z ≤ z)
  Ha: µ ≠ µ0 is 2·P(Z ≥ |z|)
• These P-values are exact if the population distribution is normal and are approximately correct for large n in other cases.
week9 24

Critical value approach
• We can base our test conclusions on a fixed level of significance α without computing the P-value.
• For this we need to find a critical value z* from the standard normal distribution with a specified tail area (to the right or left depending on Ha). This tail area is called the rejection region.
• If the test statistic falls in the rejection region we reject H0 and conclude that the data are statistically significant at level α.
• A P-value is more informative than a reject-or-not finding at a fixed significance level because it can tell us about the strength of the evidence we found against H0.
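The z test and the critical value approach described on the preceding slides can be sketched in Python as follows. This is a minimal illustration and not part of the original slides: the function names `z_statistic`, `p_value` and `rejects_at_level` are hypothetical, σ is assumed known, and only the standard library is used. The numbers come from the air freight example (n = 49, sample mean 11, σ = 2.8, μ0 = 10).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, std. dev. 1

def z_statistic(xbar, mu0, sigma, n):
    """Standardized sample mean under H0: mu = mu0 (sigma known)."""
    return (xbar - mu0) / (sigma / sqrt(n))

def p_value(z, alternative):
    """P-value for Ha given as 'greater', 'less' or 'two-sided'."""
    if alternative == "greater":
        return 1 - STD_NORMAL.cdf(z)
    if alternative == "less":
        return STD_NORMAL.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative}")

def rejects_at_level(z, alpha, alternative):
    """Critical value approach: does z fall in the rejection region?"""
    if alternative == "greater":
        return z >= STD_NORMAL.inv_cdf(1 - alpha)
    if alternative == "less":
        return z <= STD_NORMAL.inv_cdf(alpha)
    if alternative == "two-sided":
        return abs(z) >= STD_NORMAL.inv_cdf(1 - alpha / 2)
    raise ValueError(f"unknown alternative: {alternative}")

# Air freight example: H0: mu = 10 against Ha: mu > 10
z = z_statistic(xbar=11, mu0=10, sigma=2.8, n=49)
p = p_value(z, "greater")
print(f"z = {z:.2f}, P-value = {p:.4f}")
print("reject H0 at alpha = 0.05:", rejects_at_level(z, 0.05, "greater"))
```

For these numbers z = 2.50 and the one-sided P-value is about 0.006, so both approaches lead to the same decision at α = 0.05; the P-value additionally shows how strong the evidence is.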
week9 25

Example
• The Pfft Light Bulb Company claims that the mean life of its 2 watt bulbs is 1300 hours. Suspecting that the claim is too high, Nalph Rader gathered a random sample of 64 bulbs and tested each. He found the average life to be 1295 hours. Test the company's claim using α = 0.01. Assume σ = 20 hours.
week9 26

Exercise
• A standard intelligence examination has been given for several years with an average score of 80 and a standard deviation of 7. If 25 students, taught with special emphasis on reading skill, obtain a mean grade of 83 on the examination, is there reason to believe that the special emphasis changes the result on the test? Use α = 0.05.
week9 27

Exercise
• Data on the Degree of Reading Power (DRP) scores for 44 students in a suburban school district (same data as on slide 17). Suppose that the SD of scores in this school district is known to be σ = 11. The researcher believes that the mean score μ of all the students in this district is higher than the national mean, which is 32. The MINITAB output for the test is given below.

Z-Test
Test of mu = 32.00 vs mu > 32.00
The assumed sigma = 11.0
Variable    N    Mean    StDev   SE Mean   Z      P
DRP Scor    44   35.09   11.19   1.66      1.86   0.031

• MINITAB Command: Stat > Basic Statistics > 1 Sample Z and select ‘Test mean’
week9 28

Confidence Intervals and two-sided tests
• A level α two-sided significance test rejects a hypothesis H0: μ = μ0 exactly when the value μ0 falls outside the 1 − α confidence interval for μ.
• Example
For the exercise on slide 27, a 95% CI is 83 ± 1.96·(7/5) = (80.256, 85.744). The value 80 is not in this interval, and so we reject H0: μ = 80 at the 5% level of significance.
week9 29
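The link between confidence intervals and two-sided tests can be checked numerically. The sketch below is an illustration only, using hypothetical helper names `z_interval` and `two_sided_rejects`, the standard library, and the numbers from the slide 27 exercise (sample mean 83, σ = 7, n = 25, μ0 = 80).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level):
    """Level-C z confidence interval for the mean (sigma known)."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)  # area C between -z* and z*
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

def two_sided_rejects(xbar, mu0, sigma, n, alpha):
    """Two-sided z test of H0: mu = mu0 at level alpha."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p <= alpha

low, high = z_interval(xbar=83, sigma=7, n=25, level=0.95)
rejected = two_sided_rejects(xbar=83, mu0=80, sigma=7, n=25, alpha=0.05)
outside = not (low <= 80 <= high)

print(f"95% CI: ({low:.3f}, {high:.3f})")
print("mu0 = 80 outside the CI:", outside)
print("two-sided test rejects H0:", rejected)
```

The interval agrees with the slide, (80.256, 85.744), and the two checks give the same answer: 80 lies outside the interval and the test rejects H0 at the 5% level.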