Class Four

To Be Turned In:
Chapter 3: 28, 30, 36, 42
Chapter 4: 24, 32, 36, 38
Chapter 5: 30, 32, 34, 44

For Class Five:
Chapter 11: problems 24, 32, 36
Chapter 14: problems 36, 38, 50
Quiz 2
Read Chapters 15 & 17

Objectives for Class Four
• Compare and contrast population parameters and sample statistics.
• Define basic terms related to statistics, including sampling variability, sampling distribution of a statistic, and unbiased statistic.
• Describe the sampling distribution of the sample mean and calculate probabilities regarding the sample mean by using the central limit theorem.
• Explain the meaning of statistical confidence.
• Compute confidence intervals for the mean of a population (assuming the standard deviation is known).
• Discuss the relationship between sample size and margin of error in a confidence interval.
• Correctly state the null and alternative hypotheses for one-sample hypothesis tests.
• Calculate and interpret the p-value for a one-sample hypothesis test.
• Explain the meaning of statistical significance and determine if a hypothesis test is significant at a given level.

Parameters and Statistics
• parameter: a number that describes the population; it is usually unknown because we cannot examine the entire population
• statistic: a number that can be computed from the sample data without making use of any unknown parameters; it is usually used to estimate an unknown parameter
• “population mean” = μ
• “sample mean” = x̄
• Law of Large Numbers: draw observations at random from any population with finite mean μ. As the number of observations drawn increases, the mean x̄ of the observed values gets closer and closer to the mean μ of the population.
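The Law of Large Numbers can be watched in a small simulation. The sketch below is only an illustration: it assumes a Normal population with μ = 272 and σ = 60 (values chosen for the example, not a requirement of the law), and the function name `running_sample_means` and the fixed seed are hypothetical choices made here.

```python
import random
from statistics import fmean

def running_sample_means(mu, sigma, sizes, seed=0):
    """Draw one stream of N(mu, sigma) observations and return
    {n: mean of the first n observations} for each n in sizes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = [rng.gauss(mu, sigma) for _ in range(max(sizes))]
    return {n: fmean(draws[:n]) for n in sizes}

MU = 272.0     # assumed population mean (illustrative)
SIGMA = 60.0   # assumed population standard deviation (illustrative)

means = running_sample_means(MU, SIGMA, sizes=[10, 1_000, 100_000])
for n, xbar in means.items():
    # The gap between x-bar and mu tends to shrink as n grows.
    print(f"n = {n:>7}: x-bar = {xbar:8.3f}   |x-bar - mu| = {abs(xbar - MU):.3f}")
```

With a small n the sample mean can land several points away from μ; by n = 100,000 it should sit very close to μ, although any single run can wobble along the way.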
Sampling Distributions
• sampling distribution of a statistic: the distribution of values taken by the statistic in all possible samples of the same size from the same population
• sampling distribution of a sample mean: if individual observations have the N(μ, σ) distribution, then the sample mean x̄ of n independent observations has the $N(\mu, \sigma/\sqrt{n})$ distribution
• mean and standard deviation of a sample mean: if x̄ is the mean of an SRS of size n drawn from a large population with mean μ and standard deviation σ, then the mean of the sampling distribution of x̄ is μ and its standard deviation is $\sigma/\sqrt{n}$
– the average of all the sample means equals the mean of the population, which makes x̄ an unbiased estimator of μ
– averages are less variable than individual observations

Central Limit Theorem
• Draw an SRS of size n from any population with mean μ and finite standard deviation σ. When n is large, the sampling distribution of the sample mean x̄ is approximately Normal: x̄ is approximately $N(\mu, \sigma/\sqrt{n})$.

Statistical Inference
• statistical inference: provides methods for drawing conclusions about a population from sample data
• Inference about a mean under simple conditions
– We have an SRS from the population of interest.
– The variable we measure has a perfectly Normal distribution N(μ, σ) in the population.
– We don’t know the population mean μ, but we do know the population standard deviation σ. Our task is to infer something about μ from the sample data.
• confidence interval: a level C confidence interval for a parameter has two parts
– an interval calculated from the data, usually of the form estimate ± margin of error
– a confidence level C, which gives the probability that the interval will capture the true parameter value in repeated samples, i.e. the success rate of the method

Statistical Inference
• confidence interval for the mean of a population: draw an SRS of size n from a Normal population having unknown mean μ and known standard deviation σ.
A level C confidence interval for μ is $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
• the margin of error is $z^* \frac{\sigma}{\sqrt{n}}$
• z* values for differing levels of confidence C can be found on the bottom row of Table C.

NAEP Quantitative Scores
The NAEP survey includes a short test of quantitative skills, covering mainly basic arithmetic and the ability to apply it to realistic problems. Scores on the test range from 0 to 500, with higher scores indicating greater numerical abilities. It is known that NAEP scores have standard deviation σ = 60. In a recent year, 840 men 21 to 25 years of age were in the NAEP sample. Their mean quantitative score was x̄ = 272. On the basis of this sample, estimate the mean score in the population of all 9.5 million young men of these ages with 95% confidence.

$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 272 \pm 1.960 \cdot \frac{60}{\sqrt{840}}$
lower limit: $272 - 1.960 \cdot \frac{60}{\sqrt{840}} = 267.9424$
upper limit: $272 + 1.960 \cdot \frac{60}{\sqrt{840}} = 276.0576$

Using a sample of 840 men aged 21 to 25, we are 95% confident that the mean quantitative skills score on the NAEP exam for this population is between 267.9424 and 276.0576, with a margin of error of 4.0576.

Margins of Error
• there is a trade-off between margin of error and confidence level.
– to obtain a smaller margin of error from the same data, you must be willing to accept lower confidence
– it is easier to pin down μ when σ is small
– increasing the sample size reduces the margin of error
• choosing a sample size for a desired margin of error
– the confidence interval for the mean of a Normal population will have a specified margin of error m when the sample size is $n = \left(\frac{z^* \sigma}{m}\right)^2$

Tests of Significance
• The goal of a test of significance is to assess the evidence provided by data about some claim, called a null hypothesis, concerning a parameter of the population
– an outcome that would rarely happen if a claim were true is good evidence that the claim is not true
• this is based on the idea of a counterexample from logic: it takes only one instance in which a statement is untrue to show that it is unreliable and should be considered false
• examples, no matter how numerous, only demonstrate truth in those instances; they do not demonstrate the truth of the statement in all circumstances
– the tests of significance here start with an SRS from an exactly Normal population whose standard deviation σ is known to us

Stating Hypotheses
• null hypothesis (H0): the statement being tested statistically
– the test is designed to assess the strength of the evidence against the null hypothesis
– usually the null hypothesis is a statement of “no effect” or “no difference”
• alternative hypothesis (Ha): the claim about the population that we are trying to find evidence for
– one sided: we are interested in whether the parameter is greater than or less than the “no effect” level, but not both
– two sided: we are interested in whether the parameter differs from the “no effect” level in either direction

The Hypotheses for Means
Null: H0: μ = μ0
One-sided alternatives: Ha: μ > μ0 or Ha: μ < μ0
Two-sided alternative: Ha: μ ≠ μ0

Test Statistic
• The test is based on a statistic that compares the value of the parameter stated in the null hypothesis with an estimate of the parameter from the sample data. The estimate is usually the same one used in a confidence interval for the parameter.
• Large values of the test statistic indicate that the estimate is far from the parameter value specified by H0. These values give evidence against H0. The alternative hypothesis determines which directions count against H0.
• the z test statistic is given by: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
– it measures how far the sample mean diverges from the value stated in the null hypothesis
– the probability of a divergence at least this large, if H0 is true, is given by the p-value

P-values & Statistical Significance
• p-value: the probability, computed assuming that H0 is true, that the test statistic would take a value as extreme as or more extreme than that actually observed. Exact p-values can be found in Table A or with software.
– the smaller the p-value, the stronger the evidence against H0 provided by the data
– large p-values fail to give evidence against H0
– Ha: μ > μ0: the P-value is the probability of getting a value as large as or larger than the observed test statistic (z) value.
– Ha: μ < μ0: the P-value is the probability of getting a value as small as or smaller than the observed test statistic (z) value.
– Ha: μ ≠ μ0: the P-value is two times the probability of getting a value as large as or larger than the absolute value of the observed test statistic (z) value.
• significance level (α): a fixed value of the p-value (critical values are in Table C) that we consider decisive, i.e. a predetermined level of evidence required to reject H0. If the p-value is as small as or smaller than α, the result is statistically significant at level α.
– the strength of evidence against H0 ranges from no evidence for large p-values, through weak evidence near 0.10 and some evidence near 0.05, to good evidence and varying degrees of strong evidence as the p-value shrinks further

Suppose we know that for any cola, the sweetness loss scores vary from taster to taster according to a Normal distribution with standard deviation σ = 1. The mean μ for all tasters measures loss of sweetness. The sweetness losses for a new cola, as measured by 10 trained tasters, yield an average sweetness loss of x̄ = 1.02.
Do the data provide sufficient evidence that the new cola lost sweetness in storage? The null hypothesis is that no average sweetness loss occurs, while the alternative hypothesis (the one we want to show is likely to be true) is that an average sweetness loss does occur.

H0: μ = 0
Ha: μ > 0

This is a one-sided test because we are interested only in determining whether the cola lost sweetness (gaining sweetness is of no consequence in this study).

$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1.02 - 0}{1/\sqrt{10}} = 3.23$

For test statistic z = 3.23 and alternative hypothesis Ha: μ > 0, the P-value is:

P-value = P(Z > 3.23) = 1 − 0.9994 = 0.0006

If H0 is true, there is only a 0.0006 (0.06%) chance that we would see results at least as extreme as those in the sample. Since we saw results that are unlikely if H0 is true, we have evidence against H0 and in favor of Ha, and we say: “Using a sample of 10 trained tasters, we have extremely strong evidence (P ≈ 0.0006) to reject the null hypothesis that there is no loss of sweetness in the new cola.”
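The two worked examples in these slides (the NAEP confidence interval and the cola z test) can be checked numerically. The following is a minimal sketch, not part of the course materials: it assumes Python 3.8+ for `statistics.NormalDist`, the function names are hypothetical, and rounding the required sample size up to the next whole number is a convention assumed here. It uses exact Normal quantiles rather than the rounded table values, so results can differ slightly in the last digits from the figures computed with z* = 1.960 and z = 3.23. The margin m = 3 in the sample-size call is an invented value for illustration only.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: mean 0, standard deviation 1

def z_confidence_interval(xbar, sigma, n, level):
    """Level-C interval for mu with known sigma: xbar +/- z* sigma/sqrt(n)."""
    z_star = STD_NORMAL.inv_cdf((1 + level) / 2)  # e.g. about 1.960 for 95%
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin

def sample_size_for_margin(sigma, m, level):
    """Smallest whole n with margin of error at most m: n = (z* sigma / m)^2."""
    z_star = STD_NORMAL.inv_cdf((1 + level) / 2)
    return ceil((z_star * sigma / m) ** 2)  # rounding up is an assumed convention

def z_test(xbar, mu0, sigma, n, alternative):
    """One-sample z statistic and P-value for the chosen alternative."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    if alternative == "greater":      # Ha: mu > mu0
        p = 1 - STD_NORMAL.cdf(z)
    elif alternative == "less":       # Ha: mu < mu0
        p = STD_NORMAL.cdf(z)
    elif alternative == "two-sided":  # Ha: mu != mu0
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p

# NAEP example: x-bar = 272, sigma = 60, n = 840, 95% confidence
low, high, moe = z_confidence_interval(272, 60, 840, 0.95)
print(f"95% CI: ({low:.4f}, {high:.4f}), margin of error {moe:.4f}")

# Hypothetical follow-up: n needed for a margin of 3 points at 95% confidence
n_needed = sample_size_for_margin(60, 3, 0.95)
print(f"n needed for m = 3: {n_needed}")

# Cola example: x-bar = 1.02, mu0 = 0, sigma = 1, n = 10, Ha: mu > 0
z, p = z_test(1.02, 0, 1, 10, "greater")
print(f"z = {z:.2f}, P-value = {p:.4f}")
```

Passing the alternative as a string keeps the three P-value rules from the slides side by side in one function, which makes it easy to see that the two-sided P-value is twice the one-sided tail beyond |z|.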
Next Week: Class Five

To Be Completed Before Class Five:
Chapter 11: problems 24, 32, 36
Chapter 14: problems 36, 38, 50
Quiz 2
Read Chapters 15 & 17