4-1 Statistical Inference
• Statistical inference means making decisions or drawing conclusions about a population using the information contained in a sample from that population. Its two major areas are:
1. Parameter estimation
2. Hypothesis testing

4-2 Point Estimation
• A point estimate is an observed value of a point estimator (a statistic).

4-2' Interval Estimation: Confidence Intervals
Note: L (the lower confidence limit) and U (the upper confidence limit) are statistics and hence random variables.
Ex) Confidence level = 95% ($1-\alpha = 0.95$): P(the C.I. will contain the true parameter) = 0.95. In repeated sampling, 95% of all C.I.s constructed this way will contain the true parameter.
• The general formula for all confidence intervals is:
Point Estimate ± (Critical Value)(Standard Error)
– Lower confidence limit: L = Point Estimate - Critical Value × S.E.
– Upper confidence limit: U = Point Estimate + Critical Value × S.E.
– Width of the confidence interval = 2 × Critical Value × S.E.

Confidence Interval for a Mean when the Variance σ² is Known
Assumptions
– Population standard deviation σ is known
– Population is normally distributed
– If the population is not normal, use a large sample (CLT)
100(1-α)% (two-sided) confidence interval for μ:
$\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$, or equivalently $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
where $z_{\alpha/2}$ is the standard normal critical value with probability α/2 in each tail.
100(1-α)% upper-confidence bound for μ: $\mu \le \bar{x} + z_{\alpha}\,\sigma/\sqrt{n}$
100(1-α)% lower-confidence bound for μ: $\bar{x} - z_{\alpha}\,\sigma/\sqrt{n} \le \mu$

Critical Value $z_{\alpha/2}$
Consider a 95% confidence interval: $1-\alpha = 0.95$, so $\alpha/2 = 0.025$ in each tail, which gives $z_{\alpha/2} = 1.96$ and $z_{1-\alpha/2} = -1.96$.
Note: $z_{1-\alpha/2} = -z_{\alpha/2}$.
Commonly used confidence levels are 90%, 95%, and 99%.

Example
A sample of 11 circuits from a normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms. Determine a 95% confidence interval for the true mean resistance of the population.
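The interval asked for in this example can be computed with a short sketch. It assumes Python 3.8+ and uses only the standard library's `statistics.NormalDist` to obtain $z_{\alpha/2}$ and $z_{\alpha}$ instead of a printed table; the function names `z_interval` and `z_bounds` are hypothetical helpers, not part of any library.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """Two-sided 100(1 - alpha)% CI for mu when sigma is known."""
    alpha = 1.0 - confidence
    # z_{alpha/2}: upper alpha/2 point of N(0, 1), about 1.96 for 95%
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def z_bounds(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """One-sided bounds (lower bound, upper bound), each built with z_alpha."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha)  # z_alpha, about 1.645 for 95%
    se = sigma / sqrt(n)
    return xbar - z * se, xbar + z * se


# Circuit-resistance example: n = 11, xbar = 2.20 ohms, sigma = 0.35 ohms
low, high = z_interval(2.20, 0.35, 11)
print(f"95% CI for mu: ({low:.4f}, {high:.4f})")
```

The one-sided bounds use $z_{\alpha}$ rather than $z_{\alpha/2}$, so each bound lies strictly inside the corresponding end of the two-sided interval.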
$\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n} = 2.20 \pm 1.96\,(0.35/\sqrt{11}) = 2.20 \pm 0.2068$, giving (1.9932, 2.4068) ohms.

Confidence Interval for a Mean when the Variance σ² is Unknown
• If the population standard deviation σ is unknown, we can substitute the sample standard deviation, S.
• This introduces extra uncertainty, since S varies from sample to sample, so we use the t distribution instead of the normal distribution.
Assumptions
– Population standard deviation is unknown
– Population is normally distributed
– If the population is not normal, use a large sample
• 100(1-α)% confidence interval for μ:
$\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$
where $t_{\alpha/2,\,n-1}$ is the critical value of the t distribution with n-1 d.f. and an area of α/2 in each tail.
100(1-α)% upper-confidence bound for μ: $\mu \le \bar{x} + t_{\alpha,\,n-1}\, s/\sqrt{n}$
100(1-α)% lower-confidence bound for μ: $\bar{x} - t_{\alpha,\,n-1}\, s/\sqrt{n} \le \mu$

Student's t Distribution
• t distributions are symmetric and bell-shaped but have heavier tails than the normal.
• The t value depends on the degrees of freedom (d.f.).
• As d.f. goes to infinity, the t distribution approaches N(0, 1).
[Figure: standard normal (t with df = ∞) compared with t (df = 13) and t (df = 5)]
(Table of the t distribution)

Example
A random sample of n = 25 has sample mean 50 and sample variance 8. Form a 95% confidence interval for μ.
– d.f. = n - 1 = 24, so $t_{0.025,\,24} = 2.0639$
– The confidence interval is $50 \pm 2.0639\,\sqrt{8}/\sqrt{25} = 50 \pm 1.1675$, i.e., (48.832, 51.168)

Confidence Intervals for the Variance of a Normal Population
100(1-α)% confidence interval for σ²:
$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$
100(1-α)% upper-confidence bound for σ²: $\sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,\,n-1}}$
100(1-α)% lower-confidence bound for σ²: $\frac{(n-1)s^2}{\chi^2_{\alpha,\,n-1}} \le \sigma^2$
where $\chi^2_{\alpha,\,n-1}$ is the upper-α point of the chi-square distribution with n-1 d.f.

Confidence Intervals for the Population Proportion, p
Standard error of $\hat{p}$: $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$
100(1-α)% confidence interval for p: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
100(1-α)% upper-confidence bound for p: $p \le \hat{p} + z_{\alpha}\sqrt{\hat{p}(1-\hat{p})/n}$
100(1-α)% lower-confidence bound for p: $\hat{p} - z_{\alpha}\sqrt{\hat{p}(1-\hat{p})/n} \le p$
[Example] A random sample of 100 people shows that 25 wear glasses. Form a 95% confidence interval for the true proportion of the population who wear glasses.
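As a sketch (standard library only, Python 3.8+), the t interval for the n = 25 example and the large-sample proportion interval for the glasses example just posed can be computed as follows. The standard library has no t distribution, so the table value $t_{0.025,24} = 2.0639$ is passed in rather than computed; `t_interval` and `proportion_interval` are hypothetical helper names.

```python
from math import sqrt
from statistics import NormalDist


def t_interval(xbar: float, s: float, n: int, t_crit: float):
    """Two-sided CI for mu with sigma unknown.

    t_crit is t_{alpha/2, n-1}, read from a t table and supplied by the caller.
    """
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


def proportion_interval(successes: int, n: int, confidence: float = 0.95):
    """Large-sample CI for p: p_hat +/- z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    se = sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# t example: n = 25, xbar = 50, sample variance 8, t_{0.025,24} = 2.0639 (table)
print(t_interval(50.0, sqrt(8.0), 25, 2.0639))
# Glasses example: 25 of 100 people wear glasses
print(proportion_interval(25, 100))
```

Note that the t example gives the sample variance, so its square root is passed as `s`.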
$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = 25/100 \pm 1.96\sqrt{0.25(0.75)/100} = 0.25 \pm 1.96(0.0433)$, giving (0.1651, 0.3349).
Note: We are 95% confident that the true percentage of people wearing glasses in the population is between 16.51% and 33.49%. Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed in this manner from samples of size 100 will contain the true proportion.

4-3 Hypothesis Testing
4-3.1 Statistical Hypotheses
• A (statistical) hypothesis is a statement or claim about a population parameter (not about a sample statistic).
Ex) The mean electric bill per household in this city is μ = $132.
The proportion of adults in this city with full-time jobs is p = 0.61.
• Hypothesis testing is a procedure leading to a decision about a hypothesis based on a random sample.
• The null hypothesis (H0) states the assumption to be tested. Hypothesis testing begins with the assumption that H0 is true.
• The alternative hypothesis (H1) is the opposite of the null hypothesis. It is the hypothesis that the researcher is trying to prove.
Ex) H0: The mean age of smartphone users is 28 (H0: μ = 28).
H1: The mean age of smartphone users is not 28 (H1: μ ≠ 28).

Example: Insight into Hypothesis Testing
Suppose that we are interested in the burning rate of a solid propellant used to power aircrew escape systems.
• Suppose that our interest focuses on the mean burning rate (a parameter of the burning-rate distribution).
• If we are interested in deciding whether or not the mean burning rate is 50 centimeters per second, we use a two-sided alternative hypothesis:
H0: μ = 50 cm/s, H1: μ ≠ 50 cm/s
• If we are trying to prove that the mean burning rate is less than 50 centimeters per second, we use a one-sided alternative hypothesis:
H0: μ = 50 cm/s, H1: μ < 50 cm/s
Note: If H1: μ < 50 cm/s, we can write the null hypothesis either as H0: μ = 50 cm/s or as H0: μ ≥ 50 cm/s. Both expressions lead to the same testing procedure and the same decision.
4-3.2 Testing Statistical Hypotheses
• Hypothesis-testing procedures rely on using the information in a random sample from the population of interest.
• If this information is consistent with the hypothesis, we will not reject the hypothesis; if this information is inconsistent with the hypothesis, we will conclude that the hypothesis is false.
• For example, suppose the claimed population mean age is 50 (H0: μ = 50). Sample the population and find the sample mean.
• Suppose the sample mean age was x̄ = 20.
• This is far below the claimed population mean of 50.
• If the null hypothesis were true, the probability of getting a sample mean this different would be very small, so you reject the null hypothesis. In other words, a sample mean of 20 is so unlikely if the population mean were 50 that you conclude the population mean must not be 50.
[Figure: sampling distribution of X̄ if H0 is true, centered at μ = 50, with the observed x̄ = 20 far in the lower tail]

The Test Statistic and Rejection Region
• If the sample mean is close to the assumed population mean, the null hypothesis is not rejected.
• If the sample mean is far from the assumed population mean, the null hypothesis is rejected.
• How far is "far enough" to reject H0?
• A test statistic is a statistic computed from the sample data to make a decision about the hypothesis, e.g., the sample mean, sample variance, or sample proportion.
[Figure: distribution of the test statistic, with the rejection region(s) beyond the critical values]
• If the test statistic value falls in the rejection region, we reject H0.
• The boundaries that define the rejection region are called the critical values.
How do we decide the rejection region (critical values)?
The critical values are determined by
i) the distribution of the test statistic, and
ii) the significance level α (explained below).
[Figure: rejection regions (shaded) for the three test types; each boundary is a critical value]
– Two-tail test: H0: μ = 50, H1: μ ≠ 50; area α/2 in each tail
– Upper-tail test: H0: μ ≤ 50, H1: μ > 50; area α in the upper tail
– Lower-tail test: H0: μ ≥ 50, H1: μ < 50; area α in the lower tail

Errors in Decision Making
• The conclusion from a hypothesis test may be wrong, since it is based on a random sample (a random experiment).
• Type I error: rejecting the null hypothesis when it is true. The probability of a Type I error is called the significance level or size of the test, denoted by α. The significance level is usually set by the researcher in advance.
• Type II error: failing to reject the null hypothesis when it is false. The probability of a Type II error is denoted by β, and 1 - β is called the power of the test.

Decision          | Actual situation: H0 true     | Actual situation: H0 false
Do not reject H0  | No error (probability 1 - α)  | Type II error (probability β)
Reject H0         | Type I error (probability α)  | No error (probability 1 - β)

Hypothesis-Testing Procedure Using the Rejection Region
1. State the null hypothesis H0 and the alternative hypothesis H1.
2. Choose the significance level α.
3. Determine the test statistic to use, i.e., how to convert the sample statistic (e.g., X̄) to a test statistic (e.g., a Z statistic).
4. Find the critical values and determine the rejection region(s).
5. Collect data and compute the test statistic value from the sample.
6. Compare the test statistic with the critical value(s) to determine whether it falls in the rejection region, then make the statistical decision: reject H0 if the test statistic falls in the rejection region.

4-3.3 P-Values in Hypothesis Testing
The p-value is the probability of obtaining a test statistic equal to or more extreme than the observed sample value when H0 is true.
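Both decision procedures for a Z test (the rejection-region rule and the p-value) can be sketched with the standard library's `statistics.NormalDist` (Python 3.8+). The helper names and the `"two"` / `"upper"` / `"lower"` tail labels are hypothetical conventions chosen for this sketch; the usage lines reuse the chocolate-bar and phone-bill examples worked later in this section.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def z_critical_values(alpha: float, tail: str):
    """Rejection region for a Z test as (lower cutoff, upper cutoff).

    tail: "two" (H1: mu != mu0), "upper" (H1: mu > mu0), "lower" (H1: mu < mu0).
    A cutoff of None means that side has no rejection region.
    """
    if tail == "two":
        z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
        return -z, z
    if tail == "upper":
        return None, STD_NORMAL.inv_cdf(1.0 - alpha)  # z_alpha
    if tail == "lower":
        return -STD_NORMAL.inv_cdf(1.0 - alpha), None  # -z_alpha
    raise ValueError(f"unknown tail: {tail!r}")


def reject_by_region(z0: float, alpha: float, tail: str) -> bool:
    """Rejection-region procedure: reject H0 if z0 lies beyond a critical value."""
    lower, upper = z_critical_values(alpha, tail)
    return (lower is not None and z0 < lower) or (upper is not None and z0 > upper)


def z_p_value(z0: float, tail: str) -> float:
    """P(a test statistic at least as extreme as z0 | H0 true)."""
    if tail == "two":
        return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z0)))
    if tail == "upper":
        return 1.0 - STD_NORMAL.cdf(z0)
    if tail == "lower":
        return STD_NORMAL.cdf(z0)
    raise ValueError(f"unknown tail: {tail!r}")


# Chocolate-bar example (two-sided, alpha = 0.05): z0 = (2.84 - 3) / (0.8 / sqrt(100))
z0 = (2.84 - 3.0) / (0.8 / 100 ** 0.5)
print(reject_by_region(z0, 0.05, "two"), round(z_p_value(z0, "two"), 4))
# Phone-bill example (upper tail, alpha = 0.10): z0 = (53.1 - 52) / (10 / sqrt(64))
z1 = (53.1 - 52.0) / (10.0 / 64 ** 0.5)
print(reject_by_region(z1, 0.10, "upper"), round(z_p_value(z1, "upper"), 4))
```

Computed from the exact normal CDF, the two-sided p-value for z0 = -2.0 is about 0.0455; a table lookup that rounds P(Z > 2.0) to 0.0228 gives 0.0456. Both procedures lead to the same decision for a given α.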
• It is sometimes referred to as "the observed level of significance" or "the smallest value of α for which H0 can be rejected."
• The p-value measures the plausibility of the null hypothesis H0: the smaller the p-value, the less plausible the null hypothesis.

Hypothesis-Testing Procedure Using the P-Value
1. State the null hypothesis H0 and the alternative hypothesis H1.
2. Choose the significance level α.
3. Determine the test statistic to use, i.e., how to convert the sample statistic (e.g., X̄) to a test statistic (e.g., a Z statistic).
4. Collect data and compute the test statistic from the sample.
5. Obtain the p-value from a distribution table of the test statistic (or by using software such as Excel or Minitab).
6. Compare the p-value with α:
• If p-value < α, reject H0.
• If p-value ≥ α, do not reject H0.

Hypothesis Testing on the Mean
4-4 Inference on the Mean of a Population, Variance Known
Assumptions
– The population variance σ² is known
– The population is normally distributed, or the sample is large enough for the CLT to apply

4-4.1 Hypothesis Testing on the Mean, Variance Known
Ex: Hypothesis testing with σ known, two-sided: H0: μ = μ0, H1: μ ≠ μ0
• Convert the sample statistic (X̄) to the test statistic $Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
• Determine the critical Z values for the specified level of significance α.
• Decision rule: if the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
[Figure: distribution of Z0 under H0; reject H0 below the lower critical value $-z_{\alpha/2}$ or above the upper critical value $+z_{\alpha/2}$ (area α/2 in each tail); do not reject H0 in between]

Example
To test the claim that the mean weight of chocolate bars manufactured in a factory is 3 ounces, we weighed 100 chocolate bars and found an average weight of 2.84 ounces. Suppose that, from past records, the standard deviation is known to be 0.8 ounces.
1) State the null and alternative hypotheses: H0: μ = 3, H1: μ ≠ 3 (two-sided test)
2) Choose the desired level of significance: suppose α = 0.05 is chosen for this test.
3) Determine the test statistic: σ is known, so this is a Z test:
$z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{2.84 - 3}{0.8/\sqrt{100}} = \frac{-0.16}{0.08} = -2.0$
4) Find the critical values and determine the rejection region(s): for α = 0.05, the critical Z values are ±1.96. Reject H0 if z0 < -1.96 or z0 > 1.96.
5) Reach a decision and interpret the result: since z0 = -2.0 < -1.96, we reject the null hypothesis. That is, there is sufficient evidence that the mean weight of chocolate bars is not equal to 3 ounces.

Example (revisited)
To test the claim that the mean weight of chocolate bars manufactured in a factory is 3 ounces, we weighed 100 chocolate bars and found an average weight of 2.84 ounces. Suppose that, from past records, the standard deviation is known to be 0.8 ounces. Test at α = 0.05 using the p-value.
x̄ = 2.84 is translated to a Z score: $z_0 = \frac{2.84 - 3}{0.8/\sqrt{100}} = \frac{-0.16}{0.08} = -2.0$
p-value = 2P(Z > |z0|) = 2P(Z > 2.0) = 2(0.0228) = 0.0456
Since p-value = 0.0456 < α = 0.05, we reject the null hypothesis.
[Figure: standard normal curve with α/2 = 0.025 beyond ±1.96 and area 0.0228 beyond ±2.0]

Example
A phone industry manager thinks that customers' monthly cell phone bills have increased and now average more than $52 per month. Past company records indicate that the standard deviation is about $10. He collects a sample of n = 64, and the sample mean is $53.10. Test this claim at α = 0.10.
1) H0: μ ≤ 52 vs. H1: μ > 52
2) Test statistic: $z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{53.1 - 52}{10/\sqrt{64}} = 0.88$
3) Rejection region: the critical value is $z_{0.10} = 1.28$; if z0 > 1.28, reject H0.
4) Since z0 = 0.88 < 1.28, we cannot reject H0.
5) We cannot conclude that the mean bill is greater than $52.
[Figure: standard normal curve with 1 - α = 0.90 below 1.28, the rejection region (α = 0.10) above 1.28, and z0 = 0.88 marked]
P-value method: calculate the p-value and compare it with α:
$P(\bar{X} \ge 53.1) = P\left(Z \ge \frac{53.1 - 52.0}{10/\sqrt{64}}\right) = P(Z \ge 0.88) = 1 - 0.8106 = 0.1894$
We do not reject H0, since p-value = 0.1894 > α = 0.10.

4-5 Inference on the Mean of a Population, Variance Unknown
Student's t Distribution
• t distributions are symmetric and bell-shaped but have heavier tails than the normal.
• The t value depends on the degrees of freedom (d.f.).
• As d.f. goes to infinity, the t distribution approaches N(0, 1).

4-5.1 Hypothesis Testing on the Mean, Variance Unknown
Assumptions
– Population standard deviation is unknown
– Population is normally distributed
– If the population is not normal, use a large sample
The test statistic is $T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, which has a t distribution with n - 1 d.f. when H0 is true.

Example
The mean cost of a hotel room in LA is said to be $168 per night. A random sample of 25 hotels resulted in x̄ = $172.50 and s = $15.40. Test at the α = 0.05 level, assuming the data are normally distributed.
H0: μ = 168, H1: μ ≠ 168
• σ is unknown, so use a t statistic:
$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{172.50 - 168}{15.40/\sqrt{25}} = 1.46$
• Critical values: $\pm t_{0.025,\,24} = \pm 2.0639$
• Reject H0 if t0 > 2.0639 or t0 < -2.0639.
• Since t0 = 1.46 does not fall in the rejection region, we cannot reject H0.

Relationship between Tests of Hypotheses and Confidence Intervals
• A test of H0: θ = θ0 against H1: θ ≠ θ0 at significance level α will lead to rejection of H0 if and only if the hypothesized value θ0 is not in the 100(1 - α)% confidence interval [l, u].
• A test of H0: θ = θ0 against H1: θ < θ0 at significance level α will lead to rejection of H0 if and only if θ0 is not in the 100(1 - α)% upper-confidence interval (-∞, u].
• A test of H0: θ = θ0 against H1: θ > θ0 at significance level α will lead to rejection of H0 if and only if θ0 is not in the 100(1 - α)% lower-confidence interval [l, ∞).

4-6 Inference on the Variance of a Normal Population
4-6.1 Hypothesis Testing on the Variance of a Normal Population
4-7 Inference on Population Proportion
4-7.1 Hypothesis Testing on a Binomial Proportion
We will consider testing: