Tests of Significance

Statistical Significance
An observed effect so large that it would rarely occur by chance is called statistically significant.

The Test of Significance
The test of significance asks the question: “Does the statistic result from a real difference from the supposition?” or “Does the statistic result from just chance variation?”

Example
I claim that I make 80% of my free throws. To test my claim, you ask me to shoot 20 free throws. I make only 8 out of 20. You respond: “I don’t believe your claim. It is unlikely that an 80% shooter makes only 8 of 20.”

Significance Test Procedure
Step 1: Define the population and parameter of interest. State null and alternative hypotheses in words and symbols.
Population: My free throw shots.
Parameter of interest: proportion of made shots.
Suppose I am an 80% shooter. This is a hypothesis, and we think that it is false. So we’ll call it the null hypothesis, and use the symbol H0. (Pronounced: H-nought)
H0: p = .8
You are trying to show that I’m worse than an 80% shooter. Your alternate hypothesis is:
Ha: p < .8

Significance Test Procedure
Step 2: Choose the appropriate inference procedure. Verify the conditions for using the selected procedure.
We are going to use the Binomial Distribution:
Each trial has either success or failure.
Set number of trials.
Trials are independent.
Probability of success is constant.

Significance Test Procedure
Step 3: Calculate the P-value. The P-value is the probability that our sample statistic would be at least that extreme, assuming that H0 is true.
Look at Ha to calculate “What is the probability of making 8 or fewer shots out of 20?” X is the number of shots made.
P(X ≤ 8) = binomcdf(20,.8,8) = .0001017

Significance Test Procedure
Step 4: Interpret the results in the context of the problem.
You reject H0 because the probability of an 80% shooter making 8 or fewer of 20 shots is extremely low. You conclude that Ha is correct; the true proportion is less than 80%.
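The free-throw P-value can be checked without a calculator. The sketch below is only an illustration: the helper name binom_cdf is made up here to mirror what the calculator's binomcdf(n, p, k) returns, namely P(X ≤ k), and it assumes the binomial conditions from Step 2 hold.

```python
from math import comb


def binom_cdf(n: int, p: float, k: int) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p).

    Hypothetical helper written for this example; it sums the binomial
    probabilities P(X = 0) through P(X = k).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# H0: p = .8, Ha: p < .8, observed 8 made shots out of 20.
# The P-value is the lower-tail probability P(X <= 8) computed under H0.
p_value = binom_cdf(20, 0.8, 8)
print(f"P(X <= 8) = {p_value:.7f}")
```

Because Ha is one-sided (p < .8), only the lower tail is summed. A two-sided alternative would need a different calculation.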
There are only two possibilities at this step:
“We reject H0 because the probability is so low. We accept Ha.”
“We fail to reject H0 because the probability is not low enough.”

Significance Test Procedure
1. Identify the population of interest and the parameter you want to draw conclusions about. State null and alternate hypotheses.
2. Choose the appropriate procedure. Verify the conditions for using the selected procedure.
3. If the conditions are met, carry out the inference procedure. Calculate the test statistic. Find the P-value.
4. Interpret your results in the context of the problem.

Example
Diet colas use artificial sweeteners to avoid sugar. These sweeteners gradually lose their sweetness over time. Manufacturers therefore test new colas for loss of sweetness before marketing them. Trained tasters sip the cola along with drinks of standard sweetness and score the cola on a “sweetness score” of 1 to 10. The cola is then stored for a month at high temperature to imitate the effect of four months’ storage. Each taster scores the cola again after storage. What kind of experiment is this?

Example
Here are the data (sweetness before storage minus sweetness after storage, one value per taster):
2.0, .4, .7, 2.0, -.4, 2.2, -1.3, 1.2, 1.1, 2.3
Positive scores indicate a loss of sweetness. Are these data good evidence that the cola lost sweetness in storage?

Significance Test Procedure
Step 1: Define the population and parameter of interest. State null and alternative hypotheses in words and symbols.
Population: Diet cola.
Parameter of interest: mean sweetness loss.
Suppose there is no sweetness loss (nothing special going on).
H0: µ = 0
You are trying to find out whether there was sweetness loss. Your alternate hypothesis is:
Ha: µ > 0

Significance Test Procedure
Step 2: Choose the appropriate inference procedure. Verify the conditions for using the selected procedure.
We are going to use the distribution of the sample mean:
Does the sample come from an SRS? We don’t know.
Is the population at least ten times the sample size? Yes.
Is the population normally distributed, or is the sample size at least 25? We don’t know if the population is normally distributed, and the sample (n = 10) is not big enough for the CLT to come into play.

Significance Test Procedure
Step 3: Calculate the test statistic and the P-value. The P-value is the probability that our sample statistic would be at least that extreme, assuming that H0 is true.
x-bar = 1.02, σ = 1, µ = 0
Look at Ha to calculate “What is the probability of having a sample mean greater than 1.02?”
z = (1.02-0)/(1/root(10)) = 3.226
P(Z > 3.226) = normalcdf(3.226,1E99) = .000619

Significance Test Procedure
Step 4: Interpret the results in the context of the problem.
You reject H0 because the probability of having a sample mean of 1.02 or greater, if H0 is true, is very small. We therefore accept the alternate hypothesis; we think the colas lost sweetness.

Assignment
Against All Odds Video, www.learner.org, Episode 20.

Making Sense of Statistical Significance

Choosing a level of significance (alpha level)
How plausible is H0? Depending on how plausible H0 is, you may choose a smaller alpha. If H0 is very plausible, you will need to collect “more” evidence to reject it.
What are the consequences of rejecting H0? If rejecting H0 would
cost lots of money,
possibly cost lives, or
cost jobs,
then alpha is usually very small.

Fishing for significance
Let’s say we’re trying to find a connection between eating habits and intelligence. Choose 40 foods, assign people to increase the amount of those foods they eat, and see if there are any foods that make people smarter. Of the 40 foods, we find that peeps and green beans make you smarter with alpha = .05. Is this a problem?

Inference for the Mean of a Population

The t statistic
The t statistic is used when we don’t know the standard deviation of the population, and instead we use the sample standard deviation s as an estimate. The t statistic has n-1 degrees of freedom (df).
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

The t statistic
For the same tail area, the critical value t* is bigger than the critical value z*.
We say that the t distribution is a more conservative distribution. There is more area in the tails. The t statistic has n-1 degrees of freedom.
$CI = \bar{x} \pm t^{*} \frac{s}{\sqrt{n}}$

The t statistic
In statistical tests of significance, we still have H0 and Ha. We need to provide the µ from H0 in the calculation of the t statistic. Reading the t table is fundamentally different from reading the z table.
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

Example: Mr. Young Mopping
Let’s suppose that Mr. Young has been told that he should mop by 1:25 p.m. (25 minutes after 1). We collect a sample of 12 arrival times with a mean of 27.58 minutes after 1 p.m. and a standard deviation of 3.848 minutes. Is this evidence that his true mean is after 1:25?

Step 1: Mr. Young Mopping
Population of interest: Mr. Young’s mopping.
Parameter of interest: average time of arrival during mopping.
Hypotheses:
H0: µ = 25
Ha: µ > 25

Step 2: Mr. Young Mopping
Procedure: one-sample t test.
SRS? No. Proceed with caution.
Normality? The sample size is not big (12 is not > 40), but the sample is somewhat normal because the sample distribution is single-peaked with no obvious outliers.
Population size is at least 10 times the sample size? We assume that Mr. Young has done a lot of mopping.

Step 3: Mr. Young Mopping
Calculate the test statistic, and calculate the P-value.
$t = \frac{27.58 - 25}{3.848/\sqrt{12}} = 2.322$
With df = 11, P(t > 2.322) is between .02 and .025.

Inference for a Population Proportion

Parameters vs Statistics
                      Parameters    Statistics
Mean                  µ             x-bar
Standard Deviation    σ             s
Proportion            p             p-hat

What we know about inference
We are trying to make sense of what is happening at the population level by looking at sample data.
Step 1: “What is the population and the parameter of interest?”
We make assumptions in the form of H0.
Step 1: “What is H0?”
We need to know about the distribution of the sample statistic.
Step 2: “Is the distribution of sample means normal?”

Our inferential work so far…
Has been about the distribution of sample means and the distribution of the difference of sample means.
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

But what about proportions?
We learned in Chapter 9 about the distribution of sample proportions.
$\mu_{\hat{p}} = p$
$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$

But what about proportions?
We know that the distribution of sample proportions is approximately normal when these conditions are met…
np > 10
nq > 10
Test statistic = (statistic - parameter) / (standard dev. of statistic)
$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$

Simulation
A recent study concluded that 25% of all U.S. teenage females have an STD.
Simulate sampling 500 randomly chosen teenage females using…
randBin(500,.25)
Simulate finding the sample proportion by using…
randBin(500,.25)/500

Test of significance
(This is made up) A recent sample of 500 female teenagers from southeastern Oakland County found that 22% have an STD. Is this strong evidence to suggest that teenage females from SE Oakland County have a lower infection rate than the national average?
1: Population, Parameter of Interest, H0 and Ha
2: Procedure Name & Conditions
3: Calculations
4: Interpret

Confidence Intervals
CI = statistic ± (critical value)(standard dev. of statistic)
$CI = \hat{p} \pm z^{*} \sqrt{\frac{\hat{p}\hat{q}}{n}}$

Calculate the Confidence Interval
1: Population & Parameter of Interest
2: Procedure Name & Conditions
3: Calculations
4: Interpret
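
The Step 3 calculations for the made-up Oakland County example can be sketched in Python. This is an illustration under stated assumptions, not the worked answer from the slides: it takes H0: p = .25 and Ha: p < .25, uses a 95% confidence level (the slides do not state one), and the function name simulate_p_hat is invented here to play the role of randBin(500,.25)/500.

```python
import random
from math import sqrt
from statistics import NormalDist

n, p0, p_hat = 500, 0.25, 0.22  # sample size, H0 proportion, sample proportion

# Conditions for the normal approximation: np > 10 and nq > 10 under H0.
conditions_met = n * p0 > 10 and n * (1 - p0) > 10


def simulate_p_hat(n: int, p: float, rng: random.Random) -> float:
    """Hypothetical stand-in for randBin(n, p)/n: one simulated sample proportion."""
    successes = sum(rng.random() < p for _ in range(n))
    return successes / n


rng = random.Random(0)  # fixed seed so the simulation is repeatable
simulated = simulate_p_hat(n, p0, rng)

# Test statistic: z = (p-hat - p) / sqrt(pq/n), with p and q taken from H0.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)  # lower tail, because Ha is p < .25

# Confidence interval: p-hat +/- z* sqrt(p-hat q-hat / n); 95% level is an assumption.
z_star = NormalDist().inv_cdf(0.975)
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

print(f"conditions met: {conditions_met}")
print(f"one simulated p-hat: {simulated:.3f}")
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The test uses p and q from H0 in the standard deviation, while the confidence interval uses p-hat and q-hat, matching the two formulas above. Interpreting the result in context (Step 4) is left as in the slides.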