Unit 3 Notes: Statistical Inference Testing

In Chapter 15 we learn about two sampling distributions, which are the distributions of all the sample proportions or sample means coming from samples of the same size n.
o The Sampling Distribution of Proportions: N(p, √(pq/n))
o The Sampling Distribution of Means: N(μ, σ/√n)
o Both are approximately Normal and have similar conditions to check: Randomization, the 10% Condition, and a large-enough sample (the Success/Failure Condition, np ≥ 10 and nq ≥ 10, for proportions; less exact for means).
o We use the Normal model to answer questions about how likely it is for a sample proportion or sample mean to fall in a certain range of values.
o Note: To use the sampling distribution to determine how unusual a sample statistic is, we need to know the true population parameters (p for proportions; μ and σ for means).

In Chapter 16 we build confidence intervals around p̂ using SE(p̂) to make inferences about the true population proportion p.
o The confidence interval is given by (p̂ - ME, p̂ + ME).
o ME (margin of error) = z* · SE(p̂) = z* · √(p̂q̂/n)
o z* is the z-score critical value that "cuts off" the middle of the standard Normal curve corresponding to our confidence level (e.g. the middle 90% for a 90% confidence interval). We find it by sketching the Normal curve and then using invNorm(tail probability) on our calculators.
o Our confidence interval tells us that we are 95% (for example) confident that the true population proportion p is within that interval.
o In this case we have a sample proportion p̂ but usually no population proportion.
o We use the 1-PropZInt function on our calculator to do this. Note: you may need to calculate x from p̂.
  Recall that p̂ = x/n, so x = (p̂)(n).
o Assumptions & Conditions:
    Independence Assumption (check the Randomization Condition and the 10% Condition: the sample should be no more than 10% of the population)
    Sample Size Assumption (check the Success/Failure Condition: we need at least 10 successes and 10 failures)
o We also use the following formula to compute the sample size n needed to give a confidence interval with a certain margin of error at a certain confidence level:
    n = (z*)² (p̂)(q̂) / (ME)²

In Chapter 17 we use the One Proportion Z-Test (1-PropZTest) to judge whether a given sample proportion is consistent with a population having a known (or hypothesized) proportion.
o In this case we have a hypothesized population proportion and a single sample proportion.
o We start by writing a null hypothesis and an alternative hypothesis:
    H0: p = ___
    HA: p ≠ ___, p < ___, or p > ___
o To use the 1-PropZTest we must check the following Assumptions & Conditions:
    Independence Assumption (check the Randomization Condition and the 10% Condition)
    Sample Size Assumption (check the Success/Failure Condition)
o The 1-PropZTest gives us a P-value that we compare to an alpha level to decide whether to reject the null hypothesis.
    The P-value is the probability of observing a sample proportion at least as extreme as ours if the null hypothesis is true.
    If P < alpha, we reject the null hypothesis; otherwise we fail to reject the null hypothesis.

In Chapter 18 we make inferences about the true population mean given a sample mean, using Student's t-models.
o Student's t-models rely on certain assumptions. Check the following conditions before making a t-interval or doing a t-test.
    Independence Assumption, check:
      Randomization Condition
      10% Condition
    Normal Population Assumption, check:
      Nearly Normal Condition (the data come from a distribution that is unimodal and symmetric)
o A One Sample T-Interval around the sample mean ȳ gives a confidence interval for the true population mean.
  We are 95% (for example) confident that the true population mean is within that interval.
    Use Stat… Tests… #8 TInterval in most cases (this is when you don't know the actual population standard deviation or whether the population is Normal). #7 ZInterval can be used in the rare cases when the population standard deviation is known.
    Enter either raw data in L1 or the sample statistics.
    Make sure you have checked the conditions and have nearly Normal data.
o A One Sample T-Test tells us how consistent our sample is with a population having a given (hypothesized) population mean.
    Stat… Tests… #2 T-Test
    H0: µ = µ0 (the hypothesized mean)
    HA: µ ≠ µ0, µ < µ0, or µ > µ0

In Chapter 19 we examine the meaning of the P-value. The P-value is the probability of seeing results at least as extreme as our sample's by chance alone, if the null hypothesis were true.
o We also explore how confidence intervals relate to hypothesis tests. A confidence interval corresponds to a two-tailed hypothesis test with α = 100% - (confidence level), e.g. an α of .05 corresponds to 95% confidence.
o We also define Type I and Type II errors.
    Type I: we reject H0 when H0 is actually true
    Type II: we fail to reject H0 when H0 is actually false

In Chapter 20 we look at situations where we have two samples that we want to compare (i.e. determine whether they come from the same or different populations).
o First we consider confidence intervals around p̂1 - p̂2.
    We note that the sampling distribution of p̂1 - p̂2 is approximately Normal: N(p1 - p2, √(p1q1/n1 + p2q2/n2)). Because p1 and p2 are unknown, we estimate the standard deviation with the standard error SE(p̂1 - p̂2) = √(p̂1q̂1/n1 + p̂2q̂2/n2) and use the Normal model to build a confidence interval around p̂1 - p̂2.
    On our calculator we do this using 2-PropZInt.
    In this case we know p̂1 and p̂2 but not p1 or p2. We find the confidence interval for the true difference between the proportions of the two groups.
    Assumptions and Conditions:
      Independence Assumptions:
        o Randomization Condition: The data in each group should be drawn independently and at random from a homogeneous population or generated by a randomized comparative experiment.
        o The 10% Condition: If the data are sampled without replacement, the sample should not exceed 10% of the population.
        o Independent Groups Assumption: The two groups we're comparing must be independent of each other.
      Sample Size Condition:
        o Success/Failure Condition: Both groups are big enough that at least 10 successes and at least 10 failures have been observed in each.
o We also use the Two Proportion Z-Test (2-PropZTest) to test whether our two sample proportions could plausibly come from populations with the same proportion (i.e. whether the groups are really the same or really different).
    H0: p1 = p2
    HA: p1 ≠ p2, p1 < p2, or p1 > p2
    This test gives us a P-value, which we compare to our target alpha level to decide whether to reject the null hypothesis.
o Then we examine the difference between two sample means. We use Student's t sampling model and estimate the standard error using the data.
    The confidence interval we build is called a two-sample t-interval (for the difference in means).
      Stat -> Tests -> 0: 2-SampTInt
        o (note: Don't pool)
    The corresponding hypothesis test is called a two-sample t-test.
      Hypotheses:
        o H0: µ1 = µ2
        o HA: µ1 ≠ µ2, µ1 < µ2, or µ1 > µ2
      Stat -> Tests -> 4: 2-SampTTest
        o (Don't pool)
    Check the following Assumptions and Conditions for both groups.
      Independence Assumption, check:
        o Randomization Condition
        o 10% Condition
      Normal Population Assumption, check:
        o Nearly Normal Condition (the data come from a distribution that is unimodal and symmetric)
      Independent Groups Assumption (the two groups are independent!)

In Chapter 21 we look at paired data (for example, you have two values such as before/after for each participant, or the data are otherwise paired in a natural way). This is an example of a blocked experimental design.
o YOU CANNOT USE A 2-SAMPLE T-TEST WITH PAIRED DATA.
o We examine the pairwise differences. Because it is the differences we care about, we treat them as if they were the data and ignore the original two sets of data.
o Check the following Assumptions and Conditions:
    Paired Data Assumption: The data must be paired.
    Independence Assumption: The differences must be independent of each other. Check:
      Randomization Condition
      10% Condition
    Normal Population Assumption: We need to assume that the population of differences follows a Normal model. Check:
      Nearly Normal Condition: Check this with a histogram of the differences.
o A paired t-test is just a one-sample t-test (Stat -> Tests -> #2 T-Test) for the mean of the pairwise differences.
    Hypotheses:
      H0: µd = 0
      HA: µd ≠ 0, µd > 0, or µd < 0
    Enter the differences (L1 - L2) in L3 of your calculator and use this as your data for the test.
    The sample size is the number of pairs.
o You can find a confidence interval for µd by entering the differences (L1 - L2) in L3 of your calculator and then using Stat -> Tests -> #8 TInterval.

In Chapter 22 we look at something a little different. The Chi-Square model looks at counts of categorical data. There are three related tests (we'll focus on Goodness-of-Fit and the Test of Independence).
o Assumptions and Conditions. For all χ² tests check:
    Counted Data Condition: The data must be counts.
    Independence Assumption: The counts in the cells should be independent of each other. Check:
      Randomization Condition: The individuals who have been counted and whose counts are available for analysis should be a random sample from some population.
    Sample Size Assumption: We must have enough data for the methods to work. Check:
      Expected Cell Frequency Condition: The expected count in each cell must be at least 5.
o The χ² Goodness-of-Fit test compares the observed distribution of a single categorical variable to an expected distribution based on a theory or model.
    Hypotheses:
      H0: The categorical counts are distributed according to the given model (which is…)
      HA: The categorical counts are not distributed according to the model
    Using your calculator: put your observed data in L1 and your expected (model) data in L2.
      TI-84+: Stat -> Tests -> χ²GOF-Test
      TI-83:
        o Use L3 to store (L1 - L2)²/L2
        o χ² = sum(L3) (access sum through 2nd STAT -> MATH)
        o To find the P-value: DISTR -> 7: χ²cdf(Ans, 1E99, d.f.)
        o d.f. (degrees of freedom) is (# categories - 1)
    Note: the χ² test statistic is computed as a sum of "components"; each component is given by (observed - expected)² / expected.
o Tests of homogeneity compare the distributions of several groups for the same categorical variable.
    Hypotheses:
      H0: The distribution of ______ is the same for each group (identify the groups)
      HA: The distribution of ______ is not the same for each group
    Compute a test of homogeneity in the same way you compute a χ² Test of Independence.
o The χ² Test of Independence examines counts from a single group for evidence of an association between two categorical variables.
    Hypotheses:
      H0: _______ and _______ are independent
      HA: _______ and _______ are not independent
    Using the calculator:
      MATRX (which is 2nd x⁻¹) -> EDIT, highlight [A] and hit ENTER
      Adjust the size of matrix A (rows × columns) and fill it with the table values (leave out the totals)
      STAT -> Tests -> C: χ²-Test
        o Observed: [A]   Expected: [B]
        o Calculate
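
For readers who want to check the Chapter 16 and 17 calculator results by hand, the margin-of-error, sample-size, and one-proportion z-test formulas above can be sketched in Python. This is only a rough illustration, not a replacement for 1-PropZInt or 1-PropZTest: the function names and the example counts (60 successes in 100 trials, H0: p = 0.5) are made up for this sketch, the test shown is the two-tailed version only, and the conditions still have to be checked separately.

```python
from math import ceil, sqrt
from statistics import NormalDist  # standard Normal model (Python 3.8+)


def z_star(conf_level):
    """Critical value that cuts off the middle conf_level of the Normal curve."""
    return NormalDist().inv_cdf(1 - (1 - conf_level) / 2)


def one_prop_z_interval(x, n, conf_level=0.95):
    """Return (p_hat - ME, p_hat + ME), where ME = z* * sqrt(p_hat*q_hat/n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    me = z_star(conf_level) * sqrt(p_hat * q_hat / n)
    return p_hat - me, p_hat + me


def sample_size_for_margin(me, p_hat=0.5, conf_level=0.95):
    """n = (z*)^2 * p_hat * q_hat / ME^2, rounded up to a whole number."""
    return ceil(z_star(conf_level) ** 2 * p_hat * (1 - p_hat) / me ** 2)


def one_prop_z_test(x, n, p0):
    """Two-tailed test of H0: p = p0; returns (z, P-value)."""
    p_hat = x / n
    sd = sqrt(p0 * (1 - p0) / n)  # uses the hypothesized p0, not p_hat
    z = (p_hat - p0) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical example: 60 successes out of 100, testing H0: p = 0.5
low, high = one_prop_z_interval(60, 100)
z, p_value = one_prop_z_test(60, 100, 0.5)
print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("n needed for ME = 0.03:", sample_size_for_margin(0.03))
```

Note that the interval uses p̂ in its standard error while the test uses the hypothesized p, which is why the two functions compute their spreads differently.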
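
The TI-83 steps for the χ² statistic in Chapter 22 (store (L1 - L2)²/L2 in L3, then sum it) can also be sketched in Python. This is a minimal illustration under stated assumptions: the function names and the example counts are hypothetical, the expected counts for the independence test use the standard (row total × column total) / grand total rule with d.f. = (rows - 1)(columns - 1), which these notes do not spell out, and the P-value is still left to χ²cdf on the calculator.

```python
def chi_square_gof(observed, expected):
    """Return (chi-square statistic, d.f., components) for a goodness-of-fit test."""
    # Each component is (observed - expected)^2 / expected, like L3 on the TI-83
    components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    df = len(observed) - 1  # number of categories minus 1
    return sum(components), df, components


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, d.f.) for a table of counts (totals left out)."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical example 1: 120 rolls of a die, model says 20 of each face
gof_stat, gof_df, gof_components = chi_square_gof(
    [18, 22, 20, 25, 15, 20], [20] * 6
)
print(f"GOF: chi-square = {gof_stat:.2f}, d.f. = {gof_df}")

# Hypothetical example 2: a 2 x 2 table of counts
ind_stat, ind_df = chi_square_independence([[20, 30], [30, 20]])
print(f"Independence: chi-square = {ind_stat:.2f}, d.f. = {ind_df}")
```

Printing the components list is a quick way to see which categories contribute most to the statistic, and checking that every expected count is at least 5 covers the Expected Cell Frequency Condition.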