Unit 3 Notes: Statistical Inference Testing
In Chapter 15 we learn about 2 Sampling Distributions which are the distribution
of all the sample proportions/means coming from samples having the same size n.
o the Sampling Distribution of Proportions
 N(p, √(pq/n))
o the Sampling Distribution of Means
 N(μ, σ/√n)
o Both are approximately normally distributed and have similar conditions to check:
 Randomization, 10% condition, and a large-enough sample
(success/failure condition: np>=10 and nq>=10 for proportions;
less exact for means)
o We use the normal model to answer questions about how likely it is for a
sample proportion or sample mean to be in a certain range of values
o Note: To use the sampling distribution to determine how unusual sample
statistics are, we need to know the true population parameters (p, or μ and σ)
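As a rough illustration of the normal-model calculation above, here is a small Python sketch using the standard-library `statistics.NormalDist`. The values p = 0.30 and n = 200 and the range 0.25 to 0.35 are made up for the example; they are not from the notes.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 200            # hypothetical population proportion and sample size
q = 1 - p

# Success/failure condition: np >= 10 and nq >= 10
assert n * p >= 10 and n * q >= 10

# Sampling distribution of the sample proportion: N(p, sqrt(pq/n))
sd_phat = sqrt(p * q / n)
model = NormalDist(mu=p, sigma=sd_phat)

# Probability that a sample proportion lands between 0.25 and 0.35
prob = model.cdf(0.35) - model.cdf(0.25)
print(round(sd_phat, 4), round(prob, 3))
```

The same pattern works for sample means by building the model as N(μ, σ/√n) instead.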
In Chapter 16 we build confidence intervals around 𝑝̂ using the SE(𝑝̂ ) to make
inferences about the true population p.
o The confidence interval is given by (𝑝̂ – ME, 𝑝̂ + ME)
ME (Margin of error) = z*SE(𝑝̂ ) = z* √(𝑝̂ 𝑞̂ /n)
z* is the z-score critical value that “cuts off” the middle of the
standard normal curve corresponding to our Confidence Level (e.g.
middle 90% for a 90% confidence interval). We find this by
sketching the normal curve and then using InvNorm(tail
probability) on our calculators.
Our confidence intervals tell us that we are 95% (for example) confident
that the true population proportion p is within that interval.
In this case we have a sample proportion 𝑝̂ but usually no known population proportion p.
We use the 1-PropZint function on our calculator to do this
 note you may need to calculate x from 𝑝̂ .
 Recall that 𝑝̂ =(x)/(n) so x = (𝑝̂ )(n)
Assumptions & Conditions:
 Independence Assumption (check Randomization Condition, 10%
Condition… sample should be no more than 10% of population)
 Sample Size Assumption (check success/failure condition… need at
least 10 successes and 10 failures)
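The interval that 1-PropZint reports can be sketched by hand as follows. This is only an illustration in Python (standard library only); the counts x = 112 and n = 400 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

x, n = 112, 400             # hypothetical: 112 successes in a sample of 400
conf_level = 0.95

p_hat = x / n
q_hat = 1 - p_hat

# Success/failure condition: at least 10 successes and 10 failures
assert x >= 10 and n - x >= 10

# z* cuts off the middle `conf_level` of the standard normal curve,
# so the lower tail area is (1 - conf_level) / 2 (the InvNorm step).
z_star = -NormalDist().inv_cdf((1 - conf_level) / 2)

se = sqrt(p_hat * q_hat / n)    # SE(p-hat)
me = z_star * se                # margin of error
interval = (p_hat - me, p_hat + me)
print(interval)
```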
We also use the following formula to compute the sample size (n) needed
in order to give us a certain confidence level with a certain margin of
error (ME):
 n = (z*)² (𝑝̂ )(𝑞̂ ) / (ME)²
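A minimal sketch of the sample-size formula in Python. The function name `sample_size` is hypothetical, and the default guess of 0.5 for the sample proportion is an assumption (the usual conservative choice when no estimate is available); the notes do not specify it.

```python
from math import ceil
from statistics import NormalDist

def sample_size(me, conf_level, p_hat=0.5):
    """n = (z*)^2 * p_hat * q_hat / ME^2, rounded up to a whole number."""
    z_star = -NormalDist().inv_cdf((1 - conf_level) / 2)
    return ceil(z_star ** 2 * p_hat * (1 - p_hat) / me ** 2)

# Hypothetical target: 95% confidence with a 3-point margin of error
n_needed = sample_size(me=0.03, conf_level=0.95)
print(n_needed)
```

We always round up, since rounding down would give a margin of error slightly larger than the target.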
In Chapter 17 we use the One Proportion Z-Test (1-PropZtest) to test whether
a given sample proportion is consistent with a population having a
known (or hypothesized) proportion.
o In this case we have a hypothetical population proportion and a single
sample proportion
o We start by writing a null hypothesis and an alternative hypothesis:
 H0: p = __
 HA: p ≠ ___, p < ___, or p > ___
o To use the 1-PropZTest we must check the following Assumptions &
Conditions:
 Independence Assumption (check Randomization Condition, 10%
Condition)
 Sample Size Assumption (check success/failure condition)
o The 1-PropZTest gives us a P-value that we test against an alpha level to
see if we should reject the null hypothesis or not. The P-value is the
probability of observing a sample proportion at least as extreme as ours if the null hypothesis is true.
 If P < alpha, reject the null hypothesis, otherwise we fail to reject
the null hypothesis
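The test above can be sketched in Python as follows. The function name `one_prop_z_test`, its `alternative` argument, and the data (58 successes out of 100, p0 = 0.5) are all hypothetical; this is an illustration of the calculation, not the calculator's own code.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0, alternative="two-sided"):
    """Return (z, p_value) for H0: p = p0."""
    p_hat = x / n
    sd = sqrt(p0 * (1 - p0) / n)     # uses the hypothesized p0, not p-hat
    z = (p_hat - p0) / sd
    std_normal = NormalDist()
    if alternative == "less":        # HA: p < p0
        p_value = std_normal.cdf(z)
    elif alternative == "greater":   # HA: p > p0
        p_value = 1 - std_normal.cdf(z)
    else:                            # HA: p != p0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

z, p_value = one_prop_z_test(x=58, n=100, p0=0.5)
alpha = 0.05
print(z, p_value, "reject H0" if p_value < alpha else "fail to reject H0")
```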
In Chapter 18 we make inferences about the true population mean given a sample
mean using the student’s t-models.
o The student’s t-models rely on certain assumptions. Check the following
conditions before making a t-interval or doing a t-test.
 Independence Assumption – check:
 Randomization Condition
 10% condition
 Normal Population Assumption – check:
 Nearly Normal Condition (the data comes from a
distribution that is unimodal and symmetric)
o A One Sample T-interval around the sample mean y(bar) gives a
confidence interval for the true population mean. We are 95% (for
example) confident that the true population mean is within that interval.
 Use Stat… Tests… #8 T-Interval in most cases (that is, whenever you
don’t know the actual population standard deviation).
#7 Z-Interval can be used in the rare cases where it is known.
 Enter either raw data in L1 or enter sample stats
 Make sure you have checked the conditions and have nearly
normal data
o A One Sample T-Test tells us how likely a sample mean at least as extreme
as ours would be if the sample came from a population having a given
(hypothesized) population mean. Stat… Tests… #2 T-test
 H0: µ = µ0 (the hypothesized mean)
 HA: µ is ≠, <, or > µ0
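As a sketch of what the T-test computes, the Python below finds the t statistic and a two-tailed P-value using only the standard library. The data and hypothesized mean are made up, the helper names `t_pdf` and `t_two_tail` are hypothetical, and the tail area is approximated numerically (Simpson's rule) rather than by whatever method the calculator uses.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t-model with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tail(t, df, steps=2000):
    """P(|T| >= |t|), via Simpson's rule on the density from 0 to |t|."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

data = [98.2, 99.1, 97.6, 98.8, 98.4, 97.9, 98.6, 98.0]   # hypothetical sample
mu0 = 98.6                                                 # hypothesized mean

n = len(data)
se = stdev(data) / math.sqrt(n)     # standard error of the sample mean
t = (mean(data) - mu0) / se
p_value = t_two_tail(t, df=n - 1)   # two-tailed, i.e. HA: mu != mu0
print(t, p_value)
```

For a one-tailed alternative in the direction the data point, the P-value is half of the two-tailed value.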
In Chapter 19 we examine the meaning of the P-value. The P-value is the
probability of seeing results at least as extreme as our sample's by chance alone, if the null hypothesis were true.
o We also explore how confidence intervals relate to hypothesis tests. A
confidence interval corresponds to a two-tailed hypothesis test with α =
100% – C-level, e.g. α of .05 corresponds to 95% Confidence.
o We also define Type I and Type II error.
 Type I: we reject H0 when H0 is actually true
 Type II: we fail to reject H0 when H0 is actually false
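One way to get a feel for Type I error is a small simulation: draw many samples from a population where H0 really is true and count how often a two-tailed test rejects anyway. The sketch below is only an illustration; p0 = 0.5, n = 100, the seed, and the number of trials are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)                      # fixed seed so the run is repeatable
p0, n, alpha, trials = 0.5, 100, 0.05, 2000
z_crit = -NormalDist().inv_cdf(alpha / 2)

rejections = 0
for _ in range(trials):
    # Draw a sample from a population where H0 (p = p0) really is true
    x = sum(random.random() < p0 for _ in range(n))
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    if abs(z) > z_crit:             # the two-tailed test rejects H0...
        rejections += 1             # ...which is a Type I error here

type1_rate = rejections / trials
print(type1_rate)
```

The rate should come out near alpha, though not exactly, because counts are discrete and the run is random.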
In Chapter 20 we look at situations where we have 2 samples which we want to
compare (i.e. determine if they come from the same or different populations)
o First we consider confidence intervals around p^1 – p^2.
 We note that the sampling distribution of p^1 – p^2 is approximately
normally distributed:
N(p1 – p2, √(p1q1/n1 + p2q2/n2))
Since p1 and p2 are unknown, we estimate the standard deviation with
SE(p^1 – p^2) = √(p^1q^1/n1 + p^2q^2/n2)
and we use the Normal model to build a
confidence interval around p^1 – p^2
 On our calculator we do this using 2-propZint
 In this case we know p^1 and p^2 but not p1 or p2. We find
the confidence interval for the true difference between the
proportions of the two groups.
 Assumptions and Conditions:
 Independence Assumptions:
o Randomization Condition: The data in each group
should be drawn independently and at random from
a homogeneous population or generated by a
randomized comparative experiment.
o The 10% Condition: If the data are sampled without
replacement, the sample should not exceed 10% of
the population.
o Independent Groups Assumption: The two groups
we’re comparing must be independent of each other.
 Sample size condition:
o Success/Failure Condition: Both groups are big
enough that at least 10 successes and at least 10
failures have been observed in each.
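A short Python sketch of the two-proportion z-interval described above (standard library only). The counts for the two groups are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 45, 120     # hypothetical group 1: successes, sample size
x2, n2 = 30, 110     # hypothetical group 2: successes, sample size
conf_level = 0.95

p1, p2 = x1 / n1, x2 / n2

# Success/failure condition: at least 10 successes and 10 failures per group
assert min(x1, n1 - x1, x2, n2 - x2) >= 10

# Standard error of the difference in sample proportions
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_star = -NormalDist().inv_cdf((1 - conf_level) / 2)

diff = p1 - p2
interval = (diff - z_star * se, diff + z_star * se)
print(interval)
```

If the interval contains 0, the data are consistent with no difference between the two true proportions.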
o We also use the Two Proportion Z-Test (2-propZtest) to test whether
our two sample proportions could plausibly come from the same population
(i.e. whether the true proportions are really the same or really different).
 H0: p1=p2,
 HA: p1≠p2, p1<p2, or p1>p2
 This test gives us a P-value which we compare to our target alpha
level to see if we should reject the null hypothesis or not.
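A sketch of a two-proportion z-test in Python. It assumes the usual pooled standard error under H0: p1 = p2; the notes do not state the formula, so treat the pooling as an assumption. The function name `two_prop_z_test` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Return (z, two-tailed p_value) for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 the groups share one proportion, so pool the successes
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_prop_z_test(x1=45, n1=120, x2=30, n2=110)
alpha = 0.05
print(z, p_value, "reject H0" if p_value < alpha else "fail to reject H0")
```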
o Then we examine the difference between two sample means. We use the
student’s t sampling model and estimate the standard error using the data.
 The confidence interval we build is called a two-sample t-interval
(for the difference in means).
 Stat -> Tests -> 0: 2-SampTInt
o (note: Don’t pool)
The corresponding hypothesis test is called a two-sample t-test.
 Hypotheses:
o H0: µ1= µ2
o HA: µ1 is ≠, <, or > µ2
 Stat -> Tests -> 4: 2-SampTTest
o (Don’t pool)
Check the following Assumptions and Conditions for both groups.
 Independence Assumption – check:
o Randomization Condition
o 10% condition
 Normal Population Assumption – check:
o Nearly Normal Condition (the data comes from a
distribution that is unimodal and symmetric)
 Independent Groups Assumption (the two groups are
independent of each other)
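The unpooled ("Don't pool") two-sample t statistic can be sketched as below. The data are made up, and the degrees-of-freedom line uses the Welch-Satterthwaite approximation, which is what unpooled two-sample t procedures commonly use; the notes themselves do not give a d.f. formula, so that part is an assumption.

```python
from math import sqrt
from statistics import mean, variance

group1 = [12, 15, 11, 14, 13, 16]   # hypothetical sample from group 1
group2 = [10, 9, 12, 11, 8]         # hypothetical sample from group 2

n1, n2 = len(group1), len(group2)
v1 = variance(group1) / n1          # s1^2 / n1
v2 = variance(group2) / n2          # s2^2 / n2

# Standard error of the difference in means, estimated from the data
se = sqrt(v1 + v2)
t = (mean(group1) - mean(group2)) / se

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
print(t, df)
```

The P-value or t* would then come from the t-model with these degrees of freedom, for example via the calculator.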
In Chapter 21 we look at paired data (for example you have two values like
before/after for each participant, or the data is otherwise paired in a natural way).
This is an example of a blocked experimental design.
o We examine the pairwise differences.
 Because it is the differences we care about, we treat them as if they
were the data and ignore the original two sets of data.
o Check the following Assumptions and Conditions
 Paired data Assumption: The data must be paired.
 Independence Assumption: (The differences must be independent of
each other.)
 Randomization Condition
 10% Condition
 Normal Population Assumption: We need to assume that the
population of differences follows a Normal model.
 Nearly Normal Condition: Check this with a histogram of
the differences.
o A paired t-test is just a one-sample t-test (Stat -> Tests -> #2 T-test) for
the mean of the pairwise differences.
 Hypotheses:
 H0: µd = 0
 HA: µd ≠, >, or < 0
 Enter the differences (L1-L2) in L3 of your calculator and use this
as your data for the test.
 The sample size is the number of pairs
o You can find a confidence interval on µd by entering the differences (L1-L2) in L3 of your calculator and then using Stat -> Tests -> #8 TInterval
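The "treat the differences as the data" idea looks like this in Python. The before/after values are hypothetical; the lists play the role of L1, L2 and L3 on the calculator.

```python
from math import sqrt
from statistics import mean, stdev

before = [142, 150, 138, 160, 155, 147]   # hypothetical L1
after = [138, 147, 139, 152, 150, 143]    # hypothetical L2

# Pairwise differences (L1 - L2), which become the only data we analyze
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)                    # sample size is the number of pairs
d_bar = mean(diffs)
se = stdev(diffs) / sqrt(n)       # standard error of the mean difference
t = (d_bar - 0) / se              # H0: mean difference = 0
df = n - 1
print(diffs, t, df)
```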
In Chapter 22 we look at something a little different. The Chi Square model
looks at counts of categorical data. There are three related tests (we’ll focus on
Goodness-of-Fit and the Test of Independence)
o Assumptions and Conditions: For all χ² tests check:
 Counted Data Condition: The data must be counts.
 Independence Assumption: (The counts in the cells should be
independent of each other). check:
 Randomization Condition: The individuals who have been
counted and whose counts are available for analysis should
be a random sample from some population.
 Sample Size Assumption: (We must have enough data for the
methods to work). check:
 Expected Cell Frequency Condition: The expected count in
each cell must be at least 5
o The χ² Goodness-of-fit test compares the observed distribution of a single
categorical variable to an expected distribution based on a theory or model.
 Hypotheses:
 H0: The categorical counts are distributed according to the
given model (which is…)
 HA: The categorical counts are not distributed according to
the model
 Using your calculator:
 Put your observed data in L1 and your expected (model) data
in L2
 TI-84+: Stat -> Tests -> χ²GOF-Test
 TI-83:
o use L3 to store (L1-L2)²/L2
o χ² = sum(L3) (access sum through 2nd stat ->
o To find the p-value: DISTR -> 7: χ²cdf( ans, 1E99, d.f.)
o d.f. (degrees of freedom) is (# categories – 1)
 Note: the χ² test statistic is computed as a sum of “components” –
each component is given by (observed – expected)² / expected
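The goodness-of-fit calculation can be sketched in Python as below. The die-roll counts are hypothetical, and `chi2_upper_tail` is a hypothetical helper that approximates the upper-tail area (the χ²cdf(ans, 1E99, d.f.) step) with a series for the incomplete gamma function, since the standard library has no χ² model built in.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom >= x), by a series expansion."""
    a, z = df / 2, x / 2
    term = 1 / math.gamma(a + 1)      # n = 0 term of sum z^n / Gamma(a+n+1)
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return 1 - math.exp(-z) * z ** a * total

observed = [18, 22, 16, 25, 19, 20]               # hypothetical: 120 die rolls
expected = [sum(observed) / 6] * 6                # fair-die model: 20 per face

assert min(expected) >= 5                         # Expected Cell Frequency Condition

# Each component is (observed - expected)^2 / expected
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(components)
df = len(observed) - 1                            # number of categories minus 1
p_value = chi2_upper_tail(chi2, df)
print(chi2, df, p_value)
```

Looking at the individual components shows which categories contribute most to the statistic.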
o Tests of homogeneity compare the distribution of several groups for the
same categorical variable.
 Hypotheses:
 H0: The distribution of ______ is the same for each group
(identify groups)
 HA: The distribution of ______ is not the same for each
group
 Compute a test of homogeneity in the same way you compute a χ²
Test of Independence
o The χ² Test of Independence examines counts from a single group for
evidence of an association between two categorical variables.
 Hypotheses:
 H0: _______ and _______ are independent
 HA: _______ and _______ are not independent
Using the calculator:
 MATRX (which is 2nd x⁻¹) -> EDIT, highlight [A] and hit
ENTER
 Adjust the size of matrix A (rows X columns) and fill it with
the table values (leave out the totals)
 STAT -> Tests -> C: χ²-Test
o Observed: [A] Expected: [B] Calculate
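To see what ends up in the expected matrix [B], the sketch below computes expected counts as (row total × column total) / grand total and then the χ² statistic. The 2×3 table is hypothetical, `chi2_upper_tail` is a hypothetical helper (a series approximation to the upper-tail area), and the d.f. formula (rows – 1)(columns – 1) is the standard one, which the notes do not state.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom >= x), by a series expansion."""
    a, z = df / 2, x / 2
    term = 1 / math.gamma(a + 1)
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return 1 - math.exp(-z) * z ** a * total

# Hypothetical observed counts (matrix [A]); totals are left out
observed = [[20, 30, 50],
            [30, 30, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected counts under independence (matrix [B])
expected = [[r * c / grand for c in col_totals] for r in row_totals]
assert min(min(row) for row in expected) >= 5     # Expected Cell Frequency Condition

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_upper_tail(chi2, df)
print(expected, chi2, df, p_value)
```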