Unit-IV
t, F and χ² tests
Making inferences from an experimental sample to a population using statistical tests
Sample to Population
Testing Hypotheses
• t, F, and χ² tests mathematically compare the distribution of an experimental sample (i.e. the mean and standard deviation of your results) to a normal distribution whose parameters represent some hypothesised feature of the population, which you think your results support
• How does this work? (without going through the derivation of the equations…!)
• …the CENTRAL LIMIT THEOREM
t-tests:
Testing Hypotheses About Means
• Formula:
t = (x̄ − μ)√n / s
where:
x̄ = sample mean
μ = population mean
s = sample standard deviation
n = size of sample
• For a z-test you need to know the population mean and s.d. Often you don’t know the s.d. of the hypothesised or comparison population, and so you use a t-test, which uses the sample s.d. instead.
• This introduces a source of error, which decreases as your sample size increases.
• Therefore, the t statistic is distributed differently depending on the size of the sample, like a family of normal curves. The degrees of freedom (d.f. = sample size − 1) indicate which of these curves you are relating your t-value to. There are different tables of p-values for different degrees of freedom.
→ larger sample = more ‘squashed’ t-statistic distribution = easier to get significance
Kinds of t-tests (the formula is slightly different for these different kinds; see the sketch after this list):
• Single-sample: tests whether a sample mean is significantly different from a hypothesised population mean
• Independent-samples: tests the difference between the means of two independent groups
• Paired-samples: tests the difference between two linked sets of measurements, for example means obtained in two conditions by a single group of participants
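A minimal sketch of the three kinds in Python with scipy; the reaction-time numbers here are hypothetical, invented purely for illustration:

from scipy import stats

group_a = [512, 480, 495, 530, 501, 476]   # hypothetical RTs (ms), condition A
group_b = [468, 455, 490, 462, 470, 459]   # hypothetical RTs (ms), condition B

# Single-sample: does the mean of group_a differ from a hypothesised mean of 500?
t1, p1 = stats.ttest_1samp(group_a, popmean=500)

# Independent-samples: do two separate groups have different means?
t2, p2 = stats.ttest_ind(group_a, group_b)

# Paired-samples: the same participants measured in two conditions
t3, p3 = stats.ttest_rel(group_a, group_b)

print(f"single-sample: t = {t1:.2f}, p = {p1:.3f}")
print(f"independent:   t = {t2:.2f}, p = {p2:.3f}")
print(f"paired:        t = {t3:.2f}, p = {p3:.3f}")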
t-tests:
Worked Example of Single Sample t-test
• We know that finger tapping speed in the normal population has:
– Mean = 100 ms per tap
• Finger tapping speed in 8 subjects with caffeine addiction:
– Mean = 89.4 ms
– Standard deviation = 20 ms
• Does this prove that caffeine addiction has an effect on tapping speed?
• Null Hypothesis H0: tapping speed is not faster after caffeine
• The preselected significance level was 0.05
• Calculate the t value: t(7) = (89.4 − 100)√8 / 20 = −1.5
• Find the area below t(7) = −1.5: we get 0.07, i.e. 7% of the time we would expect a score as low as this
• This value is above 0.05 => we could NOT reject H0!
• We can’t conclude that caffeine addiction has an effect on tapping speed
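The same single-sample test can be checked from the summary statistics alone. A minimal sketch in Python with scipy (the slides quote an area of 0.07; scipy’s t(7) distribution puts the one-tailed value nearer 0.09, but either way p > 0.05 and the conclusion is unchanged):

import math
from scipy import stats

mu, xbar, s, n = 100, 89.4, 20, 8

# t = (x̄ − μ)√n / s, with n − 1 = 7 degrees of freedom
t = (xbar - mu) * math.sqrt(n) / s

# One-tailed p-value: the area below t under the t(7) distribution
p = stats.t.cdf(t, df=n - 1)

print(f"t(7) = {t:.2f}, one-tailed p = {p:.3f}")   # t ≈ −1.50, p > 0.05: do not reject H0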
F-test
• ANOVA = analysis of variance: involves calculating an F value whose significance is tested (similarly to a z or t value)
• Like t-tests, F-tests deal with differences between or among sample means, but with any number of means (each mean corresponding to a ‘factor’)
• Q/ Do k means differ? A/ Yes, if the F value is significant
• Q/ How do the k factors influence each other? A/ Look at the interaction effects
• ANOVA calculates F values by comparing the variability between two conditions with the variability within each condition (this is what the formula does)
– e.g. we give a drug that we believe will improve memory to one group of people and a placebo to another group. We then take dependent measures of their memory performance, e.g. mean number of words recalled from memorised lists.
– An ANOVA compares the variability that we observe between the two conditions to the variability observed within each condition. Variability is measured as the sum of the squared differences of each score from the mean.
– Thus, when the variability that we predict (between the two groups) is much greater than the variability we don’t predict (within each group), we conclude that our treatments produce different results. A sketch of this comparison in code follows.
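A minimal sketch of the drug/placebo comparison as a one-way ANOVA in Python with scipy; the recall scores are hypothetical, invented purely for illustration:

from scipy import stats

drug    = [14, 17, 15, 16, 18, 13]   # hypothetical words recalled per participant
placebo = [11, 12, 10, 14, 11, 12]

# f_oneway compares between-group variability to within-group variability
f, p = stats.f_oneway(drug, placebo)
print(f"F = {f:.2f}, p = {p:.4f}")

# With exactly two groups, F is the square of the independent-samples t
t, _ = stats.ttest_ind(drug, placebo)
print(f"t² = {t ** 2:.2f}")           # matches F above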
F-tests / ANOVAs:
What are they?
• ANOVA calculates an F value, which has a distribution related to the sample size and number of conditions (degrees of freedom):
F = MS_factors / MS_error
• The formula compares the variance between and within conditions or ‘factors’ as discussed above – we won’t worry about the derivation! (n.b. MS = mean squares)
• If the F statistic is significant, this tells us that the means of the factors differ significantly => they are not likely to have come from the same ‘population’ = our variable is having an effect
When can we use ANOVAs?
• The formula is based on a model of what contributes to the value of any particular data point, and how the variance in the data is composed. This model makes a number of assumptions that must be met in order to allow us to use ANOVA:
– homogeneity of variance
– normality
– independence of observations
• Remember: when you get a significant F value, this just tells you that there is a significant difference somewhere between the means of the factors in the ANOVA. Therefore, you often need to do planned or post-hoc comparisons in order to test more specific hypotheses and probe interaction effects
ANOVAs:
Worked Example
• Testing differences between sample means: following rTMS over the right parietal cortex, are the incorrectly cued trials in a cued RT task slowed down compared to the correctly cued trials?
• “Repeated measures” ANOVA:
– 1 group of 14 healthy volunteers
– Perform 100 trials pre- and 100 trials post-stimulation
– Real vs Sham rTMS on two separate days
• Within-session factors:
– Correct vs Incorrect trials
– Pre vs Post
• Between-session factor:
– Real vs Sham rTMS
• Null Hypothesis H0: there is no difference in the RTs of incorrectly cued trials
• Many possibilities if H0 is rejected:
– All means are different from each other: meanICpreR vs. meanICpostR vs. meanICpreS vs. meanICpostS
– Means in the Real condition are different from means in the Sham condition
– Interactions might differ (pre vs post in Real differs from pre vs post in Sham)
Tutorial: Chi-Square Distribution
Presented by: Nikki Natividad
Course: BIOL 5081 - Biostatistics
Purpose
• To analyse discontinuous categorical/binned data in which a number of subjects fall into categories
• We want to compare our observed data to what we expect to see. Is any difference due to chance? Due to association?
• When can we use the Chi-Square Test?
– Testing the outcome of Mendelian crosses
– Testing independence: is one factor associated with another?
– Testing a population for expected proportions
Assumptions:
• 1 or more categories
• Independent observations
• A sample size of at least 10
• Random sampling
• All observations must be used
• For the test to be accurate, the expected frequency in each category should be at least 5
Conducting Chi-Square Analysis
1) Make a hypothesis based on your basic biological question
2) Determine the expected frequencies
3) Create a table with observed frequencies, expected frequencies, and chi-square values using the formula:
χ² = Σ (O − E)² / E
4) Find the degrees of freedom: (c − 1)(r − 1)
5) Find the critical chi-square statistic in the Chi-Square Distribution table
6) If the critical chi-square statistic > your calculated chi-square value, you do not reject your null hypothesis; if your calculated value is larger, you reject it.
Example 1: Testing for Proportions
HO: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants.
HA: Horned lizards eat more of one species of ant than the others.

               Leaf Cutter Ants   Carpenter Ants   Black Ants   Total
Observed             25                 18             17         60
Expected             20                 20             20         60
O − E                 5                 −2             −3          0
(O − E)²/E           1.25               0.2            0.45      χ² = 1.90

χ² = sum over all categories of (O − E)²/E
Calculate degrees of freedom: for a one-row goodness-of-fit test, d.f. = c − 1 = 3 − 1 = 2
Under a critical value of your choice (e.g. α = 0.05 or 95% confidence), look up the chi-square statistic in a chi-square distribution table.
Example 1: Testing for Proportions
χ²(α = 0.05, d.f. = 2) = 5.991
Critical chi-square statistic: χ² = 5.991
Our calculated value: χ² = 1.90
*If the critical chi-square statistic > your calculated value, then you do not reject your null hypothesis: the difference is not significant and could be due to chance.
5.991 > 1.90 ∴ We do not reject our null hypothesis.
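A minimal sketch reproducing this goodness-of-fit test in Python with scipy (the SAS version follows below):

from scipy import stats

observed = [25, 18, 17]
expected = [20, 20, 20]   # equal proportions under H0

# Goodness of fit: χ² = Σ (O − E)² / E, with d.f. = 3 − 1 = 2
chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
critical = stats.chi2.ppf(0.95, df=2)   # table value at α = 0.05

print(f"χ² = {chi2:.2f}, p = {p:.3f}")        # χ² = 1.90, p ≈ 0.39
print(f"critical value = {critical:.3f}")     # 5.991 > 1.90: do not reject H0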
SAS: Example 1
[SAS code screenshot: a data step defines your data, a format statement is included to format the table, and the proc freq statement indicates what you want in your output]
SAS: What does the p-value mean?
“The exact p-value for a nondirectional test is the sum of probabilities for the table having a test statistic greater than or equal to the value of the observed test statistic.”
• High p-value: high probability that the test statistic ≥ the observed test statistic. Do not reject the null hypothesis.
• Low p-value: low probability that the test statistic ≥ the observed test statistic. Reject the null hypothesis.
SAS: Example 1
[SAS output screenshot] High probability that the chi-square statistic ≥ our calculated chi-square statistic. We do not reject our null hypothesis.
Example 2: Testing Association
HO: Gender and eye colour are not associated with each other.
HA: Gender and eye colour are associated with each other.
SAS proc freq options used in this example:
• cellchi2 = displays how much each cell contributes to the overall chi-square value
• nocol = do not display column totals
• norow = do not display row totals
• chisq = display chi-square statistics
Example 2: More SAS Examples
[SAS output screenshots]
Degrees of freedom: (2 − 1)(3 − 1) = 1 × 2 = 2
High probability (78.25%) that the chi-square statistic ≥ our calculated chi-square statistic. We do not reject our null hypothesis.
Example 2: More SAS Examples
If there were an association, you could check which interactions describe the association by looking at how much each cell contributes to the overall chi-square value (a Python sketch of this follows).
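The same association test can be sketched in Python with scipy; the gender × eye-colour counts below are hypothetical (the actual data are only in the SAS screenshots), but the mechanics, including the per-cell contributions that the cellchi2 option reports, are the same:

import numpy as np
from scipy import stats

# Hypothetical 2 × 3 table: rows = gender, columns = eye colour
observed = np.array([[12, 15,  8],    # female: brown, blue, green
                     [10, 14, 11]])   # male:   brown, blue, green

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"χ² = {chi2:.2f}, d.f. = {dof}, p = {p:.3f}")   # d.f. = (2 − 1)(3 − 1) = 2

# Per-cell contributions to χ², as reported by SAS's cellchi2 option
print(np.round((observed - expected) ** 2 / expected, 3))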
Limitations
• No expected category should be less than 1
• No more than 1/5 of the expected categories should be less than 5
– To correct for this, you can collect larger samples or combine your data for the smaller expected categories until their combined value is 5 or more
• Yates Correction*
– When there is only 1 degree of freedom, the regular chi-square test should not be used
– Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O − E term, then continue as usual with the new corrected values
What do these mean?
• Likelihood Ratio Chi-Square
• Continuity-Adjusted Chi-Square Test
• Mantel-Haenszel Chi-Square Test:
Q_MH = (n − 1)r²
– r is the Pearson correlation coefficient between the row and column variables (so r² also measures their linear association)
– Tests the alternative hypothesis that there is a linear association between the row and column variables
– Follows a chi-square distribution with 1 degree of freedom
– http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm
• Phi Coefficient
• Contingency Coefficient
• Cramer’s V
Yates & 2 x 2 Contingency Tables
HO: Heart disease is not associated with cholesterol levels.
HA: Heart disease is more likely in patients with a high-cholesterol diet.

                    High Cholesterol   Low Cholesterol   Total
Heart Disease              15                 7            22
  Expected                 12.65              9.35         22
  Chi-Square                0.44              0.59          1.03
No Heart Disease            8                10            18
  Expected                 10.35              7.65         18
  Chi-Square                0.53              0.72          1.25
TOTAL                      23                17            40
Chi-Square Total                                            2.28

Calculate degrees of freedom: (c − 1)(r − 1) = 1 × 1 = 1
We need to use the YATES CORRECTION
Yates & 2 x 2 Contingency Tables
HO: Heart disease is not associated with cholesterol levels.
HA: Heart disease is more likely in patients with a high-cholesterol diet.

With the Yates correction applied to each cell:

                    High Cholesterol   Low Cholesterol   Total
Heart Disease              15                 7            22
  Expected                 12.65              9.35         22
  Chi-Square                0.27              0.37          0.64
No Heart Disease            8                10            18
  Expected                 10.35              7.65         18
  Chi-Square                0.33              0.45          0.78
TOTAL                      23                17            40
Chi-Square Total                                            1.42

Example cell calculation: (|15 − 12.65| − 0.5)² / 12.65 = 0.27
Yates & 2 x 2 Contingency Tables
χ²(α = 0.05, d.f. = 1) = 3.841
3.841 > 1.42 ∴ We do not reject our null hypothesis.
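A minimal check of both totals in Python with scipy; for a 2 × 2 table, chi2_contingency applies the Yates correction when correction=True (its default):

import numpy as np
from scipy import stats

# Rows: heart disease / no heart disease; columns: high / low cholesterol
table = np.array([[15,  7],
                  [ 8, 10]])

# Uncorrected chi-square (the 2.28 total above)
chi2_raw, p_raw, dof, expected = stats.chi2_contingency(table, correction=False)

# Yates-corrected chi-square (the 1.42 total above)
chi2_yates, p_yates, _, _ = stats.chi2_contingency(table, correction=True)

print(f"uncorrected: χ² = {chi2_raw:.2f}, p = {p_raw:.3f}")      # χ² ≈ 2.28
print(f"Yates:       χ² = {chi2_yates:.2f}, p = {p_yates:.3f}")  # χ² ≈ 1.41, p > 0.05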
Fisher’s Exact Test
• Left: use when the alternative to independence is negative association between the variables. These observations tend to lie in the lower-left and upper-right cells of the table. Small p-value = likely negative association.
• Right: use this one-sided test when the alternative to independence is positive association between the variables. These observations tend to lie in the upper-left and lower-right cells of the table. Small p-value = likely positive association.
• Two-Tail: use this when there is no prior alternative (a sketch of all three options follows).
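These three alternatives map onto the alternative parameter of scipy’s fisher_exact; a minimal sketch using the heart-disease table above:

from scipy import stats

table = [[15, 7],
         [8, 10]]

# 'less' ~ left-sided, 'greater' ~ right-sided, 'two-sided' when there is no prior alternative
for alt in ("less", "greater", "two-sided"):
    odds_ratio, p = stats.fisher_exact(table, alternative=alt)
    print(f"{alt:>9}: p = {p:.3f}")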
Conclusion
• The chi-square test is important for testing the association between variables and/or checking whether one’s expected proportions meet the reality of one’s experiment
• There are multiple chi-square tests, each catered to a specific sample size, degrees of freedom, and number of categories
• We can use SAS to conduct chi-square tests on our data by using the command proc freq
References
• Chi-Square Test Descriptions: http://www.enviroliteracy.org/pdf/materials/1210.pdf and http://129.123.92.202/biol1020/Statistics/Appendix%206%20%20The%20Chi-Square%20TEst.pdf
• Ozdemir T and Eyduran E. 2005. Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research 1(2):242–244.
• SAS Support website: http://www.sas.com/index.html, “FREQ procedure”
• YouTube Chi-square SAS tutorial (user: mbate001): http://www.youtube.com/watch?v=ACbQ8FJTq7k