The Right Questions about Statistics
Maths Learning Centre
The purpose of Statistics is to ANSWER QUESTIONS USING DATA
Know the type of question and you can choose what type of statistics...
Aim: DESCRIBE
Type of question: What's going on?
Examples:
• How many chapters do novels have?
• What possibilities are there for body temperature after a meal with or without chilli?
• What sort of relationship might the amount of sleep a student gets have with their grades?
• What sorts of things might be related to whether a person does volunteer work?
Type of Statistics: Descriptive statistics: graphs and basic numbers

Aim: DECIDE
Type of question: Yes or no?
Examples:
• Is the median number of chapters in a novel 20?
• Is your body temperature higher after a meal if it has chilli in it?
• Does getting more sleep affect a student’s grades?
• Are women more likely to participate in volunteer work than men?
Type of Statistics: Hypothesis tests (p-values)

Aim: ESTIMATE
Type of question: What's this number?
Examples:
• What is the median number of chapters in a novel?
• How much higher is your body temperature after a chilli meal compared to one without?
• On average, how much of an effect does 30 minutes more sleep have on a student’s grades?
• How much more (or less) likely is a woman to participate in volunteer work than a man?
Type of Statistics: Confidence intervals

Aim: PREDICT / EXPLAIN
Type of question: What's the formula?
Examples:
• How can I explain a person’s body temperature after a meal using their temperature before and the chilli content of the meal?
• How can I calculate a student’s grade based on their number of hours of sleep during semester?
• How can I use a person’s gender, age, income and religion to predict their chances of participating in volunteer work?
Type of Statistics: Modelling and regression

The purpose of Statistics is to ANSWER QUESTIONS USING DATA
Know more about your data and you can choose what statistical method...
HOW THE DATA IS COLLECTED
• what is done to the subjects?
• when is information recorded?
• how are the subjects chosen?
HOW MUCH DATA
• lots of things recorded per subject?
• lots of subjects?
• missing data?
VARIABLES IN THE DATA
• how to measure?
• what type?
• defining groups or measurements?
• what distribution?
By Dr David Butler © 2012 The University of Adelaide
DATA ENTRY
[Figure: separate records for each subject, eg "gender = M, age = 18" and "chilli = Y, temp = 37"]
BECOMES...
     gender   age   chilli   temp
1    M        18    Y        37
2    M        25    N        36
3    F        19    Y        38
4    F        21    N        35
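As a rough illustration of consistent data entry, the table above could be stored with one row per subject and one column per variable. This is only a sketch: the column name "subject" and the helper names to_csv_text and check_consistent are made up for the example, and the values are the four subjects shown above.

```python
import csv
import io

# One dictionary per subject, with the same keys and the same coding
# (M/F, Y/N) for every subject. "subject" is an illustrative column name.
records = [
    {"subject": 1, "gender": "M", "age": 18, "chilli": "Y", "temp": 37},
    {"subject": 2, "gender": "M", "age": 25, "chilli": "N", "temp": 36},
    {"subject": 3, "gender": "F", "age": 19, "chilli": "Y", "temp": 38},
    {"subject": 4, "gender": "F", "age": 21, "chilli": "N", "temp": 35},
]


def to_csv_text(rows):
    """Write the records as CSV text: one row per subject, one column per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def check_consistent(rows):
    """Return True if every row has the same columns and uses the agreed codes."""
    columns = set(rows[0].keys())
    return all(
        set(row.keys()) == columns
        and row["gender"] in {"M", "F"}
        and row["chilli"] in {"Y", "N"}
        for row in rows
    )


print(to_csv_text(records))
print(check_consistent(records))
```

A check like this would flag a row that used "male" or "yes" in place of the agreed codes, which is the kind of inconsistency that makes analysis harder.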
Statisticians say:
"PLEASE make it consistent!"
TYPES OF VARIABLES (things you record)
• Numerical / Quantitative / Scale (numbers: how far apart has meaning)
  o Continuous (measured)
  o Discrete (counted)
• Categorical / Qualitative (words: how far apart has no meaning)
  o Nominal (names: more or less has no meaning)
  o Ordinal (ordered: more or less has meaning)
DISTRIBUTIONS OF NUMERICAL VARIABLES (how the possible values are spread out)
• Approximately normal – parametric tests will be fine
• Skewed or worse – non-parametric tests might be better
WHAT EXPLANATORY CATEGORICAL VARIABLES DEFINE:
Independent Groups
[Figure: four different subjects, two with chilli = Y and two with chilli = N, each with one temp reading]
BECOMES...
     chilli   temp
1    Y        38
2    Y        36
3    N        37
4    N        36
OR
Repeated Measures (matched pairs)
[Figure: four subjects, each with two temp readings, one for chilli = Y and one for chilli = N]
BECOMES...
a table with one row per subject (1 to 4) and two temp columns: temp (chilli = Y) and temp (chilli = N).
HOW HYPOTHESIS TESTING WORKS
A hypothesis test is designed to DECIDE the answer to a YES OR NO question using DATA.
This is how to do a hypothesis test:
• Have a yes-or-no question.
• Collect data.
• Calculate a test statistic.
• Figure out the distribution of the test statistic if you assume a particular answer (the “null hypothesis”).
• Calculate a p-value.
• Decide the answer based on the p-value.
This is what a hypothesis test means:
• It tells you if your data is likely or unlikely given a particular situation (the “null hypothesis”).
• A low p-value means your data is unlikely and you don’t believe you’re in that situation.
• A high p-value means your data is likely and you do believe you could be in that situation.
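The steps above can be sketched in a few lines of Python for the question "Is the median number of chapters in a novel 20?". This is a minimal sketch using an exact sign test. The chapter counts are made-up data, and the function name sign_test_p_value and the 5% cut-off are assumptions for the example.

```python
from math import comb


def sign_test_p_value(data, null_median):
    """Two-sided exact sign test p-value for 'is the median equal to null_median?'."""
    above = sum(1 for x in data if x > null_median)
    below = sum(1 for x in data if x < null_median)
    n = above + below  # values equal to the null median are dropped
    if n == 0:
        return 1.0
    k = min(above, below)  # test statistic: the smaller of the two counts
    # If the null hypothesis is true, each value falls above the median with
    # probability 1/2, so the count follows a Binomial(n, 1/2) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical chapter counts for 12 novels (made-up data for illustration).
chapters = [12, 25, 31, 18, 40, 27, 22, 35, 29, 24, 33, 26]

p = sign_test_p_value(chapters, null_median=20)
print(f"p-value = {p:.4f}")
if p < 0.05:
    print("Low p-value: the data is unlikely if the median is 20.")
else:
    print("High p-value: the median could be 20.")
```

Here 10 of the 12 values are above 20 and 2 are below, which would be fairly unusual if the median really were 20, so the p-value comes out below 5%.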
HOW CONFIDENCE INTERVALS WORK
A confidence interval is designed to give a RANGE of possible answers for a “WHAT’S THE
NUMBER?” question, using DATA from a sample.
This is how to find a confidence interval:
• Have a “what’s the number?” question.
• Collect data.
• Choose a matching hypothesis test.
• Work backwards to calculate two ends.
• The confidence interval is between these two values.
This is what a confidence interval means:
• The values in the CI would be retained with a matching hypothesis test.
• The values in the CI have a high chance of producing data like yours.
• The values in the CI are those you are “happy to believe” based on your data.
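One way to picture "working backwards" from a hypothesis test is to try many candidate values and keep those the test would not reject. The sketch below does this for a median, using an exact sign test as the matching test. The data is made up, and the function names, the 0.5 step between candidates and the 5% significance level are all assumptions for illustration. Real software usually calculates the two ends directly.

```python
from math import comb


def sign_test_p_value(data, null_median):
    """Two-sided exact sign test p-value for 'is the median equal to null_median?'."""
    above = sum(1 for x in data if x > null_median)
    below = sum(1 for x in data if x < null_median)
    n = above + below  # values equal to the candidate median are dropped
    if n == 0:
        return 1.0
    k = min(above, below)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def median_confidence_interval(data, candidates, alpha=0.05):
    """Keep every candidate median that the sign test would NOT reject."""
    retained = [m for m in candidates if sign_test_p_value(data, m) >= alpha]
    return min(retained), max(retained)


# Hypothetical chapter counts for 12 novels (made-up data for illustration).
chapters = [12, 25, 31, 18, 40, 27, 22, 35, 29, 24, 33, 26]

# Candidate medians from the smallest to the largest value, in steps of 0.5.
candidates = [c / 2 for c in range(2 * min(chapters), 2 * max(chapters) + 1)]

low, high = median_confidence_interval(chapters, candidates)
print(f"Approximate 95% confidence interval for the median: {low} to {high}")
```

Every value inside the interval is one the sign test would retain for this data, and every candidate outside it would be rejected at the 5% level.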
HOW REGRESSION WORKS
Regression is a method designed to create a FORMULA that uses some information to
PREDICT/EXPLAIN an outcome, using DATA.
This is how to perform regression:
• Have a “what’s the formula?” question.
• Collect data.
• Look at the pattern – usually with a scatterplot – to choose a formula.
• Get a computer to calculate the numbers and p-values.
• Check the p-values.
• Choose your final formula.
This is what regression means:
• It tells you a formula for how an outcome varies based on other information.
• It does NOT tell you if some things CAUSE others, only how to calculate them as accurately as possible.
• The computer output will tell you p-values and confidence intervals to answer other types of questions.
More details:
• DESCRIBING A RELATIONSHIP:
  o Scatterplot describes relationship – and helps choose a good formula.
  o Correlation coefficient (r) measures how strong a linear relationship is. Ranges from -1 (perfect negative) to 0 (no linear relationship) to 1 (perfect positive). Ignores how steep the slope is, only says how close to a line.
• FINDING AND INTERPRETING THE FORMULA:
  o Computer program will use the data to find the numbers that make the formula fit best.
  o The coefficient says how much the outcome changes (on average) for a change of 1 in the explanatory variable.
• LOOKING AT P-VALUES:
  o The p-value that goes with the F-statistic in the ANOVA table tells you whether all the variables at once have a relationship with the outcome. Low p-value means the relationship is “significant”.
  o The p-value for each coefficient tells you whether that explanatory variable appears to have a relationship with the outcome. Low p-value means the effect is “significant”.
• LOOKING AT CONFIDENCE INTERVALS:
  o The confidence interval that goes with an explanatory variable tells you how large or small the real effect could be.
NOTE: Regression has assumptions that must be checked in order to use it properly,
especially if you plan to use the p-values and confidence intervals.
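To make "the numbers that make the formula fit best" concrete, here is a minimal least-squares sketch for a straight-line formula with one explanatory variable. The sleep and grade values are made-up data and the function name fit_line is an assumption. It does not compute p-values or confidence intervals, which statistical software would normally report.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x, plus correlation r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                   # change in outcome per change of 1 in x
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)           # closeness to a line, not steepness
    return intercept, slope, r


# Hypothetical data: hours of sleep per night and grade (%) for five students.
sleep = [5, 6, 7, 8, 9]
grade = [54, 61, 65, 72, 78]

intercept, slope, r = fit_line(sleep, grade)
print(f"grade = {intercept:.1f} + {slope:.1f} x sleep   (r = {r:.3f})")
```

The slope is read as the average change in grade for one more hour of sleep in this made-up sample. It describes how to calculate one from the other, not whether sleep causes the change.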
Turning a research question into a statistical question.
ORIGINAL QUESTION:
• ABOUT ONE CONCEPT
• ABOUT RELATIONSHIPS BETWEEN CONCEPTS
TYPE OF QUESTION:
• DESCRIBE – what’s going on?
• DECIDE – yes or no?
• ESTIMATE – what’s this number?
• PREDICT/EXPLAIN – what’s the formula?
TYPES OF VARIABLES:
• Each Concept BECOMES a Variable: CATEGORICAL OR NUMERICAL
WHAT EXPLANATORY CATEGORICAL VARIABLES DEFINE:
• Independent Groups OR Repeated Measures (matched pairs)
DISTRIBUTION OF OUTCOME NUMERICAL VARIABLE:
• [Figure: three possible distribution shapes, separated by OR]
Note: This probably doesn’t matter if you have a lot of data.
STATISTICAL QUESTION:
• eg: DESCRIBE one NUMERICAL Variable
• eg: DECIDE about a CATEGORICAL Variable (Independent Groups) and a NUMERICAL Variable
Note: In the list below, the outcome variables are usually assumed to be normal.
Statistical methods for statistical questions
Variable: NUMERICAL
DESCRIBE: Numbers: Mean & standard deviation (or median & IQR).
          Graphs: Histogram / Boxplot.
DECIDE: “Is the mean equal to #?” – one sample t-test.
        “Is the median equal to #?” – sign test.
ESTIMATE: “What is the mean?” – confidence interval for a mean.

Variable: CATEGORICAL
DESCRIBE: Numbers: Table of percentages or proportions.
          Graphs: Histogram.
DECIDE: “Is this percentage equal to #?” – z-test for a single proportion.
        “Are percentages distributed according to #, #, #?” – chi-squared test for goodness of fit.
ESTIMATE: “What is this percentage?” – confidence interval for a proportion.

Variables: CATEGORICAL (2 categories, Independent Groups) → NUMERICAL
DESCRIBE: Numbers: Means & standard deviations for each group (or medians & IQRs for each category).
          Graphs: Histograms on same scale / side-by-side boxplots.
DECIDE: “Are the means equal?” – unpaired t-test (or Mann-Whitney U-test / Wilcoxon rank-sum test).
ESTIMATE: “What is the difference between the means?” – confidence interval for the difference in means.

Variables: CATEGORICAL (2 categories, Repeated Measures) → NUMERICAL
DESCRIBE: Numbers: Mean & standard deviation of differences between measurements.
          Graphs: Histogram of the differences between measurements.
DECIDE: “Is there a mean difference?” – paired t-test (or Wilcoxon signed ranks test).
ESTIMATE: “What is the mean difference?” – confidence interval for the mean difference.

Variables: CATEGORICAL (any# categories, Independent Groups) → NUMERICAL
DESCRIBE: Numbers: Mean & standard deviation of each group.
          Graphs: Histograms/boxplots on the same scale. Line graph showing mean of each group.
DECIDE: “Are the means equal?” – one-way analysis of variance (ANOVA) with post-hoc t-tests (or Kruskal-Wallis test).
ESTIMATE: “What are the differences between means?” – confidence intervals for each difference in means.
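The DECIDE choices for a numerical outcome listed on this page can be pictured as a simple lookup. The sketch below is only an illustration: the dictionary layout and the name choose_test are made up, and treating the bracketed alternatives as the choice for outcomes that are not approximately normal is an assumption based on the earlier section on distributions.

```python
# (explanatory variable, design) -> (usual test, bracketed alternative)
# Test names are copied from the list above; the structure is hypothetical.
DECIDE_METHODS = {
    ("none", "one sample"): (
        "one sample t-test",
        "sign test",
    ),
    ("2 categories", "independent groups"): (
        "unpaired t-test",
        "Mann-Whitney U-test",
    ),
    ("2 categories", "repeated measures"): (
        "paired t-test",
        "Wilcoxon signed ranks test",
    ),
    ("any# categories", "independent groups"): (
        "one-way ANOVA with post-hoc t-tests",
        "Kruskal-Wallis test",
    ),
}


def choose_test(explanatory, design, approximately_normal=True):
    """Look up a DECIDE method for a numerical outcome variable."""
    usual, alternative = DECIDE_METHODS[(explanatory, design)]
    # Assumption: the alternative is used when the outcome is skewed or worse.
    return usual if approximately_normal else alternative


# "Is your body temperature higher after a meal if it has chilli in it?"
# Same people measured with and without chilli: repeated measures.
print(choose_test("2 categories", "repeated measures"))
# Different people in each chilli group, skewed temperatures.
print(choose_test("2 categories", "independent groups", approximately_normal=False))
```

The point of the lookup is that the same research question can lead to different tests depending on whether the groups are independent or the same subjects are measured repeatedly.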

Variables: CATEGORICAL (any# categories, Repeated Measures) → NUMERICAL
DESCRIBE: Graphs: Line graph for each subject showing changing value of variable.
DECIDE: “On average, does the value change for each person across categories?” – repeated measures ANOVA with post-hoc paired t-tests / mixed effects regression.
ESTIMATE: “What are the mean differences between categories?” – confidence intervals for mean differences.

Variables: CATEGORICAL (2 categories, Independent Groups) → CATEGORICAL (2 categories)
DESCRIBE: Numbers: Two-way table of counts or %s. Odds ratios.
          Graphs: Histogram for each explanatory category.
DECIDE: “Is the outcome just as likely for both explanatory categories?”, “Are the two variables associated?” – chi-squared test for independence (small amount of data: Fisher’s exact test).
ESTIMATE: “How much more likely is the outcome in this category?” – confidence interval for difference in proportions, confidence interval for odds ratio.

Variables: CATEGORICAL (2 categories, Repeated Measures) → CATEGORICAL (2 categories)
DESCRIBE: Numbers: Two-way table of counts or %s.
          Graphs: Histogram for each explanatory category.
DECIDE: “Is the outcome just as likely for both explanatory categories?” – McNemar’s test.
ESTIMATE: “How much more likely is the outcome in one category compared to the other?” – confidence interval for difference in proportions.

Variables: CATEGORICAL (any# categories, Independent Groups) → CATEGORICAL (any# categories)
DESCRIBE: Numbers: Two-way table of counts or %s.
          Graphs: Histogram for each explanatory category.
DECIDE: “Do the percentages in the outcome change across the explanatory categories?”, “Are the two variables associated?” – chi-squared test for independence.

Variables: CATEGORICAL (any# categories, Repeated Measures) → CATEGORICAL (2 categories)
DESCRIBE: Numbers: Two-way table of counts or %s.
          Graphs: Histogram for each explanatory category.
DECIDE: “Do the percentages in the outcome change across the explanatory categories?”, “Are the two variables associated?” – Cochran’s Q-test.
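For two categorical variables with two categories each and independent groups, the chi-squared test for independence and the odds ratio can be calculated by hand from the two-way table. The sketch below does this for made-up counts, with no continuity correction. The function name chi_squared_2x2 and the counts are assumptions for the example. The p-value line relies on the fact that, with 1 degree of freedom, the chi-squared tail probability equals erfc(sqrt(x / 2)).

```python
from math import erfc, sqrt


def chi_squared_2x2(table):
    """Chi-squared test for independence on a 2x2 table of counts.

    Returns (test statistic, p-value, odds ratio). No continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were not associated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # Chi-squared distribution with 1 degree of freedom (2x2 table only).
    p_value = erfc(sqrt(statistic / 2))
    odds_ratio = (a * d) / (b * c)
    return statistic, p_value, odds_ratio


# Hypothetical counts. Rows: women, men. Columns: volunteers, does not volunteer.
table = [[30, 20], [20, 30]]

statistic, p_value, odds_ratio = chi_squared_2x2(table)
print(f"chi-squared = {statistic:.2f}, p-value = {p_value:.4f}, odds ratio = {odds_ratio:.2f}")
```

With a small amount of data the expected counts become small and the handout's suggestion of Fisher's exact test would be the safer choice.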

Variables: NUMERICAL → NUMERICAL
DESCRIBE: Numbers: Correlation coefficient (r).
          Graphs: Scatterplot.
DECIDE: “Does a relationship exist?” – linear regression: t-test on coefficient.
ESTIMATE: “How much does the output variable change when the explanatory variable changes?” – linear regression: confidence interval for slope.
PREDICT: “How can you calculate the output knowing the explanatory variable?” – linear regression formula: y = β0 + β1 x.
NOTE: May need to do a nonlinear regression if the scatterplot indicates a curved sort of relationship.

Variables: NUMERICAL → CATEGORICAL (2 categories)
DESCRIBE: Numbers: Mean & standard deviation for each category of the outcome.
          Graphs: Histograms/boxplots on the same scale.
DECIDE: “Does the numerical variable have an effect on the chances of the outcome?” – unpaired t-test using the outcome to define the two groups.
ESTIMATE: “How much does a change in the numerical variable affect the chances of the outcome?” – logistic regression: confidence interval for odds ratio.
PREDICT: “How can you calculate the chances of the outcome knowing the value of the explanatory variable?” – logistic regression formula: log(odds of y) = β0 + β1 x.
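The logistic regression formula gives the log of the odds, so one more step is needed to get a chance (probability). The sketch below shows that step. The coefficient values are made up for illustration (they are not fitted to any data), and the function name probability_from_log_odds is an assumption.

```python
from math import exp


def probability_from_log_odds(beta0, beta1, x):
    """Turn log(odds of y) = beta0 + beta1 * x into a probability."""
    log_odds = beta0 + beta1 * x
    # odds = exp(log_odds) and probability = odds / (1 + odds),
    # which simplifies to the expression below.
    return 1 / (1 + exp(-log_odds))


# Hypothetical coefficients: x is age in years, y is "does volunteer work".
beta0, beta1 = -4.0, 0.1

p_at_40 = probability_from_log_odds(beta0, beta1, 40)
p_at_50 = probability_from_log_odds(beta0, beta1, 50)
odds_ratio_per_year = exp(beta1)  # how the odds multiply for a change of 1 in x

print(f"Chance at age 40: {p_at_40:.2f}")
print(f"Chance at age 50: {p_at_50:.2f}")
print(f"Odds ratio per year of age: {odds_ratio_per_year:.3f}")
```

The odds ratio exp(β1) is the quantity the confidence interval in the ESTIMATE line refers to: an odds ratio of 1 would mean the explanatory variable makes no difference to the odds.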

Variables: CATEGORICAL (any# categories, Independent Groups) → Time to event (NUMERICAL; possible missing data!)
DESCRIBE: Numbers: Proportion reaching event at certain time (eg 5-year survival), median times to reach event.
          Graphs: Kaplan-Meier curve showing survival percentages.
DECIDE: “Is the time to reach the event the same in all groups?” – survival analysis: log-rank test.
ESTIMATE: “What is the difference in proportions reaching the end point at this particular time?” – confidence interval for the difference in proportions.
          “How much more at risk of the event is this group than this group?” – Cox regression: confidence interval for relative hazard.

Variables: NUMERICAL + NUMERICAL → NUMERICAL
DESCRIBE: Graphs: Scatterplot for each explanatory variable with the outcome variable.
          Numbers: multiple linear regression: R² value.
DECIDE: “Does a relationship exist with any of the variables at all?” – multiple linear regression: F-test.
        “Does a relationship exist with this variable, taking into account the others?” – multiple linear regression: t-test on one coefficient.
ESTIMATE: “How much does the output variable change when this explanatory variable changes?” – multiple linear regression: confidence interval for one slope.
PREDICT: “How can you calculate the output knowing the explanatory variables?” – multiple linear regression formula: y = β0 + β1 x1 + β2 x2.
NOTE: This can be done for many explanatory variables.

Variables: NUMERICAL + CATEGORICAL (any# categories, Independent Groups) → NUMERICAL
DESCRIBE: Graphs: Scatterplot of both numerical variables for each category.
          Numbers: multiple regression: R² value.
DECIDE: See above for multiple regression.
ESTIMATE: See above for multiple regression.
PREDICT: See above for multiple regression.
NOTE: This can be done for many explanatory variables of both types.

Variables: CATEGORICAL (any# categories, Independent Groups) + CATEGORICAL (any# categories, Independent Groups) → NUMERICAL
DESCRIBE: Graphs: Histogram for each combination of explanatory categories. Line graph showing mean of each group.
DECIDE: “Does a relationship exist with any of the variables at all?” – two-way ANOVA: F-test.
        “Does a relationship exist with this variable, taking into account the others?” – two-way ANOVA: F-test for one effect.
        Note: both can also be answered with multiple regression (see above).
PREDICT: “How can you calculate the output knowing the explanatory variables?” – multiple linear regression formula: y = β0 + β1 x1 + β2 x2.

Variables: CATEGORICAL (any# categories, Independent Groups) + CATEGORICAL (any# categories, Independent Groups) → CATEGORICAL (2 categories)
DESCRIBE: Graphs: Histogram for each combination of explanatory categories.
DECIDE: “Does a relationship exist with any of the variables at all?” – multiple logistic regression: chi-squared test for covariates.
        “Does a relationship exist with this variable, taking into account the others?” – multiple logistic regression: Wald test.
ESTIMATE: “How much does the chance of the outcome change when this explanatory variable changes?” – multiple logistic regression: confidence interval for odds ratio.
PREDICT: “How can you calculate the chances of the outcome knowing the explanatory variables?” – multiple logistic regression formula: log(odds of y) = β0 + β1 x1 + β2 x2.
NOTE: This can be done with many explanatory variables – even if some of them are numerical.

Variables: NUMERICAL + CATEGORICAL (any# categories, Repeated Measures) → NUMERICAL
DESCRIBE: Numbers: multiple linear regression: R² value.
DECIDE: “Does a relationship exist with any of the variables at all?” – mixed effects regression: F-test.
        “Does a relationship exist with this variable, taking into account the others?” – mixed effects linear regression: t-test on one coefficient.
ESTIMATE: “How much does the output variable change when this explanatory variable changes?” – mixed effects regression: confidence interval for one coefficient.
PREDICT: “How can you calculate the output knowing the explanatory variables?” – mixed effects regression formula.
NOTE: “mixed effects” may also be called “random effects”.
NOTE: This can be done for many explanatory variables, of both types, and with a mixture of repeated-measures and independent-groups.

Variables: NUMERICAL + NUMERICAL → NUMERICAL (interaction)
DECIDE: “Does one variable change the way the other affects the outcome?” – multiple linear regression: t-test on the interaction effect.
ESTIMATE: “How much does the second variable change the effect of the first on the outcome?” – multiple linear regression: confidence interval for the interaction effect.
PREDICT: “How can you calculate the output knowing the explanatory variables?” – multiple linear regression formula: y = β0 + β1 x1 + β2 x2 + β12 x1x2.

Variables: NUMERICAL + CATEGORICAL (any# categories, Independent Groups) → NUMERICAL (interaction)
DESCRIBE: Graphs: Scatterplot for each category, showing line of best fit in each case.
DECIDE: “Does one variable change the way the other affects the outcome?” – Analysis of Covariance (ANCOVA) / multiple linear regression: t-test on the interaction effect.
ESTIMATE: “How much does the second variable change the effect of the first on the outcome?” – multiple linear regression: confidence interval for the interaction effect.
PREDICT: “How can you calculate the output knowing the explanatory variables?” – multiple linear regression formula: y = β0 + β1 x1 + β2 x2 + β12 x1x2.
NOTE: This can be done for many explanatory variables of both types. ANCOVA refers specifically to the case where the interaction variable is categorical.
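The interaction formula y = β0 + β1 x1 + β2 x2 + β12 x1x2 can be read as "the slope for x1 depends on x2". The sketch below shows this with made-up coefficients, where x1 is hours of sleep and x2 is a two-category variable coded 0 or 1. The coefficient values, the coding and the function names are assumptions for illustration only.

```python
def predict(x1, x2, b0, b1, b2, b12):
    """Outcome from the interaction formula y = b0 + b1*x1 + b2*x2 + b12*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2


def slope_of_x1(x2, b1, b12):
    """Effect of a change of 1 in x1, which depends on the value of x2."""
    return b1 + b12 * x2


# Hypothetical coefficients: x1 = hours of sleep, x2 = 1 if the student
# attends tutorials and 0 if not, y = grade (%).
b0, b1, b2, b12 = 40, 3, 5, 2

print(predict(8, 0, b0, b1, b2, b12))   # 8 hours of sleep, no tutorials
print(predict(8, 1, b0, b1, b2, b12))   # 8 hours of sleep, attends tutorials
print(slope_of_x1(0, b1, b12))          # effect of 1 more hour without tutorials
print(slope_of_x1(1, b1, b12))          # effect of 1 more hour with tutorials
```

If β12 were zero the two slopes would be equal, which is what the t-test on the interaction effect is checking.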
NOTE: There are many other methods dealing with more specific and difficult questions including (but definitely not limited to):
• “Does this variable affect the variance of the outcome?” → F-test for two variances
• “Do these variables affect this categorical outcome (which has several categories)?” → Multinomial regression
• “Does the data come from a normal distribution?” → Investigate normal quantile-quantile plot; Shapiro-Wilk test
• “To what degree do these two measuring systems agree?” → Intraclass correlation coefficient
• “What is the best cut-off for this measurement in order to say someone needs medical attention?” → ROC analysis
• “Do all these measurements vary together so that they could be considered as measuring some smaller number of underlying concepts?” → Factor analysis / Principal Component Analysis
• “Can the subjects be grouped into a few similar groups based on the similarity in their measurements?” → Cluster analysis
• and so on ...
SAMPLE SIZE CALCULATIONS
FOR HYPOTHESIS TESTS:
The following five things affect the sample size you need:
1. Which hypothesis test you plan to use
• Hypothesis test based on categorical outcomes (as opposed to numerical outcomes) → BIGGER sample size
• Hypothesis test uses independent groups (as opposed to repeated measures) → BIGGER sample size
2. Size of the difference you are looking for
Most hypothesis tests concern the differences between means or percentages.
The difference you would like to see is often called:
• Clinically significant difference
• Practically significant difference
Choosing how big this difference is requires KNOWLEDGE OF YOUR AREA OF RESEARCH.
Looking for a SMALL DIFFERENCE → BIGGER sample size
3. Variability of the results
HIGH VARIABILITY means many options for what could happen in a sample of a particular size.
eg: for the CHI-SQUARED TEST
• very high or very low expected percentage → low variability
• medium expected percentage → high variability
eg: for t-tests or ANOVA
• large standard deviation → high variability
You usually get this information from previous research or a pilot study.
HIGH VARIABILITY → BIGGER sample size
4. Significance level
The cut-off for saying when a p-value is significant. Usually 5%.
Also known as α (alpha) or the “Type I Error rate”.
LOW SIGNIFICANCE LEVEL → BIGGER sample size
5. Power
The probability of getting a significant result if in fact there IS a difference in the population.
Usually you set this at 80%.
The opposite of the Type II Error rate (also known as β (beta)): power = 100% − Type II Error rate.
HIGH POWER → BIGGER sample size
[ Note that a high dropout rate also increases sample size ]
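To see how the difference, variability, significance level and power interact, here is a rough sketch for one specific case: comparing the means of two independent groups. It uses the common normal-approximation formula n per group = 2 × ((z for α/2 + z for power) × standard deviation / difference)², which is a swapped-in textbook approximation rather than anything given in this handout. The function name and the example numbers are assumptions, and dropout is not accounted for.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed PER GROUP to compare two means.

    Normal approximation for a two-sided test with independent groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_power = NormalDist().inv_cdf(power)          # power
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return ceil(n)


# Hypothetical planning values: looking for a 0.5 degree difference in
# body temperature, with a standard deviation of 1 degree.
base = sample_size_per_group(difference=0.5, sd=1.0)
print("Baseline:", base)
print("Smaller difference:", sample_size_per_group(difference=0.25, sd=1.0))
print("Higher variability:", sample_size_per_group(difference=0.5, sd=2.0))
print("Lower significance level:", sample_size_per_group(0.5, 1.0, alpha=0.01))
print("Higher power:", sample_size_per_group(0.5, 1.0, power=0.90))
```

Each change from the baseline pushes the required sample size up, which matches the "BIGGER sample size" statements in the list above. A dedicated calculator is still the better tool for a real study.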
FOR CONFIDENCE INTERVALS:
Confidence intervals are related to hypothesis tests, so the considerations above are used
for confidence intervals too.
NOTE: Significance level = 100% - Confidence Level
(so for a 95% confidence interval, the significance level is 5%)
NOTE: The “difference you are looking for” is half the width of the confidence interval. Also known
as the “margin of error”.
FOR REGRESSION:
Rule of thumb: at least 10 times as many subjects as there are explanatory variables.
eg: two explanatory variables (X1, X2) for outcome Y: at least 2×10 = 20 subjects
eg: five explanatory variables (X1, X2, X3, X4, X5) for outcome Y: at least 5×10 = 50 subjects
Proper calculations are based on the t-tests involved to see if slope is significant.
SOME TERMINOLOGY:
Type I Error:
NO difference in the population
BUT there IS a difference in the sample
(the chance of this is known as the significance level or alpha α)
Type II Error:
There IS a difference in the population
BUT there is NO difference in the sample
(the chance of this is known as beta β, the opposite of power)
PERFORMING THE CALCULATIONS :
Russ Lenth has created a comprehensive suite of online calculators:
http://homepage.stat.uiowa.edu/~rlenth/Power
You need all the information mentioned above in order to use the calculators.
There are also simple formulas for the t-tests and chi-squared tests in Chapter 36 of
“Medical Statistics at a Glance” by Aviva Petrie and Caroline Sabin
You need all the information mentioned above in order to use the formulas.