BME STATS WORKSHOP
Introduction to Statistics
Part 1 of workshop
The way to think about inferential statistics
• They are tools that allow us to make black and white statements even though the data do not clearly provide answers.
– This is to say that we will use probabilities, which speak in shades of grey, but will make statements in terms of rejecting or failing to reject some null hypothesis.
Making inferences from data analysis
• As scientists we have the unique privilege of using ingenious tools and methods that help us make informed decisions.
• One of those tools is statistical analysis. It allows us to judge more accurately what our data actually show.
• This workshop should help you draw better conclusions from your data by using simple but effective statistical tools to cut through the shades of grey often encountered in research.
The Essence of Inferential Statistics
1. We compare a statistic obtained from acquired data to a theoretical distribution of that statistic. Thus, relativity is important in statistics.
• You will surely have conducted t-tests in the past to compare measures from a control group with an experimental group.
• That t value is evaluated against a distribution of ts.
• In statistics, size does matter. Large t values increase the likelihood that the investigator will declare a significant result.
Essence con'd
2. Signal to noise ratio.
• Most statistics used in this workshop, such as the t statistic, are made up of differences due to treatment and differences due to individuals (also called error). Error is simply random variation.

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\text{standard error of the difference}}$
Essence con'd
3. Rare events
• This is related directly to point one.
• In order for a treatment to be declared successful, the obtained statistic has to be sufficiently rare.
• We will find out that large statistical values are considered rare.
• For a better understanding of these points we will describe a Monte Carlo experiment.
The Plan!
1. Constructing a distribution.
2. How to apply a statistic obtained from an experiment.
3. Interpretation of a result.
4. What does a significant result mean?
Constructing a Distribution:
Some Definitions
• Sample distribution:
– A distribution of values from some measurement.
• This measurement can be of anything, such as height, weight
or age to name a few.
• Sampling distribution:
– A distribution of a statistic obtained from a sample
distribution.
• This statistic can be a mean, mode, median, variance or
anything else that is a calculation from individual measures.
• As we will see, the t statistic can be used to construct a
sampling distribution.
Distributions
• Sample distributions are often bell shaped or normal, but this is not guaranteed. On occasion exponential, rectangular or odd shaped distributions are observed.
• Sampling distributions, on the other hand, are almost always normally shaped. This is true even if the measurements used to calculate the statistic are from non-normal distributions.
How to construct a sampling distribution of the t statistic: an example under the null hypothesis of equal means
• We first have to have a sample distribution of some measure from some population with specific parameters, such as 25 year old women. The measurement of interest could be height.
• We then randomly sample from this distribution to make up two groups of individuals of a specified sample size.
– Ex. Two groups of ten individuals.
• From these two groups a t value is calculated. This t value is then plotted. After this calculation, the individuals are returned to the sample distribution.
• The process of "sampling" with replacement is repeated as many times as possible. Using computers you might opt for 1000 or more samplings. Thus, you would have a sampling distribution of 1000 ts.
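To make this concrete, here is a minimal Python sketch of the Monte Carlo procedure just described. The population parameters (mean 165 cm, SD 7 cm for height) are illustrative assumptions, not values from the workshop; only the two groups of ten and the 1000 repetitions come from the slides.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: heights of 25-year-old women (illustrative values).
pop_mean, pop_sd = 165.0, 7.0
n_per_group = 10
n_reps = 1000

ts = np.empty(n_reps)
for i in range(n_reps):
    # Draw two groups from the SAME population (the null hypothesis is true).
    g1 = rng.normal(pop_mean, pop_sd, n_per_group)
    g2 = rng.normal(pop_mean, pop_sd, n_per_group)
    # t = mean difference over the standard error of the difference.
    se = np.sqrt(g1.var(ddof=1) / n_per_group + g2.var(ddof=1) / n_per_group)
    ts[i] = (g1.mean() - g2.mean()) / se

# 'ts' is now a sampling distribution of 1000 t values built under the null.
print(ts.mean(), ts.std(ddof=1))
```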
How to use a sampling distribution
of ts
• In any sampling distribution there are a number of values that are
extreme. This is normal and we will use this concept to make
decisions about our experiments.
• Traditionally, we determine the t value at which point all values
greater make up 5% of all values in that distribution. If we are
concerned about both tails of that distribution we will find the value
at which point all values greater make up 2.5% of all values on the
positive tail and 2.5% on the negative tail.
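Continuing the sketch above, the two-tailed 5% cutoffs are simply the 2.5th and 97.5th percentiles of the simulated distribution:

```python
# 'ts' is the array of simulated t values from the previous sketch.
lo, hi = np.quantile(ts, [0.025, 0.975])
print(f"2.5% cutoff: {lo:.3f}, 97.5% cutoff: {hi:.3f}")  # roughly -2.1 and +2.1 at 18 df
```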
How to use con’d.
• We then conduct an experiment in which
we have a control and an experimental
group.
• We calculate a t statistic from this
experiment.
• This t value is evaluated against the
sampling distribution of ts we have
constructed.
• If our obtained value is greater than the
value from the distribution that marks the
5% cutoff we end up stating that the
experiment produced a significant result. In
other words the control was significantly
different from the experimental group.
[Figure: a t distribution with the two tails beyond the critical values marked "Sig." and the central region marked "Not Sig.".]
Some specifics about using a t distribution: what does stating significance really mean?
• First of all, when we find a t value that is outside of the critical values in a distribution we should really start by saying, "the obtained value is rare when calculated from two groups drawn from the same population."
• We would then follow up that statement with, "Since that value is rare and was obtained from an experiment, it is reasonable to conclude that the groups do not come from the same population."
– This is indeed saying that the treatment was effective. Thus, we have a significant result.
Monte Carlo
How will building a distribution help us understand statistics?
Monte Carlo
Building a t distribution
Distributions: ts
How do you build distributions of a statistic, in this case t? 1) You start with a population of interest. 2) Calculate means from two samples with a specific number of individuals. 3) Calculate the t statistic using those two samples. 4) Do this again and again, possibly 1000 times or more. Remember that these distributions are built under the null hypothesis.
[Figure: flow diagram from the population to two samples of size n1 and n2 with means x̄1 and x̄2, then "Calculate t"; the process is repeated as often as you can.]
Family of ts
The larger the sample size used, the less variability in the results. As we can see here, the greater the degrees of freedom (df), the less extreme the obtained values, resulting in a tighter distribution.
Note: degrees of freedom for the two-sample t-test are calculated as n1 + n2 − 2. Thus, for a sample size of 10 per group the df are 18.
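A quick way to see the family of t distributions tightening is to print the two-tailed 5% critical value at several df with scipy; it shrinks toward the normal-distribution value of 1.96 as df grows:

```python
from scipy import stats

for df in (2, 5, 10, 18, 30, 1000):
    crit = stats.t.ppf(0.975, df)  # two-tailed 5% critical value
    print(f"df={df:4d}  critical t={crit:.3f}")
```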
Theoretical Distribution of ts
We use this table to determine the critical values. The computer uses the density functions.
[Figure: table of critical t values by degrees of freedom.]
Variables
• Independent variable:
– That variable you manipulate.
• Subjects are allocated to groups
• Dependent variable
– That variable which depends on the
manipulation.
• Measures such as weight or height or some other
variable that varies depending on treatment
Cause and effect
• Cause can only be inferred when subjects
are randomly allocated to groups.
– Random allocation ensures that all
characteristics are evenly distributed across
all groups.
• This way, differences between groups cannot be
due to biases in the subject selection, a very
important element of experimental design.
An example of data analysis
Comparing Reaction Time Following Alcohol Consumption
• University males were recruited to participate in a drinking experiment.
• The males were randomly separated into two groups. One group consumed a specific amount of alcohol and the other a non-alcoholic drink.
• Ten minutes after the second drink was consumed, the subjects were asked to push a button on a box the moment they heard a buzzer.
• When the button was pushed the buzzer stopped. The investigator recorded the amount of time the buzzer sounded, in milliseconds.
Hypotheses
We state hypotheses in terms of populations. This is to say that we are making statements about what we think exists in the real world. From our sample we will reject or fail to reject the null hypothesis. Here we have a situation in which we are predicting differences only; this is a nondirectional hypothesis.
$H_0: \mu_c = \mu_a$
$H_1: \mu_c \ne \mu_a$
The data (time in ms)

Control   Alcohol group
150       200
110       250
200       220
135       225
90        250
111       234
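Before looking at the SPSS output, here is a sketch of the same analysis in Python with scipy, using the twelve values above; it reproduces the t of about −5.465 discussed below.

```python
from scipy import stats

control = [150, 110, 200, 135, 90, 111]
alcohol = [200, 250, 220, 225, 250, 234]

# Two-sample independent t-test, pooled variance (SPSS "equal variances assumed").
t, p = stats.ttest_ind(control, alcohol)
print(f"t = {t:.3f}, two-tailed p = {p:.5f}")  # t = -5.465, p < .001
```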
Results from an output provided by SPSS
The probability of a Type 1 error is provided inside the red box added by myself (not SPSS). Commonly, investigators call this the significance level. It should be noted that statisticians would not label that value as such.
Critical Values
• A critical value is the value in a theoretical distribution that marks the point beyond which less than a specific percentage of values can be found.
– We typically use 5%.
• In our example we have 12 scores from 12 individuals, thus 10 degrees of freedom.
– From the distribution of all ts we can determine how large a calculated t from our experiment must be for us to reject the null hypothesis of equal means.
– That value (see table previously shown) is 2.228.
– Our obtained t (−5.465) is larger in magnitude than the critical value. We reject the null hypothesis in favour of the alternate.
– You will notice that the t value is negative for our experiment. What is important is the magnitude, not the direction. If we were to reverse the groups in our calculations the value would have been positive.
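The same critical value can be pulled from the theoretical distribution directly; a one-line check with scipy:

```python
from scipy import stats

crit = stats.t.ppf(0.975, df=10)   # two-tailed 5% cutoff at 10 df
print(f"critical t = {crit:.3f}")  # 2.228; |-5.465| exceeds it, so reject H0
```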
Interpretation of the results
• Alcohol increases the amount of time needed to
turn off the buzzer suggesting that the subjects
are impaired in their reactions.
• We are able to make this statement because the
t value obtained here would be rare if the
samples came from the same population. Due
to this situation, we give ourselves permission to
reject the null hypothesis of equal means in the
population.
Some Important Concepts
The standard deviation
• The concept of variance and standard deviation (SD) is everything in statistics.
• It is used to determine if individuals or samples are within the normal range or outside it.
• Anyone who is more than 1.96 SD away from the population mean of some measure is said to not belong to that population. However, this is only true when we have population parameters (more on this later).
A few formulas to help us along.

Variance:
$S^2 = \text{Variance} = \dfrac{\sum (X - \bar{X})^2}{n - 1}$

Standard Deviation (SD):
$SD = \sqrt{S^2}$

Standard error of the mean (SEM):
$SEM = \dfrac{SD}{\sqrt{n}}$
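A minimal sketch of these three formulas in Python, checked against numpy's built-ins (the data reused here are the control reaction times from earlier):

```python
import numpy as np

x = np.array([150, 110, 200, 135, 90, 111], dtype=float)
n = len(x)

variance = ((x - x.mean()) ** 2).sum() / (n - 1)  # S^2
sd = variance ** 0.5                              # SD = sqrt(S^2)
sem = sd / n ** 0.5                               # SEM = SD / sqrt(n)

assert np.isclose(variance, x.var(ddof=1)) and np.isclose(sd, x.std(ddof=1))
print(variance, sd, sem)
```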
Variability is Important
• The greater the variability, the greater the noise. Note here that with greater variability in the data, more overlap of the sample distributions is observed.
• This will result in smaller signal to noise ratios. Thus, when we have more variability we will need larger sample sizes to detect mean differences (more on this later).
Keep this in mind when reviewing the upcoming slides.
T-Test
• Two sample t-test: comparing two sample means.

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S^2_{X_1}}{n_1} + \dfrac{S^2_{X_2}}{n_2}}}$

It is evident from the formula that the smaller the variability, the larger the t value.
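Here is the formula implemented directly, a sketch verified against scipy on the reaction-time data; with equal group sizes this unpooled denominator gives the same t as the pooled version used earlier (only the df differ):

```python
import numpy as np
from scipy import stats

def two_sample_t(x1, x2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    se = np.sqrt(x1.var(ddof=1) / len(x1) + x2.var(ddof=1) / len(x2))
    return (x1.mean() - x2.mean()) / se

control = [150, 110, 200, 135, 90, 111]
alcohol = [200, 250, 220, 225, 250, 234]
print(two_sample_t(control, alcohol))                      # -5.465
print(stats.ttest_ind(control, alcohol, equal_var=False))  # same t, Welch df
```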
Hypothesis Testing revisited
• We always determine whether or not a statistic is rare given the null hypothesis, never the alternate hypothesis. You might remember this from the Monte Carlo studies.
• Thus we have to deal with the concepts of the Type 1 and the Type 2 error.
Type 1 error
• The probability of being wrong when stating that
samples are from different populations.
• This is the p<.05 that we use to reject the null
hypothesis of equal means in the population.
– If we have a p of .02, it means that the probability of
being wrong when stating that two samples come
from different populations is .02.
– The .05 is a cutoff that is said to be acceptable.
Type 2 error.
• The probability of failing to reject the null
hypothesis when the null is not true.
• In truth, the samples are most likely from
different populations. Often, we simply
don’t have enough power or the tools are
not sensitive enough to detect these
differences.
Assumptions of a Distribution
What are they and why are they
important?
Assumptions are rules
• They are the rules by which distributions are constructed.
• These rules must be followed in order for a statistic obtained from an experiment to be compared to the theoretical distribution.
• If your experiment breaks these rules, it is possible that you will be either too conservative or too liberal when making a statement about the reality of the population.
Assumptions
1. Samples come from a normally distributed population.
2. Both samples have equal variances (homogeneity of variance).
3. Samples are made up of randomly selected individuals.
4. Both samples should be of equal sample size.
What to do when we violate
assumptions
• 1. We can transform the data so that the
sample can have the characteristics
desired.
• 2. We can use distribution free statistics.
– These statistics are insensitive to violations of
assumptions.
• However, they do have limitations (more in later
sessions).
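As an illustration of the second option, here is a sketch of one distribution-free alternative to the two-sample t-test, the Mann-Whitney U test in scipy; the choice of this particular test is my assumption, since the workshop defers the details to later sessions:

```python
from scipy import stats

control = [150, 110, 200, 135, 90, 111]
alcohol = [200, 250, 220, 225, 250, 234]

# Rank-based test: no normality or equal-variance assumption required.
u, p = stats.mannwhitneyu(control, alcohol, alternative="two-sided")
print(f"U = {u}, two-tailed p = {p:.4f}")
```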
Part 2 of workshop
Starting out with PASW (formerly SPSS but now SPSS again)
An introduction
What is SPSS
• It is the "Statistical Package for the Social Sciences".
• It started life as a text driven program (SPSSx), migrated to the PC as line code and finally made it to the Windows environment. This is the version we enjoy today.
Do you need the latest version?
• No.
• With each new version there are graphical
changes and on occasion additional
statistical tools.
– However, the basics do not change. An
analysis of variance conducted with version
10 will produce the same results as those with
version 19 (the latest at the time of this
workshop).
Latest version cont’d
• One problem is with the output of different
versions.
– Older versions of SPSS cannot read the
output of newer versions. Thus, the outputs
are not backward compatible.
– One way to get around this issue is to use the
export function in the newer versions to save
the outputs as PDF, DOC, or PPT so that the
results can be read.
Getting started
• If you've used Excel in the past, then you have a base from which to work.
• SPSS uses a worksheet that is similar, but not identical, to Excel's.
– However, the similarities end there.
Learning Curve
• If you use SPSS on a regular basis, you should be somewhat proficient in a week or two.
– Developing expertise will take somewhat longer, depending on your interest and statistics knowledge.
– Let's get started!
This is what you see when you
start the program. In front of you is
the worksheet in the “data view”.
You enter all your data in the worksheet.
You also have the option of “variable
view” by clicking on the tab below or
clicking on the column heading “var”.
The variable view is where you write down the name of
your variable (variable name). Also in this view you have
the option of providing variable labels and other descriptors
that can help you recognize your data.
Name your variable.
Let’s start with a short review on
variables.
• Independent variable (IV): That variable which
is manipulated.
• Dependent variable (DV): That variable whose
measures depend on some manipulation.
• Any experiment can have more than one IV or
DV.
• These variables have to be set up correctly in a
worksheet in order to properly analyze data.
Let’s say that the study is designed to determine if a
certain drug facilitates weight loss
• We will need an independent variable….say
Drug Type.
– We could have two groups based on drug treatment.
• Drug 1
• Drug 2
• We will also need a dependent variable…say
weight.
– In the worksheet we will indicate the weight for each
individual after being on the drug for a period of time.
Entering data. We simply click on an empty box and begin typing as
appropriate. Shown here are the designations for group membership
for the IV in our fictitious experiment with two groups.
Back to the variable view where we change the variable name and add
a label which will help us remember what that variable means for future
reference. Also, the variable label is the text that will be printed on the
output following an analysis.
Clicking on the empty square under values allows
for the user to specify group names.
The number value is assigned a label by the user.
On returning to the worksheet, the group
labels and the variable name specified by
you replace the default labels.
We will now add the dependent variable with data.
[Screenshot: worksheet with the IV column (group membership) and the new DV column (weight).]
Some Descriptive Statistics
• PASW easily allows us to produce
descriptive statistics.
– Mean
– Standard deviation
– Standard error
– Median
– Etc….
You conduct all analyses from the Analyze option. Here we are asking
for PASW to show descriptive statistics using the Means sub-option.
Many options for descriptive
statistics are available
Relevant output table is shown here. Note that the statistics requested
in the earlier slide are displayed in this table.
Report
This is the dependent variable for weight, broken down by the independent variable:

Group    Mean      N    Std. Deviation   Std. Error of Mean
Drug 1   48.3000   10   7.95892          2.51683
Drug 2   80.8000   10   9.17484          2.90134
Total    64.5500   20   18.65046         4.17037
Graphs: can be constructed from a number of options
You may wish to use the chart builder option, but users who are familiar with older versions of this program sometimes find it difficult to change. I like the legacy option, which retains the old method.
In the next slide we will see a graph using the error bar option.
Here we have the 95% intervals, but typically you would want the error bars to represent one standard error.
Finally an analysis
• We will conduct a two sample independent
t-test.
Here we specify the tests of means
in the compare means option
You must indicate which groups
will be compared. You must use
the number assigned to the
groups.
Independent Samples Test

Levene's Test for Equality of Variances (Levene's test determines if the variance in one group is different from the other; homogeneity of variance is an important assumption):

                              F       Sig.
Equal variances assumed       1.138   .300

t-test for Equality of Means (dependent variable: weight):

                              t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -8.462   18       .000              -32.50000         3.84086                 -40.56935      -24.43065
Equal variances not assumed   -8.462   17.648   .000              -32.50000         3.84086                 -40.58090      -24.41910

The results are significant. Sig. (2-tailed) is the Type 1 error.
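The same numbers can be reproduced from the descriptive statistics alone; a sketch using scipy's summary-statistics t-test with the means, SDs and Ns from the Report table above:

```python
from scipy import stats

# Means, SDs and Ns taken from the descriptive statistics table.
t, p = stats.ttest_ind_from_stats(mean1=48.3, std1=7.95892, nobs1=10,
                                  mean2=80.8, std2=9.17484, nobs2=10)
print(f"t = {t:.3f}, p = {p:.6f}")  # t = -8.462, matching the SPSS output
```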
Let’s add a third group
• The same method as building the
database in the first place applies to
adding a group.
• With the addition of a third group we will
need to perform an analysis of variance
(ANOVA) with posthoc tests.
ANOVA
This is the dependent variable for weight:

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   14581.400        2    7290.700      78.973   .000
Within Groups    2492.600         27   92.319
Total            17074.000        29
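As a check on how the table fits together, a sketch recomputing F and its p value from the sums of squares and degrees of freedom shown above:

```python
from scipy import stats

ss_between, df_between = 14581.400, 2
ss_within, df_within = 2492.600, 27

f = (ss_between / df_between) / (ss_within / df_within)  # ratio of mean squares
p = stats.f.sf(f, df_between, df_within)                 # upper-tail probability
print(f"F = {f:.3f}, p = {p:.2e}")                       # F = 78.973
```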
Multiple Comparisons
This is the dependent variable for weight. Tukey HSD:

(I) Group   (J) Group   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Drug 1      Drug 2      -32.50000*              4.29694      .000   -43.1539       -21.8461
Drug 1      Drug 3      -53.60000*              4.29694      .000   -64.2539       -42.9461
Drug 2      Drug 1       32.50000*              4.29694      .000    21.8461        43.1539
Drug 2      Drug 3      -21.10000*              4.29694      .000   -31.7539       -10.4461
Drug 3      Drug 1       53.60000*              4.29694      .000    42.9461        64.2539
Drug 3      Drug 2       21.10000*              4.29694      .000    10.4461        31.7539

*. The mean difference is significant at the 0.05 level.
Significant results.
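For readers working outside SPSS, here is a sketch of the same post hoc procedure with statsmodels; the raw weight arrays are hypothetical stand-ins (the slides only show the summary output), with group means chosen to roughly match the table above:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical raw data: three groups of ten (illustrative values only).
rng = np.random.default_rng(0)
weights = np.concatenate([rng.normal(48, 8, 10),
                          rng.normal(81, 9, 10),
                          rng.normal(102, 9, 10)])
groups = ["Drug 1"] * 10 + ["Drug 2"] * 10 + ["Drug 3"] * 10

# Tukey HSD on all pairwise group differences at alpha = .05.
print(pairwise_tukeyhsd(weights, groups, alpha=0.05))
```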
Interpretation
• The ANOVA indicates that there are
differences between the groups.
• This result allowed for conducting a
posthoc Tukey test.
– All groups are considered different from one
another.
– This is shown by the observation that all
comparisons are significant.
A graph of the results obtained from the Univariate sub-option is shown here.
Adding a second IV will allow us to conduct an interaction analysis using the Univariate sub-option.
Tests of Between-Subjects Effects
Dependent Variable: This is the dependent variable for weight

Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   14581.400a                2    7290.700      78.973     .000
Intercept         177870.000                1    177870.000    1926.699   .000
IndependentVar    14581.400                 2    7290.700      78.973     .000
Error             2492.600                  27   92.319
Total             194944.000                30
Corrected Total   17074.000                 29

a. R Squared = .854 (Adjusted R Squared = .843)
We observe a significant main effect for IV1 but not IV2. Also, there is no
significant interaction between IV1 and IV2 on the dependent variable.
See graph.
After all this you might want to explore the interaction
• You would run a simple main effects analysis, which can be done through a syntax window.
• You write a program.
• This was the norm when PASW was SPSSx: SPSS was text driven.
Syntax
This program allows us to determine if there are differences on the dependent variable for one IV at levels (groups) of another variable.
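The syntax itself appears in the screenshot. As a rough Python analog, here is a sketch that tests one IV separately at each level of the other; note that SPSS's MANOVA approach uses the pooled error term from the full design, so its p values will differ somewhat from these per-level tests, and the DataFrame layout and values here are assumptions:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical long-format data: one row per subject (illustrative values only).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "weight": rng.normal(70, 10, 30),
    "iv1": np.repeat(["Drug 1", "Drug 2", "Drug 3"], 10),
    "iv2": np.tile(["Placebo", "Exercise"], 15),
})

# Simple main effect of iv2 at each level of iv1 (per-level one-way ANOVA).
for level, sub in df.groupby("iv1"):
    groups = [g["weight"].to_numpy() for _, g in sub.groupby("iv2")]
    f, p = stats.f_oneway(*groups)
    print(f"iv2 within iv1={level}: F = {f:.2f}, p = {p:.3f}")
```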
Results of the Simple Main Effects
Analysis
• Next slide.
The default error term in MANOVA has been changed from WITHIN CELLS to WITHIN+RESIDUAL. Note that these are the same for all full factorial designs.

* * * Analysis of Variance * * *
30 cases accepted.
0 cases rejected because of out-of-range factor values.
0 cases rejected because of missing data.
6 non-empty cells.
1 design will be processed.

* * * Analysis of Variance -- Design 1 * * *
Tests of Significance for DependentVar using UNIQUE sums of squares

Source of Variation                              SS   DF        MS       F   Sig of F
WITHIN+RESIDUAL                             2469.20   24    102.88
INDEPENDENTVAR2 WITHIN INDEPENDENTVAR(1)      16.90    1     16.90     .16       .689
INDEPENDENTVAR2 WITHIN INDEPENDENTVAR(2)       6.40    1      6.40     .06       .805
INDEPENDENTVAR2 WITHIN INDEPENDENTVAR(3)        .10    1       .10     .00       .975
INDEPENDENTVAR                             14581.40    2   7290.70   70.86       .000
(Model)                                    14604.80    5   2920.96   28.39       .000
(Total)                                    17074.00   29    588.76

R-Squared = .855
Adjusted R-Squared = .825
Here is how you would set up a database for a
repeated measures design
1. Arrange groups in columns so that group
one has data in column 1, group 2 in
column 2 and so on.
2. Specify the IV in PASW.
3. Define the groups by specifying which
column belongs to which group.
4. Click on OK.
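Outside SPSS, the same repeated measures ANOVA can be sketched with statsmodels' AnovaRM; the illustrative values are assumptions, and note that statsmodels wants long format (one row per subject-condition pair) rather than groups in columns:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical data: 6 subjects each measured under 3 drug conditions.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "subject": np.repeat(range(6), 3),
    "drug": np.tile(["Drug 1", "Drug 2", "Drug 3"], 6),
    "weight": rng.normal(70, 10, 18),
})

# Within-subjects (repeated measures) ANOVA on the drug factor.
print(AnovaRM(data=df, depvar="weight", subject="subject",
              within=["drug"]).fit())
```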
[Screenshots: use the repeated measures option; group data are in columns. Give the variable a name and indicate the number of groups (3 in this case), then click on Add to get the popup.]
Results are significant. We can say that there are mean differences between the groups, but we cannot say which pairs of groups differ.
Tests of Within-Subjects Effects
Measure: MEASURE_1

Source: Drug
                     Type III SS   df      Mean Square   F         Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
Sphericity Assumed   5806.333      2       2903.167      102.344   .000   .953                  204.689              1.000
Greenhouse-Geisser   5806.333      1.276   4549.675      102.344   .000   .953                  130.613              1.000
Huynh-Feldt          5806.333      1.519   3821.923      102.344   .000   .953                  155.483              1.000
Lower-bound          5806.333      1.000   5806.333      102.344   .000   .953                  102.344              1.000

Source: Error(Drug)
                     Type III SS   df      Mean Square
Sphericity Assumed   283.667       10      28.367
Greenhouse-Geisser   283.667       6.381   44.455
Huynh-Feldt          283.667       7.596   37.344
Lower-bound          283.667       5.000   56.733

a. Computed using alpha = .05
Always interpret using the Greenhouse-Geisser correction.