Download AAAA_NUIP Stats Lecture

Section I. Statistics
What do they mean and why are
they important?
What do stats mean?
• To be an intelligent consumer of statistics,
your first reflex must be to question the
statistics that you encounter. The British Prime
Minister Benjamin Disraeli famously said,
"There are three kinds of lies -- lies, damned
lies, and statistics."
• It is important to think about the numbers,
their sources, and most importantly, the
procedures used to generate them.
Top 10 ways you use statistics every day
• Weather forecasts
• Emergency preparedness
• Predicting disease
• Medical studies
• Genetics
• Political campaigns
• Insurance
• Consumer goods
• Quality testing
• Stock market
But I’m never going to do research!
• Six good reasons to study statistics
– to be able to effectively conduct research,
– to be able to read and evaluate journal articles,
– to further develop critical thinking and analytic skills,
– to act as an informed consumer,
– and to know when you need to hire outside statistical help.
– Even Florence Nightingale did it!
Why nursing research
• Increasing emphasis on evidence based
practice
– Informs nurses’ decisions and actions
– Empowers nurses to make clinical decisions which
benefit their patients, whether individual or
community
– Friendly nursing research environment required
for Magnet status
– Increases recognition for nursing contribution in
health care and policy
Variables
• The characteristics we are measuring
– Varies according to the population, patient,
event, intervention
• Data levels of measurement help us measure
the variables
– Nominal
– Ordinal
– Interval
– Ratio
Data levels of measurement: Nominal
• sometimes called categorical or qualitative
– Permissible statistics: mode, chi-squared
– Lowest form of data, least sophisticated
• Names
• Characteristics/Descriptive (e.g., pain: throbbing, stabbing, dull)
• Letters (e.g., M/F, Y/N)
• Numbers may be assigned to designate categories but have
no numerical meaning (e.g., M=1, F=2)
Data Levels of measurement: Ordinal
– Permissible statistics: median, percentile
– Can’t be added
• Rank order
–1st, 2nd, 3rd
• Rating
–Pain rating 0-10
• Likert scale
Likert scales
• Very dissatisfied, somewhat dissatisfied, neither satisfied
nor dissatisfied, somewhat satisfied, very satisfied
– No numerical data to quantify
– Answers run on a continuum
Data Levels of measurement: Interval
• Permissible statistics: mean, SD, correlation,
regression, ANOVA
– Rank ordering of objects.
– Equivalent distance between each measurement
– The Fahrenheit scale is a clear example of the
interval scale of measurement
– Arbitrary zero does not represent the lowest value
Data Levels of measurement : Ratio
• Highest level of measurement
• Permissible statistics: same as interval plus more
• The ratio scale of measurement is similar to the
interval scale in that it also represents quantity and has
equality of units.
• has an absolute zero (no numbers exist below zero).
Very often, physical measures will represent ratio data
(for example, height and weight). Example: measuring
a length of a piece of wood in centimeters: you have
quantity, equal units, and the measure can’t go below
zero centimeters.
Examples of data levels of measurement
Subject    Ratio level    Interval level    Ordinal level    Nominal level
Cookie     180            70                6                2
Bunny      110            0                 1                1
Frosty     165            55                4                2
Tootsie    130            20                3                1
Candy      175            65                5                2
Fluffy     115            5                 2                1
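The table can be reproduced in a few lines. This is only a sketch: it assumes the interval column is the ratio column shifted so the lowest subject sits at zero, and that the ordinal column is the rank order of the ratio values (both readings fit the numbers above, but the slide does not state them). The variable names are illustrative.

```python
# Ratio-level values copied from the table above.
ratio = {"Cookie": 180, "Bunny": 110, "Frosty": 165,
         "Tootsie": 130, "Candy": 175, "Fluffy": 115}

# Interval level: equal spacing is kept, but the zero point is arbitrary
# (assumption: the lowest subject is shifted to 0).
offset = min(ratio.values())
interval = {name: value - offset for name, value in ratio.items()}

# Ordinal level: only the rank order survives.
ordered = sorted(ratio, key=ratio.get)
ordinal = {name: rank for rank, name in enumerate(ordered, start=1)}

# Differences mean the same thing on the ratio and interval scales ...
diff_ratio = ratio["Cookie"] - ratio["Candy"]
diff_interval = interval["Cookie"] - interval["Candy"]

# ... but "how many times larger" is only meaningful at the ratio level.
times_ratio = ratio["Cookie"] / ratio["Frosty"]
times_interval = interval["Cookie"] / interval["Frosty"]

print(interval)
print(ordinal)
print(diff_ratio, diff_interval, times_ratio, times_interval)
```

The two differences agree, while the two "times larger" figures do not, which is why ratios should only be interpreted when the scale has an absolute zero.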
Question 1
• The colors of M&M candies would be which
type of measurement?
A. Interval
B. Nominal
C. Ordinal
D. Ratio
Question 2
• Height, weight, lab test results, and age are
examples of which type of data
measurement?
A. Ratio
B. Nominal
C. Interval
D. Ordinal
Rankin Scale
• The Rankin scale is used to assess functional
status after stroke. Measurements are:
• 0 = no symptoms at all
• 1 = symptoms with no significant disability
• 2 = slight disability; unable to carry out previous activities
• 3 = moderate disability; needs some assistance, can walk alone
• 4 = moderately severe disability; unable to walk or attend bodily functions without assistance
• 5 = severe disability; bedridden, incontinent, needs constant nursing care
• 6 = dead
Question 3
• The Rankin scale is which type of
measurement?
A. Ratio
B. Nominal
C. Interval
D. Ordinal
Section II. Descriptive Statistics and
Intro to the Normal Distribution
Descriptive Statistics= Describing the Data
• For any study, consider what parts would be
useful to describe in numbers
– Sample
– Variables of interest
• In any study where the data are numerical, data
analysis should begin with descriptive statistics.
• The appropriate choice of descriptive statistics
depends on the level of data that was collected!
Types of Summary Statistics
• Frequency distributions
– Ungrouped
– Grouped
– Percentages
• Measures of central tendency
• Measures of dispersion
Ungrouped Frequency Distributions
• The number of times something happened.
• Used with categorical data (ordinal, nominal)
• As simple as a tally or count
http://www.gigawiz.com/histograms.html
Example
• Using ungrouped frequency
distributions to describe
research variables
• How often newborns fit
each demographic criteria
or birth attendant reported
a particular behavior (ex.
using CHG vs. not)
From Rhee et al. (2008). Maternal and birth attendant hand washing and neonatal mortality in Southern
Nepal. Archives of Pediatrics and Adolescent Medicine, 162(7), 603-608
Grouped Frequency Distributions
• The number of times something happened.
• Used to break continuous data (often things like
age, weight, income) into groups.
– You will always lose some information by doing this
– There are conventions for groupings
• Groups ideally have equal ranges, but open-ended groups may
appear at the ends of the data spectrum
• All data points must fit into a group
• Not too many, not too few (you don’t want to lose patterns
in the data)
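A minimal sketch of the grouping conventions above, using made-up ages. The 20-year group width, the open-ended end groups, and the function name `age_group` are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical ages, for illustration only.
ages = [23, 37, 45, 31, 52, 68, 29, 41, 35, 74, 19, 58]

def age_group(age):
    """Assign an age to an equal-width 20-year group, open-ended at both ends."""
    if age < 20:
        return "<20"
    if age >= 60:
        return "60+"
    lower = (age // 20) * 20
    return f"{lower}-{lower + 19}"

grouped = Counter(age_group(a) for a in ages)
for label in ["<20", "20-39", "40-59", "60+"]:
    print(label, grouped[label])
```

Every age lands in exactly one group, so the group counts add up to the number of participants. The individual ages can no longer be recovered from the counts, which is the information lost by grouping.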
Percentage Distributions
• What percentage of the time something
happened.
– Useful when comparing to studies with different
numbers of participants
– Often presented with other frequency
distributions in the following format: No.(%)
– Often graphically represented using pie charts, bar
charts
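The No. (%) format can be produced directly from a tally. The responses below are hypothetical, and the helper name `no_pct` is illustrative.

```python
from collections import Counter

# Hypothetical yes/no responses, for illustration only.
responses = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes"]

counts = Counter(responses)   # ungrouped frequency distribution (a tally)
total = len(responses)

def no_pct(category):
    """Format one category as 'No. (%)'."""
    n = counts[category]
    return f"{n} ({100 * n / total:.1f}%)"

for category in sorted(counts):
    print(category, no_pct(category))
```

Reporting the percentage next to the count makes the result comparable with studies that enrolled a different number of participants.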
Example
• Questionnaires given to
parents of underimmunized children.
• The tables indicate the
number and percentage
of participants selecting
each response.
Luthy, K., Beckstrand, R., & Peterson, N. (2009).
Parental hesitation as a factor in delayed
Childhood Immunization
Question
• Which measure of
central tendency is being
used here to summarize
participants’ ages:
– A- Mode
– B- Median
– C- Mean
– D- Standard deviation
Measure of Central Tendency
• Used to describe a “typical” result or the
middle of the dataset
• Most common measures:
– Median
– Mode
– Mean
Median
• Literally the number in the middle of the dataset
(odd # scores)
– 50% of scores above and 50% of scores below this
point (known as the 50th percentile)
• Most appropriately used for ordinal data
• Because focus is on middle score, the median is
less affected by outliers
Mode
• The most common score(s)
– May or may not be in the “middle” but is always a
number in the dataset
– Most appropriate for nominal data (ex. Most
answers are “yes”).
Mean
• = Sum of Scores / Total # of Scores
– Also known as an average
• Data must be continuous to generate a mean
(interval and ratio level data only!)
• Most affected by outliers
• May be denoted in a number of ways (M, X̄,
mean)
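The three measures can be compared on a small set of made-up scores with Python's standard `statistics` module. Adding one extreme value shows why the mean is described as most affected by outliers.

```python
import statistics

scores = [2, 3, 3, 4, 5, 6, 7]     # hypothetical scores
with_outlier = scores + [40]       # same data plus one extreme value

mean_before = statistics.mean(scores)
median_before = statistics.median(scores)
mode_before = statistics.mode(scores)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before, mode_before)
print(mean_after, median_after)
```

With an even number of scores the median is the midpoint of the two middle values, so it may not be a number that appears in the dataset. The mode always is.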
Measures of Variance
• How spread out is the data? Or how different are
the scores from one another?
– Range
• Subtract the lowest number from the highest number in the
set. Tells the total distance between ends of the data set.
– Variance (interval or ratio levels only!)
• Computed mathematically and provides data on dispersion
or spread
– Standard deviation (interval or ratio levels only!)
• Relates dispersion of values to the mean
• Is the square root of the variance
• Usually reported as SD
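A short sketch of the three measures on made-up scores. It assumes the sample formulas (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute.

```python
import math
import statistics

scores = [4, 8, 6, 5, 3, 7, 9]   # hypothetical scores

data_range = max(scores) - min(scores)    # highest minus lowest
variance = statistics.variance(scores)    # sample variance (divides by n - 1)
sd = statistics.stdev(scores)             # sample standard deviation

print(data_range, variance, sd)
```

The SD is in the same units as the original scores, which is one reason it is reported more often than the variance.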
Normal Distribution
• In a true normal distribution, the
mean, median, and mode are equal
• No real distribution exactly fits
• However, in most sets of data, the
distribution is similar to the normal
curve
Normal Distribution
•Unique properties
– All possible values fall under the curve
– Probability of any score occurring is related to its location under the curve
• Important SDs:
– 68.3% of all values within 1 SD from mean
– 95.5% within 2 SD from mean
– 99.7% within 3 SD from mean
[Figure: normal curve with +/- 1 SD and +/- 2 SD marked]
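The percentages for 1, 2, and 3 SDs can be checked against the standard normal curve with `statistics.NormalDist` (Python 3.8 or later). The helper name `share_within` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, SD 1

def share_within(k):
    """Proportion of values lying within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.2f}%")
```

The same proportions hold for any normal distribution, because distances are measured in SD units rather than raw scores.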
Section III.
Stat theory
Hypotheses
Type 1 and 2 Errors
Level of Significance
Power
Probability Theory (p values)
• Deductive
• Used to explain:
– Extent of a relationship
– Probability of an event occurring
– Probability that an event can be accurately
predicted
• Expressed as lowercase p, with values given as decimals
between 0 and 1 (or the equivalent percents)
Probability
• If probability is 0.23, then p = 0.23.
• There is a 23% probability that a
particular event will occur.
• In research, a result is usually considered
statistically significant when p < 0.05.
• Example?
• Patients who have a cardiac arrest in the
operating room have a 5% chance of
death.
Decision Theory
• Inductive reasoning
• Assumes all groups in a study are the
same
• Up to the researcher to provide evidence
(NEVER use the word PROVE!) that
there really is a difference
• To test the assumption of no difference, a
cutoff point is selected before analysis.
Hypothesis
• Statement of the expected outcome
• Example?
• Nursing students who study in the
library have higher GPAs than nursing
students who study in their dorm
rooms/apartments.
Characteristics of a Hypothesis
• Testable
• Logical
• Directly related to the research problem
• Theoretically or Factually based
• States relationship between variables
• Stated so that it can be accepted or rejected
Research Hypothesis
• Directional
– explains and predicts the direction and
existence of a specific relationship
– relationship will be either positive or
negative
– more specific than the non-directional
hypothesis
– cause-and-effect hypothesis
• Non - Directional
Null hypothesis
• Statistical statement that there is no
difference between the groups under
study
Cutoff Point
• level of significance or alpha (α)
• Point at which the results of statistical
analysis are judged to indicate a
statistically significant difference between
groups
• For most nursing studies, level of
significance is 0.05.
Cutoff Point (cont’d)
Absolute
NO “CLOSE ENOUGH”: if the value is
only a fraction above the cutoff point, the
groups are treated as coming from the same population.
A result significant at 0.001 is not
considered more significant than one that
just meets the cutoff point.
Inference
A conclusion/judgment based on evidence
Judgments are made based on statistical
results
Statistical inferences must be made
cautiously and with great care
Generalization
• A generalization is the application
of information that has been
acquired from a specific instance
to a general situation.
• Example?
Normal Curve
A theoretical frequency distribution of all
possible values in a population
Levels of significance and probability are
based on the logic of the normal curve.
Normal Curve
One-Tailed Test (cont’d)
Two-Tailed Test
Type I and Type II Errors
Type I error occurs when the
researcher rejects the null hypothesis
when it is true.
The results indicate that there is a significant
difference, when in reality there is not.
Type II error occurs when the
researcher regards the null hypothesis
as true but it is false.
The results indicate there is no significant
difference, when in reality there is a
difference.
Reasons for Errors
• Type I
– Risk is greater at the .05 level than at the .01 level
• Type II
– Risk is greater at the .01 level than at the .05 level
– Flaws in research methods
• Multiple variables interact
• Precision of instruments
• Small samples
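A small simulation can illustrate why the Type I risk is greater at the .05 level. This is a sketch under stated assumptions: both groups are drawn from the same population (so the null hypothesis is true), the population SD is known to be 1, and a two-sided z-test stands in for whatever test a real study would use. The function name and the sample sizes are illustrative.

```python
import random
from statistics import NormalDist, mean

random.seed(1)          # fixed seed so the run is repeatable
z = NormalDist()

def p_value_same_population(n=30):
    """Two-sided z-test p-value for two samples drawn from the SAME population."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    se = (2 / n) ** 0.5                     # SD is known to be 1 in both groups
    stat = (mean(a) - mean(b)) / se
    return 2 * (1 - z.cdf(abs(stat)))

p_values = [p_value_same_population() for _ in range(2000)]

# Every "significant" result here is a Type I error, because no difference exists.
type1_at_05 = sum(p < 0.05 for p in p_values) / len(p_values)
type1_at_01 = sum(p < 0.01 for p in p_values) / len(p_values)
print(type1_at_05, type1_at_01)
```

The proportion of false alarms tracks the chosen alpha: any result that clears the .01 cutoff also clears the .05 cutoff, so the .05 level can never produce fewer Type I errors.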
Statistical Power
(AKA Power Analysis)
• DEF: the probability of rejecting the null
hypothesis when it should have been rejected
OR
• Probability that a statistical test will detect a
significant difference that exists
Power
• Maneuver to increase control over:
– Types of errors
– CORRECT DECISIONS
Power and Risk for Type II Error
Power analysis = 0.80 minimum
Influenced by sample size
As sample increases so does power
Influenced by effect size – degree to
which a phenomenon is present in a
population
The larger the true difference between the
two groups the greater the power
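The effect of sample size and effect size on power can be sketched with a normal approximation. This assumes a two-sided test comparing two equal-sized groups with a known common SD. A real power analysis would normally use the t distribution and dedicated software, so treat the numbers as approximate. The function name is illustrative.

```python
from statistics import NormalDist

z = NormalDist()

def power_two_groups(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test.

    effect_size is the true difference in means divided by the common SD.
    """
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region when the effect is real.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (20, 40, 64, 100):
    print(n, round(power_two_groups(0.5, n), 2))
```

Under these assumptions, a medium effect (0.5 SD) needs roughly 64 participants per group to reach the 0.80 minimum, and power keeps rising as the sample or the true difference grows.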
Question #1
The level of significance usually set in
nursing studies is at either:
a. .5 or .1
b. .05 or .01
c. .005 or .001
Question #2
Which of the following is TRUE about the level of
significance?
a. ensures that findings will be correct 95% of the
time if an alpha value of less than .05 was used
b. refers to a statistic calculated during computer
analysis
c. represents the risk the researcher is willing to take
in making a type I error and is established before
data is analyzed
Question #3
There is a greater risk of a Type I error with a
0.05 level of significance than with a 0.01
level of significance.
A. True
B. False
Section IV.
•Statistical Significance
•Clinical Significance
•Reliability
•Validity
•Generalizability & Inference
Statistical Significance
• Known as alpha (α)
• The threshold at which statistical significance
is reached.
Cut Off Point
• Referred to as level of significance or alpha (α)
• Point at which the results of statistical analysis
are judged to indicate a statistically significant
difference between groups
• For many nursing studies, level of significance
is 0.05.
• Typically written as α = 0.05
Cutoff Point (cont’d)
• The cutoff point is absolute.
• If the value obtained is only a fraction above
the cutoff point no meaning can be attributed
to the differences between the groups.
Levels of Acceptable Significance
• 0.05
• 0.01
• 0.005
• 0.001
Clinical Significance
• Findings can have statistical significance but
not clinical significance.
• Related to practical importance of the findings
• No common agreement in nursing about how
to judge clinical significance
– Difference sufficiently important to warrant
changing the patient’s care?
Clinical Significance (cont’d)
• Who should judge clinical significance?
– Patients and their families?
– Clinician/researcher?
– Society at large?
• Clinical significance is ultimately a value
judgment.
Simpson & James (2005) Effects of Immediate Vs.
Delayed Pushing During Second-Stage Labor….
Significant differences between groups:
Fetal oxygen desaturation during second stage labor
(immediate: M=12.5; delayed: M=4.6), p = .001
Variable decelerations in fetal heart rate
(immediate: M=22.4; delayed: M=15.6), p = .02
There were no differences in length of labor, method
of birth, Apgar scores, or umbilical cord gases.
Question: A statistically significant finding
means that:
a. Findings are clinically important and valuable.
b. Interventions should be used in clinical practice.
c. Obtained results are not likely to have been due
to chance.
d. Results will be the same if the study is repeated
with another sample.
Question: A researcher reports that the results of
a study were not statistically significant. How is
this to be interpreted?
a. Intervention was not strong enough to make a
difference.
b. Researcher does not have enough evidence to
reject Ho.
c. Researcher’s logic or conceptualization in setting
up the study was faulty.
d. Topic is of no further interest to nurse
researchers or clinicians.
Testing Reliability of Measurement
• Examine reliability of study scales before using
them.
• The degree of consistency with which an
instrument measures a construct.
Reliability Coefficient
• A quantitative index
• Usually ranges from .00 to 1.00
• Provides an estimate of how reliable an
instrument is
• Should be at least 0.70
• Most common one is Cronbach’s alpha
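A minimal sketch of how Cronbach's alpha is computed, using made-up questionnaire data (five respondents, four items). It uses the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores).

```python
import statistics

# Hypothetical responses: each row is one respondent, each column one item.
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]

k = len(responses[0])                                   # number of items
item_vars = [statistics.variance(col) for col in zip(*responses)]
total_var = statistics.variance([sum(row) for row in responses])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))
```

When the items move together across respondents, the variance of the total score is large relative to the item variances and alpha approaches 1. Here it clears the 0.70 benchmark.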
Hollen et al. (1994) Measurement of QOL in
patients with.…Psychometric assessment of the
LCSS.
LCSS has good reliability
• Internal consistency of α = 0.82
• High reproducibility/stability (test-retest
reliability: n=52, r>0.75)
• High repeated inter-rater agreement
/equivalence among experts (95-100%
agreement)
Validity
1. The degree to which inferences made in a
study are accurate = Internal Validity
2. The degree to which results can be
generalized = External Validity
3. The degree to which an instrument measures
what it is intended to measure = Validity
Hollen et al. (1994) Measurement of QOL in patients
with.…Psychometric assessment of the LCSS.
Validity has been established for the LCSS
• Content validity ~ expert panel
• Convergence validity ~ similar QOL tool
• Construct validity ~ unrelated tools
• Criterion-related validity ~ correlation with a
“gold” standard (e.g. Sickness Illness Profile)
Inference
•A conclusion or judgment based on evidence
•Judgments are made based on statistical
results
•Statistical inferences must be made
cautiously and with great care
Generalization
• A generalization is the application of
information that has been acquired from a
specific instance to a general situation.
• Generalizing requires making an inference.
• Both inference and generalization require the
use of inductive reasoning.
Generalization (cont’d)
• An inference is made from a specific case and
extended to a general truth, from a part to a
whole, from the known to the unknown.
• In research, an inference is made from the
study findings to a more general population.
Simpson & James (2005) Effects of Immediate Vs.
Delayed Pushing During Second-Stage Labor….
“Results from this study suggest that delayed
second-stage pushing until the urge to push and
pushing with the open-glottis technique in
nulliparous women with epidural anesthesia is
more favorable for physiologic fetal well-being as
measured by FSpO2 (p. 155).”
“The benefits of less fetal oxygen desaturation
….appear to outweigh any disadvantages of a
longer second stage (p. 155).”
Question: Which of the following questions
relates to generalization?
a. Are the findings generally significant to
people in the study?
b. Can these findings be applied to other groups
or settings?
c. Does the degree of control in the study allow
for statistical significance?
d. How many alternative explanations can be
proposed?
Section V. Common Statistical Tests
• Independent T-Test
• One-Way ANOVA
• Chi-Square
• Correlation
• Regression
Independent T-Test
• To compare means between two groups
• The continuous variable is measured once.
For example:
Research Question
Is there a difference in self-efficacy for pain management in
week 10 between participants with Fibromyalgia (FM) in
guided imagery group and those in standard care group?
Hypotheses
Ho: µGI - µSC = 0
Ha: µGI - µSC ≠ 0
α = 0.05
Independent T-Test (Cont’d)
Tests of assumptions with the sample
• Independent groups (no overlap).
• Dependent variable is continuous (interval or ratio
level).
• Normal distribution.
• Homogeneity of Variance is met.
Group Statistics
Self-efficacy for pain management in week 10
Group                  N     Mean      Std. Deviation   Std. Error Mean
Guided Imagery (GI)    24    64.5833   22.69249         4.63209
Standard Care (SC)     24    49.8333   20.30992         4.14574
Independent T-Test (Cont’d)
Ho: µGI - µSC = 0
Ha: µGI - µSC ≠ 0
α = 0.05
p = 0.011 = 1.1%
t = 2.373
Conclusion:
There is a difference in self-efficacy in week 10
between participants with
Fibromyalgia (FM) in guided
imagery group and those in
standard care group. In our
sample, in week 10,
participants in guided
imagery group had greater
self-efficacy than those in
standard care group.
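The reported t value can be recomputed from the group statistics alone. This sketch uses the pooled-variance (equal variances assumed) form of the independent t-test, consistent with the homogeneity-of-variance assumption listed for this example. The variable names are illustrative.

```python
import math

# Group statistics from the slide: N, mean, standard deviation.
n_gi, mean_gi, sd_gi = 24, 64.5833, 22.69249   # guided imagery
n_sc, mean_sc, sd_sc = 24, 49.8333, 20.30992   # standard care

# Pooled variance, assuming homogeneity of variance.
df = n_gi + n_sc - 2
pooled_var = ((n_gi - 1) * sd_gi**2 + (n_sc - 1) * sd_sc**2) / df

# Standard error of the difference between the two means.
se_diff = math.sqrt(pooled_var * (1 / n_gi + 1 / n_sc))

t = (mean_gi - mean_sc) / se_diff
print(round(t, 3), df)
```

The result agrees with the reported t = 2.373, on 46 degrees of freedom.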
One-Way Analysis of Variance
(ANOVA)
• Tests for differences between means.
• More flexible than other analyses in that it can examine data from
two or more groups.
For example:
Research Question
Is there a difference in depression scores depending on types of
elderly housing and care (independent living, assisted living, and
nursing care)?
Hypotheses
Ho: µIL = µAL = µNC
Ha: At least 2 groups differ
α = 0.05
ANOVA (cont’d)
Tests of assumptions
– Independent groups
– Normal distribution
– Continuous dependent variable
– Homogeneity of Variance is met
Variables                       Independent Living   Assisted Living   Nursing Care    p
                                (n=16)               (n=19)            (n=17)
Depression scores, Mean (SD)    12.25 (7.594)        12.84 (7.274)     16.44 (8.043)   0.234 (> 0.05)
If significant, Post Hoc tests are used to determine the location of
differences.
Conclusion:
There is no statistically significant difference in depression
scores depending on types of elderly housing and care
(independent living, assisted living, and nursing care).
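The F statistic behind this result can be reconstructed from the reported group sizes, means, and SDs. This is a sketch of the standard one-way ANOVA arithmetic, not output from the original analysis. The variable names are illustrative.

```python
# Reported summary statistics: (n, mean, SD) for each housing type.
groups = {
    "Independent Living": (16, 12.25, 7.594),
    "Assisted Living": (19, 12.84, 7.274),
    "Nursing Care": (17, 16.44, 8.043),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-groups: how far each group mean sits from the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Within-groups: spread of scores around their own group mean.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(f_stat, 2), df_between, df_within)
```

The F value comes out at about 1.5 on 2 and 49 degrees of freedom. The variation between the three group means is small relative to the variation within groups, which is why the result is not significant at α = 0.05.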
Chi-Square Test of Independence
• Used with nominal or ordinal data
• Hypothesis:
– Ho: There is no difference in Y depending on X
– Ha: There is a difference in Y depending on X
• Assumptions:
– Frequency data
– Adequate n: expected count of at least 5 per cell (this
may be violated in up to 20% of cells).
Example of Chi-Square Test
Research Question
Is there a difference in depression at week 12
depending on the helplessness category - low or high?
Hypotheses
• Ho: There is no difference in depression at week 12
depending on the helplessness category - low or
high.
• Ha: There is a difference in depression at week 12
depending on the helplessness category - low or high
Crosstabulation: Depression (cat.) at week 12 by AHI
                                    AHI Low    AHI High    Total
Not Depressed    Count              26         14          40
                 Expected Count     22.3       17.7        40.0
                 % within AHI       89.7%      60.9%       76.9%
Depressed        Count              3          9           12
                 Expected Count     6.7        5.3         12.0
                 % within AHI       10.3%      39.1%       23.1%
Total            Count              29         23          52
                 Expected Count     29.0       23.0        52.0
                 % within AHI       100.0%     100.0%      100.0%
χ² = 5.99, df = 1, p = 0.014 or 1.4%
AHI = Arthritis Helplessness Index
Conclusion:
There is a difference in depression at week 12 depending on the helplessness
category - low or high. Those people in the high helplessness group had higher
level of depression compared to those in the low helplessness group.
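The chi-square statistic can be recomputed from the four observed counts (26, 14, 3, 9). This sketch uses the uncorrected Pearson chi-square. The p-value line relies on a closed form that holds only for df = 1, chosen here to keep the example within the standard library.

```python
import math

observed = [[26, 14],   # not depressed: low AHI, high AHI
            [3, 9]]     # depressed:     low AHI, high AHI

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell = row total x column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# For df = 1 only: upper-tail probability of the chi-square distribution.
p = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 2), round(p, 3))
```

The expected counts match the crosstabulation (22.3, 17.7, 6.7, 5.3) and chi-square comes out near 5.99. With one degree of freedom the p-value is about 0.014, below α = 0.05, which agrees with the conclusion that depression differs by helplessness category.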
Pearson Product-Moment Correlation
• Tests for the presence of a relationship between two
variables
– Called bivariate correlation
• Types of correlation are available for all levels of
data. Best results are obtained using interval data.
• Results
– Nature of the relationship (positive or negative)
– Magnitude of the relationship (–1 to +1)
– Strength of r: High= > 0.70; Moderate= 0.30-0.69;
Low= < 0.30
– Testing the significance of a correlation coefficient
– The r² (squared correlation) is the proportion of variance the
two variables share, often expressed as a percentage.
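A minimal sketch of the Pearson calculation on made-up paired data. It is written out by hand so each piece of the formula is visible. The variable names are illustrative.

```python
import math

# Hypothetical paired measurements, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 5, 7]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of cross-products and squared deviations from the means.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)   # sign gives the direction, size the magnitude
r_squared = r ** 2               # proportion of shared variance
print(round(r, 2), round(r_squared, 2))
```

For these data r is positive and above 0.70, so by the cutoffs above it would be described as a high positive correlation.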
Scatterplots and Correlation Coefficients
[Figure: three scatterplot panels]
– Maximum positive correlation (r = 1.0)
– Maximum negative correlation (r = -1.0)
– Strong correlation & outlier (r = 0.71)
Correlation Results
QUESTION
Which one is significant if level of significance
used in this test is 0.01?
A. r = 0.56 (p = 0.03)
B. r = –0.13 (p = 0.2)
C. r = 0.65 (p = 0.002)
D. r = 0.33 (p = 0.04)
Regression Analysis
• Used when one wishes to predict the value of one
variable based on the value of one or more other
variables
• For example:
– one might wish to predict the possibility of passing the
credentialing exam based on grade point average (GPA)
from a graduate program.
– Or to predict the length of stay in a neonatal unit based on
the combined effect of multiple variables such as
gestational age, birth weight, number of complications,
and sucking strength.
Regression Analysis (cont’d)
• Assumptions:
– Must have Independent Variable & Dependent Variable
– Both variables must be continuous
– Normally distributed data
– Linear relationship (scatter plot)
• The outcome of analysis is the multiple correlation coefficient R.
• When R is squared, it indicates the proportion of variance
in the dependent variable that is explained by the equation.
• The R2 is also called the coefficient of multiple
determination.
Regression Results
• R2 = 0.63
• This result indicates that 63% of the variance
in length of stay can be predicted by the
combined effect of age, weight, complications,
and sucking strength.
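How an R2 of this kind is obtained can be sketched with a single predictor. The data below are made up, and the example uses simple (one-variable) least-squares regression rather than the four-predictor model described above, so the numbers are not comparable with the 0.63 result.

```python
# Hypothetical data: gestational age (weeks) and length of stay (days).
gest_age = [28, 30, 32, 34, 36, 38]
stay = [60, 45, 32, 20, 12, 5]

n = len(gest_age)
mean_x = sum(gest_age) / n
mean_y = sum(stay) / n

sxx = sum((x - mean_x) ** 2 for x in gest_age)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gest_age, stay))

# Least-squares best-fit line: stay = intercept + slope * gest_age
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R squared = 1 - (unexplained variation / total variation).
predicted = [intercept + slope * x for x in gest_age]
ss_res = sum((y - p) ** 2 for y, p in zip(stay, predicted))
ss_tot = sum((y - mean_y) ** 2 for y in stay)
r_squared = 1 - ss_res / ss_tot

print(round(slope, 2), round(intercept, 1), round(r_squared, 2))
```

The negative slope says that, in these invented data, each additional week of gestational age predicts a shorter stay. R2 reports how much of the variation in length of stay the line accounts for.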
Overlay of Scatterplot and Best-Fit Line
Conclusion
• The choice of statistical test depends on the
research question.
• Some research questions can be answered with
basic statistical tests, while others
require advanced statistical tests.