Descriptive statistics
Marike van der Leeden, PhD
Amsterdam Rehabilitation Research Center | Reade
Literature
Content
Descriptive statistics:
• Measurement levels of data
• Data description
• Distribution of data
Basic principles of statistics
Reasons for using statistics
• aids in summarizing the results
• helps us recognize underlying trends and tendencies in the data
• aids in communicating the results to others
Statistics ...
• do not compensate for bad study design
• are not a way to determine clinical relevance
Measurement levels of data
Variables
Two kinds of variables:
• Dependent variable(s)
• Independent variable(s)
Example: Is drinking coffee a predictor of developing cardiovascular disease? Here coffee drinking is the independent variable and cardiovascular disease is the dependent variable.
Categorical data (1)
Dichotomous data (2 categories) (e.g. gender)
Nominal data (>2 categories) (e.g. blood group)
• limited number of mutually exclusive categories
• categories are not ordered
Categorical data (2)
Ordinal data (e.g. education)
• limited number of mutually exclusive categories
• categories are ordered; indicates ranking of categories
• distances between categories are not necessarily equal
Numerical data (1) - discrete/continuous data
Discrete data
→ whole numbers (counting) (e.g. number of children)
Continuous data
→ can take any value within a certain range (measuring) (e.g. body weight)
Numerical data (2) - interval/ratio data
Interval data
• scores are quantitative
• indicates amount of differences between scores
• distances between scores are equal
• (e.g. NRS, temperature °C)
Numerical data (3) - interval/ratio data
Ratio data
• scores are quantitative
• indicates amount of differences between scores
• distances between scores are equal
• ratio comparisons are meaningful because the scale has a true zero (e.g. length, weight)
In summary
Variable
• Categorical (qualitative)
  • Nominal: categories
  • Ordinal: ordered categories
• Numerical (quantitative)
  • Continuous: any value (measuring)
  • Discrete: whole numbers (counting)
Data description
Descriptive Statistics
Types of descriptive statistics:
Organize Data
• Tables
• Frequency Distributions
• Graphs
Summarize Data
• Central Tendency
• Variation
Graphs for numerical data: Histograms
Graphs for categorical data: Bar graphs
Graphs for categorical data: Pie charts
Summarizing categorical data
Summarizing numerical data
Central Tendency (or Groups’ “Middle Values”)
• Mean
• Median
• Mode
Variation (or Summary of Differences Within Groups)
• Range
• Interquartile Range
• Variance
• Standard Deviation
Mean
Most commonly called the “average.”
Add up the values for each case and divide by the total number of cases.
\[ \bar{Y} = \frac{Y_1 + Y_2 + \dots + Y_n}{n} = \frac{1}{n} \sum_{i=1}^{n} Y_i \]
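As a small worked illustration, the mean can be computed by hand and checked against Python's standard library. This is only a sketch: the thirteen scores are borrowed from the median example in this lecture, and the variable names are illustrative.

```python
from statistics import mean

# Thirteen scores borrowed from the median example in this lecture.
scores = [89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140]

# Mean by the formula: add up all values, divide by the number of cases.
mean_by_hand = sum(scores) / len(scores)

# The standard library gives the same result.
mean_from_library = mean(scores)

print(round(mean_by_hand, 2))  # 110.54
```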
Median
The middle value when a variable’s values are ranked in order; the
point that divides a distribution into two equal halves.
When data are listed in order, the median is the point at which
50% of the cases are above and 50% below it.
The 50th percentile.
Median
89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140
Median = 109 (six cases above, six below)
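The same example can be reproduced in a few lines of Python. This is a minimal sketch: `median_by_hand` is an illustrative helper, the even-sized rule (average of the two middle values) is the usual convention rather than something stated on the slide, and the outlier value of 400 is hypothetical.

```python
from statistics import mean, median

scores = [89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140]

def median_by_hand(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_hand(scores))  # 109
print(median(scores))          # 109

# Replace the highest score with an extreme (hypothetical) outlier.
with_outlier = scores[:-1] + [400]
print(median(with_outlier))               # 109, unchanged
print(mean(with_outlier) > mean(scores))  # True, the mean is pulled upward
```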
Median
1. The median is largely unaffected by outliers. When data are skewed, this makes it a better measure of central tendency than the mean: it better describes the “typical person”.
2. If the recorded values for a variable form a symmetric distribution, the median and mean are identical.
3. In skewed data, the mean lies further toward the skew than the median.
[Figure: a symmetric and a skewed distribution, each with the mean and median marked]
Mode
1. The most common data point.
2. In symmetric, single-peaked distributions, the mean, median, and mode are the same.
3. In skewed data, the mean and median lie further toward the skew than the mode.
[Figure: a symmetric and a skewed distribution; in the skewed panel the order along the axis is mode, median, mean]
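The ordering of mode, median, and mean in skewed data can be checked on a small data set. The scores below are hypothetical and chosen to have a long tail toward high values (right skew); `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

# Hypothetical right-skewed scores (long tail toward high values).
skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

modes = multimode(skewed)  # list of the most frequent value(s)
print(modes)               # [2]
print(median(skewed))      # 3.0
print(mean(skewed))        # 4.5
```

With a right skew the mean is pulled furthest toward the tail, so here mode < median < mean.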
Statistics: Systolisch (systolic)
N               Valid       904
                Missing      68
Mean                     161.02
Median                   160.00
Mode                        160
Std. Deviation           28.507
Variance                812.672
Range                       195
Percentiles     25       140.00
                50       160.00
                75       180.00
Descriptive Statistics
Summarizing Data:
Central Tendency (or Groups’ “Middle Values”)
✓ Mean
✓ Median
✓ Mode
Variation (or Summary of Differences Within Groups)
• Range
• Interquartile Range
• Variance
• Standard Deviation
Range
The range is the spread, or the distance, between the lowest and highest values of a variable.
The median is a quartile and divides the cases in half.
The interquartile range (IQR) is the distance between the 25th percentile and the 75th percentile; it covers the middle 50% of the cases.
[Figure: a scale from 0 to 1000 split at 250, 500, and 750 into four parts, each containing 25% of the cases]
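A short sketch of both measures, using the thirteen scores from the median example. Quartiles can be computed under several conventions, so other software (including SPSS) may give slightly different values; the `"inclusive"` method of `statistics.quantiles` is simply the choice made here.

```python
from statistics import quantiles

scores = [89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140]

# Range: distance between the lowest and highest value.
data_range = max(scores) - min(scores)

# Quartiles: the 25th, 50th, and 75th percentiles.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

# Interquartile range: distance between the 25th and 75th percentile.
iqr = q3 - q1

print(data_range)  # 51
print(q1, q2, q3)  # 98.0 109.0 119.0
print(iqr)         # 21.0
```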
Boxplot
[Figure: boxplot of TESTVAR (N = 56) marking p0, p25, p50 (the median), p75, and p100; the IQR is the box between p25 and p75]
Variance
A measure of the spread of the recorded values on a variable. A measure of dispersion.
The larger the variance, the further the individual cases are from the mean.
The smaller the variance, the closer the individual scores are to the mean.
Variance
Calculating variance starts with a “deviation”.
A deviation is the distance of a case’s score from the mean: \( Y_i - \bar{Y} \)
Squaring the deviations eliminates the negative signs.
A squared deviation: \( (Y_i - \bar{Y})^2 \)
Variance (\( s^2 \))
• Average of squared distances of individual points from the mean
Variance
If you were to add all the squared deviations together, you’d get what we call the “Sum of Squares.”
\[ SS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = (Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \dots + (Y_n - \bar{Y})^2 \]
Variance
The last step…
The (approximate) average of the squared deviations is the variance.
Variance for a population: \( SS / N \)
Variance for a sample: \( SS / (n - 1) \)
\[ s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1} \]
But: the variance is expressed in squared units, so it is large and difficult to interpret.
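The steps (deviations, squared deviations, sum of squares, division by N or n - 1) can be traced on a small data set. The scores are hypothetical and chosen so that the arithmetic stays simple; the standard-library functions are used only as a cross-check.

```python
from statistics import pvariance, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(values)
y_bar = sum(values) / n            # mean = 5.0

# Step 1: deviation of each score from the mean.
deviations = [y - y_bar for y in values]

# Step 2: square the deviations and add them up (Sum of Squares).
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 3: divide by N (population) or by n - 1 (sample).
population_variance = sum_of_squares / n
sample_variance = sum_of_squares / (n - 1)

print(sum(deviations))            # 0.0, raw deviations cancel out
print(sum_of_squares)             # 32.0
print(population_variance)        # 4.0
print(round(sample_variance, 3))  # 4.571
```

The raw deviations always sum to zero, which is why they are squared before averaging.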
Standard Deviation
To convert variance into something of meaning, create standard deviation.
The standard deviation is the square root of the variance; it indicates roughly how far the observations typically lie from the mean.
• expressed in the original units of measurement
• represents the average amount of dispersion in a sample
\[ sd = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}} \]
Standard Deviation
1. Larger s.d. = greater amounts of variation around the mean.
2. s.d. = 0 only when all values are the same (only when you have a constant and not a “variable”).
3. If you were to “rescale” a variable, the s.d. would change by the same factor (e.g. multiplying every value by 10 multiplies the s.d. by 10).
4. Like the mean, the s.d. will be inflated by an outlier case value.
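These properties can be checked numerically. The sketch below uses hypothetical scores and an illustrative helper, `sample_sd`, that follows the n - 1 formula; the rescaling factor of 100 and the outlier value of 60 are arbitrary choices.

```python
from math import isclose, sqrt
from statistics import stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

def sample_sd(data):
    """Square root of the sample variance (sum of squares divided by n - 1)."""
    n = len(data)
    y_bar = sum(data) / n
    ss = sum((y - y_bar) ** 2 for y in data)
    return sqrt(ss / (n - 1))

sd = sample_sd(values)
print(round(sd, 3))                 # 2.138
print(isclose(sd, stdev(values)))   # True, matches the standard library

# s.d. is 0 when every value is the same.
print(sample_sd([5, 5, 5, 5]))      # 0.0

# Rescaling the variable rescales the s.d. by the same factor.
rescaled = [y * 100 for y in values]
print(isclose(sample_sd(rescaled), 100 * sd))  # True

# A single outlier inflates the s.d.
print(sample_sd(values + [60]) > sd)           # True
```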
Descriptive Statistics
Summarizing Data:
Central Tendency (or Groups’ “Middle Values”)
✓ Mean
✓ Median
✓ Mode
Variation (or Summary of Differences Within Groups)
✓ Range
✓ Interquartile Range
✓ Variance
✓ Standard Deviation
Distribution of data
Normal distribution
“Bell curve” where many cases fall near the middle of the distribution and few fall very high or very low
Normal distribution
• Many characteristics are distributed through the population in a ‘normal’ manner
• Parametric statistics are based on the assumption that the variables are distributed normally
• Parametric statistics are the most commonly used statistics
• skew (skewness)
[Figure: two distributions, one with right (positive) skew and one with left (negative) skew]
• kurtosis [“peakedness”]
[Figure: two distributions, one ‘leptokurtic’ and one ‘platykurtic’]
Normal distribution
In normally distributed data:
Mean ≈ median ≈ mode
Judging normality
Signs that data are not normally distributed:
• Mean ≠ median
• Mean – 2*sd < minimal score
• asymmetric boxplot
A formal check:
• Test for skewness and kurtosis (SPSS)
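The rule-of-thumb checks can be sketched as a small function. Everything here is an assumption made for illustration: the helper name `normality_warnings`, the tolerance for calling the mean and median "different", the cut-off of 1 for skewness, and both data sets. It does not reproduce the SPSS tests; the skewness is the plain moment-based coefficient.

```python
from statistics import mean, median, stdev

def normality_warnings(values, tolerance=0.1, skew_cutoff=1.0):
    """Return rule-of-thumb warnings; the thresholds are illustrative assumptions."""
    m, med, sd = mean(values), median(values), stdev(values)
    n = len(values)

    # Moment-based skewness: third central moment divided by variance ** 1.5.
    m2 = sum((y - m) ** 2 for y in values) / n
    m3 = sum((y - m) ** 3 for y in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0

    warnings = []
    if abs(m - med) > tolerance * sd:
        warnings.append("mean differs from median")
    if m - 2 * sd < min(values):
        warnings.append("mean - 2*sd lies below the minimal score")
    if abs(skewness) > skew_cutoff:
        warnings.append("strong skewness")
    return warnings

symmetric = [0, 4, 5, 5, 6, 6, 6, 7, 7, 8, 12]  # hypothetical, symmetric
skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]        # hypothetical, right-skewed

print(normality_warnings(symmetric))  # []
print(normality_warnings(skewed))
```

The second check works because in normally distributed data roughly 2.5% of the cases lie below mean - 2*sd; if that point falls below the lowest score, the lower tail is missing.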
Which information do you present?
Depends on:
• Type of variable
  • Dichotomous/categorical: frequencies
  • Continuous: summarize data
• Normally distributed data?
  • Yes: mean and sd
  • No: median and IQR
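This decision rule can be written as one small function. It is a sketch under assumptions: the function name `describe`, the labels `"categorical"` and `"continuous"`, and the `normal` flag (which the analyst must set after judging normality) are all illustrative, and the quartile method is one of several conventions.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

def describe(values, kind, normal=False):
    """Pick the summary that fits the variable type (labels are illustrative)."""
    if kind == "categorical":
        # Dichotomous/categorical data: report frequencies.
        return dict(Counter(values))
    if kind != "continuous":
        raise ValueError("kind must be 'categorical' or 'continuous'")
    if normal:
        # Normally distributed: mean and standard deviation.
        return {"mean": mean(values), "sd": stdev(values)}
    # Not normally distributed: median and interquartile range.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "iqr": q3 - q1}

blood_groups = ["A", "O", "A", "B", "O", "A"]  # hypothetical
scores = [89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140]

print(describe(blood_groups, "categorical"))  # {'A': 3, 'O': 2, 'B': 1}
print(describe(scores, "continuous"))         # {'median': 109, 'iqr': 21.0}
```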
So…
Descriptive statistics are used to summarize data from individual respondents, etc.
• They help to make sense of large numbers of individual responses and to communicate the essence of those responses to others
• They focus on typical or average scores, the dispersion of scores over the available responses, and the shape of the response curve