Chapter 11, Gillis & Jackson: Descriptive Statistics (PowerPoint slides)

Chapter Eleven
A Primer for Descriptive Statistics
Descriptive Statistics
• A variety of tools, conventions, and
procedures for describing variables
and relationships between variables
Measurement is the process of
assigning numbers to phenomena
according to a set of rules
Levels of Measurement
Nominal: involves no underlying continuum;
the assignment of numeric values is arbitrary
Examples: religious affiliation, gender, etc.
Levels of Measurement
Ordinal: implies an underlying continuum;
values are ordered but intervals are not
equal.
Examples: Community size, Likert items,
etc.
Levels of Measurement Cont.
Ratio: involves an underlying continuum;
numeric values assigned reflect equal
intervals; zero point aligned with true zero.
Examples: weight, age in years, % minority
Data Distributions
• A listing of all the values for any one variable
• The most basic technique for presenting a large
data set is to create a frequency distribution
table
• A systematic listing of all the values on a
variable from the lowest to the highest, with the
number of times (frequency) each value was observed
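A frequency distribution table of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the raw values are already in a list; the function name `frequency_table` and the Likert responses are hypothetical, not taken from the text.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency) pairs listed from the lowest value to the highest."""
    counts = Counter(values)          # tally how often each value occurs
    return sorted(counts.items())     # order the rows by value, lowest first

# Hypothetical Likert-item responses (1 = strongly disagree, 5 = strongly agree)
responses = [3, 1, 4, 4, 5, 3, 4, 2, 4, 5]
for value, freq in frequency_table(responses):
    print(f"{value}\t{freq}")
```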
Normal Distribution
• A normal distribution roughly follows a
bell-shaped curve
• Bimodal distribution (two peaks, e.g., male and
female body weight combined)
• Platykurtic distribution (flat and wide; a great
deal of variability)
• Leptokurtic distribution (peaked; little
variability)
Measures of Central Tendency
• A single numeric value that summarizes the
data set in terms of its “average” value.
• E.g., a nurse researcher uses the value of
98.6 °F (37 °C) to describe the average adult
body temperature
Measures of Central Tendency
Mean: calculated by summing the values and
dividing by the number of cases.
Median: calculated by ordering a set of
values and then taking the middle
value (when there are two middle values,
calculate the mean of the two values).
Mode: the most frequently occurring value.
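The three measures can be checked with Python's standard `statistics` module. A minimal sketch: the `central_tendency` helper and the ages are hypothetical; `multimode` (Python 3.8+) is used because a set of values can have more than one mode.

```python
from statistics import mean, median, multimode

def central_tendency(values):
    """Return the mean, median, and mode(s) of a list of numbers."""
    return {
        "mean": mean(values),        # sum of values divided by number of cases
        "median": median(values),    # averages the two middle values when N is even
        "mode": multimode(values),   # list of the most frequent value(s)
    }

# Hypothetical ages, for illustration only
ages = [23, 31, 31, 40, 45, 52]
print(central_tendency(ages))
```

With six cases the median falls between 31 and 40, so it is their mean, 35.5.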
Measures of Dispersion
Range: calculated by subtracting the lowest value
from the highest value in a set of values.
Standard Deviation: a measure reflecting the
average amount by which values deviate from
the mean in a set of values.
$$ sd = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} $$
Dispersion Cont.
Variance: this measure is simply the
standard deviation squared.
$$ \text{Variance} = sd^2 = \frac{\sum (X - \bar{X})^2}{N - 1} $$
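The range, standard deviation, and variance can be computed directly from their definitions. This sketch assumes a sample, so it divides the sum of squared deviations by N - 1 as in the slide's formula; the `dispersion` helper and the scores are hypothetical.

```python
from math import sqrt

def dispersion(values):
    """Range, sample standard deviation, and variance (N - 1 denominator)."""
    n = len(values)
    x_bar = sum(values) / n
    ss = sum((x - x_bar) ** 2 for x in values)   # sum of squared deviations from the mean
    variance = ss / (n - 1)
    return {
        "range": max(values) - min(values),
        "sd": sqrt(variance),                     # sd is the square root of the variance
        "variance": variance,
    }

# Hypothetical scores, for illustration only
scores = [2, 4, 4, 4, 5, 5, 7, 9]
result = dispersion(scores)
print(result)
```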
Standardizing Data
• To standardize data is to report
data in a way that comparisons
between units of different size may
be made
Standardizing Data
Proportions: represent the part of a whole (expressed as
a fraction of 1) that some element makes up. A so-called
batting average is actually a proportion because it represents:
$$ BA = \frac{\text{Number of Hits}}{\text{Number at Bats}} $$
Percentage: a proportion may be converted
to a percentage by multiplying by 100.
If a player's batting “average” is .359, we could
convert it to a percentage by multiplying by
100. In this case, the player gets a hit
35.9% of the time.
In short, a percentage represents how often
something happens per 100 times.
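As a small sketch of the batting-average arithmetic: the helper names and the counts of 61 hits in 170 at bats are hypothetical, chosen only so that the proportion rounds to the .359 used above.

```python
def proportion(part, whole):
    """Share of the whole that the part represents (a value between 0 and 1)."""
    return part / whole

def to_percentage(p):
    """Convert a proportion to a percentage (how often per 100 times)."""
    return p * 100

# Hypothetical: 61 hits in 170 at bats
ba = proportion(61, 170)
print(round(ba, 3), round(to_percentage(ba), 1))
```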
Percentage Change: a measure of how much
something has changed over a given time
period. Percentage change is:
$$ \text{Percentage Change} = \frac{\text{Time 2} - \text{Time 1}}{\text{Time 1}} \times 100 $$
Thus, if there were 25 nurses now compared
to 17 five years earlier, the percentage change
over the 5-year period would be:
$$ \frac{25 - 17}{17} \times 100 = 47.1\% $$
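The same percentage-change calculation as a one-line function; a minimal sketch in which the name `percentage_change` is hypothetical, reproducing the nurse example (17 nurses five years ago, 25 now).

```python
def percentage_change(time1, time2):
    """Percentage change from an earlier value (time1) to a later one (time2)."""
    return (time2 - time1) / time1 * 100

print(round(percentage_change(17, 25), 1))  # 47.1
```

Note the order of arguments: the earlier value is the denominator, so swapping them would give a different answer (-32.0).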
Rates: represent the frequency of something
per standard-sized unit of population. Divorce rates,
suicide rates, and crime rates are examples. So if
we had 104 suicides in a population of
757,465, the suicide rate per 100,000 would
be calculated as follows:
$$ SR = \frac{104}{757{,}465} \times 100{,}000 = 13.73 $$
I.e., there are 13.73 suicides per 100,000 population.
Ratio: represents a comparison of one thing
to another. So if the suicide rate is 200 per 100,000
in the U.S. and 57 per 100,000 in Canada, the
U.S./Canadian suicide ratio is:
$$ \frac{\text{US Suicide Rate}}{\text{Canadian Suicide Rate}} = \frac{200}{57} = 3.51 $$
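A sketch of the rate and ratio calculations, assuming a base of 100,000 for the rate and treating both figures in the ratio as rates per 100,000. The function names are hypothetical.

```python
def rate_per(count, population, base=100_000):
    """Frequency of an event per `base` members of the population."""
    return count / population * base

def rate_ratio(rate_a, rate_b):
    """How many times larger rate_a is than rate_b."""
    return rate_a / rate_b

print(round(rate_per(104, 757_465), 2))  # 13.73 suicides per 100,000
print(round(rate_ratio(200, 57), 2))     # 3.51
```

Keeping `base` as a parameter makes the standard unit explicit; multiplying by 100 instead of 100,000 would silently give a rate per 100.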
Normal Distribution
Much data in the social and physical world
is “normally distributed”. If so, there will
be a few low values, many more clustered
toward the middle, and a few high values.
Normal distributions have:
• a symmetrical, bell-shaped curve
• a mean, mode, and median that are equal (or nearly so in real data)
• about 2/3 (68.3%) of cases within ± 1 standard deviation of the mean
• about 95.4% of cases within ± 2 standard deviations of the mean
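The shares of cases within 1 and 2 standard deviations can be checked with `statistics.NormalDist` from the Python standard library (3.8+). A minimal sketch; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within +/- k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(round(share_within(1), 3))  # 0.683
print(round(share_within(2), 3))  # 0.954
```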
Normal Distribution Cont.
Z Scores
A Z score represents the distance of any value
from the mean of its distribution, in standard
deviation units. The Z score formula is as follows:
$$ Z = \frac{X - \bar{X}}{sd} $$
Exercise:
Suppose:
Income Mean = $72,000; SD = $18,000
Education Mean = 11 years; SD = 4 years

Subject      Income      Education
Case 1       80,000         14
Case 2       70,000         10
Case 3       91,000         19
Case 4       56,000          8
Calculation Case 1:
$$ Z_{\text{income}} = \frac{80{,}000 - 72{,}000}{18{,}000} = .44 $$
$$ Z_{\text{education}} = \frac{14 - 11}{4} = .75 $$
SES score Case 1 = .44 + .75 = 1.19
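The same Z-score arithmetic can be run for all four cases. This sketch follows the slide's approach of summing the income and education Z scores into an SES score; the names `z_score` and `ses_score` are hypothetical.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

INCOME_MEAN, INCOME_SD = 72_000, 18_000
EDUC_MEAN, EDUC_SD = 11, 4

cases = {
    "Case 1": (80_000, 14),
    "Case 2": (70_000, 10),
    "Case 3": (91_000, 19),
    "Case 4": (56_000, 8),
}

def ses_score(income, education):
    """Sum of the standardized income and education scores."""
    return z_score(income, INCOME_MEAN, INCOME_SD) + z_score(education, EDUC_MEAN, EDUC_SD)

for name, (income, education) in cases.items():
    print(name, round(ses_score(income, education), 2))
```

Standardizing first is what makes the sum meaningful: dollars and years are on different scales, but Z scores are both in standard deviation units.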
Areas Under the Normal Curve
• draw normal curve, include lines to
represent problem
• calculate Z score(s) for problem
• look up value in Table 11.14
• solve the problem, recalling that half (.5) of cases
fall above the mean and half below
• convert proportion to percentage, if needed
Exercise:
Suppose you wished to know what percentage of cases will fall above
$100,000 in a sample whose mean is $65,000 and whose SD is $22,000
(see p. 370 of the text).
$$ Z = \frac{100{,}000 - 65{,}000}{22{,}000} = 1.59 $$
Look up 1.59 in Table 11.14, p. 368: .4441
.5000 - .4441 = .0559 (proportion)
.0559 x 100 = 5.6% (percentage)
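Instead of a printed table, the area above a value can be computed from the normal cumulative distribution function. A minimal sketch using `statistics.NormalDist`; `share_above` is a hypothetical name. Because the code does not round Z to 1.59, it gives about .0558 rather than the table's .0559; both round to 5.6%.

```python
from statistics import NormalDist

def share_above(value, mean, sd):
    """Proportion of a normal distribution lying above `value`."""
    return 1 - NormalDist(mu=mean, sigma=sd).cdf(value)

p = share_above(100_000, 65_000, 22_000)
print(f"{p:.4f} ({p * 100:.1f}%)")
```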
Describing Relationships Between Variables
1. Crosstabular Analysis:
• used with a nominal dependent variable
• we cross-classify the information to
show the relation between an independent
and a dependent variable; a standard table
looks like the following:
Table 11.11 Plans to Attend University by Size of Home Community
========================================================================
University                     Town up to      Town over
Plans?           Rural           5,000           5,000           TOTAL
               N       %       N       %       N       %       N       %
------------------------------------------------------------------------
Plans         69    52.3      44    48.9     102    73.9     215    59.7
No Plans      63    47.7      46    51.1      36    26.1     145    40.3
             ___   _____     ___   _____     ___   _____     ___   _____
TOTAL        132   100.0      90   100.0     138   100.0     360   100.0
------------------------------------------------------------------------
If appropriate, test of significance values entered here.
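Running percentages toward the independent variable means each community-size column is percentaged on its own total. A sketch that reproduces the column percentages of Table 11.11 from its counts; the dictionary layout and the name `column_percentages` are hypothetical.

```python
# Counts from Table 11.11: columns are categories of the independent variable
counts = {
    "Rural": {"Plans": 69, "No Plans": 63},
    "Town up to 5,000": {"Plans": 44, "No Plans": 46},
    "Town over 5,000": {"Plans": 102, "No Plans": 36},
}

def column_percentages(table):
    """Percentage each row makes up of its column total, to one decimal place."""
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: round(n / total * 100, 1) for row, n in cells.items()}
    return result

for column, pcts in column_percentages(counts).items():
    print(column, pcts)
```

Comparing the "Plans" percentages across columns (52.3, 48.9, 73.9) is the interpretation step the rules below describe.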
Rules for Crosstabular Tables:
• in the table title, name the dependent variable first
• place the dependent variable on the vertical axis
• place the independent variable on the horizontal axis
• use clear variable labels
• run % figures toward the independent variable (each
category of the independent variable totals 100%)
• report % to one decimal place
• report statistical data below the table
• interpret by comparing % across categories
of the independent variable
2. Comparing Means
• used when the dependent variable is ratio level
• means on the dependent variable are compared
across categories of the independent variable
• both the t-test and ANOVA may be used
Presentation may be as follows:
Mean Heart Rate by Treatment Group
-----------------------------------------------------------
Treatment Group        Mean Heart Rate     Number of Cases
-----------------------------------------------------------
Touch Therapy                74.6                 78
Routine Treatment            77.1                 77
COMBINED MEAN                75.8                155
-----------------------------------------------------------
If appropriate, test of significance values entered here.
For example: F = 3.514, df = 1, 153, p > .05
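The combined mean is not the simple average of the two group means; each group mean is weighted by its number of cases. A minimal sketch using the heart-rate figures; the function name `combined_mean` is hypothetical.

```python
def combined_mean(groups):
    """Weighted mean of group means; `groups` is a list of (mean, n) pairs."""
    total_n = sum(n for _, n in groups)
    return sum(m * n for m, n in groups) / total_n

print(round(combined_mean([(74.6, 78), (77.1, 77)]), 1))  # 75.8
```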
t Test
• The t-test is used to determine whether
the difference between the means of two
groups is statistically significant:
• especially with samples under 30
• when comparing 2 groups on a ratio-level
dependent variable
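The independent-samples t statistic can be computed by hand. This sketch assumes equal population variances (the pooled-variance form, df = n1 + n2 - 2); the function name `pooled_t` and the heart-rate values are hypothetical. For a p-value, `scipy.stats.ttest_ind` performs the same test if SciPy is available.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the mean difference
    return (mean(group_a) - mean(group_b)) / se, df

# Hypothetical heart rates for two small groups, for illustration only
touch = [72, 75, 74, 77]
routine = [78, 76, 79, 75]
print(pooled_t(touch, routine))
```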
Analysis of Variance (ANOVA)
• ANOVA is used when the means of 3 or more groups are
compared, or
• when the means for 2 or more groups are compared
at 2 or more points in time in a single analysis (e.g.,
a pre-post experimental design)
• computes a ratio that compares 2 kinds of
variability: within-group and between-group
variability
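The ratio ANOVA computes is the F ratio: between-group variability divided by within-group variability, each expressed as a mean square. A minimal one-way sketch; the function name `one_way_anova_f` is hypothetical, and `scipy.stats.f_oneway` offers the same test with a p-value if SciPy is available.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """One-way ANOVA F ratio; returns (F, df_between, df_within)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group variability: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: how far each case sits from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

The degrees of freedom reported with F are k - 1 (number of groups minus one) and N - k (number of cases minus number of groups).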
3. Correlation
• used with ratio-level variables
• interest lies in both the regression equation and the
strength of the correlation
• Y = a + bX is the general equation, where a is the
intercept and b is the slope
• r is the symbol used to report the strength of
the correlation; it can vary from -1.0 to +1.0
Sample Data Set
 X    Y
 2    3
 3    4
 5    4
 7    6
 8    8
[Figure: four scatterplots of the sample data (Y against X, both axes 0 to 8): the data points alone; the data with the regression line; the regression line showing where the a value (intercept) is read and the b value (slope) read as h/b; and a predicted value of Y read from the line.]
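The a and b values read from the graphs can also be computed by least squares, along with r. A sketch on the five-case sample data set; the function name `least_squares` is hypothetical, and X = 6 is an arbitrary illustrative input for the predicted value.

```python
from math import sqrt

X = [2, 3, 5, 7, 8]
Y = [3, 4, 4, 6, 8]

def least_squares(xs, ys):
    """Intercept a, slope b, and Pearson r for the line Y = a + bX."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # co-variation of X and Y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx                  # slope: change in Y per one-unit change in X
    a = y_bar - b * x_bar          # intercept: the line passes through the two means
    r = sxy / sqrt(sxx * syy)      # strength of the correlation, between -1 and +1
    return a, b, r

a, b, r = least_squares(X, Y)
print(f"Y = {a:.2f} + {b:.2f}X, r = {r:.2f}")  # Y = 1.35 + 0.73X, r = 0.93
print(f"Predicted Y at X = 6: {a + b * 6:.2f}")
```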