STATISTICS - PART I
I. TYPES OF SCALES USED
A. Nominal
- for qualitative variables, or categories
- no magnitude on a nominal scale
B. Ordinal
- rank-ordering
- values have magnitude, but no real numeric value is attached to each rank
- no equal interval is assumed between units/values
C. Interval
- numerical scales on which the distance between adjacent units is equal
-> numeric labels therefore are meaningful as numbers
- can perform simple math functions
- BUT no absolute zero point
D. Ratio
- numerical scales on which the distance between adjacent units is equal
- have absolute zero points
-> can compute ratios with numbers on ratio scales
Nominal:
Identify which intelligence test (e.g., WAIS-III or Stanford-Binet) is more commonly used in each country of the world.
- categorizing
Ordinal:
“My order of preferences to administer are 1) WISC-III; 2) WAIS-III; 3) S-Binet”
- each category stands in an ordered relationship to the others
- but the intervals between the three categories are unknown
Interval:
Measured the IQ of various people with a WAIS-III
e.g., IQ = 95, 123, 118, etc.
- equal intervals
- no absolute zero
Ratio:
For IQ tests, ratio scales are relevant primarily to performance speed (e.g., time taken to complete a task)
- equal intervals
- absolute zero
II. ORGANIZING DATA
** a test score, by itself, is meaningless
- the meaning of a test score depends on its position in a distribution of test scores
A. Frequency Distributions
- can group together scores to better interpret the data
EX
Employee   Age
1          22
2          35
3          30
4          30
5          45
6          30
7          60
8          33
9          62
10         25
11         35
12         53
13         47
14         21
15         38
16         40
17         30
18         26
19         30
20         48

Interval   Tally     f (frequency)
56-62      ||        2
49-55      |         1
42-48      |||       3
35-41      ||||      4
28-34      ||||||    6
21-27      ||||      4
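
The grouping step in the table above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the names `LOWEST`, `WIDTH`, and `class_interval` are made up here, and the interval width of 7 is read off the table (21-27, 28-34, and so on).

```python
from collections import Counter

# Ages of the 20 employees from the example above.
ages = [22, 35, 30, 30, 45, 30, 60, 33, 62, 25,
        35, 53, 47, 21, 38, 40, 30, 26, 30, 48]

LOWEST = 21   # lower limit of the lowest class interval (assumed from the table)
WIDTH = 7     # each interval spans 7 ages, e.g. 21-27


def class_interval(age):
    """Return the (lower, upper) limits of the interval that contains age."""
    lower = LOWEST + ((age - LOWEST) // WIDTH) * WIDTH
    return (lower, lower + WIDTH - 1)


# Count how many ages fall in each interval.
frequencies = Counter(class_interval(age) for age in ages)

# Print from the highest interval down, in the same order as the table.
for (lower, upper), f in sorted(frequencies.items(), reverse=True):
    print(f"{lower}-{upper}  {'|' * f:<6}  {f}")
```

Changing `WIDTH` regroups the same ages into wider or narrower intervals, which is a quick way to see how the choice of interval affects the apparent shape of the distribution.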
B. Graphing frequency distributions
X-axis = horizontal, e.g., the score values (or class intervals)
Y-axis = vertical, e.g., the frequency of each score
C. Shapes of the distributions
1. Symmetrical vs. skewed
Symmetrical - two halves of curve mirror each other
- scores are evenly distributed
Skewed - two halves are different
- scores lie more in one half than in the other
2. Positive vs. negative skew
Positive skew = most scores fall in lower half of distribution
- outliers pull the distribution to the right/up
Negative skew = most scores fall in upper half of distribution
- outliers pull the distribution to the left/down
3. Importance of the distribution’s shape
- if distribution is symmetrical, approximates “normal” curve
- if a skew is present, can give information about the test itself
a. Negative skew: “ceiling effect”
- measure is unable to make fine discriminations at high end
- solution in test construction = add more difficult items
b. Positive skew: “floor effect”
- measure is unable to make fine discriminations at low end
- solution in test construction = add more easy items
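
One way to check the direction of skew numerically is the moment coefficient of skewness (the average cubed z-score), which is negative for a negative skew and positive for a positive skew. The sketch below is only an illustration: the two score lists are hypothetical data for a 10-item test, and the `skewness` helper is written here rather than taken from the notes.

```python
from statistics import mean, median, pstdev


def skewness(scores):
    """Moment coefficient of skewness: the mean of the cubed z-scores."""
    m = mean(scores)
    sd = pstdev(scores)
    return sum(((x - m) / sd) ** 3 for x in scores) / len(scores)


# Hypothetical scores on a 10-item test (illustrative only).
easy_test = [10, 10, 9, 10, 9, 8, 10, 9, 5, 10]   # scores pile up at the ceiling
hard_test = [0, 1, 0, 2, 1, 0, 1, 0, 6, 0]        # scores pile up at the floor

for name, scores in [("easy test", easy_test), ("hard test", hard_test)]:
    print(name,
          "| skewness:", round(skewness(scores), 2),
          "| mean:", mean(scores),
          "| median:", median(scores))
```

In the easy test the few low scores pull the mean below the median (negative skew, ceiling effect); in the hard test the single high score pulls the mean above the median (positive skew, floor effect).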
III. MEASURES OF CENTRAL TENDENCY
A. Mean = the numerical average of all scores in the distribution
Advantages:
1. Every score contributes to the mean
2. Generally, the mean is the least subject to sampling variation
- thus, the mean is a relatively stable value for making inferences from a sample to the population
Disadvantages:
1. A change in ANY score changes the mean, unlike median & mode
EX
1,2,3,4,5    mean = 3.0
1,2,3,4,6    mean = 3.2
2. Sensitive to extreme scores
EX
1,2,3,4,5    mean = 3.0
1,2,3,4,20   mean = 6.0
-7,1,2,3,4   mean = 0.6
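
The examples above can be reproduced with a short Python sketch. The helper name `arithmetic_mean` is chosen here for illustration; it simply divides the sum of the scores by the number of scores.

```python
def arithmetic_mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


baseline = [1, 2, 3, 4, 5]
small_change = [1, 2, 3, 4, 6]     # one score raised by 1
extreme_high = [1, 2, 3, 4, 20]    # one extreme high score
extreme_low = [-7, 1, 2, 3, 4]     # one extreme low score

for scores in (baseline, small_change, extreme_high, extreme_low):
    print(scores, "mean =", arithmetic_mean(scores))
```

Because every score enters the sum, changing any single value moves the result, and one extreme value can move it a long way.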
B. Median = midpoint of the scores
- 50% of scores fall above it and 50% fall below it
For individual scores:
1. Arrange the scores in rank order
2. If the number of scores is odd, the central score is the median
3. If the number of scores is even, the median is the average of the two central scores
EX
1,2,3,4,5
median = 3
1,2,3,4,5,6
median = 3.5
1,2,2,2,3,4,5
median = 2
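
The three steps for individual scores translate directly into code. The following is a minimal sketch; the function name `median_of` is an assumption made for this example (Python's standard library offers `statistics.median` for the same purpose).

```python
def median_of(scores):
    """Median of individual scores, following the three steps above."""
    if not scores:
        raise ValueError("median_of() needs at least one score")
    ordered = sorted(scores)          # step 1: arrange in rank order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step 2: odd count, take the central score
        return ordered[mid]
    # step 3: even count, average the two central scores
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([1, 2, 3, 4, 5]))
print(median_of([1, 2, 3, 4, 5, 6]))
print(median_of([1, 2, 2, 2, 3, 4, 5]))
```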
Advantage:
1. A change in ANY score does not necessarily change the median
- less sensitive than the mean
EX
1,2,3,4,5
Mean = 3.0
Median = 3
1,2,3,4,6
Mean = 3.2
Median = 3
-> For strongly skewed distributions, median is a better indicator of central tendency than the mean
Disadvantage:
1. The median is more subject to sampling variation than the mean
- for repeated samples from a population, the median will vary more than will the mean (but less than will the mode)
- Thus, the median is not as stable and is less useful for making inferences from the sample back to the population
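
The claim about sampling variation can be checked with a small simulation. Everything in this sketch is an assumption made for illustration: the population is 10,000 simulated, roughly normal scores (mean 100, SD 15), and 2,000 random samples of 25 scores are drawn from it.

```python
import random
from statistics import mean, median, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of normally distributed scores.
population = [random.gauss(100, 15) for _ in range(10_000)]

sample_means = []
sample_medians = []
for _ in range(2_000):
    sample = random.sample(population, 25)   # one random sample of 25 scores
    sample_means.append(mean(sample))
    sample_medians.append(median(sample))

# How much each statistic varies from sample to sample.
sd_of_means = pstdev(sample_means)
sd_of_medians = pstdev(sample_medians)
print("SD of sample means:  ", round(sd_of_means, 2))
print("SD of sample medians:", round(sd_of_medians, 2))
```

For a roughly normal population like this one, the sample medians should spread out more than the sample means, which is what "more subject to sampling variation" means in practice.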
C. Mode = the most frequent score
EX
1,2,2,2,3,4,5
Mode = 2
EX
35, 37, 48, 52, 52
Mode = 52
EX
1,2,3,4,5
No mode
EX
1,2,2,3,3,4
Mode = 2, 3 (bimodal)
- distributions can be unimodal, bimodal, trimodal, etc.
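
A short sketch of how the mode(s) in the examples above could be found. The function name `modes` and the "no mode" rule are assumptions that follow the convention used in these notes, where a distribution in which every score occurs equally often has no mode. Python's `statistics.multimode` would instead return every value in that case.

```python
from collections import Counter


def modes(scores):
    """Return the most frequent score(s), or [] if every score ties."""
    if not scores:
        return []
    counts = Counter(scores)
    highest = max(counts.values())
    # Several distinct scores, all equally frequent: treated as "no mode".
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(s for s, c in counts.items() if c == highest)


print(modes([1, 2, 2, 2, 3, 4, 5]))
print(modes([35, 37, 48, 52, 52]))
print(modes([1, 2, 3, 4, 5]))
print(modes([1, 2, 2, 3, 3, 4]))
```

The length of the returned list tells you whether the distribution is unimodal (one value), bimodal (two values), and so on.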
Disadvantage:
- not very useful because it is not stable across samples, so it is used much less often
General
- For normal distributions the mean, median, and mode will be identical
- For skewed distributions, the mean and the median will be different
- Unless outliers are present (skew), the mean is the best measure because many statistics are designed with the mean as the measure of central tendency
- If there is a strong skew, the median is preferred
IV. MEASURES OF VARIABILITY
Variability = amount of dispersal/variety in scores
- how much scores deviate from mean
A. Range = the difference between the highest and lowest scores
EX
1, 2, 3, 4, 5
Range = 5 - 1 = 4
Advantage:
- easy measure of how spread out the scores are
Disadvantage:
- only measures the 2 most extreme scores
- no indication of how much most scores deviate
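
The disadvantage above can be illustrated with two made-up distributions that share the same range but differ in how far their scores typically sit from the mean. The data and the helper names `score_range` and `mean_distance` are hypothetical; `mean_distance` is just the average absolute distance of the scores from their mean, used here only as a simple yardstick.

```python
def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)


def mean_distance(scores):
    """Average absolute distance of the scores from their mean."""
    m = sum(scores) / len(scores)
    return sum(abs(x - m) for x in scores) / len(scores)


clustered = [1, 3, 3, 3, 5]   # most scores sit at the mean
spread = [1, 1, 3, 5, 5]      # most scores sit at the extremes

for name, scores in [("clustered", clustered), ("spread", spread)]:
    print(name, "range =", score_range(scores),
          "mean distance =", mean_distance(scores))
```

Both lists have a range of 4, yet the second is clearly more dispersed, which the range alone cannot show.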
B. Variance = the average of the squared deviations about the mean
- reflects how far, on average, the scores in the distribution lie from the mean (in squared units)
- cannot use the simple average of deviations about the mean (they sum to 0)
- so, square those deviations, and take that average
- a distribution that is spread out will have a high average squared deviation (variance)
- a distribution that is tightly packed around the mean will have a low variance
Variance is useful, BUT it has 1 weakness:
- deviations are squared
-> variance is given in a scale/units different from the underlying distribution being described
EX
distribution = 10, 20, 30, 40, 50
variance = 250.00 (sum of squared deviations = 1000, divided by n - 1 = 4)
C. Standard Deviation
- square root of the variance
- same information about variability
- but in same units as scores themselves
EX
distribution = 10, 20, 30, 40, 50
variance = 250.00
SD = 15.81
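
The worked example can be traced step by step in Python. One point worth flagging: the figures 250.00 and 15.81 correspond to dividing the sum of squared deviations by n - 1 (the sample variance); dividing by n, the literal "average" of the squared deviations, gives 200 instead. The variable names below are chosen for this sketch.

```python
from math import sqrt

scores = [10, 20, 30, 40, 50]
n = len(scores)
mean = sum(scores) / n

# Deviations about the mean cancel out, which is why they are squared.
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)
sum_of_squares = sum(d ** 2 for d in deviations)

population_variance = sum_of_squares / n        # divide by n
sample_variance = sum_of_squares / (n - 1)      # divide by n - 1
sample_sd = sqrt(sample_variance)               # back in the units of the scores

print("sum of deviations:  ", sum_of_deviations)
print("population variance:", population_variance)
print("sample variance:    ", sample_variance)
print("sample SD:          ", round(sample_sd, 2))
```

The standard library's `statistics.variance` and `statistics.stdev` use the n - 1 form, while `statistics.pvariance` and `statistics.pstdev` use the n form.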
Properties of the Standard Deviation
1. Gives a measure of variability of the scores with respect to the mean.
2. Like the mean, SD is sensitive to each score in the distribution.
3. Also like the mean, the SD is stable with respect to sampling fluctuation.
4. The closer the scores are to the mean, the smaller the SD
- The farther many scores are from the mean, the larger the SD
General:
- goal of assessment = describe individuals
- data from assessment are most useful when there is some variability in the construct being measured
-> if everyone scored the same on a test (no variability), the test would tell us nothing about individuals