Chapter 10
Describing Data Distributions
McGraw-Hill/Irwin
© 2004 by The McGraw-Hill Companies, Inc. All rights reserved.
Categorical Data Print Output
Frequency Table
Occupational Status:
Category            Code   Freq.    Pct.    Adj.     Cum.
Professional          1      37    13.8    14.1     14.1
Mgr., Executive       2      62    23.1    23.6     37.6
Admin., Clerical      3      69    25.7    26.2     63.9
Engr., Technical      4      16     6.0     6.1     70.0
Sales, Marketing      5      30    11.2    11.4     81.4
Craft, Trade          6      22     8.2     8.4     89.7
Semi-Skilled          7      27    10.1    10.3    100.0
Missing Data          0       5     1.9    Missing
Total                       268   100.0   100.0
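Modal and Median Category: Admin., Clerical (code 3), which has the highest frequency (69) and contains the 50% point of the cumulative percentages.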
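The percentage columns of the frequency table can be recomputed from the raw counts. The following is a minimal sketch, assuming that "Pct." is based on all 268 cases, that "Adj." and "Cum." are based only on the 263 non-missing cases, and that code 0 marks missing data. The function name `frequency_table` is hypothetical and is not taken from the source.

```python
def frequency_table(freqs, missing_code=0):
    """Return rows of (code, freq, pct, adj_pct, cum_pct).

    pct uses all cases; adj_pct and cum_pct use only non-missing cases
    (an assumption about how the printout was produced).
    """
    total = sum(freqs.values())
    valid = total - freqs.get(missing_code, 0)
    rows, running = [], 0
    for code, n in freqs.items():
        pct = 100 * n / total
        if code == missing_code:
            # Missing data get a raw percentage only.
            rows.append((code, n, pct, None, None))
        else:
            running += n
            rows.append((code, n, pct, 100 * n / valid, 100 * running / valid))
    return rows


# Counts from the Occupational Status table; code 0 is missing data.
occupation = {1: 37, 2: 62, 3: 69, 4: 16, 5: 30, 6: 22, 7: 27, 0: 5}
rows = frequency_table(occupation)

valid_rows = [r for r in rows if r[3] is not None]
# Modal category: the valid code with the highest frequency.
modal_code = max(valid_rows, key=lambda r: r[1])[0]
# Median category: the first code whose cumulative percentage reaches 50.
median_code = next(r[0] for r in valid_rows if r[4] >= 50)

for code, n, pct, adj, cum in rows:
    adj_text = "Missing" if adj is None else f"{adj:.1f}"
    cum_text = "" if cum is None else f"{cum:.1f}"
    print(f"{code:>4} {n:>5} {pct:>6.1f} {adj_text:>8} {cum_text:>6}")
print("modal code:", modal_code, "median code:", median_code)
```

Accumulating the running count before dividing avoids the small drift that comes from summing already rounded percentages.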
10-2
Frequency and Percentage Distributions Report Format
Age         Number   Percent
Over 50         94      22.7
36 to 50       188      45.4
18 to 35       132      31.9
10-3
Bar Chart With Frequency Labels
[Horizontal bar chart of Number by age group, axis from 0 to 200: Over 50 = 94; 36 to 50 = 188; 18 to 35 = 132]
10-4
Vertical Bar Chart With Percentage Labels
[Vertical bar chart of percentage by age group, axis from 0% to 60%: Over 50 = 22.7%; 36 to 50 = 45.4%; 18 to 35 = 31.9%]
10-5
Pie Chart With Percentage Labels
[Pie chart by age group: 18 to 35 = 31.9%; Over 50 = 22.7%; 36 to 50 = 45.4%]
10-6
Descriptive Statistical Tools
Scale              Average              Spread                                 Shape
Nominal            Mode
Ordinal            Mode, Median         Interquartile Range, Data Range,
                                        Minimum, Maximum
Interval & Ratio   Mean, Mode, Median   Standard Deviation, Interquartile      Skewness, Kurtosis
                                        Range, Data Range, Maximum & Minimum
10-7
Choosing an Average
• Mean
  • The sum of the values divided by the number of values
  • Inappropriate for highly skewed distributions
  • Overly sensitive to extreme values
• Median
  • Middle value when the data are arrayed from low to high
  • Unaffected by asymmetry or extreme values
• Mode
  • Peak of a continuous distribution
  • Category with the highest frequency
  • Only legitimate average for nominal data
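The contrast between the three averages can be illustrated with Python's standard `statistics` module. This is a sketch only: the two numeric lists and the list of colors are hypothetical data invented for the illustration, not values from the chapter.

```python
from statistics import mean, median, mode

# Hypothetical data: the second list replaces one value with an extreme one.
typical = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 50]

# The mean is pulled toward the extreme value; the median and mode are not.
print(mean(typical), median(typical), mode(typical))
print(mean(with_outlier), median(with_outlier), mode(with_outlier))

# For nominal data only the mode is defined: categories cannot be summed
# or meaningfully ordered.
colors = ["red", "blue", "red", "green"]  # hypothetical categories
print(mode(colors))
```

Replacing a single value moves the mean from 3.4 to 12.4, while the median and mode both stay at 3.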
10-8
Measures of Central Tendency
[Figure: distribution curve with the mode, mean, and median marked]
10-9
Spread and Standard Deviation
• Standard Deviation
  • Root mean squared deviation from the mean
  • Special properties that make it very useful
• Normal Distributions
  • About 68% of data are within ± 1 S.D. of the mean
  • About 95% of data are within ± 2 S.D. of the mean
  • About 99.7% (often rounded to 99%) of data are within ± 3 S.D. of the mean
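Both points can be checked numerically. The sketch below computes the standard deviation literally as a root mean squared deviation (dividing by the number of values, which is the population form) and uses `statistics.NormalDist` to get the share of a normal distribution within a given number of standard deviations. The sample values and the helper names `rms_deviation` and `share_within` are hypothetical.

```python
import math
from statistics import NormalDist, mean, pstdev


def rms_deviation(values):
    """Square root of the mean squared deviation from the mean."""
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, S.D. 1
    return z.cdf(k) - z.cdf(-k)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
print(rms_deviation(sample), pstdev(sample))  # both give the population S.D.

for k in (1, 2, 3):
    print(f"within ± {k} S.D.: {100 * share_within(k):.1f}%")
```

The exact shares are roughly 68.3%, 95.4%, and 99.7%, which is where the rounded figures on the slide come from.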
10-10
Spread and Standard Deviation
[Figure: normal curve centered on the mean, with nested ranges marked: 68% w/i ± 1 S.D., 95% w/i ± 2 S.D., 99% w/i ± 3 S.D.]
10-11
Zero Skewness Indicates Symmetry
[Figure: symmetrical curve with the mean, median, and mode together at the center]
10-12
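Positive Skewness Leans Left
[Figure: curve with its peak toward the left and a long tail to the right; marked from left to right: mode, median, mean]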
10-13
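Negative Skewness Leans Right
[Figure: curve with its peak toward the right and a long tail to the left; marked from left to right: mean, median, mode]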
10-14
Zero Kurtosis Indicates Normality
[Figure: normal curve with the mean, median, and mode together at the center]
10-15
Negative Kurtosis: A Low Peak and High Tails
[Figure: flattened symmetrical curve with the mean, median, and mode together at the center]
10-16
Positive Kurtosis: A High Peak and Low Tails
[Figure: sharply peaked symmetrical curve with the mean, median, and mode together at the center]
10-17
Mean, Median, and Mode
Frequency and Percentage Table
Code    Freq.    Pct.    Adj.    Cum.
1           5     5.0     5.0     5.0
2          10    10.0    10.0    15.0
3          15    15.0    15.0    30.0
4          40    40.0    40.0    70.0
5          30    30.0    30.0   100.0
Total     100   100.0   100.0
Statistics
Mean      3.800    Skewness    -0.887
Median    4.000    Kurtosis     0.092
Mode      4.000    Std. dev.    1.128
Number      100    Std. err.    0.113
Bar Chart
[Horizontal bar chart of percentage by code, axis from 0% to 50%; bar labels as shown: 5 = 26%, 4 = 42%, 3 = 16%, 2 = 11%, 1 = 5%]
Line Plot
[Line plot of frequency by code: horizontal axis 1 to 5, vertical axis 0 to 50]
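The statistics in this printout can be reproduced from the five frequencies. The sketch below is one way to do it, under stated assumptions: the source does not say which formulas produced the printout, so the code assumes the sample standard deviation (dividing by n - 1) and the bias-adjusted sample skewness and excess kurtosis that many statistics packages report. With those choices the results agree with the printed values to three decimals. The function name `describe` is hypothetical.

```python
import math
from statistics import mean, median, mode, stdev

# Frequencies by code, taken from the frequency table on this slide.
freqs = {1: 5, 2: 10, 3: 15, 4: 40, 5: 30}


def describe(freqs):
    # Expand the table into one value per case.
    values = [x for x, f in sorted(freqs.items()) for _ in range(f)]
    n = len(values)
    m = mean(values)

    def central_moment(k):
        """k-th central moment, dividing by n."""
        return sum((x - m) ** k for x in values) / n

    m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)

    # Assumed formulas: bias-adjusted sample skewness and excess kurtosis.
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "number": n,
        "mean": m,
        "median": median(values),
        "mode": mode(values),
        "std_dev": sd,
        "std_err": sd / math.sqrt(n),  # standard error of the mean
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


stats = describe(freqs)
for name, value in stats.items():
    print(f"{name:>9}: {value:.3f}")
```

The negative skewness reflects the longer tail toward the low codes, and the mean (3.8) sits below the median and mode (4) for the same reason.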
10-18
Averages and Outliers
• This bar chart appears at a glance to show a symmetrical distribution. In fact, there is radical asymmetry resulting from 5 outliers with values of 50.
Mean 5.66
Median 4
Mode 4
[Bar chart: categories One, Two, Three, Four, Five, Six, and Fifty shown as adjacent bars; frequency axis from 0 to 30]
10-19
Outliers Correctly Shown
• This clearer representation of the distribution makes the radical asymmetry obvious.
[Bar chart: value axis from 1 to 51, frequency axis from 0 to 30]
10-20
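Standard Normal Distribution
[Figure: normal curve with the mean marked at the center; share of data in six bands, from left to right: 2.5%, 13.5%, 34%, 34%, 13.5%, 2.5%]
Normal Amount of Data to the Left and Right of the Mean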
10-21
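Positively Skewed Distribution
[Figure: positively skewed curve with the mean marked; share of data in six bands, from left to right: 0.0%, 9.5%, 47%, 33%, 7.5%, 0.3%]
More Data to the Left than to the Right of the Mean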
10-22
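Distribution with Positive Kurtosis
[Figure: peaked symmetrical curve with the mean marked at the center; share of data in six bands, from left to right: 1.5%, 8.0%, 40.5%, 40.5%, 8.0%, 1.5%]
More Toward the Center than in the Tails of the Distribution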
10-23
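Distribution with Negative Kurtosis
[Figure: flattened symmetrical curve with the mean marked at the center; share of data in six bands, from left to right: 4%, 17%, 29%, 29%, 17%, 4%]
More in the Tails than Toward the Center of the Distribution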
10-24
Statistical Inference and Confidence Intervals
• Objective
  • To make inferences about the population based on the sample
• Sample Statistics
  • Used as estimates of the population parameters
• Estimates Are Imperfect
  • The probability of error can be determined
• Confidence Interval
  • The range around the sample mean within which the population parameter is likely to fall, at a given probability
10-25
Statistical Inference and Confidence Intervals
• Sampling Distribution of Means
  • The distribution that would result if samples of a given size were taken again and again and the mean of each sample were plotted
• Standard Error of the Estimate
  • The standard deviation of the theoretical sampling distribution of means
• Confidence Interval Probabilities
  • About 68% chance the parameter is within ± 1 S.E.
  • About 95% chance the parameter is within ± 2 S.E.
  • About 99.7% chance (often rounded to 99%) the parameter is within ± 3 S.E.
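The idea of a sampling distribution of means can be made concrete with a small simulation. This is a sketch under assumed conditions: a normal population with mean 50 and standard deviation 25, samples of size 25, and 2,000 repeated samples. All of these numbers and the function name `simulate_sample_means` are hypothetical choices for the illustration. With them, the theoretical standard error is 25 / sqrt(25) = 5.

```python
import random
from statistics import mean, stdev


def simulate_sample_means(pop_mean, pop_sd, sample_size, n_samples, seed=0):
    """Draw repeated samples from a normal population; return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        mean([rng.gauss(pop_mean, pop_sd) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


means = simulate_sample_means(pop_mean=50, pop_sd=25, sample_size=25, n_samples=2000)

# The S.D. of the sample means estimates the standard error (theory: 5).
observed_se = stdev(means)
print(f"mean of sample means: {mean(means):.2f}")
print(f"S.D. of sample means: {observed_se:.2f}")

# Share of sample means that fall within ± 1, 2, and 3 S.E. of the true mean.
true_se = 25 / 25 ** 0.5
shares = {k: sum(abs(m - 50) <= k * true_se for m in means) / len(means)
          for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within ± {k} S.E.: {100 * share:.1f}%")
```

The simulated shares land close to the 68%, 95%, and 99.7% figures because sample means drawn from a normal population are themselves normally distributed.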
10-26
Confidence Interval Diagram
• Mean = 50
• Standard Error = 5
[Diagram: sampling distribution centered on 50, horizontal axis from 20 to 80, with the 95% C.I. and 99% C.I. marked]
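The interval limits for this example follow from the ± 2 S.E. and ± 3 S.E. rules in the chapter. The sketch below applies them to a mean of 50 and a standard error of 5. The endpoints are computed here rather than read from the diagram, and the helper names `confidence_interval` and `z_multiplier` are hypothetical. It also shows the exact normal multipliers, since 2 and 3 are rounded values.

```python
from statistics import NormalDist


def confidence_interval(sample_mean, std_err, multiplier):
    """Interval of sample_mean ± multiplier standard errors."""
    half_width = multiplier * std_err
    return (sample_mean - half_width, sample_mean + half_width)


def z_multiplier(confidence):
    """Exact normal multiplier for a two-sided interval at the given confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


# Rounded multipliers from the chapter, applied to mean 50 and S.E. 5.
ci_2se = confidence_interval(50, 5, 2)  # (40, 60)
ci_3se = confidence_interval(50, 5, 3)  # (35, 65)
print(ci_2se, ci_3se)

# Exact multipliers are about 1.96 for 95% and about 2.58 for 99%.
for level in (0.95, 0.99):
    z = z_multiplier(level)
    low, high = confidence_interval(50, 5, z)
    print(f"{level:.0%}: z = {z:.3f}, interval = ({low:.1f}, {high:.1f})")
```

The rounded rule gives slightly wider intervals than the exact multipliers, so it errs on the cautious side.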
10-27
End of Chapter 10