Statistics
An Introduction
Learning Objectives
1. Define Statistics
2. Describe the Uses of Statistics
3. Distinguish Descriptive & Inferential Statistics
4. Define Population, Sample, Parameter, & Statistic
5. Identify data types
What is Statistics?
The practice (science?) of data analysis
Summarizing data and drawing inferences
about the larger population from which
it was drawn
Statistical Methods
• Descriptive Statistics
• Inferential Statistics
Descriptive Statistics
1. Involves
   • Collecting Data
   • Presenting Data
   • Characterizing Data
2. Purpose
   • Describe Data
[Figure: bar chart of dollars (0 to 50) by quarter (Q1 to Q4), annotated with \bar{X} = 30.5 and S^2 = 113]
Inferential Statistics
1. Involves
   • Estimation
   • Hypothesis Testing
2. Purpose
   • Make Decisions About Population Based on Sample Characteristics
[Figure label: "Population?"]
Key Terms
1. Population (Universe)
   • All Items of Interest
2. Sample
   • Portion of Population
3. Parameter
   • Summary Measure about Population
4. Statistic
   • Summary Measure about Sample
• P in Population & Parameter
• S in Sample & Statistic
Data Types
Quantitative
   • Discrete
   • Continuous
Qualitative
   • Nominal (categorical)
   • Ordinal (rank ordered categories)
Sampling
Representative sample
   • Same characteristics as the population
Random sample
   • Every subset of the population of a given size has an equal chance of being selected
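A minimal sketch of the vocabulary so far, assuming a made-up population (the integers 1 to 100, which are not data from these slides). It draws a random sample and contrasts a parameter (computed on the whole population) with a statistic (computed on the sample). The names `population`, `sample`, and the seed are illustrative choices only.

```python
import random
from statistics import mean

# Hypothetical population: the integers 1..100 (not data from the slides).
population = list(range(1, 101))

# Parameter: a summary measure computed on the whole population.
population_mean = mean(population)

# random.sample draws without replacement, so every subset of size n
# is equally likely to be chosen.
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
n = 10
sample = rng.sample(population, n)

# Statistic: the same summary measure computed on the sample only.
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The sample mean will usually differ from the population mean; inferential statistics is about reasoning from the former to the latter.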
Review
Descriptive vs. Inferential Statistics
Vocabulary
   • Population
   • (Random, representative) sample
   • Parameter
   • Statistic
Data types
Methods for Describing Data
Learning Objectives
1. Describe Qualitative Data Graphically
2. Describe Numerical Data Graphically
3. Create & Interpret Graphical Displays
4. Explain Numerical Data Properties
5. Describe Summary Measures
6. Analyze Numerical Data Using Summary Measures
Data Presentation
Qualitative Data
   • Summary Table
   • Bar Chart
   • Pie Chart
Numerical Data
   • Stem-&-Leaf Display
   • Dot Chart
   • Frequency Distribution
   • Histogram
Presenting
Qualitative Data
Student Specializations
Specialization |      Freq.     Percent        Cum.
---------------+-----------------------------------
           HCI |          9       39.13       39.13
          IEMP |          9       39.13       78.26
           LIS |          3       13.04       91.30
     Undecided |          2        8.70      100.00
---------------+-----------------------------------
         Total |         23      100.00
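The Freq., Percent, and Cum. columns of a summary table can be rebuilt from raw responses. The following is a sketch only: the `responses` list is reconstructed from the counts in the Student Specializations table, and `frequency_table` is a hypothetical helper, not the command that produced the original output.

```python
from collections import Counter

# Raw responses rebuilt from the counts in the table above.
responses = ["HCI"] * 9 + ["IEMP"] * 9 + ["LIS"] * 3 + ["Undecided"] * 2

def frequency_table(values):
    """Return rows of (category, freq, percent, cumulative percent)."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    running = 0
    for category in sorted(counts):
        running += counts[category]
        rows.append((category,
                     counts[category],
                     round(100 * counts[category] / total, 2),
                     round(100 * running / total, 2)))
    return rows

table = frequency_table(responses)
for category, freq, pct, cum in table:
    print(f"{category:>14} | {freq:>10} {pct:>11.2f} {cum:>11.2f}")
```

The cumulative column is a running total of the counts divided by the overall total, so the last row always reaches 100.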
Student Specializations
[Bar chart: frequency (0 to 10) by specialization: HCI, IEMP, LIS, Undecided]
Undergrad Majors
                 UG major |      Freq.     Percent        Cum.
--------------------------+-----------------------------------
         American Studies |          1        4.76        4.76
                  Cog Sci |          1        4.76        9.52
                 Comp Sci |          3       14.29       23.81
                Economics |          3       14.29       38.10
                  English |          5       23.81       61.90
Environmental Engineering |          1        4.76       66.67
           Graphic Design |          1        4.76       71.43
                     Math |          2        9.52       80.95
   Mechanical Engineering |          1        4.76       85.71
                Nutrition |          1        4.76       90.48
      Sci and Tech Policy |          1        4.76       95.24
       Telecommunications |          1        4.76      100.00
--------------------------+-----------------------------------
                    Total |         21      100.00
Favorite Colors
      color |      Freq.     Percent        Cum.
------------+-----------------------------------
      black |          2        8.70        8.70
       blue |         12       52.17       60.87
      green |          1        4.35       65.22
     orange |          1        4.35       69.57
     purple |          1        4.35       73.91
        red |          5       21.74       95.65
      white |          1        4.35      100.00
------------+-----------------------------------
      Total |         23      100.00
Calculus Knowledge
  integrals |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          3       13.04       13.04
          2 |          1        4.35       17.39
          3 |         11       47.83       65.22
          4 |          6       26.09       91.30
          5 |          2        8.70      100.00
------------+-----------------------------------
      Total |         23      100.00
Presenting
Numerical Data
Student Age (Reported) Data
Stem-and-leaf plot for age
2* | 22233444555777899
3* | 01257
4* |
5* |
6* |
7* | 6
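A stem-and-leaf display splits each value into a stem (here the tens digit) and a leaf (the units digit). The sketch below rebuilds the age plot from the individual ages, which are read off the plot itself. `stem_and_leaf` is a hypothetical helper written for this illustration and only handles non-negative two-digit integers.

```python
from collections import defaultdict

# Ages read off the stem-and-leaf plot above (stem = tens, leaf = units).
ages = [22, 22, 22, 23, 23, 24, 24, 24, 25, 25, 25, 27, 27, 27, 28, 29, 29,
        30, 31, 32, 35, 37, 76]

def stem_and_leaf(values):
    """Map each tens digit (stem) to a string of sorted units digits (leaves)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(str(v % 10))
    lo, hi = min(leaves), max(leaves)
    # Include empty stems so gaps in the data stay visible.
    return {stem: "".join(leaves.get(stem, [])) for stem in range(lo, hi + 1)}

plot = stem_and_leaf(ages)
for stem, leaf in plot.items():
    print(f"{stem}* | {leaf}")
```

Keeping the empty stems (4, 5, and 6) is what makes the single age in the 70s stand out as far from the rest.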
[Histogram of age: x-axis age (20 to 70), y-axis Frequency (0 to 10)]
Starting Salaries (in $K)
3* | 8
4* | 000025
5* | 0000
6* | 0000005
7* | 5
8* | 0
Numerical Data Properties
Thinking Challenge
[Figure: salaries of $400,000, $70,000, $50,000, $30,000, and $20,000]
... employees cite low pay - most workers earn only $20,000.
... President claims average pay is $70,000!
Standard Notation
Measure        Sample        Population
Mean           $\bar{x}$     $\mu$
Stand. Dev.    $s$           $\sigma$
Variance       $s^2$         $\sigma^2$
Size           $n$           $N$
Numerical Data Properties
• Central Tendency (Location)
• Variation (Dispersion)
• Shape
Numerical Data Properties & Measures
Central Tendency
   • Mean
   • Median
   • Mode
Variation
   • Range
   • Interquartile Range
   • Variance
   • Standard Deviation
Shape
   • Skew
Central Tendency
What’s wrong with this?
Measurements: 1 4 2 9 8
Middle measurement is 2, so that’s the median

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1 + 4 + 2 + 9 + 8}{5} = \frac{24}{5} = 4.8$$
Ages
Mean = 29
Median = 27
2* | 22233444555777899
3* | 01257
4* |
5* |
6* |
7* | 6
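A short sketch of the two calculations above. The median is the middle value of the ordered data, so the measurements 1 4 2 9 8 have to be sorted before the middle one is picked. The `ages` list is read off the stem-and-leaf plot. The variable names are illustrative, and the manual median shown here only covers an odd number of observations.

```python
from statistics import mean, median

measurements = [1, 4, 2, 9, 8]

# The median is the middle value of the ORDERED data, not of the data
# in the order it happened to be recorded.
ordered = sorted(measurements)
middle_position = (len(ordered) + 1) // 2   # (n + 1) / 2, exact for odd n
manual_median = ordered[middle_position - 1]

print("mean:", mean(measurements))
print("median (manual):", manual_median)
print("median (library):", median(measurements))

# Ages read off the stem-and-leaf plot above.
ages = [22, 22, 22, 23, 23, 24, 24, 24, 25, 25, 25, 27, 27, 27, 28, 29, 29,
        30, 31, 32, 35, 37, 76]
print("age mean:", mean(ages), "age median:", median(ages))
```

For the ages, the single value of 76 pulls the mean above the median, while the median is unaffected by how extreme that value is.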
Summary of Central Tendency Measures
Measure    Equation                Description
Mean       $\sum X_i / n$          Balance Point
Median     $(n+1)/2$ Position      Middle Value When Ordered
Mode       none                    Most Frequent
Shape
Shape
1. Describes How Data Are Distributed
2. Measures of Shape
   • Skew = (Lack of) Symmetry
Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
Variation
Quartiles
1. Measure of Noncentral Tendency
2. Split Ordered Data into 4 Quarters
   25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
3. Position of i-th Quartile
$$\text{Positioning Point of } Q_i = \frac{i \cdot (n + 1)}{4}$$
Ages
Range
Quartiles
2* | 22233444555777899
3* | 01257
4* |
5* |
6* |
7* | 6
Box Plots - Age and Salary
Salary
Quartiles: 41K, 50K, 60K
Inner fences: ??
Outer fences: ??
[Box plot of salary: axis from 40,000 to 80,000]
Age
Quartiles: 24, 27, 30
Inner fences: (15,39)
Outer fences: (6, 48)
[Box plot of age: axis from 20 to 80]
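The quartile positions and the age fences above can be reproduced with a few lines. This is a sketch under two assumptions that the slides do not state outright: when the positioning point i(n + 1)/4 falls between two observations, the two neighbours are averaged; and the inner and outer fences sit 1.5 and 3 interquartile ranges beyond Q1 and Q3. Both choices were picked because they match the reported numbers. `quartile` and `fences` are hypothetical helper names.

```python
def quartile(ordered, i):
    """i-th quartile of sorted data via the positioning point i * (n + 1) / 4.

    Assumption: a fractional positioning point is resolved by averaging the
    two neighbouring observations. Expects at least 3 observations.
    """
    n = len(ordered)
    pos = i * (n + 1) / 4          # 1-based position in the ordered data
    lower = int(pos)               # observation at or just below pos
    if pos == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2

def fences(q1, q3):
    """Inner fences (1.5 * IQR) and outer fences (3 * IQR) around Q1 and Q3."""
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr), (q1 - 3 * iqr, q3 + 3 * iqr)

# Ages and starting salaries (in $K) read off the stem-and-leaf plots.
ages = sorted([22, 22, 22, 23, 23, 24, 24, 24, 25, 25, 25, 27, 27, 27, 28,
               29, 29, 30, 31, 32, 35, 37, 76])
salaries = sorted([38, 40, 40, 40, 40, 42, 45, 50, 50, 50, 50,
                   60, 60, 60, 60, 60, 60, 65, 75, 80])

age_q = [quartile(ages, i) for i in (1, 2, 3)]
salary_q = [quartile(salaries, i) for i in (1, 2, 3)]
age_inner, age_outer = fences(age_q[0], age_q[2])

print("age quartiles:", age_q)
print("age inner fences:", age_inner, "outer fences:", age_outer)
print("salary quartiles (in $K):", salary_q)
```

The same `fences` function can be applied to the salary quartiles to work out the fences left as a question on the slide.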
Variance & Standard Deviation
1. Measures of Dispersion
2. Most Common Measures
3. Consider How Data Are Distributed
4. Show Variation About Mean ($\bar{X}$ or $\mu$)
[Dot plot: axis from 4 to 12, with $\bar{X} = 8.3$ marked]
Sample Variance Formula
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$
n - 1 in denominator! (Use N if Population Variance)
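Equivalent Formula
$$
\begin{aligned}
s^2 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \\
    &= \frac{\sum_{i=1}^{n} \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right)}{n - 1} \\
    &= \frac{\sum x_i^2 - \sum 2 x_i \bar{x} + \sum \bar{x}^2}{n - 1} \\
    &= \frac{\sum x_i^2 - 2 \bar{x} \sum x_i + n \bar{x}^2}{n - 1} \\
    &= \frac{\sum x_i^2 - 2 \bar{x} \, n \bar{x} + n \bar{x}^2}{n - 1} \\
    &= \frac{\sum x_i^2 - n \bar{x}^2}{n - 1}
\end{aligned}
$$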
Another Equivalent Formula
$$s^2 = \frac{\sum x_i^2 - n \bar{x}^2}{n - 1} = \frac{\sum x_i^2 - n \left( \frac{\sum x_i}{n} \right)^2}{n - 1} = \frac{\sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n}}{n - 1}$$
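As a numerical check, the three forms of the sample variance can be evaluated on the measurements 1 4 2 9 8 used earlier. This is only a sketch; the variable names are illustrative, and `statistics.variance` from the Python standard library is included as an independent reference.

```python
from statistics import variance

data = [1, 4, 2, 9, 8]
n = len(data)
x_bar = sum(data) / n
sum_sq = sum(x * x for x in data)

# Definition: squared deviations from the mean, divided by n - 1.
definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Equivalent formula: (sum of squares - n * mean^2) / (n - 1).
shortcut_1 = (sum_sq - n * x_bar ** 2) / (n - 1)

# Another equivalent formula: (sum of squares - (sum)^2 / n) / (n - 1).
shortcut_2 = (sum_sq - sum(data) ** 2 / n) / (n - 1)

print(definition, shortcut_1, shortcut_2, variance(data))
```

The shortcut forms need only the running totals of x and x squared, which is why they were popular for hand calculation. In floating point they can lose precision when the mean is large relative to the spread, so the definition is the safer form in code.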
Empirical Rule
If x has a “symmetric, mound-shaped” distribution
$$\Pr\left( |x_i - \mu| > \sigma \right) \approx 32\%$$
$$\Pr\left( |x_i - \mu| > 2\sigma \right) \approx 5\%$$
$$\Pr\left( |x_i - \mu| > 3\sigma \right) \approx 0.3\%$$
Justification: Known properties of the “normal” distribution, to be studied later in the course
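The three percentages can be checked against the normal distribution that the rule is justified by. The sketch below uses `statistics.NormalDist` from the Python standard library; `tail_probability` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def tail_probability(k):
    """Pr(|x - mu| > k * sigma) for a normally distributed x."""
    # By symmetry, the two tails have equal probability.
    return 2 * (1 - standard_normal.cdf(k))

for k in (1, 2, 3):
    print(f"more than {k} standard deviation(s) from the mean: "
          f"{tail_probability(k):.2%}")
```

Because the probabilities are stated in units of the standard deviation, the same values hold for any mean and standard deviation, as long as the distribution is normal.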
Preview of Statistical Inference
You observe one data point
Make a hypothesis about the mean and standard deviation of the distribution from which it was drawn
Empirical Rule tells you how (un)likely the data point is
   • If very unlikely, you are suspicious of the hypothesis about mean and standard deviation, and reject it
Summary of Variation Measures
Measure                            Equation                                      Description
Range                              $X_{largest} - X_{smallest}$                  Total Spread
Interquartile Range                $Q_3 - Q_1$                                   Spread of Middle 50%
Standard Deviation (Sample)        $\sqrt{\sum (X_i - \bar{X})^2 / (n - 1)}$     Dispersion about Sample Mean
Standard Deviation (Population)    $\sqrt{\sum (X_i - \mu)^2 / N}$               Dispersion about Population Mean
Variance (Sample)                  $\sum (X_i - \bar{X})^2 / (n - 1)$            Squared Dispersion about Sample Mean
Z-scores
Number of standard deviations from the mean
$$z_i = \frac{x_i - \mu}{\sigma}$$
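A small sketch of the z-score applied to the reported ages. It rests on one assumption: the slides give no population mean or standard deviation for this data, so the sample mean and sample standard deviation are used in place of μ and σ. `z_score` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Ages read off the stem-and-leaf plot earlier in the deck.
ages = [22, 22, 22, 23, 23, 24, 24, 24, 25, 25, 25, 27, 27, 27, 28, 29, 29,
        30, 31, 32, 35, 37, 76]

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Assumption: sample mean and sample standard deviation stand in for
# mu and sigma, which the slides do not give for this data set.
mu_hat = mean(ages)
sigma_hat = stdev(ages)

z_oldest = z_score(max(ages), mu_hat, sigma_hat)
print(f"z-score of the oldest reported age: {z_oldest:.2f}")
```

A z-score beyond 3 in absolute value is the kind of observation the Empirical Rule flags as very unlikely under a mound-shaped distribution, which ties the z-score to the preview of inference above.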
Conclusion
1. Described Qualitative Data Graphically
2. Described Numerical Data Graphically
3. Created & Interpreted Graphical Displays
4. Explained Numerical Data Properties
5. Described Summary Measures
6. Analyzed Numerical Data Using Summary Measures