Statistical Evaluation
• Statistics are tools for summarizing data
– Descriptive: Simple facts
– Inferential: What do the numbers mean?
Populations vs. Samples
• Population: Every individual that
meets the criteria
• Sample: The individuals you actually
measured
Populations vs. Samples
• Statistic: A summary value that
describes a sample
• Parameter: A summary value that
describes a population
Each statistic has a corresponding parameter.
Inferential statistics use sample statistics to estimate population parameters.
Descriptive Statistics
• Frequency Distributions (Table or Graph)
– Discrete categories
– Number of individuals in each one
Age Group    Participants
Under 20     14
20-29        12
30-39        22
40-49        24
50-59        6
60 and Up    3
[Figure: histogram of Participants (0 to 30) by Age Group]
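A frequency distribution like this one can be tallied from raw scores. The following is a minimal sketch, not the method used for the slide: the `ages` list is invented for illustration, and `age_group` is a hypothetical helper that maps an age to the six groups named above.

```python
from collections import Counter

def age_group(age):
    """Map an age in years to one of the six groups used on the slide."""
    if age < 20:
        return "Under 20"
    if age >= 60:
        return "60 and Up"
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

# Hypothetical raw ages, invented for illustration only.
ages = [18, 23, 27, 31, 35, 38, 42, 44, 47, 49, 53, 61]

# Count how many individuals fall into each discrete category.
frequencies = Counter(age_group(a) for a in ages)

for group in ["Under 20", "20-29", "30-39", "40-49", "50-59", "60 and Up"]:
    print(f"{group:<10} {frequencies[group]}")
```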
Frequency Distributions
• Histogram vs. Polygon
[Figure: histogram and frequency polygon, side by side; x-axis Age Group (Under 20 through 60 and Up), y-axis Participants (0 to 30)]
Frequency Distributions
• Bar Graph
– Use when data categories are not numerical
– Leave space between bars
[Figure: bar graph of Participants (0 to 30) by Major: English, Math, Philosophy, Physics, Psych, Undeclared]
Frequency Distributions
• A good first step after data collection
• Seldom presented in a final report
Measures of Central Tendency
• Get a single score that identifies the
center of your distribution
– Mean: Mathematical average
– Median: Splits data in half
– Mode: Most Common Value
[Figure: IQs of Freshman Class; histogram with x-axis IQ (100-109 through 150-159), y-axis Participants (0 to 12)]
Mean: 132
Median: 133
Mode: 127 (3)
Choosing a Measure of C.T.
• Mean: Commonly used; readers often
assume the data are normally distributed
[Figure: IQ histogram as on the previous slide; Mean: 132, Median: 133, Mode: 127 (3)]
Measures of Central Tendency
• Median: Useful when a few values distort the mean
Household   Income (k)      Household   Income (k)
1           22              1           22
2           32              2           32
3           40              3           40
4           46              4           46
5           48              5           48
6           51              6           51
7           56              7           2357
Mean        42.1            Mean        370.9
Median      46.0            Median      46.0
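The figures in the table can be checked with a short sketch. It uses Python's standard `statistics` module and the two income lists from the table; the variable names are chosen here for illustration.

```python
import statistics

# Household incomes (in thousands) from the table.
incomes = [22, 32, 40, 46, 48, 51, 56]
# Same households, with household 7 replaced by the extreme value.
incomes_with_outlier = [22, 32, 40, 46, 48, 51, 2357]

for label, data in [("original", incomes), ("with outlier", incomes_with_outlier)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:.1f}, median = {median:.1f}")
# original: mean = 42.1, median = 46.0
# with outlier: mean = 370.9, median = 46.0
```

One extreme income moves the mean from 42.1 to 370.9, while the median stays at 46.0.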
Measures of Central Tendency
• Mode: Useful when values are non-numerical
– Favorite New England vacation spot
State            Responses
Maine            5
Vermont          7
New Hampshire    25
Connecticut      4
Rhode Island     6
Massachusetts    2
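For categories like these, the mode is simply the category with the highest count. A minimal sketch using the response counts from the table and `collections.Counter` (the variable names are illustrative):

```python
from collections import Counter

# Response counts as listed in the table.
responses = Counter({
    "Maine": 5,
    "Vermont": 7,
    "New Hampshire": 25,
    "Connecticut": 4,
    "Rhode Island": 6,
    "Massachusetts": 2,
})

# The mode is the most common category; no arithmetic on the labels is needed.
mode_state, mode_count = responses.most_common(1)[0]
print(mode_state, mode_count)  # New Hampshire 25
```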
Measures of Central Tendency
• Mode: Bimodal (or multimodal) Distributions
[Figure: bimodal frequency distribution; x-axis Age Group (0-9 through 80-89), y-axis Number (0 to 16)]
Variability
• A measure of the range or spread of the
scores
[Figure: frequency distributions for PA and NJ compared; x-axis Age Group (0-9 through 80-89), y-axis 0 to 35]
Variability
Variance: A measure of variability determined by
1. Computing the mean
2. Determining each value’s distance from the mean
3. Squaring each distance
4. Taking the average of the squared distances*
Standard Deviation: The square root of the variance
SD is a measure of how much the scores scatter around the mean.
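The four steps translate directly into code. This is a sketch under stated assumptions: it reuses the seven household incomes from the earlier median slide as sample data, and it divides by the number of values exactly as step 4 is worded. The usual formula for a sample variance divides by one less than the number of values instead.

```python
import math

def variance_and_sd(values):
    """Follow the four steps listed above, then take the square root."""
    # Step 1: compute the mean.
    mean = sum(values) / len(values)
    # Step 2: each value's distance from the mean.
    distances = [v - mean for v in values]
    # Step 3: square each distance.
    squared = [d ** 2 for d in distances]
    # Step 4: average the squared distances (dividing by N, as worded).
    variance = sum(squared) / len(values)
    # Standard deviation: the square root of the variance.
    return variance, math.sqrt(variance)

# Household incomes (k) from the earlier slide, used here as example data.
incomes = [22, 32, 40, 46, 48, 51, 56]
variance, sd = variance_and_sd(incomes)
print(f"variance = {variance:.2f}, SD = {sd:.2f}")
```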
Correlations
• Used to measure the direction and degree of a
relationship
– Create a scatter plot
– Determine Pearson Correlation Coefficient (r)
[Figures: two scatter plots with y-axes labeled GPA and Hours Sleeping. Plot against Hours Studying: r = 0.48. Plot against Hours of TV: r = -0.92]
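A Pearson coefficient can be computed by hand from paired scores. The sketch below is illustrative only: `pearson_r` is a helper written for this example, and the `hours_tv` and `gpa` values are invented, not the data behind the plots.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the means.
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(ss_x * ss_y)

# Hypothetical paired observations, invented for illustration only.
hours_tv = [0, 1, 2, 4, 6, 8]
gpa = [3.9, 3.6, 3.4, 2.8, 2.5, 1.9]

r = pearson_r(hours_tv, gpa)
print(f"r = {r:.2f}")
```

The sign of r gives the direction of the relationship, and its distance from 0 (up to a maximum of 1) gives the degree.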
Hypothesis Testing
• Null Hypothesis (H0): There is no
difference between two populations
– Differences in sample averages reflect
expected sampling error
• Alternative Hypothesis (H1): There is a
difference
– Differences in sample averages reflect a
true difference in the populations
Sample Hypothesis
• School District A is accused of age
discrimination when hiring new faculty
– They favor older teachers
• Test the Hypothesis: Are the new teachers at
School District A older, on average, than
teachers at School District B?
• Ave. for A: 28.2 Ave. for B: 26.5
Hypothesis Testing
• Standard Error: A measure of how close
your sample values (means etc.) are likely to
be to the population values
• Test Statistic: A value that expresses the
“strength” of your sample statistic relative
to the Standard Error
Test Statistic = Sample Statistic / Standard Error
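As a rough illustration of why the standard error matters, the sketch below uses the standard error of a sample mean (standard deviation divided by the square root of the sample size). The standard deviation of 10.0 and the sample statistic of 3.0 are made-up numbers, and both function names are hypothetical.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

def compute_test_statistic(sample_stat, se):
    """Test statistic as defined above: sample statistic over standard error."""
    return sample_stat / se

sd = 10.0          # hypothetical standard deviation
sample_stat = 3.0  # hypothetical sample statistic

# A larger sample gives a smaller standard error and a larger test statistic.
for n in (25, 100):
    se = standard_error(sd, n)
    print(f"n = {n}: SE = {se}, test statistic = {compute_test_statistic(sample_stat, se)}")
```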
Level of Significance
• Each test statistic will consider the
acceptable risk of a Type I error (α) and
compare it to the actual risk (p)
How likely is it that a difference in sample means
reflects a true difference in population means?
α = 0.05 (sometimes 0.01)
Types of Error
                                        You Claim That a        You Claim That No
                                        Difference Exists       Difference Exists
                                        (Reject H0)             (Accept H0)
A Difference Exists in the Population   Correct!                Type II Error
No Difference Exists                    Type I Error            Correct!
Sample Case
• Test the Hypothesis: Are the new teachers at
School District A older, on average, than
teachers at School District B?
Ave. for A: 28.2 Ave. for B: 26.5
Result will depend on number of teachers sampled,
range and standard deviation of the teachers’ ages
Inferential Statistics
• The t-test for comparing means
• Simplified version of t
t = (M1 - M2) / (Standard Error of the difference)
The standard error is computed from the standard deviations and sample sizes.
Large values of t are associated with small values of p (lower risk of Type I Error)
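One common way to fill in the simplified formula is the unpooled (Welch) form of the two-sample t statistic, sketched below. This is an assumption about which version is meant, not something the slides specify. The ages are invented so that the two averages land near the 28.2 and 26.5 quoted for the sample case, and `t_statistic` is a hypothetical helper.

```python
import math
import statistics

def t_statistic(sample_a, sample_b):
    """Unpooled (Welch) two-sample t: difference in means over its standard error."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Standard error of the difference, built from each sample's
    # variance (n - 1 denominator) and its sample size.
    se = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return (mean_a - mean_b) / se

# Hypothetical ages of new teachers, invented for illustration only.
district_a = [24, 26, 27, 29, 31, 32]
district_b = [23, 24, 26, 27, 28, 31]

t = t_statistic(district_a, district_b)
print(f"t = {t:.2f}")
```

Turning t into a p value also requires the degrees of freedom and a t distribution table or library, which this sketch leaves out.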