Definition of Non-Parametric Statistics
Non-parametric statistics is a branch
of statistics that is applied when
populations are not normal, or when
the data are severely skewed.
Titles of Non-parametric Tests
• One Sample Median Test
• Two Sample Location Test
• Two Sample Dispersion Test
• One-Way Layout
• Independence Test
Focus: Median tests
This presentation will cover:
• What median tests are
• Why they are used
• When they are used
• How they are used
[Figure: bar chart of the % earning a specified amount (axis from 0 to 30) by income bracket, from $0-$1000 up to $6001-$7000]
What are median tests?
• They are tests similar to the mean tests
covered in a college introduction to
statistics.
• They include confidence intervals and
significance tests.
When to use a median test:
(as opposed to a mean test)
• When the data or population does not fulfill
the conditions for mean tests.
• The ONLY condition is a simple random
sample!
Remember these conditions?
• 30>n>15 with slight skewness
• n>30
• Or population is normal
They are NOT necessary!
Why do we use median tests?
Because they are more robust!
Medians are more robust than means
SRS of salaries of Company A:
#    Salary        #    Salary
1    $18,000       8    $35,000
2    $20,000       9    $36,000
3    $23,000       10   $50,000
4    $23,000       11   $50,000
5    $23,000       12   $60,000
6    $28,000       13   $130,000
7    $30,000       14   $1,000,000
•The mean of these salaries is
$109,000
•The median of these salaries is
clearly between #7 and #8, or
$32,500
Just from looking at the list of
salaries, the median seems to
describe the middle of the
distribution much more accurately,
since salary #14 pulls the mean so
far up
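The two figures above can be checked with a few lines of Python. This is only a sketch using the standard library; the list name `salaries` is chosen here for illustration, and the values are the 14 salaries listed above.

```python
from statistics import mean, median

# The 14 sampled salaries from Company A, already in increasing order.
salaries = [18_000, 20_000, 23_000, 23_000, 23_000, 28_000, 30_000,
            35_000, 36_000, 50_000, 50_000, 60_000, 130_000, 1_000_000]

sample_mean = mean(salaries)      # pulled upward by salary #14
sample_median = median(salaries)  # average of salaries #7 and #8

print("mean:  ", sample_mean)
print("median:", sample_median)
```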
More robustness
The rest of the median test procedure is also
more robust than procedures based on the t-distribution.
This combination of a robust statistic and
robust procedure allows for statistical
inference on very skewed data.
Confidence Intervals for Medians
The two main types:
• Exact: needs tables and/or computer software
• Approximate: simpler tables, appropriate for
larger samples
We will concentrate on the approximations
Introduction to the Confidence Intervals
It is necessary to understand “rank”
The rank of a value in a distribution is simply
its numbered place in the list of ordered values
Example: in the distribution of letters
{a, b, c, d, e, f}
“b” has a rank of 2 from the left, and a rank of 5 from
the right.
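As a small illustration of rank, the sketch below counts positions from each end of an ordered list. The function names are made up for this example, and it assumes the values are distinct.

```python
def rank_from_left(ordered, value):
    """1-based position of value, counting from the smallest end."""
    return ordered.index(value) + 1


def rank_from_right(ordered, value):
    """1-based position of value, counting from the largest end."""
    return len(ordered) - ordered.index(value)


letters = ["a", "b", "c", "d", "e", "f"]
print(rank_from_left(letters, "b"), rank_from_right(letters, "b"))
```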
Steps for Approximate Confidence Intervals
1. Order the distribution from smallest to largest
values
2. Find the median of the distribution.
3. Find the rank* of each limit depending on the
sample size from a table like the one shown on the
next slide.
4. Take the rank number and count in that many data
points from each side of the ordered data.
* Note that these ranks are computed by complicated formulas, then put neatly
into a table for users, and treated like the definition of rank seen before.
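The four steps can be sketched in Python as follows. The function name `median_confidence_interval` is hypothetical, and the rank is assumed to be looked up by the user from a table for the sample size; the code only does the ordering and counting.

```python
from statistics import median


def median_confidence_interval(data, rank):
    """Return (median, lower limit, upper limit) for a tabled rank."""
    ordered = sorted(data)                      # step 1: order the data
    if not 1 <= rank <= (len(ordered) + 1) // 2:
        raise ValueError("rank must be between 1 and half the sample size")
    mid = median(ordered)                       # step 2: find the median
    lower = ordered[rank - 1]                   # step 4: count in from the low end
    upper = ordered[-rank]                      # step 4: count in from the high end
    return mid, lower, upper


# Salary sample from the earlier slide; rank 3 is the tabled value for n = 14.
salaries = [18_000, 20_000, 23_000, 23_000, 23_000, 28_000, 30_000,
            35_000, 36_000, 50_000, 50_000, 60_000, 130_000, 1_000_000]
print(median_confidence_interval(salaries, 3))
```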
Ranks for non-parametric 95% confidence intervals*
Sample size   Rank      Sample size   Rank
8             1         21            6
9             2         22            6
10            2         23            7
11            2         24            7
12            3         25            8
13            3         26            8
14            3         27            8
15            4         28            9
16            4         29            9
17            5         30            10
18            5         31            10
19            5         32            10
20            6         33            11
* Values taken from Siegel’s Statistics and Data Analysis
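One common way to obtain ranks like these is from the Binomial(n, 1/2) distribution: take the largest rank whose lower-tail probability does not exceed 2.5%. The sketch below uses that construction. It is an assumption that the table was built this way; the function name `ci_rank` is made up here, and agreement with the table was only spot-checked for sample sizes 8, 14, 17, 20 and 28.

```python
from math import comb


def ci_rank(n, confidence=0.95):
    """Largest rank r with P(Binomial(n, 1/2) <= r - 1) <= (1 - confidence) / 2.

    Returns 0 when the sample is too small for an interval at this level.
    """
    tail = (1 - confidence) / 2
    cumulative = 0.0
    rank = 0
    for k in range(n + 1):
        cumulative += comb(n, k) / 2 ** n   # P(X = k) for a fair-coin binomial
        if cumulative > tail:
            break
        rank = k + 1
    return rank


for n in (8, 14, 17, 20, 28):
    print(n, ci_rank(n))
```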
Example: Using the same salary data from before, with
sample size 14 and rank 3, proceed as follows
#    Salary        Count
1    $18,000       1 (from the low end)
2    $20,000       2
3    $23,000       3   This is the lower confidence limit of the interval
4    $23,000
5    $23,000
6    $28,000
7    $30,000
8    $35,000
9    $36,000
10   $50,000
11   $50,000
12   $60,000       3   This is the upper confidence limit of the interval
13   $130,000      2
14   $1,000,000    1 (from the high end)
So, the 95% confidence interval is ($23,000, $60,000)
Significance test for medians
Remember duality?
“What is not contained in the
confidence interval is significant at the
same alpha-level.”
This property of confidence intervals
can be used to test for significance.
Steps for Significance Test at alpha=.05
1. Create a confidence interval at this alpha-level.
2. Check to see if the accepted population value
is included in the interval.
3. Draw conclusion:
– If the value IS included, the sample is NOT significant
– If the value is NOT included, the sample IS significant
Sample Significance Test
Assume that the commonly accepted
median of salaries at company A is
$53,000, and that the sample shown
before was drawn.
Test hypotheses
• Ho: M=$53,000 or that the true
median of salaries in company A is
$53,000.
• Ha: M≠$53,000 or that the true
median of salaries in company A is
NOT $53,000.
Our previous 95% confidence interval was
($23000, $60000), so:
•the accepted median, $53,000, is within the interval,
•The outcome is not significant,
•We do not reject the accepted median.
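The same test can be sketched in code by building the interval and checking whether the hypothesized median falls inside it. The function name `median_significance_test` is hypothetical, and treating a value exactly on an endpoint as inside the interval is an assumption made here, since the slides do not address that case.

```python
def median_significance_test(data, rank, hypothesized_median):
    """Return (lower, upper, significant) using the interval/test duality."""
    ordered = sorted(data)
    lower = ordered[rank - 1]   # count in `rank` values from the low end
    upper = ordered[-rank]      # count in `rank` values from the high end
    # Assumption: endpoints count as inside the interval.
    significant = not (lower <= hypothesized_median <= upper)
    return lower, upper, significant


salaries = [18_000, 20_000, 23_000, 23_000, 23_000, 28_000, 30_000,
            35_000, 36_000, 50_000, 50_000, 60_000, 130_000, 1_000_000]

# Ho: M = $53,000, tested at alpha = .05 with the tabled rank 3 for n = 14.
lower, upper, significant = median_significance_test(salaries, 3, 53_000)
print(lower, upper, "reject Ho" if significant else "do not reject Ho")
```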
Mean Tests VS. Median Tests
Consider a population of children, with a
distribution of the number of toys each
one has.
•True mean Mu of 7.3 toys per child
•True median M of 7 toys per child
2 SRS’s from the Population of Children
[Figure: two histograms, one per sample, of # of children (0 to 9) against # of toys (0 to 20)]
Both look very similar. The only difference is
that one bar has moved far out to become
an outlier.
Sample 1: 95% Mean Confidence Interval
Sample 1, with no outlier
[Figure: histogram of Sample 1, # of children against # of toys]
(use calculator 1-var stats)
• Sample mean x-bar=7.1 toys
• Sample standard deviation=1.9877
• Sample size n=28
• Sigma of x-bar=1.9877/√28=.3756
• Z-score z*=1.95996
• CI: 7.1+/-(1.95996*.3756): (6.358, 7.842)
Sample 1: 95% Median Confidence Interval
Sample 1, with no outlier
[Figure: histogram of Sample 1, # of children against # of toys]
• Sample median=7 toys
• Sample size n=28
• Rank (see table) =9
• Lower confidence limit=6
• Upper confidence limit=7
• CI: (6, 7)
Sample 2: 95% Mean Confidence Interval
Sample 2, with outlier
[Figure: histogram of Sample 2, # of children against # of toys]
(use calculator 1-var stats)
• Sample mean x-bar=8.4 toys
• Sample standard deviation=4.8722
• Sample size n=28
• Sigma of x-bar=4.8722/√28=.9208
• Z-score z*=1.95996
• CI: 8.4+/-(1.95996*.9208): (6.595, 10.205)
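The mean interval arithmetic can be reproduced from the summary statistics alone. The sketch below assumes only the rounded values printed on the slides (x-bar, standard deviation, n and z*); the function name `mean_confidence_interval` is made up for this example. With those rounded inputs, the Sample 2 result matches the quoted interval to three decimal places, and the Sample 1 result agrees with its quoted interval to two decimal places.

```python
from math import sqrt


def mean_confidence_interval(x_bar, s, n, z_star=1.95996):
    """z-interval for a mean: x_bar +/- z* * s / sqrt(n)."""
    margin = z_star * s / sqrt(n)
    return x_bar - margin, x_bar + margin


low1, high1 = mean_confidence_interval(7.1, 1.9877, 28)   # Sample 1, no outlier
low2, high2 = mean_confidence_interval(8.4, 4.8722, 28)   # Sample 2, with outlier

print(f"Sample 1: ({low1:.3f}, {high1:.3f})")
print(f"Sample 2: ({low2:.3f}, {high2:.3f})")
```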
Sample 2: 95% Median Confidence Interval
Sample 2, with outlier
[Figure: histogram of Sample 2, # of children against # of toys]
• Sample median=7 toys
• Sample size n=28
• Rank (see table) =9
• Lower confidence limit=6
• Upper confidence limit=7
• CI: (6, 7)
These statistics match
up EXACTLY with the
median CI for the first
sample. The outlier did
not affect the outcome,
demonstrating the test’s
robustness.
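The raw toy counts are not listed in these slides, so the sketch below illustrates the same robustness with the salary sample from earlier instead. The modified data set, in which the largest salary is made ten times bigger, is purely hypothetical and exists only to show that the median interval depends on the ordered positions, not on how extreme the largest value is.

```python
from statistics import mean


def median_interval(data, rank):
    """Count in `rank` values from each end of the ordered data."""
    ordered = sorted(data)
    return ordered[rank - 1], ordered[-rank]


salaries = [18_000, 20_000, 23_000, 23_000, 23_000, 28_000, 30_000,
            35_000, 36_000, 50_000, 50_000, 60_000, 130_000, 1_000_000]

# Hypothetical variant: the top salary becomes an even more extreme outlier.
more_extreme = salaries[:-1] + [10_000_000]

print(median_interval(salaries, 3), median_interval(more_extreme, 3))
print(mean(salaries), mean(more_extreme))
```

The median interval is identical for both lists, while the mean changes substantially.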
Comparison of different intervals
Sample 1: Median CI (6, 7); Mean CI (6.358, 7.842)
Sample 2: Median CI (6, 7); Mean CI (6.595, 10.205)
[Figure: histograms of Sample 1 and Sample 2, # of children against # of toys]
Discussion of differences
• The outlier made the mean confidence
interval much wider, making it less
useful
• The median interval stayed the same, and
captures the true median very closely (as 7 is
captured from 6 to 7)
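To put numbers on the difference, the widths of the four quoted intervals can be compared directly. This sketch only subtracts the endpoints given in the slides; the dictionary layout is an arbitrary choice for illustration.

```python
# Interval endpoints as quoted in the slides.
intervals = {
    ("Sample 1", "mean"): (6.358, 7.842),
    ("Sample 1", "median"): (6, 7),
    ("Sample 2", "mean"): (6.595, 10.205),
    ("Sample 2", "median"): (6, 7),
}

# Width of each interval, rounded to avoid floating-point noise.
widths = {key: round(high - low, 3) for key, (low, high) in intervals.items()}

for (sample, kind), width in widths.items():
    print(f"{sample} {kind} CI width: {width}")
```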
Conclusion
When data is skewed, a median test can
be much more useful than a mean test in
estimating the true parameter.