Central Tendency and Dispersion
Ungrouped Data -- population
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Spring 02
Central Tendency and Dispersion
Ungrouped Data -- sample
$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$
$$s^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}$$
Central Tendency and Dispersion
Other measures of central tendency
  Median
  Mode
Relating central tendency and dispersion
  Coefficient of variation
  $$V = \frac{\sigma}{\mu}$$
Problem
Salvatore 2.1 – quiz scores – ungrouped data
         7     3     7     9     7
         5     9     4     8     8
         6    10     8     2     3
         2     4     2     4     6
         8     5     3     7     7
         7     5     5     9     9
         6     4     6     4    10
         7     6     7     6     5
Sum     48    46    42    49    55     Total: 240
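As a rough check on the ungrouped formulas, the sketch below computes the mean and variance of the 40 quiz scores directly. The function names `mean` and `variance` and the `sample` flag are illustrative choices, not part of the slides.

```python
import math

# Quiz scores from the problem, entered column by column (40 observations).
scores = [
    7, 5, 6, 2, 8, 7, 6, 7,
    3, 9, 10, 4, 5, 5, 4, 6,
    7, 4, 8, 2, 3, 5, 6, 7,
    9, 8, 2, 4, 7, 9, 4, 6,
    7, 8, 3, 6, 7, 9, 10, 5,
]

def mean(xs):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(xs) / len(xs)

def variance(xs, sample=False):
    """Population variance by default; divides by N - 1 when sample=True."""
    m = mean(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (len(xs) - 1 if sample else len(xs))

mu = mean(scores)
var_pop = variance(scores)
sd_pop = math.sqrt(var_pop)
print(mu, var_pop, round(sd_pop, 2))
```

Treating the 40 scores as a population divides the sum of squared deviations by N; passing `sample=True` divides by N - 1 instead.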
Problem
Salvatore 2.1 – quiz scores – grouped data
 2  2  2
 3  3  3
 4  4  4  4  4
 5  5  5  5  5
 6  6  6  6  6  6
 7  7  7  7  7  7  7  7
 8  8  8  8
 9  9  9  9
10 10
Problem
Salvatore 2.1 – quiz scores – grouped data
X       f     f*X
2       3       6
3       3       9
4       5      20
5       5      25
6       6      36
7       8      56
8       4      32
9       4      36
10      2      20
Sum           240
Central Tendency and Dispersion
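Grouped Data -- population
$$\mu = \frac{\sum_{i=1}^{n} f_i X_i}{N}$$
$$\sigma^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \mu)^2}{N}$$
Here n is the number of distinct values (classes) and N is the total number of observations, the sum of the f_i.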
Central Tendency and Dispersion
Grouped Data -- sample
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{N}$$
$$s^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{N - 1}$$
Problem
Salvatore 2.1 – quiz scores – grouped data
X       f     X-Xmean   (X-Xmean)^2   f*(X-Xmean)^2
2       3       -4          16             48
3       3       -3           9             27
4       5       -2           4             20
5       5       -1           1              5
6       6        0           0              0
7       8        1           1              8
8       4        2           4             16
9       4        3           9             36
10      2        4          16             32
Sum                                       192
Variance                                  4.8
Stdev                                     2.19
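The grouped-data calculation in the table can be reproduced from the frequency table alone. This is a minimal sketch that treats the 40 scores as a population; the names `freq`, `grouped_mean`, and `grouped_variance` are illustrative.

```python
import math

# Frequency table from the problem: score -> number of students.
freq = {2: 3, 3: 3, 4: 5, 5: 5, 6: 6, 7: 8, 8: 4, 9: 4, 10: 2}

def grouped_mean(table):
    """Sum of f * X divided by the total number of observations."""
    n_obs = sum(table.values())
    return sum(f * x for x, f in table.items()) / n_obs

def grouped_variance(table, sample=False):
    """Sum of f * (X - mean)^2 divided by N (or N - 1 for a sample)."""
    n_obs = sum(table.values())
    m = grouped_mean(table)
    ss = sum(f * (x - m) ** 2 for x, f in table.items())
    return ss / (n_obs - 1 if sample else n_obs)

g_mean = grouped_mean(freq)
g_var = grouped_variance(freq)
g_sd = math.sqrt(g_var)
print(g_mean, g_var, round(g_sd, 2))
```

The result agrees with the ungrouped calculation because each distinct score is weighted by the number of times it occurs.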
Probability
If event A can occur in nA ways out of a total of N
possible and equally likely outcomes, the probability
that event A will occur is given by:
P(A) = nA/N
What is the probability of flipping a fair coin and
getting a head? a tail?
What is the probability of rolling a four with a fair die?
What is the probability that a woman who is pregnant
will have a boy?
Probability of Multiple Events
Probability of either of two events happening
  Mutually exclusive (one event precludes the occurrence of another)
    What is the probability that when rolling two dice, I roll a seven or eleven?
    P(A or B) = P(A) + P(B)
  Not mutually exclusive (one event does not preclude the occurrence of the other)
    What is the probability that a flipped coin is heads or a rolled die is a 4?
    P(A or B) = P(A) + P(B) – P(A and B)
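Both addition rules can be checked by enumerating equally likely outcomes. The sketch below assumes fair dice and a fair coin; the helper `prob` is a hypothetical name that simply applies P(A) = nA/N.

```python
from fractions import Fraction
from itertools import product

def prob(event, outcomes):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# All 36 equally likely outcomes of rolling two dice.
rolls = list(product(range(1, 7), repeat=2))

# Mutually exclusive: a total of 7 and a total of 11 cannot both occur.
p_seven = prob(lambda r: sum(r) == 7, rolls)
p_eleven = prob(lambda r: sum(r) == 11, rolls)
p_seven_or_eleven = p_seven + p_eleven

# Not mutually exclusive: the coin can be heads while the die shows a 4.
coin_and_die = list(product("HT", range(1, 7)))
p_heads = prob(lambda o: o[0] == "H", coin_and_die)
p_four = prob(lambda o: o[1] == 4, coin_and_die)
p_both = prob(lambda o: o == ("H", 4), coin_and_die)
p_heads_or_four = p_heads + p_four - p_both

print(p_seven_or_eleven, p_heads_or_four)
```

Subtracting P(A and B) in the second case removes the one outcome (heads with a 4) that would otherwise be counted twice.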
Probability of Multiple Events
Probability of two events happening at the same time
  Independent events (event A is not at all connected to event B)
    What is the probability that if I flip a coin twice, I will get heads both times?
    P(A and B) = P(A)*P(B)
  Dependent events (event A is connected in some way to event B)
    What is the probability that drawing one card from a deck of 52 playing cards, I draw the ace of hearts?
    P(A and B) = P(A)*P(B|A)
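The two multiplication rules can be worked with exact fractions. In this sketch the card example is split into A = "the card is an ace" and B = "the card is a heart", which is one assumed reading of the slide's question.

```python
from fractions import Fraction

# Independent events: two flips of a fair coin.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads                # P(A and B) = P(A) * P(B)

# Conditional form: A = "card is an ace", B = "card is a heart".
p_ace = Fraction(4, 52)
p_heart_given_ace = Fraction(1, 4)             # one of the four aces is a heart
p_ace_of_hearts = p_ace * p_heart_given_ace    # P(A and B) = P(A) * P(B|A)

print(p_two_heads, p_ace_of_hearts)
```

The second result matches the direct count of one ace of hearts among 52 cards.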
Binomial Probability
Discrete vs. continuous distributions
The probability of X number of occurrences or
successes of an event, P(X), in n trials of the
same experiment when
1. There are only two possible and mutually exclusive outcomes.
2. The n trials are independent.
3. The probability of success remains constant in each trial.
Binomial Probability
The probability of X successes is given by:
$$P(X) = \frac{n!}{X!\,(n - X)!}\, p^{X} (1 - p)^{n - X}$$
Binomial Probability
What is the probability that in a family of four
children, there are four boys?
P(Boy and Boy and Boy and Boy) = ½ * ½ * ½ * ½ = 1/16 = .0625
According to the binomial formula:
$$P(X = 4) = \frac{4!}{4!\,0!} (.5)^4 (.5)^0 = .5^4 = .0625$$
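The binomial formula translates almost directly into code. The sketch below assumes p = 0.5 for a boy, as the slide does; `binomial_pmf` is an illustrative name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Four boys in a family of four children.
p_four_boys = binomial_pmf(4, 4, 0.5)

# Full distribution of the number of boys (0 through 4).
dist = [binomial_pmf(x, 4, 0.5) for x in range(5)]
print(p_four_boys, dist)
```

The probabilities in `dist` sum to one, which is a quick sanity check on any binomial calculation.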
Normal Distribution
A continuous random variable is one that can assume an
infinite number of values within any given interval.
The probability that X falls within any interval is given
by the area under the probability density curve over that interval.
The normal distribution (bell shaped and symmetric) is
the most commonly used.
Standard Normal Distribution
The standard normal distribution is a normal
distribution with a mean of zero and a standard
deviation of one.
  P(-1<z<+1) ≈ 68%
  P(-2<z<+2) ≈ 95%
  P(-3<z<+3) ≈ 99.7%
http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html
http://www.math.csusb.edu/faculty/stanton/m262/normal_distribution/normal_distribution.html
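The three areas can be computed rather than read from a table. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); `central_area` is an illustrative helper name.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)

def central_area(k):
    """Area under the standard normal curve between -k and +k."""
    return z.cdf(k) - z.cdf(-k)

areas = {k: central_area(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"P(-{k} < z < +{k}) = {area:.4f}")
```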
Problems
If grades are normally distributed with a mean of 75 and
a variance of 25, what is the probability that a student’s
grade will fall between 80 and 90?
If the demand for ice cream cones in January is
normally distributed with a mean of 100 cones/week
and a standard deviation of 15,
  what is the probability that demand is less than 90?
  If the owner wishes to turn customers away less than
  5% of the time, how many cones should she be prepared
  to make?
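One way to approach these problems is to let `statistics.NormalDist` do the standardizing. The sketch below is a worked attempt under the stated parameters, not an official solution. It reads "turn customers away less than 5% of the time" as stocking the 95th percentile of weekly demand.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Grades: mean 75, variance 25, so the standard deviation is 5.
grades = NormalDist(mu=75, sigma=sqrt(25))
p_80_to_90 = grades.cdf(90) - grades.cdf(80)

# Ice cream demand: mean 100 cones/week, standard deviation 15.
demand = NormalDist(mu=100, sigma=15)
p_below_90 = demand.cdf(90)

# Stock level exceeded by demand only 5% of the time, rounded up to whole cones.
cones_needed = ceil(demand.inv_cdf(0.95))

print(round(p_80_to_90, 4), round(p_below_90, 4), cones_needed)
```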
Statistical Inference
Refers to estimation and hypothesis testing.
Estimation is the process of inferring
a population parameter (μ, σ) from
the corresponding statistic drawn from a sample.
Sampling Distribution of the Mean
$$\mu_{\bar{X}} = \mu$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
Central Limit Theorem
  As n approaches infinity, the sampling distribution of the
  sample mean approaches the normal distribution regardless
  of the distribution of the original population.
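A small simulation can make the sampling distribution concrete. The sketch below draws repeated samples from an exponential population with mean 1 and standard deviation 1, an arbitrary skewed population chosen for illustration, and compares the spread of the sample means with σ/√n. The seed, sample size, and replication count are assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(n, reps):
    """Means of `reps` samples of size n from an exponential population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

n = 50
means = sample_means(n, reps=2000)
mean_of_means = statistics.fmean(means)   # should be close to mu = 1
sd_of_means = statistics.pstdev(means)    # should be close to sigma / sqrt(n)
print(mean_of_means, sd_of_means, 1 / n ** 0.5)
```

A histogram of `means` would look roughly bell shaped even though the underlying population is strongly skewed.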
Interval Estimates/Confidence Intervals
If n > 30 and n > .05N
$$\mu = \bar{X} \pm z \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
If n > 30 and n < .05N
$$\mu = \bar{X} \pm z \frac{s}{\sqrt{n}}$$
Interval Estimates/Confidence Intervals
If n < 30 and population is normally distributed
$$\mu = \bar{X} \pm t \frac{s}{\sqrt{n}}$$
If n < 30 and population is not normally distributed
  by Chebyshev's theorem, the probability of observations
  falling within k standard deviations of the mean is at least
  $$1 - \frac{1}{k^2}$$
  given that k is greater than or equal to 1.
Problems
A random sample of 49 with a mean of 80 and a
standard deviation of 42 is taken from a
population of 1000. Find the interval estimate
for the population mean such that we are 80
percent confident that it includes the population
mean.
A random sample of 64 with a mean of 50 and a
standard deviation of 20 is taken from a
population of 800. Find the 90 percent
confidence interval.
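The large-sample interval, with the finite population correction applied only when n > .05N, might be sketched as follows. The function name `z_interval` and its argument order are illustrative. The z value comes from `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, s, n, N, confidence):
    """Confidence interval for the population mean when n > 30."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = s / sqrt(n)
    if n > 0.05 * N:
        # Sample is more than 5% of the population: apply the correction.
        se *= sqrt((N - n) / (N - 1))
    half_width = z * se
    return mean - half_width, mean + half_width

ci_first = z_interval(80, 42, 49, 1000, 0.80)   # n = 49 is below .05N = 50
ci_second = z_interval(50, 20, 64, 800, 0.90)   # n = 64 is above .05N = 40
print(ci_first, ci_second)
```

Only the second problem triggers the correction factor, which narrows its interval slightly.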
Problems
A random sample of 9 lighting components with
a mean of 9 months and a standard deviation of
4 months is taken from a population which is
known to be normally distributed.
  What is the 90% CI of the population mean?
  What is the 95% CI of the population mean?
  What is the 99% CI of the population mean?
  If n=25, what is the 90% CI of the population mean?
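For the small-sample case, the Python standard library has no Student's t distribution, so this sketch hard-codes the critical values one would look up in a standard t table. The dictionary `T_CRIT` and the function `t_interval` are illustrative names, and the n=25 case assumes the same sample mean and standard deviation.

```python
from math import sqrt

# Two-tailed critical values copied from a standard Student's t table,
# keyed by (degrees of freedom, confidence level).
T_CRIT = {
    (8, 0.90): 1.860,
    (8, 0.95): 2.306,
    (8, 0.99): 3.355,
    (24, 0.90): 1.711,
}

def t_interval(mean, s, n, confidence):
    """Confidence interval for the mean with n < 30 and a normal population."""
    t = T_CRIT[(n - 1, confidence)]
    half_width = t * s / sqrt(n)
    return mean - half_width, mean + half_width

for n, conf in [(9, 0.90), (9, 0.95), (9, 0.99), (25, 0.90)]:
    lo, hi = t_interval(9, 4, n, conf)
    print(f"n={n}, {conf:.0%} CI: {lo:.2f} to {hi:.2f}")
```

The interval widens as the confidence level rises and narrows as the sample size grows.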
Testing Hypothesis
Formal steps
  Set null and alternate hypotheses
    Ho: μ = μ0
    Ha: μ ≠ μ0
  Set level of significance
  Take a random sample; compute the sample mean;
  test the null hypothesis.
Errors
Type I Error: rejecting a true hypothesis
  Probability of a Type I Error is α
  Level of significance is α; level of confidence is 1 - α
Type II Error: accepting a false hypothesis
  Probability of a Type II Error is β
  For a given sample size, α can only be reduced at the expense of β
Errors
              Ho is True      Ho is False
Accept Ho     Correct         Type II Error
Reject Ho     Type I Error    Correct
Problems
USP specifies that a certain drug be effective for
at least 37 hours. The standard deviation is
known to be 11 hours. A shipment of this drug
will be accepted or rejected on the basis of a
random sample of 100.
  What decision rule should be used if the maximum
  probability of erroneously rejecting the shipment is
  to be 10% (i.e., α = .10)?
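A possible way to set up the decision rule is as a one-tailed (lower-tail) test of Ho: μ = 37 against Ha: μ < 37. That framing is an assumption about what the problem intends. No finite population correction is applied because the shipment size is not given.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 37, 11, 100, 0.10

se = sigma / sqrt(n)                     # standard error of the sample mean
z_crit = NormalDist().inv_cdf(alpha)     # lower-tail critical z (negative)
cutoff = mu0 + z_crit * se               # reject the shipment below this mean

print(f"Reject the shipment if the sample mean is below {cutoff:.2f} hours")
```

Under this setup the shipment is accepted whenever the sample mean is at or above the cutoff.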
Problems
Over the past 10 years, the Snow Mountain Ski Resort
has averaged 120 skiers/day during the winter season
(130 days) with a standard deviation of 10. In a
random sample of 50 days during the most recent ski
season, the mean number of skiers was 118/day.
  Assuming α = 0.05, would you conclude that the average
  number of skiers per day has changed?
  If the decision rule were as shown below, what would be the
  level of significance:
  $$117 < \bar{X} < 123$$
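The sketch below works both parts as a two-tailed z test. It assumes the 130-day season is the population, so the finite population correction applies because n = 50 exceeds .05N. That reading of the problem is an assumption.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, N = 120, 10, 130        # historical mean, std dev, season length
n, xbar, alpha = 50, 118, 0.05      # sample size, sample mean, significance

# Standard error with the finite population correction.
se = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

z = (xbar - mu0) / se
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > z_crit

# Significance level implied by the acceptance region 117 < X-bar < 123.
z_rule = (123 - mu0) / se
alpha_rule = 2 * (1 - NormalDist().cdf(z_rule))

print(round(z, 2), reject, round(alpha_rule, 4))
```

Without the correction factor the standard error would be larger and both answers would change, so the treatment of N matters here.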
Differences between Means
Many times one is faced with determining
whether the means of two populations are the
same.
Take a random sample of each population.
You will accept the hypothesis that the means
are equal only if the difference can be attributed
to chance.
Differences between Means
If the two populations are normally distributed
(or if n1 and n2 ≥ 30), then the sampling
distribution of the difference between means is
also normal with the standard error:
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Difference between Means
One can test for a difference between means as
follows
  Null and Alternate Hypothesis
    Ho: μ1 = μ2
    Ha: μ1 ≠ μ2
  Test statistic is
  $$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$
Problems
The Dairy Fresh Milk Company felt that two of
its markets exhibited equivalent sales patterns.
Area            1       2
Mean Cons.      1500    1465
s               140     120
Sample Size     100     150
Justify the proposition at the 2% level of
significance.
Problems
A random sample of 100 of the entering first-year students at a particular college in 2001 has
a mean SAT score of 950 and s=50. In 2000, a
random sample of 100 had a mean SAT score of
975 and s=58.
  Are the entering first-year students in 2001
  academically better than those in 2000? Test
  at α = .05.
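Both difference-of-means problems can be sketched with one helper. Because the samples are large, the sample standard deviations stand in for σ1 and σ2, which is an assumption. The milk problem is treated as two-tailed and the SAT problem as one-tailed with Ha: μ2001 > μ2000, which is also an assumed reading.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1, s1, n1, x2, s2, n2):
    """z statistic for Ho: mu1 = mu2 using the standard error of the difference."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x1 - x2) / se

# Dairy Fresh: two-tailed test at the 2% level.
z_milk = two_sample_z(1500, 140, 100, 1465, 120, 150)
z_crit_milk = NormalDist().inv_cdf(1 - 0.02 / 2)
reject_equal_sales = abs(z_milk) > z_crit_milk

# SAT scores: one-tailed test at the 5% level, 2001 minus 2000.
z_sat = two_sample_z(950, 50, 100, 975, 58, 100)
z_crit_sat = NormalDist().inv_cdf(1 - 0.05)
conclude_2001_better = z_sat > z_crit_sat

print(round(z_milk, 2), reject_equal_sales, round(z_sat, 2), conclude_2001_better)
```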
Chi-Squared Test
Goodness of Fit
Tests
  If observed frequency differs significantly from
  expected frequency when more than two outcomes
  are possible
  If sampled distribution is binomial, normal, or other
  If two variables are independent
The sum of squares of n independent standard
normal random variables N(0,1) is distributed as
Chi-Squared with n degrees of freedom.
Chi-Squared Test
$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e}$$
where
  f0 is the observed frequency
  fe is the expected frequency
  df = c-m-1
    c = number of categories
    m = number of population parameters estimated from the sample
Test of Proportions
There was a survey undertaken in 1994, 1997, and
2001 to determine the number of women with children
in school who were working:
                1994    1997    2001
Working          410     412     409
Not Working      252     176     151
On the basis of this study, should one reject the
hypothesis that the proportion of women who worked
has remained constant during the study? α=0.05
Tests of Goodness of Fit
Suppose the Department of Defense believes that the
probability distribution of the number of submarine
parts of a certain type that will fail during a mission is
as shown below and the data for 500 missions is
observed as follows:
Number of Failures    Theoretical Probability    Observed Frequency
0                     .368                       190
1                     .368                       180
2                     .184                        90
3                     .061                        30
4 or more             .019                        10
Can one reject the hypothesis that the data follow the expected
distribution? α=0.05
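The chi-squared statistic for both problems can be computed with a few lines. The critical values below are copied from a standard chi-squared table at α = 0.05, since the standard library does not provide them. The goodness-of-fit test uses df = 5 - 0 - 1 = 4, assuming no parameters were estimated from the data. The proportions test uses df = (2 - 1)(3 - 1) = 2.

```python
def chi_square(observed, expected):
    """Sum of (f0 - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Goodness of fit: submarine part failures over 500 missions.
probs = [0.368, 0.368, 0.184, 0.061, 0.019]
observed = [190, 180, 90, 30, 10]
expected = [p * sum(observed) for p in probs]
chi2_fit = chi_square(observed, expected)

# Test of proportions: working status by survey year.
table = [[410, 412, 409], [252, 176, 151]]
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
grand = sum(row_tot)
flat_obs = [table[i][j] for i in range(2) for j in range(3)]
flat_exp = [row_tot[i] * col_tot[j] / grand for i in range(2) for j in range(3)]
chi2_prop = chi_square(flat_obs, flat_exp)

# Critical values from a chi-squared table, alpha = 0.05.
CRIT_DF4, CRIT_DF2 = 9.488, 5.991
print(round(chi2_fit, 3), chi2_fit > CRIT_DF4)
print(round(chi2_prop, 2), chi2_prop > CRIT_DF2)
```

Expected frequencies for the proportions test come from the row and column totals, which is what "constant proportion" implies under the null hypothesis.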
Analysis of Variance
Used to test the hypothesis that the means of
more than two populations are equal or different
when the populations are normally distributed
with equal variance.
Analysis of Variance
Estimate the population variance from the
variance between the sample means (MSA)
Estimate the population variance from the
variance within the samples (MSE)
Compute the F ratio:
$$F = \frac{MSA}{MSE}$$
If F>Fcrit, reject the null hypothesis
If F<Fcrit, accept the null hypothesis
Analysis of Variance
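Source of Variation   Sum of Squares                                                  Df        MS                         F
Between groups        $SSA = r \sum_j (\bar{X}_j - \bar{\bar{X}})^2$                  c-1       $MSA = SSA / (c - 1)$      $MSA / MSE$
Within groups         $SSE = \sum_i \sum_j (X_{ij} - \bar{X}_j)^2$                    (r-1)c    $MSE = SSE / ((r - 1)c)$
Total                 $SST = \sum_i \sum_j (X_{ij} - \bar{\bar{X}})^2 = SSA + SSE$    rc-1
Here r is the number of observations in each sample and c is the number of samples.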
ANOVA Problem
A thread manufacturer wants to determine whether the mean
strength of thread produced by 3 different types of machines is
different when raw material A is used on each machine. Four
pieces of thread are produced on each machine with the
following results:
I       II      III
50      41      49
51      40      47
51      39      45
52      40      47
Test whether the mean strength of thread is equal at α=.05
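The ANOVA table can be filled in for this problem with a short function. This is a sketch for equal-sized samples only. The name `one_way_anova` is illustrative, and the critical value is copied from a standard F table for α = .05 with 2 and 9 degrees of freedom.

```python
# Thread strength by machine type, four pieces per machine.
machines = {
    "I": [50, 51, 51, 52],
    "II": [41, 40, 39, 40],
    "III": [49, 47, 45, 47],
}

def one_way_anova(samples):
    """Return SSA, SSE, MSA, MSE and F for c samples of equal size r."""
    c = len(samples)
    r = len(samples[0])
    grand_mean = sum(sum(s) for s in samples) / (r * c)
    means = [sum(s) / r for s in samples]
    ssa = r * sum((m - grand_mean) ** 2 for m in means)             # between groups
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)  # within groups
    msa = ssa / (c - 1)
    mse = sse / ((r - 1) * c)
    return ssa, sse, msa, mse, msa / mse

ssa, sse, msa, mse, f_ratio = one_way_anova(list(machines.values()))

F_CRIT = 4.26  # from an F table: alpha = .05, df = (2, 9)
print(ssa, sse, msa, round(mse, 3), round(f_ratio, 1), f_ratio > F_CRIT)
```

Since the F ratio far exceeds the critical value, this calculation points toward rejecting the hypothesis of equal mean strength.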