The normal distribution
• μ is called a location parameter because it indicates where the graph is
centered or positioned.
• σ determines the shape (spread) of f.
Standard normal distribution
If a distribution is normal, the percentiles of the data are known: 68.27%,
95.45% and 99.73% of the data fall within one, two and three standard
deviations of the mean, respectively.
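The coverage within one, two and three standard deviations can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `coverage_within` is an illustrative choice, not part of the slides.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} sd: {coverage_within(k):.4%}")
```

Because the standard normal is used, the same proportions hold for any normal distribution, whatever its mean and standard deviation.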
Standard normal distribution
For each different μ and σ, we will have a different distribution. But if
each distribution depends on its μ and σ, how can we compare different
distributions? We would have to make a table for each and every
possible μ and σ. To solve this problem, the normal distribution is
converted to a standard normal distribution: a normal distribution with
a mean of 0 and a standard deviation of 1.
Let X be a normal random variable with mean μ and standard
deviation σ. The transformation
Z = (X - μ) / σ
expresses X as the standard normal random variable Z with μ = 0 and σ = 1
Standard normal distribution
Suppose that the scores on an aptitude test are normally distributed with a
mean of 100 and a standard deviation of 10. (Some of the original IQ tests were
purported to have these parameters.) What is the probability that a randomly
selected score is below 90?
Standard normal distribution
Z = (X - μ) / σ = (90 - 100) / 10 = -1.0
Thus a score of 90 can be represented as 1 standard deviation below the mean.
P(X < 90) = P(Z < -1.0)
Table C.3 catalogues the CDF for the standard normal distribution from Z =
-3.99 to Z = 3.99 in increments of 0.01.
P(X < 90) = P(Z < -1.0) = 0.1587 (from Table C.3 we use the row marked -1.0
and the column marked 0.00)
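The table lookup can be reproduced in code. This is a minimal sketch using the standard-library `statistics.NormalDist` in place of Table C.3; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 100.0, 10.0   # test-score parameters from the example
x = 90.0                  # score of interest

# Standardize: how many standard deviations is x from the mean?
z = (x - mu) / sigma

# The CDF of the standard normal plays the role of the printed table.
p_below = NormalDist().cdf(z)

print(f"z = {z:.2f}, P(X < {x:.0f}) = {p_below:.4f}")
```

Calling `NormalDist(mu, sigma).cdf(x)` directly gives the same probability without the explicit standardization step.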
Sampling distributions
The three major sampling distributions:
t distribution
χ² distribution (chi-square distribution)
F distribution
Distribution of the sample mean
When sampling from a normally distributed population with mean μ and
variance σ², the distribution of the sample mean (the sampling distribution)
will have the following attributes:
1. The distribution of the X̄'s will be normal.
2. The mean of the X̄'s equals the population mean: μ_X̄ = μ
3. σ_X̄ = σ / √n is the population standard error of the mean
Distribution of the sample mean
The mean blood cholesterol concentration of a large population of adult
males (50-60 years old) is 200 mg/dl with a standard deviation of 20 mg/dl.
Assume that blood cholesterol measurements are normally distributed.
What is the probability that a randomly selected individual from this age
group will have a blood cholesterol level below 250 mg/dl?
Solution. Apply the standard normal transformation.
P(X < 250) = P(Z < (250 - 200) / 20) = P(Z < 2.5) = F(2.5) = 0.9938 (Table C.3)
Distribution of the sample mean
What is the probability that a randomly selected individual from this age
group will have a blood cholesterol level above 225 mg/dl?
Solution. Apply the standard normal transformation.
P(X > 225) = P(Z > (225 - 200) / 20) = P(Z > 1.25) = 1 - F(1.25) = 1 - 0.8944 = 0.1056
Distribution of the sample mean
What is the probability that the mean of a sample of 100 men from this age
group will have a value below 204 mg/dl?
Solution.
μ_X̄ = μ = 200 mg/dl
σ_X̄ = σ / √n = 20 / √100 = 2.0 mg/dl
P(X̄ < 204) = P(Z < (204 - 200) / 2.0) = P(Z < 2.0) = F(2.0) = 0.9772
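The three cholesterol calculations differ only in which standard deviation is used: σ for a single individual, σ/√n for a sample mean. A minimal sketch with the standard library follows; the helper names `standard_error` and `prob_below` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MU = 200.0     # population mean, mg/dl (from the example)
SIGMA = 20.0   # population standard deviation, mg/dl (from the example)

def standard_error(sigma: float, n: int) -> float:
    """Population standard error of the mean for samples of size n."""
    return sigma / sqrt(n)

def prob_below(value: float, mu: float, sd: float) -> float:
    """P(X < value) for a normal variable with mean mu and standard deviation sd."""
    return NormalDist(mu, sd).cdf(value)

# Single individual: use the population standard deviation.
p_individual_below_250 = prob_below(250.0, MU, SIGMA)
p_individual_above_225 = 1.0 - prob_below(225.0, MU, SIGMA)

# Mean of 100 men: use the standard error instead.
se_100 = standard_error(SIGMA, 100)
p_mean_below_204 = prob_below(204.0, MU, se_100)

print(p_individual_below_250, p_individual_above_225, se_100, p_mean_below_204)
```

Note how much narrower the sampling distribution is: a value only 4 mg/dl above the mean is already two standard errors away when n = 100.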
Distribution of the sample mean
If a group of 25 older men who are strict vegetarians have a mean blood
cholesterol level of 188 mg/dl, would you say that vegetarianism significantly
lowers blood cholesterol levels? Explain.
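Solution.
Z = (X̄ - μ) / (σ / √n) = (188 - 200) / (20 / √25) = -12 / 4 = -3.0
P(X̄ < 188) = P(Z ≤ -3.0) = F(-3.0) = 0.0013 (Table C.3)
A sample mean this low would be very unlikely if these men had the same mean
as the general population, so diet may affect blood cholesterol levels.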
Distribution of the sample mean
Portions of prepared luncheon meats should have pH values with a mean of
5.6 and a standard deviation of 1.0. The usual quality control procedure is to
randomly sample each consignment of meat by testing the pH value of 25
portions. The consignment is rejected if the pH value of the sample mean
exceeds 6.0. What is the probability of a consignment being rejected?
Solution.
Z = (X̄ - μ) / (σ / √n) = (6.0 - 5.6) / (1.0 / √25) = 0.4 / 0.2 = 2.0
P(X̄ > 6.0) = P(Z > 2.0) = 1 - F(2.0) = 1 - 0.9772 = 0.0228
Only 2.28% of consignments that meet the specification will be rejected using
the quality control procedure above.
Q: 5%?
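One possible reading of the 5% question is: what cutoff for the sample mean would reject 5% of consignments that meet the specification? The sketch below first reproduces the rejection probability for the 6.0 cutoff and then, under that assumed reading, inverts the normal CDF to find the 5% cutoff. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5.6, 1.0, 25   # pH specification and sample size from the example

# Sampling distribution of the mean pH of 25 portions.
se = sigma / sqrt(n)
sampling_dist = NormalDist(mu, se)

# Probability that the sample mean exceeds the rejection threshold of 6.0.
p_reject = 1.0 - sampling_dist.cdf(6.0)

# Assumed reading of "5%?": the threshold exceeded by 5% of sample means.
cutoff_5pct = sampling_dist.inv_cdf(0.95)

print(f"P(reject) = {p_reject:.4f}, 5% cutoff = {cutoff_5pct:.3f}")
```

Under this reading the cutoff falls slightly below 6.0, at roughly 1.645 standard errors above the mean.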
Distribution of the sample mean
population standard error: σ_X̄ = σ / √n
sample standard error: s_X̄ = s / √n
Z(u) distribution and t distribution
Z = (X̄ - μ) / σ_X̄ is distributed as a normal distribution (Z distribution) with μ =
0 and σ = 1
t = (X̄ - μ) / s_X̄ is distributed as a t distribution with μ = 0 and σ depending on the
sample size.
The t distributions are symmetric and bell-shaped like the normal distribution but a
little flatter, i.e., they have a larger standard deviation. The degrees of freedom are
the sample size minus 1: df = n - 1.
t-distribution & Student’s t-test
The t distribution was first presented by W. S. Gosset, who published it
under the pseudonym “Student” (1908), hence the common
reference to “Student’s t distribution” or “Student’s t test” (p. 383).
Example: one-tailed test for the hypotheses H0: μ ≥ 0 and HA: μ < 0
The data are weight changes of humans, tabulated after administration of a
drug proposed to result in weight loss. Each weight change (in kg) is the
weight after minus the weight before drug administration.
Weight changes (kg): 0.2, -0.5, -1.3, -1.6, -0.7, 0.4, -0.1, 0.0, -0.6, -1.1, -1.2, -0.8
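n = 12
Mean = -0.61 kg
Variance (s²) = 0.4008 kg²
s = 0.63 kg
Standard error of the mean = s / √n = 0.63 / √12 = 0.18 kg
t = (mean - 0) / standard error ≈ -3.3
t 0.05(1), 11 = 1.796
If t ≤ -t 0.05(1), 11, reject H0
Conclusion: t ≈ -3.3 ≤ -1.796, so reject H0 and accept HA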
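The one-tailed test can be reproduced from the raw weight changes. This is a sketch using only the standard library; the critical value 1.796 is taken from the slide rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Weight changes in kg (after minus before), as listed in the example.
weight_changes = [0.2, -0.5, -1.3, -1.6, -0.7, 0.4,
                  -0.1, 0.0, -0.6, -1.1, -1.2, -0.8]

mu_0 = 0.0          # hypothesized mean under H0
t_critical = 1.796  # one-tailed critical value for alpha = 0.05, df = 11 (from the slide)

n = len(weight_changes)
x_bar = mean(weight_changes)
s = stdev(weight_changes)   # sample standard deviation (n - 1 in the denominator)
se = s / sqrt(n)            # sample standard error of the mean
t = (x_bar - mu_0) / se

# Lower-tailed test: reject H0 when t is at or below the negative critical value.
reject_h0 = t <= -t_critical

print(f"n = {n}, mean = {x_bar:.2f}, s^2 = {s**2:.4f}, t = {t:.2f}, reject H0: {reject_h0}")
```

Working from the unrounded mean and standard deviation gives t of about -3.33, well beyond -1.796.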
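Chi-square distribution
• The standard normal deviate U follows a normal distribution with mean 0 and
standard deviation 1. Suppose a random sample of size n is drawn from this
population, with sample values u1, u2, …, un.
• Then the random variable
χ² = u1² + u2² + … + un²
follows a chi-square distribution with n degrees of freedom.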
Chi-square distribution
χ² = u1² + u2² + … + un² = (n-1)s²/σ², where ui = (xi - x̄)/σ is the deviation
of each observation from the sample mean in units of σ
If all possible samples of size n are drawn from a normal
population with a variance equal to σ² and for each of these
samples the value (n-1)s²/σ² is computed, these values will form a
sampling distribution called a χ² distribution with n-1 degrees of freedom.
The Greek letter “chi”, or χ, is pronounced like the “ky” in “sky”.
The degrees of freedom for the chi-square distribution are often
denoted by ν (nu, [nju:]).
Chi-square distribution
Chi-square (χ²) goodness of fit
Chi-square goodness of fit is widely used to infer whether the population from
which a sample of nominal data came conforms to a certain theoretical
distribution.
e.g., a plant geneticist may raise 100 progeny from a cross that is hypothesized
to result in a 3:1 phenotypic ratio of pink-flowered to white-flowered plants.
Perhaps a ratio of 84 pink: 16 white is observed, although out of this total of
100 roses, the geneticist’s hypothesis would predict a ratio of 75 pink: 25 white.
The question to be answered, then, is whether the observed frequencies deviate
significantly from the frequencies expected if the hypothesis were true.
Chi-square (χ²) goodness of fit
The following calculation of a statistic called chi-square is used as a measure of
how far a sample distribution deviates from a theoretical distribution:
χ² = Σ (Oi - Ei)² / Ei
Here, Oi is the frequency, or number of counts, observed in class i, Ei is
the frequency expected in class i if the null hypothesis is true, and the
summation is performed over all k categories of data.
Larger disagreement between observed and expected frequencies will
result in a larger χ² value. Thus, this type of calculation is referred to
as a measure of goodness of fit. A calculated χ² value can be as small as
zero, in the case of perfect fit.
Example:
Chi-square goodness of fit for two categories
Calculation of chi-square goodness of fit for k = 2 (e.g., data consisting of 100 flower
colors compared to a hypothesized color ratio of 3:1)
H0: The sample data came from a population having a 3:1 ratio of pink to white flowers
HA: The sample data came from a population not having a 3:1 flower color ratio
Categories (flower color)   Pink   White   n
Oi                          84     16      100
(Ei)                        (75)   (25)
degrees of freedom = ν = k - 1 = 2 - 1 = 1
χ² = (84 - 75)²/75 + (16 - 25)²/25 = 4.320
0.025 < P < 0.05. Therefore, reject H0 and accept HA
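The two-category calculation can be written as a small function. This is a sketch with a hypothetical helper name, `chi_square_gof`; the critical value 3.841 (α = 0.05, ν = 1) is the standard table value, hard-coded rather than computed.

```python
def chi_square_gof(observed, ratio):
    """Chi-square goodness of fit of observed counts against a hypothesized ratio.

    Returns the statistic, the expected counts and the degrees of freedom (k - 1).
    """
    n = sum(observed)
    total = sum(ratio)
    expected = [n * r / total for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected, len(observed) - 1

# 84 pink : 16 white observed, 3:1 hypothesized.
chi2, expected, df = chi_square_gof([84, 16], [3, 1])

CRITICAL_005_DF1 = 3.841   # table value for alpha = 0.05 with 1 degree of freedom
reject_h0 = chi2 > CRITICAL_005_DF1

print(f"expected = {expected}, chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

The same function accepts any number of categories, since the expected counts are derived from the ratio and the sample size.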
Example:
Chi-square goodness of fit for more than two categories
Calculation of chi-square goodness of fit for k = 4
H0: The sample came from a population having a 9:3:3:1 color pattern of flowers
HA: The sample came from a population not having a 9:3:3:1 color pattern of flowers
Categories (flower color)   Red margined   Red rayed   Rayed    Blizzard   n
Oi                          152            39          53       6          250
(Ei)                        (140.6)        (46.9)      (46.9)   (15.6)
ν = k - 1 = 4 - 1 = 3
χ² = 8.956
0.025 < P < 0.05. Therefore, reject H0 and accept HA
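The value 8.956 corresponds to expected frequencies rounded to one decimal place. The sketch below computes the statistic both ways, with the rounded expected counts shown on the slide and with exact expected counts derived from the 9:3:3:1 ratio, to show that rounding changes the statistic slightly but not the conclusion. The helper name is hypothetical.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [152, 39, 53, 6]
n = sum(observed)

# Expected counts as printed on the slide (rounded to one decimal).
expected_rounded = [140.6, 46.9, 46.9, 15.6]

# Exact expected counts from the hypothesized 9:3:3:1 ratio.
ratio = [9, 3, 3, 1]
expected_exact = [n * r / sum(ratio) for r in ratio]

chi2_rounded = chi_square(observed, expected_rounded)
chi2_exact = chi_square(observed, expected_exact)

print(f"rounded E: {chi2_rounded:.3f}, exact E: {chi2_exact:.3f}")
```

Both values fall between the table values for P = 0.05 and P = 0.025 at ν = 3 (7.815 and 9.348), so the decision is the same either way.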
Chi-square correction for continuity
Chi-square values obtained from actual data belong to a discrete (discontinuous)
distribution, whereas the theoretical χ² distribution is continuous. χ² values
calculated from discrete data (ν = 1 in particular) are often overestimated and
may therefore cause us to commit a Type I error with a probability greater than
the stated α.
The Yates correction should routinely be used when ν = 1:
χ²c = Σ (|Oi - Ei| - 0.5)² / Ei
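As an illustration, the Yates correction can be applied to the 84 pink : 16 white data from the two-category example. The correction subtracts 0.5 from each absolute deviation before squaring; the function name below is hypothetical.

```python
def chi_square_yates(observed, expected):
    """Chi-square with the Yates continuity correction (intended for 1 degree of freedom)."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

observed = [84, 16]
expected = [75.0, 25.0]   # 3:1 ratio applied to n = 100

chi2_uncorrected = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
chi2_corrected = chi_square_yates(observed, expected)

print(f"uncorrected = {chi2_uncorrected:.3f}, Yates-corrected = {chi2_corrected:.3f}")
```

The corrected statistic is smaller than the uncorrected one, which is how the correction counteracts the overestimation described above. Here it stays just above 3.841, the table value for α = 0.05 with ν = 1.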
The log-likelihood ratio (G-test)
The χ² test is the traditional method for tests of goodness of fit (GOF). The G-test
is an alternative to the χ² test for analyzing frequencies. The two methods are
largely interchangeable.
The G-test is increasingly used because it is easier to calculate and because
mathematicians believe it has theoretical advantages in advanced applications.
G = 2 Σ O ln(O/E) (ln = natural logarithm)
The G-test statistic (G) uses the same tables as the χ² test.
The G-test is based on the principle that the ratio of two probabilities can be
used as a test statistic to measure the degree of agreement between sampled and
expected frequencies.
Williams (1976) recommends that G be used in preference to χ² whenever any
|O - E| > expected frequency.
The two methods often yield the same conclusions; when they do not, many
statisticians prefer the G-test and therefore recommend its routine use.
Example: G-test for more than two categories
H0: The sample came from a population having a 9:3:3:1 color pattern of flowers
HA: The sample came from a population not having a 9:3:3:1 color pattern of flowers
Categories (flower color)   Red margined   Red rayed   Rayed    Blizzard   n
Oi                          152            39          53       6          250
(Ei)                        (140.6)        (46.9)      (46.9)   (15.6)
ν = k - 1 = 4 - 1 = 3
G = 2 Σ O ln(O/E) = 10.807
0.010 < P < 0.025. Therefore, reject H0 and accept HA
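The G statistic for this example can be computed directly from the formula G = 2 Σ O ln(O/E). The sketch below evaluates it with the rounded expected counts from the slide and with exact expected counts from the 9:3:3:1 ratio; the function name `g_statistic` is hypothetical.

```python
from math import log

def g_statistic(observed, expected):
    """Log-likelihood ratio statistic G = 2 * sum(O * ln(O / E)).

    Categories with an observed count of zero contribute nothing to the sum.
    """
    return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)

observed = [152, 39, 53, 6]
n = sum(observed)

expected_rounded = [140.6, 46.9, 46.9, 15.6]            # as printed on the slide
ratio = [9, 3, 3, 1]
expected_exact = [n * r / sum(ratio) for r in ratio]    # 140.625, 46.875, 46.875, 15.625

g_rounded = g_statistic(observed, expected_rounded)
g_exact = g_statistic(observed, expected_exact)

print(f"G (rounded E) = {g_rounded:.3f}, G (exact E) = {g_exact:.3f}")
```

Both values lie between 9.348 and 11.345, the χ² table values for P = 0.025 and P = 0.010 at ν = 3.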
F distribution

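In probability theory and statistics, the F-distribution is a continuous
probability distribution that is widely used in likelihood-ratio tests,
especially in ANOVA. An F-distributed random variable is the ratio of two
chi-square distributed variables, each divided by its degrees of freedom:
F = (U1/d1) / (U2/d2)
• U1 and U2 are chi-square distributed with d1 and d2 degrees of freedom, respectively.
• U1 and U2 are independent.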

F distribution
F = (S1²/σ1²) / (S2²/σ2²)