Transcript
```3/30/2009
Chapter 6
Normal Probability Distribution

Random variable
A random variable has a single numerical value, determined by chance, for each outcome of a procedure.

Probability Distributions
Discrete random variables
take on a countable number of values (i.e., there are gaps between values).
SPECIAL discrete random variables
•Binomial distribution (Sections 5.3, 5.4)
•Geometric distribution
•Hypergeometric distribution
•Poisson distribution (Section 5.5)
Continuous random variables
can take an infinite number of values, and those values are densely packed together (i.e., there are no gaps between values).
SPECIAL continuous random variables
•Normal distribution
•Exponential distribution
•Uniform distribution
Binomial distribution
•Fixed number of trials
•There are only two possible outcomes: success or failure
•The trials are independent
•The probabilities of success and failure remain the same for every trial
Example: recording the genders of children in 250 families.
The mean is µ = np
The standard deviation is σ = √(np(1 − p)) = √(npq), where q = 1 − p

TI-83 Binomial Probability
Press 2nd VARS.
Select the option 0:binompdf(.
Complete the entry to obtain binompdf(n, p, x), with the appropriate values substituted in.
Example: What is the probability of getting exactly 2 heads in 4 tosses of a fair coin?
Solution: Using the TI-83 with binompdf(4, 0.5, 2), it follows that the probability of getting 2 heads on 4 throws is 0.375.
Continuous Random Variables
Continuous sample spaces contain infinitely many outcomes. They typically are intervals of possible, continuously distributed outcomes.
Ex.: Select ANY number between 0 and 1.
What is the sample space?
S = {all numbers between 0 and 1}
Ex.: Drink ANY volume of water from a 32-ounce bottle.
What is the sample space?
S = {0–32 ounces}

Poisson distribution
The random variable is the number of occurrences of some event over an interval.
Used for describing the behavior of rare events. Examples:
•Number of industrial accidents per month in a manufacturing plant
•Number of people arriving at a checkout in a day
•Number of eagles nesting in a region
•Number of patients arriving at an emergency room
The occurrences must be random and independent of each other, and uniformly distributed over the interval.
The mean is µ, and the standard deviation is σ = √µ.
Continuous Random Variables
A probability density function for a continuous random variable X is a nonnegative function, with total area 1 under its graph, such that the area below the graph between any two points a and b equals the probability that a ≤ X ≤ b.
Remember, AREA = PROPORTION = PROBABILITY

Special Continuous Probability Distributions
•Uniform distribution
•Normal distribution
•Exponential distribution
Uniform Distribution
1. Equally likely outcomes
2. Probability density:
   f(x) = 1/(b − a), for a ≤ x ≤ b
3. Mean & standard deviation:
   µ = (a + b)/2, σ = (b − a)/√12
[Figure: flat density f(x) of height 1/(b − a) between a and b; the mean and median coincide at the midpoint.]

Exponential Distribution
1. Describes time or distance between events
2. Density function:
   f(x) = λe^(−λx), for x ≥ 0
3. Parameters:
   µ = 1/λ, σ = 1/λ
[Figure: exponential density curves f(X) for λ = 0.5 and λ = 2.0.]

Normal Distribution
f(x) = (1/(σ√(2π))) · exp(−(x − µ)²/(2σ²))
[Figure: three normal curves labeled A, B, and C.]
A and B have the same center, but different standard deviations (shape).
A and C have the same standard deviation (shape), but different means (shifted).

Examples of normal random variables
•testosterone level of male students
•length of middle finger of Math 225 students
•test scores in Math 225
•height of all kindergarten kids at a school
Characteristics of normal distribution
Symmetric, bell-shaped curve.
Shape of curve depends on population mean µ and standard deviation σ.
Center of distribution is µ.
Most values fall around the mean, but some values are smaller and some are larger.
STANDARD NORMAL DISTRIBUTION:
Mean: µ = 0
Standard deviation: σ = 1

Bell-shaped curve
[Figure: two normal density curves, Mean = 70 SD = 5 and Mean = 70 SD = 10; x from 40 to 100, density from 0.00 to 0.08.]
Probabilities for Normal Distributions
Probability is area under curve!
P(c ≤ X ≤ d) = ∫_c^d f(x) dx
[Figure: density f(x) with the area between c and d shaded.]

Infinite Number of Tables
Normal distributions differ by mean & standard deviation.
Each distribution would require its own table.
[Figure: several normal curves f(X) with different means and standard deviations.]
Standardize the Normal Distribution
Z = (X − µ)/σ
[Figure: a normal distribution with mean µ and standard deviation σ maps to the standardized normal distribution with µ = 0 and σ = 1.]
One table!

To find probability follow these steps:
1. Draw the normal distribution and shade the area of interest.
2. Find the standardized score (z-score) for the given x: z = (x − µ)/σ.
3. Find the probability using the z-table or calculator.
Probability student scores higher than 75?
TI-83, 84: DISTR 2:normalcdf(
upper-tail: normalcdf(z,9999)
lower-tail: normalcdf(-9999,z)
Between part: normalcdf(z1,z2)
[Figure: normal density curves (x from 55 to 85) with shaded areas P(X > 75), P(X < 65), and P(65 < X < 70).]

To find x from given area follow these steps:
Find the LOWER tail probability INSIDE the table, and read off the corresponding z-score. OR: use DISTR 3:invNorm(
To find x use the formula: x = z ⋅ σ + µ
Parameter versus statistic
Population: the entire group of individuals in which we are interested but can't usually assess directly.
Sample: the part of the population we actually examine and for which we do have data.
A parameter is a number describing a characteristic of the population. Parameters are usually unknown.
A statistic is a number describing a characteristic of a sample. We often use a statistic to estimate an unknown population parameter.

Example
The Environmental Protection Agency took soil samples at 20 locations near a former industrial waste dump and checked each for evidence of toxic chemicals. They found no elevated levels of any harmful substances.
Population: ALL the soil near the waste dump
Sample: the 20 soil samples
Parameter: mean level of toxic chemicals in the ground around the waste dump
Statistic: the mean level of toxic chemicals in the 20 soil samples
Notation
Variable of interest: Categorical
Then we are interested in PROPORTION
Notation:
Population parameter: p
Sample statistic: p̂

Variable of interest: Quantitative
Then we are interested in MEAN
Notation:
Population parameter: µ
Sample statistic: x̄

Sampling Variability
When we take many samples, the statistics from the samples are usually different from the population figures, and also different from what we got in the first sample.
This very intuitive idea, that sample results change from sample to sample, is called sampling variability.
Sampling Distributions
1. Parameters are usually unknown, because it is impractical or impossible to know exactly what values a variable takes for every member of the population.
2. Statistics are computed from the sample, and vary from sample to sample due to sampling variability.

The sampling distribution of a statistic is the distribution of the values the statistic takes in all possible samples of the same size from the same population.

OK, we have the sampling distribution of the sample means. Then what?
[Figure: sampling distribution of the sample mean x̄, shown as a histogram of some sample averages.]

Sampling distributions, like data distributions, are best described by shape, center, and spread.
Shape: Many, but not all, sampling distributions are approximately normal.
Center: The mean will be denoted by µ with a subscript to indicate which sampling distribution is being discussed. For example, the mean of the sampling distribution of the mean is represented by the symbol µ_x̄. (The mean of the sample means.)
Spread: the standard deviation of the sampling distribution of the sample means is denoted σ_x̄.

Mean and standard error of the sampling distribution of the sample means
Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ. Then the sampling distribution of x̄ has
mean µ_x̄ = µ
and
standard deviation σ_x̄ = σ/√n
For any population with mean µ and standard deviation σ:
The mean, or center, of the sampling distribution of x̄ is equal to the population mean µ.
The standard deviation of the sampling distribution is σ/√n, where n is the sample size.
[Figure: population distribution and the sampling distribution of x̄, both centered at µ; the sampling distribution has standard deviation σ/√n.]

Mean of a sampling distribution of x̄
There is no tendency for a sample mean to fall systematically above or below µ, even if the distribution of the raw data is skewed. Thus x̄ is an unbiased estimator of the population mean µ: it will be “correct on average” in many samples.
Standard error of a sampling distribution of x̄
The standard deviation of the sampling distribution measures how much the sample statistic x̄ varies from sample to sample. It is smaller than the standard deviation of the population by a factor of √n. Averages are less variable than individual observations.

Generating Sampling Distributions
1. Take a random sample of a fixed size n from a population.
2. Compute the summary statistics (mean, proportion).
3. Repeat steps 1 and 2 many times.
4. Display the distribution of the summary statistics.
Example
Extensive studies have found that the DMS odor threshold of adults follows a roughly normal distribution with mean µ = 25 micrograms per liter and standard deviation σ = 7 micrograms per liter. With this information, we can simulate many runs of our study with different subjects drawn at random from the population. We take 1000 samples of size 10, find the 1000 sample mean thresholds x̄, and make a histogram of these 1000 values.

The results from the 1000 samples
1st SRS of size 10: x̄ = 36, s = 3.2
2nd SRS of size 10: x̄ = 22.8, s = 2.7
3rd SRS of size 10: x̄ = 30.4, s = 4.1
⋮
1000th SRS of size 10: x̄ = 28.9, s = 2.1
The sampling distribution of the statistic x̄
[Figure: histogram of the 1000 sample means (frequency 0–100, x̄ from 20 to 35).]
Shape: looks normal.
Center: the mean of the 1000 x̄'s is µ_x̄ = 25.073. The distribution is centered very close to the population mean µ = 25.
Spread: the standard deviation of the 1000 x̄'s is 2.191, notably smaller than the standard deviation σ = 7 of the population, and close to σ/√n = 7/√10 ≈ 2.21.

For normally distributed populations
When a variable in a population is normally distributed, then the sampling distribution of x̄ for all possible samples of size n is also normally distributed.
If the population is N(µ, σ), then the distribution of the sample means is N(µ, σ/√n).
[Figure: the population distribution and the narrower sampling distribution of the sample means.]
IQ scores: population vs. sample
In a large population of adults, the mean IQ is 112 with standard deviation 16. Suppose 100 adults are randomly selected for a market research campaign.
The distribution of the sample mean IQ is
A) exactly normal, mean 112, standard deviation 16.
B) approximately normal, mean 112, standard deviation 16.
C) approximately normal, mean 112, standard deviation 1.6.
D) approximately normal, mean 112, standard deviation 4.
Answer: C) approximately normal, mean 112, standard deviation 1.6.
Population distribution: N(µ = 112; σ = 16)
Sampling distribution for n = 100 is N(µ = 112; σ/√n = 1.6)

Application
Hypokalemia is diagnosed when blood potassium levels are low, below 3.5 mEq/L. Let's assume that we know a patient whose measured potassium levels vary daily according to a normal distribution N(µ = 3.8, σ = 0.2).
If only one measurement is made, what's the probability that this patient will be misdiagnosed as hypokalemic?
z = (x − µ)/σ = (3.5 − 3.8)/0.2 = −1.5, P(z < −1.5) = 0.0668 ≈ 7%
If instead measurements are taken on four separate days, what is the probability of such a misdiagnosis?
z = (x̄ − µ)/(σ/√n) = (3.5 − 3.8)/(0.2/√4) = −3, P(z < −3) = 0.0013 ≈ 0.1%
Note: Make sure to standardize (z) using the standard deviation for the sampling distribution.

But…
Not all variables are normally distributed. Income, for example, is typically strongly skewed.
Is x̄ still a good estimator of µ then?
The Central Limit Theorem will rescue us!

The Central Limit Theorem
VERY IMPORTANT!!!
When randomly sampling from any population with mean µ and standard deviation σ, when n is large enough, the sampling distribution of x̄ is approximately normal: N(µ, σ/√n).
[Figure: distribution of sample means.]
Central Limit Theorem
The Central Limit Theorem guarantees that the distribution of the sample mean is approximately normal as long as the sample size is large enough.
We will depend on the Central Limit Theorem again and again in order to take advantage of normal probability calculations when we use the sample mean to draw conclusions about the population mean, even if the population distribution is not normal.
The central limit theorem
There is no requirement on the shape of the population distribution. This is where the strength of the Central Limit Theorem lies. It tells us that regardless of the shape of the population distribution, averages that are based on a large enough sample will have an approximately normal distribution.
[Figure: a population with a strongly skewed distribution, and the sampling distributions of x̄ for n = 2, n = 10, and n = 25 observations.]
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
Assessing Normality
A normal probability plot is a graph with the original set of data on the x-axis and, on the y-axis, the z-score expected for each data value from the standard normal distribution (based on the value's rank in the sorted data).
If the points appear to lie reasonably close to a straight line and there does not appear to be a systematic pattern that is not a straight line, it is reasonable to conclude that the data came from a normally distributed population.
[Figure: normal probability plots of data from a normal distribution, a right-skewed distribution, a left-skewed distribution, a short-tailed distribution, and a long-tailed distribution.]
```
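The binompdf(4, 0.5, 2) example in the transcript can be cross-checked without a calculator. Below is a minimal Python sketch that uses only the standard library. `binom_pmf` and `binom_mean_sd` are illustrative names of our own; they are not a TI-83 or library API. The code implements the binomial probability C(n, x)·p^x·q^(n−x) together with the slide formulas µ = np and σ = √(npq).

```python
from math import comb, sqrt


def binom_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p); our stand-in for the TI-83's binompdf(n, p, x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_mean_sd(n, p):
    """Mean np and standard deviation sqrt(npq) of a binomial distribution."""
    q = 1 - p
    return n * p, sqrt(n * p * q)


# Exactly 2 heads in 4 tosses of a fair coin, as in binompdf(4, 0.5, 2)
print(binom_pmf(4, 0.5, 2))   # 0.375
print(binom_mean_sd(4, 0.5))  # (2.0, 1.0)
```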
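The Poisson slide states that the mean is µ and the standard deviation is √µ. This sketch checks that numerically by summing over the probability function P(X = x) = µ^x e^(−µ)/x!. The rate of 3 arrivals per hour is a hypothetical value chosen for illustration, because the slide gives no numbers. `poisson_pmf` is our own helper name.

```python
from math import exp, factorial, sqrt


def poisson_pmf(mu, x):
    """P(X = x) = mu**x * e**(-mu) / x! for a Poisson random variable with mean mu."""
    return mu**x * exp(-mu) / factorial(x)


# Hypothetical rate (not from the slides): on average 3 patients arrive per hour.
mu = 3.0
probs = [poisson_pmf(mu, x) for x in range(60)]  # P(X >= 60) is negligible for mu = 3

mean = sum(x * p for x, p in enumerate(probs))
sd = sqrt(sum((x - mean) ** 2 * p for x, p in enumerate(probs)))
print(f"P(X = 0) = {probs[0]:.4f}")
print(f"mean = {mean:.4f}, sd = {sd:.4f}, sqrt(mu) = {sqrt(mu):.4f}")
```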
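For the uniform distribution, "AREA = PROBABILITY" reduces to the area of a rectangle of height 1/(b − a). The sketch below applies this to the "select ANY number between 0 and 1" example. It treats that choice as Uniform(0, 1), which is an assumption, because the slide only describes the sample space. The helper names are our own.

```python
from math import sqrt


def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]: 1/(b - a) inside, 0 outside."""
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_mean_sd(a, b):
    """Mean (a + b)/2 and standard deviation (b - a)/sqrt(12)."""
    return (a + b) / 2, (b - a) / sqrt(12)


def uniform_prob(c, d, a, b):
    """P(c <= X <= d): rectangle of height 1/(b - a) over the overlap of [c, d] and [a, b]."""
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0) / (b - a)


# Assumption: "select ANY number between 0 and 1" means X ~ Uniform(0, 1)
print(uniform_prob(0.2, 0.5, 0, 1))  # area of width 0.3 and height 1
print(uniform_mean_sd(0, 1))         # mean 0.5, sd 1/sqrt(12)
```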
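The exponential slide gives f(x) = λe^(−λx) with µ = σ = 1/λ and plots λ = 0.5 and λ = 2.0. This is a small sketch for those two rates. It adds the cumulative area 1 − e^(−λx) and the median ln 2/λ, which is a standard fact about the exponential distribution rather than something on the slide. The function names are our own.

```python
from math import exp, log


def expon_pdf(x, lam):
    """Density f(x) = lam * e**(-lam * x) for x >= 0 (and 0 for x < 0)."""
    return lam * exp(-lam * x) if x >= 0 else 0.0


def expon_cdf(x, lam):
    """P(X <= x) = 1 - e**(-lam * x): the area under f from 0 to x."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


for lam in (0.5, 2.0):  # the two rates drawn on the slide
    mean = 1 / lam       # for the exponential, sigma = mu = 1/lambda
    median = log(2) / lam  # below the mean, because the density is right-skewed
    print(f"lambda = {lam}: mean = sd = {mean}, median = {median:.3f}, "
          f"P(X <= mean) = {expon_cdf(mean, lam):.3f}")
```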
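"Probability is area under curve" can be made concrete in the following way. The sketch integrates the normal density formula numerically with the midpoint rule and compares the result with the closed-form CDF from Python's `statistics.NormalDist`. The parameters N(70, 5) are taken from the "Mean = 70, SD = 5" curve on the bell-shaped-curve slide. `normal_pdf` and `area` are our own helper names.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)**2 / (2 * sigma**2))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def area(f, c, d, steps=10_000):
    """Approximate the integral of f from c to d with the midpoint rule."""
    h = (d - c) / steps
    return h * sum(f(c + (i + 0.5) * h) for i in range(steps))


mu, sigma = 70, 5  # the "Mean = 70, SD = 5" curve from the bell-shaped-curve slide
by_area = area(lambda x: normal_pdf(x, mu, sigma), 65, 75)
by_cdf = NormalDist(mu, sigma).cdf(75) - NormalDist(mu, sigma).cdf(65)
print(round(by_area, 4), round(by_cdf, 4))  # both about 0.6827 (within one SD of the mean)
```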
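The standardize-then-look-up steps and the TI-83/84 `normalcdf(` recipes (upper tail, lower tail, between) can be mirrored in Python. Below is a minimal sketch. The slides do not state the parameters behind the three shaded-area questions, so this code *assumes* test scores ~ N(70, 5), matching the plotted curve. `z_score` and `normalcdf` are our own names that imitate the calculator.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1


def z_score(x, mu, sigma):
    """Standardize: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def normalcdf(lower, upper):
    """Area under the standard normal curve between lower and upper (like TI normalcdf)."""
    return Z.cdf(upper) - Z.cdf(lower)


# Assumption: scores ~ N(70, 5); the slides do not give the parameters for these questions.
mu, sigma = 70, 5
upper_tail = normalcdf(z_score(75, mu, sigma), 9999)                  # P(X > 75)
lower_tail = normalcdf(-9999, z_score(65, mu, sigma))                 # P(X < 65)
between = normalcdf(z_score(65, mu, sigma), z_score(70, mu, sigma))   # P(65 < X < 70)
print(round(upper_tail, 4), round(lower_tail, 4), round(between, 4))
```

The ±9999 bounds copy the calculator idiom. `NormalDist().cdf` returns 0 and 1 at those extremes, so they act as minus and plus infinity.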
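Going the other way (finding x from a given area) means turning the area into a lower-tail probability, converting it to z, and then applying x = z ⋅ σ + µ. In this sketch, `NormalDist().inv_cdf` plays the role of `invNorm(`. The "top 10%" question and the N(70, 5) parameters are hypothetical, and `x_from_lower_tail` is our own name.

```python
from statistics import NormalDist


def x_from_lower_tail(area, mu, sigma):
    """Find x with P(X < x) = area: z from the lower-tail area, then x = z * sigma + mu."""
    z = NormalDist().inv_cdf(area)  # the role of invNorm(area) on the TI-83/84
    return z * sigma + mu


# Hypothetical question: which score cuts off the top 10% if scores ~ N(70, 5)?
# "Top 10%" is an upper tail, so convert it to the lower-tail area 0.90 first.
cutoff = x_from_lower_tail(0.90, 70, 5)
print(round(cutoff, 2))
```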
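The four "Generating Sampling Distributions" steps, applied to the DMS odor-threshold example (1000 samples of size 10 from roughly N(25, 7)), can be simulated as follows. This is a sketch with a fixed seed of our choosing, so its numbers are close to µ = 25 and σ/√10 ≈ 2.21 but will not reproduce the slide's 25.073 and 2.191 exactly. `sampling_distribution` and `dms_threshold` are hypothetical helper names.

```python
import random
from statistics import mean, stdev


def sampling_distribution(draw, n, reps, statistic=mean, seed=None):
    """Steps 1-3: draw a sample of fixed size n, compute the statistic, repeat reps times."""
    rng = random.Random(seed)
    return [statistic([draw(rng) for _ in range(n)]) for _ in range(reps)]


def dms_threshold(rng):
    """One adult's DMS odor threshold, modeled as N(25, 7) micrograms per liter."""
    return rng.gauss(25, 7)


means = sampling_distribution(dms_threshold, n=10, reps=1000, seed=1)

# Step 4 would be a histogram; here we only summarize its center and spread.
print(f"mean of the x-bars: {mean(means):.3f}")
print(f"SD of the x-bars:   {stdev(means):.3f} (theory: 7/sqrt(10) = {7 / 10 ** 0.5:.3f})")
```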
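The IQ multiple-choice question hinges on a single calculation: σ_x̄ = σ/√n with σ = 16 and n = 100. The following short sketch does that calculation and picks out the matching choice. `sampling_dist_of_mean` is our own name.

```python
from math import sqrt


def sampling_dist_of_mean(mu, sigma, n):
    """Mean and standard deviation of the sampling distribution of x-bar: (mu, sigma / sqrt(n))."""
    return mu, sigma / sqrt(n)


center, spread = sampling_dist_of_mean(mu=112, sigma=16, n=100)
print(center, spread)  # the standard deviation shrinks from 16 to 16 / sqrt(100)

# Compare with the standard deviations offered in choices A-D
choices = {"A": 16, "B": 16, "C": 1.6, "D": 4}
matches = [k for k, sd in choices.items() if abs(sd - spread) < 1e-9]
print(matches)
```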
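The hypokalemia application shows why the note says to standardize with the sampling distribution's standard deviation σ/√n. This sketch reproduces both cases, a single measurement (n = 1) and the mean of four days (n = 4). The constants come from the slide, and `misdiagnosis_prob` is our own name.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, CUTOFF = 3.8, 0.2, 3.5  # daily potassium ~ N(3.8, 0.2); diagnosis below 3.5


def misdiagnosis_prob(n_days):
    """z and P(mean of n_days measurements < CUTOFF), standardizing with sigma / sqrt(n)."""
    se = SIGMA / sqrt(n_days)
    z = (CUTOFF - MU) / se
    return z, NormalDist().cdf(z)


for n in (1, 4):
    z, p = misdiagnosis_prob(n)
    print(f"n = {n}: z = {z:.2f}, P = {p:.4f}")
```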
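The Central Limit Theorem figure (a strongly skewed population, then x̄ for n = 2, 10, 25) can be imitated with a quick simulation, in the spirit of the Rice applet linked in the transcript. In this sketch, the skewed population is an exponential with mean 1, which is our choice. Its skewness is 2, and the skewness of x̄ for this population falls like 2/√n. The seed, the 5000 repetitions, and the helper names are arbitrary assumptions.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Average cubed z-score: about 0 for a symmetric distribution, positive for right skew."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)


rng = random.Random(42)


def draw():
    """One observation from a strongly right-skewed population: exponential, mean 1."""
    return rng.expovariate(1.0)


results = {}
for n in (1, 2, 10, 25):
    # 5000 sample means, each computed from a sample of size n
    means = [fmean([draw() for _ in range(n)]) for _ in range(5000)]
    results[n] = skewness(means)
    print(f"n = {n:2d}: mean of x-bars = {fmean(means):.3f}, "
          f"SD = {pstdev(means):.3f}, skewness = {results[n]:.2f}")
```

As n grows, the mean of the x̄'s stays near 1, the SD shrinks roughly like 1/√n, and the skewness heads toward 0 (the normal shape).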
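A normal probability plot pairs each sorted data value with the z-score expected for its rank. This sketch builds those pairs using the plotting position (i − 0.5)/n. That is one common convention, and textbooks and software differ slightly. To judge "close to a straight line", it summarizes the plot with the correlation between x and z, which is our choice of summary; a real analysis would also look at the plot itself. The data are simulated, and all names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def normal_quantile_pairs(data):
    """Pair each sorted value with the standard normal z for cumulative area (i - 0.5) / n."""
    xs = sorted(data)
    n = len(xs)
    return [(x, NormalDist().inv_cdf((i - 0.5) / n)) for i, x in enumerate(xs, start=1)]


def straightness(pairs):
    """Correlation between x and z: close to 1 when the points lie near a straight line."""
    xs, zs = zip(*pairs)
    mx, mz = fmean(xs), fmean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5


rng = random.Random(0)
normal_data = [rng.gauss(70, 5) for _ in range(50)]        # simulated, roughly normal
skewed_data = [rng.expovariate(0.2) for _ in range(50)]    # simulated, right-skewed
r_normal = straightness(normal_quantile_pairs(normal_data))
r_skewed = straightness(normal_quantile_pairs(skewed_data))
print(round(r_normal, 3), round(r_skewed, 3))
```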