Transcript
```
Learning about...
Normal Distribution
introduction
• What is a distribution?
• The distribution of a data set is the description of how
the data is spread across its range.
• Plotted as a frequency graph for ranges of outcome
values, the distribution can take several different shapes.
[figures: three example distribution shapes]
distribution in context...
• Consider an experiment with known outcomes - the
set of possible random outcomes is X.
• If you perform the experiment n times with the same
environmental conditions, the experiments should all
have the same distribution:
  - the same set of possible outcomes (random variables) ≡ X
  - the same p(x)
• ie even if the recorded outcomes vary, the distributions do not:
the distribution is the same.
What is Normal Distribution?
Any event can have at least one possible outcome.
A trial is a single event. An experiment consists of
the same trial being performed repeatedly under the
same conditions.
If an experiment is performed with enough trials, the
populations of each possible outcome can be
distributed according to different patterns.
[figure: asymmetrical curve] This is a typical Poisson Distribution.
Note the lack of symmetry. We’re not studying them.
[figure: bell curve] This is the symmetrical Gaussian, or Normal,
Distribution. Learn this!
Notice the numbers. We’ll deal with them later.
How it works...
• The Normal Distribution is characterised by grouped
continuous data.
• The typical graph is a histogram of the populations of
each grouped range of possible outcome values.
For example, if we returned to the era of norm-standardised
testing for NCEA, then the distribution of test scores, as
percentages, would look something like this:
[figure: bell-shaped histogram of test scores]
To pass, you would need a score of 50% or greater.
Notice that about 50% of all candidates achieved this.
That 50% pass-rate is quite important.
• For any Normally-Distributed data, the central peak is the
mean, μ.
• AND
• 50% of all data is <μ;
• which means 50% of all data is >μ.
a quick bit of revision...
The standard deviation now comes into its own.
• Recall:
• For a set of continuous data, the mean, μ, is a
measure of central tendency - it is one value that
represents the peak data value population. The
majority of the data does not equal μ.
• The standard deviation, σ, is analogous to the mean
difference of every data value from μ.
And now, back to the graph and its numbers...
features of Normal Distribution...
[figure: bell curve annotated with the features summarised below]
summary:
the curve is asymptotic to the x-axis
the peak is the mean, μ.
the distribution is symmetrical about μ; 50% of the data lie
either side of μ.
68% of all data lie within 1σ of μ
95% of all data lie within 2σ of μ
99.7% of all data lie within 3σ of μ
(these percentages are rounded)
this distribution lets us calculate the probability that any
outcome will be within a specified range of values
Ah, the wonder that is Z
• The Normal Distribution of outcome frequencies is
defined in terms of how many standard deviations either
side of the mean contain a specified range of outcome
values.
• In order to calculate the probability that the outcome of a
random event X will lie within a specified multiple of σ
either side of μ, we use an intermediate Random
Variable, Z.
Z = (X - μ) / σ
For this relationship to hold true, for Z, σ = 1 and μ = 0;
hence, for a Normally distributed population, almost all
values of Z lie in the range from -3 to +3.
Z, PQR, and You
• Probability calculations using Z give the likelihood that an outcome will
be within a specified multiple of σ from the mean. There are three
models used:
P(t) = the probability that an outcome is any value of X up to t,
a defined multiple of σ beyond μ
P(t) = P(Z<t) ≡ P(0<Z<t) + 0.5   (for t > 0)
Q(t) = the probability that an outcome is any value of X between μ
and t, a defined multiple of σ from μ
Q(t) = P(0<Z<t)
R(t) = the probability that an outcome is any value of X greater than t,
a defined multiple of σ from μ (t may be below μ)
R(t) = P(Z>t)
Solving PQR Problems
1. Read the problem carefully.
2. Draw a diagram - sketch the Bell Curve, and use this to
identify the problem as P, Q or R.
You could now use the Z probability tables to calculate P, Q or R,
or use a Graphic Calculator such as the Casio fx-9750G Plus:
1. Enter the RUN mode.
2. OPTN
3. F6
4. F3
5. F6
This gives the F-menu for PQR.
1. Choose the function (P, Q or R)
2. enter the value of t, and EXE.
Calculating Z from Real Data
The PQR function assumes a perfectly-symmetrical distribution about μ.
Real survey distributions are rarely perfect.
For any set of real data, we can calculate μ and σ, and therefore Z.
SO... use Z = (X - μ) / σ to calculate Z, and then use PQR.
For example, if μ=33 and σ=8, then to find P(X<20):
P(X<20) = P(Z < (20 - μ) / σ)
        = P(Z < (20 - 33) / 8)
        = P(Z < -1.625)
Now, use the R function, and subtract the result from 1.
Inverse Normal
This is the reverse process to finding the probability.
Given the probability that an event’s outcome will lie
within a defined range, we can rearrange the Z
equation to give
X = Zσ + μ
But...
we cannot define X, as it represents the entire
range of values of all possible outcomes.
What the equation will give us is the value k.
k is the upper or lower limit of the range of X that is
included in the P calculations.
an example...
• if X is a normally-distributed variable with σ=4, μ=25,
and p(X<k) = 0.982, what is k?
The long way...
Use the PQR model, and sketch a bell curve to identify the
regions being included in the p range.
• Use the ND table to find the value of Z:
• the table lists areas measured from the mean (Z = 0), so find
• 0.982 - 0.5 = 0.482
Gives Z = 2.097
So,
k = 4 x 2.097 + 25 = 33.388
or, the short way...
using a graphic calculator, for example the trusty Casio fx9750G;
• MODE: STATS
• F5 ➜ F1 ➜ F3
• Area = probability, as a decimal
• σ =
• μ =
• EXECUTE
```
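The calculations in the transcript can also be checked without tables or a calculator. The following is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist` as a stand-in for the ND table and the Casio PQR menu. The function names (`p_func`, `q_func`, `r_func`, `within`, `z_score`, `inverse_normal`) are hypothetical, chosen here to mirror the slides; they are not part of the original material.

```python
from statistics import NormalDist

# Standard Normal variable Z: mean 0, standard deviation 1 (as in the slides).
Z = NormalDist(mu=0.0, sigma=1.0)


def p_func(t: float) -> float:
    """P(t) = P(Z < t): area to the left of t."""
    return Z.cdf(t)


def q_func(t: float) -> float:
    """Q(t): area between the mean (Z = 0) and t, taken as a positive area."""
    return abs(Z.cdf(t) - 0.5)


def r_func(t: float) -> float:
    """R(t) = P(Z > t): area to the right of t."""
    return 1.0 - Z.cdf(t)


def within(n_sigma: float) -> float:
    """Fraction of data lying within n_sigma standard deviations of the mean."""
    return Z.cdf(n_sigma) - Z.cdf(-n_sigma)


def z_score(x: float, mu: float, sigma: float) -> float:
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def inverse_normal(p: float, mu: float, sigma: float) -> float:
    """Find k such that P(X < k) = p, using k = Z * sigma + mu."""
    z = Z.inv_cdf(p)
    return z * sigma + mu


if __name__ == "__main__":
    # The 68 / 95 / 99.7 summary.
    for n in (1, 2, 3):
        print(f"within {n} sigma: {within(n):.4f}")

    # Worked example: mu = 33, sigma = 8, find P(X < 20).
    z = z_score(20, 33, 8)
    print(f"Z = {z}, P(X < 20) = 1 - R(Z) = {1 - r_func(z):.4f}")

    # Inverse Normal example: sigma = 4, mu = 25, P(X < k) = 0.982.
    k = inverse_normal(0.982, mu=25, sigma=4)
    print(f"k = {k:.3f}")
```

Run as a script, the last line should land very close to the 33.388 obtained by the long way in the transcript, and `1 - r_func(z)` gives the same value as `p_func(z)`, which is why the slides say to subtract the R result from 1.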