
Lecture 13, Wednesday, October 10, 2008
Probability is the chance that an event will occur. It is a decimal number between 0 and 1.
If probability = 0, there is no chance the event will occur.
If probability = 1, it is certain that the event will occur.
Probabilities may also be expressed as percentages, from 0% to 100%.
A random phenomenon, or random process, such as a coin toss, is a process in which nobody can
predict what will occur on the next trial, but the long-run result is predictable.
Probability is the proportion of times that the event will occur in a very long series of
trials. It may take thousands of trials for the proportion to converge on the true value for
the random process.
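The long-run behavior described above can be sketched with a small simulation. This is only an illustration: the function name running_proportion, the fair-coin probability of 0.5, and the seed are assumptions made for the example, not part of the lecture.

```python
import random

def running_proportion(num_trials, p_heads=0.5, seed=0):
    """Simulate coin tosses and return the proportion of heads after each toss."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    proportions = []
    for trial in range(1, num_trials + 1):
        if rng.random() < p_heads:
            heads += 1
        proportions.append(heads / trial)
    return proportions

proportions = running_proportion(10_000)

# Early proportions can be far from 0.5; later ones typically settle near it.
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} tosses: proportion of heads = {proportions[n - 1]:.4f}")
```

A single toss is unpredictable, but the proportion after thousands of tosses should sit close to the assumed probability of 0.5.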
The sample space is the set of all possible outcomes.
An event is an outcome or a set of outcomes of a random process.
A probability model is a listing of the sample space and the probability for each outcome.
Four probability rules:
1. The probability of any event is always between 0 and 1 inclusive.
2. The probabilities of all the outcomes in the sample space must sum to 1.
3. Two events are disjoint if they have no outcomes in common and therefore they
can never occur together.
In this case, Prob (A or B) = Prob(A) + Prob(B)
4. For any event, Prob(A does not occur) = 1 – Prob(A)
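The four rules can be checked on a small probability model. The sketch below assumes a fair six-sided die as the model; the events A and B and the helper names is_valid_model and prob are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical probability model: one roll of a fair six-sided die.
# Keys are the outcomes in the sample space, values are their probabilities.
model = {face: Fraction(1, 6) for face in range(1, 7)}

def is_valid_model(model):
    """Rules 1 and 2: every probability is in [0, 1] and they sum to exactly 1."""
    return all(0 <= p <= 1 for p in model.values()) and sum(model.values()) == 1

def prob(event, model):
    """Probability of an event, where the event is given as a set of outcomes."""
    return sum(model[outcome] for outcome in event)

A = {1, 2}  # the roll is a 1 or a 2
B = {5, 6}  # the roll is a 5 or a 6

# Rule 3 applies only to disjoint events (no outcomes in common).
assert A.isdisjoint(B)
p_a_or_b = prob(A | B, model)

# Rule 4: the event "A does not occur" is every outcome not in A.
p_not_a = prob(set(model) - A, model)

print("valid model:", is_valid_model(model))
print("Prob(A or B) == Prob(A) + Prob(B):", p_a_or_b == prob(A, model) + prob(B, model))
print("Prob(not A) == 1 - Prob(A):", p_not_a == 1 - prob(A, model))
```

Exact fractions are used instead of floating-point numbers so that the equality checks are not affected by rounding.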
A random variable is a variable whose value is a numerical outcome of a random process.
A PARAMETER is a number which describes some characteristic of a POPULATION.
A STATISTIC is a number which describes some characteristic of a SAMPLE.
We usually do not know the actual value of a parameter because populations are usually
too large to include every member in a sample and determine the value. We usually have
to estimate the value of parameters.
A statistic is calculated from actual data obtained by taking a sample from the population.
We use the statistic to estimate the value of the parameter.
Examples: The sample mean, xbar, is used to estimate the population mean, mu.
The sample proportion, phat, is used to estimate the population proportion, p.
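A short sketch of the parameter/statistic distinction follows. The population here is invented for illustration (100,000 members, each with a numeric value and a yes/no attribute), and the sample size of 400 is an arbitrary choice. In practice the parameters mu and p would not be known.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed so the run is repeatable

# Hypothetical population: each member has a numeric value (say, a height in cm)
# and a yes/no attribute (say, "owns a bicycle").
population = [(rng.gauss(170.0, 10.0), rng.random() < 0.30) for _ in range(100_000)]

# Parameters: computed from the entire population.
mu = statistics.fmean([value for value, _ in population])
p = sum(flag for _, flag in population) / len(population)

# Statistics: computed from a simple random sample of the population.
sample = rng.sample(population, 400)
xbar = statistics.fmean([value for value, _ in sample])
phat = sum(flag for _, flag in sample) / len(sample)

print(f"mu = {mu:.2f}, xbar = {xbar:.2f}")
print(f"p  = {p:.3f}, phat = {phat:.3f}")
```

The statistics xbar and phat will usually be close to, but not exactly equal to, the parameters they estimate.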
The LAW OF LARGE NUMBERS says that the sample statistic, xbar, gets closer and closer to the
population parameter, mu, as the sample size increases.
This is similar to the way probability approaches the true value when the series of trials
becomes very long.
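The law of large numbers can be illustrated by computing xbar from larger and larger samples. The population below (normal with mu = 50 and sigma = 10) is an assumption made purely for the example.

```python
import random
import statistics

# Hypothetical population: normally distributed with mu = 50 and sigma = 10.
MU, SIGMA = 50.0, 10.0
rng = random.Random(1)  # fixed seed so the run is repeatable

# One long run of independent observations from the population.
observations = [rng.gauss(MU, SIGMA) for _ in range(100_000)]

# xbar computed from the first n observations, for increasing n.
xbar_by_n = {
    n: statistics.fmean(observations[:n])
    for n in (10, 100, 1_000, 10_000, 100_000)
}

for n, xbar in xbar_by_n.items():
    print(f"n = {n:>7}: xbar = {xbar:.3f}, distance from mu = {abs(xbar - MU):.3f}")
```

The distance from mu does not have to shrink at every step, but it tends to become small as n grows.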
Every statistic, such as the sample mean or the sample proportion, has a pattern of
variation. Because each sample is different, the value of the sample statistic will
fluctuate. The sampling distribution of a statistic is the distribution of values taken
by the statistic in all possible samples of the same size from the same population.
The sample mean, xbar, will form a distribution of values whose mean = population
mean, and whose standard deviation = population standard deviation / square root of n.
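The mean and standard deviation of the xbar distribution can be checked by simulation. This sketch assumes a population that is uniform on [0, 1], so mu = 0.5 and sigma = sqrt(1/12); the sample sizes and the number of repeated samples are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical population: uniform on [0, 1].
MU = 0.5
SIGMA = math.sqrt(1 / 12)

def simulate_xbars(n, num_samples=5_000):
    """Draw num_samples samples of size n and return the sample mean of each."""
    return [
        statistics.fmean([rng.random() for _ in range(n)])
        for _ in range(num_samples)
    ]

results = {}
for n in (4, 25, 100):
    xbars = simulate_xbars(n)
    results[n] = (statistics.fmean(xbars), statistics.stdev(xbars))
    print(
        f"n = {n:>3}: mean of xbars = {results[n][0]:.4f} (mu = {MU}), "
        f"sd of xbars = {results[n][1]:.4f} "
        f"(sigma / sqrt(n) = {SIGMA / math.sqrt(n):.4f})"
    )
```

For each n, the simulated mean of the xbar values should be near mu, and their standard deviation should be near sigma divided by the square root of n.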
The XBAR distribution will be normal if the samples are taken
from a normal distribution.
The XBAR distribution will be approximately normal even if the
samples are taken from a non-normal distribution, provided the sample size is large. Large is not
defined in the book, but most books say that n=30 is large enough. We will use 40.
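The effect of sample size on the shape of the xbar distribution can be sketched by simulation. The example assumes a right-skewed population (exponential with mean 1) and uses sample skewness as a rough measure of symmetry; both choices, and the helper names, are assumptions made for illustration and do not come from the book.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable

def skewness(values):
    """Sample skewness: near 0 for a symmetric shape, positive for right skew."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

def xbar_skewness(n, num_samples=4_000):
    """Skewness of the simulated xbar distribution for samples of size n."""
    xbars = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return skewness(xbars)

# n = 1 is the population itself (individual observations).
skew_by_n = {n: xbar_skewness(n) for n in (1, 2, 10, 40)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of xbar distribution = {skew:.3f}")
```

Individual observations are strongly right-skewed, while the distribution of xbar for n = 40 should be much closer to symmetric.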
Figure 11.4 on page 282 shows a right-skewed distribution of earned income in the top panel.
The second panel shows the distribution of XBAR values when n=100.
The third panel shows the second panel with the horizontal scale expanded. It shows a
distribution which is quite symmetric and very close to a normal distribution.
Figure 11.5 on page 283 shows schematically how the distribution of XBAR approaches
a symmetric distribution as the sample size changes from 2 to 10 to 25.
Sample means are less variable than individual observations.
Sample means are more normally distributed than individual observations.
The statistic XBAR has a distribution which is centered on the population
mean, mu. Therefore XBAR is an unbiased estimator of the population mean.
The variation of the statistic XBAR is always less than the variation of
individual observations.
The larger the sample size, n, the smaller the variation of XBAR becomes.
Regardless of the shape of the population, the distribution of the sample mean will be
approximately normal, centered on the population mean, if the sample size is
at least 40, and the larger the better.