Population Mean
Mean = (sum of the values) / (number of values), when all values have equal probability
Computing the Population/Sample Mean
1. Collect the data.
2. Sum all the values in the population/sample.
3. Divide the sum by the number of elements in the population/sample.
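The three steps can be sketched in Python (the notes themselves use R). The data list is made up for illustration, and `statistics.mean` from the standard library appears only as a cross-check.

```python
import statistics

def mean(values):
    """Mean of a population or sample: sum of the values / number of values."""
    values = list(values)          # step 1: collect the data
    total = sum(values)            # step 2: sum all the values
    return total / len(values)     # step 3: divide by the number of elements

data = [4, 8, 15, 16, 23, 42]      # illustrative data, not from the notes
print(mean(data))                  # 108 / 6 = 18.0
print(statistics.mean(data))       # same value from the standard library
```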
Median
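The median is the center value of a sorted list of data; it divides the data into two halves. When the number of values is even, the median is the average of the two middle values.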
Data Array
Data that have been arranged in numerical order.
Mode
The value in a data set that occurs most frequently.
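A minimal Python sketch of the median and mode on illustrative data. Averaging the two middle values for even n is the usual convention; `statistics.multimode` returns every value tied for the highest frequency, since a data set can have more than one mode.

```python
import statistics

def median(values):
    """Middle value of the sorted data (the data array)."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                     # odd n: the single middle value
    return (xs[mid - 1] + xs[mid]) / 2     # even n: average of the two middle values

def modes(values):
    """All values that occur most frequently."""
    return statistics.multimode(values)

data = [7, 3, 9, 3, 5, 8]   # illustrative data
print(median(data))         # sorted: [3, 3, 5, 7, 8, 9] -> (5 + 7) / 2 = 6.0
print(modes(data))          # [3]
```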
Percentile location index
i = (p/100)(n + 1)
p = desired percentile
n = number of values in the data set
i = position of the pth percentile in the data array
The pth percentile in a data array is a value that divides the data set into two parts. The
lower segment contains at least p%, and the upper segment contains at least (100 – p)%,
of the data. The 50th percentile is the median.
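A sketch of the percentile location index in Python. The notes do not say what to do when i is not a whole number; this version assumes linear interpolation between the neighbouring values (and clamps i to 1..n). That is one common convention, and it matches `statistics.quantiles(..., method="exclusive")`.

```python
import math
import statistics

def percentile(data, p):
    """p-th percentile using the location index i = (p/100)(n + 1).

    Assumption: when i is not a whole number, interpolate linearly between the
    values at positions floor(i) and floor(i) + 1; i is clamped to 1..n.
    """
    xs = sorted(data)                 # the data array
    n = len(xs)
    i = (p / 100) * (n + 1)           # location index (1-based position)
    i = min(max(i, 1), n)             # keep the position inside the data array
    lo = math.floor(i)
    frac = i - lo
    if frac == 0:
        return xs[lo - 1]             # i is a whole number: take that value
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

data = list(range(1, 11))             # illustrative data: 1, 2, ..., 10
print([percentile(data, p) for p in (25, 50, 75)])          # [2.75, 5.5, 8.25]
print(statistics.quantiles(data, n=4, method="exclusive"))  # same quartiles
```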
Box and Whisker plots
1. Sort the data values from low to high.
2. Find the 25th percentile (first quartile, Q1), the 50th percentile (median), and the 75th percentile (third quartile, Q3).
3. Draw a box whose ends are at Q1 and Q3. This box will contain the middle 50% of the data values in the population or sample.
4. Draw a vertical line through the box at the median. Half the data values in the box will be on either side of the median.
5. Calculate the interquartile range (IQR = Q3 – Q1).
Compute the lower limit for the box and whisker plot as Q1 – 1.5(Q3 – Q1) and the upper limit as Q3 + 1.5(Q3 – Q1). Any data values outside these limits are referred to as outliers.
6. Extend dashed lines (called whiskers) from each end of the box to the lowest value (on the left) and the highest value (on the right) within the limits.
7. Mark any value outside the limits found in step 5 (an outlier) with an asterisk (*).
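A Python sketch of the numeric parts of these steps (quartiles, IQR, limits, whisker ends, outliers); the drawing itself is left out. It assumes the quartiles come from the (n + 1) location index with linear interpolation, via `statistics.quantiles(method="exclusive")`; other quartile conventions can give slightly different limits. The data are illustrative.

```python
import statistics

def box_plot_summary(data):
    """Quartiles, IQR, limits, whisker ends and outliers for a box and whisker plot."""
    xs = sorted(data)                                   # step 1: sort low to high
    # Step 2: "exclusive" locates percentiles at (p/100)(n + 1) and interpolates.
    q1, med, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    iqr = q3 - q1                                       # step 5: IQR = Q3 - Q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr       # step 5: limits
    inside = [x for x in xs if lower <= x <= upper]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "limits": (lower, upper),
        "whiskers": (inside[0], inside[-1]),            # step 6: whisker ends
        "outliers": [x for x in xs if x < lower or x > upper],  # step 7: mark with *
    }

summary = box_plot_summary([2, 4, 5, 6, 7, 8, 9, 30])  # illustrative data
print(summary)  # Q1 = 4.25, median = 6.5, Q3 = 8.75, limits (-2.5, 15.5), outlier 30
```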
Range
R = Maximum value – minimum value
Interquartile Range
IQR = Q3 – Q1
Variance
The population variance is the average of the squared distances of the data values from
the mean.
The sample variance is the “average” (dividing by n – 1 instead of n) of the squared distances of the data values from the sample mean (the “residuals”).
Standard Deviation
Positive square root of the variance.
Coefficient of Variation
CV = (SD / mean) × 100%
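A worked Python example of the variance, standard deviation and coefficient of variation on illustrative data. It shows that the population and sample versions differ only in dividing the sum of squared distances by n or by n – 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]           # illustrative data
n = len(data)
mean = sum(data) / n                       # 5.0
squared = [(x - mean) ** 2 for x in data]  # squared distances from the mean

pop_var = sum(squared) / n                 # population variance: divide by n -> 4.0
sample_var = sum(squared) / (n - 1)        # sample variance: divide by n - 1 -> 32/7
pop_sd = math.sqrt(pop_var)                # SD: positive square root -> 2.0
sample_sd = math.sqrt(sample_var)
cv = pop_sd / mean * 100                   # coefficient of variation -> 40%

# Cross-check against the standard library
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```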
Standardized Data Values (Z scores)
1. Compute the population mean and SD, or the sample mean and SD.
2. Use these formulas:
For populations: Z = (x – population mean) / population SD
For samples: Z = (x – sample mean) / sample SD
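A short Python sketch of standardizing data. The `sample` flag is a name introduced here: it picks the sample SD (divide by n – 1) or the population SD (divide by n). The data are illustrative.

```python
import statistics

def z_scores(values, sample=True):
    """Standardize each value: Z = (x - mean) / SD.

    sample=True uses the sample SD (divide by n - 1);
    sample=False uses the population SD (divide by n).
    """
    m = statistics.mean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return [(x - m) / sd for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative data: mean 5, population SD 2
print(z_scores(data, sample=False))    # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```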
Using Tree Diagrams
Independent Events
Two events are independent if the occurrence of one event in no way influences the
probability of the occurrence of the other event.
Probability Rule
P(E1 or E2) = P(E1) + P(E2) – P(E1 and E2)
For two mutually exclusive events (P(E1 and E2) = 0):
P(E1 or E2) = P(E1) + P(E2)
Conditional probability
P(E1 | E2) = P(E1 and E2) / P(E2)
It reads “the probability of event E1 given that event E2 has occurred.”
E2 becomes the sample space, and you find the elements of E1 that are also in E2.
Conditional Probability for Independent Events
P(E1 | E2) = P(E1)
and P(E2 | E1) = P(E2)
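The probability rules above can be checked by counting outcomes, assuming every outcome is equally likely. This Python sketch uses illustrative events on fair dice with exact fractions; `prob` is a hypothetical helper, not a library function.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Outcomes in the event / outcomes in the sample space (equally likely outcomes)."""
    return Fraction(len(event & space), len(space))

# One roll of a fair die (illustrative events)
die = {1, 2, 3, 4, 5, 6}
E1 = {2, 4, 6}   # even
E2 = {4, 5, 6}   # greater than 3

p_or = prob(E1, die) + prob(E2, die) - prob(E1 & E2, die)   # addition rule
assert p_or == prob(E1 | E2, die)                            # 2/3 either way

p_e1_given_e2 = prob(E1 & E2, die) / prob(E2, die)           # P(E1 and E2) / P(E2)
assert p_e1_given_e2 == prob(E1, E2)                         # E2 as the sample space

# Two rolls: "first die even" (A) and "second die shows 6" (B) are independent
rolls = set(product(range(1, 7), repeat=2))
A = {r for r in rolls if r[0] % 2 == 0}
B = {r for r in rolls if r[1] == 6}
assert prob(A & B, rolls) / prob(B, rolls) == prob(A, rolls)  # P(A | B) = P(A)
```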
Binomial
Use R:
Use pbinom to find the probability of q or fewer successes, P(X ≤ q), where size = number of trials and prob = probability of a success on each trial.
pbinom(q, number_of_trials, probability_of_success)
To find the probability of exactly x successes, P(X = x), use:
dbinom(x, number_of_trials, probability_of_success)
The expected value of the binomial is number_of_trials × probability_of_success.
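For readers without R at hand, this Python sketch computes what R's `dbinom` and `pbinom` return. The Python functions only mirror the R names and argument order; they are hypothetical helpers, not an interface to R. The coin-flip numbers are illustrative.

```python
from math import comb

def dbinom(x, size, prob):
    """Hypothetical helper mirroring R's dbinom: P(X = x) successes in `size` trials."""
    return comb(size, x) * prob**x * (1 - prob) ** (size - x)

def pbinom(q, size, prob):
    """Hypothetical helper mirroring R's pbinom: P(X <= q), q or fewer successes."""
    return sum(dbinom(k, size, prob) for k in range(0, q + 1))

n, p = 10, 0.5            # illustrative: 10 fair coin flips
print(dbinom(5, n, p))    # exactly 5 heads: 252/1024 = 0.24609375
print(pbinom(2, n, p))    # 2 or fewer heads: 56/1024 = 0.0546875
print(n * p)              # expected value: 5.0
```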
Poisson: models the number of successes when number_of_trials is very large and the probability of a success is very small.
λ = number_of_trials × prob_of_success = expected number of successes.
Use R:
dpois(x, lambda) = probability of exactly x successes when the expected value is lambda
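A Python sketch of the Poisson probability that R's `dpois` returns, P(X = x) = e^(–λ) λ^x / x!, compared with the exact binomial for illustrative values (many trials, small success probability). `dpois` here is a hypothetical Python helper named after the R function.

```python
from math import comb, exp, factorial

def dpois(x, lam):
    """Hypothetical helper mirroring R's dpois: P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

# Illustrative values: 1000 trials, success probability 0.002, so lambda = 2
trials, p = 1000, 0.002
lam = trials * p                                     # expected number of successes
binomial = comb(trials, 2) * p**2 * (1 - p) ** (trials - 2)
print(round(dpois(2, lam), 4), round(binomial, 4))   # 0.2707 0.2709
```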
Normal distribution
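Use R:
pnorm(q, mean, sd) = probability that a normally distributed value is less than or equal to q (mean = 0 and sd = 1 by default)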
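In Python, the standard library's `statistics.NormalDist` gives the same cumulative probability as R's `pnorm`. The wrapper below is a hypothetical helper named after the R function; it just copies R's defaults of mean 0 and sd 1.

```python
from statistics import NormalDist

def pnorm(q, mean=0.0, sd=1.0):
    """Hypothetical helper mirroring R's pnorm: P(X <= q) for a normal distribution."""
    return NormalDist(mu=mean, sigma=sd).cdf(q)

print(pnorm(1.96))                      # about 0.975
print(pnorm(130, mean=100, sd=15))      # Z = 2, about 0.9772
print(1 - pnorm(85, mean=100, sd=15))   # P(X > 85): Z = -1, about 0.8413
```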
Sampling Error
Sampling error = sample mean – population mean
Standard error = SD of the sampling error = population SD / √n
This is the SD of the sampling distribution of the sample mean.
To find probabilities associated with the sampling distribution of xbar for samples of size n from a population with a known mean and SD (valid if the population is normal or if n is large):
1. Compute the sample mean.
2. Define the sampling distribution:
Mean of the sample mean = population mean
SD of the sample mean = population SD / √n
3. Define the event of interest.
4. Express it in terms of a Z value, Z = (sample mean – population mean) / (SD of the sample mean), and use pnorm to get the probability.
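The four steps as a Python calculation, with made-up population values; `NormalDist().cdf` plays the role of `pnorm`.

```python
from math import sqrt
from statistics import NormalDist

# Illustrative values: population mean 50, population SD 10, samples of size 25
pop_mean, pop_sd, n = 50, 10, 25
sample_mean = 53                     # step 1: the sample mean of interest

se = pop_sd / sqrt(n)                # step 2: SD of the sample mean = 10 / 5 = 2.0
# Step 3: event of interest is "sample mean is 53 or more"
z = (sample_mean - pop_mean) / se    # step 4: Z = 1.5
prob = 1 - NormalDist().cdf(z)       # upper tail, like 1 - pnorm(z) in R
print(round(prob, 4))                # 0.0668
```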
Sample proportion
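1. “Find” p, the true population proportion.
2. Find pbar, the sample proportion (number of successes / n).
3. Find the SD of pbar:
If p is known: sqrt(p(1 – p) / n) [hypothesis testing] (6.10)
If only pbar is known: sqrt(pbar(1 – pbar) / n) [confidence intervals]
4. Define the event of interest.
5. Find the Z value, Z = (pbar – p) / (SD of pbar).
6. Use pnorm to get the probability.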
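The same recipe for a sample proportion in Python, with illustrative numbers. Because p is known here, the SD of pbar uses p(1 – p), as in the hypothesis-testing case.

```python
from math import sqrt
from statistics import NormalDist

# Illustrative values, not from the notes
p = 0.40                            # step 1: true population proportion
n = 100
pbar = 0.45                         # step 2: sample proportion of interest
sd_pbar = sqrt(p * (1 - p) / n)     # step 3: p is known, so use p(1 - p)
# Step 4: event of interest is "pbar is 0.45 or more"
z = (pbar - p) / sd_pbar            # step 5
prob = 1 - NormalDist().cdf(z)      # step 6: like 1 - pnorm(z) in R
print(round(z, 2), round(prob, 3))  # about 1.02 and 0.154
```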
Confidence Interval Calculation
Point estimate ± (critical value (z or t)) × (standard error of the estimate)
Developing a confidence interval estimate for a population proportion
1. Define the population of interest and the variable from which to estimate the population proportion.
2. Determine the sample size and select a simple random sample.
3. Specify the level of confidence and obtain the critical value from qnorm (or qt for t-based intervals) in R.
4. Calculate pbar, the sample proportion.
5. Construct the interval estimate: pbar ± z × sqrt(pbar(1 – pbar) / n).
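A Python sketch of the interval for a proportion, assuming a z critical value (the large-sample case). `proportion_ci` is a hypothetical helper name and the counts are illustrative; `NormalDist().inv_cdf` plays the role of R's `qnorm`.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Interval estimate pbar +/- z * sqrt(pbar(1 - pbar) / n)."""
    pbar = successes / n                                # step 4: sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # step 3: like qnorm(0.975) in R
    se = sqrt(pbar * (1 - pbar) / n)                    # standard error from pbar
    margin = z * se
    return pbar - margin, pbar + margin                 # step 5: the interval

low, high = proportion_ci(240, 400)   # illustrative: 240 successes in 400
print(round(low, 3), round(high, 3))  # about 0.552 and 0.648
```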
One-tailed test for a hypothesis about a population mean, SD known, large samples
1. Specify the population value of interest.
2. Formulate the null hypothesis and the alternative hypothesis in terms of the population mean.
3. Specify the desired significance level (alpha).
4. Construct the rejection region.
5. Compute the test statistic, Z = (sample mean – hypothesized mean) / (SD / √n).
6. Draw the conclusion: reject the null hypothesis if the test statistic falls in the rejection region; otherwise, do not reject it.
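A Python sketch of the six steps for an upper-tailed test (H0: mean ≤ mu0 against HA: mean > mu0), with illustrative numbers. The upper-tail direction is an assumption; for a lower-tailed test the rejection region is z < –z_crit instead. `upper_tail_z_test` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """One-tailed test of H0: mu <= mu0 against HA: mu > mu0, population SD known."""
    z_crit = NormalDist().inv_cdf(1 - alpha)      # step 4: reject H0 if z > z_crit
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # step 5: test statistic
    p_value = 1 - NormalDist().cdf(z)
    return z, z_crit, p_value, z > z_crit         # step 6: True means reject H0

# Illustrative numbers: mu0 = 100, sigma = 15, n = 36, sample mean = 104.5
z, z_crit, p_value, reject = upper_tail_z_test(104.5, 100, 15, 36)
print(round(z, 2), round(z_crit, 3), round(p_value, 4), reject)  # 1.8 1.645 0.0359 True
```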
© T. Lau 2007