Statistics Final Exam Quick Notes
Exam Layout and Content
Question 1: Threshold 5 (10 marks)
● Calculating probabilities using the sampling distribution of sample means
● Calculating the standard error of the sampling distribution of sample mean for:
finite population
infinite population
Question 2: Threshold 6 (10 marks)
● Calculating/constructing the confidence interval using the z-distribution
● Interpreting a confidence interval
● Hypothesis testing when the population standard deviation is known
Questions 3 and 4: any material from lectures 9, 10, 11
● Lecture 9: Sampling and Sampling distribution (Threshold 5)
● Lecture 10: Confidence Interval (Threshold 6)
● Lecture 11: Hypothesis Testing (Threshold 6)
Notes
Lecture 9: Sampling and Sampling Distribution (T5)
Introduction
● A sample consists of individual observations that are random draws from a population
● A sample mean is a guess about the true population mean
● But, based on one sample, how accurately can you estimate the population parameter
(i.e. population mean)?
● A sampling distribution of a statistic tells us how close the statistic (e.g. the sample
mean) is to the parameter (e.g. population mean)
● Sampling error: the difference between a value computed from a sample (the statistic)
and the corresponding value for the population (the parameter). It can be reduced by using a larger sample.
Central Limit Theorem
● The sampling distribution of the sample mean is approximately normal if at least one of
these conditions is met:
1. The original (population) distribution is normal
2. The sample size (n) ≥ 30 (regardless of the original distribution)
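The second condition can be checked informally with a small simulation. The sketch below is not part of the lecture material: it assumes a skewed exponential population with mean 1 and standard deviation 1 (a hypothetical choice) and draws many samples of size n = 30.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def simulate_sample_means(n=30, num_samples=2000):
    """Draw num_samples samples of size n from a skewed (exponential)
    population with mean 1 and return the list of sample means."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

means = simulate_sample_means()
print(statistics.fmean(means))   # close to the population mean, 1
print(statistics.stdev(means))   # close to sigma / sqrt(n) = 1 / sqrt(30)
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the population itself is strongly skewed.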
The Sampling Distribution of the Sample Mean
● DEFINITION: The sampling distribution of the mean is the probability distribution of
sample means, with all samples having the same sample size n and taken from the same
population.
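● It is characterised by:
mean: μ_x̄ = μ
standard deviation (standard error): σ_x̄ = σ / √n
* Note
xbar space – average of averages (the distribution of sample means)
x space – an ordinary sample of individual observations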
Standard Error (SE)
● It is the standard deviation of all possible sample means
● The higher the standard error, the lower the accuracy
Standard Error Formulas
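Infinite population: SE = σ / √n
Finite population: SE = (σ / √n) × √((N − n) / (N − 1))
* Note: √((N − n) / (N − 1)) is the finite correction factor (FCF), where N is the population size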
* Note: if the population standard deviation is unknown, we replace σ with s
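As a rough illustration, the standard error for both cases can be computed as below. This is a sketch assuming the usual textbook formulas, SE = σ / √n with the finite correction factor √((N − n) / (N − 1)) applied for a finite population; the function name and the example numbers are hypothetical.

```python
import math

def standard_error(sigma, n, population_size=None):
    """Standard error of the sample mean.

    sigma: population standard deviation (use s if sigma is unknown)
    n: sample size
    population_size: N for a finite population, None for an infinite one
    """
    se = sigma / math.sqrt(n)
    if population_size is not None:
        # Finite correction factor: sqrt((N - n) / (N - 1))
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

print(standard_error(10, 25))        # infinite population
print(standard_error(10, 25, 101))   # finite population with N = 101
```

The finite-population result is always the smaller of the two, because the correction factor is below 1 whenever n > 1.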
Standardising the Sampling Distribution (using Z-score) and Finding Probabilities
● We can standardise the sampling distribution of the sample mean using the z-score:
z = (x̄ − μ) / (σ / √n)
● We can now use the Z table to find probabilities.
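In place of a printed Z table, the lookup can be sketched with Python's standard library. The example assumes the usual standardisation z = (x̄ − μ) / (σ / √n); the function name and the numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_below(x_bar, mu, sigma, n):
    """P(sample mean < x_bar) when the sampling distribution is normal."""
    z = (x_bar - mu) / (sigma / sqrt(n))   # standardise the sample mean
    return NormalDist().cdf(z)             # area to the left of z

# Hypothetical numbers: mu = 50, sigma = 10, n = 25, so z = (52 - 50) / 2 = 1
p_below = prob_sample_mean_below(52, 50, 10, 25)
print(round(p_below, 4))
print(round(1 - p_below, 4))   # P(sample mean > 52)
```

`NormalDist().cdf(z)` returns the left-tail area, which is what most Z tables list; upper-tail probabilities are obtained by subtracting from 1.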
Sampling Distribution of the Sample Proportion
● Let X be the number of times a particular outcome (success) occurs in n repeated
trials
● To estimate the population proportion of success, p, we use the sample proportion:
p̂ = X / n
● If both np ≥ 5 and nq ≥ 5, then the sample size is large.
● Therefore the sampling distribution of the sample proportion can be approximated by a normal
probability distribution with mean p and standard error √(pq / n), where q = 1 − p:
z = (p̂ − p) / √(pq / n)
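A minimal sketch of this approximation follows. It assumes the standard result that the sample proportion has mean p and standard error √(pq / n) with q = 1 − p; the function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_proportion_above(p_hat, p, n):
    """P(sample proportion > p_hat) using the normal approximation."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        # The large-sample condition from the notes is not met
        raise ValueError("np and nq must both be at least 5")
    se = sqrt(p * q / n)          # standard error of the sample proportion
    z = (p_hat - p) / se
    return 1 - NormalDist().cdf(z)

# Hypothetical numbers: p = 0.5, n = 100, so SE = 0.05 and z = 1 for p_hat = 0.55
prob = prob_sample_proportion_above(0.55, 0.5, 100)
print(round(prob, 4))
```

The explicit check on np and nq stops the function from being used where the normal approximation is not justified.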
Lecture 10: Confidence Interval (T6)
Introduction
● Confidence intervals are constructed to provide an estimate of how close the sample
mean is to the population mean after accounting for a certain margin of error.
● The information about whether the population standard deviation is known or unknown
is crucial to understanding whether we use z or t distribution to calculate z/t critical
value and then use it to calculate the confidence interval.
Point Estimate
● The sample mean, a single value, is referred to as the point estimate.
Confidence Interval (CI)
● To make statements about unknown population parameters (e.g. population mean)
with greater accuracy/confidence, we can develop an interval estimator
● An interval estimator draws inference about a population by estimating the value of the
unknown population parameter using an interval.
● This interval is called the Confidence Interval (CI)
* Further Explanation
● Different samples taken from a population will give different means (xbar)
● So we give an interval instead of a specific point
● That interval says: the real (i.e. population) mean lies somewhere in the interval
● We are trying to capture the pop. mean in the interval. The bigger the interval, the
more likely the pop. mean will be in that interval
Constructing a Confidence Interval (standard deviation is known)
Confidence interval (CI) = point estimate (xbar) ± (critical value) × (standard error)
Confidence interval (CI) = x̄ ± z(α/2) × σ / √n
* Note: (critical value) × (standard error) is the margin of error (E)
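The construction can be sketched in a few lines. The example assumes the usual z-interval, x̄ ± z(α/2) × σ / √n, and obtains the critical value from the standard normal distribution rather than from a table; the function name and the numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the population mean when sigma is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    margin = z_crit * sigma / sqrt(n)              # margin of error E
    return x_bar - margin, x_bar + margin

# Hypothetical numbers: x_bar = 50, sigma = 10, n = 25
lower, upper = z_confidence_interval(50, 10, 25)
print(round(lower, 2), round(upper, 2))
```

Raising the confidence level widens the interval, which matches the remark that a bigger interval is more likely to capture the population mean.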
Constructing a Confidence Interval (standard deviation is unknown)
● In most sampling situations the population standard deviation is unknown.
● Instead of population standard deviation we use sample standard deviation, s
● Instead of z distribution we use t distribution to obtain the critical value (t critical value)
and then the confidence interval.
Confidence interval (CI) = x̄ ± t(α/2, n − 1) × s / √n
Where n − 1 = degrees of freedom (df)
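The same calculation with the t distribution can be sketched as follows. Python's standard library has no t distribution, so this example assumes the t critical value is read from a t table and passed in; the function name and the numbers are hypothetical.

```python
from math import sqrt

def t_confidence_interval(x_bar, s, n, t_crit):
    """Confidence interval for the population mean when sigma is unknown.

    t_crit is the two-sided t critical value for df = n - 1,
    looked up in a t table by the caller.
    """
    margin = t_crit * s / sqrt(n)   # margin of error E
    return x_bar - margin, x_bar + margin

# Hypothetical numbers: n = 16 (df = 15), 95% confidence, table value t = 2.131
lower_t, upper_t = t_confidence_interval(50, 8, 16, 2.131)
print(round(lower_t, 3), round(upper_t, 3))
```

For the same confidence level the t critical value is larger than the z critical value, so the interval is wider when σ has to be estimated by s.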
Commonly Used Confidence Levels
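The table of values for this heading did not survive in these notes. As a rough substitute, the two-sided z critical values for three confidence levels can be computed with the standard library; the choice of 90%, 95% and 99% is an assumption about which levels are meant.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided z critical value for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```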
Determining the Appropriate Sample Size
● Formula: n = (z(α/2) × σ / E)²
● E is the margin of error
● Always ROUNDUP
e.g. 47.000000256 → round to 48
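A short sketch of the sample size calculation, including the round-up rule. It assumes the common formula n = (z(α/2) × σ / E)²; the function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin_of_error, confidence=0.95):
    """Smallest sample size that achieves the desired margin of error."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = (z_crit * sigma / margin_of_error) ** 2
    return math.ceil(n_exact)   # always round UP, never to the nearest integer

# Hypothetical numbers: sigma = 10, E = 2, 95% confidence
print(required_sample_size(10, 2))
print(math.ceil(47.000000256))   # the example from the notes
```

`math.ceil` is used instead of `round` because rounding down would give a margin of error slightly larger than the one requested.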
Confidence Interval of Population Proportion
● Population proportion: for qualitative variables the pop. proportion is a parameter of
interest.
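The formula for this interval is not reproduced in these notes. The sketch below assumes the common large-sample form, p̂ ± z(α/2) × √(p̂(1 − p̂) / n); the function name and the numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
    margin = z_crit * se
    return p_hat - margin, p_hat + margin

# Hypothetical numbers: 40 successes in a sample of 100
lower_p, upper_p = proportion_confidence_interval(0.40, 100)
print(round(lower_p, 3), round(upper_p, 3))
```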
Determining the Appropriate Sample Size (Population Proportion)
When to Use the Z or T Distribution for Confidence Interval Computation
Lecture 11: Hypothesis Testing (T6)
Introduction
● The purpose of hypothesis testing is to determine whether there is enough statistical
evidence in favour of a certain belief about a population parameter
● e.g. is there statistical evidence in a random sample of potential customers that
supports the belief that consumers spend more than $20 on an online purchase? You
have been employed to test the belief…
Summary of Steps in Hypothesis Testing
● STEP 1: formulate the hypothesis
● STEP 2: determine alpha (α), the level of significance
● STEP 3: determine the standardized test statistic
● STEP 4: determine the critical value
● STEP 5: write the decision rule and draw a conclusion
STEP 1: Set the null hypothesis (H0) and alternate hypothesis (HA)
● Claim always goes to HA
● Equal sign always goes to H0 (≤, ≥, =)
● We are testing population parameter (e.g. μ)
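● Choose from 3 scenarios (μ0 is the hypothesised value):
Lower tail: H0: μ ≥ μ0, HA: μ < μ0
Upper tail: H0: μ ≤ μ0, HA: μ > μ0
Two tailed: H0: μ = μ0, HA: μ ≠ μ0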
Identifying the Rejection Region
● Rejection region = range of values such that if the test statistic falls into that range, we
REJECT the NULL HYPOTHESIS in favour of alternate hypothesis
● The rejection regions are the shaded regions on the above diagrams.
STEP 2: determine alpha (α), the level of significance
● Types of error:
○ Type 1 error: occurs when we reject a true null hypothesis (i.e. reject H0, but H0 is true)
○ Type 2 error: occurs when we don’t reject a false null hypothesis (i.e. don’t reject H0, but
H0 is false)
● The probability of a type 1 error is denoted as α (alpha)
● P(making type 1 error) = α
● α is called the level of significance
● α = 1%, 5%, 10% are frequently used in practice
● 1 − α → the confidence level
5% significance level
→ α = 5% = 0.05
→ confidence level = 1 − 0.05 = 0.95
10% significance level
→ α = 10% = 0.10
→ confidence level = 1 − 0.10 = 0.90
STEP 3: determine the standardized test statistic
STEP 4: determine the critical value
● σ is known: Z crit → use Z table, you need α
● σ is unknown: t crit → use t table, you need α and df (df = n − 1)
Common critical values
STEP 5: write the decision rule and draw a conclusion
● Making a decision requires a comparison between the test statistic and the critical value
LOWER TAIL: if test stat < critical value → reject H0, otherwise don’t reject
UPPER TAIL: if test stat > critical value → reject H0, otherwise don’t reject
TWO TAILED: if test stat < – critical value OR test stat > + critical value → reject H0,
otherwise don’t reject
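The three decision rules can be written as one small function. This is a sketch with a hypothetical interface: the critical value is passed as a positive magnitude and the function applies the sign for the lower tail, which is a design choice made here rather than something stated in the notes.

```python
def reject_null(test_stat, critical_value, tail):
    """Return True if H0 is rejected under the critical value method.

    critical_value: magnitude of the critical value (sign is ignored)
    tail: "lower", "upper" or "two"
    """
    crit = abs(critical_value)
    if tail == "lower":
        return test_stat < -crit
    if tail == "upper":
        return test_stat > crit
    if tail == "two":
        return test_stat < -crit or test_stat > crit
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

print(reject_null(2.0, 1.645, "upper"))   # test statistic beyond the critical value
print(reject_null(-1.0, 1.96, "two"))     # test statistic inside the non-rejection region
```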
Hypothesis Testing: P-Value Method
● The p-value of a test is the minimum level of significance that is required to reject the
null hypothesis.
P-Value Creation
For a 2-sided test, the p-value is calculated as p-value = 2 × P(Z > |z|), where z is the test statistic
If p-value < α, we reject H0
If p-value ≥ α, we do not reject H0
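A minimal sketch of the p-value method for a z test. It assumes the standard two-sided definition, p-value = 2 × P(Z > |z|), and the rule that H0 is rejected when the p-value is below α; the function name and test statistics are made up for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
for z in (1.0, 1.96, 2.5):
    p = two_sided_p_value(z)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"z = {z}: p-value = {p:.4f} -> {decision}")
```

Comparing the p-value with α leads to the same decision as comparing the test statistic with the critical value; the two methods are different views of the same test.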
Example