Statistics Final Exam Quick Notes

Exam Layout and Content

Question 1: Threshold 5 (10 marks)
● Calculating the probability of a sampling distribution of sample means
● Calculating the standard error of the sampling distribution of the sample mean for:
  ○ a finite population
  ○ an infinite population

Question 2: Threshold 6 (10 marks)
● Calculating/constructing the confidence interval using the z-distribution
● Interpreting a confidence interval
● Hypothesis testing when the population standard deviation is known

Questions 3 and 4: any material from lectures 9, 10, 11
● Lecture 9: Sampling and Sampling Distribution (Threshold 5)
● Lecture 10: Confidence Interval (Threshold 6)
● Lecture 11: Hypothesis Testing (Threshold 6)

Notes

Lecture 9: Sampling and Sampling Distribution (T5)

Introduction
● A sample consists of individual observations that are random draws from a population
● A sample mean is an estimate of the true population mean
● But, based on one sample, how accurately can you estimate the population parameter (i.e. population mean)?
● A sampling distribution of a statistic tells us how close the statistic (e.g. the sample mean) is likely to be to the parameter (e.g. population mean)
● Sampling error: the difference between a statistic computed from a sample and the corresponding population value. It can be reduced by using a larger sample.

Central Limit Theorem
● The sampling distribution of the sample mean is approximately normal if at least one of these conditions is met:
  1. The original (population) distribution is normal
  2. The sample size n ≥ 30 (regardless of the original distribution)

The Sampling Distribution of the Sample Mean
● DEFINITION: The sampling distribution of the mean is the probability distribution of sample means, with all samples having the same sample size n and taken from the same population.
● It is characterised by:
  ○ Mean: $\mu_{\bar{x}} = \mu$
  ○ Standard deviation (standard error): $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
* Note: xbar space is the distribution of sample means (the average of averages); x space is the distribution of observations in an ordinary sample

Standard Error (SE)
● Is the standard deviation of all possible sample means
● The higher the standard error, the lower the accuracy

Standard Error Formulas
● Infinite population: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
● Finite population (size N): $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$
* Note: $\sqrt{(N - n)/(N - 1)}$ is the finite correction factor (FCF)
* Note: if the population standard deviation is unknown, we replace σ with s

Standardising the Sampling Distribution (using Z-score) and Finding Probabilities
● We can standardise the sampling distribution of the sample mean using the z-score: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
● We can now use the Z table to find probabilities.

Sampling Distribution of the Sample Proportion
● Let X be the number of times a particular outcome (success) occurs in n repeated trials
● To estimate the population proportion of success, p, we use the sample proportion: $\hat{p} = X / n$
● If both np ≥ 5 and nq ≥ 5 (where q = 1 − p), then the sample size is considered large.
● In that case the sampling distribution of proportions can be approximated by a normal probability distribution with mean $p$ and standard error $\sqrt{pq/n}$, so that $z = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$

Lecture 10: Confidence Interval (T6)

Introduction
● Confidence intervals are constructed to provide an estimate of how close the sample mean is to the population mean after accounting for a certain margin of error.
● Whether the population standard deviation is known or unknown determines whether we use the z or the t distribution to find the critical value, which is then used to calculate the confidence interval.

Point Estimate
● The sample mean, a single value, is referred to as the point estimate.

Confidence Interval (CI)
● To make statements about unknown population parameters (e.g. population mean) with greater accuracy/confidence, we can develop an interval estimator
● An interval estimator draws inference about a population by estimating the value of the unknown population parameter using an interval.
● This interval is called the Confidence Interval (CI)

* Further Explanation
● Different samples taken from a population will give different means (xbar)
● So we give an interval instead of a specific point
● That interval says: the real (i.e. population) mean lies somewhere in the interval
● We are trying to capture the pop. mean in the interval. The wider the interval, the more likely the pop. mean will be in that interval

Constructing a Confidence Interval (standard deviation is known)
Confidence interval (CI) = point estimate (xbar) ± (critical value) × (standard error)
Confidence interval (CI) = $\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
* Note: (critical value) × (standard error) is the margin of error (E)

Constructing a Confidence Interval (standard deviation is unknown)
● In most sampling situations the population standard deviation is unknown.
● Instead of the population standard deviation we use the sample standard deviation, s
● Instead of the z distribution we use the t distribution to obtain the critical value (t critical value) and then the confidence interval.
Confidence interval (CI) = $\bar{x} \pm t_{\alpha/2,\,n-1} \dfrac{s}{\sqrt{n}}$
Where n − 1 = degrees of freedom (df)

Commonly Used Confidence Levels
● 90% confidence level → z critical value = 1.645
● 95% confidence level → z critical value = 1.96
● 99% confidence level → z critical value = 2.576

Determining the Appropriate Sample Size
● Formula: $n = \left( \dfrac{z_{\alpha/2}\, \sigma}{E} \right)^2$
● E is the margin of error
● Always round up, e.g. 47.000000256 → round to 48

Confidence Interval of Population Proportion
● Population proportion: for qualitative variables the pop. proportion is a parameter of interest.
● Confidence interval (CI) = $\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$

Determining the Appropriate Sample Size (Population Proportion)
● Formula: $n = \dfrac{z_{\alpha/2}^2\, \hat{p}\hat{q}}{E^2}$ (always round up)

When to Use the Z or T Distribution for Confidence Interval Computation
● Population standard deviation σ is known → use the z distribution
● Population standard deviation σ is unknown → use the t distribution with df = n − 1

Lecture 11: Hypothesis Testing (T6)

Introduction
● The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favour of a certain belief about a population parameter
● e.g. is there statistical evidence in a random sample of potential customers that supports the belief that consumers spend more than $20 on an online purchase?
  You have been employed to test the belief…

Summary of Steps in Hypothesis Testing
● STEP 1: formulate the hypothesis
● STEP 2: determine alpha (α), the level of significance
● STEP 3: determine the standardized test statistic
● STEP 4: determine the critical value
● STEP 5: write the decision rule and draw a conclusion

STEP 1: Set the null hypothesis (H0) and alternate hypothesis (HA)
● Claim always goes to HA
● Equal sign always goes to H0 (≤, ≥, =)
● We are testing a population parameter (e.g. μ)
● Choose from 3 scenarios (μ0 is the hypothesised value):
  ○ Lower tail: H0: μ ≥ μ0, HA: μ < μ0
  ○ Upper tail: H0: μ ≤ μ0, HA: μ > μ0
  ○ Two-tailed: H0: μ = μ0, HA: μ ≠ μ0

Identifying the Rejection Region
● Rejection region = range of values such that if the test statistic falls into that range, we REJECT the NULL HYPOTHESIS in favour of the alternate hypothesis
● The rejection region lies in the left tail (lower-tail test), the right tail (upper-tail test), or both tails (two-tailed test); these are the shaded regions on the diagrams.

STEP 2: determine alpha (α), the level of significance
● Types of error:
  ○ Type 1 error: occurs when we reject a true null hypothesis (i.e. reject H0, but H0 is true)
  ○ Type 2 error: occurs when we don’t reject a false null hypothesis (i.e. don’t reject H0, but H0 is false)
● The probability of a type 1 error is denoted as α (alpha)
● P(making type 1 error) = α
● α is called the level of significance
● α = 1%, 5%, 10% are frequently used in practice
● 1 − α → the confidence level
  5% significance level → α = 5% = 0.05 → confidence level = 1 − 0.05 = 0.95
  10% significance level → α = 10% = 0.10 → confidence level = 1 − 0.10 = 0.90

STEP 3: determine the standardized test statistic
● σ is known: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
● σ is unknown: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$

STEP 4: determine the critical value
● σ is known: Z crit → use Z table, you need α
● σ is unknown: t crit → use t table, you need α and df (df = n − 1)

Common critical values (z)
● α = 0.10: one-tailed 1.282, two-tailed 1.645
● α = 0.05: one-tailed 1.645, two-tailed 1.960
● α = 0.01: one-tailed 2.326, two-tailed 2.576

STEP 5: write the decision rule and draw a conclusion
● Making a decision requires a comparison between the test statistic and the critical value
  LOWER TAIL: if test stat < critical value (which is negative) → reject H0, otherwise don’t reject
  UPPER TAIL: if test stat > critical value → reject H0, otherwise don’t reject
  TWO TAILED: if test stat < −critical value OR test stat > +critical value → reject H0, otherwise don’t reject

Hypothesis Testing: P-Value Method
● The p-value of a test is the minimum level of significance that is required to reject the null hypothesis.

P-Value Creation
● For a two-sided test, the p-value is calculated as $p = 2\,P(Z > |z|)$
● If p-value < α, we reject H0
● If p-value ≥ α, we do not reject H0

Example
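As an illustrative sketch (not taken from the lecture material), the Python below strings together the standard error, z confidence interval, sample size, and z-test steps from lectures 9 to 11 for the case where σ is known. The function names and the numbers (n = 36, xbar = 22, σ = 6, hypothesised mean 20) are hypothetical and chosen only to make the arithmetic easy to follow; it uses `statistics.NormalDist` from the standard library in place of the Z table.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, replaces the Z table


def standard_error(sigma, n, population_size=None):
    """Standard error of the sample mean.

    If population_size (N) is given, the finite correction factor
    sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """CI = xbar +/- (z critical value) * (standard error), sigma known."""
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * standard_error(sigma, n)
    return xbar - margin, xbar + margin


def required_sample_size(sigma, margin, confidence=0.95):
    """n = (z * sigma / E)^2, always rounded up."""
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_crit * sigma / margin) ** 2)


def z_test(xbar, mu0, sigma, n, alpha=0.05, tail="upper"):
    """One-sample z-test using the p-value method.

    tail is "lower", "upper" or "two"; returns (z, p_value, reject_h0).
    """
    z = (xbar - mu0) / standard_error(sigma, n)
    if tail == "upper":
        p_value = 1 - STD_NORMAL.cdf(z)
    elif tail == "lower":
        p_value = STD_NORMAL.cdf(z)
    else:
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical data: H0: mu <= 20, HA: mu > 20 (upper-tail test)
z, p_value, reject = z_test(xbar=22, mu0=20, sigma=6, n=36, alpha=0.05)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
# z = 2.00, p-value = 0.0228, reject H0: True

low, high = z_confidence_interval(xbar=22, sigma=6, n=36, confidence=0.95)
print(f"95% CI: ({low:.2f}, {high:.2f})")

print("n needed for E = 1:", required_sample_size(sigma=6, margin=1))
```

With these made-up numbers the standard error is 6 / √36 = 1, so the test statistic is z = (22 − 20) / 1 = 2. That exceeds the upper-tail critical value of 1.645 at α = 0.05, and the p-value is below α, so both methods lead to rejecting H0.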