
SUMMARY

• Z-distribution
• Central limit theorem

Statistical inference

If we can't conduct a census, we collect data from a sample of the population.
Goal: draw conclusions about that population.

Confidence interval

for n ≥ 30: x̄ ± Z × s/√n
for n < 30: x̄ ± t_(n−1) × s/√n

Here Z (or t with n − 1 degrees of freedom) is the critical value (Czech: kritická hodnota), and the whole term after the ± sign is the margin of error (Czech: možná odchylka).

What does a confidence interval tell us?

• When we say "we are 95% confident that the true value of the parameter is in our confidence interval", we mean that 95% of the confidence intervals constructed this way from repeated samples will contain the true value of the parameter.
• After a sample is taken, the population parameter is either in the resulting interval or not. It is not a matter of chance!
• A confidence interval does not say that the true value of the parameter has a particular probability of being in the interval, given the data actually obtained.
• To summarize, the margin of error (critical value × s/√n) depends on
  1. the confidence level (95% is common),
  2. the sample size n,
  3. the variability of the data (i.e. on σ).
• As the sample size increases, the margin of error decreases. With a bigger sample we get a narrower interval in which we are fairly sure the true population parameter lies.
• More variability increases the margin of error.
• The margin of error measures nothing other than chance variation. It does not measure any bias or errors that happen during the process.
• It does not tell you anything about the correctness of your data!

HYPOTHESIS TESTING

Aim of hypothesis testing

• decision making

Engagement ratio

• Hopefully, you like this course so far.
• How do we measure this?

Engagement ratio = (number of minutes awake) / (total minutes available)

Engagement distribution

[Figure: distribution of the engagement ratio on an axis from 0.0 to 1.0; population N = 100, μ = 0.077, σ = 0.107; sample n = 30, x̄ = 0.13]

• If all students attended my lesson with singing, what is our point estimate for the engagement ratio?
• 0.13, i.e. number of minutes awake = 100 × 0.13 = 13 min

Interval estimate

• What is the standard error of the mean that we would use to compare this sample mean (0.13) with the means of other samples of the same size?
• SE = σ/√n = 0.107/√30 ≈ 0.0195

[Figure: sampling distribution of the mean M, with the population mean 0.077, the sample mean 0.13 and 95% intervals marked]

n = 30, x̄ = 0.13, σ = 0.107

Confidence interval

x̄ ± Z × s/√n

• But in our case we know the population standard deviation σ = 0.107, so we can use the population σ instead of the sample s.
• What is the 95% confidence interval in our case?

0.13 ± 1.96 × 0.107/√30 ⟹ 0.092 … 0.168, i.e. roughly 0.09 … 0.17

Confidence interval

• Our 95% interval estimate has a lower bound of about 0.09 and an upper bound of about 0.17.
• Remember what these numbers mean: the ratio of the "minutes awake" during a lesson to the "total minutes available in a lesson" (which is 100 minutes).

Engagement ratio ER = (number of minutes awake) / 100

• So we predict that if I incorporate this musical lesson, the entire population of 100 students will stay awake for between 9 and 17 minutes on average.
• Without my song, the population of 100 students stayed awake for 7.7 minutes on average.
• Thus we are fairly sure that singing works: it will keep you awake for at least 9 minutes, and possibly up to 17 minutes.

Self-assessment

• The engagement ratio may not be perfect; it may have a few flaws.
• For example, the fact that you are not sleeping does not necessarily mean you are engaged.
• Another option: you can self-report how engaged you think you are (on a scale from 1 to 10).
• And you can also self-assess how much you think you learnt (on a scale from 1 to 10).

Results

Measure of "Engagement"
• μ_E = 7.5, σ_E = 0.64
• x̄_E = 8.94

Measure of "Learning"
• μ_L = 8.2, σ_L = 0.73
• x̄_L = 8.35

Quiz

• Ultimately we want to know whether incorporating a song about the concepts in the lesson leads to higher engagement and learning. Which statistics should we calculate to determine this?
1. Note whether each sample mean is less than or greater than its population mean.
2. Calculate the actual difference between each sample mean and its population mean.
3. Find where each sample mean falls on the distribution of sample means for its respective population.
4. Find how many σ's each sample mean is from the population mean.

Measure of "Engagement"
• μ_E = 7.5, σ_E = 0.64
• n = 30
• M_E = ?
• SE_E = ?

Measure of "Learning"
• μ_L = 8.2, σ_L = 0.73
• n = 30
• M_L = ?
• SE_L = ?

Z_E = (8.94 − 7.5) / (0.64/√30) ≈ 12.32
Z_L = (8.35 − 8.2) / (0.73/√30) ≈ 1.13

Measure of "Engagement"
• μ_E = 7.5, σ_E = 0.64
• n = 30
• x̄_E = 8.94
• M_E = 7.5
• SE_E = 0.64/√30 ≈ 0.12

Measure of "Learning"
• μ_L = 8.2, σ_L = 0.73
• n = 30
• x̄_L = 8.35
• M_L = 8.2
• SE_L = 0.73/√30 ≈ 0.13

Probability of getting a given mean

• Z_E = 12.32, Z_L = 1.13
• What is the probability of randomly selecting a sample of size 30 and getting a mean of at least 8.94 for engagement and at least 8.35 for learning?
• For Z_E the probability is extremely low.
• For Z_L the probability is 1.0 − 0.8708 ≈ 0.13.
• So what does this mean? What can we conclude? Check all that apply.

Conclusions?

1. The song seems to have had an effect on learning, but not engagement.
2. The song seems to have had an effect on engagement, but not learning.
3. The song caused an increase in both engagement and learning.
4. The song caused an increase in engagement, but not in learning.

Summary of our findings

Dependent variable (scale from 1 to 10)   Sample mean x̄ (n = 30)   Probability   Likely or unlikely?
engagement                                8.94                      p ≪ 0.01
learning                                  8.35                      p ≈ 0.13

Summary of our findings

Dependent variable (scale from 1 to 10)   Sample mean x̄ (n = 30)   Probability   Likely or unlikely?
engagement                                ???                       p = 0.05
learning                                  ???                       p = 0.10

Levels of likelihood - α levels

0.05 (5%)   0.01 (1%)   0.001 (0.1%)

• Three conventional levels of (un)likelihood.
• If the probability of getting a sample mean is less than the chosen level (0.05, 0.01 or 0.001), the sample mean is usually considered unlikely.
• These are called the α levels, or significance levels (Czech: hladiny významnosti).
• The α level is our criterion for deciding whether something is likely or unlikely.

Quiz

• Focus on α = 0.05.

[Figure: sampling distribution with the critical value Z* marked]

• Which of the following are true?
1. If the probability of getting a particular sample mean is less than α, it is "unlikely" to occur.
2. If a sample mean has a Z-score greater than Z*, it is "unlikely" to occur.
3. If the probability of getting a particular sample mean is "unlikely", the sample mean is in the orange region.
4. The alpha level corresponds to the orange region.

Z-critical value

If the probability of obtaining a particular sample mean is less than the alpha level, the mean falls in the tail beyond Z*, which is called the critical region.

If the Z-score of the sample mean is greater than the Z-critical value, we have evidence that this mean differs from that of the regular population (the population that had not watched the musical lesson).

Critical regions

• What is the Z-critical value for α = 0.05?
• Using the Z-table, find the Z-value for a cumulative probability of 0.95, which is 1.65.
• What is the Z-critical value for α = 0.01?
• 2.33
• What is the Z-critical value for α = 0.001?
• 3.09

• We take a sample mean from a sample of size n.
• Then we calculate its Z-score: Z = (x̄ − μ) / (σ/√n)
• Suppose we get a Z-score of 1.82.
• We say that this is significant at p < 0.05.
• 1.82 lies somewhere in the red region in the previous picture. Its tail probability is less than 0.05, but not less than 0.01.
• This means that the probability of obtaining this sample mean is less than 5%, but not less than 1%.
• And remember, 0.05 is the alpha level.

Significance quiz

Z-score   Significant at
3.14      p <
2.07      p <
2.57      p <
14.31     p <

α level   Z-critical value
0.05      1.65
0.01      2.33
0.001     3.09

Significance quiz

Z-score   Significant at
3.14      p < 0.001
2.07      p < 0.05
2.57      p < 0.01
14.31     p < 0.001

α level   Z-critical value
0.05      1.65
0.01      2.33
0.001     3.09
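
The calculations in these slides (standard error, confidence interval, Z-score, Z-critical values and the significance quiz) can be reproduced with a short Python sketch. It is only an illustration, not part of the original lecture: the function names are made up for this example, it assumes a one-tailed (upper-tail) test with a known population σ, and it uses only `statistics.NormalDist` from the Python standard library (Python 3.8+). Because it computes exact values instead of reading a printed Z-table, the results can differ from the slide values in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def z_score(sample_mean, mu, sigma, n):
    """Z = (sample mean - mu) / (sigma / sqrt(n)), with known population sigma."""
    return (sample_mean - mu) / (sigma / sqrt(n))


def z_critical(alpha):
    """Upper-tail Z-critical value: the point with probability alpha above it."""
    return STD_NORMAL.inv_cdf(1 - alpha)


def upper_tail_probability(z):
    """Probability of a Z-score at least as large as z."""
    return 1 - STD_NORMAL.cdf(z)


def strictest_level(z, alphas=(0.001, 0.01, 0.05)):
    """Smallest conventional alpha level at which z is significant, else None."""
    for alpha in alphas:  # checked from strictest to most lenient
        if z > z_critical(alpha):
            return alpha
    return None


def confidence_interval(sample_mean, sigma, n, level=0.95):
    """Two-sided interval: sample mean +/- Z * sigma / sqrt(n)."""
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


if __name__ == "__main__":
    # Engagement ratio example: n = 30, sample mean 0.13, sigma = 0.107
    low, high = confidence_interval(0.13, 0.107, 30)
    print(f"95% CI for the engagement ratio: {low:.3f} ... {high:.3f}")

    # Self-assessment example: engagement and learning scores
    z_e = z_score(8.94, 7.5, 0.64, 30)
    z_l = z_score(8.35, 8.2, 0.73, 30)
    print(f"Z_E = {z_e:.2f}, upper-tail p = {upper_tail_probability(z_e):.3g}")
    print(f"Z_L = {z_l:.2f}, upper-tail p = {upper_tail_probability(z_l):.3g}")

    # Significance quiz: classify each Z-score by its strictest alpha level
    for z in (3.14, 2.07, 2.57, 14.31):
        print(f"Z = {z}: significant at p < {strictest_level(z)}")
```

Checking the levels from strictest to most lenient means each Z-score is reported at the smallest α it passes, which is how the quiz answers are stated (for example, 2.57 is reported at p < 0.01, not merely p < 0.05).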
