Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Statistics & Probability Descriptive Statistics: descriptive statistics to summarise the data in a sample (Workshop 5) Inferential Statistics: Using the sample statistics to infer a description of the population (Workshop 10) 1 Recap of workshop 5 Important prior knowledge 2 We analysed Right Foot Lengths as an example of continuous data that would be normally distributed 3 Mean 24.6 1 s.d. 20.6 28.6 2 s.d. 16.6 32.6 3 s.d. 144 72% 186 93% 197 98.5% 12.6 36.6 4 After our investigation we confirmed the Empirical Rule for normally distributed data 5 Empirical Rule 68% ο3 ο2 ο1 0 1 2 3 95% 99.7% 6 Empirical Rule π = ππππππππππ ππππ π = ππππππππππ πππππ πππ π ππππππππ 68% ο ο 3ο³ ο ο 2ο³ ο ο 1ο³ ο ο ο« 1ο³ ο ο« 2ο³ ο ο« 3ο³ 95% 99.7% 7 Most Important for Inferential Stats on our Syllabus ο ο 2ο³ ο ο ο« 2ο³ 95% 95% of normal data lies within 2 standard deviations of the mean 8 Pg. 37 Pg. 36 For Honours Leaving Cert use z scores ο1.96 2.5% 0 95% ο«1.96 2.5% 9 Module 5.21 First, build on the informal language we already know. 10 Data is collected from a class on the number of people in each of their households. The above dot plot represents the data. 11 1 student was absent on the day the sample was collected. A 100 euro prize is offered to guess how many are in his family. Which answer would you chose and why? A. 5 B. 8 12 1 student was absent on the day the sample was collected. A 100 euro prize is offered to guess how many are in his family. How confident are you that 5 is the answer? Answer in words. Quite confident Not that confident, but its my best guessβ¦ 13 1 student was absent on the day the sample was collected. A 100 euro prize is offered to guess how many are in his family. In which statement are you more confident? Why? A. The number is 5. B. The number lies in the following interval: 4 β€ number β€ 6 {4,5,6} 14 1 student was absent on the day the sample was collected. A 100 euro prize is offered to guess how many are in his family. In which statement are you more confident? Why? A. The number lies in the following interval: 3 β€ number β€ 7 {3,4,5,6,7} B. The number lies in the following interval: 4 β€ number β€ 6 {4,5,6} 15 1 student was absent on the day the sample was collected. A 100 euro prize is offered to guess how many are in his family. A. The number lies in the following interval: 3 β€ number β€ 7 {3,4,5,6,7} Are you 100% confident in this answer? Why? 16 Mobile Phones You wanted to work out which mobile phone company is the most popular among Irish second level students. You used a sample of 100 students and found that Meteor topped the survey with 42%. What percentage of all second level students in the country use Meteor? (Can we use the sample to talk about the population?) 17 Sample Variation from Workshop 5 18 Important with Inferential Statistics Indeterminate Language when inferring from a sample to a population β¦around β¦tends to β¦approximately Around 42% of students in Ireland use Meteor 19 New Terminology You wanted to work out which mobile phone company is the most popular among Irish second level students. You used a sample of 100 students and found that Meteor topped the survey with 42%. What percentage of all second level students in the country use Meteor? (Can we use the sample to talk about the population?) 20 New Terminology What percentage of all second level students in the country use Meteor? Categorical Data: Yes, I use Meteor No, I donβt use Meteor How can I represent categorical data (yes/no) with a number? 0.42 By using a Sample Proportion: 42% use Meteor (58% do not use Meteor) (1-0.42) = 0.58 π 21 Confidence Words: β Around 42% of students in Ireland use Meteor How confident are you in your statistic? β I am reasonably confident that the percentage of students using meteor is around to 42% 22 Quantify Sample: infer Full Population Words: Iβm fairly confident that around 42% use Meteor LCOL & LCHL Quantify: Use an interval balanced with confidence in that interval I amβ¦ 100% confident that between 0% and 100% use Meteor 50% confident that between 40% and 45% use Meteor 23 Q: Where do you keep your mobile phone at night? A simulation activity β’ The 200 cubes in the bag represent all 200 students in a school. β’ The yellow ones represent the students who kept their mobile phone under their pillow last night when they were asleep. β’ Take a sample of 25 cubes from the bag and see what proportion (percentage) are yellow. 24 My Sample N=25 Yellow Not Yellow Mobile Phone Under pillow Not under pillow How many? 9 16 Proportion % 36% 64% Proportion (dec) 0.36 1-0.36 = 0.64 Inferring to the population (200 students) Can we say that 36% of these 200 students sleep with their phone under their pillow? Why? No - Sample variation 25 If we took another sample of 25β¦ β’ Could you get a sample proportion of 4%? ο How likely is this? β’ Could you get a sample proportion of 40% ο How likely is this? If we retook the sample 300 times where would most of the proportions lie? 26 Empirical Rule: 95% of the sampled proportions (%) will lie within 2 sd of the true population proportion 95% π¨πππππ ππππ ππππππππππ = π. π ππ% from C@S phase 11: Sample of 7150 students Results lie around the true population proportion 27 Confidence Approximately 95% of the sample proportions are between 10% and 50% Therefore I can say, βI am 95% confident that the true proportion lies between 10% and 50%β 28 In Realityβ¦ ο we have 1 sample, not 300 ο we donβt have the true population proportion β this is what we are looking for ο so we create an interval of 2 sd around the proportion we have from our 1 sample as our best estimate. Best estimate π π-2sd% β π Sample% True % ( π =36%) π π + +2sd% π True % (30%) β’ This interval around a sample statistic is called the Margin of Error 29 I am 95% confident 20 different 95% confidence intervals β95% confidence intervalsβ using the margin of error around the sampled proportion (blue dot) β π π + π π 95% of the time, the true population is in the interval I made with my sampled proportion and the margin of error interval. π Population Proportion 30 20 different 95% confidence intervals β95% confidence intervalsβ using the margin of error around the sampled proportion (blue dot) 8:00 am β π π + π Population Proportion π π If I took lots of random samples of this size and created confidence intervals at the 95% level of confidence around the sample proportions, 95% of the intervals would capture the true population proportion. 31 Sample Size and Error Why does n=sample size feature in the formula for building a confidence interval? π π Question for intuitive understanding: What do you think happens the margin of error as the sample size increases? There is a direct relationship between sample size and the margin of error around your statistic 32 Sample Size & Margin of Error As we increase the sample size what happens the margin of error? 1 n 33 Level of Confidence 95% Sample Size 25 64 100 256 400 625 1111 1600 2500 10000 Margin of Error ο±20% ο±12.5% ο±10% ο±6.25% ο±5% ο±4% ο±3% ο±2.5% ο±2% ο±1% 34 Inferential Statistics Real Life Big Question: (Pose a Question) What percentage of all post-primary students keep their phone under their pillow? Letβs answer our question using different sample sizes. 35 Sample Size 25 Letβs say you took a random sample of size 25 from the population of all post-primary students in Ireland and 7 of the 25 keep their mobile phone under their pillow. Sample Proportion = Margin of Error = ± π ππ π ππ × πππ π = ππ% = ±ππ% 0.28±0.2 8% 28% 48% This would mean you are 95% confident that between 8% and 48% of post-primary students keep their phone under their pillow. 36 Sample Size 100 Letβs say you took a random sample of size 100 from the population of all post-primary students in Ireland and 28 of the 100 keep their mobile phone under their pillow. Sample Proportion = Margin of Error = ± ππ πππ π πππ × πππ π = ππ% = ±ππ% 0.28±0.1 18% 28% 38% This would mean you are 95% confident that between 18% and 38% of post-primary students keep their phone under their pillow. 37 Sample Size 1111 Letβs say you took a random sample of size 1111 from the population of all post-primary students in Ireland and 311 of the 1111 keep their mobile phone under their pillow. Sample Proportion = Margin of Error = ± πππ πππ × ππππ π π ππππ = ππ% = ±π% 0.28±0.03 25% 28% 31% This would mean you are 95% confident that between 25% and 31% of post-primary students keep their phone under their pillow. 38 made more accurate using z scores Pg. 37 Pg. 36 LCHL: π π ο1.96 2.5% ο 95% ο«1.96 2.5% 39 Module 5.21 Revisit Margin of Error π -2sd% β π π³πͺπΆπ³ = True % % Sample True % π π π(π β π) π³πͺπ―π³ = ±π. ππ π ππ) π(π β βπ. ππ -2sd% βπ π π +2sd% + π Note: Use p if p is unknown π π(π β π) +π. +2sd% + ππ π π True % % Sample True % 40 Higher Level Letβs say you took a random sample of size 25 from the population of all post-primary students in Ireland and 7 of the 25 keep their mobile phone under their pillow. Sample Proportion = π ππ × πππ π Confidence Limits = ±π. ππ = ππ% .ππ(πβ.ππ) ππ = ±π. πππ = ±ππ. π% 0.28±0.176 10.4% 28% 45.6% This would mean you are 95% confident that between 10.4% and 45.6% of post-primary students keep their phone under their pillow. 41 NCCA Find additional information about Inferential Statistics on the NCCA website under frequently asked questions 42 Recap of workshop 5 In WS 5 we did Hypothesis Testing for Higher Level. Now, Hypothesis Testing is on also for Ordinary Level. Therefore, we will revisit Hypothesis Testing from WS 5. 43 44 Hypothesis Testing Testing claims about a population Null Hypothesis: The Null hypothesis, denoted by H0 is a statement about a population. We assume this statement is true until proven otherwise. 45 A Claim CLAIM (H0) : The new drug relieves migraine 70% of the time. SAMPLE: A newspaper investigates this claim by getting migraine sufferers to try the new drug. They get 100 results that say it relieves migraine 62% of the time. MARGIN OF ERROR: π πππ Fail to Reject = ±ππ% 70% TEST: 52% 62% π 72% 46 A Claim CLAIM (H0) : 30% of second level students in Ireland are 180cm or taller. SAMPLE: Of the 200 students in the C@S sample, 34 are 180cm or taller. Is this sufficient evidence to reject the teacherβs claim, at the 5% level of significance? MARGIN OF ERROR: π πππ Reject = ±π% ππ = ππ% πππ πππππ ππ ππππππ πππ TEST: 10% 30% 17% π 24% 47 On-line β’ Under workshop 5 48 Testing a Claim about a Proportion using a Sample Proportion Go Fast Airlines provides internal flights in Ireland, short haul flights to Europe and long haul flights to America and Asia. Each month the company carries out a survey among 1000 passengers. The company repeatedly advertises that 70% of their customers are satisfied with their overall service. 640 of the sample stated they were satisfied with the overall service. 49 Courtroom Analogy to Teach Formal Language β’ At the start of a trial we assume the defendant is innocent. β’ Then we present the evidence. β’ The verdict is βGuiltyβ or βNot Guilty". (Write this in a real world context) β’ A real world action sometimes needs to be decided upon. Null Hypothesis: The proportion of passengers who are satisfied with the service is 70%. p = 0.7 Alternative Hypothesis: The proportion of passengers who are satisfied with the service is not 70%. pοΉ 0.7 Evidence: Sample Proportion = Margin of Error = 1 n 640 1000 1 = = 0.64 = 64% 1000 = 0.0316 = 3.2% Conclusion The 70% is outside the range 60.8% to 67.2% of our confidence interval. There is sufficient evidence to reject the claim that the percentage of passengers who are happy with the service is 70% at the 5% level of significance. Possible Actions: Change the advertisement from 70%. Meet with staff to come up with suggestions about how to improve the level of satisfaction. Do a further survey to find out more detail about why the level of satisfaction has changed. 50 Overview 1. Confidence Intervals and Hypothesis Testing for Proportions 2. Analysing the Relationship Between Sample Means and the Population Mean 3. Using the Relationship for Confidence Intervals for Means 51 Some Distributions 52 Question A sample size n=30 was taken of the weights of school bags of post-primary students. The mean of the sample is 4.5 kg. The standard deviation of the sample is 2.5 kg. Write a statement about the population mean (stating the level of confidence you are using). 53 Our Findings Shape Centre Spread Population One Large Sample All Sample Means 54 Analysing the Relationship Between Sample Means and the Population Mean The distribution of weights of school bags of all 200 students in a school is shown in the diagram. We will call this our population distribution. 55 Feedback and Answers 1. (a) (Roughly) symmetric/bell-shaped/ not skewed/mound-shaped (b) Average is around 5 kg. (c) Most are within 2 kg of the average/ inter-quartile range is about 4kg. 2. (a) Small Chance or unlikely (b) Around 5 3. How many different samples of size 30 are possible? 200 = 409,681,705,022,128,000,000,000,000,000,000,000 30 56 Feedback and Answers 7. (b) The mean of the sample means (c) It is close to 5.1 kg. 8. (b) The range of the sample means (c) It is much smaller. It is much narrower. 9. (b) The standard deviation of the sample means. (c) It is much smaller. It is much narrower. 57 Feedback and Answers 12. The distribution has one mode. (If we look at more sample means we will see the distribution roughly symmetric). The sample means is centred around the centre of the population distribution. The spread is narrower than the spread of the population distribution. 13. (a) it is possible to get a mean between 1 kg and 2 kg but it is not very likely. (b) The mean of a sample will be close to the mean of the population. Some will be closer than others. 58 ICT β’ Show that the sampling distribution of the means will eventually become symmetrical with one mode. β’ Show that when n is large the standard error is narrow. β’ Sample size of 30 is very important to mention. 60 Compare the Standard Deviations of the Samples to the Population Standard Deviation Population Standard Deviation = 2.6 2.4 2.5 2.6 2.5 2.4 2.7 2.9 2.6 2.5 2.9 2.8 2.8 2.6 2.6 2.6 2.4 2.5 2.5 2.7 61 School Bags: Our Findings Shape Centre Spread π π One mode, moundPopulation shaped & roughly symmetric One Large Sample Varies too much to comment x which is π₯ approximately π (95% of these will be within 1.96 All Sample Means Normal π π οx ο½ ο Approximately π of π ) π π 62 Note about π or s π π= πβπ π π π¬= πβπ πβπ 63 Why we chose this approach β’ Guided activities where students make their own discoveries, share and discuss them. Co-construction β’ Revision of Shape, Centre and Spread β’ Prediction so important β’ Difficult to visualise the Central Limit Theorem β’ Strikes a balance between some physical sampling and use of ICT. All of one wouldnβt be ideal. β’ Students come up with phrases like βmean of the sample meansβ themselves. β’ After this work it is easier to picture a confidence interval and a hypothesis test (not just a list of steps). 64 Key Ideas Shape One mode, moundPopulation shaped & roughly symmetric One Large Sample Varies too much to comment Centre Spread π π π₯ x which is approximately π (95% of these will be within 1.96 All Sample Means Normal π π οx ο½ ο Approximately π of π ) π π 65 Overview 1. Confidence Intervals and Hypothesis Testing for Proportions 2. Analysing the Relationship Between Sample Means and the Population Mean 3. Using the Relationship for Confidence Intervals for Means 66 Task In reality we donβt know the population mean. We have a sample. A sample size n=30 was taken of the weights of school bags of post-primary students. The mean of the sample is 4.5 kg. The standard deviation of the sample is 2.5 kg. What could we say about the population mean? Hints: What do we know? What can we determine? What links these? 67 Solution to Task 1. The population mean is around 4.5kg. 3. I am 95% confident that the population mean school bag weight is between 3.6 kg and 5.4 kg. 68 Solution to Task 69 Solution to Task 70 Solution to Task Population Shape We donβt know this shape Centre Spread π π xπ₯ and this is our One Large Sample We donβt care about this best estimate for π (95% of these will be within 1.96 All Sample Means Normal π π οx ο½ ο s We know this is our best estimate for π of π) π π 71 Confidence Intervals for the Population Mean 72 Key Ideas Population Shape We donβt know this shape Centre Spread π π xπ₯ and this is our One Large Sample We donβt care about this best estimate for π (95% of these will be within 1.96 All Sample Means Normal π π s We know this is our best estimate for π of π) οx ο½ ο π π 73 A Task on Hypothesis Testing 3 newspapers investigate a claim by a parents group that the mean weight of school bags of post-primary students in Ireland is 8kg. All 3 take a sample size of 100. Group A Newspaper A finds a mean weight of 7.5kg with a standard deviation of 2.7kg. Group B Newspaper B finds a mean weight of 8.2kg with a standard deviation of 2.5kg. Group C Newspaper C finds a mean weight of 7.3kg with a standard deviation of 3.2kg. Use an hypothesis test at the 5% level of significance to decide whether there is sufficient evidence to conclude that their claim is valid. State the null hypothesis and state your conclusion clearly. 74 Overview 1. Confidence Intervals and Hypothesis Testing for Proportions 2. Analysing the Relationship Between Sample Means and the Population Mean 3. Using the Relationship for Confidence Intervals for Means 75 Modular Courses Notes for previous course on Strand 1 online. New course on Functions, Calculus and the Deferred Material will begin in many education centres from March. Sign up to the Project Maths Newsletter to find where and when. 76