Download The Study of Change

Document related concepts

Confidence interval wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Misuse of statistics wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
Statistics & Probability
Descriptive Statistics: descriptive
statistics to summarise the data in
a sample (Workshop 5)
Inferential Statistics: Using the
sample statistics to infer a
description of the population
(Workshop 10)
1
Recap of
workshop 5
Important prior knowledge
2
We analysed Right Foot Lengths as
an example of continuous data that
would be normally distributed
3
Mean
24.6
1 s.d.
20.6
28.6
2 s.d.
16.6
32.6
3 s.d.
144
72%
186
93%
197
98.5%
12.6
36.6
4
After our investigation we confirmed the
Empirical Rule for normally distributed data
5
Empirical Rule
68%
ο€­3
ο€­2
ο€­1
0
1
2
3
95%
99.7%
6
Empirical Rule
𝝁 = π’‘π’π’‘π’–π’π’‚π’•π’Šπ’π’ π’Žπ’†π’‚π’
𝝈 = π’‘π’π’‘π’–π’π’‚π’•π’Šπ’π’
𝒔𝒕𝒂𝒏𝒅𝒂𝒓𝒅 π’…π’†π’—π’Šπ’‚π’•π’Šπ’π’
68%
 ο€­ 3
 ο€­ 2
 ο€­ 1

  1
  2
  3
95%
99.7%
7
Most Important for Inferential Stats
on our Syllabus
 ο€­ 2

  2
95%
95% of normal data lies within 2 standard deviations of the mean
8
Pg. 37
Pg. 36
For Honours Leaving Cert use z scores
ο€­1.96
2.5%
0
95%
1.96
2.5%
9
Module 5.21
First, build on the informal
language we already know.
10
Data is collected from a class on the number of
people in each of their households.
The above dot plot represents the data.
11
1 student was absent on the day
the sample was collected.
A 100 euro prize is offered to
guess how many are in his
family.
Which answer would you chose and why?
A. 5
B. 8
12
1 student was absent on the day
the sample was collected.
A 100 euro prize is offered to
guess how many are in his
family.
How confident are you that 5 is the answer?
Answer in words.
Quite confident
Not that confident, but its my best
guess…
13
1 student was absent on the day
the sample was collected.
A 100 euro prize is offered to
guess how many are in his
family.
In which statement are you more confident? Why?
A. The number is 5.
B. The number lies in the following interval:
4 ≀ number ≀ 6 {4,5,6}
14
1 student was absent on the day
the sample was collected.
A 100 euro prize is offered to
guess how many are in his
family.
In which statement are you more confident? Why?
A. The number lies in the following interval:
3 ≀ number ≀ 7 {3,4,5,6,7}
B. The number lies in the following interval:
4 ≀ number ≀ 6 {4,5,6}
15
1 student was absent on the day
the sample was collected.
A 100 euro prize is offered to
guess how many are in his
family.
A. The number lies in the following interval:
3 ≀ number ≀ 7 {3,4,5,6,7}
Are you 100% confident in this answer? Why?
16
Mobile Phones
You wanted to work out which mobile phone
company is the most popular among Irish
second level students. You used a sample of
100 students and found that Meteor topped the
survey with 42%.
What percentage of all second level students in
the country use Meteor?
(Can we use the sample to talk about the
population?)
17
Sample Variation from Workshop 5
18
Important with Inferential Statistics
Indeterminate Language when inferring from a
sample to a population
…around
…tends to
…approximately
Around 42% of students in Ireland use Meteor
19
New Terminology
You wanted to work out which mobile phone
company is the most popular among Irish
second level students. You used a sample of
100 students and found that Meteor topped the
survey with 42%.
What percentage of all second level students in
the country use Meteor?
(Can we use the sample to talk about the
population?)
20
New Terminology
What percentage of all second level students in
the country use Meteor?
Categorical Data:
Yes, I use Meteor
No, I don’t use Meteor
How can I represent categorical data (yes/no) with a number?
0.42
By using a Sample Proportion: 42% use Meteor
(58% do not use Meteor) (1-0.42) = 0.58
𝒑
21
Confidence
Words:
– Around 42% of students in Ireland use Meteor
How confident are you in your statistic?
– I am reasonably confident that the percentage of
students using meteor is around to 42%
22
Quantify
Sample:
infer
Full Population
Words:
I’m fairly confident that around 42% use Meteor
LCOL & LCHL
Quantify: Use an interval balanced with confidence in that interval
I am…
100% confident that between 0% and 100% use Meteor
50% confident that between 40% and 45% use Meteor
23
Q: Where do you
keep your mobile
phone at night?
A simulation activity
β€’ The 200 cubes in the bag represent all 200 students in a
school.
β€’ The yellow ones represent the students who kept their mobile
phone under their pillow last night when they were asleep.
β€’ Take a sample of 25 cubes from the bag and see what
proportion (percentage) are yellow.
24
My Sample
N=25
Yellow
Not Yellow
Mobile Phone
Under pillow
Not under pillow
How many?
9
16
Proportion %
36%
64%
Proportion (dec)
0.36
1-0.36 = 0.64
Inferring to the population (200 students)
Can we say that 36% of these 200 students sleep
with their phone under their pillow? Why?
No - Sample variation
25
If we took another sample of 25…
β€’ Could you get a sample proportion of 4%?
οƒ˜ How likely is this?
β€’ Could you get a sample proportion of 40%
οƒ˜ How likely is this?
If we retook the sample 300 times where
would most of the proportions lie?
26
Empirical Rule: 95% of the sampled proportions (%) will lie
within 2 sd of the true population proportion
95%
π‘¨π’”π’”π’–π’Žπ’† 𝒕𝒓𝒖𝒆 π’‘π’“π’π’‘π’π’“π’•π’Šπ’π’ =
𝟎. πŸ‘ πŸ‘πŸŽ% from C@S phase 11:
Sample of 7150 students
Results lie around the true
population proportion
27
Confidence
Approximately 95% of the sample
proportions are between 10% and 50%
Therefore I can say,
β€œI am 95% confident that the true proportion
lies between 10% and 50%”
28
In Reality…
οƒ˜ we have 1 sample, not 300
οƒ˜ we don’t have the true population proportion – this is
what we are looking for
οƒ˜ so we create an interval of 2 sd around the proportion
we have from our 1 sample as our best estimate.
Best estimate
𝟏
𝒑-2sd%
βˆ’
𝒏
Sample%
True
%
( 𝒑 =36%)
𝟏
𝒑
+
+2sd%
𝒏
True %
(30%)
β€’ This interval around a sample statistic is called the
Margin of Error
29
I am
95%
confident
20 different 95% confidence intervals
β€œ95% confidence intervals” using the margin of
error around the sampled proportion (blue dot)
βˆ’
𝟏
𝒏
+
𝟏
𝒏
95% of the time,
the true
population is in
the interval I
made with my
sampled
proportion and
the margin of
error interval.
𝒑
Population
Proportion
30
20 different 95% confidence intervals
β€œ95% confidence intervals” using the margin of
error around the sampled proportion (blue dot)
8:00 am
βˆ’
𝟏
𝒏
+
𝒑
Population
Proportion
𝟏
𝒏
If I took lots of random
samples of this size
and created
confidence intervals at
the 95% level of
confidence around the
sample proportions,
95% of the intervals
would capture the true
population proportion.
31
Sample Size and Error
Why does n=sample size feature in the formula
for building a confidence interval?
𝟏
𝒏
Question for intuitive understanding:
What do you think happens the margin of error as the sample
size increases?
There is a direct relationship between
sample size and the margin of error around your statistic
32
Sample Size & Margin of Error
As we increase the sample size what happens the
margin of error?
1
n
33
Level of Confidence 95%
Sample Size
25
64
100
256
400
625
1111
1600
2500
10000
Margin of Error
ο‚±20%
ο‚±12.5%
ο‚±10%
ο‚±6.25%
ο‚±5%
ο‚±4%
ο‚±3%
ο‚±2.5%
ο‚±2%
ο‚±1%
34
Inferential Statistics
Real Life Big Question: (Pose a Question)
What percentage of all post-primary students
keep their phone under their pillow?
Let’s answer our question using
different sample sizes.
35
Sample Size 25
Let’s say you took a random sample of size 25 from the
population of all post-primary students in Ireland and 7 of the 25
keep their mobile phone under their pillow.
Sample Proportion =
Margin of Error = ±
πŸ•
πŸπŸ“
𝟏
πŸπŸ“
×
𝟏𝟎𝟎
𝟏
= πŸπŸ–%
= ±πŸπŸŽ%
0.28±0.2
8%
28%
48%
This would mean you are 95% confident that between 8% and
48% of post-primary students keep their phone under their pillow.
36
Sample Size 100
Let’s say you took a random sample of size 100 from the
population of all post-primary students in Ireland and 28 of the 100
keep their mobile phone under their pillow.
Sample Proportion =
Margin of Error = ±
πŸπŸ–
𝟏𝟎𝟎
𝟏
𝟏𝟎𝟎
×
𝟏𝟎𝟎
𝟏
= πŸπŸ–%
= ±πŸπŸŽ%
0.28±0.1
18%
28%
38%
This would mean you are 95% confident that between 18% and
38% of post-primary students keep their phone under their pillow.
37
Sample Size 1111
Let’s say you took a random sample of size 1111 from the
population of all post-primary students in Ireland and 311 of the
1111 keep their mobile phone under their pillow.
Sample Proportion =
Margin of Error = ±
πŸ‘πŸπŸ
𝟏𝟎𝟎
×
𝟏𝟏𝟏𝟏
𝟏
𝟏
𝟏𝟏𝟏𝟏
= πŸπŸ–%
= ±πŸ‘%
0.28±0.03
25%
28%
31%
This would mean you are 95% confident that between 25% and
31% of post-primary students keep their phone under their pillow.
38
made more accurate using z scores
Pg. 37
Pg. 36
LCHL:
𝟏
𝒏
ο€­1.96
2.5%

95%
1.96
2.5%
39
Module 5.21
Revisit Margin of Error
𝟏
-2sd%
βˆ’
𝒏
𝑳π‘ͺ𝑢𝑳 =
True % %
Sample
True %
𝟏
𝒏
𝒑(𝟏 βˆ’ 𝒑)
𝑳π‘ͺ𝑯𝑳 = ±πŸ. πŸ—πŸ”
𝒏
πŸπ’‘)
𝒑(𝟏 βˆ’
βˆ’πŸ. πŸ—πŸ” -2sd%
βˆ’π’
𝒏
𝟏
+2sd%
+
𝒏
Note: Use p if p is unknown
𝟏 𝒑(𝟏 βˆ’ 𝒑)
+𝟏.
+2sd%
+ πŸ—πŸ”
𝒏 𝒏
True % %
Sample
True %
40
Higher Level
Let’s say you took a random sample of size 25 from the
population of all post-primary students in Ireland and 7 of the 25
keep their mobile phone under their pillow.
Sample Proportion =
πŸ•
πŸπŸ“
×
𝟏𝟎𝟎
𝟏
Confidence Limits = ±πŸ. πŸ—πŸ”
= πŸπŸ–%
.πŸπŸ–(πŸβˆ’.πŸπŸ–)
πŸπŸ“
= ±πŸŽ. πŸπŸ•πŸ” = ±πŸπŸ•. πŸ”%
0.28±0.176
10.4%
28%
45.6%
This would mean you are 95% confident that between 10.4% and
45.6% of post-primary students keep their phone under their
pillow.
41
NCCA
Find additional information about Inferential
Statistics on the NCCA website under frequently
asked questions
42
Recap of
workshop 5
In WS 5 we did Hypothesis Testing for Higher Level.
Now, Hypothesis Testing is on also for Ordinary Level.
Therefore, we will revisit Hypothesis Testing from WS 5.
43
44
Hypothesis Testing
Testing claims about a population
Null Hypothesis: The Null hypothesis, denoted by H0 is a
statement about a population.
We assume this statement is true until proven otherwise.
45
A Claim
CLAIM (H0) : The new drug relieves migraine 70% of the
time.
SAMPLE: A newspaper investigates this claim by getting
migraine sufferers to try the new drug.
They get 100 results that say it relieves migraine 62% of the
time.
MARGIN OF ERROR:
𝟏
𝟏𝟎𝟎
Fail
to
Reject
= ±πŸπŸŽ%
70%
TEST:
52%
62%
𝒑
72%
46
A Claim
CLAIM (H0) : 30% of second level students in Ireland are
180cm or taller.
SAMPLE: Of the 200 students in the C@S sample, 34 are
180cm or taller. Is this sufficient evidence to reject the
teacher’s claim, at the 5% level of significance?
MARGIN OF ERROR:
𝟏
𝟐𝟎𝟎
Reject
= ±πŸ•%
πŸ‘πŸ’
= πŸπŸ•% 𝒂𝒓𝒆 πŸπŸ–πŸŽπ’„π’Ž 𝒐𝒓 𝒕𝒂𝒍𝒍𝒆𝒓
𝟐𝟎𝟎
TEST:
10%
30%
17%
𝒑
24%
47
On-line
β€’ Under workshop 5
48
Testing a Claim about a Proportion
using a Sample Proportion
Go Fast Airlines provides internal flights in Ireland, short haul flights
to Europe and long haul flights to America and Asia. Each month
the company carries out a survey among 1000 passengers. The
company repeatedly advertises that 70% of their customers are
satisfied with their overall service. 640 of the sample stated they
were satisfied with the overall service.
49
Courtroom Analogy to Teach
Formal Language
β€’
At the start of a trial we assume the defendant is innocent.
β€’
Then we present the evidence.
β€’
The verdict is β€œGuilty” or β€œNot Guilty". (Write this in a real world context)
β€’
A real world action sometimes needs to be decided upon.
Null Hypothesis: The proportion of passengers who are satisfied with the service is 70%. p = 0.7
Alternative Hypothesis: The proportion of passengers who are satisfied with the service is not
70%. pο‚Ή 0.7
Evidence:
Sample Proportion =
Margin of Error =
1
n
640
1000
1
=
= 0.64 = 64%
1000
= 0.0316 = 3.2%
Conclusion
The 70% is outside the range 60.8% to 67.2% of our confidence interval.
There is sufficient evidence to reject the claim that the percentage of passengers who are
happy with the service is 70% at the 5% level of significance.
Possible Actions: Change the advertisement from 70%.
Meet with staff to come up with suggestions about how to improve the level of satisfaction.
Do a further survey to find out more detail about why the level of satisfaction has changed.
50
Overview
1. Confidence Intervals and Hypothesis Testing for
Proportions
2. Analysing the Relationship Between Sample Means
and the Population Mean
3. Using the Relationship for Confidence Intervals for
Means
51
Some Distributions
52
Question
A sample size n=30 was taken of the weights of school
bags of post-primary students.
The mean of the sample is 4.5 kg.
The standard deviation of the sample is 2.5 kg.
Write a statement about the population mean (stating
the level of confidence you are using).
53
Our Findings
Shape
Centre
Spread
Population
One
Large
Sample
All Sample
Means
54
Analysing the Relationship Between Sample
Means and the Population Mean
The distribution of weights of school bags of all 200
students in a school is shown in the diagram. We will
call this our population distribution.
55
Feedback and Answers
1.
(a) (Roughly) symmetric/bell-shaped/
not skewed/mound-shaped
(b) Average is around 5 kg.
(c) Most are within 2 kg of the average/
inter-quartile range is about 4kg.
2.
(a) Small Chance or unlikely
(b) Around 5
3. How many different samples of size 30 are possible?
200
= 409,681,705,022,128,000,000,000,000,000,000,000
30
56
Feedback and Answers
7.
(b) The mean of the sample means
(c) It is close to 5.1 kg.
8.
(b) The range of the sample means
(c) It is much smaller. It is much narrower.
9.
(b) The standard deviation of the sample means.
(c) It is much smaller. It is much narrower.
57
Feedback and Answers
12.
The distribution has one mode. (If we look at more
sample means we will see the distribution
roughly symmetric). The sample means is centred
around the centre of the population distribution.
The spread is narrower than the spread of the
population distribution.
13.
(a) it is possible to get a mean between 1 kg and 2
kg but it is not very likely.
(b) The mean of a sample will be close to the mean
of the population. Some will be closer than others.
58
ICT
β€’ Show that the sampling distribution of
the means will eventually become
symmetrical with one mode.
β€’ Show that when n is large the
standard error is narrow.
β€’ Sample size of 30 is very important to
mention.
60
Compare the Standard Deviations of the
Samples to the Population Standard Deviation
Population Standard Deviation = 2.6
2.4
2.5
2.6
2.5
2.4
2.7
2.9
2.6
2.5
2.9
2.8
2.8
2.6
2.6
2.6
2.4
2.5
2.5
2.7
61
School Bags: Our Findings
Shape
Centre
Spread
πœ‡
𝜎
One mode,
moundPopulation shaped &
roughly
symmetric
One
Large
Sample
Varies too
much to
comment
x which is
π‘₯
approximately πœ‡
(95% of these
will be within
1.96
All Sample
Means
Normal
𝜎
𝑛
x ο€½ 
Approximately 𝜎
of πœ‡ )
𝜎
𝑛
62
Note about 𝜎 or s
𝟐
𝜎=
π’™βˆ’π
𝒏
𝟐
𝐬=
π’™βˆ’π’™
π’βˆ’πŸ
63
Why we chose this approach
β€’ Guided activities where students make their own
discoveries, share and discuss them. Co-construction
β€’ Revision of Shape, Centre and Spread
β€’ Prediction so important
β€’ Difficult to visualise the Central Limit Theorem
β€’ Strikes a balance between some physical sampling and
use of ICT. All of one wouldn’t be ideal.
β€’ Students come up with phrases like β€œmean of the
sample means” themselves.
β€’ After this work it is easier to picture a confidence interval
and a hypothesis test (not just a list of steps).
64
Key Ideas
Shape
One mode,
moundPopulation shaped &
roughly
symmetric
One
Large
Sample
Varies too
much to
comment
Centre
Spread
πœ‡
𝜎
π‘₯
x which is
approximately πœ‡
(95% of these
will be within
1.96
All Sample
Means
Normal
𝜎
𝑛
x ο€½ 
Approximately 𝜎
of πœ‡ )
𝜎
𝑛
65
Overview
1. Confidence Intervals and Hypothesis Testing for
Proportions
2. Analysing the Relationship Between Sample Means
and the Population Mean
3. Using the Relationship for Confidence Intervals for
Means
66
Task
In reality we don’t know the population mean.
We have a sample.
A sample size n=30 was taken of the weights of
school bags of post-primary students.
The mean of the sample is 4.5 kg.
The standard deviation of the sample is 2.5 kg.
What could we say about the population mean?
Hints: What do we know?
What can we determine?
What links these?
67
Solution to Task
1. The population mean is around 4.5kg.
3. I am 95% confident that the
population mean school bag weight is
between 3.6 kg and 5.4 kg.
68
Solution to Task
69
Solution to Task
70
Solution to Task
Population
Shape
We don’t
know this
shape
Centre
Spread
πœ‡
𝜎
xπ‘₯ and this is our
One
Large
Sample
We don’t
care about
this
best estimate for πœ‡
(95% of these will
be within
1.96
All Sample
Means
Normal
𝜎
𝑛
x ο€½ 
s
We know this is
our best estimate
for 𝜎
of πœ‡)
𝜎
𝑛
71
Confidence Intervals for the
Population Mean
72
Key Ideas
Population
Shape
We don’t
know this
shape
Centre
Spread
πœ‡
𝜎
xπ‘₯ and this is our
One
Large
Sample
We don’t
care about
this
best estimate for πœ‡
(95% of these will
be within
1.96
All Sample
Means
Normal
𝜎
𝑛
s
We know this is
our best estimate
for 𝜎
of πœ‡)
x ο€½ 
𝜎
𝑛
73
A Task on Hypothesis Testing
3 newspapers investigate a claim by a parents group that the mean
weight of school bags of post-primary students in Ireland is 8kg.
All 3 take a sample size of 100.
Group A
Newspaper A finds a mean weight of 7.5kg with a standard deviation of 2.7kg.
Group B
Newspaper B finds a mean weight of 8.2kg with a standard deviation of 2.5kg.
Group C
Newspaper C finds a mean weight of 7.3kg with a standard deviation of 3.2kg.
Use an hypothesis test at the 5% level of significance to decide
whether there is sufficient evidence to conclude that their claim is
valid. State the null hypothesis and state your conclusion clearly.
74
Overview
1. Confidence Intervals and Hypothesis Testing for
Proportions
2. Analysing the Relationship Between Sample Means
and the Population Mean
3. Using the Relationship for Confidence Intervals for
Means
75
Modular Courses
Notes for previous course on Strand 1 online.
New course on Functions, Calculus and the Deferred Material
will begin in many education centres from March.
Sign up to the Project Maths Newsletter to find where and when.
76