UNIT 6
DATA MANAGEMENT
MATH 421A
15 HOURS
Revised June 1, 00
UNIT 6: Data Management
Previous Knowledge
With the implementation of APEF Mathematics at the Intermediate level, students should be able to:
Grade 7
- distinguish between biased and unbiased sampling
- select appropriate data collection methods
- construct a histogram
- read and make inferences for data displays
- determine measures of central tendency
- create and solve problems using the numerical definition of probability
- identify all possible outcomes of two independent events
Grade 8
- develop and apply the concept of randomness
- construct and interpret box and whisker plots
- determine the effect of variations in data on the mean, median and mode
Grade 9
- determine probabilities involving dependent and independent events
- determine theoretical probabilities of compound events
Overview:
- Sampling Techniques and Bias
- Measures of Central Tendency and 50% Box Plots
- 90% Box Plots and Applications
- Probability and Applications (Expected Values)
SCO: By the end of grade 10 students will be expected to:
F1 design and conduct experiments using statistical methods and scientific inquiry
F2 demonstrate an understanding of concerns and issues that pertain to the collection of data
F12 draw inferences about a population/sample and any bias that can be identified
F14 demonstrate an understanding of how the size of a sample affects the variation in sample results
G5 develop an understanding of sampling variability
Elaborations - Instructional Strategies/Suggestions
Sampling Techniques (8.1)
Invite student groups to explore the following questions:
“If you want to know what percent of high school students on PEI know
the capitals of the Canadian provinces, how would you do this and who
would you ask? Would the results represent the views of the entire
grade 10 population?”
Class discussion might touch on these topics:
What does the term “population” mean?
Is it reasonable to survey the entire population?
If the response is no, then how do we select a representative sample to
be surveyed?
Concept of Bias should be introduced at this point.
Bias is some influence that prevents the sample from being
representative of the entire population.
Challenge student groups to determine possible ways to select a biased
sample (e.g. the sample could consist only of grade 12 Canadian Studies
classes).
Invite students to explore ways of selecting an unbiased sample.
Students should read pp.365-367 in Math Power 10.
Probability sampling
- simple random: every member of the population has an equal chance of being selected.
  Ex: All students’ names are put in a hat and 30 are selected.
- systematic: every nth member of a population is selected.
  Ex: If the school population is 630 and you want to select a sample of 30 students, 630 ÷ 30 = 21. Therefore, in an alphabetical student list, select every 21st student.
- stratified: the population is divided into groups, or strata, from which random samples are taken.
  Ex: The school is divided into grades and you want 30 people. Randomly pick 10 people from each grade.
- cluster: choose a random sample from one group within a population.
  Ex: The school is subdivided by classes. A class is chosen randomly and all members are selected.
Non-Probability sampling (not random)
- convenience: no thought or effort has been put into selecting the sample. It is designed to be convenient for the sampler.
  Ex: Samplers survey their friends at the cafeteria table.
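Note to teachers: the four probability sampling techniques can also be demonstrated with a short program. The following is a minimal Python sketch, not part of the Math Power 10 material. The student names, the split into three grades of 210 and the 21 classes of 30 are invented for illustration, and the random starting point in the systematic sample is an added assumption (the example above simply takes every 21st name).

```python
import random

def simple_random(population, k, rng):
    """Every member has an equal chance of being selected (names in a hat)."""
    return rng.sample(population, k)

def systematic(population, k, rng):
    """Select every nth member, where n = population size // sample size."""
    step = len(population) // k
    start = rng.randrange(step)          # assumption: random starting point
    return population[start::step][:k]

def stratified(strata, per_stratum, rng):
    """Take a simple random sample of the same size from each stratum."""
    return [m for group in strata.values() for m in rng.sample(group, per_stratum)]

def cluster(clusters, rng):
    """Choose one group at random and select all of its members."""
    return list(rng.choice(list(clusters.values())))

rng = random.Random(6)                   # fixed seed so the run is repeatable
students = [f"student{i:03d}" for i in range(1, 631)]    # hypothetical list of 630
grades = {10: students[:210], 11: students[210:420], 12: students[420:]}
classes = {c: students[c * 30:(c + 1) * 30] for c in range(21)}

hat_sample = simple_random(students, 30, rng)
list_sample = systematic(students, 30, rng)
grade_sample = stratified(grades, 10, rng)
class_sample = cluster(classes, rng)
```

Each of the four samples contains 30 students, but only the stratified sample guarantees 10 students from every grade.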
Worthwhile Tasks for Instruction and/or Assessment
Sampling Techniques (8.1)
Journal/Pencil/Paper
A survey result indicates that “... most Canadians feel that the
Senate is a waste of tax-payers’ money.” What are some of
the questions you should ask about this survey?
(Who was surveyed, and was the sample random across Canada? What
age groups were surveyed? What socio-economic groups
were surveyed?)
Pencil/Paper
Identify the population you would sample for an opinion on
each topic:
a) minimum driving age
b) student parking spaces
c) fees for athletic teams
d) cafeteria food
Pencil/Paper
You intend to survey the school population to determine
whether the students would attend another dance this month.
Describe a sampling method for each sampling technique:
a) systematic
b) convenience
c) simple random
d) stratified
Presentation
Bring an example of a recent survey in a newspaper or
magazine to class and discuss the validity of the survey. Was
there bias in the survey question(s)? What sampling method
do you think was used?
Project
Try to find out what company does the surveys during the
election campaign and ask questions relating to bias and
sampling methods.
Suggested Resources
Sampling Techniques
Mathpower 10 p.368 # 1,6,11,14,17,21,24
SCO: By the end of grade 10 students will be expected to:
F12 draw inferences about a population/sample and any bias that can be identified
G2 design “yes/no” type questions
F4 construct various displays of data
Elaborations - Instructional Strategies/Suggestions
Sampling Techniques (cont’d) (8.1)
- volunteer: members of a population choose to participate in a survey.
  Ex: Interested students volunteer to participate (mail-in or phone-in surveys fall under this category).
Various Types of Bias (8.2)
- Selection (Sampling) Bias
  This is the type of bias created by faulty sample selection. It generally would not happen in probability sampling procedures.
- Response Bias
  This bias is created by faulty question or survey construction. In other words, the wording of the question influences the response. This can occur in all sampling techniques.
  Ex: In the question “Is it really fair that young people are not allowed to drive until they are 16?” the phrase “really fair” shows a bias in the question.
- Non-Response Bias
  This bias is created when a large number of people do not complete a survey.
  Ex: Mail-out questionnaires commonly have a poor response. People do not mail them back; therefore, a bias is created because inferences are made on sketchy results.
Measures of Central Tendency
Generate discussion to see what students’ current knowledge is on
mean, median and mode.
- Mean (x̄): The arithmetic average.
- Median: The middle number. Once the list is in ascending order, the median is the middle value. If there is an even number of values, the median is the average of the middle two. Half of the data is below the median and half of the data is above the median.
  Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10 (Median = 7)
  Ex: 1, 2, 2, 3, 5, 5, 6, 6, 6, 7, 8, 9 (Median = 5.5)
- Mode: The most frequently occurring number.
  Ex: In the first list above the mode = 8 and in the second list the mode = 6.
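Note to teachers: the two example lists can be checked with Python's standard `statistics` module. This is a minimal sketch for verification only; the helper name `central_tendency` is introduced here for illustration.

```python
from statistics import mean, median, mode

def central_tendency(data):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(data), median(data), mode(data)

first_list = [1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10]
second_list = [1, 2, 2, 3, 5, 5, 6, 6, 6, 7, 8, 9]

first_summary = central_tendency(first_list)     # median 7, mode 8
second_summary = central_tendency(second_list)   # median 5.5, mode 6
```

The means work out to 68 ÷ 11 (about 6.2) for the first list and 60 ÷ 12 = 5 for the second.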
Worthwhile Tasks for Instruction and/or Assessment
Suggested Resources
Various Types of Bias (8.2)
Journal
In a short paragraph describe in your own words the types of
bias that can occur and give an example of each.
Various Types of Bias
Mathpower 10 p.372 #1-13
Group Activity
Study newspapers, magazines, TV commercials, etc. Find as
many statements as possible that you feel are biased. Identify
each one as a response, non-response, or selection bias.
Project
Contact a polling company and ask for copies of the questions
used to survey political party popularity during the last
election. Study the questions for any bias and determine the
method of sampling.
Measures of Central Tendency
Pencil/Paper (see p.112 for an explanation of constructing box plots)
Each student in the class picks a number from 1 to 10. Write
the data from the entire class on the board and find the mean,
median and mode. Draw a 50% box plot.
Measures of Central Tendency
Note to teachers: To use the TI-83 as a random number generator, press Math > PRB 5:randInt( to generate numbers from 1 to 100 in groups of 20.
Pencil/Paper/Estimation
A random generator (TI-83) is used to generate 20 numbers
from 1 to 100. Estimate the mean, median and mode from the
data below. Calculate the mean, median and mode and relate
these to your estimates. Draw a 50% box plot.
55 100 91 95 46 75 94 17 19 53
72 71 24 75 80 24 98 6 77 19
Pencil/Paper
Listed below are the heights, in centimetres, of 35
competitors in an Olympics event. Examine the data to
determine the spread (range) of the data, where the data was
centred, and if any extreme heights existed. Construct a 50%
box plot on the data below.
190 175 180 185 192 195 187
167 175 185 183 184 180 183
185 188 189 187 198 183 184
185 181 185 184 182 185 189
187 184 180 175 178 195 189
SCO: By the end of grade 10 students will be expected to:
F5 calculate various statistics using appropriate technology, analyze and interpret displays and describe the relationships
G4 interpret and report on the results obtained from surveys and polls, and from experiments
Elaborations - Instructional Strategies/Suggestions
Measures of Central Tendency (cont’d)
For box plots we must look at the data in quarters, or quartiles.
Q1 (first quartile): the mid-value of the first half of the data (i.e. up to but not including the median).
Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10 (Q1 = 4)
Q3 (third quartile): the mid-value of the second half of the data (i.e. after the median).
Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10 (Q3 = 8)
Once we have determined the Median and the quartiles we can then
plot this data in a Box Plot. A box plot has 50% of the values inside
the box and the left whisker represents the first quarter of the data and
the right whisker represents the fourth quarter of the data.
Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10
Now we have Q1= 4, Median = 7, and Q3 = 8
For this example we will use a number line from 1 to 10 with a scale of
1.
In general, more valid inferences can be made when the measures of
central tendency are all close together. The more they are dispersed
the less valid the inferences.
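Note to teachers: the quartile rule used in this unit (each half of the data excludes the median when the number of values is odd) can be sketched in Python as follows. The function name `five_number_summary` is introduced here for illustration. Other software may use a different quartile convention and give slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a 50% box plot."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # first half, not including the median
    upper = data[half + n % 2:]    # second half, after the median
    return min(data), median(lower), median(data), median(upper), max(data)

example = [1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10]
summary = five_number_summary(example)   # (1, 4, 7, 8, 10)
```

The box runs from Q1 = 4 to Q3 = 8 with the median marked at 7, and the whiskers extend to 1 and 10.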
Worthwhile Tasks for Instruction and/or Assessment
Suggested Resources
Measures of Central Tendency
Pencil/Paper
The results of an experiment to determine the effect of
temperature on the speed of sound in air consisted of
nine measurements taken at 10 °C and nine taken at 22 °C. The data
is displayed below.
a) Draw a 50% box plot for each set of data.
b) What is the median speed at the lower temperature? At the
higher temperature?
c) Between what two speeds do 50% of the data lie for each
plot?
d) From your results, what do you think is the effect of an
increase in temperature on the speed of sound in air?
Pencil/Paper
A survey of weekly television viewing time of 25 female and
26 male teenagers produced the following data.
a) Find the measures of central tendency (mean, median and
mode)
b) What type of sampling technique would you assume was
used?
c) What types of conclusions can you make about the survey?
SCO: By the end of grade 10 students will be expected to:
F4 construct various displays of data
F26 construct, interpret and apply 90% box plots
F30 organize and display information in many different ways with and without technology
Elaborations - Instructional Strategies/Suggestions
Measures of Central Tendency (cont’d)
Re-doing the previous example using the TI-83:
Stat 1:Edit, clear all lists, then enter the data in L1.
If the data must be arranged in ascending order, press Stat 2:SortA(L1), where A is ascending.
To graph a 50% box plot:
2nd Stat Plot 1:Plot 1, with the following settings:
The 4th graph choice doesn’t connect the outliers to the box while the 5th choice of graph does. Typically we will be using this 5th choice. It is a box and whiskers plot with outliers.
To graph, set the appropriate window dimensions or press Zoom 9:ZoomStat.
Press Trace to see the minimum, Q1, the median, Q3 and the maximum by cursoring across the box plot.
Worthwhile Tasks for Instruction and/or Assessment
Measures of Central Tendency
Pencil/Paper/Technology
A teacher has the following results, in percent, on a class test:
76, 43, 56, 74, 96, 89, 55, 66, 49, 80, 85, 93, 95, 77, 96, 70,
98, 46, 78, 55, 76, 95, 95, 96, 52, 98, 73, 95, 81, 96, 59, 94,
44, 92, 96.
Sort the data in ascending order and draw a 50% box and whiskers plot.
Solution
Enter the data in the TI-83. Sort the data with Stat 2:SortA(L1).
Graph the data on the TI-83. To see the graph, set the window
dimensions by pressing Zoom 9:ZoomStat.
To see the mean, minimum, Q1, median, Q3 and the maximum,
press Stat > Calc 1:1-Var Stats, press Enter and scroll down.
Looking at the sorted data, determine the mode.
What inferences can be made from the graph?
Half the class has a mark above the median of 80, and one quarter
has a mark at or above Q3 = 95. Because the median is 80 and the
upper whisker is short, a large part of the class has very high marks.
The lower whisker is long, which means that there are a few really
low marks dragging the mean down.
Note to teacher: If the box is really short then the middle 50%
have marks very close together. If the box is long then there is
a large range of marks in the middle 50% of students.
Communication/Journal
Make inferences about the following box plot.
The median is about 85%, and the upper whisker is short, so many
marks are clustered there. The range of the upper half is very small;
thus the upper half of the class have marks very close together. The
lower half has a greater range and thus a greater dispersion of marks.
Marks in the upper half are high because the median is 85%.
Suggested Resources
Measures of Central Tendency
SCO: By the end of grade 10 students will be expected to:
F26 construct, interpret and apply 90% box plots
Elaboration - Instructional Strategies/Suggestions
90% Box Plots
Binomial Population: A population that has two possible outcomes. In other words, in response to a question the answer is either YES or NO.
Ex: Toss of a coin
Ex: Did you pass your test?
Ex: Are you a band student?
90% Box Plots combine results of many small samples of the
population. These box plots then allow us to make inferences on the
population as a whole or backwards from population to sample. The
box plots given are for sample sizes 20, 40, and 100.
Ex: In a school of 1000 students, a sample of 20 students is surveyed.
This procedure is repeated 100 times and each time the 20 students are
randomly chosen. (Not necessarily the same students). This gives us
the data to create a 90% Box Plot for sample size 20.
In the above example, assume the population is known to be 70% enrolled
in the English Program and 30% in French Immersion. When
conducting a survey (as explained in the above paragraph) the following
data is obtained and placed in a frequency table.
# marked    8   9  10  11  12  13  14  15  16  17  18  19
Frequency   1   2   6   2  14  11  21  17  15   8   2   1
In a 90% Box Plot, 10% of the values are contained in the two whiskers
together. Out of 100 trials, 10% would be 10. In the table above we
need to count frequencies from both ends until we are as close to 10 as
possible. Working our way in from both sides, the closest we get to 10
is 12 which is obtained when using the first three columns on the left
and the last two columns on the right. The rest of the values are
contained in the box.
Now would be a good time to show the students the entire 90% Box
Plot for sample size 20 table and let them realize that all this work has
generated only 1 of the box plots in this table. So instead of doing all
this work from now on use the tables provided.
In order to do the worthwhile tasks you will need to be able to read the
box plot tables. Instructions are given in Addison-Wesley 10 text p. 548
and 556.
Worthwhile Tasks for Instruction and/or Assessment
90% Box Plots (population to sample, sample to
population)
Suggested Resources
90% Box Plots
see worksheet at end of unit
Activity: Estimating the size of a wildlife population, Math 10 p.560. Instructions for this activity are at the back of the unit.
Math 10
Group Activity/Paper/Pencil
Divide class into groups and have each group create a 90%
box plot based on a different percent of marked items.
Note to teachers: To generate the data using the TI-83 for a
situation where 80% of the school population is enrolled in
the English Program:
Math > Prb 7:randBin( (random binomial)
The arguments are (sample size, probability, number of samples). In this example
the sample size is 20, the probability is 80% and this is
repeated 100 times.
80% of 20 = 16 so we would expect out of every 20 people
surveyed 16 would be in the English Program. This program
generates 100 numbers with this restriction but taking into
account the fact that there is some uncertainty in the sampling
process. In the first 20 people you survey it might happen that
most (or very few) of them are in the English Program so that
you may not have exactly 16 out of 20 in the English
Program. If enough groups of 20 students are surveyed the
average should move closer to 16.
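Note to teachers: where a TI-83 is not available, the same kind of data can be generated in Python. The sketch below is an illustration only. The function `rand_bin` is a hypothetical stand-in that takes the same three values (sample size, probability, number of samples), and it assumes each surveyed student is an independent 80% chance of being in the English Program.

```python
import random

def rand_bin(sample_size, probability, number_of_samples, rng):
    """For each sample, count how many of the students surveyed answer yes."""
    return [
        sum(rng.random() < probability for _ in range(sample_size))
        for _ in range(number_of_samples)
    ]

rng = random.Random(421)               # fixed seed so the run is repeatable
counts = rand_bin(20, 0.80, 100, rng)  # 100 samples of 20 students each
average = sum(counts) / len(counts)    # should be close to 16
```

Individual samples will vary above and below 16, which is the sampling variability the 90% box plots are designed to show.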
For the following problems and those in the Suggested
Resources use the Box Plot tables at the end of this unit.
Pencil/Paper
20% of the school population take Canadian Studies. In a
random sample of 20 students, what range of students might
be taking Canadian Studies?
Pencil/Paper
If 34% of the student population regularly attends school
dances, is it likely that a random sample of 40 students would
contain 20 students who attend dances?
Pencil/Paper
In a random sample of 20 grade 10 students, 7 said they have a
driver’s license. Make an inference about the percent of grade
10 students who have a driver’s license. (e.g. Math 10 p.556)
p.561 # 1, 3-5
Problem Solving Strategies
Math Power 10 p.397 #1,3,6
SCO: By the end of grade 10 students will be expected to:
G10 find probability given various conditions
Elaborations - Instructional Strategies/Suggestions
Probability (p.374)
A simple way of introducing students to the study of probability is to do
an activity like the following:
Each card has a letter written on it.
If the cards were placed in a hat, what is the chance (or probability) that
you will draw (assume that after each draw the cards are replaced):
a) a vowel
b) a consonant
c) an E
d) an X
Now challenge the students to come up with a definition of probability.
Probability: The ratio of the number of favourable outcomes to the
total number of possible outcomes.
P(outcome) is the probability of getting that outcome. For example,
when rolling a die P(3) is the probability of rolling a 3, which equals 1/6.
Using a deck of 52 cards a person draws a jack.
a) What are the chances of drawing a second jack if the first jack has
been replaced? (4/52 = 1/13) This is an example of independent events.
Events are independent when the outcome of one does not affect the
probability of the other.
b) What are the chances of drawing a second jack if the first jack was
not replaced? (3/51 = 1/17) This is an example of dependent events.
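Note to teachers: the definition of probability (favourable outcomes over possible outcomes) can be checked with exact fractions. The following is a minimal Python sketch; the function and variable names are introduced here for illustration.

```python
from fractions import Fraction

def probability(favourable, possible):
    """Ratio of favourable outcomes to the total number of possible outcomes."""
    return Fraction(favourable, possible)

p_roll_three = probability(1, 6)

# First jack replaced: still 4 jacks among 52 cards (independent events).
p_second_jack_replaced = probability(4, 52)        # reduces to 1/13

# First jack not replaced: 3 jacks left among 51 cards (dependent events).
p_second_jack_not_replaced = probability(3, 51)    # reduces to 1/17
```

Not replacing the first jack changes both the number of favourable outcomes and the number of possible outcomes, which is why the second probability is smaller.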
Expected Values (8.3)
Have students play the game as described in example 2 p. 381. Students
need to keep track of the number of rolls needed to win. The table
included on p.110 at the end of this unit is to help students record their
result with this activity.
When students have completed the activity, record the number of rolls it
took each student to win and then find the class mean (experimental
solution).
Now go through the solution to the example to calculate the expected
value of each roll. Use the expected value to find the number of rolls
expected to win (theoretical solution).
Worthwhile Tasks for Instruction and/or Assessment
Suggested Resources
Probability (p.374)
Pencil/Paper
A jack is drawn from a deck of 52 cards.
a) What is the probability of drawing a second jack from the
deck if the first jack is replaced?
b) What is the probability of drawing a second jack from the
deck if the first jack is not replaced?
Journal
How do independent and dependent events differ?
Probability
Mathpower 10
Scrabble p.362#1, 2d,g, i-k
Rock, Scissors, Paper all
p.374 #1 do any three
p.375 #3 a-c use chart p.381
Expected Values (8.3)
Pencil/Paper
In a contest at a local coffee/donut store the prizes are as
shown. What is the expected value for this contest?
Expected value = $0.94
If you spend more than $0.94 at the store then, on average, you will spend
more than you win.
Expected Values
Mathpower 10 p.382 # 2-6,9,10,13
Pencil/Paper
At the Old Home Week Exhibition there is a game of chance
where you toss 2 coins. If both come up heads you will win
$4. If only one comes up heads you will win $1. If neither
comes up heads they pay you nothing. It costs $2 to play this
game. Complete the table below to determine the expected
value for this game. Should you play this game?
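Note to teachers: one way to check the expected value for the coin game is to list the four equally likely outcomes and average the payouts. The Python sketch below assumes fair coins and that the $2 cost to play is not returned; the variable names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT, equally likely
payout = {2: 4, 1: 1, 0: 0}                # dollars won, keyed by number of heads
cost_to_play = 2

# Each outcome has probability 1/4, so weight each payout by 1/4 and add.
expected_winnings = sum(
    Fraction(payout[toss.count("H")], len(outcomes)) for toss in outcomes
)
expected_net = expected_winnings - cost_to_play
```

Under these assumptions the expected winnings are $1.50 per game, so a player loses 50 cents per game on average.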
Math 10 p.575 # 1-6
Journal
Design a game where you will raise money for the school council during
the winter carnival. (Make sure you don’t lose money for the school but
still give participants a reasonable chance of winning.)
# of rolls   Sum   Points
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Total
Estimating the Size of a Wildlife Population (binomial population: tagged or not tagged)
To estimate the number of animals in a species, wildlife biologists use a capture-recapture sampling technique. To simulate this process popcorn can be used. Have approximately 100 kernels in each ziplock bag, where 10 in each bag have been spray painted black. (See Math 10 p.560 for detailed instructions.)
Students do not know how many popcorn kernels are in the bag that they have. Don’t allow them to count them yet; that is done in step 8.
1. Place the unmarked popcorn (natural colour) in a styrofoam cup.
   - this represents the population at large
2. Count the number of marked popcorn (black).
   - this is the number captured and released
3. Place the marked popcorn in the cup and mix the popcorn up.
   - this represents the release of the captured into the wild where they mix with the rest of the population
4. Pick 40 popcorn from the cup (don’t look; this is the random sample).
   - this is the recapture
5. Count the number of marked popcorn.
   - this represents the number of marked items in the sample
6. Use the chart (sample size 40) to determine the percentage range of marked items in the population.
   For example, if there were 6 marked popcorn kernels, then by using the table we would get a percentage range of 8% to 26%.
7. Use the steps below to estimate the size of the entire population (the total number of kernels in the bag).
   Using the 8% to 26% range: we know that 10 kernels are marked, so the total population could range from 38 to 125.
   0.08n = 10, so n = 125
   0.26n = 10, so n ≈ 38
   Therefore there is a 90% probability that there are between 38 and 125 popcorn kernels (marked and unmarked) in your bag.
8. Count the total number of popcorn kernels in your bag. Does your prediction fall in an acceptable range?
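Note to teachers: the arithmetic in step 7 can be sketched in Python as follows. The function name `population_range` is introduced here for illustration, and the percentage range (8% to 26%) still has to be read from the sample size 40 box plot table.

```python
def population_range(marked_total, low_percent, high_percent):
    """Solve (percent / 100) * n = marked_total at both ends of the range."""
    smallest = round(marked_total * 100 / high_percent)   # 0.26n = 10
    largest = round(marked_total * 100 / low_percent)     # 0.08n = 10
    return smallest, largest

estimate = population_range(10, 8, 26)   # (38, 125)
```

The higher percentage gives the smaller population estimate, because the same 10 marked kernels make up a larger share of a smaller bag.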
Construction of box plots
If we look at 50% box plots then 50% of the data (values) are contained in the box and the remaining
50% are contained in the two whiskers combined.
For our example, 50% of 100 trials is 50. In the frequency table below (from p.106) we must try to get
the two whiskers adding to as close to 50 as possible (it can’t be less than 50).
If we work inward from the outside columns in the table we see this development:
Column      1   2   3   4   5   6   7   8   9  10  11  12
# marked    8   9  10  11  12  13  14  15  16  17  18  19
Frequency   1   2   6   2  14  11  21  17  15   8   2   1
Combining the values of columns 1 and 12 we get 1 + 1 = 2
Adding columns 2 and 11 to the above total we get 2 + 2 + 2 = 6
Adding columns 3 and 10 to the above total we get 6 + 6 + 8 = 20
Adding columns 4 and 9 to the above total we get 20 + 2 + 15 = 37
Now as we approach 50 (the total we want) we will probably only be able to add one extra column at a
time.
Adding column 5 to the above total we get 37 + 14 = 51
If we had chosen to add column 8 to the above total we would have gotten 37 + 17 = 54
So we can see that the best result comes from adding column 5 last to get a total of 51.
The whiskers therefore cover columns 1 to 5 and 9 to 12, and the box covers columns 6 to 8.
The same procedure of working from outside to inside is used for 90% box plots.
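Note to teachers: the outside-to-inside procedure can be sketched in Python as follows. This is one possible reading of the steps above, not the method of any particular textbook: it adds columns in pairs while both fit, then adds the smaller remaining outer column until the whisker total is no longer less than the target. The function name `whisker_columns` is introduced here for illustration.

```python
def whisker_columns(frequencies, box_percent):
    """Return (left columns, right columns, whisker total) for a box plot."""
    total = sum(frequencies)
    target = total * (100 - box_percent) / 100   # values that go in the whiskers
    lo, hi = 0, len(frequencies) - 1
    count = 0
    while count < target and lo < hi:
        pair = frequencies[lo] + frequencies[hi]
        if count + pair <= target:
            # Both outer columns fit: take one from each side.
            count += pair
            lo += 1
            hi -= 1
        elif frequencies[lo] <= frequencies[hi]:
            # Taking both would overshoot: take only the smaller column.
            count += frequencies[lo]
            lo += 1
        else:
            count += frequencies[hi]
            hi -= 1
    return lo, len(frequencies) - 1 - hi, count

frequency = [1, 2, 6, 2, 14, 11, 21, 17, 15, 8, 2, 1]   # table from this unit
fifty = whisker_columns(frequency, 50)    # (5, 4, 51)
ninety = whisker_columns(frequency, 90)   # (3, 2, 12)
```

For the 50% box plot this gives five columns in the left whisker and four in the right (total 51). For the 90% box plot it gives three on the left and two on the right (total 12), matching the worked examples in this unit.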
90% Box Plot Problems
Population to Sample (given the % of population, find # possible in a sample)
1) 30% of students at Three Oaks take Physics. In a random sample of 20 students, estimate how many
students could possibly be taking Physics.
2) At a certain school, 80% of the students take History. In a random sample of 40 students, estimate how
many students might be taking History.
3) In the town of Montague 18% of people speak two languages. In a random sample of 100 residents,
estimate how many people might speak two languages.
4) If 28% of 16-year-old people smoke, is it possible that a random sample of 40 people would contain 19
smokers?
5) The probability (chance) of correctly guessing the answer to a true/false question is 50%. If you guess the
answers, can you guess 24 out of 40 questions correctly, 90% of the time?
6) The probability of correctly guessing a multiple choice question (each question has 5 possible answers) is 20%.
If you guess the answers, can you guess 15 out of 40 questions correctly, 90% of the time?
Sample to Population (given the # possible in a sample, find the % of the population)
1) 8 out of 40 randomly selected grade 10 students say that they have a part-time job. Make an inference
about the percent of grade 10 students that have a part-time job.
2) In Westisle High School a survey showed that 12 out of 20 randomly selected students come from a
farm home. Use the box-plots to estimate the percent of students in Westisle who come from a farm
background.
3) Bluefield has 900 students. A survey showed that 26 out of 40 students were bussed to school:
a) make an inference about the percent of students who go to school by bus.
b) use the answer from (a) to estimate how many students are bussed.
Project/Presentation
4) Design a one question yes/no survey about a topic of your choice. Conduct your survey with a random
sample of 40 people. Use the results to make an inference about the percent of people who would
answer yes on the survey question. Explain how you chose your random sample. Which method of
sampling did you use? How were you able to eliminate bias in your question?