General Maths Chapter 1
Univariate Data
Chapter One – Univariate Data
• Assigned textbook questions to be up to date by 2nd Feb
• Exercise 1A
• Exercise 1B
• Exercise 1D
• Holiday Homework Sheet
1A – Categorical Data
Types of Data
Data can be divided into two major groups:
• Categorical Data (Qualitative)
• Numerical Data (Quantitative)
1A – Types of Data
Categorical Data can be placed into one of 2 categories –
Nominal (categories with no natural order) or Ordinal (categories with a natural order)
1A – Types of Data
Numerical Data is in the form of numbers and can be either –
Discrete (counted, taking separate values) or Continuous (measured, taking any value in a range)
1A – Types of Data
What type of data is……???
The number of students who walk to school.
Numerical – Discrete
The types of vehicles that each of your parents drive.
Categorical – Nominal
The sizes of pizza available at a pizza shop.
Categorical – Ordinal
The varying temperature outside throughout the day.
Numerical - Continuous
1A – Working with Categorical Data
Once Categorical Data has been collected, it is important to be
able to summarise and display the data using –
Frequency Tables
Graphs – Bar Graphs or Dot Plots
1A – Working with Categorical Data
Once Categorical Data has been collected, it is important to be
able to calculate –
Frequency
The number of times that a particular thing has occurred
Relative Frequency
(The number of times that a particular thing has occurred) ÷ (The total amount of all data recorded)
% Frequency
The relative frequency × 100
1A – Working with Categorical Data
Class Hair Colour Survey
Gather Data of the students in the classroom and use it to:
1. Summarise data using a frequency distribution table
2. Represent data using a bar chart
3. Find the frequency of those with brown hair
4. Find the relative frequency of those with brown hair
5. Find the % frequency of those with brown hair
1. Summarise data using a frequency distribution table
Hair Colour | Tally | Total
Brown       |       |
Blonde      |       |
Black       |       |
Red         |       |
Other       |       |
2. Represent data using a bar chart
[Bar chart: Class Hair Colours, with one bar each for Brown, Blonde, Red, Black and Other]
In a bar chart the bars don’t touch. Leave gaps!
3. Find the frequency of those with brown hair
This is just the total number of people with brown hair
4. Find the relative frequency of those with brown hair
Find this using: (the total number of people with brown hair) ÷ (the total number of people surveyed)
5. Find the % frequency of those with brown hair
Find this using: The relative frequency × 100
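The three calculations above can be sketched in Python. This is only an illustration: the `responses` list is made-up data, not a real class survey, and the variable names are our own.

```python
from collections import Counter

# Hypothetical survey responses (made up for illustration, not real class data).
responses = ["Brown", "Blonde", "Brown", "Black", "Red",
             "Brown", "Blonde", "Other", "Brown", "Black"]

counts = Counter(responses)      # frequency table: hair colour -> number of people
total = len(responses)           # total number of people surveyed

frequency = counts["Brown"]                        # number of people with brown hair
relative_frequency = frequency / total             # frequency divided by the total
percentage_frequency = relative_frequency * 100    # relative frequency x 100

print(counts)
print(frequency, relative_frequency, percentage_frequency)
```

The `Counter` plays the role of the tally column: it counts each category once the data has been collected.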
1A – Working with Categorical Data
Your Turn
Eye Colour Survey
Gather Data of the students in the classroom and use it to:
1. Summarise data using a frequency distribution table
2. Represent data using a bar chart
3. Find the frequency of those with brown eyes
4. Find the relative frequency of those with brown eyes
5. Find the % frequency of those with brown eyes
1. Summarise data using a frequency distribution table
Eye Colour | Tally | Total
Now complete the rest in your workbooks
Now do Chapter 1A questions from your work record:
Q1a,b Q2b,c Q3a,b,c Q6 Q8 Q9 Q10
1B – Working with Numerical Data
The remainder of this topic is concerned with Numerical Data.
With Numerical Data, each data point is known as a score.
Grouping Data
Numerical Data can be presented as either Ungrouped Data
or Grouped Data.
1B – Working with Numerical Data
Grouping Data
When we have a large amount of data, it’s useful to group
the scores into groups or classes.
When making the decision to group raw data on a frequency
distribution table, choice of class (group) size matters.
As a general rule, try to choose a class size so that 5 – 10
groups are formed.
Find the lowest and the highest scores to decide what
numbers need to be included in the groups.
1B – Working with Numerical Data
Grouping Data
eg1. Group the following data appropriately.
12 17 10 24 18 13 24 8 5 9 7 22 2 3 21 22 20 10 6 11

Class | Tally | Frequency
0 -   |       | 2
5 -   |       | 5
10 -  |       | 5
15 -  |       | 2
20 -  |       | 6

We use an open ‘ – ‘ to include all values from the number shown up to (but not including) the number that starts the next class.
1B – Working with Numerical Data
Grouping Data
eg2. Group the following data appropriately.
10.1 17.0 15.2 24.9 16.7 25 24.4 12.2 30.2 20
29 16.1 31.6 12.1 36.7 21 39.3 10 11.5 28.1

Class | Tally | Frequency
10 -  |       | 5
15 -  |       | 4
20 -  |       | 4
25 -  |       | 3
30 -  |       | 2
35 -  |       | 2
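As a rough sketch, the grouping in eg2 can be checked in Python. The helper name `group_scores` is our own, and it assumes a whole-number class size with each class running from its starting number up to (but not including) the start of the next class.

```python
from collections import Counter

def group_scores(scores, class_size):
    """Return {class start: frequency}, where a class holds start <= score < start + class_size."""
    counts = Counter(int(score // class_size) * class_size for score in scores)
    lowest, highest = min(counts), max(counts)
    # Include empty classes between the lowest and highest so no row is skipped.
    return {start: counts.get(start, 0)
            for start in range(lowest, highest + class_size, class_size)}

eg2 = [10.1, 17.0, 15.2, 24.9, 16.7, 25, 24.4, 12.2, 30.2, 20,
       29, 16.1, 31.6, 12.1, 36.7, 21, 39.3, 10, 11.5, 28.1]

table = group_scores(eg2, 5)
for start, frequency in table.items():
    print(f"{start} -   {frequency}")
```

A score that sits exactly on a boundary, such as 25, is counted in the class that starts at that number.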
1B – Working with Numerical Data
Histograms
Similar to a bar chart with a few very important changes:
• Columns are drawn right against each other
• A gap is left at the very start of the chart
• If coloured in, use the same colour for all columns
• A polygon may be drawn to link the columns
1B – Working with Numerical Data
Histograms
Ungrouped Data –
Data Labels appear directly under the centre of each column
1B – Working with Numerical Data
Histograms
Grouped Data –
End points of each class appear under the edge of each column
1B – Working with Numerical Data
Data Distribution
We can name data according to how it’s distributed.
Is it all crammed together or is there more data in certain areas??
We associate certain names with different shapes of distribution
• Normal – Most common score in the centre of the data
• Skewed – Most common score is toward one end of the data
• Bimodal – More than one score that is most frequent
• Spread – Data is spread over a wide range
• Clustered – Most of the data is confined to a small range
1B – Working with Numerical Data
Data Distribution
Normally Distributed Data
• The most common score in the centre of the data.
• The graph is symmetrical.
1B – Working with Numerical Data
Data Distribution
Skewed Data
• The most common score is toward one end of the data.
• Most data toward the left (tail stretching to the right) – Positively Skewed
• Most data toward the right (tail stretching to the left) – Negatively Skewed
1B – Working with Numerical Data
Data Distribution
Bimodal Data
• More than one score that is most frequent
• This looks like two peaks on the graph
1B – Working with Numerical Data
Data Distribution
Spread Data
Data is rather evenly spread over a wide range
1B – Working with Numerical Data
Data Distribution
Clustered Data
Most of the data is confined to a small range
Now do Chapter 1B questions from your work record:
Q1 Q2 Q4 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13
1D – Measures of Centre
Would you agree that one of the main things statisticians do with
a set of data, is to find the average, the middle or the most
commonly occurring score? We call these values:
The Mean – The Average of all scores.
The Median – The middle score in a set of ordered data.
The Mode – The score which occurs most often.
We can find these values as follows…..
The Mean
The average of the scores
$\text{Mean} = \bar{x} = \frac{\sum x_i}{n} = \frac{\text{the sum of all scores (all scores added together)}}{\text{the total number of scores}}$
eg. Find the mean of the data set:
4 2 6 7 10 3 7 3 6 7
$\bar{x} = \frac{4+2+6+7+10+3+7+3+6+7}{10} = \frac{55}{10} = 5.5$
The Median
The middle score of an ordered data set
$\text{Median position} = \left(\frac{n+1}{2}\right)\text{th score}$
For an ODD number of scores – median is a score in the data
For an EVEN number of scores – median is halfway between 2 scores
eg. Find the median of the data set:
4 2 6 7 10 3 7 3 6 7
Write the scores in order smallest – largest
2 3 3 4 6 6 7 7 7 10
$\text{Median position} = \frac{10+1}{2} = \frac{11}{2} = 5.5\text{th score}$
The 5th and 6th scores are both 6, so Median = 6
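A small sketch of the median position rule in Python. The function name `median_by_position` is our own; it returns both the position from the (n + 1) / 2 rule and the median itself.

```python
import math

def median_by_position(scores):
    """Return (median position, median) using the (n + 1) / 2 rule."""
    ordered = sorted(scores)
    n = len(ordered)
    position = (n + 1) / 2                       # e.g. 5.5 means between the 5th and 6th scores
    lower = ordered[math.floor(position) - 1]    # score just at or below the position
    upper = ordered[math.ceil(position) - 1]     # score just at or above the position
    return position, (lower + upper) / 2         # halfway between the two

position, median = median_by_position([4, 2, 6, 7, 10, 3, 7, 3, 6, 7])
print(position, median)
```

For an odd number of scores the position is a whole number, so `lower` and `upper` are the same score and the median is a score in the data.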
The Mode
The score which occurs most frequently
There can be more than one score which occurs most frequently.
In these cases they are all modes – list them all.
eg. Find the mode of the data set:
4 2 6 7 10 3 7 3 6 7
You may wish to write the scores in order to ensure all data is
accounted for, but this is not necessary.
2 3 3 4 6 6 7 7 7 10
Mode = 7
eg. Find the Mean, Median and Mode of the following set of data
3 4 6 9 10 3 4 5 1 7 8
Mean
$\bar{x} = \frac{\sum x_i}{n} = \frac{3+4+6+9+10+3+4+5+1+7+8}{11} = \frac{60}{11} \approx 5.45$
Median
Order the scores….
1 3 3 4 4 5 6 7 8 9 10
Find the position of the median….
$\text{Median position} = \frac{n+1}{2} = \frac{11+1}{2} = \frac{12}{2} = 6\text{th score}$
Median = 5
Mode
Two numbers each occur the most frequently so,
Mode = 3 and 4
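The same three answers can be checked with Python's standard `statistics` module. This is just a way to verify the working by hand, not the method used in class; `multimode` needs Python 3.8 or later.

```python
import statistics

scores = [3, 4, 6, 9, 10, 3, 4, 5, 1, 7, 8]

mean = statistics.mean(scores)          # sum of all scores / number of scores
median = statistics.median(scores)      # middle score of the ordered data
modes = statistics.multimode(scores)    # every score tied for most frequent

print(round(mean, 2), median, modes)
```

`multimode` returns a list, so a data set with two modes reports both of them.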
Now do Chapter 1D questions from your work record:
Q1b,c Q2 Q3 Q4 Q8
Q9a-f Q10 Q11a-d Q14
Q15 Q16b
Holiday Homework Reminder!
• Assigned textbook questions to be up to date by 2nd Feb
• Exercise 1A
• Exercise 1B
• Exercise 1D
• Holiday Homework Sheet
Review Question
I surveyed 15 students and asked them the current age of their
mothers in whole years. The following data was obtained.
35, 45, 41, 46, 38, 56, 59, 43, 46, 45, 51, 53, 43, 46, 50
• What type of data is this called?
• Find the Mean, Mode and Median.
• Group this data in an appropriate class size and represent this using a
frequency table.
• Draw a histogram of the data.
• What do we call the distribution of this data?
I surveyed 15 students and asked them the current age of their
mothers in whole years. The following data was obtained.
35, 45, 41, 46, 38, 56, 59, 43, 46, 45, 51, 53, 43, 46, 50
• What type of data is this called? Numerical - Discrete
• Find the Mean, Mode and Median.
Mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{697}{15} \approx 46.47$
Median: Order data smallest to largest. Median is the midpoint.
35, 38, 41, 43, 43, 45, 45, 46, 46, 46, 50, 51, 53, 56, 59
Median = 46 (the 8th of the 15 scores)
Mode: Number which occurs most often = 46
• Group this data in an appropriate class size and represent this using
a frequency table. Smallest number = 35, Largest number = 59
Class | Frequency
35 -  | 2
40 -  | 3
45 -  | 5
50 -  | 3
55 -  | 2
• Draw a histogram of the data.
• What do we call the distribution of this data?
Normally Distributed
Now let’s try to find the Mean, Mode and Median using our calculators.
* Main Menu – choose Statistics
* Enter the data into List 1
* Choose Calc then One-variable.
* We have entered each number individually, so each occurs once.
Choose X-List: list1 and Freq: 1
* Read off the list:
Mean = x̄
Mode = Mode
Median = Med
Review Worksheet
Then problems from text
Now Do
1A – 1e, 3d, 13
1B – 13
1D – 5, 6, 7
1E – Measures of Variability
Data can be viewed in ways other than just finding the middle
of the data (median/mode/mean).
To give us a truer representation of a set of data, we can
calculate how spread out our data is using:
• The Range
• The Interquartile Range (IQR)
• The Standard Deviation
• The Variance
1E – Measures of Variability
Range
1. Find the largest value in the set of data Xmax
2. Find the smallest value in the set of data Xmin
3. Subtract the smallest value from the largest value
Range = Xmax – Xmin
eg. 2 3 4 4 4 5 6 8
What would the range be?
8–2 = 6
1E – Measures of Variability
Interquartile Range (IQR)
To overcome problems due to extreme values, we can exclude the top &
bottom quarters of the data to find the range of the remaining data.
The lower quartile (Q1) is the number occurring ¼ of the way through the
data – the 25th percentile.
The upper quartile (Q3) is the number occurring ¾ of the way through
the data – the 75th percentile.
The IQR is the difference between these values, so can be found using:
IQR = Q3 – Q1
1E – Measures of Variability
Interquartile Range (IQR)
Steps to calculate the IQR
1. Arrange the data in order of size
2. Divide the data in two by finding the median
3. Using the lower half of the data, find the lower quartile (Q1) by dividing this in two and finding the midpoint
4. Repeat this for the upper half of data to find Q3
5. Calculate the IQR by finding Q3 – Q1.
Eg. Calculate the IQR of the data: 4 7 2 1 10 2 7 6 9 5
1E – Measures of Variability
Interquartile Range (IQR)
Our Data: 4 7 2 1 10 2 7 6 9 5
1. Arrange in size order: 1 2 2 4 5 6 7 7 9 10
2. Divide in half to find Median: 1 2 2 4 5 | 6 7 7 9 10
3. Find Q1: the middle of 1 2 2 4 5, so Q1 = 2
4. Find Q3: the middle of 6 7 7 9 10, so Q3 = 7
5. Find IQR = Q3 – Q1
= 7 – 2
= 5
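The halving method above can be sketched in Python. The function name `quartiles` is our own, and it assumes at least two scores. For an odd number of scores it leaves the median out of both halves, which matches the worked examples in these notes; calculators and software sometimes use slightly different quartile rules.

```python
import statistics

def quartiles(scores):
    """Return (Q1, Q3) by splitting the ordered data in two at the median."""
    ordered = sorted(scores)
    half = len(ordered) // 2                      # size of each half
    lower_half = ordered[:half]                   # scores below the median
    upper_half = ordered[len(ordered) - half:]    # scores above the median
    return statistics.median(lower_half), statistics.median(upper_half)

data = [4, 7, 2, 1, 10, 2, 7, 6, 9, 5]
q1, q3 = quartiles(data)
iqr = q3 - q1                        # Interquartile Range = Q3 - Q1
data_range = max(data) - min(data)   # Range = Xmax - Xmin

print(q1, q3, iqr, data_range)
```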
1E – Measures of Variability
Standard Deviation
Shows how much variation there is from the average.
A low standard deviation indicates that the data points tend to be
very close to the mean.
A high standard deviation indicates that the data points are spread
out over a large range of values.
Standard Deviation = $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
1E – Measures of Variability
Standard Deviation
1. Find the mean x̄
2. Find the difference between each piece of data & the mean
3. Square the differences
4. Add the squared differences
5. Divide by the number of scores, less one
6. Take the square root
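The six steps can be followed line by line in Python. This sketch reuses the data set from the Range example (2 3 4 4 4 5 6 8), and the function name is our own. Dividing by the number of scores less one gives the sample standard deviation, which is also what `statistics.stdev` computes.

```python
import math
import statistics

def sample_standard_deviation(scores):
    mean = sum(scores) / len(scores)                    # 1. find the mean
    differences = [score - mean for score in scores]    # 2. difference from the mean
    squared = [d ** 2 for d in differences]             # 3. square the differences
    total = sum(squared)                                # 4. add the squared differences
    variance = total / (len(scores) - 1)                # 5. divide by (number of scores - 1)
    return math.sqrt(variance)                          # 6. take the square root

scores = [2, 3, 4, 4, 4, 5, 6, 8]    # data set from the Range example
sd = sample_standard_deviation(scores)
variance = sd ** 2                   # variance is the standard deviation squared

print(round(sd, 2), round(variance, 2))
print(math.isclose(sd, statistics.stdev(scores)))   # agrees with the library function
```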
1E – Measures of Variability
Variance
The variance is simply the standard deviation squared.
1E – Measures of Variability
Information Overload??
Relax!
We can use our calculator to find these values too – make sure you
know how to use it!!
[Calculator screen with callouts:]
Use these to find Range
Use these to find IQR
Standard Deviation
Variance = (Standard Deviation)²
1E – Working with Grouped Data
We can find the measures of centre (Mean, Mode, Median) and the
measures of spread (Range, IQR, Standard Deviation, Variance) of
grouped data by first finding the midpoint of each group.
Data | Frequency | Midpoint
20 - | 4         | 25
30 - | 5         | 35
40 - | 10        | 45
50 - | 7         | 55
60 - | 6         | 65
The Frequency and Midpoint columns are the columns we enter into our calculator.
1E – Working with Grouped Data
eg. Find the mean and the standard deviation of the grouped data
Class | Frequency
35 -  | 2
40 -  | 3
45 -  | 5
50 -  | 3
55 -  | 2
Find the midpoint of each group.
Midpoint | Frequency
37.5     | 2
42.5     | 3
47.5     | 5
52.5     | 3
57.5     | 2
Enter this new table into our calculator.
Menu → Statistics → Enter the data into 2 lists
Calc → One-Variable
Choose XList: list1 (where your values are)
And Freq: list2 (where your frequencies are)
Mean → x̄
Standard Deviation → Sx
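As a cross-check on the calculator, the grouped mean and standard deviation can be sketched in Python using the midpoints and frequencies. Each midpoint is counted as many times as its frequency, and the sum of squares is divided by n − 1 to match the sample standard deviation described earlier.

```python
import math

midpoints = [37.5, 42.5, 47.5, 52.5, 57.5]   # list1 on the calculator
frequencies = [2, 3, 5, 3, 2]                # list2 on the calculator

n = sum(frequencies)                         # total number of scores
mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Each squared difference is counted once per score in that class.
squared_total = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
sd = math.sqrt(squared_total / (n - 1))

print(mean, round(sd, 2))
```

Because every score in a class is treated as sitting at the midpoint, these values are estimates of the mean and standard deviation of the raw data.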
Exercise 1E
Now Do
1c, 1e, 4a, 4b, 4c, 10,
12, 13, 14, 15, 16, 19
1F – Stem and Leaf Plots
Instead of using a frequency table, we can also display our data using
a stem and leaf plot.
Similar to frequency tables, we can choose an appropriate group size
in which to represent our data, usually using a class size of 5 or 10.
Example:
Stem & Leaf representation of the
following data (class size of 10):
6, 7, 8, 10, 12, 13, 14, 17, 17, 17, 18, 19,
21, 23, 24, 24, 25, 27, 31, 31, 32, 36,
36, 39, 41, 45, 45, 46, 49, 50
Now let’s try using a class size of 5…..
Separate each class using * beside the stem number for the upper end of the group.
Include a key, this time with an example for both lower (no *) and upper ends (with *).
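A minimal sketch of building both versions of the plot in Python. The function name `stem_and_leaf` and the `split_stems` flag are our own, and the sketch assumes whole-number scores of zero or more. It also leaves out any stem that has no leaves, which you would normally still show when drawing the plot by hand.

```python
def stem_and_leaf(scores, split_stems=False):
    """Return {stem label: leaves}. With split_stems=True each stem is split in two
    (class size of 5) and the upper half (leaves 5-9) is marked with '*'."""
    rows = {}
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)     # tens digit(s) and units digit
        label = f"{stem}*" if split_stems and leaf >= 5 else str(stem)
        rows.setdefault(label, []).append(leaf)
    return rows

data = [6, 7, 8, 10, 12, 13, 14, 17, 17, 17, 18, 19,
        21, 23, 24, 24, 25, 27, 31, 31, 32, 36,
        36, 39, 41, 45, 45, 46, 49, 50]

for split in (False, True):
    print(f"Class size of {5 if split else 10}")
    for label, leaves in stem_and_leaf(data, split_stems=split).items():
        print(f"{label:>2} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 1 | 2 = 12    1* | 7 = 17")
```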
1F – Stem and Leaf Plots
Your Turn: Organise the following data onto a stem and leaf plot
using class size of 10.
10, 12, 16, 21, 24, 27, 29, 31, 33, 34
Now try again using a class size of 5.
Now Do
Exercise 1F
1, 3, 4, 7, 8, 9, 10, 11, 12
1G – 5-Number Summary
We can summarise a set of data using a 5-Number summary.
This set of 5 numbers represents the spread of the set of data.
The 5-Number summary includes (in order):
Xmin (Lowest Score)
Q1 (The Score ¼ of the way through the data)
Median (The Score ½ way through the data)
Q3 (The Score ¾ of the way through the data)
Xmax (Highest Score)
1G – 5-Number Summary
eg1. Write the 5-Number summary for the following set of data.
3 4 4 6 8 9 9 10 13 15 16 18 19 19 20
Lower half: 3 4 4 6 8 9 9
Upper half: 13 15 16 18 19 19 20
Xmin (Lowest Score) = 3
Q1 (The Score ¼ of the way through) = 6
Median (The Score ½ way through) = 10
Q3 (The Score ¾ of the way through) = 18
Xmax (Highest Score) = 20
So the 5-Number summary is:
3, 6, 10, 18, 20
1G – 5-Number Summary
eg2. Write the 5-Number summary for the following set of data.
5 8 2 10 13 8 9 3 4 4 16 18 7 3
Arrange in order:
2 3 3 4 4 5 7 8 8 9 10 13 16 18
Lower half: 2 3 3 4 4 5 7
Upper half: 8 8 9 10 13 16 18
Xmin (Lowest Score) = 2
Q1 (The Score ¼ of the way through) = 4
Median (The Score ½ way through) = 7.5
Q3 (The Score ¾ of the way through) = 10
Xmax (Highest Score) = 18
So the 5-Number summary is:
2, 4, 7.5, 10, 18
Eg2 cont’d. Check your answer using the calculator
5 8 2 10 13 8 9 3 4 4 16 18 7 3
Our Answer was 2, 4, 7.5, 10, 18
Menu → Statistics
Enter data into list1
Calc → One-Variable
Select the list your data is in as XList
Read off: minX, Q1, Med, Q3, maxX
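Both worked examples can also be checked with a short Python sketch. The function name `five_number_summary` is our own. It finds Q1 and Q3 as the medians of the lower and upper halves of the ordered data, as in the worked examples; other tools may use a different quartile rule and give slightly different values.

```python
import statistics

def five_number_summary(scores):
    """Return (Xmin, Q1, Median, Q3, Xmax) using the halving method."""
    ordered = sorted(scores)
    half = len(ordered) // 2    # the median is left out of both halves when n is odd
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[len(ordered) - half:])
    return ordered[0], q1, statistics.median(ordered), q3, ordered[-1]

eg1 = [3, 4, 4, 6, 8, 9, 9, 10, 13, 15, 16, 18, 19, 19, 20]
eg2 = [5, 8, 2, 10, 13, 8, 9, 3, 4, 4, 16, 18, 7, 3]

print(five_number_summary(eg1))
print(five_number_summary(eg2))
```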
1G - Boxplots
We can represent the 5-number summary on a boxplot. Boxplots are:
• Always drawn to scale
• Drawn with labels (Xmin, Q1 etc) or with a scaled & labelled axis
running alongside the plot
[Boxplot diagram drawn above a scale, labelled: Xmin (Lowest Score), Q1 (¼ way through), Median (½ way through), Q3 (¾ way through), Xmax (Highest Score)]
1G - Boxplots
eg3. Using the 5-figure summary from example 2, sketch its boxplot.
The 5-Number summary is:
2, 4, 7.5, 10, 18
[Boxplot drawn to scale, labelled Xmin, Q1, Median, Q3, Xmax]
Eg3 cont’d. We can also use the calculator to help sketch the plot.
Once the data has been input to the calculator, do the following:
SetGraph →
Check StatGraph1 box →
Choose Setting
Choose ‘Type’ →
Select ‘MedBox’
Check that selections
are OK →
Then click ‘Set’
To select point on the plot click:
Exercise 1G
Now Do
2, 4, 5, 8, 9, 10, 11, 12,
13, 14, 15
1H - Comparing sets of data
Back-to-back Stem & Leaf Plots
• Used to compare 2 similar sets of data
• The two sets of data share the same central stem
• Data is ordered from smallest to largest around the central stem
1H - Comparing sets of data
Back-to-back Stem & Leaf Plots
Create a back-to-back Stem & Leaf for the two sets of data (using a
class size of 10) :
Sample A: 4, 6, 7, 10, 12, 15, 19, 24
Sample B: 5, 7, 9, 9, 13, 16, 20, 22
Sample A   | Stem | Sample B
   7, 6, 4 |  0   | 5, 7, 9, 9
9, 5, 2, 0 |  1   | 3, 6
         4 |  2   | 0, 2
Remember to start each line from the centre and work your way out.
Always include a key!
1H - Comparing sets of data
Back-to-back Stem & Leaf Plots
Create a back-to-back Stem & Leaf for the two sets of data
(using a class size of 5) :
Sample A: 4, 6, 7, 10, 12, 15, 19, 24
Sample B: 5, 7, 9, 9, 13, 16, 20, 22
Sample A | Stem | Sample B
       4 |  0   |
    7, 6 |  0*  | 5, 7, 9, 9
    2, 0 |  1   | 3
    9, 5 |  1*  | 6
       4 |  2   | 0, 2
Remember to start each line from the centre and work your way out.
Always include a key!
1H - Comparing sets of data
Back-to-back Stem & Leaf Plots
Find the 5-number summary for each set of data
Xmin, Q1, Median, Q3, Xmax
Group One
2, 10, 19, 24, 27
Group Two
1, 7, 14, 23, 27
1H - Comparing sets of data
Side-by-Side Box Plots
• Recall Boxplot – [diagram labelled Xmin, Q1, Median, Q3, Xmax]
• Two or more sets of data compared using side-by-side boxplots.
• The boxplots share a common scale so they can be compared
appropriately
1H - Comparing sets of data
Side-by-Side Box Plots
Compare the two box plots……..what can be said about the data?
1H - Comparing sets of data
Side-by-Side Box Plots
eg. Two sets of data gave the following 5-figure summaries.
Sample A
8, 10, 15, 21, 23
Sample B
5, 12, 18, 22, 25
Compare the two using side-by-side box plots.
[Side-by-side box plots of Sample A and Sample B on a common scale]
Now Do
Exercise 1H
1 – 10; 14
Revision Problems
Univariate Data
Revision
Question One
The following table shows the dinner bookings from a local restaurant over an evening.
Group Size | Frequency
1          | 2
2          | 14
3          | 10
4          | 13
5          | 8
• What is the frequency of a group having 3 people?
• What is the relative frequency of a group with 3 people?
• What is the percentage frequency of a group with 3 people?
• What is the total number of people who attended the restaurant that evening?
• Draw a histogram of the data.
• What is the average group size?
Revision
Question Two
The stem and leaf plot below shows the height of a group of 20 students.
Key: 15* | 8 = 158cm
     16  | 0 = 160cm
Stem | Leaf
15*  | 8, 9
16   | 0, 2, 4
16*  | 5, 6, 6, 8, 9
17   | 1, 3, 4, 4, 4
17*  | 5, 8, 9
18   | 1, 4
• State the minimum height.
• State the median height.
• State the Mode.
• State the IQR.
• State the Standard Deviation.
• How many people are over 172cm tall?
• What is the relative frequency of a person who is 166cm tall?
• What type of distribution is this?
Revision
Question Three
The batting scores of two batsmen
were collected over a cricket season.
Their results are compared on the
boxplots below.
• Which batsman had the highest score? What was this score?
• Write a 5-number summary for each Batsman A & Batsman B.
• Which batsman had the best median performance?
• Which batsman had the smallest range?
• What scores made up the top 50% of the runs by Batsman A?
• What scores made up the bottom 25% of the runs of Batsman B?
• Which batsman had the best overall result? Explain.
Revision
Question Four
Consider the following data that shows
the heights (in cm) of 40 girls who are
competing in trials to form a
basketball squad.
• Using your calculator:
* Find the points of central tendency, that is the Mean, Mode and Median.
* Find the measures of variability, that is the Range, IQR, Standard Deviation & Variance.
* Find the 5-number summary and use this to draw the boxplot of the data.
• Draw a frequency table of the data, using a class size of 5.
• Represent the data on a histogram