S1 Summarising and Representing Data

Kinds of data
Qualitative, e.g. names (Fred, Lissy, Max, Jack, Callum, Zoe, Luke, Stephen) or colours (10 red, 15 blue, 5 green)
Quantitative
Continuous, e.g. baby weights (5lb 3oz, 6lb 10oz, 7lb 12oz, 11lb 1oz) or heights (160cm, 172cm, 181cm)
Discrete, e.g. sizes (12, 14, 16, 18) or numbers of bedrooms (2, 3 and 4 bedroomed)
Page 10
Exercise 2A
Q1 - 5 and 7
Averages
There are three types of average,
and they all begin with M:
Mode: the most popular value or class
Median: the middle value when all the values are
placed in order
Mean: the sum of all the values divided
by the number of values
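As a quick check of the three definitions, here is a minimal sketch using Python's standard `statistics` module on a small example list. `multimode` is used rather than `mode` because a list can have more than one most popular value, as this one does.

```python
from statistics import mean, median, multimode

# A small example list of values
values = [3, 5, 5, 6, 8, 10, 10]

# Mode: the most popular value(s); 5 and 10 both appear twice here
print("Mode(s):", multimode(values))
# Median: the middle value once the values are placed in order
print("Median:", median(values))
# Mean: the sum of all the values divided by the number of values
print("Mean:", mean(values))
```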
Mean
How does the mean differ
from the median?
In which circumstances might you use the mean
rather than the median, and vice versa?
Rather than describing in words how to find the mean,
we need to learn some notation.
The sum of all the x values is written as $\sum x$.
The sum of the values in the fx column is
written as $\sum fx$.
Page 15
Exercise 2B
Q1, 2, 4, 6 and 7
In what kind of questions will we need to
add the values in the fx column rather than
just add together all the x values?
Try finding the mean height of the year 7 girls
and compare it with the mean height of the year 7
boys.
Would you use the mean or the median to
summarise this data, and why?
If you used the raw data for the calculation
above, why?
If you used the grouped data from the
frequency polygon or histogram work, why
would I tell you that you've made a
mistake?
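Mean Formulae
For a list of data: $\bar{x} = \frac{\sum x}{n}$, where n is the number of values
For a frequency table of data: $\bar{x} = \frac{\sum fx}{\sum f}$
For grouped data, it is necessary to find the
midpoint of each class first, use this as the
value of x, and then use the frequency table
formula above.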
We use the "x bar" notation, $\bar{x}$, to
represent a sample mean.
If we were using all the possible data,
it would be called a population
mean, and we use the "mu" notation, $\mu$.
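The three mean formulae can be sketched in Python as below. The function names and the example frequency table and grouped classes are hypothetical, made up only to illustrate the arithmetic; the grouped version uses each class midpoint as x, as described above, so its result is an estimate.

```python
def mean_of_list(xs):
    """x-bar = (sum of x) / n for a plain list of values."""
    return sum(xs) / len(xs)


def mean_of_frequency_table(xs, fs):
    """x-bar = (sum of fx) / (sum of f) for values xs with frequencies fs."""
    sum_fx = sum(x * f for x, f in zip(xs, fs))
    return sum_fx / sum(fs)


def mean_of_grouped_data(classes):
    """Estimate x-bar for grouped data given (lower, upper, frequency) tuples.

    Each class midpoint stands in for x, then the frequency-table formula is used.
    """
    midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
    freqs = [f for _, _, f in classes]
    return mean_of_frequency_table(midpoints, freqs)


# Hypothetical example data
print(mean_of_list([2, 4, 9]))
print(mean_of_frequency_table([0, 1, 2, 3], [4, 6, 3, 2]))
print(mean_of_grouped_data([(0, 10, 5), (10, 20, 10), (20, 30, 5)]))
```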
Page 18
Exercise 2C
Q1, 4 and 5
Stem and Leaf
Put the data below into a suitable stem and
leaf diagram.
127, 135, 147, 147, 149, 139, 145, 155, 149, 155,
151, 159, 139, 141, 155, 160, 138, 144, 155, 148,
156, 143, 147, 157, 152, 150, 161, 133, 146, 155
The data represent the heights
of a first year class in a boys'
school.
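One way to build the diagram, sketched in Python: the tens part of each height (e.g. 12 for 127) is taken as the stem and the units digit as the leaf. This choice of stem is an assumption that suits three-digit heights in this range, and the function names are hypothetical.

```python
from collections import defaultdict

# Heights of the first year boys, copied from the list above
boys = [127, 135, 147, 147, 149, 139, 145, 155, 149, 155,
        151, 159, 139, 141, 155, 160, 138, 144, 155, 148,
        156, 143, 147, 157, 152, 150, 161, 133, 146, 155]


def stem_and_leaf(data):
    """Map each stem (value // 10) to its leaves (units digits) in ascending order."""
    plot = defaultdict(list)
    for value in sorted(data):
        plot[value // 10].append(value % 10)
    return dict(plot)


def print_stem_and_leaf(data):
    """Print one row per stem, including any empty stems in between."""
    plot = stem_and_leaf(data)
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        print(f"{stem} | {leaves}")
    print("Key: 12 | 7 means 127")


print_stem_and_leaf(boys)
```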
How else can you summarise
or represent this data?
Below is a list of the heights of 30 year 7
girls. Add these to the other side of your
stem and leaf diagram and make some
comparative statements based on suitable
summary statistics that you find.
127, 145, 147, 147, 149, 149, 145, 165, 139, 157,
152, 169, 129, 121, 158, 160, 148, 141, 155, 148,
156, 143, 157, 156, 152, 150, 161, 133, 146, 155
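To support the comparative statements, here is a minimal sketch that computes a few summary statistics (mean, median and range) for each group with Python's `statistics` module. The choice of these three statistics and the `summary` helper are illustrative assumptions; quartiles would be another natural choice.

```python
from statistics import mean, median

# Heights copied from the two lists above
boys = [127, 135, 147, 147, 149, 139, 145, 155, 149, 155,
        151, 159, 139, 141, 155, 160, 138, 144, 155, 148,
        156, 143, 147, 157, 152, 150, 161, 133, 146, 155]
girls = [127, 145, 147, 147, 149, 149, 145, 165, 139, 157,
         152, 169, 129, 121, 158, 160, 148, 141, 155, 148,
         156, 143, 157, 156, 152, 150, 161, 133, 146, 155]


def summary(data):
    """Return the mean, median and range of a list of heights."""
    return {
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
    }


for name, data in (("Boys", boys), ("Girls", girls)):
    stats = summary(data)
    print(f"{name}: mean {stats['mean']:.1f}, median {stats['median']}, "
          f"range {stats['range']}")
```

Comparing an average (mean or median) gives a statement about typical height; comparing the ranges gives a statement about spread.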
Page 55
Exercise 4A
Q2 and 5
If you have grouped your data with equal
class widths, you have little to worry
about. However, if the class widths are
unequal, you will need to plot frequency
density rather than just frequency on the
vertical axis.
$\text{Frequency density} = \frac{\text{class frequency}}{\text{class width}}$
Sometimes relative frequency density is
plotted on the y axis. This can be
calculated as:
$\text{Relative frequency density} = \frac{\text{class frequency}}{\text{total frequency} \times \text{class width}}$
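A minimal sketch of both calculations for a hypothetical set of classes with unequal widths (the class boundaries and frequencies are made up for illustration). It also shows why histograms use frequency density: bar height times class width gives back the class frequency, so the area of each bar represents the frequency.

```python
# Hypothetical grouped data with unequal class widths:
# (lower bound, upper bound, class frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 9)]
total = sum(freq for _, _, freq in classes)


def frequency_density(freq, width):
    """Frequency density = class frequency / class width."""
    return freq / width


def relative_frequency_density(freq, width, total_freq):
    """Relative frequency density = class frequency / (total frequency * class width)."""
    return freq / (total_freq * width)


for lower, upper, freq in classes:
    width = upper - lower
    fd = frequency_density(freq, width)
    rfd = relative_frequency_density(freq, width, total)
    # Bar height is fd, so bar area = fd * width, which equals the class frequency
    print(f"{lower}-{upper}: frequency density {fd:.3f}, "
          f"relative frequency density {rfd:.4f}, bar area {fd * width:g}")
```

With relative frequency density on the vertical axis instead, the bar areas add up to 1.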
Histograms
Histograms are similar to bar charts, except that
the area of each bar matters.
In a bar chart, all of the bars are the same
width and the only thing that matters is the
height of the bar.
In a histogram, the area is the important
thing: the area of each bar represents the
class frequency.
Histograms are best plotted
against frequency density or relative
frequency density.
They should also only be drawn for
continuous data.
Discrete or qualitative data can be plotted
in bar charts, but the bars should not
touch, as the categories aren't connected.
Page 64
Exercise 4E
Q1, 4 and 5
We may also need to estimate
an average from these
grouped continuous data
sets.
Page 22
Exercise 2D
Q1 - 5
Summarising Data
What types of data summary have you come up
with so far?
And how do they differ?
Give examples of when one type would be better
than another.
You will recall that finding
the median from a list
of data involves adding
one to the number of
values and then halving the result
to find out which value
(with the values placed in order)
you should use.
For example, in a list of
7 numbers the median
is the $\frac{7 + 1}{2}$th value,
i.e. the 4th value.
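The position rule can be sketched as follows (the function name is hypothetical). The handling of an even number of values, taking the mean of the two values either side of the half-way position, is the usual convention and is an addition here, since the example above only covers an odd-length list.

```python
import math


def median_by_position(values):
    """Median via the (n + 1) / 2 position rule.

    If (n + 1) / 2 is not a whole number (n even), take the mean of the
    values in the positions either side of it.
    """
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2          # 1-based position in the ordered list
    lower = math.floor(position)
    upper = math.ceil(position)
    # Convert 1-based positions to 0-based list indices
    return (ordered[lower - 1] + ordered[upper - 1]) / 2


print(median_by_position([3, 5, 5, 6, 8, 10, 10]))  # n = 7, so the 4th value
```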
Quartiles
3, 5, 5, 6, 8, 10, 10
If you consider the quartiles of this list,
you can see that they are the 2nd and 6th values. The position of the
lower quartile is found by dividing the number of values by 4 (for the
upper quartile, take three quarters of the number of values). If this gives
a whole number, find the mean of the value in that position and the
value above it.
If it gives a decimal rather than a whole number, then
always round UP to the next position.
In a list of 14 numbers the lower quartile would be taken as the
14/4th value, or 3.5th value,
so you'd take the 4th value.
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
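The rule above can be sketched in Python; the function names are hypothetical. Working with the whole number m × n and its remainder on division by 4 avoids any floating-point doubt about whether a position is a whole number. Other textbooks and software use slightly different quartile conventions, so their answers may differ from this rule.

```python
def value_at_fraction(ordered, m):
    """Return the value m/4 of the way through an ordered list (m = 1 or 3).

    If m*n/4 is a whole number k, take the mean of the k-th and (k+1)-th
    values; otherwise round the position UP to the next whole position.
    """
    n = len(ordered)
    k, remainder = divmod(m * n, 4)
    if remainder == 0:
        # Whole-number position: mean of this value and the one above it
        return (ordered[k - 1] + ordered[k]) / 2
    # Decimal position: round up to position k + 1 (index k in a 0-based list)
    return ordered[k]


def quartiles(values):
    """Return (lower quartile, upper quartile) using the n/4 and 3n/4 positions."""
    ordered = sorted(values)
    return value_at_fraction(ordered, 1), value_at_fraction(ordered, 3)


print(quartiles([3, 5, 5, 6, 8, 10, 10]))  # 2nd and 6th values
print(quartiles(list(range(1, 15))))       # 4th and 11th values
```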
Page 34
Exercise 3A
Q1 and 2
Interquartile Ranges and Boxplots
We use the quartiles and the minimum and maximum values
to draw boxplots.
These are great for comparing the spread of data
between two or more data sets. The interquartile range,
Q3 - Q1, gives the spread of the middle half of the data.
[Diagram: a boxplot labelled Q0, Q1, Q2, Q3 and Q4]
Where Q0 = minimum value
Q1 = lower quartile
Q2 = median
Q3 = upper quartile
Q4 = maximum value
Draw box plots to compare the year
7 height data you put into stem and
leaf diagrams earlier
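The numbers needed for the two box plots can be sketched as below, using the quartile rule described earlier (with m = 2 for the median, which gives the same answer as the (n + 1)/2 rule). The helper names are hypothetical; drawing the plots themselves is left to graph paper or a plotting tool.

```python
# Heights copied from the year 7 lists above
boys = [127, 135, 147, 147, 149, 139, 145, 155, 149, 155,
        151, 159, 139, 141, 155, 160, 138, 144, 155, 148,
        156, 143, 147, 157, 152, 150, 161, 133, 146, 155]
girls = [127, 145, 147, 147, 149, 149, 145, 165, 139, 157,
         152, 169, 129, 121, 158, 160, 148, 141, 155, 148,
         156, 143, 157, 156, 152, 150, 161, 133, 146, 155]


def value_at_fraction(ordered, m):
    """Value at the m/4 position: whole position -> mean with the next value,
    decimal position -> round up."""
    k, remainder = divmod(m * len(ordered), 4)
    if remainder == 0:
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[k]


def five_number_summary(values):
    """Return (Q0, Q1, Q2, Q3, Q4) for drawing a box plot."""
    ordered = sorted(values)
    return (ordered[0],
            value_at_fraction(ordered, 1),
            value_at_fraction(ordered, 2),
            value_at_fraction(ordered, 3),
            ordered[-1])


for name, data in (("Boys", boys), ("Girls", girls)):
    q0, q1, q2, q3, q4 = five_number_summary(data)
    print(f"{name}: Q0={q0}, Q1={q1}, Q2={q2}, Q3={q3}, Q4={q4}, IQR={q3 - q1}")
```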
Read page 57
Page 58
Exercise 4B
Q1 - 2
Page 59
Exercise 4C
Q1 - 2
Page 61
Exercise 4D
Q1 - 2
Cumulative Frequency
We have seen how to find the quartiles from a list
of data or stem and leaf diagrams.
We have also seen that data is often stored in
frequency distributions. If these are grouped, it
becomes difficult to find the quartiles.
Why?
Previously, we overcame this by drawing
cumulative frequency curves.
[Cumulative frequency graph: 30 students got below 40 marks and 60 students got below 50 marks]
By joining up these two points with a straight line,
we assume that the 30 students with between 40 and 50
marks are spread evenly throughout the band.
Interpolation
We could actually find this value much more quickly
by using some simple mathematics known as
interpolation.
Think how you could find the mark of the 40th
student from this year 10 class using just the
data rather than reading from the graph.
Now try estimating the mark of the 55th
student.
Remember that estimating doesn't mean guessing: it
involves exact calculations, but the result is unlikely to be the
true mark, as the students are unlikely to be spread
evenly throughout the band. We cannot know the exact
mark without the raw data, which we are not given with
grouped data; hence we estimate.
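As a worked example for the 40th student, using only the figures given (30 students below 40 marks and 30 students in the 40 to 50 band, a class width of 10 marks): the 40th student is the 10th of the 30 students in the band, so we move $\frac{10}{30}$ of the way through it.

$$
\text{mark of 40th student} \approx 40 + \frac{40 - 30}{30} \times 10 = 40 + \frac{10}{30} \times 10 \approx 43.3
$$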
Quantiles
We can divide the data into as many equal parts
as we like.
Quartiles divide the data into four parts,
deciles into ten and percentiles into 100.
The formula below is known as interpolation
and estimates a quantile by assuming the data
in each class is spread evenly.
It should be very similar to the formula you
created earlier to find quartiles without drawing a
cumulative frequency curve.
$\text{Quantile} = b + \frac{Qn - f}{f_m} \times w$
Where $f_m$ is the frequency of the class the quantile
falls in,
$f$ is the cumulative frequency up to (but not including) the
class the quantile falls in,
$w$ is the width of the class the quantile falls in,
$Q$ is the quantile you are finding, expressed as a
fraction (e.g. $\frac{1}{4}$ for the lower quartile),
$n$ is the total number of data values, and
$b$ is the LOWER bound of the class the quantile falls
in.
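The formula translates directly into a short function, sketched below. The function name, the (lower bound, upper bound, frequency) layout and the example table are assumptions for illustration: the table is hypothetical apart from being consistent with the year 10 figures above (30 students below 40 marks, 30 between 40 and 50), and with 80 students in total its median is the 40th student's mark.

```python
def grouped_quantile(classes, q):
    """Estimate a quantile from grouped data by interpolation.

    classes: list of (lower bound b, upper bound, frequency f_m), in order.
    q: the quantile as a fraction, e.g. 0.25 for the lower quartile.
    Implements  b + (q*n - f) / f_m * w.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    n = sum(freq for _, _, freq in classes)
    target = q * n                 # Qn: the position we are looking for
    cum_before = 0                 # f: cumulative frequency before this class
    for lower, upper, freq in classes:
        if freq > 0 and target <= cum_before + freq:
            width = upper - lower  # w
            return lower + (target - cum_before) / freq * width
        cum_before += freq
    raise ValueError("classes must contain at least one non-empty class")


# Hypothetical marks table: (lower bound, upper bound, frequency)
marks_table = [(0, 40, 30), (40, 50, 30), (50, 100, 20)]

for name, q in (("Lower quartile", 0.25), ("Median", 0.5), ("Upper quartile", 0.75)):
    print(f"{name}: {grouped_quantile(marks_table, q):.2f}")
```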
Page 34
Exercise 3A
Q3 and 5
Page 37
Exercise 3B
Q1, 3 and 5