Representation of Data
CURRICULUM CONTENT:
Candidates should be able to:
Representation of data: select a suitable way of presenting raw statistical data, and discuss
advantages and/or disadvantages that particular representations may have; construct and
interpret stem-and-leaf diagrams, box-and-whisker plots, histograms and cumulative
frequency graphs; understand and use different measures of central tendency (mean, median,
mode) and variation (range, interquartile range, standard deviation), e.g. in comparing and
contrasting sets of data; use a cumulative frequency graph to estimate the median value, the
quartiles and the interquartile range of a set of data; calculate the mean and standard
deviation of a set of data (including grouped data) either from the data itself or from given
totals such as $\sum x$, $\sum x^2$, $\sum (x - a)$ and $\sum (x - a)^2$.
Topics:
a. Data, statistical data or raw data: a set of numerical information, e.g. the number of students
in different sections of a school, the heights of boys in a class, or the marks of students in a class test.
b. Discrete data: data that can take only exact, separate values, e.g. the number of cars passing a
checkpoint in 30 minutes, or the number of tomatoes on each plant in a greenhouse.
c. Continuous data: data that cannot take exact values but can take any value within a range and is
recorded to a stated degree of accuracy, e.g. the speed of a vehicle as it passes a checkpoint.
d. Notation: $\sum x$ is the sum of all the values and $\sum x^2$ is the sum of their squares.
e. Mean: $\bar{x} = \dfrac{\text{sum of all values of data}}{\text{total number of values in data}} = \dfrac{\sum x}{n}$.
f. Basic properties of the mean.
• Mean of a constant set of data: if every value equals c, the mean is c.
• $\overline{ax + b} = a\bar{x} + b$
• Mean of two sets of the same kind of data with different values.
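The first two properties can be checked numerically. The following is a minimal sketch, assuming a small made-up data set and made-up constants a and b (none of these values come from the notes); it only illustrates that scaling and shifting every value scales and shifts the mean in the same way.

```python
from statistics import mean

# Hypothetical data and constants, chosen only for illustration.
values = [2, 4, 6, 8]
a, b = 3, 1

# Apply the transformation ax + b to every value.
transformed = [a * x + b for x in values]

mean_of_transformed = mean(transformed)
transformed_mean = a * mean(values) + b

print("mean of ax + b :", mean_of_transformed)
print("a * mean(x) + b:", transformed_mean)

# A constant data set has the constant itself as its mean.
constant_mean = mean([4, 4, 4, 4, 4])
print("mean of a constant set:", constant_mean)
```

Both printed means agree, which is what the identity $\overline{ax + b} = a\bar{x} + b$ predicts.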
g. Median: the middle value of a set of data after arranging it in ascending or descending order. It
is calculated in one of two ways:
• When an even number of values is given: the mean of the two middle values.
• When an odd number of values is given: the single middle value.
h. Mode: The mode is the value that occurs most often.
i. Upper and lower quartiles. The lower quartile is the value below which 25% of the data lie; it
is denoted by $Q_1$. The upper quartile is the value below which 75% of the data lie; it is denoted
by $Q_3$. When the ordered data are divided into two equal halves, the median of the first half is the
lower quartile and the median of the second half is the upper quartile.
Q.1 Find the mean, median, lower quartile, upper quartile and mode of the following marks
of the students in a class test.
7  3  3  5  3  7  6  5  4  3  6
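As a rough check on Q.1, the definitions above can be sketched in Python. The helper name `quartiles` is hypothetical, and leaving the middle value out of both halves when the number of values is odd is an assumption; other quartile conventions exist and can give slightly different answers.

```python
from statistics import mean, median, mode

def quartiles(values):
    """Return (lower, upper) quartiles as medians of the two halves.

    Assumption: for an odd number of values, the middle value is
    excluded from both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(upper_half)

# Marks from Q.1.
marks = [7, 3, 3, 5, 3, 7, 6, 5, 4, 3, 6]

q1, q3 = quartiles(marks)
print("mean          =", round(mean(marks), 3))
print("median        =", median(marks))
print("lower quartile =", q1)
print("upper quartile =", q3)
print("mode          =", mode(marks))
```

With these 11 marks the ordered list is 3, 3, 3, 3, 4, 5, 5, 6, 6, 7, 7, so the median is the 6th value and each half contains five values.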
j. Stem-and-leaf diagram: it is used to arrange a large set of data in ascending order. Advantages
and disadvantages of stem-and-leaf diagrams.
Q.2 Draw a stem-and-leaf diagram for each of the following sets of data. What conclusions can you
draw from each diagram? Remember to give a key to each diagram.
(a) The masses, correct to nearest kg, of 30 men:
74, 52, 67, 68, 71, 76, 86, 81, 73, 68, 64, 75, 71, 57, 67, 57, 59, 72, 79, 64, 70, 74, 77, 79,
65, 68, 76, 83, 61, 63
(b) The times, correct to the nearest second, taken by 20 boys to swim one length of a pool:
32, 31, 26, 27, 27, 32, 29, 26, 25, 25, 29, 31, 32, 26, 30, 24, 32, 27, 26, 31.
Compiled By : Sir Rashid Qureshi
www.levels.org.pk
(c) A group of adults take part in a reaction-timing experiment. Their results are measured to
the nearest hundredth of a second.
0.14, 0.17, 0.21, 0.20, 0.20, 0.22, 0.14, 0.24, 0.26, 0.17, 0.14, 0.17, 0.21, 0.20, 0.22, 0.14,
0.24, 0.26, 0.17, 0.18, 0.17, 0.21, 0.20, 0.23, 0.17, 0.23, 0.21, 0.23, 0.24, 0.23.
(d) The daily hours of sunshine in London during the month of August:
7.0, 7.6, 12.5, 12.9, 8.3, 9.7, 8.4, 11.1, 7.5, 7.5, 9.8, 10.4, 11.6, 11.3, 7.3, 7.8, 6.5, 6.2, 6.1,
5.6, 5.6, 5.8, 4.8, 4.3, 0.0, 0.6, 0.8, 1.6, 0.2, 2.4, 2.6
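One way to build such a diagram for whole-number data is sketched below, using the masses from Q.2 (a). The function name `stem_and_leaf` is hypothetical, and the sketch assumes the tens digit is the stem and the units digit is the leaf; parts (c) and (d) would need a different choice of stem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens) to its sorted list of leaves (units).

    Assumes non-negative whole-number data, as in Q.2 (a).
    """
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return dict(rows)

# Masses of 30 men from Q.2 (a), in kg.
masses = [74, 52, 67, 68, 71, 76, 86, 81, 73, 68, 64, 75, 71, 57, 67,
          57, 59, 72, 79, 64, 70, 74, 77, 79, 65, 68, 76, 83, 61, 63]

diagram = stem_and_leaf(masses)
for stem in sorted(diagram):
    leaves = " ".join(str(leaf) for leaf in diagram[stem])
    print(f"{stem} | {leaves}")
print("Key: 5 | 2 means 52 kg")
```

Sorting the values before splitting them keeps the leaves in each row in ascending order, which is what makes the median and quartiles easy to read off the diagram.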
k. Variance and standard deviation of raw data. Variance is a measure of how far a set of
numbers is spread out: it is the mean of the squares of the deviations of the values from their
mean. The positive square root of the variance is called the standard deviation of the data.
Note: a low standard deviation indicates that the data points tend to be very close to the mean
(consistent data), whereas a high standard deviation indicates that the data points are spread out
over a large range of values (inconsistent data).
Formulas to calculate the variance and standard deviation:
• Variance = $\dfrac{\sum x^2}{n} - (\bar{x})^2$ and
• Variance = $\dfrac{\sum (x - \bar{x})^2}{n}$
Q.3 The age at which a child first walked (to the nearest month) was recorded for 8 children.
The results were as follows.
12  11  16  19  10  12  12  13
Calculate the mean and standard deviation of the data.
l. Range of a set of data, five-point summary and box-and-whisker diagram.
Q.4 30 girls estimate the length of a line in cm, correct to the nearest mm:
9.2, 7.3, 7.0, 6.5, 5.4, 5.3, 10.1, 8.4, 8.8, 7.1, 7.6, 7.9, 6.7, 9.6, 5.5, 7.4, 7.0, 8.2, 5.5, 7.8,
8.2, 7.5, 6.1, 6.1, 3.9, 6.8, 7.6, 8.1, 8.0, 10.0
Find the five-point summary of this set of data, and hence draw a box-and-whisker diagram of
the data.
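The five-point summary (minimum, lower quartile, median, upper quartile, maximum) for Q.4 can be sketched as follows. The function name `five_point_summary` is hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the ordered data, which is an assumption about the intended convention.

```python
from statistics import median

def five_point_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return (ordered[0],
            median(ordered[:half]),
            median(ordered),
            median(ordered[-half:]),
            ordered[-1])

# Estimated lengths in cm from Q.4.
lengths = [9.2, 7.3, 7.0, 6.5, 5.4, 5.3, 10.1, 8.4, 8.8, 7.1,
           7.6, 7.9, 6.7, 9.6, 5.5, 7.4, 7.0, 8.2, 5.5, 7.8,
           8.2, 7.5, 6.1, 6.1, 3.9, 6.8, 7.6, 8.1, 8.0, 10.0]

minimum, q1, med, q3, maximum = five_point_summary(lengths)
print("minimum =", minimum)
print("Q1      =", q1)
print("median  =", round(med, 2))
print("Q3      =", q3)
print("maximum =", maximum)
print("range   =", round(maximum - minimum, 1))
print("IQR     =", round(q3 - q1, 1))
```

These five values mark the ends of the whiskers, the two edges of the box and the line inside the box.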
m. Frequency distribution: an arrangement of data in rows and columns with their respective
frequencies.
n. Terminology of frequency distributions, i.e. class intervals; upper and lower class limits and
class boundaries; class size or width; class mark or midpoint.
Q.5
i. 38 children solved a simple problem and the time taken by each was noted.
Find the class boundaries, class mark, class size and
Time (seconds)   5-    10-   20-   25-   40-   45-
Frequency        2     12    7     15    2     0
ii. The masses, measured to the nearest kg, of 200 girls were recorded.
Find the class boundaries, class mark, class size and
Mass (kg)    41-50   51-55   56-60   61-70   71-75
Frequency    21      62      55      50      12
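For data rounded to the nearest unit, as in part ii, each class boundary lies half a unit beyond the stated class limit. The sketch below applies this to the mass classes; the function name `class_details` and its `precision` parameter are hypothetical.

```python
def class_details(lower_limit, upper_limit, precision=1):
    """Return (lower boundary, upper boundary, class mark, class width).

    `precision` is the unit the data were rounded to
    (1 for masses measured to the nearest kg).
    """
    lower_boundary = lower_limit - precision / 2
    upper_boundary = upper_limit + precision / 2
    mark = (lower_boundary + upper_boundary) / 2
    width = upper_boundary - lower_boundary
    return lower_boundary, upper_boundary, mark, width

# Class limits for the masses in Q.5 ii, in kg.
mass_classes = [(41, 50), (51, 55), (56, 60), (61, 70), (71, 75)]

for lower, upper in mass_classes:
    lb, ub, mark, width = class_details(lower, upper)
    print(f"{lower}-{upper}: boundaries {lb} to {ub}, mark {mark}, width {width}")
```

The class widths here are 10, 5, 5, 10 and 5, so the classes are unequal and a histogram of these data would need frequency densities.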
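o. Histogram: a bar chart without gaps in which the area of each rectangle is proportional to the
frequency of its class interval.
p. Histograms with equal and unequal class widths. When the class widths are unequal, the height of
each rectangle is the frequency density (frequency ÷ class width).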
Q.6 On a particular day, the length of stay of each car at a city car park was recorded.
Length of stay (min)   t < 25   25 ≤ t < 60   60 ≤ t < 80   80 ≤ t < 150   150 ≤ t < 300
Frequency              62       70            88            280            30
Express the given information in a histogram.
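Because the classes in Q.6 have unequal widths, the bar heights should be frequency densities (frequency divided by class width) rather than raw frequencies. A minimal sketch follows; taking 0 as the lower boundary of the first class is an assumption, since the question only states t < 25.

```python
# Class boundaries in minutes; the 0 is assumed, not given in Q.6.
boundaries = [0, 25, 60, 80, 150, 300]
frequencies = [62, 70, 88, 280, 30]

# Width of each class is the gap between consecutive boundaries.
widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]

# Bar height for each class: frequency per minute of class width.
densities = [f / w for f, w in zip(frequencies, widths)]

for lo, hi, w, d in zip(boundaries, boundaries[1:], widths, densities):
    print(f"{lo:>3} to {hi:<3}  width {w:>3}  frequency density {d:.2f}")
```

Plotting these densities over the class intervals gives rectangles whose areas equal the frequencies, as the definition of a histogram requires.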
q. Mean, variance and standard deviation of a frequency distribution.
• Variance = $\dfrac{\sum f x^2}{\sum f} - (\bar{x})^2$ and
• Variance = $\dfrac{\sum f (x - \bar{x})^2}{\sum f}$
Q.7 Find the mean and variance of each of the following frequency distributions.
i.
Ages   5-9   10-14   15-19   20-24   25-29   30-34
f      4     6       12      10      7       1
ii.
Speeds   101-104   105-108   109-112   113-116   117-120
f        13        18        21        12        6
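For grouped data, each class is represented by its midpoint x, and the mean is estimated as $\sum fx / \sum f$. The sketch below works through the speeds table of Q.7; taking the midpoint as the average of the two class limits is an assumption, and for the ages table a different convention may be intended.

```python
# Speeds table from Q.7: class limits and frequencies.
speed_classes = [(101, 104), (105, 108), (109, 112), (113, 116), (117, 120)]
frequencies = [13, 18, 21, 12, 6]

# Class marks (midpoints), assumed to be the average of the class limits.
midpoints = [(lower + upper) / 2 for lower, upper in speed_classes]

total_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))

grouped_mean = sum_fx / total_f
grouped_variance = sum_fx2 / total_f - grouped_mean ** 2

print("sum f    =", total_f)
print("sum fx   =", sum_fx)
print("mean     =", round(grouped_mean, 3))
print("variance =", round(grouped_variance, 3))
```

These are estimates, because the individual speeds inside each class are not known.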
Q.8 For a particular set of data
n = 100, $\sum (x - 50) = 123.5$, $\sum (x - 50)^2 = 238.4$.
Find the mean and the standard deviation of x.
Q.9 Find the variance of x if
$\sum f(x - 100) = 127$, $\sum f(x - 100)^2 = 2593$, $\sum f = 20$.
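Totals of the form $\sum (x - a)$ and $\sum (x - a)^2$ can be used directly: subtracting a constant a shifts the mean by a but leaves the variance unchanged. The sketch below applies this to Q.8 and Q.9; the function name `from_coded_totals` is hypothetical.

```python
from math import sqrt

def from_coded_totals(n, a, sum_d, sum_d2):
    """Mean and variance of x from n, sum(x - a) and sum((x - a)**2).

    mean(x) = a + sum(x - a) / n
    var(x)  = sum((x - a)**2) / n - (sum(x - a) / n)**2
    """
    mean_d = sum_d / n
    mean_x = a + mean_d
    variance = sum_d2 / n - mean_d ** 2
    return mean_x, variance

# Q.8: n = 100, a = 50.
mean8, var8 = from_coded_totals(100, 50, 123.5, 238.4)
print("Q.8 mean =", round(mean8, 3))
print("Q.8 standard deviation =", round(sqrt(var8), 3))

# Q.9: the frequencies sum to 20, a = 100.
mean9, var9 = from_coded_totals(20, 100, 127, 2593)
print("Q.9 variance =", round(var9, 4))
```

For a frequency distribution, as in Q.9, $\sum f$ plays the role of n.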
r. Cumulative frequency: the cumulative frequency of a class interval is the sum of the frequencies of
that class and all preceding classes.
s. Cumulative frequency graph: the graph of cumulative frequency plotted against the upper class
boundary is called a cumulative frequency curve.
Q.10 Draw a cumulative frequency curve for the following data.
Age (years)        16-   20-   25-   30-   35-   45-
Number of births   70    470   535   280   118   0
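The points to plot for Q.10, and rough estimates of the median and quartiles, can be sketched as follows. The estimates use straight-line interpolation between plotted points, which is an assumption standing in for reading values off a hand-drawn curve; the function name `estimate_value` is hypothetical.

```python
from itertools import accumulate

# Class boundaries in years; the class "45-" has frequency 0.
boundaries = [16, 20, 25, 30, 35, 45]
births = [70, 470, 535, 280, 118]

# Cumulative frequency at each boundary, starting from 0 at age 16.
cumulative = [0] + list(accumulate(births))

def estimate_value(target):
    """Age at cumulative frequency `target`, by linear interpolation."""
    for i in range(1, len(cumulative)):
        if target <= cumulative[i]:
            lo, hi = boundaries[i - 1], boundaries[i]
            below = cumulative[i - 1]
            fraction = (target - below) / (cumulative[i] - below)
            return lo + fraction * (hi - lo)
    raise ValueError("target exceeds the total frequency")

total = cumulative[-1]
median_age = estimate_value(total / 2)
lower_quartile = estimate_value(total / 4)
upper_quartile = estimate_value(3 * total / 4)

for age, cf in zip(boundaries, cumulative):
    print(f"age {age}: cumulative frequency {cf}")
print("median         ~", round(median_age, 1))
print("lower quartile ~", round(lower_quartile, 1))
print("upper quartile ~", round(upper_quartile, 1))
```

Each cumulative frequency is plotted against the upper boundary of its class, so the curve starts at (16, 0) and ends at (45, 1473).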
Past Papers Questions: Q.1 to Q.10 (question text not reproduced in this transcript)