PAPER 2 – STATISTICS (QUESTION 7)
To Know =>
Mean = Average = $\frac{\text{sum of values}}{\text{number of values}}$
Median = Middle number: put the numbers in ascending order and find the middle value.
Mode = Most popular number (the value that occurs most often).
Page 33 (log tables): All Formulas
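A short Python sketch of the three definitions above. The function names and the sample list are illustrative only; Python's standard statistics module also provides mean, median and multimode.

```python
from collections import Counter


def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value once the numbers are in ascending order.

    With an even count there are two middle values, so average them.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Most popular value(s); a list, because there can be a tie."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


data = [3, 7, 2, 7, 5]  # made-up example data
print(mean(data), median(data), modes(data))  # 4.8 5 [7]
```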
Mean and Standard Deviation
$\mu = \frac{\sum x}{n}$
Where:
μ = average (mean)
x = the numbers
n = how many numbers there are
Example:
6 is the mean of 3, 1, 9, x, 5. Find x.
$6 = \frac{3 + 1 + 9 + x + 5}{5}$
30 = 18 + x
x = 12
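Finding a missing value works by multiplying the mean by how many numbers there are and subtracting the known numbers. A minimal Python sketch of that rearrangement (the helper name missing_value is hypothetical):

```python
def missing_value(mean, known_values, n):
    """Solve mean = (sum of known values + x) / n for the one unknown x.

    Multiply both sides by n, then subtract the known values.
    """
    return mean * n - sum(known_values)


# Example from the notes: 6 is the mean of 3, 1, 9, x, 5
x = missing_value(6, [3, 1, 9, 5], 5)
print(x)  # 12
```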
Frequency Table
Finding the mean and mode from a frequency table is different: you have to take the frequency into account.
Days absent (x)          0   1   2   3   4   5
No. of employees (f)     7   9  11  12   7   4
Mean
$\mu = \frac{\sum fx}{\sum f} = \frac{(0)(7) + (1)(9) + (2)(11) + (3)(12) + (4)(7) + (5)(4)}{7 + 9 + 11 + 12 + 7 + 4} = \frac{0 + 9 + 22 + 36 + 28 + 20}{50} = \frac{115}{50} = 2.3 \text{ days}$
Multiply the top row by the bottom row in the table, add the results together and divide by the total of the bottom row.
Mode
= The value that goes along with the highest frequency.
The highest frequency is 12 and the value that goes with it is 3.
Therefore: Mode = 3
Find where the frequency is the highest and read off the value above it.
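The frequency-table mean and mode can be sketched in Python as follows; the two lists are the rows of the table above, and the helper names are made up for illustration. If several values tie for the highest frequency, this sketch only returns the first one.

```python
def frequency_mean(xs, fs):
    """Mean of a frequency table: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def frequency_mode(xs, fs):
    """Value that goes with the highest frequency (first one if tied)."""
    return xs[fs.index(max(fs))]


days_absent = [0, 1, 2, 3, 4, 5]   # x (top row)
employees = [7, 9, 11, 12, 7, 4]   # f (bottom row)
print(frequency_mean(days_absent, employees))  # 2.3
print(frequency_mode(days_absent, employees))  # 3
```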
Grouped Frequency Table
Grouped frequency tables use ranges of values. They are used for larger amounts of data but are not as accurate as working with each value individually.
To get the mean:
1. Get the mid-interval value of each range.
2. Then do the same as for a frequency table: $\frac{\sum (\text{top})(\text{bottom})}{\sum \text{bottom}}$
Example
Mid-interval value    2      5      8      11
Number                1-3    4-6    7-9    10-12
No. of people         3      17     2      8
Get the mean: $\frac{(2)(3) + (5)(17) + (8)(2) + (11)(8)}{3 + 17 + 2 + 8} = \frac{6 + 85 + 16 + 88}{30} = \frac{195}{30} = 6.5$
To get the modal class: look for the highest value in the bottom row of the table (17) and take the range above it. So the modal class is 4-6.
To get the median: 30 ÷ 2 = 15, so the median lies between the 15th and 16th values.
Adding the 1st and 2nd boxes gives 3 + 17 = 20, so the 15th and 16th values both lie in the 4-6 range. The median class is 4-6.
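A Python sketch of the grouped-table steps, assuming each range is stored as a (low, high) pair of whole numbers; the helper names are hypothetical. The median class is found by keeping a running total of the frequencies, which is the same idea as adding the boxes in the example.

```python
def mid_interval(low, high):
    """Mid-interval value of a range such as 1-3."""
    return (low + high) / 2


def class_of_position(ranges, fs, position):
    """Return the range that contains the given position (1st, 2nd, ...)."""
    running_total = 0
    for rng, f in zip(ranges, fs):
        running_total += f  # cumulative frequency up to this range
        if position <= running_total:
            return rng
    raise ValueError("position is larger than the total frequency")


def grouped_summary(ranges, fs):
    """Estimated mean, modal class and median class(es) of a grouped table."""
    mids = [mid_interval(low, high) for low, high in ranges]
    total = sum(fs)
    mean = sum(m * f for m, f in zip(mids, fs)) / total
    modal_class = ranges[fs.index(max(fs))]  # range above the highest frequency
    if total % 2 == 0:
        positions = [total // 2, total // 2 + 1]  # e.g. 15th and 16th of 30
    else:
        positions = [total // 2 + 1]              # e.g. 10th of 19
    median_classes = [class_of_position(ranges, fs, p) for p in positions]
    return mean, modal_class, median_classes


ranges = [(1, 3), (4, 6), (7, 9), (10, 12)]  # Number
people = [3, 17, 2, 8]                       # No. of people
print(grouped_summary(ranges, people))
# (6.5, (4, 6), [(4, 6), (4, 6)])
```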
Standard deviation
It measures the spread of the numbers around the mean: the lower the standard deviation, the smaller the spread.
1. Find the mean (μ).
2. Find the difference between each number and the mean (x - μ).
3. Square each answer: $(x-\mu)^2$
4. Add the squares together and divide by how many numbers there are: $\frac{\sum (x-\mu)^2}{n}$
5. Get the square root of the result: $\sigma = \sqrt{\frac{\sum (x-\mu)^2}{n}}$
Example: Find the standard deviation of 2, 5, 6, 7.
Mean: $\mu = \frac{2 + 5 + 6 + 7}{4} = \frac{20}{4} = 5$
$\frac{(2-5)^2 + (5-5)^2 + (6-5)^2 + (7-5)^2}{4} = \frac{(-3)^2 + 0 + (1)^2 + (2)^2}{4} = \frac{9 + 0 + 1 + 4}{4} = \frac{14}{4} = 3.5$
$\sigma = \sqrt{3.5} \approx 1.87$
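A Python sketch of the five steps, dividing by n exactly as in these notes. This is the population standard deviation, which is what statistics.pstdev computes; statistics.stdev divides by n - 1 instead and would give a different answer. The function name is hypothetical.

```python
import math
import statistics


def standard_deviation(values):
    """Standard deviation using the five steps, dividing by n."""
    n = len(values)
    mu = sum(values) / n                       # 1. find the mean
    squares = [(x - mu) ** 2 for x in values]  # 2-3. differences, squared
    variance = sum(squares) / n                # 4. add them, divide by n
    return math.sqrt(variance)                 # 5. square root


data = [2, 5, 6, 7]
print(round(standard_deviation(data), 2))  # 1.87
# statistics.pstdev also divides by n, so it agrees:
print(round(statistics.pstdev(data), 2))   # 1.87
```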
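Standard deviation of a table
$\sigma = \sqrt{\frac{\sum f(x-\mu)^2}{\sum f}}$
Same as above, except you multiply each square by its frequency.
Variable (x)     2   4   6   8
Frequency (f)    4   3   0   2
Get the mean: $\mu = \frac{(2)(4) + (4)(3) + (6)(0) + (8)(2)}{4 + 3 + 0 + 2} = \frac{8 + 12 + 0 + 16}{9} = \frac{36}{9} = 4$
$\frac{4(2-4)^2 + 3(4-4)^2 + 0(6-4)^2 + 2(8-4)^2}{9} = \frac{4(4) + 3(0) + 0(4) + 2(16)}{9} = \frac{16 + 32}{9} = \frac{48}{9} = 5.333$
$\sigma = \sqrt{5.333} \approx 2.31$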
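The same idea for a frequency table, sketched in Python under the assumption that the x values and frequencies are passed as two equal-length lists (the function name is hypothetical):

```python
import math


def frequency_sd(xs, fs):
    """sigma = sqrt(sum(f * (x - mu)^2) / sum(f))."""
    total = sum(fs)
    mu = sum(f * x for x, f in zip(xs, fs)) / total
    weighted_squares = sum(f * (x - mu) ** 2 for x, f in zip(xs, fs))
    return math.sqrt(weighted_squares / total)


xs = [2, 4, 6, 8]  # variable (x)
fs = [4, 3, 0, 2]  # frequency (f)
print(round(frequency_sd(xs, fs), 2))  # 2.31
```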
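The Empirical Rule
For data with a normal distribution (mean μ, standard deviation σ):
68% of data lies within 1 standard deviation of the mean: $[\mu - \sigma, \mu + \sigma]$
95% of data lies within 2 standard deviations of the mean: $[\mu - 2\sigma, \mu + 2\sigma]$
99.7% of data lies within 3 standard deviations of the mean: $[\mu - 3\sigma, \mu + 3\sigma]$
Example: A normal distribution has mean μ = 60 and standard deviation σ = 5.
68% of data lies in $[\mu - \sigma, \mu + \sigma] = [60 - 5, 60 + 5] = [55, 65]$
95% of data lies in $[\mu - 2\sigma, \mu + 2\sigma] = [60 - 2(5), 60 + 2(5)] = [50, 70]$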
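A small Python sketch of the empirical-rule intervals for the example above; it also prints the 3-standard-deviation interval, which the rule gives as [45, 75] for this distribution. The function and dictionary names are made up for illustration.

```python
def empirical_interval(mu, sigma, k):
    """Interval [mu - k*sigma, mu + k*sigma] for k = 1, 2 or 3."""
    return (mu - k * sigma, mu + k * sigma)


PERCENT_WITHIN = {1: 68, 2: 95, 3: 99.7}

for k, pct in PERCENT_WITHIN.items():
    low, high = empirical_interval(60, 5, k)
    print(f"{pct}% of data lies in [{low}, {high}]")
# 68% of data lies in [55, 65]
# 95% of data lies in [50, 70]
# 99.7% of data lies in [45, 75]
```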
Margin of error
$E = \frac{1}{\sqrt{n}}$
Where: E = margin of error, n = sample size
Sample surveys are never 100% accurate; there is always an error.
Example: n = 4000
$E = \frac{1}{\sqrt{4000}} \approx 0.016$
Note: The bigger the sample size, the smaller the error.
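A Python sketch of the margin-of-error formula, including a check that a bigger sample gives a smaller error (the function name is hypothetical):

```python
import math


def margin_of_error(n):
    """Margin of error E = 1 / sqrt(n) for a sample of size n."""
    return 1 / math.sqrt(n)


print(round(margin_of_error(4000), 3))  # 0.016
# Bigger sample, smaller error:
print(margin_of_error(400) > margin_of_error(4000))  # True
```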
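Confidence Interval
95% confidence interval for the population proportion:
$\hat{p} - \frac{1}{\sqrt{n}} < p < \hat{p} + \frac{1}{\sqrt{n}}$
Where: $\hat{p}$ = sample proportion, p = population proportion
Example:
500 houses are in a random sample. 80 said they had a pet. What is the 95% confidence interval?
Sample proportion: $\hat{p} = \frac{80}{500} = 0.16$
Margin of error: $\frac{1}{\sqrt{500}} \approx 0.0447$
$\frac{80}{500} - \frac{1}{\sqrt{500}} < p < \frac{80}{500} + \frac{1}{\sqrt{500}}$
0.115 < p < 0.205
11.5% < p < 20.5%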
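The confidence interval can be sketched in Python as follows, assuming the sample is given as a count of "yes" answers out of n; the function name is hypothetical, and the interval is the 1/√n approximation used in these notes.

```python
import math


def confidence_interval(successes, n):
    """95% interval for a population proportion: p_hat +/- 1/sqrt(n)."""
    p_hat = successes / n   # sample proportion
    e = 1 / math.sqrt(n)    # margin of error
    return p_hat - e, p_hat + e


low, high = confidence_interval(80, 500)
print(f"{low:.3f} < p < {high:.3f}")  # 0.115 < p < 0.205
```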
Hypothesis Tests
1. State the null hypothesis H0: whatever is being claimed.
2. State the alternative hypothesis H1: the opposite of the claim.
3. Get the sample proportion.
4. Get the margin of error.
5. Get the confidence interval.
6. Accept the null hypothesis if the claimed value lies within the interval; reject it if it lies outside.
Example:
A company states that 20% of visitors to their website buy one of their products. A sample of 400 visitors is asked and 64 purchased a product. Test the claim using a null hypothesis.
H0: p = 20%
H1: p ≠ 20%
$\frac{64}{400} - \frac{1}{\sqrt{400}} < p < \frac{64}{400} + \frac{1}{\sqrt{400}}$
$0.16 - 0.05 < p < 0.16 + 0.05$
0.11 < p < 0.21
11% < p < 21%
20% lies between these figures, so accept the null hypothesis.
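A Python sketch of the six-step test, reusing the same interval; it assumes the claim is a proportion between 0 and 1 and uses the strict inequalities from the notes. The function name check_claim is hypothetical.

```python
import math


def check_claim(claimed_p, successes, n):
    """Accept H0 if the claimed proportion is inside the 95% interval."""
    p_hat = successes / n          # 3. sample proportion
    e = 1 / math.sqrt(n)           # 4. margin of error
    low, high = p_hat - e, p_hat + e  # 5. confidence interval
    accept_h0 = low < claimed_p < high  # 6. inside -> accept, outside -> reject
    return low, high, accept_h0


low, high, accept_h0 = check_claim(0.20, 64, 400)
print(f"{low:.2f} < p < {high:.2f}")              # 0.11 < p < 0.21
print("accept H0" if accept_h0 else "reject H0")  # accept H0
```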
Representing Data
Bar charts: bars are separated
Line plots: dots on a graph
Pie charts: know how to work out the angles, draw the chart, and go from a pie chart back to the data
Stem and leaf diagrams: including back-to-back diagrams, and finding the mean, median, mode and interquartile range
Scatter graphs: show whether two sets of data are related
Histograms: bars are stuck together
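Drawing pie charts when given the information
Size          8    10    12    14    16    18    Total
Frequency     3     7    10    12     6     2    40
Angle of each sector = $\frac{\text{frequency}}{\text{total}} \times 360^\circ$
Size 8: $\frac{3}{40} \times 360^\circ = 27^\circ$
Size 10: $\frac{7}{40} \times 360^\circ = 63^\circ$
Size 12: $\frac{10}{40} \times 360^\circ = 90^\circ$
Size 14: $\frac{12}{40} \times 360^\circ = 108^\circ$
Size 16: $\frac{6}{40} \times 360^\circ = 54^\circ$
Size 18: $\frac{2}{40} \times 360^\circ = 18^\circ$
[Pie chart with six sectors: Size 8, Size 10, Size 12, Size 14, Size 16, Size 18]
Draw a circle and use a protractor to measure out each angle.
Given a pie chart, work backwards: amount in a sector = $\frac{\text{angle} \times \text{total number of people}}{360^\circ}$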
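A Python sketch of both directions for pie charts; the helper names are hypothetical and the frequencies are the ones in the table above. Multiplying before dividing keeps these particular angles exact whole numbers in floating point.

```python
def pie_angles(frequencies):
    """Angle for each sector = frequency * 360 / total (degrees)."""
    total = sum(frequencies.values())
    return {label: f * 360 / total for label, f in frequencies.items()}


def sector_amount(angle, total):
    """Reverse direction: amount in a sector = angle * total / 360."""
    return angle * total / 360


sizes = {"Size 8": 3, "Size 10": 7, "Size 12": 10,
         "Size 14": 12, "Size 16": 6, "Size 18": 2}
angles = pie_angles(sizes)
print(angles)
print(sector_amount(90, 40))  # 10.0
```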
Shapes of Distributions
Normal distribution: symmetrical
Negative skew: tail to the left
Positive skew: tail to the right
Stem and Leaf Diagrams
Always write the leaves in order from lowest to highest, and include a key.
Marks received by 19 students in a test
[Stem and leaf diagram: stems 2 to 8]
Key: 4 | 2 = 42 marks
Find the range:
Subtract the lowest value from the highest value: 84 - 22 = 62
Find the median:
19 ÷ 2 = 9.5, so take the 10th value: the median is 58 marks
Find the lower quartile Q1:
¼(19) = 4.75, so take the 5th value: Q1 = 47
Find the upper quartile Q3:
¾(19) = 14.25, so take the 15th value: Q3 = 67
Find the interquartile range Q3 - Q1:
67 - 47 = 20
• If there is an even number of values, take the 2 middle values, add them and then ÷ 2 to get the median.
• The same applies to the lower/upper quartile when ¼ or ¾ of the number of values is a whole number: for example, add the 5th and 6th values, then ÷ 2.
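A Python sketch of the median and quartile rules in these notes, assuming positions are counted from 1 in the ordered data. Fraction keeps positions such as 4.75 exact, so the "is it a whole number?" check is reliable. The function name is hypothetical, and the test data are made up so that the kth value is simply k.

```python
from fractions import Fraction
import math


def value_at_fraction(values, fraction):
    """Median (1/2) or quartile (1/4, 3/4) using the method in these notes.

    position = fraction * n. If it is not a whole number, round up and
    take that value; if it is whole, average that value and the next one.
    """
    ordered = sorted(values)
    position = Fraction(fraction) * len(ordered)
    if position.denominator != 1:
        index = math.ceil(position)  # e.g. 4.75 -> 5th value
        return ordered[index - 1]    # positions are counted from 1
    index = int(position)            # e.g. 5 -> 5th and 6th values
    return (ordered[index - 1] + ordered[index]) / 2


Q1, MEDIAN, Q3 = Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)

marks = list(range(1, 20))  # 19 made-up values, so the kth value is k
print(value_at_fraction(marks, MEDIAN))  # 10 (10th value)
print(value_at_fraction(marks, Q1))      # 5 (5th value)
print(value_at_fraction(marks, Q3))      # 15 (15th value)
print(value_at_fraction(list(range(1, 21)), Q1))  # 5.5 (5th and 6th averaged)
```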
Collecting Data
Primary data: information you collect yourself, e.g. surveys, questionnaires, experiments
Secondary data: information from existing records, e.g. the internet, newspapers
DATA
  Categorical data (words)
    Nominal (not ordered)
    Ordinal (ordered)
  Numerical data (numbers)
    Discrete (definite values, e.g. counts)
    Continuous (measured, e.g. height, speed, weight, time)
The mean is best used when the values are all fairly close together.
The median is best used when there are outliers (extreme values).
Scatter Graphs
• Used to show whether 2 sets of data are related.
• Correlation is a measure of the strength of the relationship between two variables.
• The closer the points on the scatter graph are to a straight line, the stronger the correlation. If every point lies exactly on a rising straight line, r = 1.
• When a change in 1 variable directly causes a change in another variable, there is a causal relationship.
Strong positive correlation: r close to 1
Weak positive correlation: r around 0.5
Strong negative correlation: r close to -1
Weak negative correlation: r around -0.5
No correlation: r close to 0
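The notes do not give a formula for r. A common choice is Pearson's correlation coefficient, sketched below in Python on made-up data (the function name is hypothetical); Python 3.10+ also provides statistics.correlation, which computes the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r, between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
print(round(pearson_r(xs, [2, 4, 6, 8, 10]), 3))  # 1.0 (points on a rising line)
print(round(pearson_r(xs, [10, 8, 6, 4, 2]), 3))  # -1.0 (points on a falling line)
```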