PAPER 2: STATISTICS (QUESTION 7)

To Know
Mean = Average = $\frac{\text{sum of values}}{\text{number of values}}$
Median = Middle number: put the numbers in ascending order and find the middle value.
Mode = Most popular (most frequent) number
Page 33 of the log tables has all the formulas for the mean and standard deviation.

Example: 6 is the mean of 3, 1, 9, x, 5. Find x.
$\mu = \frac{\sum x}{n}$, where $\mu$ = average (mean), $x$ = the numbers, $n$ = how many numbers there are.
$6 = \frac{3 + 1 + 9 + x + 5}{5}$
$30 = 18 + x$
$x = 12$

Frequency Table
Finding the mean and mode from a frequency table is different: you have to take the frequency into account.

Days absent (x)        0    1    2    3    4    5
No. of employees (f)   7    9   11   12    7    4

Mean $= \frac{\sum fx}{\sum f} = \frac{0(7) + 1(9) + 2(11) + 3(12) + 4(7) + 5(4)}{7 + 9 + 11 + 12 + 7 + 4} = \frac{0 + 9 + 22 + 36 + 28 + 20}{50} = \frac{115}{50} = 2.3$ days
Multiply each top value by the frequency below it, add the products together, and divide by the total of the bottom row.

Mode = the value that goes with the highest frequency. The highest frequency is 12 and the value above it is 3, so the mode = 3.
In other words: find where the frequency is highest and read off the value above it.

Grouped Frequency Table
These tables group the values into ranges. They are used for larger data sets, but they are not as accurate as working with each value individually.
To get the mean:
1. Get the mid-interval value of each range.
2. Then do the same as for a frequency table: $\frac{\sum (\text{top})(\text{bottom})}{\sum \text{bottom}}$

Example
Mid-interval value     2     5     8    11
Number                1-3   4-6   7-9  10-12
No. of people          3    17     2     8

Mean $= \frac{(2)(3) + (5)(17) + (8)(2) + (11)(8)}{3 + 17 + 2 + 8} = \frac{6 + 85 + 16 + 88}{30} = \frac{195}{30} = 6.5$

To get the modal class, look for the highest value in the bottom row of the table and take the range above it. The highest frequency is 17, so the modal class is 4-6.

To get the median: 30 ÷ 2 = 15, so the median lies between the 15th and 16th values. Adding the first two frequencies gives 3 + 17 = 20, so the 15th and 16th values both lie in the 4-6 range.

Standard deviation
It's the spread of the numbers around the mean: the lower the standard deviation, the smaller the spread.
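The mean, frequency-table mean and mode rules above can be checked by computer. This is a minimal Python sketch; the helper names (`mean`, `solve_missing_value`, `frequency_mean`, `frequency_mode`) are hypothetical and not from the notes, and `frequency_mode` assumes there is a single highest frequency (on a tie it returns the first value).

```python
# Hypothetical helpers illustrating the mean and mode rules from the notes.

def mean(values):
    """Mean = sum of values / number of values."""
    return sum(values) / len(values)


def solve_missing_value(target_mean, known_values):
    """Find the single missing value x so that known_values + [x] has target_mean."""
    n = len(known_values) + 1              # the count includes the missing x
    return target_mean * n - sum(known_values)


def frequency_mean(xs, fs):
    """Mean from a frequency table: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def frequency_mode(xs, fs):
    """Mode: the x value that goes with the highest frequency (first one on a tie)."""
    best = max(range(len(fs)), key=lambda i: fs[i])
    return xs[best]


x = solve_missing_value(6, [3, 1, 9, 5])
print(x)                                   # 12
print(mean([3, 1, 9, x, 5]))               # 6.0

days = [0, 1, 2, 3, 4, 5]
employees = [7, 9, 11, 12, 7, 4]
print(frequency_mean(days, employees))     # 2.3
print(frequency_mode(days, employees))     # 3
```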
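The grouped-table method can be sketched the same way using mid-interval values. This assumes whole-number classes written as (low, high) pairs, so the mid-interval value is (low + high) / 2; `class_containing` is a hypothetical helper that keeps a running total of the frequencies, which is the same idea as adding 3 + 17 to locate the 15th and 16th values.

```python
# Sketch: mean, modal class and median class from a grouped frequency table.
# Classes are assumed to be (low, high) pairs, e.g. (1, 3) for "1-3".

def mid_interval(low, high):
    return (low + high) / 2


def grouped_mean(classes, fs):
    """Replace each class by its mid-interval value, then use sum(f * x) / sum(f)."""
    mids = [mid_interval(lo, hi) for lo, hi in classes]
    return sum(m * f for m, f in zip(mids, fs)) / sum(fs)


def modal_class(classes, fs):
    """The class above the highest frequency."""
    return classes[fs.index(max(fs))]


def class_containing(classes, fs, position):
    """Class holding the value at a 1-based position, found with a running total."""
    running = 0
    for cls, f in zip(classes, fs):
        running += f
        if position <= running:
            return cls
    raise ValueError("position is larger than the total frequency")


classes = [(1, 3), (4, 6), (7, 9), (10, 12)]
people = [3, 17, 2, 8]
print(grouped_mean(classes, people))              # 6.5
print(modal_class(classes, people))               # (4, 6)
print(class_containing(classes, people, 15))      # (4, 6)
print(class_containing(classes, people, 16))      # (4, 6)
```

Because both the 15th and 16th values fall in the same class, that class is the median class.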
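Steps:
1. Find the mean, $\mu$.
2. Find the difference between each number and the mean: $(x - \mu)$.
3. Square each answer: $(x - \mu)^2$.
4. Add the squares together and divide by the total number of values: $\frac{\sum (x - \mu)^2}{n}$.
5. Get the square root of the result: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$.

Example: Find the standard deviation of 2, 5, 6, 7.
Mean: $\mu = \frac{2 + 5 + 6 + 7}{4} = \frac{20}{4} = 5$
$\frac{(2-5)^2 + (5-5)^2 + (6-5)^2 + (7-5)^2}{4} = \frac{(-3)^2 + 0 + 1^2 + 2^2}{4} = \frac{9 + 0 + 1 + 4}{4} = \frac{14}{4} = 3.5$
$\sigma = \sqrt{3.5} \approx 1.9$

Standard deviation of a table: $\sigma = \sqrt{\frac{\sum f(x - \mu)^2}{\sum f}}$
Same as above, except you multiply each square by its frequency.

Variable (x)     2    4    6    8
Frequency (f)    4    3    0    2

Get the mean: $\mu = \frac{(2)(4) + (4)(3) + (6)(0) + (8)(2)}{4 + 3 + 0 + 2} = \frac{8 + 12 + 0 + 16}{9} = \frac{36}{9} = 4$
$\frac{4(2-4)^2 + 3(4-4)^2 + 0(6-4)^2 + 2(8-4)^2}{9} = \frac{4(4) + 3(0) + 0(4) + 2(16)}{9} = \frac{16 + 32}{9} = \frac{48}{9} = 5.333$
$\sigma = \sqrt{5.333} \approx 2.3$

The Empirical Rule (for a normal distribution)
• 68% of the data lies within 1 standard deviation of the mean: $[\mu - \sigma, \mu + \sigma]$
• 95% of the data lies within 2 standard deviations of the mean: $[\mu - 2\sigma, \mu + 2\sigma]$
• 99.7% of the data lies within 3 standard deviations of the mean: $[\mu - 3\sigma, \mu + 3\sigma]$

Example: A normal distribution has mean $\mu = 60$ and standard deviation $\sigma = 5$.
68% of the data lies in $[\mu - \sigma, \mu + \sigma] = [60 - 5, 60 + 5] = [55, 65]$
95% of the data lies in $[\mu - 2\sigma, \mu + 2\sigma] = [60 - 2(5), 60 + 2(5)] = [50, 70]$

Margin of error
$E = \frac{1}{\sqrt{n}}$, where $E$ = margin of error and $n$ = sample size.
Sample surveys are never 100% accurate; there is always some error.
Example: $n = 4000$, so $E = \frac{1}{\sqrt{4000}} \approx 0.016$
Note: the bigger the sample size, the smaller the error.

Confidence Interval
95% confidence interval for the population proportion: $\hat{p} - \frac{1}{\sqrt{n}} < p < \hat{p} + \frac{1}{\sqrt{n}}$, where $\hat{p}$ = sample proportion and $p$ = population proportion.
Example: In a random sample of 500 houses, 80 said they had a pet. What is the 95% confidence interval?
Margin of error $= \frac{1}{\sqrt{500}} \approx 0.045$
$\frac{80}{500} - \frac{1}{\sqrt{500}} < p < \frac{80}{500} + \frac{1}{\sqrt{500}}$
$0.115 < p < 0.205$
$11.5\% < p < 20.5\%$

Hypothesis test
1. State the null hypothesis, $H_0$: whatever is being claimed.
2. Then come up with the alternative hypothesis, $H_1$: the opposite claim.
3. Get the sample proportion.
4. Get the margin of error.
5. Get the confidence interval.
6. Accept the null hypothesis if the claimed % lies within the interval, and reject it if it lies outside.

Example: A company states that 20% of visitors to its website buy one of its products. A sample of 400 visitors is taken and 64 purchased a product. Check the claim using a hypothesis test.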
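The five standard-deviation steps translate directly into code, for a plain list and for a frequency table. This sketch divides by $n$ (or $\sum f$), as the formulas in these notes do; that is the population standard deviation, which Python's `statistics.pstdev` also computes (`statistics.stdev` divides by $n - 1$ instead and would give a different answer). The function names `std_dev` and `std_dev_table` are illustrative only.

```python
import math
import statistics


def std_dev(values):
    """Steps 1-5: mean, differences, squares, average of squares, square root."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def std_dev_table(xs, fs):
    """Frequency table: multiply each squared difference by its frequency."""
    total = sum(fs)
    mu = sum(f * x for x, f in zip(xs, fs)) / total
    return math.sqrt(sum(f * (x - mu) ** 2 for x, f in zip(xs, fs)) / total)


print(round(std_dev([2, 5, 6, 7]), 1))                       # 1.9
print(round(std_dev_table([2, 4, 6, 8], [4, 3, 0, 2]), 1))   # 2.3
# Cross-check with the standard library's population standard deviation
print(math.isclose(statistics.pstdev([2, 5, 6, 7]), std_dev([2, 5, 6, 7])))  # True
```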
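The empirical-rule intervals, the margin of error and the 95% confidence interval are each a one-line formula. This sketch assumes the simple margin of error $E = 1/\sqrt{n}$ used in these notes; the function names `empirical_interval`, `margin_of_error` and `confidence_interval` are hypothetical.

```python
import math


def empirical_interval(mu, sigma, k):
    """mu ± k*sigma; k = 1, 2, 3 covers about 68%, 95%, 99.7% of normal data."""
    return (mu - k * sigma, mu + k * sigma)


def margin_of_error(n):
    """Margin of error as used in the notes: E = 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def confidence_interval(successes, n):
    """95% confidence interval for a population proportion: p_hat ± 1/sqrt(n)."""
    p_hat = successes / n
    e = margin_of_error(n)
    return (p_hat - e, p_hat + e)


print(empirical_interval(60, 5, 1))      # (55, 65)
print(empirical_interval(60, 5, 2))      # (50, 70)
print(round(margin_of_error(4000), 3))   # 0.016
low, high = confidence_interval(80, 500)
print(round(low, 3), round(high, 3))     # 0.115 0.205
```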
$H_0: p = 20\%$    $H_1: p \neq 20\%$
$\frac{64}{400} - \frac{1}{\sqrt{400}} < p < \frac{64}{400} + \frac{1}{\sqrt{400}}$
$0.16 - 0.05 < p < 0.16 + 0.05$
$0.11 < p < 0.21$
$11\% < p < 21\%$
20% falls between these figures, so we accept the null hypothesis.

Representing Data
• Bar charts: bars separated
• Line plots: dots on a graph
• Pie charts: know how to work out the angles, draw the chart, and read the data back from a chart
• Stem and leaf graphs: including back-to-back plots, and finding the mean, median, mode and interquartile range
• Scatter graphs: show if two sets of data are related
• Histograms: bars stuck together

Drawing pie charts when given the information
Size       Number   Angle
Size 8        3      3/40 × 360° = 27°
Size 10       7      7/40 × 360° = 63°
Size 12      10     10/40 × 360° = 90°
Size 14      12     12/40 × 360° = 108°
Size 16       6      6/40 × 360° = 54°
Size 18       2      2/40 × 360° = 18°
Total        40     360°
Draw a circle and measure out each angle with a protractor.
Given a pie chart, get the amount in each sector with: $\frac{\text{angle}}{360°} \times \text{number of people}$

Shapes of Distributions
• Normal distribution
• Negative skew: tail to the left
• Positive skew: tail to the right

Stem and leaf Graphs
Always write the leaves in order from lowest to highest and include a key.
Marks received by 19 students in a test:
2 | 2 6
3 | 4
4 | 2 7 9
5 | 4 5 5 8 9
6 | 0 2 6 7
7 | 2 6
8 | 1 4
Key: 4 | 2 = 42 marks

Find the range: subtract the lowest value from the highest value: 84 - 22 = 62
Find the median: 19 ÷ 2 = 9.5, so take the 10th value, which is 58 marks.
Find the lower quartile Q1: ¼(19) = 4.75, so take the 5th value, which is 47.
Find the upper quartile Q3: ¾(19) = 14.25, so take the 15th value, which is 67.
Find the interquartile range Q3 - Q1: 67 - 47 = 20
• If there is an even number of values, take the two middle values, add them and divide by 2 to get the median.
• Do the same for the lower/upper quartile when ¼n (or ¾n) is a whole number: for example, if ¼n = 5, add the 5th and 6th values and divide by 2.

Collecting Data
Primary data: information you collect yourself, e.g. surveys, questionnaires, experiments.
Secondary data: information from existing records, e.g. the internet, newspapers, etc.
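The six hypothesis-test steps can be wrapped in one function. This sketch follows the notes' rule exactly (accept $H_0$ if the claimed proportion lies strictly inside $\hat{p} \pm 1/\sqrt{n}$, otherwise reject); `proportion_test` is a hypothetical name, and the interval is the same simple 95% interval used for the pet example.

```python
import math


def proportion_test(claimed_p, successes, n):
    """Steps 3-6: sample proportion, margin of error, interval, decision."""
    p_hat = successes / n                 # step 3: sample proportion
    e = 1 / math.sqrt(n)                  # step 4: margin of error
    low, high = p_hat - e, p_hat + e      # step 5: confidence interval
    decision = "accept H0" if low < claimed_p < high else "reject H0"  # step 6
    return low, high, decision


low, high, decision = proportion_test(0.20, 64, 400)
print(round(low, 2), round(high, 2), decision)   # 0.11 0.21 accept H0
```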
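The pie-chart angles follow from angle = count ÷ total × 360°, and reading a chart back uses the reverse. A minimal sketch with hypothetical helper names; it multiplies by 360 before dividing so that these whole-number angles come out exactly.

```python
def pie_angles(counts):
    """Angle of each sector in degrees: count / total * 360."""
    total = sum(counts.values())
    return {label: count * 360 / total for label, count in counts.items()}


def sector_amount(angle, total):
    """Going back from a pie chart: amount = angle / 360 * total."""
    return angle * total / 360


sizes = {"Size 8": 3, "Size 10": 7, "Size 12": 10,
         "Size 14": 12, "Size 16": 6, "Size 18": 2}
angles = pie_angles(sizes)
print(angles)
# {'Size 8': 27.0, 'Size 10': 63.0, 'Size 12': 90.0, 'Size 14': 108.0, 'Size 16': 54.0, 'Size 18': 18.0}
print(sum(angles.values()))      # 360.0
print(sector_amount(90, 40))     # 10.0
```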
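The stem-and-leaf answers can be reproduced by rebuilding the marks from the stems and leaves. The position rule in `value_at` (a hypothetical helper) follows these notes: round a fractional position up, and for a whole-number position average that value and the next one. Other textbooks and software use different quartile conventions, so they may give slightly different quartiles.

```python
import math


def value_at(sorted_values, position):
    """Value at a 1-based position using the notes' rounding rule."""
    if position == int(position):            # whole number: average this value and the next
        i = int(position)
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    return sorted_values[math.ceil(position) - 1]   # fractional: round the position up


# Stems are tens, leaves are units (key: 4 | 2 = 42 marks)
stem_leaf = {2: [2, 6], 3: [4], 4: [2, 7, 9], 5: [4, 5, 5, 8, 9],
             6: [0, 2, 6, 7], 7: [2, 6], 8: [1, 4]}
marks = sorted(10 * stem + leaf for stem, leaves in stem_leaf.items() for leaf in leaves)
n = len(marks)                                # 19 students

data_range = marks[-1] - marks[0]
median = value_at(marks, n / 2)               # position 9.5 -> 10th value
q1 = value_at(marks, n / 4)                   # position 4.75 -> 5th value
q3 = value_at(marks, 3 * n / 4)               # position 14.25 -> 15th value
print(data_range, median, q1, q3, q3 - q1)    # 62 58 47 67 20
```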
DATA
• Categorical data (words)
   - Nominal: not ordered
   - Ordinal: ordered
• Numerical data (numbers)
   - Discrete: definite values that can be counted
   - Continuous: measured values (height, speed, weight, time)

The mean is best used when the values are all fairly close together.
The median is best used if there are outliers (extreme values), because outliers pull the mean towards them but barely affect the median.

Scatter Graphs
• Used to show if 2 sets of data are related.
• Correlation is a measure of the strength of the relationship between two variables.
• The closer the points on the scatter graph are to a straight line, the stronger the correlation (r = 1 or r = -1 means all the points lie exactly on a line).
• When a change in one variable directly causes a change in another variable, there is a causal relationship.

Strong positive correlation: r close to 1
Weak positive correlation: r around 0.5
Strong negative correlation: r close to -1
Weak negative correlation: r around -0.5
No correlation: r around 0
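The notes only quote values of r. Assuming r is Pearson's correlation coefficient, $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$, this sketch shows that points exactly on a rising line give r = 1, points on a falling line give r = -1, and a pattern with no linear trend gives r = 0. `pearson_r` is a hypothetical helper; Python 3.10 and later also provide `statistics.correlation`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3, 5, 7, 9, 11]))   # 1.0  (perfect positive: y = 2x + 1)
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)
print(pearson_r(x, [1, 4, 5, 4, 1]))    # 0.0  (no linear correlation)
```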