Using your GDC to calculate the χ² statistic

Mathematical Studies Standard Level for the IB Diploma
Revision

Topic 4: Statistical applications

Chapter 11: The normal distribution

The normal distribution curve

The shape of data plotted in a histogram can be compared with the normal distribution curve. This curve is a standard model of how data can be distributed, and it has the following properties:
• it is bell-shaped
• it is symmetrical about the mean value, μ
• the mean, median and mode are equal
• the area under the curve equals 1
• about 68% of the data lies within 1 standard deviation, σ, of the mean
• about 95% of the data lies within 2 standard deviations of the mean
• about 99.7% (often rounded to 99%) of the data lies within 3 standard deviations of the mean.

To mark the standard deviations along the horizontal axis for these percentages, start at the middle (the mean) and add or subtract the correct number of standard deviations: μ ± σ for 68%, μ ± 2σ for 95% and μ ± 3σ for 99.7%.

The normal distribution is written using this notation: X ~ N(μ, σ²)

In a question you may be told that some data, X, follows the normal distribution, for example X ~ N(12, 5²) (so μ = 12 and σ = 5), or you may just be given the values of μ and σ.

Copyright Cambridge University Press 2014. All rights reserved.

Probability calculations using the normal distribution

You may be asked for the probability that an event will happen or the percentage of time that an event occurs. These mean the same thing: you need to work out the area under the curve between two points. Use your GDC to do this; it gives you both a graph of the relevant area and the value that you want.

Questions that ask you to do probability calculations with the normal distribution will give you the following information:
• that the data follows a normal distribution or is 'normally distributed'
• the value of the mean
• the value of the standard deviation
• one or two boundary values.
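The 68%, 95% and 99.7% figures above are themselves areas under the normal curve. The probability that a normal value lies within k standard deviations of the mean equals erf(k/√2), whatever the values of μ and σ. The short Python sketch below (standard library only; the function name within_k_sd is our own, not a GDC function) checks the three figures:

```python
import math

def within_k_sd(k: float) -> float:
    """Probability that a normally distributed value lies within k standard
    deviations of the mean, i.e. P(mu - k*sigma < X < mu + k*sigma).
    The answer depends only on k, not on mu or sigma."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {within_k_sd(k):.1%}")
# within 1 sd: 68.3%
# within 2 sd: 95.4%
# within 3 sd: 99.7%
```

This is why the 3-standard-deviation figure is more precisely 99.7% than 99%.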
When given boundary values, the question will ask you to calculate a probability associated with one of the following situations. These are the lower and upper values to enter into your GDC:
• More than a value: lower value = the value you are given; upper value = 99999
• Between two values: lower value = the lower value you are given; upper value = the higher value you are given
• Below a value: lower value = −99999; upper value = the value you are given

(99999 and −99999 are simply numbers far enough from the mean to stand in for 'no limit'.)

Using your GDC

It is easier to follow what to do on your GDC if we look at a particular example:

A large number of mobile phone calls were monitored, and their lengths were recorded to the nearest minute. The call lengths were found to be normally distributed with a mean of 12 minutes and a standard deviation of 5 minutes. What percentage of calls lasted over 15 minutes?

This question gives you the following values:
• lower boundary value = 15
• upper boundary value = 99999
• μ = 12
• σ = 5

Texas TI-84
Before drawing the graph on the TI-84, you need to set the window so that you can actually see the graph. Set the window boundaries as follows:
• Xmin = μ − 3σ
• Xmax = μ + 3σ
• Ymin = −0.25
• Ymax = 0.25
(For this example, Xmin = 12 − 15 = −3 and Xmax = 12 + 15 = 27.)
Then open ShadeNorm(…) and enter the values in this order: lower, upper, mean, standard deviation. Draw the graph and read off the probability value.

Casio fx-9750GII
Go to the variable screen and input the values in this order: lower, upper, standard deviation, mean. Then draw the graph and read off the probability value (Area).

The GDC gives P = 0.274, so the percentage of calls lasting more than 15 minutes is 27.4%.
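The GDC calculation above can be reproduced as a cross-check. This is a minimal sketch, assuming Python 3.8 or later for statistics.NormalDist; the helper name normal_prob is hypothetical. It takes the same four inputs as the GDC (lower, upper, mean, standard deviation) and returns the area between the two boundaries:

```python
from statistics import NormalDist

def normal_prob(lower: float, upper: float, mu: float, sigma: float) -> float:
    """Area under the N(mu, sigma^2) curve between lower and upper,
    using the same lower/upper/mean/sd inputs as the GDC."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Phone-call example: X ~ N(12, 5^2), P(X > 15).
# As on the GDC, 99999 stands in for "no upper limit".
p = normal_prob(15, 99999, mu=12, sigma=5)
print(round(p, 3))  # 0.274
```

The "below a value" case works the same way with −99999 as the lower boundary, and the two areas add up to 1.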
Inverse normal calculations

If you know the probability of an event happening, along with the mean and standard deviation of the normal distribution, you can work out the boundary value(s) of the event. For example, if 30% of a group of students scored below the pass mark on a test, and you know that μ = 45 and σ = 7.2 for the test scores, then you can find the pass mark.

To do this, you need to use the 'inverse normal' function on your GDC. You will need to input the following values to calculate the boundary value:
• Area under the normal curve: the given probability or percentage written as a decimal
• σ: the standard deviation
• μ: the mean

So, for the example above (Area = 0.3, μ = 45, σ = 7.2), you would do the following:

Texas TI-84
Navigate to the inverse normal function (invNorm). Enter the known values in this order: Area, μ, σ.
Note: the TI-84 always gives the ≤ value, i.e. the left-tail boundary. If you want the ≥ value (right-tail boundary), subtract the given probability from 1 and enter the result as the Area.

Casio fx-9750GII
Input the values given in the question, entering them in the order Area, σ, μ. 'Tail' means the side of the graph that is shaded, and depends on whether the probability given in the question corresponds to ≤ or ≥ the boundary value:
• ≤ is left
• ≥ is right

So the pass mark was 41.2 (or 41 to the nearest whole number).
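The inverse normal step, including the "1 − Area" adjustment for a right tail, can be sketched in Python as a cross-check. This assumes Python 3.8 or later for statistics.NormalDist.inv_cdf; the function name inverse_normal and its tail parameter are our own, chosen to mirror the Casio 'Tail' setting:

```python
from statistics import NormalDist

def inverse_normal(area: float, mu: float, sigma: float, tail: str = "left") -> float:
    """Boundary value x of a normal distribution.

    tail="left":  find x with P(X <= x) = area
    tail="right": find x with P(X >= x) = area, which is the same as
                  P(X <= x) = 1 - area (the TI-84 adjustment)
    """
    if tail == "right":
        area = 1 - area
    return NormalDist(mu, sigma).inv_cdf(area)

# Pass-mark example: 30% scored below the pass mark, mu = 45, sigma = 7.2
pass_mark = inverse_normal(0.30, mu=45, sigma=7.2)
print(round(pass_mark, 1))  # 41.2
```

Saying "70% scored at or above the pass mark" describes the same boundary, so inverse_normal(0.70, 45, 7.2, tail="right") gives the same pass mark.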
Chapter 12: Correlation

The concept of correlation
• Bivariate data: data that consists of measurements of two variables collected from each individual in a sample.
• Correlation: the relationship between the two variables of bivariate data.

The variables of bivariate data can be classified as follows:
• Independent variable: the variable that is controlled by the person conducting the study.
• Dependent variable: the observed variable that should demonstrate the effect of the hypothesised relationship.

For example, in the hypothesis 'A greater number of calories eaten per day will make a person heavier', the independent variable is the number of calories consumed per day and the dependent variable is the person's weight.

Scatter diagrams

The easiest way to see whether there is a pattern in bivariate data is to draw a scatter diagram. Create coordinates from your data in this order:
(independent variable value, dependent variable value)
Then plot these coordinates on a grid and look at the grouping of the points to determine what type of correlation there is:
• Positive correlation: as one variable increases, so does the other.
• Negative correlation: as one variable increases, the other decreases.
• No correlation: there is no apparent relationship.

Correlation and causation
• Just because two variables are correlated, it doesn't mean that one causes the other. Be cautious when making judgements based on data.
• Don't forget to consider all the variables that might affect the results.

Line of best fit

To highlight the relationship between the two variables of bivariate data, you should draw a line of best fit on your scatter diagram. To do this, follow these steps:
• Find the mean of each variable (i.e. of the data plotted along the x-axis and of the data plotted along the y-axis), giving you the mean point (x̄, ȳ).
• Plot the mean point on the scatter diagram.
• Draw a line through the mean point so that the other points of the scatter diagram are spread evenly above and below the line.
This line represents the relationship between the two variables.

Drawing a scatter diagram and line of best fit on your GDC (Texas TI-84 and Casio fx-9750GII)

Put the bivariate data into your GDC: enter it in the data table as two lists, the first for the independent variable and the second for the dependent variable. Set the graph type to 'scatter', then draw the graph.

Once you have created the scatter diagram, you can get the GDC to calculate the line of best fit along with a measure of the strength of the correlation. In this case the equation of the line of best fit (regression line) is y = −0.944x + 10.3.

Texas TI-84: to draw the regression line on the scatter diagram, you need to enter the regression equation into the [Y=] screen manually.
Casio fx-9750GII: the regression line can be drawn on the scatter diagram from the regression results screen.

Pearson's product moment correlation coefficient

This coefficient measures how closely the data points lie to a straight line, which tells you how strong the correlation is. Remember the following points:
• This coefficient is usually denoted by r.
• −1 ≤ r ≤ 1
• If r = +1, there is a perfect positive correlation.
• If r = 0, there is no correlation.
• If r = −1, there is a perfect negative correlation.
• If the value of r is between −0.5 and 0.5, the correlation is too weak to draw any meaningful conclusions from the regression line.
• The closer to ±1 the value of r is, the stronger the correlation.
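The GDC computes r and the regression line for you; the sketch below shows what those numbers are, using the standard least-squares formulas. It is a minimal illustration in Python 3.8 or later: the function names are our own, and the data values are made up for illustration (the data behind the GDC example above is not given in this text):

```python
import math
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient r."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def regression_y_on_x(xs, ys):
    """Least-squares regression line y = a*x + b.
    By construction it passes through the mean point (x-bar, y-bar)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [9.5, 8.0, 7.5, 6.0, 5.5, 3.5]
r = pearson_r(xs, ys)
a, b = regression_y_on_x(xs, ys)
print(f"r = {r:.3f}, y = {a:.3f}x + {b:.3f}")
if abs(r) < 0.5:
    print("correlation too weak to use the line for predictions")
```

Because b is defined as ȳ − a·x̄, the fitted line always goes through the mean point, which matches the hand-drawn method described above.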
In the GDC example above, r = −0.913, which indicates a very strong negative correlation.

Regression line of y on x

A regression line is a line of best fit that minimises the sum of the squares of the vertical distances between the data points and the line. Remember that:
• The line has an equation of the form y = ax + b.
• You should use your GDC to find the values of a and b.
• You should rearrange the equation so that it is written sensibly.
• If the correlation is strong, you can use the regression line to predict values.
• If the correlation is weak (i.e. −0.5 < r < 0.5), then you should not predict values using the regression line.
• You should not use a regression line to predict values outside the range of the data given.

In the GDC example above, a = −0.944 and b = 10.3, so the equation of the regression line is y = −0.944x + 10.3, which could also be written as y = 10.3 − 0.944x.

Chapter 13: Chi-squared hypothesis testing

The chi-squared test is used to see whether two variables are independent. It can also be used to assess whether data differs significantly from what is expected; this is called a 'goodness-of-fit' test.

Expected frequencies

First, you need to be able to work out the frequencies that you would 'expect' to see, based on some hypothesis you assume for the data. How this is done depends on the type of problem you have.
• For a goodness-of-fit test: assuming a certain theoretical distribution for the data, the expected frequency of each outcome is
  total frequency × probability of that outcome occurring
• For a test of independence of two variables: given a two-way table summarising the observed frequencies of the data, the expected frequency of each cell is
  (row total × column total) ÷ grand total
  (This is the probability of the row outcome, row total ÷ grand total, multiplied by the column total, or vice versa, so it gives you the correct share of the total that you should expect.)
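Both expected-frequency rules are simple enough to write out directly. The Python sketch below uses function names of our own choosing. For the independence case it uses the two-way table from the independence-test example later in this chapter, read (as an assumption about the extracted layout) as rows a, b, c and columns A, B, C, D:

```python
def expected_goodness_of_fit(total, probabilities):
    """Expected frequency of each outcome = total frequency x probability."""
    return [total * p for p in probabilities]

def expected_independence(table):
    """Expected frequency of each cell of a two-way table
    = row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Four equally likely outcomes, 36 observations in total
print(expected_goodness_of_fit(36, [0.25] * 4))  # [9.0, 9.0, 9.0, 9.0]

# Observed two-way table (rows a, b, c; columns A, B, C, D)
observed = [[15, 12, 8, 6],
            [6, 11, 7, 7],
            [9, 6, 14, 20]]
for row in expected_independence(observed):
    print([round(e, 2) for e in row])
```

A useful check is that the expected frequencies always have the same row, column and grand totals as the observed ones.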
The χ² statistic

The χ² statistic is a measure of the discrepancy between the observed and expected frequencies. You should use your GDC to calculate the χ² statistic, and then interpret it using the following terms:
• Critical χ² value: the threshold value above which the discrepancy is considered significant. In exam questions you will be given this value.
• Significance level, α: the maximum probability of making a mistake in your conclusion by deciding that the result is significant when actually it isn't. In questions you will be given this value; it is normally 1%, 5% or 10%.
• Null hypothesis, H0: the hypothesis that the factors being tested are independent.
• Alternative hypothesis, H1: the hypothesis that the factors being tested are dependent.
• p-value: the probability of getting a discrepancy at least as large as the calculated χ² statistic if the theoretical distribution or null hypothesis were correct.
• Degrees of freedom: the number of frequencies that can vary independently, given that the total frequency is fixed.
  In the goodness-of-fit test, degrees of freedom = number of outcomes − 1
  In the independence test, degrees of freedom = (number of rows − 1) × (number of columns − 1)

Using your GDC to calculate the χ² statistic

Suppose you need to do a goodness-of-fit test on the following data, where the theoretical distribution assumes equally likely outcomes:
Outcome: A, B, C, D
Observed frequency: 11, 6, 9, 10

Put the data into your GDC as a list (Texas TI-84 or Casio fx-9750GII):
• Go into list mode.
• Enter your data (observed frequencies) in list 1 and the expected frequencies in list 2. With 36 observations and four equally likely outcomes, each expected frequency is 36 ÷ 4 = 9.
• Access the χ² statistic and p-value.
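Behind the GDC's χ² value is the sum of (observed − expected)² ÷ expected over all outcomes. The Python sketch below computes it for the A to D data above, together with the degrees of freedom. Since the question as given states no significance level, the sketch assumes 5% and uses 7.815, the standard χ² table value for 3 degrees of freedom at that level; in an exam the critical value would be given to you:

```python
def chi_squared(observed, expected):
    """chi^2 = sum of (observed - expected)^2 / expected over all outcomes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [11, 6, 9, 10]                            # outcomes A, B, C, D
total = sum(observed)                                # 36
expected = [total / len(observed)] * len(observed)   # equally likely: 9 each

stat = chi_squared(observed, expected)
df = len(observed) - 1                               # number of outcomes - 1

CRITICAL_5_PERCENT_DF3 = 7.815   # assumed 5% level, from standard chi^2 tables
print(f"chi^2 = {stat:.3f}, df = {df}")   # chi^2 = 1.556, df = 3
print("good fit" if stat < CRITICAL_5_PERCENT_DF3 else "not a good fit")
```

Here χ² = (4 + 9 + 0 + 1) ÷ 9 = 14/9, well below the critical value, so the data is a good fit to equally likely outcomes at the assumed 5% level.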
Suppose you need to perform an independence test on the following bivariate data, given in a two-way frequency table (rows a, b, c; columns A, B, C, D):
Row a: 15, 12, 8, 6
Row b: 6, 11, 7, 7
Row c: 9, 6, 14, 20

In this case, put the data into a table or matrix (Texas TI-84 or Casio fx-9750GII):
• Go into matrix edit mode.
• Set the size of the matrix: enter the number of rows first (3), followed by the number of columns (4).
• Input your data.
• Access the χ² statistic and p-value.
If you need to see the expected frequencies (which are calculated automatically by the GDC), open 'matrix B'.

Understanding the χ² statistic and p-value

The GDC gives you the χ² statistic, the p-value and the degrees of freedom (df). For each test, comparing the χ² statistic with the critical value, or the p-value with the significance level, leads to the following conclusions:
• χ² < critical value, or p-value > significance level: for a goodness-of-fit test, good fit; for an independence test, accept the null hypothesis.
• χ² > critical value, or p-value < significance level: for a goodness-of-fit test, not a good fit; for an independence test, reject the null hypothesis.
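To see how the GDC's three outputs (χ², p-value, df) fit together with the decision rules above, here is a minimal Python sketch for the 3 × 4 table, read as rows a, b, c and columns A to D. Function names are our own. The p-value uses the exact closed form for the upper tail of a χ² distribution with an even number of degrees of freedom, e^(−x/2) × Σ (x/2)^i ÷ i! for i < df/2, which happens to apply here because df = 2 × 3 = 6; a general χ² p-value needs a statistics library or the GDC. The 5% significance level is an assumption:

```python
import math

def chi_squared_independence(table):
    """Return (chi^2 statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n
            stat += (table[i][j] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def chi2_p_value_even_df(x, df):
    """P(chi^2 >= x) for an EVEN number of degrees of freedom, using the
    closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("this shortcut only works for even df")
    half = x / 2
    term = math.exp(-half)
    total = 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return total

observed = [[15, 12, 8, 6],   # row a
            [6, 11, 7, 7],    # row b
            [9, 6, 14, 20]]   # row c
stat, df = chi_squared_independence(observed)
p = chi2_p_value_even_df(stat, df)
alpha = 0.05                  # assumed significance level
print(f"chi^2 = {stat:.2f}, df = {df}, p = {p:.4f}")
print("reject H0" if p < alpha else "accept H0")
```

Under these assumptions χ² comes out at about 15.2 with df = 6 and p of roughly 0.018, so the p-value is below 5% and the null hypothesis of independence is rejected.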
