The chi-square (χ²) test of independence compares two types of data to see whether one is independent of the other or whether there is some link between them. For example, is favorite musical style independent of age, or can we see a link between the two?

Here is an example question: For his Mathematical Studies Project a student gave his classmates a questionnaire to fill out. The results for the question on the gender of the student and specific subjects taken by the student are given in the table below, which is a 2 × 3 contingency table of observed values.

|        | History | Biology | French | Totals    |
|--------|---------|---------|--------|-----------|
| Female | 22      | 20      | 18     | (60)      |
| Male   | 20      | 11      | 9      | (40)      |
| Totals | (42)    | (31)    | (27)   | TOTAL 100 |

Step 1: State the null hypothesis and the alternative (alternate) hypothesis.

H0 (null hypothesis): The variables are independent.
H1 (alternative hypothesis): The variables are NOT independent. Do not say that they are dependent, although sometimes the IB questions will ask you if they are dependent.

For this example:
H0: Subject taken is independent of gender.
H1: Subject taken is not independent of gender.

Step 2: Find the chi-square calculated value.

a. We make another similar table showing expected values:

|        | History | Biology | French |
|--------|---------|---------|--------|
| Female | p       | 18.6    | 16.2   |
| Male   | 16.8    | 12.4    | 10.8   |

The expected values for each cell are calculated from values in the first table as follows. To find p:

$$p = \frac{60}{100} \times \frac{42}{100} \times 100 = 25.2$$

Note how the horizontal and vertical totals relating to cell p are used. This is similar to multiplying independent events in probability: p = probability of female × probability of history × total.

b. Use the formula to calculate the chi-squared value:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ are the observed frequencies and $f_e$ are the expected frequencies. For each cell in the table, subtract the expected value from the observed value and square the answer. Divide this by the expected value. Add together these values for all the cells.

c. Calculator way: We find the chi-squared value by entering the table of observed values into the calculator, which then works out the expected values:
1) Choose MATRIX (2nd x^-1) and go to EDIT.
2) Make sure your matrix is the right size (here we have a 2 × 3).
3) Enter your observed values in Matrix A (don't include the totals row or column).
4) Choose STAT and go to TESTS.
5) Scroll down to χ2-Test and press ENTER.
6) Choose Calculate.
7) The calculator finds the expected values and stores them in Matrix B.

Step 3: Find the chi-square critical value.

a. We must find the number of degrees of freedom. This is simple: (number of rows – 1) × (number of columns – 1); here (2 – 1) × (3 – 1) = 1 × 2 = 2.

b. Note the level of significance. This is given in the IB exam, but you have to decide which level to use in your project. The most common levels are 1%, 5%, and 10%.

c. Use the level of significance to find the critical value from the table in the data booklet. Go down the left column until you reach the number of degrees of freedom (here 2). Go across until you reach the column represented by 1 – significance level. For example, go across until you reach the column represented by 0.95 and read off your value. (Note: we use 0.95 because there is usually a 5% significance level and 0.95 is 1 – 5%.)

The level of significance indicates the level of "error" we are willing to accept in making our conclusion. A 1% level of significance means that, on average, 1 time out of 100 we would reject H0 when we shouldn't have rejected it. The smaller the level of significance, the stronger the evidence needed to reject H0, and so the more statistically significant a rejection will be. We usually use a level of significance of 5%, but sometimes we will use 1% or 10%.

Step 4: Compare the chi-square calculated value to the chi-square critical value.

a. If the χ² value is less than the critical value, we fail to reject the null hypothesis. In other words, if $\chi^2_{calc} < \chi^2_{crit}$, then we fail to reject the null hypothesis (we do not say that we accept it). This means that there is not enough evidence to justify a rejection of H0, and so we would conclude that the variables are independent.

b. If the χ² value is more than the critical value, we reject the null hypothesis. In other words, if $\chi^2_{calc} > \chi^2_{crit}$, then we reject the null hypothesis. This means that there is enough evidence to justify rejecting H0, and this would indicate that the variables are NOT independent. We choose H1 with the understanding that we have not proved H1 beyond a reasonable doubt.

For example: "Since 1.78 < 5.99, fail to reject H0: subject taken is independent of gender."

Using the p-value to decide whether to reject or fail to reject:

a. Find the p-value the same way you find the chi-square calculated value (Step 2).
b. If the p-value is less than the significance level, then we reject the null hypothesis.
c. If the p-value is more than the significance level, then we fail to reject the null hypothesis.

For example, p-value = 0.411. So we fail to reject the null hypothesis because 0.411 > 0.05.

Name ______________________________________________ Date ______________ Period _________

IBMS Statistics Review
Probability and Statistics

1. A country motel has room for 80 people. The manager keeps records of the number of guests staying at the motel over the summer period of 90 days. The results are shown below.

| Number of Guests | Frequency (days) |
|------------------|------------------|
| 1 – 10           | 8                |
| 11 – 20          | 11               |
| 21 – 30          | 14               |
| 31 – 40          | 17               |
| 41 – 50          | 20               |
| 51 – 60          | 9                |
| 61 – 70          | 7                |
| 71 – 80          | 4                |

a) Calculate an estimate of:
   i) the mean
   ii) the median
   iii) the standard deviation
b) State the modal group.
c) Construct a histogram and frequency polygon of the data. Use a scale of 1 cm to represent 10 guests on the horizontal axis and 1 cm to represent 1 day on the vertical axis.
d) Prepare a cumulative frequency curve of the data. Use a scale of 1 cm to represent 5 guests on the horizontal axis and 1 cm to represent 5 days on the vertical axis. Use your curve to find:
   i) the number of nights when the motel had 30 or fewer guests
   ii) the number of nights when the motel had fewer than 45 guests
   iii) The motel will "break even" if it has fewer than 30 guests for no more than 25% of the time. Did the motel break even?

2. Julie examines a new variety of bean and does a count on the number of beans in 33 pods. Her results were:

5, 8, 10, 4, 2, 12, 6, 5, 7, 7, 5, 5, 5, 13, 9, 3, 4, 4, 7, 8, 9, 5, 5, 4, 3, 6, 6, 6, 6, 9, 8, 7, 6

a) Find the mean, median, mode, standard deviation, lower and upper quartiles for this set of data.
b) Find the interquartile range of the data set.
c) What are the lower and upper boundaries for the outliers?
d) Are there any outliers?
e) Draw a box and whisker plot of the data set.

3. Eight sample values are: 6, a, 7, a, 4, b, 6 and 8, where a and b are single digit numbers and the mean is 7.

a) Show that a and b have two possible solutions.
b) If there is a single mode, what is the median?

4. The diagram shows the distribution of weekly income of the employees of a small company.

[Histogram: Frequency (vertical axis) against Weekly Income ($) (horizontal axis, 0 to 800)]

a) How many employees are recorded?
b) Estimate the mean weekly income.
c) Estimate the median wage.
d) If the owner of the company, who receives $1680 per week, is added to the data, which of these measures will be affected most: median or mean?

5. A local cricket club keeps a record of the number of runs their star batsman, Izzy Wacket, scores and the number of runs the team scores during 10 matches in a season.

| Match                    | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9  | 10  |
|--------------------------|-----|-----|-----|-----|-----|-----|-----|-----|----|-----|
| Runs Izzy scores (x)     | 52  | 65  | 13  | 120 | 105 | 24  | 48  | 140 | 20 | 76  |
| Runs the team scores (y) | 201 | 180 | 270 | 320 | 295 | 260 | 195 | 402 | 84 | 306 |

(a) Calculate the mean number of runs Izzy scores over the 10 matches.
(b) Calculate the mean number of runs the team scores over the 10 matches.
(c) (i) Find the standard deviation of Izzy's runs.
    (ii) Given that $s_{xy} = 2550$, find the equation of the line of regression of y on x.
    (iii) In the 11th match Izzy scores 90 runs. Use your equation to estimate how many runs you would expect the team to score.
(d) (i) One match has caused the gradient to become smaller than it perhaps should be. Which match do you think this is? Justify your answer.
    (ii) Explain what effect this will have on the answer to part (c).

6. A survey into the votes of 400 men and 400 women was carried out in Ohio and the results are shown below.

|       | Democrat | Republican |
|-------|----------|------------|
| Women | 250      | 150        |
| Men   | 220      | 180        |

It is assumed that the vote cast is independent of gender. To test this, a χ² test of independence is conducted. The first part is a table of expected values, drawn below.

|       | Democrat | Republican |
|-------|----------|------------|
| Women | a        | b          |
| Men   | c        | 165        |

(a) Find the values of a, b, and c that are missing in the table above.
(b) Calculate the χ² test statistic.
(c) Are the votes cast independent of gender at the 5% level of significance? Justify your answer.

7. A sample of twelve pairs of numbers is described by the following statistics:

$\bar{x} = 9.8$, $\bar{y} = 101.8$, $s_{xy} = 25.64$, $s_x = 4.66$, $s_y = 5.93$

The product moment correlation coefficient between the variables is $r = 0.93$.

a) If the value of x decreases, will the value of y decrease, increase or remain the same? State a reason for your answer.
b) Find the equation of the regression line for y on x, in the form $y = mx + c$.
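
Returning to the worked example in the guide (the gender and subject table), the four steps can be checked with a short script. This is only a sketch under stated assumptions: the function names are made up for illustration, and the critical value and p-value use shortcut formulas that hold only when there are exactly 2 degrees of freedom, as in that example. For other table sizes you would still read the critical value from the data booklet or use the calculator.

```python
import math

def expected_table(observed):
    """Expected frequency of each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over every cell of the table."""
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )

# Observed values from the guide's example (totals row and column left out).
observed = [
    [22, 20, 18],  # Female: History, Biology, French
    [20, 11, 9],   # Male:   History, Biology, French
]

expected = expected_table(observed)
chi2_calc = chi_square_statistic(observed, expected)

# Degrees of freedom: (rows - 1) x (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

alpha = 0.05  # 5% level of significance

# ASSUMPTION: these two shortcuts are valid ONLY for df == 2.
assert df == 2, "shortcut formulas below only work for 2 degrees of freedom"
chi2_crit = -2 * math.log(alpha)     # critical value for df = 2
p_value = math.exp(-chi2_calc / 2)   # p-value for df = 2

reject_h0 = chi2_calc > chi2_crit

print("Expected values:", [[round(v, 1) for v in row] for row in expected])
print("chi-square calculated:", round(chi2_calc, 2))  # about 1.78
print("chi-square critical:  ", round(chi2_crit, 2))  # about 5.99
print("p-value:              ", round(p_value, 3))    # about 0.411
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Both decision rules agree here: the calculated value is below the critical value, and the p-value is above 0.05, so we fail to reject H0.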