THE NORMAL DISTRIBUTION

Many distributions exhibit a symmetrical shape, that is, an even spread either side of the mean. For a normal distribution, the mean, $\bar{x}$, and the standard deviation, $s$, are enough to describe the distribution fully.

The following facts hold for a normal distribution:

• The middle 68% of observations lie within one standard deviation either side of the mean: $\bar{x} - s \le x \le \bar{x} + s$ (or $\bar{x} \pm s$).
• The middle 95% of observations lie within two standard deviations either side of the mean: $\bar{x} - 2s \le x \le \bar{x} + 2s$ (or $\bar{x} \pm 2s$).
• The middle 99.7% of observations lie within three standard deviations either side of the mean: $\bar{x} - 3s \le x \le \bar{x} + 3s$ (or $\bar{x} \pm 3s$).

We can use these properties to predict what percentage of a given set of data lies within 1, 2 or 3 standard deviations of the mean. This is sometimes called the 68 – 95 – 99.7% rule.

The School For Excellence 2015 The Essentials – Further Mathematics – Core Materials Page 35

QUESTION 35
The heights of female students in a Year 11 level are normally distributed with a mean of 162 cm and a standard deviation of 12 cm. Under these circumstances, give the range of heights that would include approximately 95% of the Year 11 female students.

Solution
$\bar{x} - 2s \le x \le \bar{x} + 2s$
$\bar{x} \pm 2s = 162 \pm 2 \times 12$ cm
$= 162 - 24$ and $162 + 24$
$= 138$ cm and $186$ cm

QUESTION 36
The volume of a particular tomato juice carton is normally distributed with a mean of 250 ml and a standard deviation of 5 ml. In a sample of 400 cartons, how many would be expected to have a volume of more than 245 ml?

Solution
Draw a normal distribution graph, insert the values for the distribution at the appropriate places, then evaluate the answer(s).
$\bar{x} + s = 250 + 5 = 255$
$\bar{x} + 2s = 250 + 10 = 260$
$\bar{x} + 3s = 250 + 15 = 265$
$\bar{x} - s = 250 - 5 = 245$
$\bar{x} - 2s = 250 - 10 = 240$
$\bar{x} - 3s = 250 - 15 = 235$

245 ml $= \bar{x} - s$
Percentage of cartons with a volume greater than 245 ml = 34% + 34% + 13.5% + 2.35% + 0.15% = 84%
84% of 400 = 336 cartons

QUESTION 37
The weights of Easter eggs are normally distributed with a mean of 20 g and a standard deviation of 2 g. Easter eggs weighing less than 16 g are rejected. The percentage of eggs that would be rejected is closest to:
A 95%
B 68%
C 99%
D 5%
E 2.5%

QUESTION 38
The distribution of systolic blood pressure of a large group of teenagers is approximately bell-shaped with a mean of 122 and a standard deviation of 9. The percentage of these students with a systolic blood pressure less than 131 is closest to:
A 5%
B 16%
C 68%
D 84%
E 95%

STANDARD (Z) SCORES

When comparing, for example, a male student’s height and a female student’s height, it is not possible to compare them directly, as they come from different distributions with different means and standard deviations. One possible method is to use standard scores, or z-scores.

To calculate a standard z-score, use the following rule:

$z = \dfrac{\text{actual data} - \text{mean}}{\text{standard deviation}} = \dfrac{x - \bar{x}}{s}$

QUESTION 39
At Tooshort High, Year 11 girls’ heights are normally distributed with a mean of 162 cm and a standard deviation of 12 cm. Janet is 174 cm tall. Her brother, Damon, is 188 cm tall. His Year 9 classmates’ heights are normally distributed with a mean of 176 cm and a standard deviation of 16 cm. Which of the two is taller for their sex and age?

Solution
To compare the brother’s and sister’s heights, calculate their individual z-scores.

Janet: $z = \dfrac{x - \bar{x}}{s} = \dfrac{174 - 162}{12} = +1$

Damon: $z = \dfrac{x - \bar{x}}{s} = \dfrac{188 - 176}{16} = +0.75$

Based on their z-scores, Janet is taller for her sex and age than her brother Damon is for his.
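The z-score rule can also be checked numerically. The following is a minimal Python sketch, not part of the original materials; the helper name `z_score` is an assumption chosen for illustration, and the values are those given in Question 39.

```python
def z_score(value, mean, sd):
    """Return the standard (z) score: (actual data - mean) / standard deviation.

    `z_score` is a hypothetical helper name used only for this illustration.
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Values taken from Question 39.
janet = z_score(174, 162, 12)   # Year 11 girls: mean 162 cm, s = 12 cm
damon = z_score(188, 176, 16)   # Year 9 boys: mean 176 cm, s = 16 cm

print(janet, damon)             # 1.0 0.75
print("Janet" if janet > damon else "Damon", "is taller for sex and age")
```

Because each height is measured in standard deviations from its own group mean, the two z-scores sit on a common scale and can be compared directly, even though the raw heights cannot.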
QUESTION 40
A summary of the results for Horace Cope’s classes in the French and Mathematics exams is shown in the table below.

              Class mean   Class standard deviation
French           62%                 7%
Mathematics      60%                 5%

Horace obtained 65% in both exams. The difference in his z-scores is:
A 0.02
B 0.36
C 0.57
D 1.0
E 1.63

QUESTION 41
A football player whose height was 198 cm was informed at the summer draft camp that he had a z-score of 1.4 for height for that group. If the standard deviation is 6.5 cm, then the average height of football players at the camp (in centimetres to one decimal place) is:
A 207.1
B 193.4
C 191.5
D 189.0
E 188.9

QUESTION 42
Tim has a z-score for height of 1.7. This means that:
A He is in the top 50% of heights, but not in the top 16% of heights.
B He is in the top 16% of heights, but not in the top 2.5% of heights.
C He is in the top 2.5% of heights, but not in the top 0.15% of heights.
D He is in the top 34% of heights, but not in the top 5% of heights.
E He is in the top 50% of heights, but not in the top 34% of heights.

BIVARIATE DATA

Bivariate data explores the relationship between two variables. The study of bivariate data attempts to:
• Determine whether any relationship actually exists between the two variables.
• If a relationship exists, describe the nature of the relationship, quantifying it if possible.

Bivariate data can explore the relationship between variables:
• That are both categorical.
• Where one is categorical and the other numerical.
• That are both numerical.

The type of data being examined will determine the type of analysis and the type of display that is appropriate for the data.
BIVARIATE DATA: TWO CATEGORICAL VARIABLES

Two categorical variables are displayed using:
• Two way frequency tables OR
• Two way percentaged frequency tables OR
• Comparative segmented bar charts.

Two way frequency tables are displayed with the independent (explanatory) variable filling the columns. Percentaged two way frequency tables are used more commonly because they allow us to compare values that may come from different sample sizes.

Comparative segmented bar charts allow a visual comparison between two or more categories. They are also usually percentaged to allow accurate and meaningful comparison.

QUESTION 43
The comparative segmented bar chart shows the Twitter usage of students in Years 7 – 8, 9 – 10 and 11 – 12. Describe any relationship observed between Twitter use and year level, including percentages to support your answer.

Solution

QUESTION 44
A survey was conducted to determine whether males or females read novels more. Respondents were asked whether they read novels regularly, sometimes or never. The results are shown in the two way frequency table below.

                   Male   Female   Total
Reads regularly     23      32       55
Reads sometimes     34      35       69
Reads never          9       2       11
Total               66      69      135

Complete the percentaged two way table below, correct to the nearest percent:

                   Male   Female
Reads regularly
Reads sometimes
Reads never
Total

QUESTION 45
Compare the reading behaviour of the males and females in this survey.

Solution

BIVARIATE DATA: ONE CATEGORICAL AND ONE NUMERICAL VARIABLE

Two displays are available for one categorical and one numerical variable:
• If there are two categories we can use a back to back stem and leaf plot.
• If there are two or more categories we use parallel boxplots.
Back to back stem and leaf plots are displayed as shown below:

Note that the data on the left hand side of the back to back stem and leaf plot is ordered backwards, that is, from the centre outwards.

When comparing these two categories we describe each data set as a univariate set, describing shape, centre, spread and the presence or otherwise of outliers. Comparisons must include comparative words such as “greater than”, “less than”, “similar to”, etc.

A good comparison of the data shown in the back to back stem and leaf plot would be:

The distribution of reading times in this sample is symmetric for females, but the male distribution is positively skewed. The median male reading time of 18 hours is less than the median female reading time of 27 hours. There is a larger range of female reading times (48 hours) than male reading times (39 hours). Neither data set has outliers.

Parallel boxplots can be used for two or more categories. The boxplots are displayed on the one scale so that any differences or similarities are visually apparent.

QUESTION 46
The three parallel boxplots suggest that gestation time and size of mammal (small, medium and large) are positively related. Explain why, giving reference to an appropriate statistic.

Solution

QUESTION 47
The parallel boxplots show the distributions of sodium content of beef and poultry based sausages. Compare the sodium content of beef and poultry sausages.

Solution

QUESTION 48
A survey was conducted that explored the relationship between preferred mode of transport (car, bus or train) and gender (male or female).
An appropriate display for this data would be:
A A histogram
B Parallel boxplots
C Back to back stem and leaf plots
D Percentaged two way frequency tables
E A scatterplot

EXAM 2 STYLE QUESTIONS

QUESTION 49
Daniel owns a house in a popular Melbourne suburb. His workplace has moved location and so Daniel decides to sell his house and move. Before he does, he does some research to find out what price he should sell his house for. He finds out that the average price for a house in his suburb is $350,000 and the distribution of selling prices is bell shaped. The standard deviation of sales prices is $25,000.

(a) What percentage of houses in this suburb would sell for above $300,000?
_____________________________________________________________________
_____________________________________________________________________

(b) In one particular month there are 63 houses sold in this suburb. How many houses would have sold for less than $375,000?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________

(c) Daniel sells his house for $370,000. What is the z-score for this sale?
_____________________________________________________________________
_____________________________________________________________________

(d) Daniel then buys a house in another suburb nearer his new workplace where the average house price is $410,000 with a standard deviation of $15,000. He pays $420,000 for this new house. What is the z-score associated with his purchase?
_____________________________________________________________________
_____________________________________________________________________

(e) Did Daniel have more success with buying or selling? Use your answers to (c) and (d) above to compare his two transactions.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________

(f) Daniel’s workmate actually sold his house for a value that had a standard score of –0.3. What price did he sell his house for?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________

QUESTION 50
A crop scientist has been trialling different types of wheat. He believes that there are several factors involved in the amount of production, including the variety of the wheat grown, as well as the type and amount of fertiliser used and the climate where the crop is grown.

(a) What kind of variable is the variety of wheat?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________

(b) The scientist wanted to examine the climate in one region where he wishes to grow the wheat. He recorded the average temperature every month for a year. The results are listed below:

Month       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Temp (ºC)   30   31   32   21   21    7    9   21   25   25   25   28

Display this data as an ordered stem and leaf plot.

(c) Are there any outliers in this data set? Explain your answer.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________

(d) The scientist next looks at the effect of two different fertilisers on the growth of one variety of wheat, recording the percentage of plants of each variety that grow in excess of 40 cm in one month and the percentage of plants that grow less than 40 cm in one month. He calls these categories “fast growth” and “slow growth”. He displays this data in a two-way percentaged frequency table. If the display is as shown below, label the correct positions of the variables “fertiliser type” and “growth rate”. (Note: it is usual to put the independent variable in the columns and the dependent variable in the rows.)

Variable _________________________

Variable _________________________

(e) The scientist is concerned that his data isn’t specific enough when he uses the categories “fast growth” and “slow growth”, so instead he decides to record the actual height of each plant with each of the fertilisers. He displays this data on a parallel boxplot as shown below. Compare the results of the two fertilisers.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________

(f) The scientist then decides to try another type of fertiliser, Fertiliser C.
He records the results of the growth in one month for this fertiliser in the stem and leaf plot below:

Display this data as a boxplot on the grid below:

BIVARIATE ANALYSIS: NUMERICAL AND NUMERICAL

The prime objective of numerical bivariate analysis is to determine whether a relationship exists between two variables and, if a relationship exists, to state the:
• Strength of the relationship
• Form
• Direction

Bivariate data is numerical when both of the variables are numerical variables. Bivariate statistics enable us to measure how strongly two variables are connected or associated.

(EXPLANATORY) INDEPENDENT and (RESPONSE) DEPENDENT VARIABLES

It is most important that the variables are correctly identified.

Variables: If the value of y depends on x:
• y is the (response) dependent variable (DV). It is plotted on the vertical axis.
• x is the (explanatory) independent variable (IV). It is plotted on the horizontal axis.

In some circumstances:
• If the variable is controlled by us, it is called the (explanatory) independent variable.
• If neither variable is controlled by us, we may choose to place either variable on the horizontal axis.

(Warning! If you do not correctly decide which is the IV and which is the DV then your analysis will be wrong!)

Tests for independence:
1. Does one variable affect the other? If so, the one being affected is dependent (response).
2. Did one variable occur before the other one? If so, the one that came first is independent (explanatory).
3. What are we trying to predict? The variable that is being predicted is dependent (response).

Note: Use exam cues! During reading time look for a graph with axes labelled with variables or an equation written in the form y = mx + c. These may help determine which variable is independent or dependent (explanatory or response).
Initially, when analysing two numerical variables, we focus on:
• Strength – Is the relationship between the two variables STRONG, MODERATE or WEAK, or is there NO correlation?
• Form – Is the relationship between the two variables LINEAR or NON-LINEAR?
• Direction – Is the relationship between the two variables POSITIVE or NEGATIVE?

Scatterplots:
• To determine whether a relationship exists between two variables, we draw a scatterplot.
• Data may group about a well-defined curve such as a line, parabola etc.
• In some cases, there is absolutely no relationship between the two variables and the scatterplot will not display any clear association or pattern.
• Relationships in scatterplots are preliminarily judged by eye for STRENGTH, FORM and DIRECTION.

[Figure: example scatterplots showing positive linear and negative linear relationships of strong, moderate and weak strength, and no relationship.]

QUESTION 1
A scientist recorded the effect of different amounts of one fertiliser on one particular variety of wheat. His results are shown in the scatterplot below. Describe the relationship in terms of strength, direction and form.
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________

QUESTION 2
For each of the following surveys or studies, decide which of the pair of variables is the dependent variable and which is the independent variable.
(a) Relationship between the heights and weights of 100 Year 12 male students.
(b) Temperature each hour and time of the day.
(c) Mathematics test mark and English test mark of 30 students from a Year 9 homegroup class.
(d) Study whether English influences Mathematics, given the English and Mathematics test marks of 30 students.
QUESTION 3
The number of people attending an outdoor event was recorded on a series of days of different temperature. Which of the following graphs is correctly labelled and titled?

[Figure: five scatterplot options, A to E, each plotting attendance (0 to 6000) and temperature (Celsius), titled either “Attendance vs Temperature” or “Temperature vs Attendance” and with the two variables placed on different axes.]

BIVARIATE ANALYSIS: THE STRENGTH OF A LINEAR RELATIONSHIP

Bivariate statistics enable us to measure how strongly two variables are connected or associated.

While visual inspection of a scatterplot gives us an indication of the strength and direction of a relationship, a more accurate measure of the relationship is given by calculating the value of Pearson’s product moment correlation coefficient, also known as the r value.

PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT or the r VALUE

This value can be found on the calculator by entering the data values in the statistics lists and calculating the least squares regression line.

Pearson’s r can take values between −1 and 1, the sign being an indication of the direction of the relationship and the size of the value indicating the strength.

Pearson’s r tells us nothing about the form of the relationship, because it is calculated on the assumption that the relationship is linear. Using Pearson’s r to indicate the strength and direction of a non-linear relationship is therefore not reliable.
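The source relies on the calculator to produce r. For readers who want to see what is being computed, here is a minimal Python sketch based on the standard textbook definition of Pearson's r (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the three small data sets are assumptions made up for illustration; they do not come from the original materials.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for paired data.

    `pearson_r` is a hypothetical helper written for this illustration.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no spread")
    return sxy / sqrt(sxx * syy)


# Illustrative data only (not from the original document).
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))            # perfect positive line: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))            # perfect negative line: -1.0
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))    # exact parabola y = x^2: 0.0
```

The third data set follows a perfect non-linear rule, yet r is 0. This is one way to see why r should not be used to judge the strength of a non-linear relationship.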
The interpretation of this value is as follows:

QUESTION 4
The relationship between the number of hours of study done by a Year 12 Further Maths student and their final study score has an r value of 0.7. Interpret the meaning of this value.

Solution

THE COEFFICIENT OF DETERMINATION

• The coefficient of determination is given by $r^2$. It is very easy to calculate: we merely square Pearson’s product moment correlation coefficient, r (it also appears on the same calculator screen as r). Make sure that you bracket a negative value of r before squaring!
• The value of the coefficient of determination ranges from 0 to 1. It is often expressed as a percentage.
• The coefficient of determination is useful when we have two variables which have a linear relationship. It tells us the proportion of variation in one variable which can be explained by the variation in the other variable.
• The coefficient of determination provides a measure of how well the linear rule linking the two variables (x and y) predicts the value of y when we are given the value of x.

Standard statement for the coefficient of determination:

($r^2 \times 100$)% of the variation in the (dependent/response variable) can be explained by the variation in the (independent/explanatory variable).

An additional part to this statement is also used occasionally:

The other ($100 - r^2 \times 100$)% of the variation in the (dependent/response variable) can be explained by other factors.

In each case the part of the statement in brackets needs to be replaced with an appropriate value or variable.

EXAMPLE
In Question 4 we looked at an r value of 0.7 for the relationship between study score and hours of study. The value of the coefficient of determination would be $(0.7)^2 = 0.49$.

49% of the variation in the study score can be explained by the variation in the hours of study.
The other 51% of the variation in the study score can be explained by other factors.

QUESTION 5
A study to find a relationship between lung capacity in litres per minute (dependent (response) variable) and years of cigarette smoking (independent (explanatory) variable) produced a value of Pearson’s product moment correlation coefficient of 0.9.

(a) Interpret the meaning of Pearson’s product moment correlation coefficient for the relationship between lung capacity and years of smoking.
_____________________________________________________________________

(b) Complete the following statements.
The coefficient of determination is calculated to be _________. This means that ________% of the variation in _________________ can be explained by the variation in ___________________. The other _______% of the variation in __________________ can be explained by other factors.

QUESTION 6
A set of data comparing blood alcohol level (BAL) and a driver’s ability to control a car is found to have a coefficient of determination of 64%. A competent driver with zero BAL would score high in ability to control a car. The Pearson’s correlation coefficient r is most likely to be:
A 0.64
B +0.8
C ±0.8
D −0.8
E −0.64

QUESTION 7
For the following data set:

x    25    36    45    78    89     99    110
y    78   153   267   456   891   1020   1410

The coefficient of determination (to two decimal places) is closest to:
A 14.14
B –381.97
C 0.91
D 0.95
E 0.94

Solution

TI-Nspire
• Add Lists & Spreadsheet, enter the data and name the lists.
• Menu, 4. Statistics, 1. Stat Calculations, 4. Linear Regression (a+bx).

Casio ClassPad
• In the Statistics menu, enter the data into Lists 1 & 2.
• Calc, Linear Reg.

On either calculator, set the lists as the x list and y list and confirm with OK; r and $r^2$ are then displayed.

CORRELATION AND CAUSATION

While we are entitled to say that there is a strong association between, say, the height of a footballer and the number of marks he takes, we cannot assert that the height of a footballer causes him to take a lot of marks. Being tall might assist in taking marks, but there will be many other factors which come into play, for example skill level, accuracy of passes from team mates, abilities of the opposing team, and so on.

So, while establishing a high degree of correlation between two variables may be interesting and can often flag the need for further, more detailed investigation, it in no way gives us any basis to comment on whether or not one variable causes particular values in another variable.

QUESTION 8
The correlation between two variables x and y is −0.98. Which of the following statements is true?
A x increases, causing y to increase.
B x increases, causing y to decrease.
C There is a poor fit between x and y.
D As x increases, y tends to decrease.
E As x decreases, y tends to decrease.

QUESTION 9
A study was conducted recording the number of hours of television students watched on average each night during Year 12 and their final ATAR score. The value of Pearson’s product moment correlation coefficient was found to be −0.72. From this information it could be concluded that:
A Watching a lot of television is detrimental to a student’s performance.
B Approximately 52% of students watch too much television.
C Television watching should be limited during Year 12.
D Students who watched more television tended to have lower ATAR scores.
E Television watching improved 72% of students’ ATAR scores.
INTRODUCTION TO REGRESSION

In Further Maths three different regression lines are used to quantify bivariate relationships. They are:
• Line by eye or line of best fit.
• The three median line.
• The least squares regression line.

Each of these lines has its advantages and disadvantages:
• Line by eye varies between individuals, particularly if the relationship is moderate or weak, but it is relatively unaffected by outliers.
• The three median line is unaffected by outliers, but is mainly suitable for a smaller number of data values.
• The least squares regression line is the most commonly used, but it is affected by outliers and can be very inaccurate when they are present.

INTRODUCTION TO REGRESSION: LINE BY EYE

Step 1: Draw a line of best fit through the data. This is the line that follows the direction of the data, has approximately the same number of points above the line as below and ignores outlying data points.

Step 2: Choose two coordinate points on the line (preferably at either end) and work out the gradient, m, using the formula:

$m = \dfrac{y_2 - y_1}{x_2 - x_1}$

Step 3: Either find the y-intercept, c, from the graph (making sure that the vertical axis you read from really is the y-axis, where the x value is zero) and substitute m and c into the equation $y = mx + c$, or substitute a point and the gradient directly into the formula $y - y_1 = m(x - x_1)$.

Note: An alternative to using the algebraic approach above is to enter the two points into the calculator statistics menu and perform a regression on those two points.

QUESTION 10
Calculate the equation of the line below.
Solution
Choose two points on the line, (65, 20) and (40, 80), and work out the gradient, m, using the formula:

$m = \dfrac{y_2 - y_1}{x_2 - x_1} = \dfrac{80 - 20}{40 - 65} = -2.4$

Substitute a point (from the line) and the gradient directly into the formula $y - y_1 = m(x - x_1)$:

$y - 20 = -2.4(x - 65)$
$y - 20 = -2.4x + 156$
$y = -2.4x + 156 + 20$
$y = -2.4x + 176$

Note: The y-intercept appears to be too high, but a check of the x-axis reveals that it does not start at zero. Be careful that you do not try to extrapolate the line to the y-axis and read the intercept from the graph.

Alternatively:

QUESTION 11
A line by eye drawn on a data set is shown in the figure below. The best estimate of the regression equation is:
A $y = 4.5x + 30$
B $y = -3x + 30$
C $y = 0.25x + 9.5$
D $y = 8x + 2$
E $y = -2x + 26$

THE THREE MEDIAN METHOD

This method is not affected by outliers and is often used when there are outliers in the data set. It is also the most suitable method for small data sets (e.g. up to 20 points).

Step 1: Plot the points on a scatterplot.

Step 2: Divide the data into three even groups, L (lower), M (middle) and U (upper), according to the order of the x-values. The number of points in a data set will not always be exactly divisible by 3. If there is one extra value it goes in the middle group, and if there are two extra values they go into the lower and upper groups.

Step 3: Find the median of the x and y values in each of the groups: $(x_L, y_L)$, $(x_M, y_M)$ and $(x_U, y_U)$.

Step 4: Use the LOWER and UPPER MEDIANS to find the gradient, m, using the rule:

$m = \dfrac{y_U - y_L}{x_U - x_L}$

Step 5: Find the y-intercept, c, using the formula:

$c = \dfrac{1}{3}\left[(y_L + y_M + y_U) - m(x_L + x_M + x_U)\right]$

Step 6: Substitute m and c into the rule $y = mx + c$.
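Steps 2 to 6 of the three median method can be sketched in Python as follows. This is an illustrative sketch only: the function names `split_three_groups` and `three_median_line` and the sample points are assumptions, not part of the original materials, and the sketch does not deal with tied x-values that straddle a group boundary.

```python
from statistics import median


def split_three_groups(points):
    """Step 2: order by x and split into lower, middle and upper groups.

    One extra point goes to the middle group; two extra points go to the
    lower and upper groups (hypothetical helper for this illustration).
    """
    pts = sorted(points)                  # tuples sort by x first
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three points")
    base, extra = divmod(n, 3)
    outer = base + (1 if extra == 2 else 0)   # size of lower and upper groups
    return pts[:outer], pts[outer:n - outer], pts[n - outer:]


def three_median_line(points):
    """Steps 3 to 6: return (m, c) for the three median line y = mx + c."""
    lower, middle, upper = split_three_groups(points)
    # Step 3: median x and median y of each group, found separately.
    (xl, yl), (xm, ym), (xu, yu) = [
        (median(p[0] for p in g), median(p[1] for p in g))
        for g in (lower, middle, upper)
    ]
    if xu == xl:
        raise ValueError("lower and upper median x-values are equal")
    m = (yu - yl) / (xu - xl)                          # Step 4
    c = ((yl + ym + yu) - m * (xl + xm + xu)) / 3      # Step 5
    return m, c


# Made-up data: points on y = 2x + 1, except for one outlier at x = 9.
data = [(x, 2 * x + 1) for x in range(1, 9)] + [(9, 60)]
m, c = three_median_line(data)
print(f"y = {m}x + {c}")   # y = 2.0x + 1.0
```

In this made-up data set the outlier (9, 60) leaves the upper group's medians at (8, 17), so the fitted line is still y = 2x + 1. This illustrates the claim that the three median line is not affected by outliers.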