Lecture 4.1
Contemporary Mathematics

Instruction: Population, Sample, Data, Statistics

A datum is a statement of fact, or at least a statement accepted as fact. Data is the plural of datum, so a set of data is a collection of statements. Data can be quantitative or qualitative. Consider a sporting goods store that carries jerseys. The sizes of the jerseys carried can represent a set of data that is quantitative, e.g., N = {10, 12, 14, 16, 18, 20, 24, 28}, or a set of data that is qualitative, e.g., D = {small, medium, large, extra large}.

Data can be collected from a population or from a sample.

A population is a set of data.
A sample is a subset of a population.

A sample, then, is a set of data, and in this course we will deal exclusively with samples that are sets of quantitative (numerical) data.

There are many ways to sample the population. If our efforts are aimed at reaching all members of the population, the technique is census-taking; otherwise, we employ other techniques to obtain a sample. Consider a shipment of 300,000 sealed crates. Law enforcement officers searching for contraband may not possess the resources to inspect each crate (take a census). Instead, officers may randomly select thirty crates from the population. The thirty crates represent a random sample.

There are many types of samples: random samples (selecting thirty crates at random), cluster samples (selecting ten crates from each of three particular cargo bays), stratified samples (selecting a few crates from each of several types, such as small, medium, and large crates, or Nigerian, Ugandan, and Liberian crates), systematic samples (selecting every ten-thousandth crate), and many more.

As mentioned above, we will consider numerical data sets understood to be samples, which, in turn, are understood to be subsets of larger populations. Numbers that describe a population are called parameters.

A parameter is a numerical value that describes a population.
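The sampling techniques listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the crates are assumed to be numbered 1 through 300,000, the fixed seed is there only to make the sketch repeatable, and the assignment of crates to size types (by position in the shipment) is hypothetical.

```python
import random

# Assumption: the 300,000 crates are numbered 1..300000.
POPULATION_SIZE = 300_000
crates = range(1, POPULATION_SIZE + 1)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Random sample: thirty crates chosen at random, without replacement.
random_sample = rng.sample(crates, 30)

# Systematic sample: every ten-thousandth crate (10000, 20000, ..., 300000).
systematic_sample = list(crates[9_999::10_000])

# Stratified sample: a few crates from each type.
# Hypothetical strata: every third crate is treated as the same size type.
strata = {
    "small": crates[0::3],
    "medium": crates[1::3],
    "large": crates[2::3],
}
stratified_sample = {
    name: rng.sample(group, 10) for name, group in strata.items()
}

print(len(random_sample), len(systematic_sample))
print({name: len(chosen) for name, chosen in stratified_sample.items()})
```

Each technique returns a subset of the crate numbers, so each result is a sample in the sense defined above; only the rule for choosing the subset differs.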
We will be interested in statistics, that is, numbers that describe a sample in some way. The term "statistics" is ambiguous because it refers both to the plural of statistic and to a field of study, as defined below.

A statistic is a numerical value that describes a sample.
Statistics refers both to the plural of statistic and to a field of science, the science of collecting, organizing, and analyzing empirical data.

Statistics (numbers that describe samples) are sometimes used to infer the values of population parameters. Statistics (the field of study) employs certain tests and procedures to gather knowledge concerning populations.

Instruction: Scores and Variables

Collecting data involves some activity requiring observation or measurement. The measurements yield data values called scores, which are referred to as raw scores when it is necessary to emphasize that the score has not been changed from the initial measurement.

A score is a datum collected by measurement or observation.
Raw scores are unchanged measurements or observations.

This course deals with quantitative samples, which are numerical data sets. The data is collected via some measurement, each measurement yielding a particular datum or score. The scores change (or at least could change) from object to object in the set. The measurement itself is called a variable, represented by the letter X, and scores are possible values of the variable (possible x-values).

A variable is a measurable characteristic that takes different values.

We will be concerned with two types of variables, discrete variables and continuous variables. Discrete variables are typically restricted to whole numbers. Examples include the number of siblings an individual has and the number of deaths that occur in a hospital in a week.

A discrete variable takes values that represent separate categories such that, when the scores are ordered, any two consecutive possible scores are separated by a span of impossible values.
Not all variables are discrete. If a variable is not discrete, it is continuous.

A continuous variable takes values that represent categories such that an infinite number of possible scores fall between any two measured scores.

Examples of continuous variables include heights, weights, and durations. In cases where the variable is continuous but measurements are rounded, it is important to recognize the rounded scores as values of a continuous variable, not a discrete variable.

Instruction: Summation Notation

Statistics often requires summing a large collection of numbers, so a special notation for "summation" is required. The capital Greek letter sigma, $\Sigma$, serves this purpose. For example, given a set of scores (x-values), $A = \{x_1, x_2, \ldots, x_n\}$, then $\sum X = x_1 + x_2 + \cdots + x_n$. In particular, if $A = \{5, 7, 8, 9, 11, 12, 18\}$, where each element in the set is a datum considered to be the value of some measurement called the random variable X, then $\sum X = 5 + 7 + 8 + 9 + 11 + 12 + 18$, so $\sum X = 70$.

Application Exercise 4.1

Problems

Donald Robertson writes in Space & Communications, "The SSMEs [space shuttle main engines] are the most reliable of today's rocket engines. In 168 engine flights, there have been no critical failures, just one early shut-down in flight caused by a sensor problem, and only four shut-downs on the pad, according to Rocketdyne. Challenger was destroyed by a Solid Rocket Booster leak. This main engine reliability has been achieved despite engines being re-used as many as fifteen times."

#1 Is the proportion of "shut-downs" to "engine flights" an example of a statistic or a parameter?

A Gallup survey of 1,000 telephone interviews conducted shortly after the Columbia tragedy indicated that 82 percent of respondents expressed support for continuing the manned space shuttle program.

#2 Is the proportion of respondents who "expressed support for continuing the manned space shuttle program" a statistic or a parameter?
A reporter investigating possible asteroid/Earth impact catastrophes writes, "The orbit of asteroid 2003-QQ47 has been calculated using only fifty-one observations during a seven-day period. Further observations are required to determine if any danger of impact with Earth does exist. This asteroid will be monitored closely over the next two months. Astronomers expect the risk of impact to decrease significantly as more data is gathered."

#3 To scientists assessing the risk of impact from asteroid 2003-QQ47, are the "fifty-one observations" discussed in the article a population or a sample?

About.com reports, "More than 70 spacecraft have been sent to the Moon; 12 astronauts have walked upon its surface and brought back 382 kg of lunar rock and soil to Earth."

#4 To scientists interested in the Moon, does the "382 kg of lunar rock and soil" mentioned in the report represent a population or a sample?

Answers: #1 parameter, #2 statistic, #3 sample, #4 sample

Assignment 4.1

Problems

Identify each data set described below as a sample or a population.

#1 A survey of five hundred University of Texas students.
#2 The age of each U.S. president upon election to office.

Identify each numerical value described below as a statistic or a parameter.

#3 The average annual salary of forty of a company's 1,100 employees is $76,000.00.
#4 According to ACT, Inc., the average ACT math score for all graduates in a particular year was 20.7.

Consider the set of x-values: {5, 6, 9, 10}.

#5 Find $\sum X$.

Lecture 4.2

Instruction: Frequency Distributions

One statistic is the frequency of a particular value in a data set.

The frequency, f, of a score, x, equals the number of times the score appears in the data set.

A frequency distribution is a common tool used to organize data from a sample.

A frequency distribution is an organized display, whether a tabulation or a graph, that shows the frequency of each data value in a sample.
A frequency distribution helps organize data sets (samples) that contain numerous repeated values. For example, consider the data set T collected by the National Oceanic and Atmospheric Administration. T is the set of the numbers of deaths in the United States attributed to tornadoes in the month of February for the years 1950 to 1983.

T = {45, 1, 10, 3, 2, 0, 8, 0, 13, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 134, 0, 0, 0, 7, 5, 2, 0, 0, 0, 2, 0, 1}

The random variable, X, equals the number of tornado deaths in February in the United States for a given year. The frequency distribution shown below lists each x-value and the corresponding frequency of that value for the data contained in T.

x    0   1   2   3   5   7   8   10   13   21   45   134
f   20   2   3   1   1   1   1    1    1    1    1     1

The frequency distribution above is in tabular form, but frequency distributions can be graphs as well. Imagine a traffic engineer collecting data on intersections. Consider sample S consisting of the monthly numbers of fatal automobile accidents at a particular intersection over a period of sixteen months. If S = {4, 0, 1, 4, 2, 1, 1, 0, 0, 3, 0, 1, 0, 2, 0, 1}, then Figure A is a frequency distribution of S in the form of a line graph, which is called a frequency polygon.

[Figure A: frequency polygon of S; horizontal axis x (0 to 4), vertical axis f (0 to 7)]

Besides the line graph above, frequency distributions can take the form of bar graphs. Bar graphs represent discrete variables with rectangles whose heights represent the frequency of each score. The bar graph below is a frequency distribution for S.

[Bar Graph: frequency distribution of S; horizontal axis x (0 to 4), vertical axis f (0 to 7)]

Some frequency distributions show the relative frequency of the data values.

A relative frequency distribution shows the fraction or percentage of the data set represented by a data value.

The table below is a relative frequency distribution for sample S = {4, 0, 1, 4, 2, 1, 1, 0, 0, 3, 0, 1, 0, 2, 0, 1}.

x                     0     1      2     3      4
relative frequency    3/8   5/16   1/8   1/16   1/8

Figure B is a relative frequency distribution for sample S in bar graph form.
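The frequencies and relative frequencies of sample S can be checked with a short Python sketch. It is one possible way to tabulate the data, not a prescribed method; the variable names are chosen here for illustration, and exact fractions are used so the results can be compared with the fractions in the text.

```python
from collections import Counter
from fractions import Fraction

# Sample S: monthly numbers of fatal accidents over sixteen months.
S = [4, 0, 1, 4, 2, 1, 1, 0, 0, 3, 0, 1, 0, 2, 0, 1]

# Frequency f of each score x: how many times x appears in S.
frequency = Counter(S)

# Relative frequency: f divided by the number of scores in the sample.
n = len(S)
relative_frequency = {
    x: Fraction(f, n) for x, f in sorted(frequency.items())
}

for x, rel in relative_frequency.items():
    print(f"x = {x}: f = {frequency[x]}, relative frequency = {rel} ({float(rel):.2%})")
```

The relative frequencies sum to 1, which is a quick check that no score was missed.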
[Figure B: bar graph of the relative frequencies of S; horizontal axis x (0 to 4), vertical axis relative frequency (0 to 0.4)]

Figure C is a relative frequency distribution of sample S in circle graph form. Circle graphs divide the area inside a circle into wedges whose sizes represent the relative frequencies of the data values.

[Figure C: circle graph of S with wedges labeled 0, 37.50%; 1, 31.25%; 2, 12.50%; 3, 6.25%; 4, 12.50%]

Another type of frequency distribution is a grouped frequency distribution.

A grouped frequency distribution shows the frequencies of ranges of values.

The ranges of values are sometimes referred to as "classes" or "bins." Consider set D, a set of lengths.

D = {3.1, 3.6, 2.9, 4.1, 4.2, 2.2, 2.6, 3.1, 3.3, 4.2, 4.4, 5.0, 2.9, 4.1, 4.6}

It could be advantageous to organize this data according to ranges of values. Figure D is a grouped frequency distribution using four classes in histogram form. Histograms represent continuous variables with contiguous rectangles whose heights represent the frequency. The histogram below is a grouped frequency distribution for set D.

[Figure D: histogram of D; vertical axis f (0 to 7); classes 1.5 ≤ x < 2.5, 2.5 ≤ x < 3.5, 3.5 ≤ x < 4.5, 4.5 ≤ x < 5.5]

In Figure D, the numbers 1.5, 2.5, 3.5, and 4.5 represent lower class limits, the smallest possible data values for the four classes respectively. The upper class limits for the four classes are 2.5, 3.5, 4.5, and 5.5 respectively. Here, the upper class limits represent a boundary on the largest possible data value for each class.

Class limits are either the most extreme possible data values in a class (least or greatest) or a lower or upper boundary on the possible data values.

The uniform class width equals the difference of any two successive upper class limits. The class width for Figure D is 1, as calculated here: 5.5 − 4.5 = 1. The class mark is the "middle" value of a class and can be calculated by dividing the sum of the lower and upper limits of a class by two. The class marks for Figure D are calculated below.
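The grouped frequencies, class width, and class marks for set D can also be checked with a minimal Python sketch. It assumes the four classes read from Figure D, each including its lower limit and excluding its upper limit; the variable names are illustrative.

```python
# Set D, a set of lengths, and the class limits read from Figure D.
D = [3.1, 3.6, 2.9, 4.1, 4.2, 2.2, 2.6, 3.1, 3.3, 4.2, 4.4, 5.0, 2.9, 4.1, 4.6]
limits = [1.5, 2.5, 3.5, 4.5, 5.5]

# Pair each lower class limit with the matching upper class limit.
classes = list(zip(limits[:-1], limits[1:]))

# Grouped frequency: count the values with lower <= x < upper.
grouped = {
    (lower, upper): sum(1 for x in D if lower <= x < upper)
    for lower, upper in classes
}

# Class width: difference of two successive upper class limits.
class_width = limits[-1] - limits[-2]

# Class mark: sum of the lower and upper limits divided by two.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

for (lower, upper), f in grouped.items():
    print(f"{lower} <= x < {upper}: f = {f}")
print("class width:", class_width)
print("class marks:", class_marks)
```

The four frequencies add up to 15, the number of lengths in D, so every value falls in exactly one class.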
$\frac{1.5 + 2.5}{2} = 2, \quad \frac{2.5 + 3.5}{2} = 3, \quad \frac{3.5 + 4.5}{2} = 4, \quad \frac{4.5 + 5.5}{2} = 5$

Application Exercise 4.2

Problems

Suppose NASA studies the effects of micro-gravity on the immune system. As part of this study, NASA collects thirty blood samples from astronauts after six consecutive weeks in orbit and records the number of white cells, in thousands per cubic millimeter, below.

3.6  5.9  6.3  5.1  5.0  7.2  5.2  9.3  8.1  7.1
9.9  9.2  5.9  9.9  5.7  7.9  9.9  8.4  6.0  8.5
6.7  7.9  7.7  4.4  8.0  4.7  6.9  7.8  9.1  4.9

#1 Create a grouped frequency distribution using seven classes.
#2 Represent the grouped frequency distribution from problem one using a frequency polygon. Label each class using the class mark.
#3 Change the grouped frequency distribution from problem one into a relative frequency distribution.

Answers

#1 width ≈ $\frac{9.9 - 3.6}{7} = \frac{6.3}{7} = 0.9$

x                f
3.6 ≤ x < 4.5    2
4.5 ≤ x < 5.4    5
5.4 ≤ x < 6.3    4
6.3 ≤ x < 7.2    4
7.2 ≤ x < 8.1    6
8.1 ≤ x < 9.0    3
9.0 ≤ x ≤ 9.9    6

#2 [Frequency polygon: vertical axis f (0 to 7); horizontal axis x labeled with the class marks 4.05, 4.95, 5.85, 6.75, 7.65, 8.55, 9.45 for classes 1 through 7]

#3

x                f/n
3.6 ≤ x < 4.5    1/15
4.5 ≤ x < 5.4    1/6
5.4 ≤ x < 6.3    2/15
6.3 ≤ x < 7.2    2/15
7.2 ≤ x < 8.1    1/5
8.1 ≤ x < 9.0    1/10
9.0 ≤ x ≤ 9.9    1/5

Assignment 4.2

Problems

#1 Display the following data using a frequency polygon.

{1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 9, 9, 9, 9, 10, 10}

#2 Display the following data as a grouped frequency distribution using a histogram.

{0.2, 0.3, 0.7, 1.1, 1.7, 2.2, 2.3, 2.4, 2.5, 2.7, 2.8, 2.9, 3.3, 3.9}

#3 Use a frequency polygon to display a grouped frequency distribution for the data, which represents the prices of grade A eggs (in dollars per dozen) for the indicated years.

Year    Price      Year    Price
1990    1.00       1996    1.03
1991    1.01       1997    1.02
1992    0.98       1998    0.97
1993    0.99       1999    0.99
1994    0.98       2000    1.09
1995    1.01       2001    1.10

#4 Use a pie chart to display the data as a relative frequency distribution. The numbers represent the number of Nobel Prize laureates by country during the years from 1901 to 2002.

Country    Laureates
U.S.       270
U.K.       100
France     49
Sweden     30
Germany    77
Other      157

#5 Construct a grouped frequency distribution with six classes using a histogram for the data set. The numbers represent the average amount in dollars spent on energy bills for a month.

91   188  189  266  190  30   472  341  127  248
398  354  279  266  8    101  88   222  249  199
526  375  269  93   530  142  184  486  43   352