
Lecture 1
Business Statistics

Instruction: Population, Sample, Data, Statistics

A datum is a statement of fact, or at least a statement accepted as fact. Data is the plural of datum, so a set of data is a collection of statements. Data can be quantitative or qualitative. Consider a sporting goods store that carries jerseys. The sizes of the jerseys carried can represent a set of data that is quantitative, e.g., N = {10, 12, 14, 16, 18, 20, 24, 28}, or a set of data that is qualitative, e.g., D = {small, medium, large, extra large}.

Data can be collected from a population or from a sample.

A population is a set of data. A sample is a subset of a population.

A sample, then, is a set of data, and in this course, we will deal mostly with samples that are sets of quantitative (numerical) data.

There are many ways to sample a population. If our efforts are aimed at reaching all members of the population, the technique is census-taking; otherwise, we employ other techniques to obtain a sample. Consider a shipment of 300,000 sealed crates. Law enforcement officers searching for contraband may not possess the resources to inspect each crate (take a census). Instead, officers may randomly select thirty crates from the population. The thirty crates represent a random sample.

There are many types of samples: random samples (selecting thirty crates at random), cluster samples (selecting ten crates from three particular cargo bays), stratified samples (selecting a few crates from several types such as small, medium, and large crates or Nigerian, Ugandan, and Liberian crates), systematic samples (selecting every ten-thousandth crate), and many more.

As mentioned above, we will consider numerical data sets understood to be samples, which, in turn, are understood to be subsets of larger populations. Numbers that describe a population are called parameters.

A parameter is a numerical value that describes a population.
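The random and systematic sampling techniques described above for the shipment of 300,000 crates can be sketched in Python. This is only an illustrative sketch: the crate numbering 1 to 300,000, the fixed seed, and the function names are assumptions made for the example, not part of the lecture.

```python
import random

POPULATION_SIZE = 300_000  # crates in the shipment (from the lecture)
SAMPLE_SIZE = 30           # crates the officers can inspect

# Assumption for illustration: crates are numbered 1..300000.
crate_ids = range(1, POPULATION_SIZE + 1)


def simple_random_sample(ids, k, seed=0):
    """Select k distinct crates at random (fixed seed so the run is repeatable)."""
    rng = random.Random(seed)
    return rng.sample(ids, k)


def systematic_sample(ids, step):
    """Select every step-th crate: the crates at positions step, 2*step, ..."""
    return [ids[i] for i in range(step - 1, len(ids), step)]


random_crates = simple_random_sample(crate_ids, SAMPLE_SIZE)
systematic_crates = systematic_sample(crate_ids, 10_000)

print(len(random_crates), "crates chosen at random")
print(len(systematic_crates), "crates chosen systematically")
```

Selecting every ten-thousandth crate out of 300,000 also yields thirty crates, so both techniques inspect the same number of crates but choose them differently.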
We will be largely interested in statistics, that is, numbers that describe a sample in some way. The term "statistics" is ambiguous because it refers both to the plural of statistic and to a field of study as defined below.

A statistic is a numerical value that describes a sample.

Statistics refers both to the plural of a statistic and to a field of science, the science of collecting, organizing, and analyzing empirical data.

Statistics (numbers that describe samples) are sometimes used to infer the values of population parameters. Statistics (the field of study) employs certain tests and procedures to gather knowledge concerning populations.

Instruction: Scores and Variables

Collecting data involves some activity requiring observation or measurement. The measurements yield data values called scores, which are referred to as raw scores when it is necessary to emphasize that the score has not been changed from the initial measurement.

A score is a datum collected by measurement or observation. Raw scores are unchanged measurements or observations.

This course deals mostly with quantitative samples, which are numerical data sets. The data is collected via some measurement, each measurement being a particular datum or score. The scores change (or at least could change) from object to object in the set. The measurement itself is called a variable, represented by the letter X, and scores are possible values of the variable (possible x-values).

A variable is a measurable characteristic that takes different values.

We will be concerned with two types of variables, discrete variables and continuous variables. Discrete variables take values that are separated by impossible values. A typical discrete variable is one restricted to whole number values. Examples include the number of siblings an individual has and the number of deaths that occur in a hospital in a week.
A variable restricted to dollar values to the nearest cent is also discrete because values such as $0.015 are impossible.

A discrete variable takes values that represent separate categories such that when the scores are ordered any two consecutive possible scores are separated by a span of impossible values.

Not all variables are discrete. If a variable is not discrete, it is continuous.

A continuous variable takes values that represent categories such that an infinite number of possible scores fall between any two measured scores.

Examples of continuous variables include heights, weights, and durations. In cases where the variable is continuous but measurements are rounded, it is important to recognize the rounded scores as values of a continuous variable, not a discrete variable.

Instruction: Summation Notation

Statistics often requires the summation of many numbers, so a special notation for "summation" is required. The capital Greek letter sigma, Σ, serves this purpose. For example, given a set of scores (x-values), $A = \{x_1, x_2, \ldots, x_n\}$, then $\sum X = x_1 + x_2 + \cdots + x_n$. In particular, if A = {5, 7, 8, 9, 11, 12, 18}, where each element in the set is a datum considered to be the value of some measurement called the random variable X, then $\sum X = 5 + 7 + 8 + 9 + 11 + 12 + 18$, so $\sum X = 70$.

Instruction: Frequency Distributions

One statistic is the frequency of a particular value in a data set.

The frequency, f, of a score, x, equals the number of times the score appears in the data set.

A frequency distribution is a common tool used to organize data from a sample.

A frequency distribution is an organized display, be it a tabulation or a graph, that shows the frequency of each data value in a sample.

A frequency distribution helps organize data sets (samples) that contain numerous repeated values. For example, consider the data set T collected by the National Oceanic and Atmospheric Administration.
T is the set of the numbers of deaths in the United States attributed to tornadoes in the month of February for the years 1950 to 1983.

T = {45, 1, 10, 3, 2, 0, 8, 0, 13, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 134, 0, 0, 0, 7, 5, 2, 0, 0, 0, 2, 0, 1}

The random variable, X, equals the number of tornado deaths in February in the United States for a given year. The frequency distribution shown below lists each x-value and the corresponding frequency of that value for the data contained in T.

X    0    1    2    3    5    7    8    10   13   21   45   134
f    20   2    3    1    1    1    1    1    1    1    1    1

The frequency distribution above is in tabular form, but frequency distributions can be graphs as well. Imagine a traffic engineer collecting data on intersections. Consider sample S, consisting of the monthly number of fatal automobile accidents at a particular intersection over a period of sixteen months. If S = {4, 0, 1, 4, 2, 1, 1, 0, 0, 3, 0, 1, 0, 2, 0, 1}, then Figure A is a frequency distribution of S in the form of a line graph, which is called a frequency polygon.

[Figure A: frequency polygon of S; horizontal axis x (0 to 4), vertical axis f (0 to 7)]

Besides the line graph above, frequency distributions can take the form of bar graphs. Bar graphs represent discrete variables with rectangles whose heights represent the frequency of each score. The widths of the bars in a bar graph are uniform (equal), and all the bars are separated by gaps of uniform size. The bar graph below is a frequency distribution for S.

[Bar Graph: frequency distribution of S; horizontal axis 0 to 4, vertical axis 0 to 7]

Some frequency distributions show the relative frequency of the data values.

A relative frequency distribution shows the fraction or percentage of the data set represented by a data value.

The table below is a relative frequency distribution for sample S = {4, 0, 1, 4, 2, 1, 1, 0, 0, 3, 0, 1, 0, 2, 0, 1}, and Figure B is a relative frequency distribution for sample S in bar graph form.

x    0      1      2      3      4
f    3/8    5/16   1/8    1/16   1/8
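The frequencies and relative frequencies of sample S can be recomputed with a short Python sketch. The function names are hypothetical and chosen only for this illustration; the data set S is taken from the lecture.

```python
from collections import Counter
from fractions import Fraction

# Sample S from the lecture: monthly fatal accidents over sixteen months.
S = [4, 0, 1, 4, 2, 1, 1, 0, 0, 3, 0, 1, 0, 2, 0, 1]


def frequency_distribution(scores):
    """Map each distinct score x to its frequency f, in ascending order of x."""
    counts = Counter(scores)
    return {x: counts[x] for x in sorted(counts)}


def relative_frequency_distribution(scores):
    """Map each distinct score x to f / n, kept as an exact fraction."""
    n = len(scores)
    return {x: Fraction(f, n) for x, f in frequency_distribution(scores).items()}


freq_S = frequency_distribution(S)
rel_S = relative_frequency_distribution(S)

for x in freq_S:
    print(f"x = {x}: f = {freq_S[x]}, relative frequency = {rel_S[x]}")
```

Exact fractions are used so that the results can be compared directly with fractions such as 3/8 and 5/16. The relative frequencies of any sample sum to 1.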
[Figure B: bar graph of the relative frequencies of sample S; horizontal axis 0 to 4, vertical axis 0 to 0.4]

Figure C is a relative frequency distribution of sample S in circle graph form. Circle graphs divide the area inside a circle into wedges whose sizes represent the relative frequencies of the data values.

[Figure C: circle graph of sample S with wedges 0: 37.50%, 1: 31.25%, 2: 12.50%, 3: 6.25%, 4: 12.50%]

Another type of frequency distribution is a grouped frequency distribution.

A grouped frequency distribution shows the frequencies of ranges of values. The ranges of values are sometimes referred to as "classes" or "bins."

Consider set D, a set of lengths.

D = {3.1, 3.6, 2.9, 4.1, 4.2, 2.2, 2.6, 3.1, 3.3, 4.2, 4.4, 5.0, 2.9, 4.1, 4.6}

It could be advantageous to organize this data according to ranges of values. Figure D is a grouped frequency distribution using four classes in histogram form. Histograms conveniently represent continuous variables. Histograms use contiguous rectangles whose heights represent the frequency and whose widths represent some quantity. The histogram below is a grouped frequency distribution for set D.

[Figure D: histogram of D; vertical axis f (0 to 7); classes 1.5 ≤ x < 2.5, 2.5 ≤ x < 3.5, 3.5 ≤ x < 4.5, 4.5 ≤ x < 5.5]

In histograms, the widths of the bars represent class range.

A class is a division or subset of the data that includes data that falls within a certain range of values.

In Figure D, the numbers 1.5, 2.5, 3.5, and 4.5 represent lower class limits, the smallest possible data values for the four classes respectively. The upper class limits for the four classes are 2.5, 3.5, 4.5, and 5.5 respectively. Here, the upper class limits represent a boundary on the largest possible data value for each class.

Class limits are either the most extreme possible data values (least or greatest) or a minimal/maximal boundary on the possible data values (least or greatest).

The uniform class width approximately equals the ratio of the range of the data to the desired number of classes.
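The grouping of set D into the four classes of Figure D can be checked with a short Python sketch. The function name and the representation of each class as a (lower, upper) pair are assumptions made for illustration; the data and the class limits come from the lecture.

```python
# Set D from the lecture: a set of lengths.
D = [3.1, 3.6, 2.9, 4.1, 4.2, 2.2, 2.6, 3.1, 3.3, 4.2, 4.4, 5.0, 2.9, 4.1, 4.6]

# Class limits read from Figure D: each class is lower <= x < upper.
CLASSES = [(1.5, 2.5), (2.5, 3.5), (3.5, 4.5), (4.5, 5.5)]


def grouped_frequencies(scores, classes):
    """Count how many scores fall in each class lower <= x < upper."""
    return {
        (lower, upper): sum(1 for x in scores if lower <= x < upper)
        for lower, upper in classes
    }


grouped_D = grouped_frequencies(D, CLASSES)

for (lower, upper), f in grouped_D.items():
    print(f"{lower} <= x < {upper}: f = {f}")
```

Because each class includes its lower limit and excludes its upper limit, every score lands in exactly one class, and the class frequencies add up to the fifteen scores in D.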
$$\text{class width} \approx \frac{\text{largest data value} - \text{smallest data value}}{\text{desired number of classes}}$$

Class limits should be chosen such that the uniform class width equals the difference of any two successive upper class limits. The class width for Figure D is 1, as calculated here: 5.5 − 4.5 = 1.

The class mark is the "middle" value of a class and can be calculated by dividing the sum of the lower and upper limits of a class by two. The class marks for Figure D are calculated below.

$$\frac{1.5 + 2.5}{2} = 2, \quad \frac{2.5 + 3.5}{2} = 3, \quad \frac{3.5 + 4.5}{2} = 4, \quad \frac{4.5 + 5.5}{2} = 5$$

Assignment 1

Problems

#1 Using a complete sentence, identify each data set described below as a sample or a population.
A. A survey of five hundred University of Texas students.
B. The age of each U. S. president upon election to office.

#2 Using a complete sentence, identify each numerical value described below as a statistic or a parameter.
A. The average annual salary of thirty of a company's 1,500 employees is $76,000.00.
B. According to ACT, Inc., the average ACT math score for all graduates in a particular year was 20.7.

#3 Consider the set of x-values: {5, 6, 9, 10}. Let n represent the cardinal number of the set.
A. Find $\sum X$.
B. Evaluate $\frac{\sum X}{n}$.

#4 The table below shows the oil reserves (rounded to the nearest billion barrels) of countries in the Western Hemisphere at a given time in history. Create a relative frequency distribution in the form of a bar graph.

Country          Billions of Barrels
United States    36
Canada           16
Mexico           60
South America    72
Total            186

#5 The data set below represents the prices of grade A eggs (in dollars per dozen) for the indicated years. Use a frequency polygon to display the data set.

Year    Price      Year    Price
1990    1.00       1996    0.95
1991    1.01       1997    0.93
1992    1.02       1998    0.94
1993    0.98       1999    0.94
1994    0.97       2000    0.99
1995    0.94       2001    1.02

#6 Use a pie chart to display the data. The numbers represent the number of Nobel Prize laureates by country during the years from 1901 to 2002.

Country    Laureates
U. S.      270
U. K.      100
France     49
Sweden     30
Germany    77
Other      157

#7 Construct a frequency distribution in histogram form with six classes for the arrayed data. The data set represents the amount in dollars (rounded to the nearest dollar) spent on books for a semester by thirty students. Show the class marks (mid-points of each class).

91   188  189  266  190  30   472  341  127  248
398  354  279  266  8    101  88   222  249  199
526  375  269  93   530  142  184  486  43   352

#8 Fresh N Ready Sandwiches strives to provide fresh sandwiches and quick service. The table below reflects the results of a recent survey of customers who entered the restaurant only to leave before ordering. Use the data to create a Pareto diagram.

Reason for Leaving       Frequency
Long lines               38
Pricing                  6
Restaurant appearance    4
Other                    2
Total                    50

#9 Using complete sentences, explain why the following procedures do not give a random sample for the entire population of Manhattan. Be sure to note any bias the procedure may contain.
A. selecting every third woman entering a beauty shop in Gramercy Park
B. selecting every third person entering a bar in Chelsea
C. selecting every third person coming out of a boxing match at Madison Square Garden

#10 Using a complete sentence, identify each variable described below as discrete or continuous.
A. the number of times a telephone rings before it is answered
B. the amount of time a customer waits before service
C. the amount of oil held in reserve by a nation
D. the price of grade A eggs
