Lecture 1
Business Statistics
Instruction: Population, Sample, Data, Statistics
A datum is a statement of fact, or at least a statement accepted as fact. Data is the plural of datum, so
a set of data is a collection of statements. Data can be quantitative or qualitative. Consider a
sporting goods store that carries jerseys. The sizes of the jerseys carried can represent a set of
data that is quantitative, e.g., N = {10, 12, 14, 16, 18, 20, 24, 28} , or a set of data that is
qualitative, e.g., D = {small, medium, large, extra large} .
Data can be collected from a population or from a sample. A population is a set of data.
A sample is a subset of a population. A sample, then, is a set of data, and in this course, we will
deal mostly with samples that are sets of quantitative (numerical) data.
There are many ways to sample the population. If our efforts are aimed at attempting to
reach all members of the population, the technique is census-taking; otherwise, we employ other
techniques to obtain a sample. Consider a shipment of 300,000 sealed crates. Law enforcement
officers searching for contraband may not possess the resources to inspect each crate (take a
census). Instead, officers may randomly select thirty crates from the population. The thirty
crates represent a random sample. There are many types of samples: random samples (selecting
thirty crates at random), cluster samples (selecting ten crates from three particular cargo bays),
stratified samples (selecting a few crates from several types such as small, medium, and large
crates or Nigerian, Ugandan, and Liberian crates), systematic samples (selecting every ten-thousandth crate), and many more.
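The random and systematic techniques described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the crate ID numbers 1 through 300,000 and the fixed seed are hypothetical and are not part of the lecture.

```python
import random

# Hypothetical crate IDs 1..300000 standing in for the shipment of sealed crates.
POPULATION_SIZE = 300_000
crates = range(1, POPULATION_SIZE + 1)

rng = random.Random(2024)  # fixed seed (an assumption) so the sketch is repeatable

# Random sample: thirty crates chosen at random, with no crate chosen twice.
random_sample = rng.sample(crates, 30)

# Systematic sample: every ten-thousandth crate (crate 10000, 20000, ...).
systematic_sample = list(crates[9_999::10_000])

print("random sample size:", len(random_sample))
print("first systematic picks:", systematic_sample[:3])
print("systematic sample size:", len(systematic_sample))
```

Both techniques happen to yield thirty crates here, but only the first gives every possible group of thirty crates the same chance of selection.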
As mentioned above, we will consider numerical data sets understood to be samples,
which, in turn, are understood to be subsets of larger populations. Numbers that describe a
population are called parameters.
A parameter is a numerical value that describes a population.
We will be largely interested in statistics, that is, numbers that describe a sample in some way.
The term "statistics" is ambiguous because it refers both to the plural of statistic and to a field of
study as defined below.
A statistic is a numerical value that describes a sample.
Statistics refers both to the plural of a statistic and to a field of science, the science
of collecting, organizing, and analyzing empirical data.
Statistics–numbers that describe samples–are sometimes used to infer characteristics of
population parameters. Statistics–the field of study–employs certain tests and procedures to
gather knowledge concerning populations.
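The distinction between a parameter and a statistic can be illustrated with a short Python sketch. It assumes, purely for illustration, that the eight jersey sizes in set N above form an entire population; the sample size of four and the seed are likewise hypothetical choices.

```python
import random
from statistics import mean

# Assumption for illustration only: treat the jersey sizes in N as a whole population.
population = [10, 12, 14, 16, 18, 20, 24, 28]
parameter = mean(population)  # a number that describes the population

rng = random.Random(1)  # fixed seed (an assumption) so the run is repeatable
sample = rng.sample(population, 4)  # a subset of the population
statistic = mean(sample)  # a number that describes the sample

print("population mean (parameter):", parameter)
print("sample:", sample)
print("sample mean (statistic):", statistic)
```

The parameter stays fixed, while the statistic depends on which sample happens to be drawn.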
Instruction: Scores and Variables
Collecting data involves some activity requiring observation or measurement. The
measurements yield data values called scores, which are referred to as raw scores when it is
necessary to emphasize that the score has not been changed from the initial measurement.
A score is a datum collected by measurement or observation. Raw
scores equal unchanged measurements or observations.
This course deals mostly with quantitative samples, which are numerical data sets. The data are
collected via some measurement, each measurement being a particular datum or score. The
scores change (or at least could change) from object to object in the set. The measurement itself
is called a variable represented by the letter X, and scores are possible values of the variable
(possible x-values).
A variable is a measurable characteristic that takes different values.
We will be concerned with two types of variables, discrete variables and continuous variables.
Discrete variables take values that are separated by impossible values. A typical discrete
variable is one restricted to whole number values. Examples include the number of siblings an
individual has and the number of deaths that occur in a hospital in a week. A variable restricted to
dollar values to the nearest cent is also discrete because values such as $0.015 are impossible.
A discrete variable takes values that represent separate categories such
that when the scores are ordered any two consecutive possible scores
are separated by a span of impossible values.
Not all variables are discrete. If a variable is not discrete, it is continuous.
A continuous variable takes values that represent categories such that
an infinite number of possible scores fall between any two measured
scores.
Examples of continuous variables include heights, weights, and durations. In cases where the
variable is continuous but measurements are rounded, it is important to recognize the rounded
scores as values of a continuous variable, not a discrete variable.
Instruction: Summation Notation
Statistics often requires the summation of a large number of numbers, so a special
notation for "summation" is required. The capital Greek letter sigma, Σ , serves this purpose.
For example, given a set of scores (x-values), A = {x_1, x_2, …, x_n}, then ∑X = x_1 + x_2 + … + x_n.
In particular, if A = {5, 7, 8, 9, 11, 12, 18} where each element in the set is a datum
considered to be the value of some measurement called the random variable X, then
∑ X = 5 + 7 + 8 + 9 + 11 + 12 + 18 , so ∑ X = 70 .
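The same summation can be checked with a minimal Python sketch. The variable names are illustrative only.

```python
# The scores in set A from the example above.
A = [5, 7, 8, 9, 11, 12, 18]

# Sigma notation spelled out as a loop: add each score x to a running total.
total = 0
for x in A:
    total += x

print(total)   # 70
print(sum(A))  # the built-in sum gives the same result
```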
Instruction: Frequency Distributions
One statistic is the frequency of a particular value in a data set. The frequency, f, of a
score, x, equals the number of times the score appears in the data set. A frequency distribution is
a common tool used to organize data from a sample.
A frequency distribution is an organized display—be it a tabulation or a graph—that
shows the frequency of each data value in a sample.
A frequency distribution helps organize data sets (samples) that contain numerous repeated
values. For example, consider the data set T collected by the National Oceanic and Atmospheric
Administration. T is the set of the numbers of deaths in the United States attributed to tornadoes in the
month of February for the years 1950 to 1983.
T = {45, 1, 10, 3, 2, 0, 8, 0, 13, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 134, 0, 0, 0, 7, 5, 2, 0, 0, 0, 2, 0, 1}
The random variable, X, equals the number of tornado deaths in February in the United States for
a given year. The frequency distribution shown below lists each x-value and the corresponding
frequency of that value for the data contained in T.
X |  0   1   2   3   5   7   8  10  13  21  45  134
f | 20   2   3   1   1   1   1   1   1   1   1    1
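The tabulation for T can be reproduced with a short Python sketch. It uses the standard library's Counter as one convenient way to count repeated values.

```python
from collections import Counter

# Data set T: February tornado deaths in the United States, 1950 to 1983.
T = [45, 1, 10, 3, 2, 0, 8, 0, 13, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     134, 0, 0, 0, 7, 5, 2, 0, 0, 0, 2, 0, 1]

# The frequency f of a score x is the number of times x appears in T.
freq = Counter(T)

print("X", "f")
for x in sorted(freq):
    print(x, freq[x])

# The frequencies must add up to the number of scores (34 years).
print("total:", sum(freq.values()))
```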
The frequency distribution above is in tabular form, but frequency distributions can be
graphs as well. Imagine a traffic engineer collecting data on intersections. Consider sample S
consisting of the monthly numbers of fatal automobile accidents at a particular intersection over a
period of sixteen months. If S = {4, 0, 1, 4, 2, 1, 1, 0, 0, 3, 0, 1, 0, 2, 0, 1} , then Figure A is a
frequency distribution of S in the form of a line graph, which is called a frequency polygon.
Figure A [frequency polygon of S; horizontal axis x, 0 to 4; vertical axis f, 0 to 7]
Besides the line graph above, frequency distributions can take the form of bar graphs. Bar
graphs represent discrete variables with rectangles whose heights represent the frequency of each
score. The widths of the bars in a bar graph are uniform (equal), and all the bars are separated by
gaps of uniform size. The bar graph below is a frequency distribution for S.
Bar Graph [bar graph of S; horizontal axis 0 to 4; vertical axis 0 to 7]
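The frequencies behind both the frequency polygon and the bar graph of S can be computed as in the sketch below. The text rendering with "#" characters is only a stand-in for a drawn graph, and the variable names are illustrative.

```python
from collections import Counter

# Sample S: monthly numbers of fatal accidents over sixteen months.
S = [4, 0, 1, 4, 2, 1, 1, 0, 0, 3, 0, 1, 0, 2, 0, 1]

freq = Counter(S)

# Each (x, f) pair is a vertex of the frequency polygon and the height of one bar.
points = [(x, freq[x]) for x in sorted(freq)]

# A crude text bar graph: one row per score, one '#' per occurrence.
bars = [f"{x} | {'#' * f} ({f})" for x, f in points]
print("\n".join(bars))
```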
Some frequency distributions show the relative frequency of the data values. A relative
frequency distribution shows the fraction or percentage of the data set represented by a data
value. The table below is a relative frequency distribution for sample
S = {4, 0, 1, 4, 2, 1, 1, 0, 0, 3, 0, 1, 0, 2, 0, 1} .
x |  0     1     2    3     4
f | 3/8  5/16  1/8  1/16  1/8
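The relative frequencies for S can be verified with a brief Python sketch. It uses the standard library's Fraction type so that the results stay exact fractions in lowest terms.

```python
from collections import Counter
from fractions import Fraction

S = [4, 0, 1, 4, 2, 1, 1, 0, 0, 3, 0, 1, 0, 2, 0, 1]
n = len(S)  # sixteen scores

# Relative frequency = frequency of the score divided by the number of scores.
relative = {x: Fraction(f, n) for x, f in sorted(Counter(S).items())}

for x, r in relative.items():
    print(x, r, float(r))

# The relative frequencies of all scores add up to 1.
print("sum:", sum(relative.values()))
```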
Figure B is a relative frequency distribution for sample S in bar graph form.
Figure B [relative frequency bar graph of S; horizontal axis 0 to 4; vertical axis 0 to 0.4 in steps of 0.05]
Figure C is a relative frequency distribution of sample S in circle graph form. Circle
graphs divide the area inside a circle into wedges whose sizes represent the relative frequencies
of the data values.
Figure C [circle graph of S; wedges labeled by score and relative frequency: 0, 37.50%; 1, 31.25%; 2, 12.50%; 3, 6.25%; 4, 12.50%]
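The wedge sizes of a circle graph follow directly from the relative frequencies. The sketch below computes the percentage for each score in S and, as an added step not stated in the lecture, the central angle of each wedge, assuming the full circle is 360 degrees.

```python
from collections import Counter

S = [4, 0, 1, 4, 2, 1, 1, 0, 0, 3, 0, 1, 0, 2, 0, 1]
n = len(S)

# For each score: (percentage of the data set, central angle of the wedge in degrees).
wedges = {x: (100 * f / n, 360 * f / n) for x, f in sorted(Counter(S).items())}

for x, (percent, angle) in wedges.items():
    print(f"score {x}: {percent}% of the circle, {angle} degrees")
```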
Another type of frequency distribution is a grouped frequency distribution. A grouped
frequency distribution shows the frequencies of ranges of values. The ranges of values are
sometimes referred to as "classes" or "bins." Consider set D, a set of lengths.
D = {3.1, 3.6, 2.9, 4.1, 4.2, 2.2, 2.6, 3.1, 3.3, 4.2, 4.4, 5.0, 2.9, 4.1, 4.6}
It could be advantageous to organize this data according to ranges of values. Figure D is a
grouped frequency distribution using four classes in histogram form. Histograms conveniently
represent continuous variables. Histograms use contiguous rectangles whose heights represent
the frequency of each class and whose widths represent the range of values in that class. The histogram below is a grouped
frequency distribution for set D.
Figure D [histogram of D; vertical axis f, 0 to 7; classes on the horizontal axis: 1.5 ≤ x < 2.5, 2.5 ≤ x < 3.5, 3.5 ≤ x < 4.5, 4.5 ≤ x < 5.5]
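The class frequencies for set D can be tallied with a short Python sketch. It assumes the four classes of Figure D, each including its lower limit and excluding its upper limit.

```python
# Set D: the lengths from the example above.
D = [3.1, 3.6, 2.9, 4.1, 4.2, 2.2, 2.6, 3.1, 3.3, 4.2, 4.4, 5.0, 2.9, 4.1, 4.6]

lower_limits = [1.5, 2.5, 3.5, 4.5]
width = 1.0

# Count the scores x that satisfy lower <= x < upper for each class.
grouped = {}
for lower in lower_limits:
    upper = lower + width
    grouped[(lower, upper)] = sum(1 for x in D if lower <= x < upper)

for (lower, upper), f in grouped.items():
    print(f"{lower} <= x < {upper}: f = {f}")
```

Every score falls in exactly one class, so the class frequencies add up to the fifteen scores in D.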
In histograms, the widths of the bars represent class range. A class is a division or subset
of the data that includes data that falls within a certain range of values. In Figure D, the numbers
1.5, 2.5, 3.5, and 4.5 represent lower class limits, the smallest possible data values for the four
classes respectively. The upper class limits for the four classes are 2.5, 3.5, 4.5, and 5.5
respectively. Here, the upper class limits represent a boundary on the largest possible data value
for each class. Class limits are either the most extreme possible data values in a class (the least or the greatest) or
a lower or upper boundary on the possible data values. The uniform class
width approximately equals the ratio of the range of the data and the desired number of classes.
class width ≈ (largest data value − smallest data value) / (desired number of classes)
Class limits should be chosen such that the uniform class width equals the difference of any two
successive upper class limits. The class width for Figure D is 1 as calculated here: 5.5 − 4.5 = 1 .
The class mark is the "middle" value of a class and can be calculated by dividing the sum of the
lower and upper limits of a class by two. The class marks for Figure D are calculated below.
(1.5 + 2.5) / 2 = 2,   (2.5 + 3.5) / 2 = 3,   (3.5 + 4.5) / 2 = 4,   (4.5 + 5.5) / 2 = 5
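The class width estimate and the class marks for Figure D can be worked through in Python. This is a sketch under two assumptions: the first lower class limit of 1.5 is taken from Figure D, and the estimated width is rounded up to the convenient value 1, which is a judgment call and not a fixed rule.

```python
D = [3.1, 3.6, 2.9, 4.1, 4.2, 2.2, 2.6, 3.1, 3.3, 4.2, 4.4, 5.0, 2.9, 4.1, 4.6]
num_classes = 4

# Estimate: range of the data divided by the desired number of classes (about 0.7 here).
estimated_width = (max(D) - min(D)) / num_classes

# Assumption: round the estimate up to 1, the width used in Figure D.
class_width = 1.0
first_lower_limit = 1.5  # taken from Figure D

# Build the (lower limit, upper limit) pair of each class.
limits = [(first_lower_limit + i * class_width, first_lower_limit + (i + 1) * class_width)
          for i in range(num_classes)]

# Class mark = (lower limit + upper limit) / 2.
marks = [(lower + upper) / 2 for lower, upper in limits]

print("estimated width:", round(estimated_width, 2))
print("class limits:", limits)
print("class marks:", marks)
```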
Assignment 1
Problems
#1
Using a complete sentence, identify each data set described below as a sample or a population.
A. A survey of five-hundred University of Texas students.
B. The age of each U. S. president upon election to office.
#2
Using a complete sentence, identify each numerical value described below as a statistic or a
parameter.
A. The average annual salary of thirty of a company's 1,500 employees is $76,000.00.
B. According to ACT, Inc., the average ACT math score for all graduates in a
particular year was 20.7.
#3
Consider the set of x-values: {5, 6, 9, 10} . Let n represent the cardinal number of the set.
A. Find ∑X.
B. Evaluate (∑X) / n.
#4
The table below shows the oil reserves (rounded to nearest billion of barrels) of countries in the
Western Hemisphere at a given time in history. Create a relative frequency distribution in the
form of a bar graph.
Country          Billions of Barrels
United States    36
Canada           16
Mexico           60
South America    72
Total            186
#5
The data set below represents the prices of grade A eggs (in dollars per dozen) for the indicated
years. Use a frequency polygon to display the data set.
Year    Price
1990    1.00
1991    1.01
1992    1.02
1993    0.98
1994    0.97
1995    0.94
1996    0.95
1997    0.93
1998    0.94
1999    0.94
2000    0.99
2001    1.02
#6
Use a pie chart to display the data. The numbers represent the number of Nobel Prize laureates
by country during the years from 1901 to 2002.
U. S.      270
U. K.      100
France     49
Sweden     30
Germany    77
Other      157
#7
Construct a frequency distribution in histogram form with six classes for the arrayed data. The
data set represents the amount in dollars (rounded to the nearest dollar) spent on books for a
semester by thirty students. Show the class marks (mid-points of each class).
91   188  189  266  190  30   472  341  127  248
398  354  279  266  8    101  88   222  249  199
526  375  269  93   530  142  184  486  43   352
#8
Fresh N Ready Sandwiches strives to provide fresh sandwiches and quick service. The table
below reflects the results of a recent survey of customers who entered the restaurant only to
leave before ordering. Use the data to create a Pareto diagram.
Reason for Leaving       Frequency
Long lines               38
Pricing                  6
Restaurant appearance    4
Other                    2
Total                    50
#9
Using complete sentences, explain why the following procedures do not give a random sample
for the entire population of Manhattan. Be sure to note any bias the procedure may
contain.
A. selecting every third woman entering a beauty shop in Gramercy Park
B. selecting every third person entering a bar in Chelsea
C. selecting every third person coming out of a boxing match at Madison Square Garden
#10
Using a complete sentence, identify each variable described below as discrete or continuous.
A. the number of times a telephone rings before it is answered
B. the amount of time a customer waits before service
C. the amount of oil held in reserve by a nation
D. the price of grade A eggs