Download Set 3: Producing Data

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Business Statistics for Managerial
Decision
Producing Data
Producing Data



Numerical data are the raw material for
sound conclusions
Executives, investors, and managers want to
base their decisions on data rather than
relying on subjective impressions.
Statistics is concerned with producing data
as well as with interpreting already
available data.
Observational versus Experimental Studies


An observational study observes
individuals and measures variables of
interest but does not attempt to influence the
responses.
An experiment deliberately imposes some
treatment on individuals to observe their
responses.
Observational versus Experiment

For example;


To answer the question:


We want to know what percent of American adults
agree that the economy is getting better?
we interview American adults. We can’t afford to ask
all adults, so we put the question to a sample chosen to
represent the entire adult population
Sample surveys are one kind of observational
study.
Observational versus Experiment

To answer the question:



Which TV ad will sell more toothpaste?
We show each ad to a separate group of
consumers and note whether they buy the
tooth paste.
Experiments, like samples provide useful
data only when properly designed.
Population, Sample


The population in a statistical study is the
entire group of individuals about which we
want information.
A sample is part of the population from
which we actually collect information, used
to draw conclusions about the whole.
Simple Random Sample

A simple random sample(SRS) of size n
consists of n individuals from the
population chosen in such a way that every
set of n individuals has an equal chance to
be the sample actually selected.
Random Digits
A table of random digits is a long string of the
digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these two
properties:

1.
2.


Each entry in the table is equally likely to be any of
the 10 digits 0 through 9.
The entries are independent of each other. That is
knowledge of one part of the table gives no
information about any other part.
Table of Random Numbers (Table B) of the
textbook is an example of a random digits table.
This table may be used to draw random samples.
Choosing an SRS
Choose an SRS in two steps:

1.
2.
Label, assign a numerical label to every
individual in the population.
Use table of random numbers to select labels
at random.
Stratified Random Sample
To select a stratified random sample:

1.
2.

first divide the population into groups of similar
individuals, called strata.
Choose a separate SRS in each stratum and combine
these SRS to form the full sample.
A stratified design can produce more exact
information than an SRS of the same size by
taking advantage of the fact that individuals in
the same stratum are similar to one another.
Under-Coverage and Non-Response


Under-coverage occurs when some groups
in the population are left out of the process
of choosing the sample.
Non-response occurs when an individual
chosen for the sample can’t be contacted or
refuses to cooperate.
Designing Experiments



The individuals studied in an experiment are often
called subjects, especially if they are people.
The explanatory variables in an experiment are
often called factors.
A treatment is any specific experimental condition
applied to the subjects. If an experiment has
several factors, a treatment is a combination of
each of the factors.
Example

A chemical engineer is designing the production
process for a new product. The chemical reaction
that produces the product may have higher or
lower yield, depending on the temperature and the
stirring rate in the vessel in which the reaction
takes place. The engineer decides to investigate
the effects of combination of two temperatures
(50°C and 60°C) and three stirring rates (60 rpm,
90 rpm, and 120 rpm) on the yield of the process.
She will produce two batches of the product at
each combination of temperature and stirring rate.
Example




What are the individuals and the response
variable in this experiment?
How many factors are there?
How many treatments?
How many individuals are required for the
experiment?
Designing Experiments

Completely
Randomized Design

In a completely
randomized
experimental design,
all the subjects are
allocated at random
among all the
treatments
Principles of Experimental Design
1.
2.
3.
Control the effect of the lurking variables
on the response, most simply by
comparing two or more treatments
Randomize- use impersonal chance to
assign subjects to treatments
Replicate each treatment to enough
subjects to reduce chance variation in the
results.
Example

Many utility companies have introduced
programs to encourage energy conservation
among their costumers. An electric
company considers placing electronic
indicators in households to show what the
cost would be if the electricity use at the
moment continued for a month. Would
indicators reduce electricity use? Would
cheaper methods work almost as well?
Example




One cheaper approach is to give customers a chart
and information about monitoring their electricity
use.
The experiment compares these two approaches
(indicator, chart) and also a control.
The control group of the customers receives
information about energy conservation but no help
in monitoring electricity use.
The company finds 60 single-family residences in
the same city willing to participate.
Example
Designing Experiments

Matched pair Design



Compares just two treatments. Choose pair of
subjects that are as closely matched as possible.
Assign one of the treatment to each subject
randomly.
Sometimes each “pair in a matched pairs design
is one subject. In this model each subject serves
as his or her own control.
Designing Experiments

Block Design


A block is a group of subjects that are known
before the experiment to be similar in some
way expected to affect the response to the
treatments.
In a block design, the random assignment of
individuals to treatment is carried out separately
within each block.
Statistical Inference




A market research firm interviews a random
sample of 2500 adults. Results: 66% find shopping
for cloths frustrating and time consuming.
That is the truth about the 2500 people in the
sample.
What is the truth about almost 210 million
American adults who make up the population?
Since the sample was chosen at random, it is
reasonable to think that these 2500 people
represent the entire population pretty well.
Statistical Inference



Therefore, the market researchers turn the fact that
66% of sample find shopping frustrating into an
estimate that about 66% of all adults feel this way.
Using a fact about a sample to estimate the truth
about the whole population is called statistical
inference.
To think about inference, we must keep straight
whether a number describes a sample or a
population.
Parameters and Statistics

A parameter is a number that describes the
population.


A parameter is a fixed number, but in practice we do
not know its value.
A statistic is a number that describes a sample.


The value of a statistic is known when we have taken a
sample, but it can change from sample to sample.
We often use statistic to estimate an unknown
parameter.
Example

A public opinion poll in Ohio wants to determine
whether registered voters in the state approve of a
measure to ban smoking in all public areas. They
select a simple random sample of 50 registered
voters from each county in the state and ask
whether they approve or disapprove of the
measure. The proportion of registered voters in
the state who approve of banning smoking in
public areas is an example of (parameter, or
statistic)
Example

A survey conducted by the marketing
department of Black Flag asked whether the
purchasers of a new type of roach disk
found it effective in killing roaches.
Seventy-nine percent of the respondents
agreed that the roach disk was effective.
The number 79% is a (parameter, or
statistic)
Example



In the marketing research example, the survey
asked a nationwide random sample of 2500 adults
if they agreed or disagreed that “ I like buying new
cloths, but shopping is often frustrating and time
consuming.”
Of the respondents, 1650 said they agreed.
The proportion of the sample who agreed that
cloths shopping is often frustrating is:
1650
Pˆ 
 .66  66%
2500
Example



The number P̂ = .66 is a statistic.
The corresponding parameter is the
proportion (call it P) of all adult U.S.
residents who would have said “agree” if
asked the same question.
We don’t know the value of parameter P, so
we use as
P̂ its estimate.
Sampling Variability, Sampling Distribution




If the marketing firm took a second random
sample of 2500 adults, the new sample would
have different people in it.
It is almost certain that there would not be exactly
1650 positive responses.
That is, the value of P̂ will vary from sample to
sample.
Random samples eliminate bias from the act of
choosing a sample, but they can still be wrong
because of the variability that results when we
choose at random.
Sampling Variability, Sampling Distribution



The first advantage of choosing at random is that
it eliminates bias.
The second advantage is that if we take lots of
random samples of the same size from the same
population, the variation from sample to sample
will follow a predictable pattern.
All statistical inference is based on one idea: to
see how trustworthy a procedure is, ask what
would happen if we repeated it many times.
Sampling Variability, Sampling Distribution



Suppose that exactly 60% of adults find shopping
for cloths frustrating and time consuming.
That is, the truth about the population is that
P = 0.6.
What if we select an SRS of size 100 from this
population and use the sample proportion P̂ to
estimate the unknown value of the population
proportion P?
Sampling Variability, Sampling Distribution

To answer this question:




Take a large number of samples of size 100
from this population.
Calculate the sample proportion P̂ for each
sample.
Make a histogram of the values of P̂ .
Examine the distribution displayed in the
histogram for shape, center, and spread, as well
as outliers or other deviations.
Sampling Variability, Sampling Distribution



We can not afford to actually take many
samples from a large population such as all
adult U.S. residents.
We can imitate many samples by using
random digits.
Using random digits from a table or
computer software to imitate chance
behavior is called simulation.
Sampling Variability, Sampling Distribution



The result of many SRS have a regular pattern.
Here we draw 1000 SRS of size 100 from the same population.
The histogram shows the distribution of the 1000 sample proportions P̂
Sampling Distribution

The sampling distribution of a statistic is
the distribution of values taken by the
statistic in all possible samples of the same
size from the same population.
Sampling Distribution



The distribution of sample proportionsP̂ for 1000 SRS of size 2500
drawn from the same population as in previous figure.
The two histograms have the same scale
The statistic from larger sample is less variable.
Bias and Variability

Bias concerns the center of the sampling
distribution.


A statistic used to estimate a parameter is unbiased if
the mean of its sampling distribution is equal to the true
value of the parameter being estimated.
The variability of a statistic is described by the
spread of its sampling distribution.


This spread is determines by sampling design and the
sample size n
Statistics from larger samples have smaller spread
(variability).
Bias and Variability
Managing Bias and Variability

To reduce Bias:



use random sampling. When we start with a list of the
entire population, simple random sampling produces
unbiased estimates
The value of a statistic computed from an SRS neither
consistently overestimate nor consistently
underestimate the value of the population parameter.
To reduce variability of a statistics from an SRS

Use a larger sample. You can make the variability as
small as you want by taking a large enough sample.
Sampling Distribution of sample mean

The sampling
distribution of sample
mean X for 1000
SRSs of size 10 from
the domestic gross
sales (millions of
dollars) of all movies
released in The U.S. in
the 1990s.
X
Sampling Distribution of sample mean


The distribution of sample
means X for 1000 SRSs
of size 100 from the
domestic gross sales
(millions of dollars) of all
movies released in the
U.S. in the 1990s.
Note the change in
variability and shape of
the distribution.
X
Probability and sampling distribution




What is the mean income of households in the
United States?
The Bureau of Labor Statistics contacted a random
sample of 55,000 households in March 2001 for
the current population survey.
The mean income of the 55,000 households for the
year 2000 was X  $57,045.
$57,045 is a statistic that describes the CPS
(Current Population Survey) sample households.
Probability and sampling distribution




We use sample mean to estimate an unknown
parameter, the mean income of all 106 million
American households.
We know that X would take several different
values if the Bureau of Labor Statistics had taken
several samples in March 2001.
We also know that this sampling variability
follows a regular pattern that can tell us how
accurate the sample result is likely to be.
That pattern obeys the laws of probability.
The Idea of Probability




Toss a coin, or choose a SRS. The result can not
be predicted in advance, because the result will
vary when you toss the coin or choose the sample
repeatedly.
But there is still a regular pattern in the results, a
pattern that emerges only after many repetitions.
Chance behavior is unpredictable in the short run
but has a regular and predictable pattern in the
long run.
This fact is the basis for the idea of probability.
The Idea of Probability



The proportion of tosses
of a coin that give a head
changes as we make more
tosses.
Eventually , however, the
proportion approaches 0.5,
the probability of a head.
This figure shows the
results of two trials of
5000 tosses.