Why should you study Statistics?
to enhance your ability to comprehend statistical jargon presented by the media and retailers
to enable you to apply basic statistical techniques
to help you realize that statistics is an important tool
that is needed in business and industry
- estimate the percentage of defective light bulbs in
a large warehouse based on testing only a small
portion of the bulbs (Chapter 7)
- "...lax quality control were basic causes" for the
deaths of motorists in Ford Explorers with Firestone tires (Chapter 12)
- predict the amount of sales for a company based
on how much the company spent on advertising
(Chapters 10, 11)
Two broad applications of Statistics
Example:
Collect the GPAs of 25 sophomores
sampled from those enrolled in Stat 227.
Descriptive Statistics
Compute sample statistics (e.g., mean, variance), construct histograms, etc., and report these just for the 25
sophomores.
Here numerical and graphical summaries are used just to
describe the sample at hand.
Inferential Statistics
Compute sample statistics, construct histograms, etc., and
use statistical methods to make a general statement about
all sophomores enrolled in Stat 227 (perhaps even in other
years).
Here we use a sample to make statements (or draw conclusions) about the whole population.
A population is the set of all units of interest in a study.
Examples of populations:
all employed workers in the U.S.
all registered voters in CA.
everyone who purchased a domestic minivan
last year.
all year 2000 transactions at the Memorial
Union ATM machine.
A variable is a characteristic of the units of a population.
A variable is either
quantitative
- age or income of workers,
- lifetime of bulbs,
- rating given to the minivan,
- value of each sale
or qualitative
- gender,
- political party,
- state of residence
A measurement is the process of assigning numbers to
the variables of the individual units.
Example:
Suppose that a researcher is interested in the percent of
all ISU students who are overweight.
a). Population: All ISU students (the
units are the students)
b). Variable: Weight.
c). Measurement: Record weight in pounds
using a scale.
If you had the resources to obtain the
weights of all ISU students, then you would have a census.
A census is obtained when every unit in a population is
measured.
However, in most studies, obtaining a census is too expensive, too time consuming, or just not reasonable.
For example, it is not reasonable to measure the diameter
of every ball bearing in a lot of 500,000 made in a day by a
machine.
A possible solution is to take a sample of the population.
A sample is a subset of a population.
How should you select a sample?
Some ways of selecting a sample are better than others for
making valid statistical statements about the population.
Your goal must be to obtain a sample that is representative of the population.
Example: Weight of ISU students (continued)
Methods for obtaining a sample:
i). Weigh the offensive linemen of the football team.
What's bad about the sample here?
Sample not representative of ISU students
Average sample weight will over-estimate
the average population weight.
ii). Weigh the first 50 students who enter the library
at 7:30 in the morning.
Is this sample better than the first?
Yes, the sample is more representative of the
student body, plus you measure more units
in this subset.
However, you restrict your sample to students who go to the library early mornings.
iii). Randomly select 50 students for weighing
from the official enrollment list.
Is this sample better than the others?
Yes, every student has an equal chance
of being selected.
Hence, this sample is more representative of the population.
The better sampling schemes have some form of randomness involved in the selection process.
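The difference between schemes (i) and (iii) can be sketched with a small simulation. The population below is entirely hypothetical (simulated weights, not ISU data), and taking the 50 heaviest students is only a stand-in for weighing the offensive linemen.

```python
import random
import statistics

# Hypothetical population: simulated weights (lbs) for 10,000 students.
# These numbers are made up for illustration; they are not ISU data.
rng = random.Random(227)
population = [rng.gauss(160, 30) for _ in range(10_000)]

# Scheme (i): a convenience sample of the 50 heaviest students,
# standing in for "weigh the offensive linemen".
biased_sample = sorted(population)[-50:]

# Scheme (iii): a simple random sample of 50 students, where every
# student on the list has the same chance of being selected.
random_sample = rng.sample(population, 50)

population_mean = statistics.mean(population)
biased_mean = statistics.mean(biased_sample)
random_mean = statistics.mean(random_sample)

print(f"population mean:    {population_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
print(f"random sample mean: {random_mean:.1f}")
```

Under these assumptions the biased sample over-estimates the population average, while the random sample lands much closer to it.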
After the sample is obtained, measurements are made, and
statistics and/or graphs are computed, one can make a
statement about the population.
Drawing conclusions and making decisions from
data.
A statistical inference is an estimate, prediction, or
other generalization about a population based on information contained in a sample.
When one obtains a census, there is no inference involved
and all averages are exact.
However, when we obtain a sample and make inferences
about a population, the conclusions are not exact.
Hence, we must attach a measure of reliability to any inference we make. A measure of reliability is a quantified statement about the degree of uncertainty associated
with a statistical inference.
Example (continued)
After computing the average weight of students in the
random sample, one can use statistics to make a statement
about the average weight of all ISU students.
The average weight of the random sample of 50 students
will not exactly equal the unknown average of the ISU
student population.
For example, we may be able to state the degree of certainty involved as:
"the population average weight is within (150 ± 10) lbs
with a 1% chance of error."
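One common way to produce a statement of this form is a large-sample normal approximation, sketched below. This is an illustration under stated assumptions, not necessarily the method developed later in the course: the 50 weights are simulated, and the multiplier comes from the standard normal distribution.

```python
import math
import random
import statistics

# Hypothetical sample: 50 simulated student weights in pounds (made-up data).
rng = random.Random(7)
sample = [rng.gauss(150, 30) for _ in range(50)]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # divides by n - 1

# Assumption: a large-sample normal approximation with a 1% chance of error,
# so 0.5% is left in each tail of the standard normal distribution.
chance_of_error = 0.01
z = statistics.NormalDist().inv_cdf(1 - chance_of_error / 2)

# Half-width of the interval around the sample mean.
margin = z * sample_sd / math.sqrt(n)
print(f"population average weight is within ({sample_mean:.1f} +/- {margin:.1f}) lbs")
```

A smaller chance of error gives a larger multiplier and therefore a wider interval; a larger sample gives a narrower one.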
Describing Data
Graphical Methods
Bar Graphs (Section 2.1)
Histograms (Section 2.2)
Scatterplots (Section 2.9)
Time Series plot (Section 2.10)
Numerical Summaries
Measures of Central Tendency (Section 2.4)
Mean
Median
Measures of Variability (Section 2.5)
Range
Variance
Standard deviation
Measures of Relative Standing (Section 2.7)
z-score
Percentile
A measure of relative standing compares an individual
object to the rest of the population. For example, if you
scored at the 75th percentile on the SAT standardized test,
this means that you did better than 75% of the people
who took the test.
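Both measures of relative standing can be sketched in a few lines. The helper names `z_score` and `percent_below` and the test scores are hypothetical, chosen only for illustration; the z-score here uses the sample standard deviation.

```python
import statistics

def z_score(value, data):
    """Number of sample standard deviations that value lies from the sample mean."""
    return (value - statistics.mean(data)) / statistics.stdev(data)

def percent_below(value, data):
    """Percentage of observations in data that are strictly less than value."""
    return 100 * sum(x < value for x in data) / len(data)

# Hypothetical test scores, made up for illustration.
scores = [400, 450, 500, 520, 560, 600, 640, 700]

print(z_score(600, scores))        # positive: 600 is above the mean
print(percent_below(600, scores))  # share of scores that 600 beats
```

A positive z-score means the observation is above the mean, a negative one means it is below.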
Sample Statistics vs. Parameters
A parameter is a descriptive statistic of a population.
A sample statistic is a descriptive statistic of a sample.
Take a census of a population, recording the values of a variable as $x_1, x_2, \ldots, x_N$, where $N$ is the population size;
then
Population mean, $\mu = \frac{\sum x}{N}$
Population variance, $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Population standard deviation, $\sigma = \sqrt{\sigma^2}$
Since censuses are not taken for every population we wish
to study, parameters cannot be calculated exactly and
are thus usually unknown.
Take a sample from a population, recording the values of a
variable as $x_1, x_2, \ldots, x_n$, where $n$ is the sample size;
then
sample mean, $\bar{x} = \frac{\sum x}{n}$
sample variance, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
sample standard deviation, $s = \sqrt{s^2}$
$\mu$, $\sigma^2$, and $\sigma$ are parameters.
$\bar{x}$, $s^2$, and $s$ are sample statistics.
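The two sets of formulas differ only in the divisor, which a short sketch makes concrete. It uses the five values from the worked example in this section and, purely for comparison, also treats the same numbers as if they were a whole population.

```python
import math
import statistics

data = [2, 3, 3, 4, 3]
n = len(data)

# Treating the data as a sample: divide by n - 1.
sample_mean = sum(data) / n
sample_var = sum((x - sample_mean) ** 2 for x in data) / (n - 1)
sample_sd = math.sqrt(sample_var)

# Treating the same numbers as a whole population: divide by N.
pop_var = sum((x - sample_mean) ** 2 for x in data) / n
pop_sd = math.sqrt(pop_var)

# The standard library implements both conventions.
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(pop_var, statistics.pvariance(data))

print(sample_mean, sample_var, round(sample_sd, 2))
print(pop_var, round(pop_sd, 2))
```

In Python's `statistics` module, `variance` and `stdev` use the sample divisor n - 1, while `pvariance` and `pstdev` use the population divisor N.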
An Example
Calculate the sample mean, variance, and standard deviation for data from Example 2.8

i       x      x^2
1       2      4
2       3      9
3       3      9
4       4      16
5       3      9
Sum     15     47

$\sum x = 15$, $\sum x^2 = 47$

Compute the sample mean $\bar{x}$.
$\bar{x} = \frac{\sum x}{n} = \frac{15}{5} = 3$

Compute sample variance $s^2$ and sample standard deviation $s$.
$s^2 = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1} = \frac{47 - (15)^2 / 5}{5 - 1} = \frac{2}{4} = 0.5$
$s = \sqrt{s^2} = \sqrt{0.5} = 0.71$

JMP Analysis of Example from Chapter 2
Percentage of Revenues Spent on R & D
[Figure: relative frequency histogram of R & D percentage; horizontal axis from 5.5 to 13.2, vertical axis (relative frequency) from 0.10 to 0.40]
The distribution of the R & D percentages is slightly
skewed to the right. Note that the Mean lies to the right of
the Median. The standard deviation s is 1.98. In the text it
is shown that approximately 94% of the data values are
within 2 standard deviations of the mean.

Quantiles
100.0%   maximum    13.500
99.5%               13.500
97.5%               13.500
90.0%               11.280
75.0%    quartile    9.625
50.0%    median      8.050
25.0%    quartile    7.050
10.0%                6.500
2.5%                 5.310
0.5%                 5.200
0.0%     minimum     5.200

Moments
Mean              8.492000
Std Dev           1.980604
Std Err Mean      0.280100
upper 95% Mean    9.054880
lower 95% Mean    7.929120
N                50.000000
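Two of the reported quantities can be checked directly from the Moments output in this section. The sketch below assumes the usual definition of the standard error of the mean, s / sqrt(n), which the slides do not state explicitly, and it also computes the interval that lies within 2 standard deviations of the mean.

```python
import math

# Values copied from the Moments output.
mean = 8.492000
std_dev = 1.980604
n = 50

# Standard error of the mean, assuming the usual definition s / sqrt(n).
std_err = std_dev / math.sqrt(n)

# Interval covering 2 standard deviations on either side of the mean.
low, high = mean - 2 * std_dev, mean + 2 * std_dev

print(f"Std Err Mean: {std_err:.4f}")
print(f"mean +/- 2 s: ({low:.2f}, {high:.2f})")
```

The computed standard error agrees with the reported Std Err Mean to four decimal places.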