Download chapt_9 - Gordon State College

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
Statistics
Dealing With Uncertainty
Objectives





Describe the difference between a sample and a
population
Learn to use descriptive statistics (data sorting,
central tendency, etc.)
Learn how to prepare and interpret histograms
State what is meant by normal distribution and
standard normal distribution.
Use Z-tables to compute probability.
Statistics

“There are lies, d#$& lies,
and then there’s statistics.”
Mark Twain
Statistics is...


a standard method for...
- collecting, organizing, summarizing,
presenting, and analyzing data
- drawing conclusions
- making decisions based upon the
analyses of these data.
used extensively by engineers (e.g.,
quality control)
Populations and Samples

Population - complete set of all of the
possible instances of a particular object


Sample - subset of the population


e.g., the entire class
e.g., a team
We use samples to draw conclusions
about the parent population.
Why use samples?




The population may be large
 all people on earth, all stars in the sky.
The population may be dangerous to observe
 automobile wrecks, explosions, etc.
The population may be difficult to measure
 subatomic particles.
Measurement may destroy sample
 bolt strength
Team Exercise: Sample Bias


To three significant figures, estimate the
average age of the class based upon
your team.
When would a team not be a
representative sample of the class?
Measures of Central Tendency

If you wish to describe a population (or a
sample) with a single number, what do you
use?



Mean - the arithmetic average
Mode - most likely (most common) value.
Median - “middle” of the data set.
What is the Mean?

The mean is the sum of all data values
divided by the number of values.
Sample Mean
1
x 
n
n
x
i 1
Where:
 x is the sample mean


xi are the data points
n is the sample size
i
Population Mean
  1
N
N
x
i 1
i
Where:

μ is the population mean

xi are the data points

N is the total number of observations in the
population
What is the Mode?

mode - the value that occurs the most
often in discrete data (or data that have
been grouped into discrete intervals)

Example, students in this class are
most likely to get a grade of B.
Mode continued

Example of a grade distribution with
mean C, mode B
25
20
15
10
5
0
F
D
C
B
A
What is the Median?

Median - for sorted data, the median is
the middle value (for an odd number of
points) or the average of the two middle
values (for an even number of points).
 useful to characterize data sets with a
few extreme values that would distort
the mean (e.g., house price,family
incomes).
What Is the Range?

Range - the difference between the
lowest and highest values in the set.

Example, driving time to Houston is 2
hours +/- 15 minutes. Therefore...
 Minimum = 105 min
 Maximum = 135 minutes
 Range = 30 minutes
Standard Deviation

Gives a unique and unbiased
estimate of the scatter in the data.
Standard Deviation

Population
 

1
N
N
2
(
x


)
 i
Variance = 2
i 1
Sample
n
1
2
s
(
x

x
)

i
(n  1) i 1
Deviation
Variance = s2
The Subtle Difference
Between  and σ
N versus n-1
n-1 is needed to get a better estimate of
the population  from the sample s.
Note: for large n, the difference is trivial.
A Valuable Tool


Gauss invented standard deviation circa
1700 to explain the error observed in
measured star positions.
Today it is used in everything from
quality control to measuring financial
risk.
Team Exercise

In your team’s bag of M&M candies,
count




the number of candies for each color
the total number of candies in the bag
When you are done counting, have a
representative from your team enter
your data on the board
Using Excel, enter the data gathered by
the entire class
More
Team Exercise (con’t)
For each color, and the total number of
candies, determine the following:
maximum
minimum
range
mean
mode
median
standard deviation
variance
Individual Exercise:
Histograms




Flip a coin EXACTLY ten times. Count
the number of heads YOU get.
Report your result to the instructor who
will post all the results on the board
Open Excel
Using the data from the entire class,
create bar graphs showing the number
of classmates who get one head, two
heads, three heads, etc.
Data Distributions



The “shape” of the data is described by
its frequency histogram.
Data that behaves “normally” exhibit a
“bell-shaped” curve, or the “normal”
distribution.
Gauss found that star position errors
tended to follow a “normal” distribution.
The Normal Distribution

The normal distribution is sometimes
called the “Gauss” curve.
1
RF 
 2
1
2
 x   /  2
e 2
mean
RF
Relative
Frequency
x
Standard Normal Distribution
Define:
z  x    / 
Area = 1.00
Then
RF 
0.5
1 2
 z
e 2
0.4
0.3
0.2
2
0.1
0.0
-4.0
-3.0
-2.0
-1.0
0.0
z
1.0
2.0
3.0
4.0
Some handy things to know.



50% of the area lies on each side of the
mid-point for any normal curve.
A standard normal distribution (SND)
has a total area of 1.00.
“z-Tables” show the area under the
standard normal distribution, and can
be used to find the area between any
two points on the z-axis.
Using Z Tables
(Appendix C, p. 624)

Question: Find the area between z= -1.0
and z= 2.0

From table, for z = 1.0, area = 0.3413
By symmetry, for z = -1.0, area = 0.3413
From table, for z= 2.0, area = 0.4772
Total area = 0.3413 + 0.4772 = 0.8185

“Tails” area = 1.0 - 0.8185 = 0.1815



“Quick and Dirty” Estimates of
 and 





 @ (lowest + 4*mode + highest)/6
For a standard normal curve, 99.7% of
the area is contained within ± 3  from
the mean.
Define “highest” =  + 3 
Define “lowest” =   3 
Therefore,  @ (highest - lowest)/6
Example:
Drive time to Houston



Lowest = 1 h
Most likely = 2 h
Highest = 4 h (including a flat tire, etc.)



 = (1+4*2+4)/6 = 2.16 (2 h 12 min)
 = (4 - 1)/6= 0.5 h
This technique (Delphi) was used to
plan the moon flights.
Team Exercise


You want to put a scale on your
rubber-band car to relate a given scale
setting and an expected distance
traveled.
Design an experiment to establish a
scale for your car.
More
Team Exercise continued.

Some Issues to consider:



Sample size
Range of distances
Desired accuracy
Review

Central tendency




Scatter




mean
mode
median
range
variance
standard deviation
Normal Distribution