Transcript
```Populations, Samples and Variable
Visual Displays for Univariate Data
Describing distributions
The Normal Distribution
Other Continuous Distributions
Several Useful discrete distributions
Chapter 1: Populations, Samples and Processes
Applied Probability and Statistics for Engineering and Science
Chapter 1: Populations, Samples and Processes
Outline of Chapter 1
1.1 Populations, Samples and Variables
1.2 Visual Displays for Univariate Data
1.3 Describing Distributions
1.4 The Normal Distribution
1.5 Other Continuous Distributions
1.6 Several Useful Discrete Distributions
Introduction
Statistical theory and techniques are powerful and
indispensable tools for understanding the world around
us.
These tools help one make intelligent judgments
and decisions in the presence of uncertainty and
variation.
Without uncertainty or variation, there would be little
need for statistical techniques and statisticians.
Populations
Sample
Branches of statistics
Populations
Engineers and scientists are constantly exposed to
collections of facts/data in their work.
A population is a well-defined collection of objects.
Examples:
Students in Class ECE08
People in Vietnam
...
Sample
When the desired information is available for all objects in the
population, we have what is called a census.
Practical constraints (e.g., money, time and other limited
resources) usually make a census impractical or infeasible.
Sample: a (random) subset of the population.
For instance, we might select a sample of last year's
engineering graduates to obtain feedback about the
quality of the engineering curricula.
Sample: variable
Variable: any characteristic whose value may change
from one object to another in the population. Examples:
X = gender of a graduating engineer, Y = age of a
graduating engineer, Z = temperature at a given time
of day.
Univariate data: observations are made on a single
variable.
Bivariate data: observations are made on each of two
variables.
Multivariate data: observations are made on more than
two variables.
Branches of statistics
Descriptive Statistics: methods to summarize and
describe important features of the data. Examples:
Graphical: constructing histograms, stem-and-leaf
displays, dotplots
Numerical: calculating summary measures such as means,
variances, correlation,...
Inferential Statistics: techniques for generalizing from a
sample to a population.
Stem-and-leaf displays
Dotplots
Histograms
Stem-and-leaf displays
Stem-and-leaf display: an effective way to organize
numerical data by splitting each observation into two parts:
Stem: one or more leading digits
Leaf: the remaining digits
The display can provide the following information:
Identification of a typical or representative value
Presence of any gaps in the data
Extent of symmetry in the distribution of values
Number and location of peaks
Stem-and-leaf displays: an example
In a given experiment, the observed values of the
variable are: 41, 43, 49, 52, 57, ..., 112, 114, 123.
[Figure: stem-and-leaf display of these values]
Dotplots
Dotplot: a useful summary of data when the data set is
reasonably small or there are relatively few distinct data
values.
Each observation is represented by a dot above the
corresponding location on a horizontal measurement
scale.
When a value occurs more than once, there is a dot for
each occurrence, and these dots are stacked vertically.
A dotplot gives information about location, spread,
extremes, and gaps.
Dotplot: an example
Here is an example to show what a dotplot looks like and how to interpret it. Suppose 30 first
graders are asked to pick their favorite color. Their choices can be summarized in a dotplot, as
shown below.
[Figure: dotplot of favorite colors; horizontal axis: Red, Orange, Yellow, Green, Blue, Indigo, Violet; one dot per student, 30 dots in total]
Each dot represents one student, and the number of dots in a column represents the number of first
graders who selected the color associated with that column. For example, Red was the most popular
color (selected by 9 students), followed by Blue (selected by 7 students). Selected by only 1
student, Indigo was the least popular color.
Histograms
To construct a histogram for:
Discrete data:
Determine the (relative) frequency of each x value in the
sample.
Mark the possible x values on a horizontal scale.
Above each value, draw a rectangle whose height is the
(relative) frequency of that value.
Continuous data:
Determine the (relative) frequency of each class.
Mark the class boundaries on a horizontal measurement
axis.
Above each class interval, draw a rectangle whose height
is the corresponding (relative) frequency.
Histogram: an example
[Figure: "Gaussian Histogram"; horizontal axis: variable value (−4 to 4); vertical axis: number of values in each interval (0 to 1500)]
Continuous distributions
Discrete distributions
Density function
A density function f(x) is used to describe
(approximately) the population distribution of a
continuous variable x.
The graph of f(x) is called the density curve.
The following properties of f(x) must be satisfied:
$f(x) \ge 0$
$\int_{-\infty}^{\infty} f(x)\,dx = 1$ (i.e., the total area under the density
curve is 1)
For any two numbers a and b with a < b, the proportion
of x values between a and b is $\int_a^b f(x)\,dx$.
Mass function
A mass function p(x) is used to describe (approximately)
the population distribution of a discrete variable x.
The following properties of p(x) must be satisfied:
$p(x) \ge 0$
$\sum_x p(x) = 1$
Definition
The standard normal distribution
Definition
A continuous variable x is said to have a normal distribution
with parameters µ and σ, where −∞ < µ < ∞ and σ > 0, if
the density function of x is
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty \qquad (1)$$
The normal distribution is the most important distribution
in statistics.
Many population and process variables have distributions
that can be very closely fit by an appropriate normal
curve.
The standard normal distribution
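The normal distribution with parameters µ = 0 and σ = 1 is
called the standard normal distribution. Its density function is
$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \quad -\infty < x < \infty$$
[Figure: standard normal density curve; horizontal axis: x (−6 to 6); vertical axis: f(x) (0 to 0.4)]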
The lognormal distribution
The Weibull distribution
Selecting an appropriate distribution
The lognormal distribution
A nonnegative variable x is said to have a lognormal
distribution if ln(x) has a normal distribution with
parameters µ and σ.
The density function of the lognormal distribution is
$$f(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma x} e^{-(\ln(x)-\mu)^2/(2\sigma^2)} & x > 0 \\ 0 & x \le 0 \end{cases} \qquad (2)$$
The lognormal distribution: an example
[Figure: lognormal distribution with µ = 4 and σ = 1; horizontal axis: x (0 to 500); vertical axis: density (0 to 0.014)]
The Weibull distribution
The distribution was introduced in 1939 by the Swedish
physicist Waloddi Weibull.
A variable x has a Weibull distribution with parameters α
and β if the density function of x is
$$f(x) = \begin{cases} \dfrac{\alpha}{\beta^{\alpha}} x^{\alpha-1} e^{-(x/\beta)^{\alpha}} & x > 0 \\ 0 & x \le 0 \end{cases} \qquad (3)$$
In recent years, the Weibull distribution has been used to
model engine emissions of various pollutants.
The Weibull distribution: an example
[Figure: Weibull density functions for β = 1 with α = 1, α = 1.5, and α = 5; horizontal axis: x (0 to 3); vertical axis: density function (0 to 2)]
Selecting an appropriate distribution
The choice of an appropriate distribution for a continuous
variable x is usually based on sample data.
An investigator must first decide whether a particular
family, such as the Weibull or the normal family, is
reasonable.
Then the parameters of the chosen family must be
estimated to find the particular member of the family that
best fits the data.
The Binomial distribution
The Poisson distribution
The Binomial distribution
Suppose that items or entities of some sort come in
batches or groups of size n.
Let ρ denote the proportion of all items in the population
or process that are satisfactory (S), so the proportion of
all items that are unsatisfactory (F) is 1 − ρ.
Assume the condition of any particular item (S or F) is
independent of that of any other item.
The Binomial distribution (cont.)
The binomial variable x is the number of S's in a batch or
group. The mass function of x is given by the formula
$$p(x) = \text{proportion of batches with } x \text{ S's} = \frac{n!}{x!\,(n-x)!}\, \rho^{x} (1-\rho)^{n-x}, \quad x = 0, 1, \ldots, n \qquad (4)$$
The binomial distribution is used extensively in genetic
applications.
Evaluating the binomial mass function by hand can be tedious when n is
large.
The Binomial distribution: a histogram
[Figure: binomial histogram; horizontal axis: x (0 to 8); vertical axis: proportion (0 to 0.35)]
The Poisson distribution
The Poisson distribution is usually used as a model for the
number of times an "event" of some sort occurs during a
specific time period or in a particular region of space.
The Poisson mass function with parameter λ > 0 is
$$p(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, 3, \ldots \qquad (5)$$
The Poisson distribution is used in telephone engineering.
The Poisson distribution: a histogram
[Figure: Poisson histogram with λ = 2; horizontal axis: x (0 to 8); vertical axis from 0 to 0.35]
```
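
The binomial and Poisson mass functions in equations (4) and (5) can be evaluated directly. The following is a minimal sketch that is not part of the original slides and uses only the Python standard library. The function names `binomial_pmf` and `poisson_pmf`, and the values n = 8 and ρ = 0.3, are assumptions chosen for illustration; λ = 2 matches the value shown on the Poisson histogram slide. The sketch also checks numerically the two mass function properties stated earlier, p(x) ≥ 0 and Σ p(x) = 1.

```python
import math


def binomial_pmf(x: int, n: int, rho: float) -> float:
    """Equation (4): proportion of batches of size n containing x S's."""
    if x < 0 or x > n:
        return 0.0
    # math.comb(n, x) equals n! / (x! (n - x)!)
    return math.comb(n, x) * rho**x * (1 - rho) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """Equation (5): Poisson mass function with parameter lam > 0."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam**x / math.factorial(x)


# Illustrative parameter values (assumptions, not taken from the slides,
# except lam = 2, which appears on the Poisson histogram slide).
n, rho, lam = 8, 0.3, 2.0

binom = [binomial_pmf(x, n, rho) for x in range(n + 1)]
# The Poisson support is infinite; 50 terms leave a negligible tail for lam = 2.
poisson = [poisson_pmf(x, lam) for x in range(50)]

# Both lists should be nonnegative and sum to (approximately) 1.
print("binomial total:", sum(binom))
print("Poisson total: ", sum(poisson))

for x in range(n + 1):
    print(f"x={x}  binomial p(x)={binom[x]:.4f}  Poisson p(x)={poisson[x]:.4f}")
```

Plotting either list as bars over x = 0, 1, 2, ... gives a histogram of the kind shown on the binomial and Poisson histogram slides.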