Elec471 Embedded Computer Systems
Chapter 4, Probability and Statistics
By Prof. Tim Johnson, PE
Wentworth Institute of Technology
Boston, MA
Theory and Design for Mechanical Measurement
by Richard Figliola
Content
• Introduction
• Statistical Measurement Theory
• Infinite & Finite Statistics
• Chi-squared distribution
• Regression Analysis
Introduction
• For any set of measurement data an average
and standard deviation from the average can be
determined.
• The question is: how closely does this average
represent all the measurements in the set?
• Would a different set of measurements be
exactly the same?
• Do the variations meet the tolerances?
• How well do the results describe the
measurement?
Statistical Goals
1. A single value that best characterizes the
average of the data set.
2. A value that gives the variation in the data
set from the average.
3. A probability that indicates how well the
single average value represents the true
average value of the variable measured.
Statistical Measurement Theory
• Definition: a sample is a set of data obtained
during repeated measurements of a variable
under fixed operating conditions.
• An assumption is that systematic error in the
measurement is negligible—the average error in
a data set is zero.
• The true value is denoted $x'$. The average, also known as
the mean, is denoted $\bar{x}$. The uncertainty interval is $u_x$.
The probability level is $P\%$.
$$x' = \bar{x} \pm u_x \quad (P\%)$$
Random Variables
• One characteristic about measurements is a random
scattering of the values obtained that collect around a
central value. This behavior is called central tendency.
• In this sense, the measured variable behaves as a
random variable.
• If the variable is continuous in time or space then it is a
continuous random variable.
• If the variable can take only discrete values, then it
is a discrete random variable.
• Probability deals with the concept that certain values
of a variable will repeat with some frequency of
occurrence.
Probability Density Functions
• The accumulation of the data points repeating
about a central point creates a density that
occurs with a certain probability.
• The central value and those values scattered
about it can be determined from the
probability density of the measured variable.
• The frequency with which the measured
variable assumes a particular value is
described by its probability density.
Problem Example 4.1
for small data sets
• This problem develops a statistical analysis of the
data set from 20 sample measurements.
• Each sample, $x_i$, is numbered sequentially with i from 1 to
20, where 20 is the total number of samples, N.
• The condition for this sampling is that the readings are
taken under identical operating conditions.
Problem example continued
• The data is grouped into K small intervals…
• Each interval is centered on a value $x$ and contains the data points $x_i$ with:
$$x - \delta x \le x_i \le x + \delta x$$
• The width of each interval, $2\delta x$, is determined by the formula:
$$2\delta x = \frac{x_{max} - x_{min}}{K}$$
• Rule: at least one interval has ≥5 members.
• A formula to calculate K is:
$$K = 1.87(N - 1)^{0.40} + 1$$
• As N becomes large, $K \approx \sqrt{N}$ works reasonably well.
• $n_j$ represents the number of data points in each
interval, where j = 1 to K.
Problem example continued
$$\sum_{j=1}^{K} n_j = N$$
• The formula above states that the sum of the number of
occurrences in each interval is equal to the total number of
samples.
• Let $f_j = n_j/N$ be the frequency of occurrence in each
interval. Then the area under the percent frequency
distribution curve will always equal the total frequency of
occurrence, or 100%:
$$100 \times \sum_{j=1}^{K} f_j = 100\%$$
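The interval count and the frequency bookkeeping above can be sketched in Python. This is only a minimal sketch under stated assumptions: the `readings` list is hypothetical (it is not the data of Table 4.1), the function names are illustrative, K is rounded to the nearest integer, and the range of the data is simply split into K equal-width intervals.

```python
def num_intervals(n):
    """Estimate the number of intervals, K = 1.87 (N - 1)^0.40 + 1, rounded."""
    return round(1.87 * (n - 1) ** 0.40 + 1)


def frequency_distribution(data):
    """Return (counts, freqs): n_j and f_j = n_j / N for each of the K intervals.

    Assumes the readings are not all identical (the range must be nonzero).
    """
    n = len(data)
    k = num_intervals(n)
    x_min, x_max = min(data), max(data)
    width = (x_max - x_min) / k  # width of one interval
    counts = [0] * k
    for x in data:
        # The largest reading would fall just past the last interval, so clamp it.
        j = min(int((x - x_min) / width), k - 1)
        counts[j] += 1
    freqs = [c / n for c in counts]
    return counts, freqs


# Hypothetical readings, for illustration only (N = 20).
readings = [4.98, 5.07, 4.86, 5.16, 4.96, 4.68, 5.34, 5.04, 5.21, 4.86,
            5.02, 5.26, 5.08, 5.02, 4.94, 5.11, 4.99, 4.78, 5.06, 4.96]

counts, freqs = frequency_distribution(readings)
print("K =", len(counts))
print("n_j =", counts, "-> sum =", sum(counts))
print("100 * sum(f_j) =", 100 * sum(freqs))
```

For N = 20 the formula gives K of about 7, and the counts always add up to N regardless of how the readings are distributed.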
Probability Density Function (PDF)
formula for this example
[Figure 4.2: Histogram and frequency distribution for data in Table 4.1]
$$p(x) = \lim_{N \to \infty,\ \delta x \to 0} \frac{n_j}{N(2\delta x)}$$
The probability density function, p(x), above defines the probability that
a measured variable might assume a particular value upon any
individual measurement. It also graphically displays the central tendency,
the interval that contains the best estimate of the true mean value.
Other Types of Distributions
Normal: used for most physical
properties that are continuous or regular
in time or space, with variations due to
random error.
Log normal: used for failure or durability
projections; events whose outcomes
tend to be skewed toward the extremity
of the distribution.
Poisson: used for events that
occur randomly in time; p(x) refers
to the probability of observing x events
over time.
Types of Distributions, cont'd
Weibull: used in fatigue tests; similar
to log normal applications.
Binomial: used in situations describing
the number of occurrences, n, of a
particular outcome during N independent
tests, where the probability, P, of that
outcome is the same in each test.
Rule
• Regardless of the type of distribution a
variable can be described by its mean value
and variance.
Calculation of the true mean value x’
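$$x' = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} x(t)\,dt \approx \int_{-\infty}^{\infty} x\,p(x)\,dx$$
If the measured variable is described by discrete data $x_i$, where i = 1 to N:
$$x' = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i$$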
Calculation of the true variance, $\sigma^2$,
or the width of the data variation
$$\sigma^2 = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} \left[x(t) - x'\right]^2 dt \approx \int_{-\infty}^{\infty} (x - x')^2\,p(x)\,dx$$
Or for discrete data:
$$\sigma^2 = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} (x_i - x')^2$$
The standard deviation, $\sigma$, is defined as the square root of the variance.
So the last step in calculating the standard deviation is to
take the square root of the variance: $\sigma = \sqrt{\sigma^2}$
Infinite Statistics
• There are some fundamental difficulties in
working with infinite sets…indicated in the
integrals calculating the mean value and
variance.
• Infinite statistics introduces the connection
between probability and statistics.
• One useful distribution used to introduce
infinite statistics is the normal or Gaussian
distribution.
Gaussian Distribution
• This is a data set that is symmetrical about the central
tendency such as the familiar bell curve.
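• The PDF of a Gaussian distribution is:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\,\frac{(x - x')^2}{\sigma^2}\right]$$
• Let $\beta = \frac{x - x'}{\sigma}$ be the standardized normal
deviation. The corresponding z variable, $z_1 = (x_1 - x')/\sigma$,
specifies an interval on p(x).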
[Figure 4.3, page 118]
How to use this chart:
• If (x1 − x’)/σ = 1.00, then p(z1) = 0.3413. The probability is
34.13%, and the double-sided value for one standard deviation
is 68.26%.
• If (x1 − x’)/σ = 2.00, then p(z1) = 0.4772. The probability is
47.72%, and the double-sided value for two standard deviations
is 95.44%.
• Two standard deviations means that about 95% of the values
for x are included in the confidence interval.
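The chart values quoted above can be checked numerically. The sketch below assumes the standard relation between the normal distribution and the error function, P(0 ≤ z ≤ z1) = ½ erf(z1/√2); the function name is illustrative and not from the text.

```python
import math


def half_interval_probability(z1):
    """P(0 <= z <= z1) for a standard normal variable, via the error function."""
    return 0.5 * math.erf(z1 / math.sqrt(2.0))


for z1 in (1.0, 2.0):
    p = half_interval_probability(z1)
    print(f"z1 = {z1:.2f}: one-sided {p:.4f}, double-sided {2 * p:.4f}")
```

The one-sided results agree with the chart entries 0.3413 and 0.4772 to four decimal places. Doubling the unrounded values can differ from the quoted double-sided percentages in the last digit, because the quoted figures double the already-rounded chart entries.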
Finite Statistics
• Finite statistics is used to estimate the true mean
and true variance from a finite sample.
• It provides only an estimate of these values and
describes only the behavior of the sample.
• Its estimates are called:
• The sample mean value:
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
• The sample variance:
$$S_x^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2$$
• The sample standard deviation: $S_x = \sqrt{S_x^2}$
• These equations are reasonable regardless of the
type of PDF for the sample.
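As a small illustration of the sample estimates, the sketch below computes them directly from their definitions and cross-checks against Python's `statistics` module, whose `variance` and `stdev` also divide by N − 1. The data values are hypothetical.

```python
import math
import statistics


def sample_statistics(data):
    """Return (sample mean, sample variance, sample standard deviation)."""
    n = len(data)
    mean = sum(data) / n
    # Divide by N - 1, not N: one degree of freedom was used by the mean.
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    return mean, variance, math.sqrt(variance)


# Hypothetical measurements, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean, var, std = sample_statistics(data)
print("mean =", mean)
print("variance =", var, "(library:", statistics.variance(data), ")")
print("std dev =", std, "(library:", statistics.stdev(data), ")")
```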
Extending finite statistics
the t estimator
• The degrees of freedom, v, in a statistical estimate
equal the number of data points minus the number
of previously determined statistical parameters used in
estimating that value.
• The z variable, the interval expressed in standard deviations,
can be weighted to compensate for the difference
between the finite statistical estimate and the expected
infinite statistics for the variable.
$$x_i = \bar{x} \pm t_{v,P}\,S_x \quad (P\%)$$
• The variable $t_{v,P}$ is the t estimator. The product $t_{v,P}\,S_x$
represents a precision interval, given at probability P%, within which
one should expect any measured value to fall. In Table
4-4, you obtain t using v and Pxx (where xx is the
probability desired). See the example on the next slide.
From the example where
N = 20, to calculate xi
subtract 3 to get v = 17.
Then pick the percent confidence
that xi is included in the
range, say 90%. In that
case tv,P = 1.74.
This is the factor that Sx is
multiplied by in the
equation on the previous slide.
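Once t has been read from the table, applying it is a one-line calculation. The sketch below uses the t value 1.74 quoted above; the sample mean and sample standard deviation are hypothetical numbers chosen only to show the arithmetic, and the function name is illustrative.

```python
def precision_interval(mean, s_x, t):
    """Return (low, high) for the interval mean +/- t * S_x."""
    half_width = t * s_x
    return mean - half_width, mean + half_width


t_value = 1.74   # t estimator quoted in the example (v = 17, P = 90%)
x_bar = 10.0     # hypothetical sample mean
s_x = 2.0        # hypothetical sample standard deviation

low, high = precision_interval(x_bar, s_x, t_value)
print(f"x_i = {x_bar} +/- {t_value * s_x:.2f} (90%) -> [{low:.2f}, {high:.2f}]")
```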
Standard Deviation of the Means
• Finite sample sets will have somewhat
different statistics.
• The sample mean values will be normally
distributed about the true mean.
• The variance of the distribution of the mean
values that could be expected can be
estimated through the standard deviation of
the means:
$$S_{\bar{x}} = \frac{S_x}{\sqrt{N}}$$
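A quick simulation can make the standard deviation of the means more concrete. The sketch below is an illustration under stated assumptions: the measurements are drawn from a normal distribution with a hypothetical true mean of 10 and true standard deviation of 2, many sample sets of size N = 25 are generated, and the spread of their means is compared with σ/√N. All names and numbers are illustrative.

```python
import math
import random
import statistics


def std_of_means(sample):
    """Estimate the standard deviation of the means from one sample: S_x / sqrt(N)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def simulate_means(num_sets, n, mu, sigma, seed=0):
    """Draw num_sets samples of size n from a normal distribution; return their means."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_sets)
    ]


means = simulate_means(num_sets=2000, n=25, mu=10.0, sigma=2.0)
print("spread of the sample means:", statistics.stdev(means))
print("sigma / sqrt(N):           ", 2.0 / math.sqrt(25))
```

The two printed numbers should be close to each other, which is the sense in which a single sample's S_x/√N estimates how far its mean is likely to sit from the true mean.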
Pooled Statistics
• Replications are independent estimates of the
same measured value. Their data represent
separate data samples that can be combined to
provide a better statistical estimate of a
measured variable.
• Samples that are grouped in a manner so as to
determine a common set of statistics are said to
be pooled.
• Use M replications of N samples and the
equations on the next slide for pooled data.
Pooled Statistics Equations
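• Pooled mean of x:
$$\langle \bar{x} \rangle = \frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} x_{ij}$$
• Pooled standard deviation of x:
$$\langle S_x \rangle = \sqrt{\frac{1}{M(N - 1)} \sum_{j=1}^{M} \sum_{i=1}^{N} (x_{ij} - \bar{x}_j)^2}$$
• Pooled standard deviation of the means of x:
$$\langle S_{\bar{x}} \rangle = \frac{\langle S_x \rangle}{\sqrt{MN}}$$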
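The three pooled quantities for M replications of N samples each can be sketched as follows. The function name and the example replications are hypothetical; the code assumes every replication has the same length N and that each deviation is taken from its own replication's mean.

```python
import math


def pooled_statistics(replications):
    """Return (pooled mean, pooled std dev, pooled std dev of the means).

    `replications` is a list of M lists, each holding N measurements.
    """
    m = len(replications)
    n = len(replications[0])
    if any(len(rep) != n for rep in replications):
        raise ValueError("all replications must have the same size N")

    pooled_mean = sum(sum(rep) for rep in replications) / (m * n)

    # Sum squared deviations of each point from its own replication mean.
    sum_sq = 0.0
    for rep in replications:
        rep_mean = sum(rep) / n
        sum_sq += sum((x - rep_mean) ** 2 for x in rep)

    pooled_std = math.sqrt(sum_sq / (m * (n - 1)))
    pooled_std_means = pooled_std / math.sqrt(m * n)
    return pooled_mean, pooled_std, pooled_std_means


# Hypothetical data: M = 2 replications of N = 3 samples.
reps = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
print(pooled_statistics(reps))
```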
Pooled Statistics Equations
if the data sets are not of equal size
• The replications can be weighted by their particular
degrees of freedom…
• Pooled mean of x is defined by its weighted mean:
$$\langle \bar{x} \rangle = \frac{\sum_{j=1}^{M} N_j \bar{x}_j}{\sum_{j=1}^{M} N_j}$$
where j refers to a particular data set.
• Pooled standard deviation:
$$\langle S_x \rangle = \sqrt{\frac{v_1 S_{x_1}^2 + \cdots}{v_1 + \cdots}}$$
• and other definitions for degrees of freedom from the
text.
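For replications of unequal size, the weighting can be sketched as below. This is a hedged illustration, not the text's implementation: it assumes the degrees of freedom of data set j are v_j = N_j − 1 (the usual choice for a sample variance), and the function name and data are hypothetical.

```python
import math
import statistics


def weighted_pooled_statistics(replications):
    """Return (pooled mean, pooled std dev) for replications of unequal size."""
    sizes = [len(rep) for rep in replications]
    means = [statistics.mean(rep) for rep in replications]
    variances = [statistics.variance(rep) for rep in replications]  # each uses N_j - 1
    dof = [n - 1 for n in sizes]  # ASSUMPTION: v_j = N_j - 1

    # Each replication mean is weighted by its number of samples.
    pooled_mean = sum(n * xb for n, xb in zip(sizes, means)) / sum(sizes)
    # Each replication variance is weighted by its degrees of freedom.
    pooled_var = sum(v * s2 for v, s2 in zip(dof, variances)) / sum(dof)
    return pooled_mean, math.sqrt(pooled_var)


# Hypothetical data: one replication of 3 samples and one of 4.
reps = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]]
print(weighted_pooled_statistics(reps))
```

When every replication has the same size, these weights reduce to the equal-size pooled equations.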
Chi-Squared Distribution
• The Chi-Squared distribution allows you to
give an estimate of the variance ($\sigma^2$) interval
within a stated probability for N data points.
• For the normal distribution the Chi-squared
statistic is $\chi^2 = \frac{v S_x^2}{\sigma^2}$, where v is the degrees of
freedom, defined as N − 1.
$$P\left(\chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2}\right) = 1 - \alpha$$
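To see how this probability statement becomes an interval on the variance, substitute the definition of the Chi-squared statistic and solve for σ². The short derivation below is a worked sketch; it assumes, as in the statement above, that χ²(α/2) is the larger of the two tabulated values.

```latex
% Start from the probability statement with \chi^2 = v S_x^2 / \sigma^2:
\chi^2_{1-\alpha/2} \;\le\; \frac{v S_x^2}{\sigma^2} \;\le\; \chi^2_{\alpha/2}

% Invert each term (all quantities are positive, so the inequalities reverse)
% and multiply through by v S_x^2:
\frac{v S_x^2}{\chi^2_{\alpha/2}} \;\le\; \sigma^2 \;\le\; \frac{v S_x^2}{\chi^2_{1-\alpha/2}}
\qquad \text{with probability } P = 1 - \alpha
```

So the sample variance, scaled by the degrees of freedom and the two tabulated Chi-squared values, brackets the true variance at the stated probability.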
Level of Significance
• In summary:
$$P(\chi^2) = 1 - \alpha$$
• Thus the Chi-squared probability is equal to
1 − α.
• α is called the level of significance.
• The lower the χ² value, the better a data set fits
the assumed distribution function.
• Thus the higher α (the level of significance), the better the
fit.
• This is called the Goodness-of-Fit Test.
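The slide does not spell out how the χ² value for a goodness-of-fit test is computed. A common form, assumed here rather than taken from the slide, compares the observed number of occurrences in each histogram interval with the number expected from the assumed distribution: χ² = Σ (observed − expected)² / expected. The function name and counts below are hypothetical.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all intervals.

    ASSUMPTION: this is the standard Pearson form of the statistic.
    Every expected count must be nonzero.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical interval counts for illustration.
observed = [8, 12]
expected = [10.0, 10.0]
print("chi-squared =", chi_squared_statistic(observed, expected))
```

A perfect match between observed and expected counts gives χ² = 0, consistent with the statement that a lower χ² indicates a better fit.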
Regression Analysis
• Regression analysis is used to establish a
relationship between the measured variable
and an independent variable.
• The analysis develops a formula that allows
you to calculate one value given the other.
• This analysis is used to fit a curve to data.
• The deviation between the actual data and
the curve is denoted by the notation: Sxy
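A straight-line fit is the simplest case of the regression described above. The sketch below is a minimal least-squares example with hypothetical data and illustrative names. It reports the slope, the intercept, a standard error of the fit (computed here with N − 2 degrees of freedom, an assumption for a two-coefficient line), and the R² value that spreadsheet trendlines display.

```python
import math


def linear_fit(x, y):
    """Least-squares line y = a0 + a1 * x.

    Returns (a0, a1, std_error, r_squared).
    ASSUMPTION: std_error uses N - 2 degrees of freedom (two fitted coefficients).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    a1 = s_xy / s_xx             # slope
    a0 = y_mean - a1 * x_mean    # intercept

    ss_res = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    std_error = math.sqrt(ss_res / (n - 2))
    r_squared = 1.0 - ss_res / ss_tot
    return a0, a1, std_error, r_squared


# Hypothetical data with some scatter about a line.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
a0, a1, s_err, r2 = linear_fit(x, y)
print(f"y = {a0:.3f} + {a1:.3f} x, standard error = {s_err:.3f}, R^2 = {r2:.4f}")
```

An R² near 1 means the line explains most of the variation in the data; an R² near 0 means the data look like scatter about their mean.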
Applying Statistics
• Excel provides some statistical analysis useful with this
lab, such as Average, Variance, Standard Deviation,
and Regression Analysis.
• Use insert formula to add these formulas at the
bottom of a column of numbers (remember to label
what each number represents).
• Regression analysis is available using Trendlines (click
to add R² to the graph).
• At the end of this lab you should be able to point to the
better of the two measurement systems and state
the reason based on your mathematical analysis.
Sample spreadsheet
Measurement System Error estimate
The error in the system is directly
related to the error in the
calculation of the permeability.
Using standard values for area,
length, current, and number of
turns, you should be able to
calculate the permeability and
arrive at the same value as 4π × 10⁻⁷.
Column G is my calculated value, and
the difference (the error) is in
Column H. Finding the average of
the error and its standard deviation
is an adequate assessment of the error
in the measurement system.
Reproducibility
Reproducibility refers to the closeness of agreement in results obtained
from duplicate tests carried out under changed conditions of
measurement. Combining the results from all the various lab groups
will allow us to measure that, using a linear Trendline on a graph of
the difference between the implied µ and the actual value of µ. Adding the R²
value to the chart shows the closeness of the fit and, in this case, using
the sample data for Example 4-1, the lack of reproducibility.
Here we’ve added a linear Trendline to
the graph of the differences to
determine the R² value. The value
shows the repeatability error. If R² is
large, there is a good likelihood of
being able to repeat the test in a
predictable fashion. For the instance
shown, R² is very, very small.
Creating a Histogram
• It is impossible from looking at the data on the
previous slide to detect any pattern in the
numbers.
• The only observation is that it looks like noise.
• A histogram can bring order to disarray.
• Statistical software packages can do this, or
you can write your own software.
• Complete the homework on this topic and
learn how to make a histogram using Excel.