Physics 114: Lecture 5
Uncertainties in Measurement
John Federici
NJIT Physics Department
The contribution of Prof. Dale Gary’s course notes for this course is
gratefully acknowledged.
Nobel Prize Trivia
For which discovery was Albert Einstein awarded
the Nobel Prize?
(a) $E = mc^2$
(b) Special Relativity
(c) General Relativity
(d) Stimulated Emission – Basic
Laser principles
(e) Brownian Motion
(f) Photoelectric Effect
The Nobel Prize in Physics 1921
was awarded to Albert Einstein "for
his services to Theoretical Physics,
and especially for his discovery of
the law of the photoelectric effect".
In 1905 Einstein published four landmark
papers in physics - on the photoelectric effect,
Brownian motion, the special theory of
relativity and equivalence of matter and energy
($E = mc^2$).
Some Terms
Accuracy: How close the measurements are to the “true” value
(note that we may not always know the true value).
Precision: How close repeated measurements are to each other.
A measure of the spread of data points.
  One can make measurements that are highly accurate (their mean is close to
  the true value) even though they may not be very precise (large spread of
  measurements). Conversely, one can make very precise measurements that
  are not accurate.
Errors: Deviations of measurements from the “true” value. Error
here does not mean a blunder! Also referred to as uncertainties.
  Systematic Errors: deviations from the “true” value that are very reproducible,
  generally due to some uncorrected effect of an instrument or measurement
  technique. An example is using an oven which is not calibrated properly and
  ALWAYS produces a temperature which is higher than the control setting.
  Statistical, or Random Errors: fluctuations in measurements that result in their
  being both too high and too low, limited by how precisely the measurement can
  be made, and which are amenable to reduction by doing repeated
  measurements.
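The difference between the two kinds of error can be illustrated with a short MATLAB sketch. The true value, offset, and noise level below are invented for illustration and do not come from the lecture.

```matlab
% Illustrative values (assumptions, not from the lecture)
trueValue = 100;     % "true" value of the quantity being measured
offset    = 5;       % systematic error: the instrument always reads 5 too high
noise     = 2;       % standard deviation of the random error
N         = 10000;   % number of repeated measurements

% Each measurement = true value + systematic offset + random fluctuation
x = trueValue + offset + noise*randn(1, N);

% Averaging many measurements removes most of the random scatter from the
% mean, but the systematic offset remains.
meanError = mean(x) - trueValue;   % stays close to the offset
spread    = std(x);                % stays close to the random noise level

fprintf('mean error = %.2f (systematic offset = %d)\n', meanError, offset);
fprintf('spread     = %.2f (random noise      = %d)\n', spread, noise);
```

Repeating the run with a larger N should bring the mean error closer to 5, not closer to 0, which is why systematic errors cannot be reduced by repetition.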
Precision Versus Accuracy
[Figure: two data plots, one labeled "Precise but Inaccurate Data" and one
labeled "Accurate but Imprecise Data". “True” values are represented by
straight lines.]
What do vertical lines through data points mean?
February 08, 2010
Systematic Errors
Systematic Errors are REPRODUCIBLE discrepancies between the
“measured” and “true” values
Parent and Sample Distributions



Imagine a process for manufacturing ball bearings. Although each
ball bearing is nominally the same, any process is going to cause
slight deviations in shape, size, or other measure. If we measure the
weight, say, of an infinite number of such ball bearings, these weight
measurements will spread into a distribution around some mean
value. This hypothetical, infinite distribution is called the parent
distribution. The parent distribution’s spread depends, obviously, on
how precise the manufacturing process is.
We can never measure an infinite number of ball bearings. Instead,
we measure a smaller subset of ball bearings, and from this sample
we again find that our measurements spread into a distribution
around the sample mean. This finite distribution is called the sample
distribution.
In the limit of an infinite sample, of course, the sample distribution
should become the parent distribution (assuming we have no
systematic errors).
Example Sample & Parent Dist.


At the left are the sample distributions for a series of 16 sets of 50
measurements.
At the right is the sum of these 16 sets (equivalent to 800
measurements). Apparently 800 is close to infinity, since the sample
distribution now is quite close to the parent distribution (red line).
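A similar experiment can be sketched in MATLAB. The parent distribution behind the figure is not stated in the slide, so the sketch below assumes a standard normal parent (mean 0, standard deviation 1) generated with randn.

```matlab
% Assumption: the parent distribution is standard normal (mean 0, std 1)
nSets   = 16;                      % number of sets
nPerSet = 50;                      % measurements per set
data = randn(nPerSet, nSets);      % each column is one set of 50 measurements

setMeans = mean(data, 1);          % 16 sample means, one per set
setStds  = std(data, 0, 1);        % 16 sample standard deviations, one per set

allData = data(:);                 % combine all sets into one sample of 800

fprintf('set means range from %.2f to %.2f\n', min(setMeans), max(setMeans));
fprintf('set stds  range from %.2f to %.2f\n', min(setStds), max(setStds));
fprintf('combined: mean = %.3f, std = %.3f\n', mean(allData), std(allData));
```

The individual sets of 50 scatter noticeably, while the combined sample of 800 usually lands much closer to the parent values.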
What To Do When the “True”
Value is Unknown—The Mean

When we do not know the “true” value that we are comparing our
sample to, we can take the mean of the measurements as an
approximation of the “true” value.
$$\bar{x} = \frac{1}{N} \sum x_i,$$
where we use the notation
$$\sum x_i \equiv \sum_{i=1}^{N} x_i.$$
Of course, the mean of the parent population is
$$\mu = \lim_{N \to \infty} \left( \frac{1}{N} \sum x_i \right).$$
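As a quick check of the definition, the sample mean can be computed directly from the sum and compared with MATLAB's built-in mean function. The five measurements are made-up numbers.

```matlab
x = [4.9 5.1 5.0 5.2 4.8];   % made-up measurements (illustrative only)
N = numel(x);                % number of measurements

xbar = sum(x) / N;           % (1/N) * sum of x_i, as in the definition
xbarBuiltin = mean(x);       % built-in function, should agree

fprintf('xbar = %.4f, mean(x) = %.4f\n', xbar, xbarBuiltin);
```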
Probability and Median
The spread of values about the mean in the parent population (that is,
the histogram) forms a function called a probability density function
(PDF). We will be using this term many, many times during the course.
Its connection to probability is as follows: if you take the PDF and
normalize its area, so that the area under the curve (the integral) is
unity, then the integral in a restricted range x1 to x2 is the probability
that a given measurement will fall in that range.
$$P(x_1 < x < x_2) = \int_{x_1}^{x_2} p(x)\,dx$$
Notice that we use P(x) for the probability, and p(x) for the probability
density (PDF).
The median ($\mu_{1/2}$) is the point where the probability is equal (i.e. 1/2)
on each side:
$$P(x_i < \mu_{1/2}) = P(x_i > \mu_{1/2}) = 1/2$$
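The integral relation can be tried numerically. The sketch below assumes a standard Gaussian as the PDF (chosen only as a familiar example) and uses MATLAB's integral function.

```matlab
% Assumed example PDF: standard Gaussian, already normalized to unit area
p = @(x) exp(-x.^2/2) / sqrt(2*pi);

area  = integral(p, -Inf, Inf);   % total area under the PDF, should be 1
P12   = integral(p, -1, 1);       % P(-1 < x < 1), about 0.68
Pleft = integral(p, -Inf, 0);     % probability of falling below x = 0

fprintf('area = %.4f, P(-1<x<1) = %.4f, P(x<0) = %.4f\n', area, P12, Pleft);
```

Because Pleft comes out as 1/2, x = 0 is the median of this particular PDF.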
Most Probable Value (Mode)
Most probability density functions (PDFs) have a single peak. The value
of x at which they peak is the most probable value, or mode. This is
the same as the mean for symmetric PDFs, but they can be quite
different for asymmetric ones.
The most probable value is called $\mu_{\max}$, and obeys
$$P(\mu_{\max}) \ge P(x_i \ne \mu_{\max})$$

Examples of when to use median vs. mean.



For a set of measurements (sample distribution) that follows the Gaussian (Normal)
distribution, the mean and median are nearly the same, so long as the sample is large.
However, the median is often preferred over the mean, as an estimate of the true value,
in the presence of outliers.
Say we have a set of measurements x = 190. + randn(1,100); You can check that the
mean and median are nearly identical. Now say there was something wrong with the
42nd measurement (x(42) = 300.;). Now the median is nearly unchanged, but the mean
is pulled upward by about (300 - 190)/100 = 1.1, which is roughly ten times the
expected statistical scatter of the mean (1/sqrt(100) = 0.1).
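The outlier example can be run directly. This is a minimal sketch using the same commands quoted in the text; the variable names for the saved results are chosen here for illustration.

```matlab
x = 190 + randn(1, 100);      % 100 measurements, parent mean 190 and std 1

meanBefore   = mean(x);
medianBefore = median(x);

x(42) = 300;                  % something went wrong with the 42nd measurement

meanAfter   = mean(x);
medianAfter = median(x);

fprintf('mean:   %.3f -> %.3f\n', meanBefore, meanAfter);
fprintf('median: %.3f -> %.3f\n', medianBefore, medianAfter);
```

The exact numbers change from run to run because randn draws new random values each time, but the mean should shift by about 1.1 while the median barely moves.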
Deviations and RMS
If the parent distribution mean is $\mu$, the deviations from the mean can be
written $d_i = x_i - \mu$.
The average of the deviations, by virtue of the definition of the mean, must
vanish:
$$\bar{d} = \lim_{N \to \infty} \left[ \frac{1}{N} \sum (x_i - \mu) \right] = \lim_{N \to \infty} \left( \frac{1}{N} \sum x_i \right) - \mu = 0.$$
Still, we may want to know the average absolute deviation, i.e. not
consider the sign of the deviation, just the amount:
$$\alpha = \lim_{N \to \infty} \left[ \frac{1}{N} \sum |x_i - \mu| \right].$$
For computational purposes, it is better to use the average of the squared
deviations (called the variance), $\sigma^2$:
$$\sigma^2 = \lim_{N \to \infty} \left[ \frac{1}{N} \sum (x_i - \mu)^2 \right] = \lim_{N \to \infty} \left( \frac{1}{N} \sum x_i^2 \right) - \mu^2.$$
Then the standard deviation (also called RMS or root-mean-square
deviation) is the square root of the variance, $\sigma$.
To calculate the variance of the sample distribution, use:
$$s^2 = \frac{1}{N-1} \sum (x_i - \bar{x})^2.$$

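The sample variance formula can be checked against MATLAB's built-in var and std functions, which use the N-1 denominator by default. The data values below are made up for illustration.

```matlab
x = [2 4 4 4 5 5 7 9];              % made-up sample (illustrative only)
N = numel(x);
xbar = mean(x);                     % sample mean

s2 = sum((x - xbar).^2) / (N - 1);  % sample variance with N-1 in the denominator
s  = sqrt(s2);                      % sample standard deviation

% With N instead of N-1, the variance equals mean(x.^2) - xbar^2,
% the finite-sample analogue of the second form of the parent variance.
v2 = mean(x.^2) - xbar^2;

fprintf('s2 = %.4f, var(x)   = %.4f\n', s2, var(x));
fprintf('s  = %.4f, std(x)   = %.4f\n', s, std(x));
fprintf('v2 = %.4f, var(x,1) = %.4f\n', v2, var(x, 1));
```

The second argument in var(x,1) selects the N denominator, so the last line compares like with like.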
Basic Statistics in MatLab
NOTE: These functions treat
vectors and arrays differently
Consider a two-dimensional input array, A.
• If dim = 1, then mean(A,1) returns a row vector containing the mean of the
elements in each column.
• If dim = 2, then mean(A,2) returns a column vector containing the mean of
the elements in each row.
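A small example makes the dim argument concrete. The array A and vector v below are arbitrary values chosen for illustration.

```matlab
A = [1 2 3;
     4 5 6];                 % 2-by-3 array (illustrative values)

colMeans = mean(A, 1);       % row vector: mean of each column -> [2.5 3.5 4.5]
rowMeans = mean(A, 2);       % column vector: mean of each row -> [2; 5]

v = [1 2 3];                 % for a vector, mean returns a single value
vMean = mean(v);             % 2

disp(colMeans);
disp(rowMeans);
disp(vMean);
```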
Converting 2D data to 1D data
If you have a 2D data set (for example, an image) and want to perform a statistical
analysis on the ENTIRE 2D set of data, and NOT just on the columns or just the
rows, HOW DO YOU DO IT?
You must convert the 2D data set into a 1D data set and THEN apply the
statistical functions.
The “:” operator converts the multidimensional array into a VECTOR.
A =
     1     2     3
     1     2     3
     1     2     3
>> A(:)
ans =
     1
     1
     1
     2
     2
     2
     3
     3
     3
Excellent reference for indexing of arrays:
https://www.mathworks.com/company/newsletters/articles/matrix-indexing-inmatlab.html
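Using the same array A, the sketch below contrasts column-by-column statistics with statistics over the entire data set. The variable names are chosen here for illustration.

```matlab
A = [1 2 3;
     1 2 3;
     1 2 3];

colMeans = mean(A);         % operates on each column: [1 2 3]

allMean   = mean(A(:));     % mean of all nine elements
allMedian = median(A(:));   % median of all nine elements
allStd    = std(A(:));      % standard deviation of all nine elements

fprintf('mean = %.4f, median = %.4f, std = %.4f\n', allMean, allMedian, allStd);
```

Here the overall mean and median are both 2, and the standard deviation is sqrt(6/8), about 0.866.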
Matlab Example
>> z=randn(1,1000);
>> max(z)
ans =
2.9095
>> min(z)
ans =
-3.0790
>> mean(z)
ans =
0.0038
>> median(z)
ans =
0.0398
>> mode(z)
ans = -3.0790
>> std(z)
ans = 0.9835
The PARENT distribution of the
randn function has a mean of zero.
Note that for this SAMPLE
distribution, the mean is close to
zero
The PARENT distribution of the
randn function has a Standard
Deviation of 1. Note that for this
SAMPLE distribution, the STD is
close to unity.
How do these values compare to the histogram plot?
[Figure: histogram of z, annotated with the labels "Mean, Median" and "2*STD".]
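The comparison with the histogram can also be made numerically. Note that mode(z) is not informative for continuous random data: every value occurs once, and in that case mode returns the smallest value, which is why it matched min(z) in the example. The sketch below estimates the peak from binned counts instead. The choice of 20 bins is arbitrary, and hist is an older function that newer MATLAB versions recommend replacing with histogram.

```matlab
z = randn(1, 1000);                 % same kind of sample as in the example

[counts, centers] = hist(z, 20);    % counts in 20 equal-width bins
[~, iPeak] = max(counts);           % index of the most populated bin
peakCenter = centers(iPeak);        % rough estimate of the most probable value

% Fraction of the data within one standard deviation of the mean,
% i.e. inside a window of total width 2*STD. For a Gaussian this is about 0.68.
within1 = mean(abs(z - mean(z)) < std(z));

fprintf('histogram peak near %.2f, mean = %.2f, median = %.2f\n', ...
        peakCenter, mean(z), median(z));
fprintf('fraction within one std of the mean = %.2f\n', within1);
```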
Class Exercise:
• Use MATLAB HELP or DOCUMENTATION SEARCH as needed.
• Create a MATLAB program which:
  - Creates an array of data whose parent distribution has an
    AVERAGE of 15 and a Standard Deviation of 4.
    HINT: Scale the randn function:
    z=Offset+Factor*randn(1000,1);
  - Using the appropriate MATLAB functions, calculates the mean,
    mode, median, and standard deviation of the distribution.
• Are the standard deviation and average of the sample data
  close to those of the PARENT distribution?
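One possible solution sketch follows, assuming Offset is the desired mean and Factor is the desired standard deviation (this follows from randn having a parent mean of 0 and standard deviation of 1). The result variable names are chosen here for illustration.

```matlab
Offset = 15;                          % desired parent mean
Factor = 4;                           % desired parent standard deviation
z = Offset + Factor*randn(1000, 1);   % shift and scale the standard normal

zMean   = mean(z);
zMedian = median(z);
zMode   = mode(z);     % for continuous data this is just the smallest value
zStd    = std(z);

fprintf('mean = %.3f, median = %.3f, mode = %.3f, std = %.3f\n', ...
        zMean, zMedian, zMode, zStd);
```

With 1000 points, the sample mean and standard deviation should typically fall within a few tenths of 15 and 4.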