Physics 114: Lecture 15
Probability Tests & Linear Fitting
Dale E. Gary
NJIT Physics Department
Reminder of Previous Results
Last time we showed that rather than considering a single set of
measurements, one can join multiple sets of measurements to refine both
the estimated value and the precision of the mean.
The rule for finding the standard deviation of the mean for such a combination of sets of measurements, for the case of all statistically identical data sets (i.e. same errors σ), is
$$\sigma_\mu = \frac{\sigma}{\sqrt{N}}.$$
Likewise, the rule for combining data sets with different errors is
$$\frac{1}{\sigma_\mu^2} = \sum_i \frac{1}{\sigma_i^2}.$$
That led us to the concept of weighting, where perhaps the errors
themselves are not known, but the relative weighting of the measurements
is known. In that case, the rule for individual sets of data is:
$$\mu = \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad s^2 = \frac{N}{N-1}\,\frac{\sum_i w_i (x_i - \mu)^2}{\sum_i w_i},$$
then combine the N sets as usual.
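As a concrete illustration (not from the original slides), here is a minimal Python sketch, assuming NumPy is available and using made-up numbers, that combines three measurements of the same quantity by inverse-variance weighting:

import numpy as np

# Hypothetical example: three determinations of the same quantity,
# each quoted with its own standard error sigma_i.
x = np.array([5.1, 4.8, 5.3])        # individual means
sigma = np.array([0.2, 0.4, 0.3])    # individual errors

w = 1.0 / sigma**2                   # weights w_i = 1/sigma_i^2
mu = np.sum(w * x) / np.sum(w)       # weighted mean
sigma_mu = np.sqrt(1.0 / np.sum(w))  # from 1/sigma_mu^2 = sum_i 1/sigma_i^2

print(f"combined mean = {mu:.2f} +/- {sigma_mu:.2f}")

Note that the most precise measurement (smallest σ_i) dominates the combination, as it should.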
Probability Tests
We sometimes need to know more than just the mean and standard deviation (uncertainty) of a set of measurements. For many cases, we also want to assess how likely our result is to be “true.”
One way to do this is to relate the uncertainty to the Gaussian probability. For example, we have learned that approximately 68% of measurements in a Gaussian distribution fall within 1σ of the mean μ. In other words, 68% of our measurements should fall in the range (μ − σ) < x < (μ + σ). If we repeat our measurement many times to determine the mean more precisely (σ_μ = σ/√N), then again 68% of the repeated measurements should average in the range (μ′ − σ_μ) < x < (μ′ + σ_μ).
A table of probability versus σ is given in Table C.2. In science, it is expected that errors are given in terms of ±1σ. Thus, stating a result as 3.4±0.2 means that 68% of values fall between 3.2 and 3.6. In some disciplines, it is common instead to state 90% confidence intervals (1.64σ), in which case the same measurement would be stated as 3.4±0.33. To avoid confusion, one should say 3.4±0.33 (90% confidence level).
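These numbers can be pulled from the Gaussian distribution directly rather than from Table C.2. A minimal sketch, assuming SciPy is available:

from scipy.stats import norm

# Fraction of a Gaussian lying within +/- 1 sigma of the mean (~0.683)
p_1sigma = norm.cdf(1) - norm.cdf(-1)

# Multiple of sigma enclosing a central 90% interval (~1.645),
# i.e. 5% probability in each tail
z_90 = norm.ppf(0.95)

print(f"P(|x - mu| < 1 sigma) = {p_1sigma:.4f}")
print(f"90% CL half-width = {z_90:.3f} sigma")
print(f"3.4 +/- 0.2 restated at 90% CL: 3.4 +/- {z_90 * 0.2:.2f}")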
Probability Tests, cont’d
A problem, however, occurs when we want to assign a probability estimate to measurements that are based on only a few samples. Although the samples are governed by the same parent mean μ and width σ, the sample width s is so poorly determined with only a few measurements that we should take that into account.
In such cases, a better estimate of probability is given by Student’s t distribution. Note that this has nothing to do with students: it was first described by an author who published under the name Student. In this distribution, the parameter t is the deviation in units of the sample standard deviation, t = (x − x̄)/s.
It is a complicated function:
$$p_t(t;\nu) = \frac{1}{\sqrt{\nu\pi}}\,\frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$$
where Γ is the gamma function (see Chapter 11), and ν is the number of degrees of freedom (N − 1 in this case).
This function (listed in Table C.8) differs from Table C.4 for small N, but is nearly identical for N > 30 or so.
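To see how much the small-sample correction matters, here is a minimal sketch (assuming SciPy is available) comparing the probability of falling within ±1 estimated standard deviation under Student’s t distribution with the Gaussian value of about 68%:

from scipy.stats import norm, t

# P(|t| < 1) for a few sample sizes N, with nu = N - 1 degrees of freedom
for N in (3, 5, 10, 30):
    nu = N - 1
    p = t.cdf(1, df=nu) - t.cdf(-1, df=nu)
    print(f"N = {N:2d}: P(|t| < 1) = {p:.3f}")

# Gaussian limit for comparison (~0.683)
print(f"Gaussian: P(|z| < 1) = {norm.cdf(1) - norm.cdf(-1):.3f}")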
Chi-Square Probability
I want to introduce a useful concept without proof, called the χ² (chi-square) test of goodness of fit. We will need it in the next lecture when we describe linear fits to data.
Consider our histograms from Lecture 14.
[Figure: a 4×4 grid of histograms from Lecture 14, one per set of measurements; each panel has counts 0–20 on the vertical axis and measured values 0–8 on the horizontal axis.]
Chi-Square Probability
Here is a similar histogram from the text, showing the parent pdf (solid Gaussian curve N P_G(x)) and one histogram of 100 measurements with mean 5. Superimposed is the spread of values in each bin for multiple sets of 100 measurements.
Since the histogram is a frequency diagram, the value of each bin can only have integer values; hence, we expect a Poisson distribution with mean N P_G(x) and standard deviation σ = √(N P_G(x)).
Chi-Square Probability
The definition of χ² is
$$\chi^2 = \sum_i \frac{(y_i - y)^2}{\sigma_i^2}$$
where y_i are the measurements (the bin heights in this case), y is the expected value (the smooth Gaussian curve N P_G(x) in this case), and σ_i is the expected standard deviation of each y_i (√(N P_G(x)) in this case).
You can see that in each bin you expect the y_i not to stray more than about σ_i from y on average, so each bin should contribute about 1 to the sum.
Thus, the sum should be about n, the number of bins. This is almost right. In fact, statistically the expectation value of χ² is not n, but the number of degrees of freedom ν = n − n_c, where n_c is the number of constraints.
Often we use the reduced chi-square
$$\chi^2_\nu = \chi^2/\nu \approx 1.$$
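A minimal Python sketch of this calculation (assuming NumPy and SciPy; the bin counts below are made up for illustration), for 100 measurements binned against a Gaussian parent with mean 5 and σ = 1:

import numpy as np
from scipy.stats import norm

N = 100
centers = np.arange(1, 10)                        # unit-width bins centered on 1..9
expected = N * norm.pdf(centers, loc=5, scale=1)  # y = N * P_G(x) per bin
observed = np.array([0, 1, 7, 28, 37, 20, 6, 1, 0])  # hypothetical counts

sigma2 = expected                           # Poisson: variance of each count = N * P_G(x)
chi2 = np.sum((observed - expected)**2 / sigma2)
nu = len(centers) - 3                       # n_c = 3 if N, mean, and sigma come from the data
print(f"chi2 = {chi2:.1f}, nu = {nu}, reduced chi2 = {chi2 / nu:.2f}")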
Meaning of the Chi-Square Test
Consider the plot below as some measurements given by the histogram, and the smooth Gaussian as a fit to the data. If we shift the smooth curve, it will obviously not fit the data as well. Then
$$\chi^2 = \sum_i \frac{(y_i - y)^2}{\sigma_i^2}$$
will be much larger than n, because the deviations of each bin from the shifted smooth curve are larger than σ_i.
Likewise, if we change the width or the amplitude of the curve, either of these will also raise the value of χ².
The best fit of the curve, in fact, is the one that minimizes χ², which then should be close to ν. What is ν in this case? It takes three parameters to define the Gaussian, so ν = n − n_c = 6 − 3 = 3.
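Continuing the sketch above with the same made-up counts, shifting the assumed mean of the Gaussian away from 5 drives χ² far above ν, which is exactly why minimizing χ² picks out the best fit:

import numpy as np
from scipy.stats import norm

centers = np.arange(1, 10)
observed = np.array([0, 1, 7, 28, 37, 20, 6, 1, 0])

def chi2(mean):
    # chi-square of the counts against a Gaussian model with the given mean
    expected = 100 * norm.pdf(centers, loc=mean, scale=1)
    return np.sum((observed - expected)**2 / expected)

for mu in (5.0, 5.5, 6.0):
    print(f"model mean = {mu}: chi2 = {chi2(mu):.1f}")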
Chapter 6—Least Squares Fit to a Straight Line
There are many situations where we can measure one quantity (the
dependent variable) with respect to another quantity (the independent
variable). For instance, we might measure the position of a car vs. time,
where the position is the dependent variable and time the independent
variable.
If the velocity is constant, we expect a straight line x(t) = x₀ + v₀t. Let us generically call the dependent variable y for this discussion, and the independent variable x. Then we can write such a linear relationship as y(x) = a + bx, where a and b are constants.
Here is a plot of points with noise, showing a linear relationship, and a straight line that goes through the points.
[Figure: “Points and Fit”: data points with noise and a straight-line fit; y runs from 2 to 9, x from 0 to 2.5.]
Least Squares Fit to a Straight Line
Here are several plots with lines through the points. Which one do you think is the best fit?
[Figure: three “Points and Fit” panels showing the same noisy data points with three different candidate straight lines; y runs from 2 to 10, x from 0 to 2.5.]
It is surprisingly easy to see by eye which one fits best, but what does your brain do to determine this?
It is minimizing χ²!
Let’s go through the problem analytically.
Minimizing Chi-Square
We start with a smooth line of the form
$$y(x) = a + bx$$
which is the “curve” we want to fit to the data. The chi-square for this situation is
$$\chi^2 = \sum_i \left(\frac{y_i - y(x_i)}{\sigma_i}\right)^2 = \sum_i \frac{1}{\sigma_i^2}\,(y_i - a - bx_i)^2.$$
To minimize any function, you know that you should take the derivative and set it to zero. But take the derivative with respect to what? Obviously, we want to find constants a and b that minimize χ², so we will form two equations:
$$\frac{\partial \chi^2}{\partial a} = \frac{\partial}{\partial a}\sum_i \frac{1}{\sigma_i^2}(y_i - a - bx_i)^2 = -2\sum_i \frac{1}{\sigma_i^2}(y_i - a - bx_i) = 0,$$
$$\frac{\partial \chi^2}{\partial b} = \frac{\partial}{\partial b}\sum_i \frac{1}{\sigma_i^2}(y_i - a - bx_i)^2 = -2\sum_i \frac{x_i}{\sigma_i^2}(y_i - a - bx_i) = 0.$$
Minimizing Chi-Square
Now we can rearrange these two equations to obtain two equations in two unknowns (a and b):
$$\sum_i \frac{y_i}{\sigma_i^2} = a\sum_i \frac{1}{\sigma_i^2} + b\sum_i \frac{x_i}{\sigma_i^2},$$
$$\sum_i \frac{x_i y_i}{\sigma_i^2} = a\sum_i \frac{x_i}{\sigma_i^2} + b\sum_i \frac{x_i^2}{\sigma_i^2}.$$
You can solve this set of simultaneous equations any way you wish. One way is to use Cramer’s Rule of matrix theory, which says that the system
$$z_1 = a x_1 + b y_1,$$
$$z_2 = a x_2 + b y_2,$$
or
$$\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = a \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + b \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},$$
has solution
$$a = \frac{\begin{vmatrix} z_1 & y_1 \\ z_2 & y_2 \end{vmatrix}}{\begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}} \quad\text{and}\quad b = \frac{\begin{vmatrix} x_1 & z_1 \\ x_2 & z_2 \end{vmatrix}}{\begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}}.$$
Ratios of determinants.
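A quick numeric check of Cramer’s rule with made-up coefficients (assuming NumPy is available):

import numpy as np

# System: z1 = a*x1 + b*y1,  z2 = a*x2 + b*y2
x1, y1, z1 = 2.0, 1.0, 8.0
x2, y2, z2 = 1.0, 3.0, 9.0

D = np.linalg.det([[x1, y1], [x2, y2]])       # denominator determinant
a = np.linalg.det([[z1, y1], [z2, y2]]) / D   # z column replaces the a column
b = np.linalg.det([[x1, z1], [x2, z2]]) / D   # z column replaces the b column

print(a, b)  # a = 3, b = 2 satisfy both equations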
Linear Regression
The solution, then, is
$$a = \frac{1}{\Delta}\begin{vmatrix} \sum \dfrac{y_i}{\sigma_i^2} & \sum \dfrac{x_i}{\sigma_i^2} \\[4pt] \sum \dfrac{x_i y_i}{\sigma_i^2} & \sum \dfrac{x_i^2}{\sigma_i^2} \end{vmatrix} = \frac{1}{\Delta}\left(\sum \frac{x_i^2}{\sigma_i^2}\sum \frac{y_i}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2}\right),$$
$$b = \frac{1}{\Delta}\begin{vmatrix} \sum \dfrac{1}{\sigma_i^2} & \sum \dfrac{y_i}{\sigma_i^2} \\[4pt] \sum \dfrac{x_i}{\sigma_i^2} & \sum \dfrac{x_i y_i}{\sigma_i^2} \end{vmatrix} = \frac{1}{\Delta}\left(\sum \frac{1}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2}\sum \frac{y_i}{\sigma_i^2}\right),$$
where
$$\Delta = \begin{vmatrix} \sum \dfrac{1}{\sigma_i^2} & \sum \dfrac{x_i}{\sigma_i^2} \\[4pt] \sum \dfrac{x_i}{\sigma_i^2} & \sum \dfrac{x_i^2}{\sigma_i^2} \end{vmatrix} = \sum \frac{1}{\sigma_i^2}\sum \frac{x_i^2}{\sigma_i^2} - \left(\sum \frac{x_i}{\sigma_i^2}\right)^2.$$
Note that if the errors are all equal (i.e. σ_i = σ), then when you take the ratio of these determinants the errors cancel and we get the simpler expressions
$$a = \frac{1}{\Delta}\left(\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i\right), \qquad b = \frac{1}{\Delta}\left(N \sum x_i y_i - \sum x_i \sum y_i\right),$$
$$\Delta = N\sum x_i^2 - \left(\sum x_i\right)^2.$$
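These formulas translate directly into code. A minimal sketch (assuming NumPy; the data are synthetic, generated from a line with a = 2 and b = 3 plus Gaussian noise), cross-checked against numpy.polyfit:

import numpy as np

# Synthetic noisy data from y = 2 + 3x with equal errors sigma_i = 0.3
rng = np.random.default_rng(0)
x = np.linspace(0, 2.5, 10)
sigma = np.full_like(x, 0.3)
y = 2.0 + 3.0 * x + rng.normal(0, sigma)

# Weighted sums appearing in the determinant solution
S   = np.sum(1 / sigma**2)
Sx  = np.sum(x / sigma**2)
Sy  = np.sum(y / sigma**2)
Sxx = np.sum(x**2 / sigma**2)
Sxy = np.sum(x * y / sigma**2)

Delta = S * Sxx - Sx**2
a = (Sxx * Sy - Sx * Sxy) / Delta   # intercept
b = (S * Sxy - Sx * Sy) / Delta     # slope
print(f"a = {a:.3f}, b = {b:.3f}")

# With equal errors this must agree with an unweighted fit:
print(np.polyfit(x, y, 1))          # returns [slope, intercept]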