Statistical uncertainty in calculation
and measurement of radiation
• “Error of mean” and its application
• Binomial distribution, Poisson distribution and Gauss distribution
• Least square method
1.1 Variance and standard deviation
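Repeat a measurement or calculation of a quantity x n times, and write the i-th value as $x_i$. The mean is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{(example of unit: cm, cGy)}$$
Variance $s^2$: a quantity that expresses the fluctuation of the individual values of x. It is the mean of the squared difference between each $x_i$ and the mean $\bar{x}$,
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \qquad \text{(example of unit: cm}^2\text{, cGy}^2\text{)}$$
Standard deviation s: the square root of the variance. It is useful for expressing the fluctuation of x because s and x have the same dimension.
$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \qquad \text{(example of unit: cm, cGy)}$$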
Practice of mean and variance
• Assume 5 carrots with lengths of 6, 7, 8, 9, and 10 cm.
• What is the mean carrot length?
– 8 cm
• What is the variance of the carrot length?
– 2 cm²
• What is the standard deviation of the carrot length?
– 1.41 cm
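The practice answers can be checked with a few lines of Python. This is a minimal sketch using the 1/n definition of variance from section 1.1; the function and variable names are illustrative only.

```python
import math

def mean(values):
    """Arithmetic mean: sum of x_i divided by n."""
    return sum(values) / len(values)

def variance(values):
    """Variance with the 1/n definition of section 1.1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def std_dev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values))

carrots_cm = [6, 7, 8, 9, 10]
print(mean(carrots_cm))                # 8.0 (cm)
print(variance(carrots_cm))            # 2.0 (cm^2)
print(round(std_dev(carrots_cm), 2))   # 1.41 (cm)
```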
Intuitive explanation of error of sum and error of mean
• Measure a 100 cpm source for 1 min: 100±10
– N = 100 cpm x 1 min = 100, σ = 100^{1/2} = 10
• Measure 100 cpm for 1 min and repeat 4 times: 400±20
– N = 100 cpm x 1 min x 4 = 400,
– σ = 400^{1/2} = 20
– This “20” can also be obtained as 10 x 4^{1/2}.
Error of sum of x_i (s_y) is the error of x_i times n^{1/2}.
• Go back to “per min” by dividing by 4: 100±5
– This “5” can also be obtained as 10/4^{1/2}.
Error of average of x_i (s_x_bar) is the error of x_i times 1/n^{1/2}.
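The counting example can be reproduced numerically. The sketch below assumes the usual counting rule σ = N^{1/2}; the helper name `count_with_error` is hypothetical.

```python
import math

def count_with_error(rate_cpm, minutes):
    """Return (N, sigma) for a counting measurement, assuming sigma = sqrt(N)."""
    n_counts = rate_cpm * minutes
    return n_counts, math.sqrt(n_counts)

n_repeat = 4
single, sigma_single = count_with_error(100, 1)        # one 1-min measurement
total, sigma_total = count_with_error(100, n_repeat)   # four 1-min measurements summed

print(single, sigma_single)                        # 100 10.0
print(total, sigma_total)                          # 400 20.0
print(sigma_single * math.sqrt(n_repeat))          # 20.0, error of sum
print(total / n_repeat, sigma_total / n_repeat)    # 100.0 5.0, back to "per min"
print(sigma_single / math.sqrt(n_repeat))          # 5.0, error of average
```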
Practice of error of sum and mean
• Assume 5 carrots with lengths of 6, 7, 8, 9, and 10 cm.
• What is the total length of the carrots?
– 40 cm
• What is the error of the total length of the carrots?
– 3.16 cm
• What is the mean length of the carrots?
– 8 cm
• What is the error of the mean length of the carrots?
– 0.63 cm
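As a quick check of these answers, the sketch below applies the two rules stated earlier (error of sum = s x n^{1/2}, error of mean = s / n^{1/2}) with the 1/n standard deviation. Variable names are illustrative.

```python
import math

carrots_cm = [6, 7, 8, 9, 10]
n = len(carrots_cm)

total = sum(carrots_cm)
mean = total / n
# Standard deviation of a single carrot length (1/n definition).
s = math.sqrt(sum((x - mean) ** 2 for x in carrots_cm) / n)

error_of_sum = s * math.sqrt(n)    # error of x_i times n^(1/2)
error_of_mean = s / math.sqrt(n)   # error of x_i times 1/n^(1/2)

print(total, round(error_of_sum, 2))   # 40 3.16
print(mean, round(error_of_mean, 2))   # 8.0 0.63
```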
1.2 Error of sum (y)
The variance of the sum y of the x_i is obtained from the standard deviation of x_i, Δx_i, by propagation of error.
Here, the variance of y, s_y^2, is written as Δy^2. The standard deviation of y, s_y, is the square root of s_y^2.
The error of the sum of n values x_i, s_y, is the “error of x_i” times n^{1/2}.
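The slide's equations are not reproduced in this transcript. A standard propagation-of-error derivation, assuming the x_i are independent and share a common error Δx, would read roughly as follows:

```latex
y = \sum_{i=1}^{n} x_i, \qquad
\Delta y^2 = \sum_{i=1}^{n}\left(\frac{\partial y}{\partial x_i}\right)^2 \Delta x_i^2
           = \sum_{i=1}^{n} \Delta x_i^2
           = n\,\Delta x^2, \qquad
s_y = \sqrt{\Delta y^2} = \sqrt{n}\,\Delta x
```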
1.3 Error of average of x_i, x_av
The variance of x_av is obtained from the standard deviation of x_i, Δx_i, by propagation of error.
Here, the variance of x_av, s_xav^2, is written as Δx_av^2. The standard deviation of x_av, s_xav, is the square root of s_xav^2.
The error of the average of n values x_i, s_x_bar, is the error of x_i times 1/n^{1/2}.
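The corresponding equations are also missing from the transcript. Under the same assumptions (independent x_i with a common error Δx), the propagation-of-error step for the average can be sketched as:

```latex
x_{av} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\Delta x_{av}^2 = \sum_{i=1}^{n}\left(\frac{\partial x_{av}}{\partial x_i}\right)^2 \Delta x_i^2
                = \sum_{i=1}^{n} \frac{1}{n^2}\,\Delta x_i^2
                = \frac{\Delta x^2}{n}, \qquad
s_{x_{av}} = \sqrt{\Delta x_{av}^2} = \frac{\Delta x}{\sqrt{n}}
```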
Central limit theorem
• Assume any distribution with mean μ and variance σ^2. A sample of n values is drawn from that distribution. The distribution of the sample average x_av converges to the normal distribution N(μ, σ^2/n) when n is large.
Basic statistics by K. Miyagawa (in Japanese)
• Mathematical confirmation of Monte Carlo method
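The statement can be checked by simulation. The sketch below is an illustration only: it assumes uniform random numbers on (0,1), for which μ = 0.5 and σ^2 = 1/12, and the sample size, number of sets, and seed are arbitrary choices.

```python
import random
import statistics

def sample_means(n, n_sets, rng):
    """Draw n_sets samples of size n from uniform(0, 1) and return their means."""
    return [sum(rng.random() for _ in range(n)) / n for _ in range(n_sets)]

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 10
means = sample_means(n, 20000, rng)

# The mean of the sample means should be close to mu = 0.5,
# and their variance close to sigma^2 / n = (1/12) / 10.
print(statistics.fmean(means))
print(statistics.pvariance(means), (1 / 12) / n)
```

A histogram of `means` would show the bell shape; only the first two moments are compared here.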
Fig. 3 Standard deviation of x, s, against hit probability a (curves: sqrt(a) and sqrt(a(1-a)))
Fig. 4 Error of sum, y, s_y, against hit probability a for n=10000 (curves: sqrt(na) and sqrt(na(1-a)); at a=0.5 the plot is annotated 5000±71 for sqrt(na) and 5000±50 for sqrt(na(1-a)))
Really? → Let’s evaluate s^2 numerically.
Numerical evaluation of variance s^2
• Excel RAND() function: uniform random number on (0,1)
– Take the average of 10 random numbers and calculate the variance s^2.
– Repeat for 100 sets to get the average of s^2.
– Change the number of random numbers from n=9 down to n=1 and repeat.
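The same experiment can be run outside Excel. The sketch below substitutes Python's `random` module for RAND(); the seed is arbitrary and the function name is hypothetical. With only 100 sets the averages scatter noticeably, so no exact output is quoted.

```python
import random

def mean_sample_variance(n, n_sets, rng):
    """Average of the 1/n variance s^2 over n_sets samples of n uniform(0,1) numbers."""
    total = 0.0
    for _ in range(n_sets):
        xs = [rng.random() for _ in range(n)]
        x_bar = sum(xs) / n
        total += sum((x - x_bar) ** 2 for x in xs) / n
    return total / n_sets

rng = random.Random(928)   # arbitrary fixed seed
for n in range(10, 0, -1):
    # 100 sets per n, as on the slide
    print(n, round(mean_sample_variance(n, 100, rng), 4))
```

For n=1 the result is exactly 0, because the single value equals its own mean.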
Result of s^2 from the Excel calculation: plot of the value of the variance s^2 against n (n = 1 to 10)
Certainly, s^2 depends on n!
A variance independent of sample number n: the variance about μ
• Average of the squared difference between each value and the expected value of the average, μ (=0.5).
• Numerical calculation by Excel
– Calculate the variance about μ of 10 random numbers.
– Repeat for 100 sets to get the average of this variance.
– Change n from 9 down to 1 and repeat.
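A short sketch of this second calculation, again with Python's `random` module standing in for Excel's RAND(). The deviations are taken from the known μ = 0.5 instead of the sample mean; the function name and seed are illustrative.

```python
import random

MU = 0.5   # expected value of a uniform(0,1) random number

def mean_variance_about_mu(n, n_sets, rng):
    """Average over n_sets samples of (1/n) * sum (x_i - mu)^2."""
    total = 0.0
    for _ in range(n_sets):
        total += sum((rng.random() - MU) ** 2 for _ in range(n)) / n
    return total / n_sets

rng = random.Random(928)   # arbitrary fixed seed
for n in range(10, 0, -1):
    print(n, round(mean_variance_about_mu(n, 100, rng), 4))
```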
Result of the Excel calculation: plot of the value of the variance against n (n = 1 to 10) for s^2 (about the sample average) and for the variance about μ
The variance about μ is independent of n!
We cannot use the variance about μ because μ is unknown in general.
→ We are in trouble!!
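For now, what is the expected value of s^2?
Then, how about the ratio of s^2 to the variance about μ?
For uniform random numbers on (0,1), the variance about μ averages to 1/12 and s^2 follows (n-1)/n * (1/12):
E(s^2) = E(variance about μ) x (n-1)/n
[Plot: value of variance against n (n = 1 to 10); curves: s^2, variance about μ, 1/12, and (n-1)/n * (1/12)]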
Variance which is independent of sample number n (2)
We change 1/n to 1/(n-1) in the equation for s^2 to get a variance that is independent of the sample number n.
Some textbooks call this the sampling variance or the unbiased variance.
Fig.1 Comparison of variance (value of variance against n, n = 1 to 10; curves: s^2, variance about μ, (n-1)/n * (1/12), 1/12, and s^2*n/(n-1))
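The effect of replacing 1/n by 1/(n-1) can be sketched in Python, where the standard library already offers both definitions: `statistics.pvariance` divides by n and `statistics.variance` divides by n-1. The simulation part assumes uniform random numbers on (0,1), as in the Excel exercise; the function name and seed are illustrative.

```python
import random
import statistics

carrots_cm = [6, 7, 8, 9, 10]
s2_biased = statistics.pvariance(carrots_cm)    # 1/n definition, equals 2 here
s2_unbiased = statistics.variance(carrots_cm)   # 1/(n-1) definition, equals 2.5 here
print(s2_biased, s2_unbiased)

def mean_unbiased_variance(n, n_sets, rng):
    """Average of the 1/(n-1) variance over n_sets samples of n uniform(0,1) numbers."""
    total = 0.0
    for _ in range(n_sets):
        total += statistics.variance([rng.random() for _ in range(n)])
    return total / n_sets

rng = random.Random(928)   # arbitrary fixed seed
for n in range(10, 1, -1):  # n = 1 is excluded: 1/(n-1) is undefined there
    print(n, round(mean_unbiased_variance(n, 100, rng), 4), round(1 / 12, 4))
```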
Proof of unbiased variance
We rewrite the formula of the variance so that the expected value of the average, μ, is included.
Then we rewrite this as a relation between expectation values. (The 3rd term on the right-hand side is 0.)
Variance about μ
= Variance about x_av + Variance of x_av
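The equations of this slide did not survive in the transcript. One way to write the argument, assuming independent samples with mean μ and variance σ^2 and ordering the terms so that the cross term comes third, is sketched below:

```latex
\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2
  = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
  + (\bar{x}-\mu)^2
  + \frac{2}{n}(\bar{x}-\mu)\sum_{i=1}^{n}(x_i-\bar{x}),
\qquad \sum_{i=1}^{n}(x_i-\bar{x}) = 0

\sigma^2 = E(s^2) + \frac{\sigma^2}{n}
\;\Longrightarrow\;
E(s^2) = \frac{n-1}{n}\,\sigma^2
\;\Longrightarrow\;
E\!\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = \sigma^2
```

The second line uses E[(x̄-μ)^2] = σ^2/n, which is the variance of the average from section 1.3.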
Summary of Chapter 1 “Error of average”
Variance which is independent of sample number (*)
Definition of variance
Equivalent of the estimated error N±N^{1/2} (*)
Variance of average
*Derived from error of average
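t distribution
• The variable in the central limit theorem is rewritten.
– The distribution of $z = \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}$ converges to the standard normal distribution N(0,1) when n is large.
– If σ is known, the interval of μ can be estimated from z.
– If σ is unknown, its estimated value $\hat{\sigma}$ is used,
– $\hat{\sigma} = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}$
– $t = \dfrac{\bar{x}-\mu}{\hat{\sigma}/\sqrt{n}}$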
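As an illustration of the t variable, the sketch below computes t for the carrot data used earlier. The hypothesized population mean of 7 cm is an arbitrary assumption made for the example, and the function name is illustrative.

```python
import math

def t_statistic(sample, mu):
    """t = (x_bar - mu) / (sigma_hat / sqrt(n)), with sigma_hat using 1/(n-1)."""
    n = len(sample)
    x_bar = sum(sample) / n
    sigma_hat = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return (x_bar - mu) / (sigma_hat / math.sqrt(n))

carrots_cm = [6, 7, 8, 9, 10]
# mu = 7.0 is a hypothetical population mean chosen for this example.
print(round(t_statistic(carrots_cm, 7.0), 3))   # 1.414
```

The result would then be compared with a tabulated t value for n-1 = 4 degrees of freedom.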
Shape of t distribution
[Figure: t distributions with 2, 5, and 20 degrees of freedom compared with the normal distribution (= t distribution with d.f.=∞)]
From Miyagawa, “Elementary statistics”
The nature of t distribution
1. The t distribution is symmetric about 0, which is its centroid. Thus, its mean is 0.
2. The shape of the t distribution resembles that of the standard normal distribution, but the t distribution is lower at its peak and wider in the tails. (The t distribution converges to the normal distribution as n→∞.)
- The only source of fluctuation in z is x bar. For t, on the other hand, the fluctuation of sigma hat also contributes.
3. The shape of the t distribution depends only on n and does not depend on any unknown parameter of the population.
→ Numerical tables of the t distribution have been published.
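Property 2 can be seen in a small simulation. The sketch below assumes a normal population (mean 0, σ = 1) and a very small sample size n = 3, both arbitrary choices, and compares how often |t| and |z| exceed 1.96.

```python
import math
import random

def t_and_z(n, mu, sigma, rng):
    """Draw one sample of size n and return (t, z) for it."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    z = (x_bar - mu) / (sigma / math.sqrt(n))       # sigma known
    t = (x_bar - mu) / (sigma_hat / math.sqrt(n))   # sigma estimated
    return t, z

def tail_fractions(n, n_trials, rng, limit=1.96):
    """Fractions of trials with |t| > limit and with |z| > limit."""
    t_hits = z_hits = 0
    for _ in range(n_trials):
        t, z = t_and_z(n, 0.0, 1.0, rng)
        t_hits += abs(t) > limit
        z_hits += abs(z) > limit
    return t_hits / n_trials, z_hits / n_trials

rng = random.Random(7)   # arbitrary fixed seed
frac_t, frac_z = tail_fractions(3, 20000, rng)
print(frac_t, frac_z)    # the |t| fraction is clearly larger than the |z| fraction
```

The z fraction stays near 5 %, while the t fraction is several times larger for n = 3, which reflects the wider tails.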
Coffee Break
Student’s t distribution (from Wikipedia)
William Sealy Gosset, 13 Jun 1876 - 16 Oct 1937
Gosset attended Winchester College before reading chemistry and mathematics at
New College, Oxford. Upon graduating in 1899, he joined the brewery of Arthur
Guinness & Son in Dublin, Ireland.
As an employee of Guinness, a progressive agro-chemical business, Gosset
applied his statistical knowledge — both in the brewery and on the farm — to the
selection of the best yielding varieties of barley. Gosset acquired that knowledge by
study, by trial and error, and by spending two terms in 1906 – 07 in the biometrical
laboratory of Karl Pearson. Gosset and Pearson had a good relationship. Pearson
helped Gosset with the mathematics of his papers, including the 1908 papers, but
had little appreciation of their importance. The papers addressed the brewer's
concern with small samples; biometricians like Pearson, on the other hand,
typically had hundreds of observations and saw no urgency in developing small-sample methods.
Another researcher at Guinness had previously published a paper containing trade
secrets of the Guinness brewery. To prevent further disclosure of confidential
information, Guinness prohibited its employees from publishing any papers
regardless of the contained information. However, after pleading with the brewery
and explaining that his mathematical and philosophical conclusions were of no
possible practical use to competing brewers, he was allowed to publish them, but
under a pseudonym ("Student"), to avoid difficulties with the rest of the staff. Thus
his most noteworthy achievement is now called Student's, rather than Gosset's, t-distribution.
Appendix B.1 Wording of “Error of average”
・Elementary Statistics: Standard error of mean
・Measurement and detection of radiation (Tsoulfanidis): Standard error of mean value
・Radiation detection & measurement (Knoll): No description
・Nuclear Radiation Detection (Price): No description
・Homepage of statistics bureau: Standard error
・Wikipedia: Standard error
・Kaleidagraph: Standard error
How words can be omitted from “Standard error of sample mean”
Sample | Average | Standard | Error
(Void) | (Void) | (Void) | Uncertainty
Search result of words similar to “Error of average”
Section 2
Binomial, Poisson, and Gaussian distributions
• Outline of distribution
• Sum, Mean, and Variance
• Relation of distributions
Generation of Binomial distribution by experiment
• Prepare ten samples that have two states with the same probability. For example, prepare 10 coins and put a mark (e.g. a removable seal) on one side.
• Take n of these and align them on the table in every possible arrangement, counting the number of samples showing the mark (call a sample with the mark p and a sample without the mark q).
– For n=1, the number of p is either 1 or 0.
– For n=2, the number of combinations is 4, i.e. pp, pq, qp, qq. The numbers of combinations with 2, 1, 0 marks are 1, 2, 1.
– For n=3, the number of combinations is 8, i.e. ppp, ppq, pqp, pqq, qpp, qpq, qqp, qqq. The numbers of combinations with 3, 2, 1, 0 marks are 1, 3, 3, 1 respectively.
– Investigate this up to n=8 (or n=10 if possible).
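The aligning experiment can also be done by exhaustive enumeration on a computer. The sketch below lists all 2^n arrangements of "p" and "q" and counts how many contain k marked samples; the function name is illustrative.

```python
from itertools import product

def count_marked(n):
    """Enumerate all 2**n arrangements of n two-state samples and return
    counts[k] = number of arrangements with k marked sides (k = 0..n)."""
    counts = [0] * (n + 1)
    for arrangement in product("pq", repeat=n):
        counts[arrangement.count("p")] += 1
    return counts

for n in range(1, 9):
    print(n, count_marked(n))
```

For n=3 this gives 1, 3, 3, 1 for 0, 1, 2, 3 marks, matching the list above read in reverse order.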
Generation of Binomial distribution by experiment (2)
• Throw the samples on the table 10 times and count the number of samples showing the mark each time.
– Start from n=1 and continue up to n=10.
– Compare the distribution with that obtained on the previous page.
• The aligning experiment is a kind of round-robin (every combination is listed once). Throwing, on the other hand, is sampling.
• Every combination in the aligning method should appear in sampling with the same probability, so the two distributions should agree within statistical fluctuation.
Calculate Eq.(17) for p=0.5, n=1 to 10. Compare it with the binomial distribution obtained by experiment.
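Eq.(17) is not reproduced in this transcript; the sketch below assumes it is the usual binomial probability C(n,k) p^k (1-p)^(n-k). It also imitates the throwing experiment with random numbers so that the two can be compared. Function names and the seed are illustrative.

```python
import math
import random

def binomial_pmf(k, n, p):
    """Probability of k marked samples out of n, each marked with probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def throw_counts(n, n_throws, rng, p=0.5):
    """Throw n coins n_throws times; counts[k] = number of throws showing k marks."""
    counts = [0] * (n + 1)
    for _ in range(n_throws):
        k = sum(1 for _ in range(n) if rng.random() < p)
        counts[k] += 1
    return counts

rng = random.Random(17)   # arbitrary fixed seed
n, n_throws = 5, 10       # 10 throws, as in the experiment
observed = throw_counts(n, n_throws, rng)
expected = [n_throws * binomial_pmf(k, n, 0.5) for k in range(n + 1)]
print(observed)
print([round(e, 2) for e in expected])
```

With only 10 throws the observed counts fluctuate strongly around the expected ones, which is the point of the comparison.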
Variance of x V(x)
Binomial distribution converges to Poisson distribution as hit probability decreases
Fig. 8 Comparison of binomial and Poisson distribution (f against x, x = 0 to 10; curves: binomial n=10, p=0.4; n=20, p=0.2; n=40, p=0.1; Poisson with mean 4)
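The convergence shown in Fig. 8 can be checked numerically. The sketch below uses the three binomial parameter sets from the figure legend (all with np = 4) and reports the largest difference from the Poisson probabilities over x = 0 to 10; function names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)

def max_difference(n, p, k_max=10):
    """Largest |binomial - Poisson| over k = 0..k_max, with Poisson mean n*p."""
    mean = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mean))
               for k in range(k_max + 1))

for n, p in [(10, 0.4), (20, 0.2), (40, 0.1)]:
    print(n, p, round(max_difference(n, p), 4))
```

The difference shrinks as the hit probability p decreases while np stays fixed.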
Fig.9 Approximation of binomial and Poisson distribution by Gaussian distribution (f against x, x = 0 to 40; curves: Gauss, m=20, s=10^0.5; Gauss, m=20, s=20^0.5; Binomial, n=40, p=0.5; Poisson with mean 20)
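The curves of Fig. 9 can be compared numerically. The sketch below takes the parameters from the figure legend: the binomial with n=40, p=0.5 has mean 20 and variance 10, and the Poisson with mean 20 has variance 20, so each is paired with a Gaussian of the same mean and variance. Function names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)

def gauss_pdf(x, m, s):
    """Gaussian density with mean m and standard deviation s."""
    return math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

xs = range(0, 41)
diff_binomial = max(abs(binomial_pmf(x, 40, 0.5) - gauss_pdf(x, 20, math.sqrt(10)))
                    for x in xs)
diff_poisson = max(abs(poisson_pmf(x, 20) - gauss_pdf(x, 20, math.sqrt(20)))
                   for x in xs)
print(diff_binomial, diff_poisson)
```

The Poisson case deviates more than the binomial one because the Poisson distribution is slightly skewed while the Gaussian is symmetric.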
Section 3 Least Square Method
Concept of Least Square Method:
Determine a and b to minimize d_1^2 + d_2^2 + d_3^2 (Fig.10)
Fig.10: data points (x1,y1), (x2,y2), (x3,y3), the fitted line y'=a+b(x-x_av), and the deviations d1, d2, d3 of the points from the line
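A minimal sketch of the fit for the line y' = a + b(x - x_av), assuming the deviations d_i are measured in the y direction. Minimizing the sum of d_i^2 then gives a = mean of y and b = Σ(x_i - x_av) y_i / Σ(x_i - x_av)^2. The three data points are hypothetical, not read from Fig. 10.

```python
def fit_line(xs, ys):
    """Least-squares fit of y' = a + b * (x - x_av); returns (a, b, x_av)."""
    n = len(xs)
    x_av = sum(xs) / n
    a = sum(ys) / n   # with x centered on x_av, a is simply the mean of y
    b = (sum((x - x_av) * y for x, y in zip(xs, ys))
         / sum((x - x_av) ** 2 for x in xs))
    return a, b, x_av

def sum_of_squares(xs, ys, a, b, x_av):
    """d_1^2 + d_2^2 + ... for the line y' = a + b * (x - x_av)."""
    return sum((y - (a + b * (x - x_av))) ** 2 for x, y in zip(xs, ys))

# Hypothetical data points (x1, y1), (x2, y2), (x3, y3).
xs = [2.0, 4.0, 6.0]
ys = [3.0, 2.0, 7.0]
a, b, x_av = fit_line(xs, ys)
print(a, b, x_av)
print(sum_of_squares(xs, ys, a, b, x_av))
```

Writing the line around x_av decouples the two parameters, so a and b can be computed independently of each other.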