
Department of Civil Engineering-I.I.T. Delhi
CEL 899: Environmental Risk Assessment
Statistics and Probability Example Part 1
Note: Assume missing data (if any) and mention the same.
Q1. Suppose X has a normal distribution defined as N(mean = 5, variance = 2^2) (note the notation used here: the variance is written as the square of the standard deviation). Answer the following:
(i) Calculate the mean, median, standard deviation, and variance.
(ii) Calculate P(X=3) and P(X<10).
(iii) Calculate P(1<X<8), i.e., the probability that X lies between 1 and 8. Hint: P(X1<X<X2) = P(X<X2) - P(X<X1).
(iv) Calculate the 5th, 50th, 90th, 95th, and 99th percentile values. Also calculate the 90% confidence interval value (= 95th percentile value - 5th percentile value).
Answer:
(i) Mean = 5; Standard deviation = 2; Variance = 2^2 = 4 (Answer)
At the median, 50% of X values are lower than this value and 50% of X values are higher.
So, P(X < median) = P(X > median) = 0.50
So the 50th percentile value is the median of the random variable X with N(mean = 5, variance = 2^2).
Given that P(X < median) = 0.5
or P([(X-5)/2] < [(median-5)/2]) = 0.5
here Z = [(X-5)/2] and Z* = [(median-5)/2]
So we have P(Z < Z*) = 0.5, where Z has a standard normal distribution with mean 0 and standard deviation equal to 1. Now look at the standard normal distribution table to find the Z* for which Φ(Z*) = P(Z < Z*) = 0.5. The table gives Z* = 0,
i.e., Z* = [(median-5)/2] = 0 => median = 5 (Answer)
Note: For a normal distribution, mean = median. The approach is shown here because it works for any percentile value. Here the calculation was done for the 50th percentile value (i.e., the median). Similarly, you can calculate the 5th, 90th, 95th, and 99th percentile values.
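(ii) P(X=3) = ?
Put X = 3 in the following formula for the probability density function of the normal distribution:
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-0.5\left(\frac{x-\mu}{\sigma}\right)^2\right]$$
$$f(3; 5, 2) = \frac{1}{2\sqrt{2\pi}} \exp\left[-0.5\left(\frac{3-5}{2}\right)^2\right] = (1/5.01) \times \exp(-0.5) = (1/5.01) \times 0.6065$$
=> P(X=3) = f(3; 5, 2) = 0.1211 (Answer)
Note: Strictly, for a continuous random variable the probability of any single exact value is zero. The quantity calculated here is the value of the density function at X = 3, which this example reports as P(X=3).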
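The density evaluation above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `normal_pdf` is an illustrative choice, not something from the handout. The unrounded result is about 0.1210, while the hand calculation gives 0.1211 because the denominator was rounded to 5.01.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Density at x = 3 for mean 5 and standard deviation 2
density_at_3 = normal_pdf(3.0, mu=5.0, sigma=2.0)
print(f"f(3; 5, 2) = {density_at_3:.4f}")
```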
Now, P(X<10) = ?
First convert to the Z value, which has a standard normal distribution with mean 0 and standard deviation equal to 1:
P(X<10) = P([(X-5)/2] < [(10-5)/2])
here Z = [(X-5)/2] and Z* = [(10-5)/2] = 2.5
So we need P(Z<2.5).
Now look at the standard normal distribution table to find Φ(2.5), since Z* = 2.5 is calculated and known here.
=> P(Z<Z*) = 0.9938
So, P(X<10) = P(Z<Z*) = 0.9938 (Answer)
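The table lookup can be reproduced without a printed table. This sketch assumes the standard identity Φ(z) = 0.5 * [1 + erf(z / sqrt(2))]; the helper name `standard_normal_cdf` is illustrative only.

```python
import math

def standard_normal_cdf(z):
    """Phi(z) = P(Z < z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 5.0, 2.0
z_star = (10.0 - mu) / sigma          # standardize X = 10
p_below_10 = standard_normal_cdf(z_star)
print(f"Z* = {z_star}, P(X<10) = {p_below_10:.4f}")
```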
(iii) P(1<X<8) = P(X<8) - P(X<1)
First convert to the Z value, which has a standard normal distribution with mean 0 and standard deviation equal to 1.
P(X<8) = P([(X-5)/2] < [(8-5)/2]) = P(Z < 1.5) = Φ(1.5) = 0.9332
P(X<1) = P([(X-5)/2] < [(1-5)/2]) = P(Z < -2.0) = Φ(-2.0)
Here, remember the formula: P(Z<Z*) = P(Z>-Z*) (because of the symmetry of the normal distribution about its mean).
So, P(Z<-2) = P(Z>2) = 1 - P(Z<2)
So, Φ(-2.0) = 1 - Φ(2.0) = 1 - 0.9772 = 0.0228
So, P(1<X<8) = P(X<8) - P(X<1) = 0.9332 - 0.0228 = 0.9104 (i.e., 0.9104*100 = 91.04% of the time X will lie between 1 and 8). (Answer)
Note: Here the 91.04% confidence interval is given by (1, 8).
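As a cross-check on the interval probability, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) rather than a table. It gives about 0.9104; any small difference from the hand calculation comes from rounding in the table values.

```python
from statistics import NormalDist

X = NormalDist(mu=5.0, sigma=2.0)   # N(mean = 5, variance = 2^2)
Z = NormalDist()                    # standard normal: mean 0, sd 1

p_below_8 = X.cdf(8.0)
p_below_1 = X.cdf(1.0)
p_between = p_below_8 - p_below_1   # P(1 < X < 8)

# Symmetry used in the text: P(Z < -2) equals 1 - P(Z < 2)
symmetry_gap = abs(Z.cdf(-2.0) - (1.0 - Z.cdf(2.0)))

print(f"P(X<8) = {p_below_8:.4f}")
print(f"P(X<1) = {p_below_1:.4f}")
print(f"P(1<X<8) = {p_between:.4f}")
```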
(iv) Calculate each percentile value in the same way as the 50th percentile value in part (i).
Say 5th percentile value = X*
i.e., P(X<X*) = 0.05
P(X<X*) = P([(X-5)/2] < [(X*-5)/2])
= P(Z<Z*) = 0.05 [here, Z* = (X*-5)/2]
As P(Z<Z*) = 1 - P(Z>Z*)
0.05 = 1 - P(Z>Z*)
P(Z>Z*) = 0.95
Now, as P(Z<Z*) = P(Z>-Z*) (because of the symmetry of the normal distribution about its mean), it follows that P(Z>Z*) = P(Z<-Z*).
So P(Z<-Z*) = Φ(-Z*) = 0.95.
From the table, Φ(Z) = 0.95 occurs for Z lying between 1.64 and 1.65. Using linear interpolation, Z comes out to be 1.645.
So -Z* = 1.645
Z* = -1.645
Now as Z* = (X*-5)/2 = -1.645
X* = 2*(-1.645) + 5 = 1.71
So 5th percentile value = 1.71
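The same percentile procedure can be sketched for all the requested levels with the inverse CDF in `statistics.NormalDist` (Python 3.8 or later). This is a numerical cross-check under the stated N(mean = 5, variance = 2^2) assumption, not the table-based method the answer uses; the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=5.0, sigma=2.0)   # N(mean = 5, variance = 2^2)

levels = [0.05, 0.50, 0.90, 0.95, 0.99]
# inv_cdf(p) returns the value x* with P(X < x*) = p
percentiles = {p: X.inv_cdf(p) for p in levels}

# 90% interval value as defined in the question: 95th minus 5th percentile
interval_width = percentiles[0.95] - percentiles[0.05]

for p, value in percentiles.items():
    print(f"{int(p * 100)}th percentile = {value:.2f}")
print(f"90% interval value = {interval_width:.2f}")
```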
Q2. Look at the following failure data:
Failure no.                      1    2    3    4    5    6    7    8    9    10
Operating time (days) (say X)    40   98   165  235  312  428  547  720  925  1340
Using the operating time data, answer the following:
(i) Calculate the mean, median, standard deviation, variance, minimum, and maximum operating time values.
(ii) Develop a frequency histogram for the operating time before failure (random variable X) using the binning approach (called the probability mass function if X is discrete, or the probability density function if X is continuous). Develop this histogram for both cases.
(iii) Develop cumulative distribution functions for two cases: X as a discrete random variable and X as a continuous variable. Plot them.
(iv) Using the cumulative distribution function for X as a discrete variable, calculate P(X=120 days) and P(X<120 days). Repeat this calculation for the case when X is a continuous variable.
(v) Calculate the operating time without failure with 95% confidence (i.e., calculate the 95th percentile value) using the cumulative distribution function developed for the continuous case. Also calculate the 90% confidence interval value (= 95th percentile value - 5th percentile value). Note: the 90% confidence interval indicates that the operating time lies between these two ends in 90% of the observations.
Answer:
(i) First arrange the data in increasing order.
Arranged data: X (in days): 40, 98, 165, 235, 312, 428, 547, 720, 925, 1340 (total N=10)
Minimum = 40 days, maximum = 1340 days
Mean:
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N=10} x_i = (1/10) \times [40+98+165+235+312+428+547+720+925+1340]$$
=> mean = 481 days
As N is an even number here, median = average of the two middle terms = 1/2 * (5th term + 6th term)
Median = 1/2 * (312+428) = 370 days
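The summary values above can be verified with the Python standard library. This is a small sketch; the list name `times` is an illustrative choice.

```python
import statistics

# Operating times before failure (days), from the Q2 data
times = [40, 98, 165, 235, 312, 428, 547, 720, 925, 1340]

t_min, t_max = min(times), max(times)
t_mean = statistics.mean(times)
# For an even number of values, median() averages the two middle terms
t_median = statistics.median(times)

print(f"min = {t_min}, max = {t_max}")
print(f"mean = {t_mean}, median = {t_median}")
```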
Standard deviation (σ):
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N=10} (X_i - \bar{X})^2}{N-1}}$$
$$\sigma = \sqrt{\frac{1}{10-1}\left[(40-481)^2 + (98-481)^2 + \dots + (1340-481)^2\right]}$$
Table 1.
Operating time (days) (say X)    (X-mean)^2
40                               194481
98                               146689
165                              99856
235                              60516
312                              28561
428                              2809
547                              4356
720                              57121
925                              197136
1340                             737881
Total sum of the (X-mean)^2 terms = 1529406
So, standard deviation (σ) = (1529406/9)^0.5 = (169934)^0.5 = 412.2 days
Variance = σ^2 = 1529406/9 = 169934 days^2 (squaring the rounded value 412.2 gives 169909, which differs only because of rounding)
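The sample standard deviation with the N-1 divisor can be reproduced step by step. The sketch below mirrors the columns of Table 1 and cross-checks against `statistics.stdev`, which also uses the N-1 divisor; variable names are illustrative.

```python
import math
import statistics

times = [40, 98, 165, 235, 312, 428, 547, 720, 925, 1340]

n = len(times)
mean = sum(times) / n
# Squared deviations from the mean, one per observation (Table 1)
squared_devs = [(x - mean) ** 2 for x in times]
sum_sq = sum(squared_devs)

sample_var = sum_sq / (n - 1)       # divide by N - 1, not N
sample_std = math.sqrt(sample_var)

# Cross-check against the library routine
library_std = statistics.stdev(times)

print(f"sum of squared deviations = {sum_sq:.0f}")
print(f"variance = {sample_var:.0f} days^2, sd = {sample_std:.1f} days")
```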
(ii) Table 2.
Failure no.   Operating time (days)   Frequency (X/total sum) [i.e., P(X=x)]   F(x) = P(X<=x)
1             40                      40/4810                                   40/4810
2             98                      98/4810                                   138/4810
3             165                     165/4810                                  303/4810
4             235                     235/4810                                  538/4810
5             312                     312/4810                                  850/4810
6             428                     428/4810                                  1278/4810
7             547                     547/4810                                  1825/4810
8             720                     720/4810                                  2545/4810
9             925                     925/4810                                  3470/4810
10            1340                    1340/4810                                 1.0
Total sum     4810
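The columns of Table 2 can be generated programmatically. The sketch below follows this handout's convention of taking the frequency as X divided by the total sum of operating times (a choice specific to this example, not the usual 1/N empirical frequency); the variable names are illustrative.

```python
from itertools import accumulate

times = [40, 98, 165, 235, 312, 428, 547, 720, 925, 1340]
total = sum(times)                        # 4810

# Frequency column: X / total sum, as defined in Table 2
freq = [x / total for x in times]

# Cumulative column: running sum of operating times / total sum
cum_sums = list(accumulate(times))
cdf = [c / total for c in cum_sums]

for i, (x, c, f, F) in enumerate(zip(times, cum_sums, freq, cdf), start=1):
    print(f"{i:>2}  {x:>5}  {x}/{total} = {f:.4f}  {c}/{total} = {F:.4f}")
```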
Fig.1 Probability (or frequency) histogram (see attached spreadsheet) (X=discrete; thus points are not connected here). Vertical axis: P(X=x), from 0.00 to 1.00; horizontal axis: X (operating time, days), from 40 to 1340.
Fig.2 Probability (or frequency) plot (see attached spreadsheet) (X=continuous; thus points are connected here). Axes as in Fig.1.
(iii)
Fig.3 Cumulative Probability Histogram (see attached spreadsheet) (X=discrete; thus points are not connected here). Vertical axis: cumulative probability, from 0.00 to 1.00; horizontal axis: X (operating time, days), from 40 to 1340.
Fig.4 Cumulative Probability Plot (see attached spreadsheet) (X=continuous; thus points are connected here). Axes as in Fig.3.
(iv) For X as a discrete variable:
P(X=120 days) = ?
Refer to Table 2. As X is discrete and the data contain no value at X=120,
P(X=120 days) = 0 (answer)
Now, P(X<120 days) = P(X=40) + P(X=98) (these are the only observed values of X smaller than 120 days)
= (40/4810) + (98/4810) = 138/4810 (answer)
For X as a continuous variable:
P(X=120 days) = ?
Refer to Table 2. As X is a continuous variable, it can take any value, including X=120, which lies between 98 days and 165 days. Do linear interpolation to determine the frequency of getting X=120 days (or P(X=120 days)):
(120-98)/(165-98) = (P(X=120)-98/4810)/(165/4810-98/4810) => so, P(X=120) = 120/4810 (answer)
Now, P(X<120 days) = ?
Refer to Table 2. As X is a continuous variable, assume linearity holds for the F(x) function as well (see Fig.4). So F(120) lies between F(98) and F(165). Do linear interpolation to determine the cumulative probability of getting an operating time <= 120 days:
(120-98)/(165-98) = (F(120)-138/4810)/(303/4810-138/4810)
(22)/(67) = (F(120)-138/4810)/(165/4810)
=> (22/67)*(165/4810) = F(120)-(138/4810)
=> F(120) = P(X<=120) = 192.18/4810 (answer)
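The two linear interpolations for the continuous case can be written out as below. This is a minimal sketch; the helper name `interpolate` is illustrative, and the bracketing points (98 and 165 days) are taken from Table 2. It gives P(X=120) = 120/4810 and F(120) of about 192.18/4810.

```python
def interpolate(x, x0, x1, y0, y1):
    """Linear interpolation of y at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

total = 4810  # total sum of operating times from Table 2

# Frequency at X = 120, between P(X=98) and P(X=165)
p_120 = interpolate(120, 98, 165, 98 / total, 165 / total)

# Cumulative probability at X = 120, between F(98) and F(165)
F_120 = interpolate(120, 98, 165, 138 / total, 303 / total)

print(f"P(X=120) = {p_120 * total:.2f}/{total} = {p_120:.4f}")
print(f"F(120)   = {F_120 * total:.2f}/{total} = {F_120:.4f}")
```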
(v) From Table 2:
Table 3.
Failure no.   Operating time (days)   F(x) = P(X<=x)
1             40                      40/4810 = 0.0083
2             98                      138/4810 = 0.0287
3             165                     303/4810 = 0.0630
4             235                     538/4810 = 0.1119
5             312                     850/4810 = 0.1767
6             428                     1278/4810 = 0.2657
7             547                     1825/4810 = 0.3794
8             720                     2545/4810 = 0.5291
9             925                     3470/4810 = 0.7214
10            1340                    1.0
Assuming X is a continuous variable whose cumulative distribution function F(X) is given by Table 3, first determine the 5th percentile value and the 95th percentile value, and then calculate the 90% confidence interval.
5th percentile value = that X* for which F(X*) = 0.05
From Table 3, it lies between F(98) = 0.0287 and F(165) = 0.0630. Say after linear interpolation, the value F(X*) = 0.05 comes for X* = 130 days (assumed for illustration; calculate it in homework and the exam).
Similarly, the X** for which F(X**) = 0.95 comes out to be 1200 days (say) (again assumed; you should calculate it).
So the 90% confidence interval value for operating time before failure (days)
= 95th percentile value - 5th percentile value
= 1200 days - 130 days = 1070 days (answer)
90% confidence range: (5th percentile value, 95th percentile value) = (130 days, 1200 days)
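The interpolation left as an exercise can be sketched as follows, assuming the piecewise-linear F(X) of Table 3. The function name `percentile_value` is hypothetical. Under this assumption the interpolated values come out near 139.6 days and 1265.5 days, which differ from the illustrative 130 and 1200 days used above.

```python
from itertools import accumulate

times = [40, 98, 165, 235, 312, 428, 547, 720, 925, 1340]
total = sum(times)
# (x, F(x)) pairs from Table 3
points = [(x, c / total) for x, c in zip(times, accumulate(times))]

def percentile_value(p):
    """Invert the piecewise-linear F(x): return x* such that F(x*) = p."""
    for (x0, F0), (x1, F1) in zip(points, points[1:]):
        if F0 <= p <= F1:
            return x0 + (p - F0) / (F1 - F0) * (x1 - x0)
    raise ValueError("p lies outside the tabulated range of F(x)")

x05 = percentile_value(0.05)
x95 = percentile_value(0.95)
interval_value = x95 - x05

print(f"5th percentile  = {x05:.1f} days")
print(f"95th percentile = {x95:.1f} days")
print(f"90% interval value = {interval_value:.1f} days")
```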