Department of Civil Engineering, I.I.T. Delhi
CEL 899: Environmental Risk Assessment
Statistics and Probability Example, Part 1

Note: Assume missing data (if any) and mention the same.

Q1. Suppose X has a normal distribution defined as N(mean = 5, variance = 2^2) (note the notation used here). Answer the following:
(i) Calculate the mean, median, standard deviation, and variance.
(ii) Calculate P(X=3) and P(X<10).
(iii) Calculate P(1<X<8), i.e., the probability that X lies between 1 and 8. Hint: P(X1<X<X2) = P(X<X2) - P(X<X1).
(iv) Calculate the 5th, 50th, 90th, 95th, and 99th percentile values. Also calculate the 90% confidence interval value (= 95th percentile value - 5th percentile value).

Answer:
(i) Mean = 5; standard deviation = 2; variance = 2^2 = 4 (Answer)

At the median, 50% of X values are lower than this value and 50% of X values are higher. So, P(X < median) = P(X > median) = 0.50.
So if we determine the 50th percentile value, it will be the median of the random variable X with N(mean = 5, variance = 2^2).
Given that P(X < median) = 0.5, or P([(X-5)/2] < [(median-5)/2]) = 0.5, where Z = (X-5)/2 and Z* = (median-5)/2,
we have P(Z<Z*) = 0.5, where Z has a standard normal distribution with mean 0 and standard deviation 1.
Now look at the standard normal distribution table to find Φ(Z*) = P(Z<Z*) = 0.5. The table gives Z* = 0, i.e., Z* = (median-5)/2 = 0
=> median = 5 (Answer)

Note: For a normal distribution, mean = median. The approach is shown here because it applies to any percentile value; here the calculation was done for the 50th percentile value (i.e., the median). Similarly, you can calculate the 5th, 90th, 95th, and 99th percentile values.

(ii) P(X=3) = ?
Put X = 3 in the following formula of the probability density function for the normal distribution:

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-0.5\left(\frac{x-\mu}{\sigma}\right)^{2}\right]$$

$$f(3; 5, 2) = \frac{1}{2\sqrt{2\pi}} \exp\left[-0.5\left(\frac{3-5}{2}\right)^{2}\right]$$

= (1/5.013) * exp(-0.5) = (1/5.013) * 0.6065
=> P(X=3) = f(3; 5, 2) = 0.1210 (Answer)

Note: Strictly, for a continuous random variable the probability of any single exact value is zero; the quantity evaluated here is the probability density at X = 3.

Now, P(X<10) = ?
First get the Z value, which has a standard normal distribution with mean 0 and standard deviation 1:
P(X<10) = P([(X-5)/2] < [(10-5)/2]), where Z = (X-5)/2 and Z* = (10-5)/2 = 2.5
So we have P(Z<2.5) = ?
Now look at the standard normal distribution table to find Φ(2.5), as Z* = 2.5 is calculated and known here.
=> P(Z<Z*) = 0.9938
So, P(X<10) = P(Z<Z*) = 0.9938 (Answer)

(iii) P(1<X<8) = P(X<8) - P(X<1)
First get the Z values, which have a standard normal distribution with mean 0 and standard deviation 1.
P(X<8) = P([(X-5)/2] < [(8-5)/2]) = P(Z < 1.5) = Φ(1.5) = 0.9332
P(X<1) = P([(X-5)/2] < [(1-5)/2]) = P(Z < -2.0) = Φ(-2.0)
Here, remember the formula P(Z<Z*) = P(Z>-Z*) (because of the symmetry of the normal distribution about its mean).
So, P(Z<-2) = P(Z>2) = 1 - P(Z<2)
So, Φ(-2.0) = 1 - Φ(2.0) = 1 - 0.9772 = 0.0228
So, P(1<X<8) = P(X<8) - P(X<1) = 0.9332 - 0.0228 = 0.9104
=> (i.e., 0.9104*100 = 91.04% of the time X will lie between 1 and 8). (Answer)
Note: Here the 91.04% confidence interval is given by (1, 8).

(iv) Calculate all percentile values in the same way as the 50th percentile value in part (i).
Say 5th percentile value = X*, i.e., P(X<X*) = 0.05
P(X<X*) = P([(X-5)/2] < [(X*-5)/2]) = P(Z < Z*) = 0.05 [here, Z* = (X*-5)/2]
As P(Z<Z*) = 1 - P(Z>Z*):
0.05 = 1 - P(Z>Z*)
P(Z>Z*) = 0.95
Now, as P(Z<Z*) = P(Z>-Z*) (because of the symmetry of the normal distribution about its mean), it follows that P(Z>Z*) = P(Z<-Z*).
So P(Z<-Z*) = 0.95, and from the table Φ(Z) = 0.95 occurs for Z lying between 1.64 and 1.65. Using linear interpolation, Z comes out to be 1.645.
So, -Z* = 1.645, i.e., Z* = -1.645
Now, as Z* = (X*-5)/2 = -1.645:
X* = 2*(-1.645) + 5 = 1.71
So 5th percentile value = 1.71

Q2. Look at the following failure data:

Failure no.:
    1     2     3     4     5     6     7     8     9     10
Operating time (days) (say X):
    40    98    165   235   312   428   547   720   925   1340

Using the operating time data, answer the following:
(i) Calculate the mean, median, standard deviation, variance, minimum, and maximum operating time values.
(ii) Develop a frequency histogram for operating time before failure (random variable X) using the binning approach (called the probability mass function if X is discrete, or the probability density function if X is continuous). Develop this histogram for both cases.
(iii) Develop cumulative distribution functions for two cases: X as a discrete random variable and X as a continuous variable. Plot.
(iv) Using the cumulative distribution function for X as a discrete variable, calculate P(X = 120 days) and P(X < 120 days). Repeat this calculation for the case when X is a continuous variable.
(v) Calculate the operating time without failure with 95% confidence (i.e., calculate the 95th percentile value) using the developed cumulative distribution function for the case when X is a continuous variable. Also calculate the 90% confidence interval value (= 95th percentile value - 5th percentile value).
Note: The 90% confidence interval indicates that the operating time lies between these two ends in 90% of the observations.

Answer:
(i) First arrange the data in increasing order.
Arranged data: X (in days): 40, 98, 165, 235, 312, 428, 547, 720, 925, 1340 (total N = 10)
Minimum = 40 days, maximum = 1340 days

Mean:

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N=10} x_i$$

= (1/10) * [40+98+165+235+312+428+547+720+925+1340]
=> mean = 481 days

As N is an even number here, median = average of the two middle terms = 1/2 * (5th term + 6th term)
Median = 1/2 * (312+428) = 370 days

Standard deviation (σ):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N=10}\left(X_i - \bar{X}\right)^{2}}{N-1}}$$

$$\sigma = \sqrt{\frac{1}{10-1}\left[(40-481)^{2} + (98-481)^{2} + \dots + (1340-481)^{2}\right]}$$

Table 1.
Operating time (days) (say X)    (X-mean)^2
40                               194481
98                               146689
165                              99856
235                              60516
312                              28561
428                              2809
547                              4356
720                              57121
925                              197136
1340                             737881

Total sum of the (X-mean)^2 terms = 1529406
So, standard deviation (σ) = (1529406/9)^0.5 = (169934)^0.5 = 412.2 days
Variance = σ^2 = 1529406/9 = 169934 days^2

(ii) Table 2.
Failure no.   Operating time (days)   Frequency, P(X=x) (= X/total sum)   F(x) = P(X<=x)
1             40                      40/4810                             40/4810
2             98                      98/4810                             138/4810
3             165                     165/4810                            303/4810
4             235                     235/4810                            538/4810
5             312                     312/4810                            850/4810
6             428                     428/4810                            1278/4810
7             547                     547/4810                            1825/4810
8             720                     720/4810                            2545/4810
9             925                     925/4810                            3470/4810
10            1340                    1340/4810                           1.0
Total sum     4810

Fig.1 Probability (or frequency) histogram, P(X=x) versus X (operating time, days) (see attached spreadsheet) (X = discrete; thus points are not connected here)

Fig.2 Probability (or frequency) plot, P(X=x) versus X (operating time, days) (see attached spreadsheet) (X = continuous; thus points are connected here)

(iii)
Fig.3 Cumulative Probability Histogram versus X (operating time, days) (see attached spreadsheet) (X = discrete; thus points are not connected here)

Fig.4 Cumulative Probability Plot versus X (operating time, days) (see attached spreadsheet) (X = continuous; thus points are connected here)

(iv) For X: discrete variable.
P(X = 120 days) = ?
Refer to Table 2. As X is discrete and the data contain no observation at X = 120, P(X = 120 days) = 0 (Answer)
Now, P(X < 120 days) = P(X=98) + P(X=40) (as 40 and 98 are the only observed values smaller than 120 days)
= (98/4810) + (40/4810) = 138/4810 (Answer)

For X: continuous variable.
P(X = 120 days) = ?
Refer to Table 2. As X is a continuous variable, it can take any value, including X = 120. X = 120 lies between 98 days and 165 days. Do linear interpolation to determine the frequency of getting X = 120 days (or P(X = 120 days)):
(120-98)/(165-98) = (P(X) - 98/4810)/(165/4810 - 98/4810)
=> P(X=120) = 120/4810 (Answer)

Now, P(X < 120 days) = ?
Refer to Table 2. As X is a continuous variable, assume linearity holds for the F(x) function as well (see Figure 4). So F(120) lies between F(98) and F(165). Do linear interpolation to determine the cumulative probability of getting an operating time <= 120 days:
(120-98)/(165-98) = (F(120) - 138/4810)/(303/4810 - 138/4810)
(22)/(67) = (F(120) - 138/4810)/(165/4810)
=> (22/67) * (165/4810) = F(120) - (138/4810)
=> F(120) = P(X<=120) = 192.18/4810 (Answer)

(v) From Table 2:

Table 3.
Failure no.   Operating time (days)   F(x) = P(X<=x)
1             40                      40/4810 = 0.0083
2             98                      138/4810 = 0.0287
3             165                     303/4810 = 0.0630
4             235                     538/4810 = 0.1119
5             312                     850/4810 = 0.1767
6             428                     1278/4810 = 0.2657
7             547                     1825/4810 = 0.3794
8             720                     2545/4810 = 0.5291
9             925                     3470/4810 = 0.7214
10            1340                    4810/4810 = 1.0

Assuming X is a continuous variable whose cumulative distribution function is given by F(X) as per Table 3, first determine the 5th percentile value and the 95th percentile value, and then calculate the 90% confidence interval.
5th percentile value = that X* for which F(X*) = 0.05
From Table 3, it lies between F(98) = 0.0287 and F(165) = 0.0630. Say after linear interpolation, the 0.05 value of F(X*) comes for X* = 130 days (assumed for illustration; calculate in homework and exam).
Similarly, F(X**) = 0.95 comes out to be at X** = 1200 days (say) (again assumed; you should calculate it).
So the 90% confidence interval value for operating time before failure (days) = 95th percentile value - 5th percentile value = 1200 days - 130 days = 1070 days (Answer)
90% confidence range: (5th percentile value, 95th percentile value) = (130 days, 1200 days)
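
The linear-interpolation steps used for the continuous case in parts (iv) and (v) can be sketched in Python as a cross-check. This is a minimal sketch, not part of the handout: the helper name `interp` and all variable names are hypothetical, and it assumes, as the answer does, that F(x) is linear between the tabulated operating times. Only the Python standard library is used.

```python
from bisect import bisect_left
from statistics import mean, median, stdev

# Operating times from the failure data (days), already in increasing order.
times = [40, 98, 165, 235, 312, 428, 547, 720, 925, 1340]
total = sum(times)  # 4810, the "total sum" in Table 2

# Cumulative column of Table 3: running sum of operating times / total sum.
cum = []
running = 0
for t in times:
    running += t
    cum.append(running / total)

def interp(x, xs, ys):
    """Hypothetical helper: piecewise-linear interpolation of y at x.

    xs must be strictly increasing and x must lie inside [xs[0], xs[-1]].
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the tabulated range")
    i = bisect_left(xs, x)          # first index with xs[i] >= x
    if xs[i] == x:
        return ys[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Part (i): summary statistics (stdev uses the N-1 denominator).
x_mean = mean(times)
x_median = median(times)
x_std = stdev(times)

# Part (iv), continuous case: F(120) by interpolating between 98 and 165 days.
F_120 = interp(120, times, cum)

# Part (v): invert the tabulated F(x) by swapping the roles of x and F.
p05 = interp(0.05, cum, times)
p95 = interp(0.95, cum, times)

print(f"mean={x_mean}, median={x_median}, std={x_std:.1f}")
print(f"F(120)={F_120:.4f}")
print(f"5th percentile={p05:.1f} days, 95th percentile={p95:.1f} days")
print(f"90% interval width={p95 - p05:.1f} days")
```

Under these assumptions the interpolated percentiles come out near 139.6 days and 1265.5 days, which differ from the 130 and 1200 days that the answer labels as assumed for illustration. Swapping the argument order in `interp` works for the inverse lookup only because the cumulative column is strictly increasing.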