Validity and application of some continuous distributions

Dr. Md. Monsur Rahman
Professor, Department of Statistics
University of Rajshahi, Rajshahi – 6205
E-mail: [email protected]

Normal distribution

The normal probability function was first discovered by Abraham De Moivre (1667-1754), who in 1733 derived the distribution as the limiting form of the binomial distribution. The same formula was later derived by Karl Friedrich Gauss (1777-1855) in connection with his work on evaluating errors of observation in astronomy. This is why the normal distribution is often referred to as the Gaussian distribution.

X: Normal variate

Density:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$

$E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$

Standard normal variate: $Z = \dfrac{X-\mu}{\sigma}$, so that $X = \mu + \sigma Z$

[Figure: Normal distribution]

Properties of Normal distribution
• The normal probability curve is symmetrical about the ordinate at $x = \mu$.
• The mean, median and mode of the distribution are equal, and each of these is $\mu$.
• The curve has its points of inflection at $x = \mu \pm \sigma$. By a point of inflection we mean a point at which the concavity changes.
• All odd-order moments of the distribution about the mean vanish.
• The values of $\beta_1$ and $\beta_2$ are 0 and 3 respectively.
• $\mu \pm \sigma$ includes about 68.27% of the population.
• $\mu \pm 2\sigma$ includes about 95.45% of the population.
• $\mu \pm 3\sigma$ includes about 99.73% of the population.

Application: Many biological characteristics conform to a Normal distribution, for example heights of adult men and women, blood pressures in a healthy population, RBS (random blood sugar) levels in blood, etc.

Validity of Normal Distribution for a set of data

Many statistical methods can only be used if the observations follow a Normal distribution. There are several ways of investigating whether observations follow a Normal distribution. With a large sample we can inspect a histogram to see whether it looks like a Normal distribution curve. This does not work well with a small sample, and a more reliable method is the normal plot, which is described below.
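As a quick numerical check of the coverage figures quoted in the properties of the normal distribution, the probability that a normal variate falls within k standard deviations of its mean can be computed from the error function. The sketch below is only illustrative; the function name `coverage` is a hypothetical helper, not part of the slides.

```python
import math


def coverage(k: float) -> float:
    """Return P(mu - k*sigma < X < mu + k*sigma) for a normal variate X.

    For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)), and the result
    does not depend on mu or sigma.
    """
    return math.erf(k / math.sqrt(2.0))


# Percentage of the population within 1, 2 and 3 standard deviations.
coverages = {k: round(100 * coverage(k), 2) for k in (1, 2, 3)}

for k, pct in coverages.items():
    print(f"mu +/- {k} sigma covers about {pct}% of the population")
```

The three percentages agree with the 68.27%, 95.45% and 99.73% figures listed in the properties.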
X: Normal variate

Density:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$

$E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$

Standard normal variate: $Z = \dfrac{X-\mu}{\sigma}$, so that $X = \mu + \sigma Z$

CDF of X: $F(x)$
CDF of Z: $\Phi(z)$, with $\Phi(-z) = 1 - \Phi(z)$
p quantile of X: $X_p$, the solution of $F(X_p) = p$
p quantile of Z: $Z_p$, the solution of $\Phi(Z_p) = p$

$$Z_p = \frac{X_p - \mu}{\sigma}, \qquad X_p = \mu + \sigma Z_p$$

Dataset: $x_1, x_2, \ldots, x_n$
• Find the empirical CDF values.
• Arrange the data in ascending order as $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$.
• The empirical CDF values are
$$F(x_{(i)}) = \frac{i - 0.5}{n}, \quad i = 1, 2, \ldots, n.$$
• Using the normal table, obtain the $z_{(i)}$ values corresponding to $F(x_{(i)})$.
• If the given set of observations follows a normal distribution, the plot of $(x, z)$ should roughly be a straight line; the line $z = \dfrac{x - \mu}{\sigma}$ passes through the point $(\mu, 0)$ and has slope $\dfrac{1}{\sigma}$.
• Graphical estimates of $\mu$ and $\sigma$ may be obtained.
• If the data do not come from a Normal distribution, we will get a curve of some sort.

Table 1: RBS levels (mmol/L) measured in the blood of 20 medical students. Data of Bland (1995), p. 66.

2.2  3.3  3.3  3.4  3.6
3.6  3.7  3.8  3.8  3.8
3.9  4.0  4.1  4.1  4.2
4.4  4.7  4.7  4.8  5.0

Bland, M. (1995): An Introduction to Medical Statistics, Second edition, ELBS with Oxford University Press.
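The normal-plot steps above can be sketched in a few lines for the Table 1 data. The slides read the estimates off the graph. The least-squares line used here is an assumption made for illustration, and the helper names (`normal_plot_points`, `fit_line`) are hypothetical.

```python
from statistics import NormalDist, mean

# Table 1: RBS levels (mmol/L) of 20 medical students.
rbs = [2.2, 3.3, 3.3, 3.4, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8,
       3.9, 4.0, 4.1, 4.1, 4.2, 4.4, 4.7, 4.7, 4.8, 5.0]


def normal_plot_points(data):
    """Return ordered data and the z values at plotting positions (i - 0.5)/n."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return xs, zs


def fit_line(xs, zs):
    """Least-squares line z = intercept + slope * x (an illustrative choice)."""
    xbar, zbar = mean(xs), mean(zs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxz = sum((x - xbar) * (z - zbar) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = zbar - slope * xbar
    return slope, intercept


xs, zs = normal_plot_points(rbs)
slope, intercept = fit_line(xs, zs)

# The line z = (x - mu)/sigma has slope 1/sigma and crosses z = 0 at x = mu.
sigma_graph = 1.0 / slope
mu_graph = -intercept / slope

print(f"graphical estimate of mu    = {mu_graph:.3f} mmol/L")
print(f"graphical estimate of sigma = {sigma_graph:.3f} mmol/L")
```

Because the plotting positions are symmetric about 0.5, the fitted line crosses z = 0 at the sample mean, so the graphical estimate of the mean coincides with the average of the data.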
MLE:
$\hat{\mu} = 3.92$ mmol/L
$\hat{\sigma} = 0.642$ mmol/L (this value corresponds to the sample standard deviation with divisor $n - 1$; the divisor-$n$ MLE is about 0.626 mmol/L)

Goodness of Fit Test
• We use here the Kolmogorov-Smirnov (KS) test for the given data.
• KS statistic = max |CDF_FIT − CDF_EMP|
• For the RBS level data the calculated KS statistic is KS(cal) = 0.07827.
• 5% tabulated value = 0.294
• Conclusion: since 0.07827 < 0.294, the Normal distribution fit is good for the given data.

Results
• The estimated proportion of the population having RBS within the normal range (3.9 – 7.8 mmol/L) is about 51%.
• The estimated proportion of the population having RBS below the normal range is about 49%.
• The estimated proportion of the population having RBS above the normal range is 0%.

Two sample case

Samples: $X_{11}, X_{12}, \ldots, X_{1m}$ and $X_{21}, X_{22}, \ldots, X_{2n}$
• The empirical CDF values of the ordered observations $X_{(11)}, X_{(12)}, \ldots, X_{(1m)}$ are
$$F(X_{(1i)}) = \frac{i - 0.5}{m}, \quad i = 1, 2, \ldots, m.$$
• Obtain the $Z_{(1i)}$ values corresponding to $F(X_{(1i)})$.
• Similarly, the $Z_{(2i)}$ values are obtained corresponding to $F(X_{(2i)})$.
• If the first set of data comes from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$, then the plot $(X_1, Z_1)$ will roughly be linear and pass through the point $(\mu_1, 0)$ with slope $\dfrac{1}{\sigma_1}$.
• If the second set of data comes from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$, then the plot $(X_2, Z_2)$ will roughly be linear and pass through the point $(\mu_2, 0)$ with slope $\dfrac{1}{\sigma_2}$.
• If the two lines are parallel, this indicates different means but equal variances.
• If the two lines coincide, this indicates equal means and equal variances.
• If the two lines pass through the same point on the X-axis, this indicates equal means but different variances.

Table 2: Burning times (rounded to the nearest tenth of a minute) of two kinds of emergency flares. Data due to Freund and Walpole (1987), p. 530.

Brand A: 14.9, 11.3, 13.2, 16.6, 17.0, 14.1, 15.4, 13.0, 16.9
Brand B: 15.2, 19.8, 14.7, 18.3, 16.2, 21.2, 18.9, 12.2, 15.3, 19.4

Freund, J.E. and Walpole, R.E. (1987): Mathematical Statistics, Fourth edition, Prentice-Hall Inc.
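Returning to the single-sample RBS example, the reported estimates and the percentages inside and outside the normal range can be checked with a short sketch. It assumes the fitted distribution is a normal with the sample mean and the reported standard deviation of 0.642, which the code shows to be the divisor n − 1 value. Variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean

# Table 1: RBS levels (mmol/L) of 20 medical students.
rbs = [2.2, 3.3, 3.3, 3.4, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8,
       3.9, 4.0, 4.1, 4.1, 4.2, 4.4, 4.7, 4.7, 4.8, 5.0]

n = len(rbs)
mu_hat = mean(rbs)
sum_sq = sum((x - mu_hat) ** 2 for x in rbs)

sigma_mle = sqrt(sum_sq / n)        # maximum likelihood estimate (divisor n)
sigma_n1 = sqrt(sum_sq / (n - 1))   # sample standard deviation (divisor n - 1)

# Fitted normal using the standard deviation that matches the reported 0.642.
fitted = NormalDist(mu_hat, sigma_n1)

low, high = 3.9, 7.8  # normal range for RBS in mmol/L, as stated in the text
p_below = fitted.cdf(low)
p_within = fitted.cdf(high) - fitted.cdf(low)
p_above = 1.0 - fitted.cdf(high)

print(f"mu_hat = {mu_hat:.2f}, sd (n-1) = {sigma_n1:.3f}, sd (n) = {sigma_mle:.3f}")
print(f"below: {p_below:.0%}, within: {p_within:.0%}, above: {p_above:.0%}")
```

Using the divisor-n estimate instead changes the percentages only slightly, because the lower limit of the range lies very close to the fitted mean.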
The above plot indicates that both samples come from normal populations with unequal means and variances.

Log-normal distribution

In probability theory, a log-normal distribution is the probability distribution of a random variable whose logarithm is normally distributed. If X is a random variable with a normal distribution, then Y = exp(X) has a log-normal distribution; likewise, if Y is log-normally distributed, then X = log(Y) is normally distributed. It is occasionally referred to as the Galton distribution.

Density:
$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\log x-\mu}{\sigma}\right)^2\right], \quad 0 < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$

Mean $= \exp(\mu + \sigma^2/2)$
Variance $= [\exp(\sigma^2) - 1]\exp(2\mu + \sigma^2)$
Median $= \exp(\mu)$
Mode $= \exp(\mu - \sigma^2)$

[Figure: log-normal density function, f(x) against x]

Application

Certain physiological measurements, such as blood pressure of adult humans (after separation into male/female subpopulations) and vitamin D level in blood, follow a log-normal distribution. Consequently, reference ranges for measurements in healthy individuals are more accurately estimated by assuming a log-normal distribution than by assuming a distribution that is symmetric about the mean.

Table 3: Vitamin D levels (ng/ml) measured in the blood of 26 healthy men. Data due to Bland (1995), p. 113.

14  25  30  42  54
17  26  31  43  54
20  26  31  46  63
21  26  32  48  67
22  27  35  52  83
24

Bland, M. (1995): An Introduction to Medical Statistics, Second edition, ELBS with Oxford University Press.
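For the vitamin D data in Table 3, a log-normal fit amounts to fitting a normal distribution to the logarithms of the observations. The sketch below is an illustration under stated assumptions: it uses natural logarithms, the standard log-normal formulas for median, mean and mode, the divisor n − 1 standard deviation, and the reference range 30 to 74 ng/ml for vitamin D. Variable names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean

# Table 3: vitamin D levels (ng/ml) of 26 healthy men.
vit_d = [14, 17, 20, 21, 22, 24, 25, 26, 26, 26, 27, 30, 31,
         31, 32, 35, 42, 43, 46, 48, 52, 54, 54, 63, 67, 83]

logs = [log(x) for x in vit_d]
n = len(logs)

mu_hat = mean(logs)                                  # mean of log(x)
sum_sq = sum((v - mu_hat) ** 2 for v in logs)
sigma_n1 = sqrt(sum_sq / (n - 1))                    # divisor n - 1
sigma_mle = sqrt(sum_sq / n)                         # divisor n (strict MLE)

# Summaries of the fitted log-normal distribution (standard formulas).
median_fit = exp(mu_hat)
mean_fit = exp(mu_hat + sigma_n1 ** 2 / 2)
mode_fit = exp(mu_hat - sigma_n1 ** 2)

# Proportions relative to a reference range, computed on the log scale.
low, high = 30.0, 74.0
fitted_log = NormalDist(mu_hat, sigma_n1)
p_below = fitted_log.cdf(log(low))
p_within = fitted_log.cdf(log(high)) - fitted_log.cdf(log(low))
p_above = 1.0 - fitted_log.cdf(log(high))

print(f"mu_hat = {mu_hat:.3f}, sigma (n-1) = {sigma_n1:.3f}, sigma (n) = {sigma_mle:.3f}")
print(f"median = {median_fit:.1f}, mean = {mean_fit:.1f}, mode = {mode_fit:.1f} ng/ml")
print(f"below: {p_below:.1%}, within: {p_within:.1%}, above: {p_above:.1%}")
```

Note that the estimates of μ and σ are on the log scale, while the median, mean and mode are back on the original ng/ml scale.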
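• MLE (on the log scale, data in ng/ml):
$\hat{\mu} = 3.509$
$\hat{\sigma} = 0.449$ (this value corresponds to divisor $n - 1$; the divisor-$n$ MLE is about 0.44)

Goodness-of-fit test
• KS statistic = max |CDF_FIT − CDF_EMP|
• For the vitamin D level data the calculated KS statistic is KS(cal) = 0.0967.
• 5% tabulated value = 0.274
• Conclusion: since 0.0967 < 0.274, the log-normal distribution fit is good for the given vitamin D data.

Results
• The estimated proportion of the population having vitamin D level within the normal range (30 – 74 ng/ml) is about 56%.
• The estimated proportion of the population having vitamin D level below the normal range is about 40%.
• The estimated proportion of the population having vitamin D level above the normal range is about 4%.

Weibull Distribution

The Weibull distribution is used to analyze lifetime data.

T: Lifetime variable
• Density function:
$$f(t) = \frac{\beta}{\theta}\left(\frac{t}{\theta}\right)^{\beta-1} \exp\left[-\left(\frac{t}{\theta}\right)^{\beta}\right], \quad 0 < t < \infty,\ \theta, \beta > 0$$
• $\theta$: Scale parameter (0.632 quantile)
• $\beta$: Shape parameter (< 1, > 1 or = 1)
• CDF: $F(t) = 1 - \exp\left[-\left(\dfrac{t}{\theta}\right)^{\beta}\right]$
• Reliability (or survival) function: $R(t) = \exp\left[-\left(\dfrac{t}{\theta}\right)^{\beta}\right]$
• Hazard function: $h(t) = \dfrac{\beta}{\theta}\left(\dfrac{t}{\theta}\right)^{\beta-1}$
• Increasing hazard rate: $h(t)$ increases with $t$ for $\beta > 1$
• Decreasing hazard rate: $h(t)$ decreases with $t$ for $\beta < 1$
• Constant hazard rate: $h(t) = \dfrac{1}{\theta}$ for $\beta = 1$

$$E(T) = \theta\,\Gamma\left(1 + \frac{1}{\beta}\right), \qquad V(T) = \theta^2\left[\Gamma\left(1 + \frac{2}{\beta}\right) - \left\{\Gamma\left(1 + \frac{1}{\beta}\right)\right\}^2\right]$$

$t_p$: p quantile, which is the solution of $F(t_p) = p$
• Accordingly, $t_p = \theta\,[-\log(1-p)]^{1/\beta}$

Exponential distribution
• The Weibull distribution reduces to the exponential distribution when $\beta = 1$.
• Density function: $f(t) = \dfrac{1}{\theta}\exp\left(-\dfrac{t}{\theta}\right), \quad 0 < t < \infty,\ \theta > 0$
• $\theta$: Scale parameter (0.632 quantile)
• CDF: $F(t) = 1 - \exp\left(-\dfrac{t}{\theta}\right)$
• Reliability (or survival) function: $R(t) = \exp\left(-\dfrac{t}{\theta}\right)$
• Hazard function: $h(t) = \dfrac{1}{\theta}$
• $E(T) = \theta$, $\mathrm{Var}(T) = \theta^2$
• $t_p$: p quantile, which is the solution of $F(t_p) = p$
• Accordingly, $t_p = \theta\,[-\log(1-p)]$

[Figure] The red curve is the exponential density. The red line is the exponential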
hazard function.

Validity of Weibull distribution for a set of data

From the Weibull CDF we get
$$\log[-\log(1 - F(t))] = \beta\log(t) - \beta\log(\theta),$$
that is, $Y = A + \beta X$, where
• $Y = \log[-\log(1 - F(t))]$
• $X = \log(t)$
• $A = -\beta\log(\theta)$
• The ordered lifetimes are $t_{(1)}, t_{(2)}, \ldots, t_{(n)}$.
• The $Y_{(i)}$ values are obtained through the empirical CDF values
$$F(t_{(i)}) = \frac{i - 0.5}{n}, \quad i = 1, 2, \ldots, n.$$
• If the data follow a Weibull distribution with scale parameter $\theta$ and shape parameter $\beta$, the plot of $(X, Y)$ will roughly be linear with slope $\beta$ and pass through the point $(\log(\theta), 0)$.
• Accordingly, the graphical estimates of $\theta$ and $\beta$ may be obtained.

Table 4: Specimen lives (in hours) of an electrical insulation at 200 °C. Data due to Nelson (1990), p. 154.

2520, 2856, 3192, 3192, 3528

Nelson, W. (1990): Accelerated Testing: Statistical Models, Test Plans, and Data Analyses, John Wiley and Sons.

• MLE of $\theta$ and $\beta$
• The log-likelihood function of $\theta$ and $\beta$ based on the observed data $t_1, t_2, \ldots, t_n$ is
$$\log L = n\log\left(\frac{\beta}{\theta}\right) + (\beta - 1)\sum_{i=1}^{n}\log\left(\frac{t_i}{\theta}\right) - \sum_{i=1}^{n}\left(\frac{t_i}{\theta}\right)^{\beta}$$
• The MLEs of $\theta$ and $\beta$ are obtained by maximizing the log-likelihood with respect to $\theta$ and $\beta$ using a numerical method.
• Graphical estimates may be used as the starting values required for the numerical method.
• The MLEs of $\theta$ and $\beta$ are denoted by $\hat{\theta}$ and $\hat{\beta}$ respectively.

For the insulation data given in Table 4 the following results (based on MLEs) are obtained:

$\hat{\theta} = 3208.49$ hours, $\hat{\beta} = 10.61$
$S.E.(\hat{\theta}) = 142.56$ hours, $S.E.(\hat{\beta}) = 3.78$
Estimated median life = 3099.548 hours

• ML estimate of R(t): $\hat{R}(t) = \exp\left[-\left(\dfrac{t}{\hat{\theta}}\right)^{\hat{\beta}}\right]$

Time (hours):  3000    3500    3700    4000
Reliability:   0.6124  0.0807  0.0107  0.0000311

Weibull versus Exponential Model
• Suppose we want to test whether to accept the exponential or the Weibull model for a given set of data.
• This is equivalent to testing whether the shape parameter of the Weibull distribution is unity or not, i.e. $H_0: \beta = 1$ vs $H_1: \beta \neq 1$.

Test Procedure (LR test)
• Under $H_0$ the log-likelihood function is
$$l_0 = -n\log(\theta) - \frac{1}{\theta}\sum_{i=1}^{n} t_i,$$
which yields $\hat{\theta} = \dfrac{1}{n}\sum_{i=1}^{n} t_i$, the MLE of $\theta$.
• The maximum of
$l_0$ is given by
$$\hat{l}_0 = -n\log(\hat{\theta}) - \frac{1}{\hat{\theta}}\sum_{i=1}^{n} t_i$$
• Similarly, under $H_1$ the maximum of the log-likelihood is given by
$$\hat{\hat{l}}_1 = n\log\left(\frac{\hat{\hat{\beta}}}{\hat{\hat{\theta}}}\right) + (\hat{\hat{\beta}} - 1)\sum_{i=1}^{n}\log\left(\frac{t_i}{\hat{\hat{\theta}}}\right) - \sum_{i=1}^{n}\left(\frac{t_i}{\hat{\hat{\theta}}}\right)^{\hat{\hat{\beta}}},$$
where $\hat{\hat{\theta}}$ and $\hat{\hat{\beta}}$ are the MLEs of $\theta$ and $\beta$ under $H_1$.
• The LR test implies that, under $H_0$, $2(\hat{\hat{l}}_1 - \hat{l}_0)$ approximately follows a chi-square distribution with 1 df.
• If $2(\hat{\hat{l}}_1 - \hat{l}_0) < \chi^2_{(1-\alpha,\,1)}$, accept (use) the exponential model.
• If $2(\hat{\hat{l}}_1 - \hat{l}_0) > \chi^2_{(1-\alpha,\,1)}$, accept (use) the Weibull model.
• For the insulation data given in Table 4:
$$2(\hat{\hat{l}}_1 - \hat{l}_0) = 2(-36.1877 + 45.1293) \approx 17.88 > \chi^2_{(.95,\,1)} = 3.84$$
Conclusion: the Weibull model may be accepted at the 5% level of significance.

Accelerated Life Testing (ALT) for Weibull Distribution
• Stress: temperature, voltage, load, etc.
• Under the operating (use) stress level, it takes a long time to obtain a sufficient number of failures.
• Lifetimes are therefore obtained under high stress levels.
• Aim:
(i) To estimate the lifetime distribution under the use stress level, say $S_0$
(ii) To estimate reliability for a specified time under $S_0$
(iii) To estimate quantiles under $S_0$

Sampling scheme (under constant stress testing)
• Divide n components into k groups with numbers of components $n_1, n_2, \ldots, n_k$ respectively, where $n = \sum_{i=1}^{k} n_i$.
• $n_i$ components are exposed to stress level $S_i$.
• $T_{ij}$: j-th lifetime corresponding to $S_i$.
• Obtain the equation for the lifetimes corresponding to the i-th group.
• The equation for the lifetimes corresponding to the i-th group is
$$\log[-\log(1 - F(t_{ij}))] = \beta_i\log(t_{ij}) - \beta_i\log(\theta_i), \quad \text{i.e. } Y_{ij} = A_i + \beta_i X_{ij}$$
• If the data corresponding to the i-th group follow $W(\theta_i, \beta_i)$, the plot of $(X_i, Y_i)$ will roughly be linear with slope $\beta_i$ and pass through the point $(\log(\theta_i), 0)$.
• If the plots are linear and parallel, then the lifetimes under the different stress levels are Weibull with common slope $\beta$ and different scales $\theta_i$, which implies that $\theta_i$ depends on the stress level $S_i$.
• If the k plots are linear but not parallel, the lifetimes under the different stress levels are Weibull with different slopes $\beta_i$ and different scales $\theta_i$, which implies that both $\theta_i$ and $\beta_i$ depend on
the stress levels $S_i$. In this case modeling is difficult.
• For the first case (common slope), the relationship between life and stress can be identified.
• Plot log(0.632 quantile) against the stress levels.
• If the plot yields a straight line, then the life-stress relationship is
$$\log(\theta) = \gamma_0 + \gamma_1 S$$
• Estimation of $\gamma_0$, $\gamma_1$ and $\beta$ using the ML method
• Likelihood function under stress level $S_i$:
$$L_i = \prod_{j=1}^{n_i}\left[\frac{\beta}{\theta_i}\left(\frac{t_{ij}}{\theta_i}\right)^{\beta-1}\exp\left\{-\left(\frac{t_{ij}}{\theta_i}\right)^{\beta}\right\}\right], \quad \text{where } \theta_i = \exp(\gamma_0 + \gamma_1 S_i)$$
• Total log-likelihood:
$$\log L(\gamma_0, \gamma_1, \beta) = \sum_{i=1}^{k}\log(L_i)$$
• Using a numerical method, the MLEs of $\gamma_0$, $\gamma_1$ and $\beta$ may be obtained.
• The MLE of $\theta$ at $S_0$, say $\hat{\theta}_0$, is obtained through the relationship
$$\hat{\theta}_0 = \exp(\hat{\gamma}_0 + \hat{\gamma}_1 S_0)$$
• Hence the ML estimate of the Weibull density under the use stress level $S_0$ is obtained. Accordingly, estimates of the reliability for a specified time, the median life and other desired percentiles may also be obtained.

Table 5: Specimen lives (in hours) of an electrical insulation at three temperatures. Data of Nelson (1990), p. 154.

200 °C:  2520  2856  3192  3192  3528
225 °C:   816   912  1296  1392  1488
250 °C:   300   324   372   372   444

Nelson, W. (1990): Accelerated Testing: Statistical Models, Test Plans, and Data Analyses, John Wiley and Sons.

The above three plots of the data given in Table 5 are roughly linear and parallel, so the lifetimes under the three stress levels are Weibull with common slope and different scale parameters, which implies that the scale parameters depend on the stress levels.
• Arrhenius life-stress relationship (temperature stress):
$$\log(\theta) = \gamma_0 + \gamma_1(1/W),$$
where W is the temperature in kelvin.
• Temperature in kelvin = temperature in degrees centigrade plus 273.16

• Results based on MLEs for the data given in Table 5 with respect to the Arrhenius-Weibull model:
$\hat{\gamma}_0 = -13.39707$
$\hat{\gamma}_1 = 10596.9961$
$\hat{\beta} = 0.68566$
$-2\log\hat{L} = 269.9008$
At the use stress (180 deg.
Centigrade) the following results are obtained:
$\hat{\theta}_0 = 21754.98$ and $\hat{\beta} = 0.68566$
Estimated median lifetime = 12747.08 hours

• Results based on MLEs for the data given in Table 5 with respect to the Arrhenius-Exponential model:
$\hat{\gamma}_0 = -13.17807$
$\hat{\gamma}_1 = 10596.98923$
$-2\log\hat{L} = 273.9037$
At the use stress (180 deg. Centigrade) the following results are obtained:
$\hat{\theta}_0 = 27080.87$
Estimated median lifetime = 18771.03 hours

• Weibull versus Exponential Model for ALT
$H_0: \beta = 1$ vs $H_1: \beta \neq 1$
$-2\log\hat{L}_0 = 273.9037$ (for the Exponential model)
$-2\log\hat{L}_1 = 269.9008$ (for the Weibull model)
$$2(\log\hat{L}_1 - \log\hat{L}_0) = 4.0029 > \chi^2_{(.95,\,1)} = 3.84$$
Conclusion: Accept the Weibull model at the 5% level of significance.
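The extrapolation to the use temperature and the final likelihood-ratio comparison can be retraced from the reported estimates alone. The sketch below is illustrative and does not refit the models. It takes the coefficient values, the 273.16 offset and the −2 log L values from the text, assumes natural logarithms, and uses hypothetical helper names (`arrhenius_scale`, `weibull_quantile`). The chi-square critical value is computed from the identity that a chi-square variate with 1 df is the square of a standard normal variate.

```python
from math import exp, log
from statistics import NormalDist

KELVIN_OFFSET = 273.16  # offset used in the text to convert centigrade to kelvin


def arrhenius_scale(gamma0: float, gamma1: float, temp_c: float) -> float:
    """Scale parameter (0.632 quantile) from log(scale) = gamma0 + gamma1 / W."""
    return exp(gamma0 + gamma1 / (temp_c + KELVIN_OFFSET))


def weibull_quantile(p: float, scale: float, shape: float) -> float:
    """p quantile of a Weibull distribution: scale * (-log(1 - p))**(1/shape)."""
    return scale * (-log(1.0 - p)) ** (1.0 / shape)


# Reported estimates (shape = 1 gives the exponential model).
weibull_fit = {"gamma0": -13.39707, "gamma1": 10596.9961,
               "shape": 0.68566, "neg2loglik": 269.9008}
expon_fit = {"gamma0": -13.17807, "gamma1": 10596.98923,
             "shape": 1.0, "neg2loglik": 273.9037}

use_temp_c = 180.0  # use stress in degrees centigrade

scale_w = arrhenius_scale(weibull_fit["gamma0"], weibull_fit["gamma1"], use_temp_c)
median_w = weibull_quantile(0.5, scale_w, weibull_fit["shape"])

scale_e = arrhenius_scale(expon_fit["gamma0"], expon_fit["gamma1"], use_temp_c)
median_e = weibull_quantile(0.5, scale_e, expon_fit["shape"])

# Likelihood-ratio statistic: difference of the two -2 log L values.
lr_stat = expon_fit["neg2loglik"] - weibull_fit["neg2loglik"]

# 0.95 quantile of chi-square with 1 df equals the squared 0.975 normal quantile.
critical = NormalDist().inv_cdf(0.975) ** 2
prefer_weibull = lr_stat > critical

print(f"Weibull:     scale = {scale_w:.2f} h, median = {median_w:.2f} h")
print(f"Exponential: scale = {scale_e:.2f} h, median = {median_e:.2f} h")
print(f"LR = {lr_stat:.4f}, critical value = {critical:.3f}, "
      f"prefer Weibull: {prefer_weibull}")
```

The same two helper functions can be reused for other percentiles by changing `p`, or for another use temperature by changing `use_temp_c`.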