Statistics Review
• Random variable (r.v.):
  – quantitative or
  – categorical (qualitative)
  – by the distribution of a random variable we mean its values and the probabilities with which it takes on these values
  – another breakdown of r.v.s is into discrete and continuous
  – Examples: sex, number of previous heart attacks, weight, time since heart transplant, …
Distributions
• The probability distribution of a discrete r.v. is a specification of all the values it takes on, along with the probabilities with which it takes on these values.
  – examples: Bernoulli r.v. (coin toss)
  – indicator r.v., or indicator function of an event A, defined as $I_A(x) = 1$ if $x \in A$ and $I_A(x) = 0$ otherwise
  – What are the values of $I_A$? What are the corresponding probabilities?
• How do we represent the distribution of a discrete r.v.?
  – table, list, formula, graph
• What properties does the distribution of a discrete r.v. have?
  – all probabilities are between 0 and 1
  – all probabilities sum to 1
• Let the distribution of Y be as follows: Y takes on values $y_1, y_2, \dots, y_n$ with probabilities $p_1, p_2, \dots, p_n$; then we have $0 \le p_i \le 1$ and $\sum_{i=1}^{n} p_i = 1$.
• A continuous r.v. X is described by its probability density function (pdf), f(x), which has the property that f(x) ≥ 0 for all x and the total area bounded by its curve and the horizontal (x) axis is 1. Additionally, P(a ≤ X ≤ b) = probability that X is between a and b = the area under the density curve between a and b. (Sketch!)
• An example is the normal r.v., whose density is represented by the familiar "bell-shaped" curve.
• Often we evaluate probabilities for continuous r.v.s via their cumulative distribution function (cdf) F(x): $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$ = area under the density curve below x.
• So P(a ≤ X ≤ b) = F(b) − F(a). (Sketch!)
• Now define a survival r.v. Y as a continuous r.v. taking its values in the interval from 0 to ∞; i.e., its values are thought of as the lifetime or survival time = the time until death (or the time until failure if we're considering an inanimate object). So Y is a positive-valued r.v.
• Y has pdf f(y) and cdf F(y), where F(y) = P(Y ≤ y).
• See Example 1.1 for lifetime data on the failure time, in days, of the carbon lining of a cell:
  1540 1415 660 999 1193 1006 869 1035 797 296 775 1424 1169 1500 728 670 841
• Or see Example 1.2, where lifetime is the time from injection of a carcinogen to death in mice; note there is also a (categorical) explanatory variable that splits the mice into two groups, "conventional" and "germ-free".
• Now define the survival (or reliability) function S(y) as S(y) = 1 − F(y) = P(Y > y). In terms of the pdf f, we have $S(y) = 1 - F(y) = P(Y > y) = \int_{y}^{\infty} f(t)\,dt$.
• Note the following important properties of the survival function:
  – S(0) = 1
  – S(∞) = 0
  – S(b) ≥ S(a) for 0 < b < a
• So the survival function is a monotone decreasing (more precisely, non-increasing) function on the interval from 0 to ∞ (see Fig. 1.1, p. 4).
• All the r.v.s we study will have a mean and variance:
  – for discrete r.v.s X: $E(X) = \sum_{i=1}^{n} x_i p_i$ and $V(X) = E[(X - E(X))^2] = \sum_{i=1}^{n} (x_i - E(X))^2 p_i$
  – for continuous r.v.s X: $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ and $V(X) = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx$
• X Bernoulli: E(X) = p, where p = P(X = 1); what is V(X)?
• X Binomial: E(X) = np, where n = # of "trials" and p = prob. of "S" on any one "trial"; what is V(X)?
• HW: Hand in on Wednesday, August 30, at the beginning of class. Ignoring as much as possible the difficulties regarding "censoring" of these data (discussed in Example 1.2, p. 2), do an analysis using whatever methods you know at this time: use graphically, numerically, and statistically appropriate methods to compare the response variable (time to tumor onset) across the two levels of the explanatory variable ("conventional" and "germ-free").
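The formulas above can be checked numerically with a small sketch using only the Python standard library. The function names (`check_pmf`, `mean_var`, `binomial_pmf`, `normal_cdf`, `empirical_survival`) and the parameter values p = 0.3 and n = 10 are hypothetical choices made for illustration; they are not part of the course material. The survival estimate simply counts the fraction of Example 1.1 lifetimes that exceed y, which assumes every lifetime is fully observed (no censoring). That is an assumption for this sketch, not something these notes state.

```python
import math

# Hypothetical helper names; standard library only (math.comb needs Python 3.8+).


def check_pmf(probs):
    """True if 0 <= p_i <= 1 for every i and the p_i sum to 1."""
    return all(0.0 <= p <= 1.0 for p in probs) and math.isclose(sum(probs), 1.0)


def mean_var(values, probs):
    """Discrete r.v.: E(X) = sum x_i p_i and V(X) = sum (x_i - E(X))^2 p_i."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, var


def binomial_pmf(n, p):
    """Values 0..n with probabilities C(n, k) p^k (1 - p)^(n - k)."""
    values = list(range(n + 1))
    probs = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in values]
    return values, probs


def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = P(X <= x) for a normal r.v., written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def empirical_survival(data, y):
    """Estimate S(y) = P(Y > y) by the fraction of lifetimes greater than y.
    Assumes every lifetime is fully observed (no censoring)."""
    return sum(1 for t in data if t > y) / len(data)


# Example 1.1: failure times (days) of the carbon lining of a cell
lining = [1540, 1415, 660, 999, 1193, 1006, 869, 1035, 797,
          296, 775, 1424, 1169, 1500, 728, 670, 841]

if __name__ == "__main__":
    # Bernoulli with a hypothetical p = 0.3: values 0 and 1
    print("Bernoulli pmf valid:", check_pmf([0.7, 0.3]))
    print("Bernoulli (E, V):", mean_var([0, 1], [0.7, 0.3]))
    # Binomial with hypothetical n = 10, p = 0.3
    print("Binomial (E, V):", mean_var(*binomial_pmf(10, 0.3)))
    # P(a <= X <= b) = F(b) - F(a) for a standard normal X
    print("P(-1.96 <= Z <= 1.96):", normal_cdf(1.96) - normal_cdf(-1.96))
    # Empirical S(y): a non-increasing step function of y
    for y in (0, 500, 1000, 1500, 2000):
        print(y, round(empirical_survival(lining, y), 3))
```

Comparing the printed variances with p and np lets you check your answers to the two "what is V(X)?" questions. Because `empirical_survival` counts observations strictly greater than y, it equals 1 at y = 0, drops to 0 beyond the largest lifetime, and never increases in between, mirroring the properties of S listed above.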