A continuous - People Server at UNCW

Statistics Review
• Random variable (r.v.):
– quantitative or
– categorical (qualitative)
– by the distribution of a random variable
we mean the values the r.v. takes on and the
probabilities with which it takes on these values
– another breakdown of r.v.s is into discrete and
continuous.
– Examples: sex, number of previous heart
attacks, weight, time since heart transplant, …
Distributions
• The probability distribution of a discrete r.v. is
a specification of all the values it takes on
along with the probabilities it takes on these
values.
– example: a Bernoulli r.v., such as the
outcome of a coin toss
– indicator r.v. or indicator function of an event A,
defined as
$$I_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{otherwise} \end{cases}$$
– what are the values of I? what are the
corresponding probabilities?
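As a concrete sketch (the die and the event A below are hypothetical illustrations, not from the notes), the indicator function can be written in Python and its distribution found by counting equally likely outcomes:

```python
from fractions import Fraction

def indicator(x, A):
    """Indicator function I_A(x): 1 if x is in the set A, 0 otherwise."""
    return 1 if x in A else 0

# Hypothetical example: roll a fair die; A = "the roll is even".
outcomes = [1, 2, 3, 4, 5, 6]   # equally likely faces
A = {2, 4, 6}

# I_A takes only the values 0 and 1; count the faces that give each value.
p_one = Fraction(sum(indicator(x, A) for x in outcomes), len(outcomes))  # P(I_A = 1) = P(A)
p_zero = 1 - p_one                                                        # P(I_A = 0)
print({0: p_zero, 1: p_one})
```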
• how do we represent the distribution of
a discrete r.v.?
– table, list, formula, graph
• what properties does the distribution of
a discrete r.v. have?
– all probabilities are between 0 and 1
– all probabilities sum to 1
• let the distribution of Y be as follows:
Y takes on values $y_1, y_2, \ldots, y_n$ with
probabilities $p_1, p_2, \ldots, p_n$; then we have
$0 \le p_i \le 1$ for each i and $\sum_{i=1}^{n} p_i = 1$
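A minimal sketch of checking these two properties for a distribution given as a table; the function name `is_valid_pmf` and the example probabilities are illustrative assumptions. `math.isclose` is used for the sum so that floating-point round-off does not cause a valid distribution to be rejected.

```python
import math
from fractions import Fraction

def is_valid_pmf(probs):
    """Check the defining properties of a discrete distribution:
    every p_i lies in [0, 1] and the p_i sum to 1."""
    probs = list(probs)                          # allow any iterable, used twice below
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0)  # tolerant of float round-off
    return in_range and sums_to_one

# Hypothetical distribution as a table: value y_i -> probability p_i
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(is_valid_pmf(dist.values()))      # True
print(is_valid_pmf([0.5, 0.6, -0.1]))   # False: -0.1 is outside [0, 1]
```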
• A continuous r.v. X is described by its
probability density function, f(x), which
has the property that f(x)≥0 for all x and
the total area bounded by its curve and
the horizontal (x) axis is 1. Additionally,
P(a≤X≤b)= probability X is between a
and b = the area under the density
curve between a and b. (Sketch!)
• An example is the normal r.v., whose
density is represented by the familiar
“bell-shaped” curve.
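A rough numerical check of the two area statements for the standard normal density. This sketch assumes the midpoint rule as the integration method and truncates the real line to [-10, 10], beyond which the standard normal density is negligible; `normal_pdf` and `area_under` are illustrative names.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal (bell-shaped) distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under(f, a, b, n=100_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Total area: integrate over [-10, 10], which holds essentially all the mass.
print(round(area_under(normal_pdf, -10, 10), 6))   # 1.0
# P(-1 <= X <= 1) for a standard normal X
print(round(area_under(normal_pdf, -1, 1), 4))     # 0.6827
```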
• often we evaluate probabilities for continuous
r.v.s via their cumulative distribution function
(cdf) F(x):
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt = \text{area under the density curve to the left of } x$$
• so P(a≤X≤b)=F(b)-F(a) (Sketch!)
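For the normal distribution the cdf has no elementary closed form, but it can be written with the error function, which Python's `math.erf` provides. A small sketch, where `normal_cdf` and `prob_between` are illustrative names; the first probability is the area between -1 and 1 under the standard normal curve.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def prob_between(cdf, a, b):
    """P(a <= X <= b) = F(b) - F(a) for a continuous r.v."""
    return cdf(b) - cdf(a)

print(round(prob_between(normal_cdf, -1, 1), 4))        # 0.6827
print(round(prob_between(normal_cdf, -1.96, 1.96), 3))  # 0.95
```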
• Now define a survival r.v. Y as a continuous
r.v. taking its values in the interval from 0 to
∞; its values are thought of as the
lifetime or survival time, i.e., the time until death (or
time until failure if we’re considering an
inanimate object). So Y is a positive-valued
r.v. with pdf f(y) and cdf F(y) = P(Y ≤ y).
• See Example 1.1 for lifetime data: the
failure times, in days, of the carbon lining
of a cell:
1540 1415 660 999 1193 1006 869 1035 797
296 775 1424 1169 1500 728 670 841
• Or Example 1.2, where lifetime is the time
from injection of a carcinogen to death
in mice; note there is also an
explanatory (categorical) variable that
splits the mice into two groups,
“conventional” and “germ-free”.
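The Example 1.1 failure times can be used to estimate F(y) = P(Y ≤ y) empirically, as the fraction of observed times at or below y. A minimal sketch, assuming all 17 times are fully observed; `empirical_cdf` is an illustrative name.

```python
from fractions import Fraction

# Failure times (days) of the carbon lining of a cell, from Example 1.1.
times = [1540, 1415, 660, 999, 1193, 1006, 869, 1035, 797,
         296, 775, 1424, 1169, 1500, 728, 670, 841]

def empirical_cdf(data, y):
    """Estimate F(y) = P(Y <= y) by the fraction of observations <= y."""
    return Fraction(sum(1 for t in data if t <= y), len(data))

for y in (500, 1000, 1500):
    print(y, empirical_cdf(times, y))   # 1/17, 9/17, 16/17
```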

• Now define the survival (or reliability) function
S(y) = 1 - F(y) = P(Y > y). In terms of the
pdf f, we have
$$S(y) = 1 - F(y) = P(Y > y) = \int_{y}^{\infty} f(t)\,dt$$
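To see S(y) = 1 - F(y) = ∫_y^∞ f(t) dt numerically, the sketch below assumes an exponential lifetime distribution, f(y) = λe^{-λy} for y ≥ 0, with a hypothetical rate λ = 1/1000 per day; neither choice comes from the notes. The tail integral is approximated by the midpoint rule with the upper limit cut off at a large finite value.

```python
import math

RATE = 1 / 1000   # hypothetical failure rate (per day), for illustration only

def pdf(y, rate=RATE):
    """Exponential density f(y) = rate * exp(-rate * y) for y >= 0."""
    return rate * math.exp(-rate * y) if y >= 0 else 0.0

def cdf(y, rate=RATE):
    """F(y) = P(Y <= y) = 1 - exp(-rate * y) for y >= 0."""
    return 1 - math.exp(-rate * y) if y >= 0 else 0.0

def survival(y, rate=RATE):
    """S(y) = 1 - F(y) = P(Y > y)."""
    return 1 - cdf(y, rate)

def tail_area(f, y, upper, n=200_000):
    """Midpoint-rule approximation of the integral of f from y to `upper`,
    standing in for the integral from y to infinity."""
    h = (upper - y) / n
    return h * sum(f(y + (i + 0.5) * h) for i in range(n))

y = 800
print(round(survival(y), 4))                      # exp(-0.8), about 0.4493
print(round(tail_area(pdf, y, upper=50_000), 4))  # same area, computed numerically
```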
• Note the following important properties of the
survival function:
– S(0) = 1
– S(y) → 0 as y → ∞
– S(b) ≥ S(a) for 0 ≤ b < a
• So the survival function is a monotone
nonincreasing function on the interval from 0 to
infinity (see Fig 1.1 p. 4)
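Applied to the Example 1.1 failure times, the empirical survival function (the fraction of observations exceeding y) shows the properties listed above. A sketch assuming all 17 times are fully observed; the 100-day grid and the name `empirical_survival` are arbitrary choices.

```python
# Failure times (days) of the carbon lining of a cell, from Example 1.1.
times = [1540, 1415, 660, 999, 1193, 1006, 869, 1035, 797,
         296, 775, 1424, 1169, 1500, 728, 670, 841]

def empirical_survival(data, y):
    """Estimate S(y) = P(Y > y) by the fraction of observations exceeding y."""
    return sum(1 for t in data if t > y) / len(data)

grid = range(0, 2001, 100)   # days 0, 100, ..., 2000
s_values = [empirical_survival(times, y) for y in grid]

print(s_values[0])    # 1.0: every lining lasts beyond day 0
print(s_values[-1])   # 0.0: no lining lasts beyond day 2000
# Monotone nonincreasing: S never goes up as y grows
print(all(a >= b for a, b in zip(s_values, s_values[1:])))   # True
```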
• All the r.v.s we study will have a mean
and variance:
• for discrete r.v.s X,
$$E(X) = \sum_{i=1}^{n} x_i p_i \quad \text{and} \quad V(X) = E\big[(X - E(X))^2\big]$$
• for continuous r.v.s X,
$$E(X) = \int_{-\infty}^{\infty} x\, f(x)\,dx \quad \text{and} \quad V(X) = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx$$
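A sketch of both formulas in Python: the discrete case as weighted sums, and the continuous case approximated by the midpoint rule on a finite interval chosen to hold essentially all of the probability. The fair die and the normal distribution with mean 2 and standard deviation 3 are illustrative assumptions, as are the function names.

```python
import math

def discrete_mean_var(values, probs):
    """E(X) = sum x_i p_i and V(X) = E[(X - E(X))^2] = sum (x_i - E(X))^2 p_i."""
    mean = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, var

def continuous_mean_var(f, lo, hi, n=200_000):
    """E(X) = integral of x f(x) dx, V(X) = integral of (x - E(X))^2 f(x) dx,
    both approximated by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    mean = h * sum(x * f(x) for x in xs)
    var = h * sum((x - mean) ** 2 * f(x) for x in xs)
    return mean, var

# Discrete example: a fair die.
die = discrete_mean_var([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
print(tuple(round(v, 4) for v in die))   # (3.5, 2.9167)

# Continuous example: normal density with mean 2 and standard deviation 3.
def normal_pdf(x, mu=2.0, sigma=3.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mean, var = continuous_mean_var(normal_pdf, -40, 44)   # mean +/- 14 sd
print(round(mean, 4), round(var, 4))                   # 2.0 9.0
```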
• X Bernoulli: E(X) = p, where p = P(X = 1);
what is V(X)?
• X Binomial: E(X) = np, where n = # of
“trials” and p = probability of “S” (“success”) on any one
“trial”; what is V(X)?
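One way to check your answers to the two V(X) questions: build each pmf and apply the definition V(X) = E[(X - E(X))^2] directly. The parameter values (p = 0.3, n = 10) and the function names are arbitrary illustrations; the sketch only prints numbers, so you can compare them with your closed-form expressions.

```python
from math import comb

def mean_var_from_pmf(pmf):
    """E(X) and V(X) = E[(X - E(X))^2] from a dict {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var

def bernoulli_pmf(p):
    """X = 1 ("S") with probability p, X = 0 otherwise."""
    return {0: 1 - p, 1: p}

def binomial_pmf(n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k), for k = 0, ..., n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Compare these numbers with your closed-form answers for V(X).
print(mean_var_from_pmf(bernoulli_pmf(0.3)))
print(mean_var_from_pmf(binomial_pmf(10, 0.3)))
```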
• HW: Hand in on Wednesday, August 30, at
the beginning of class.
Ignoring, as much as possible, the difficulties
regarding “censoring” of these data
(discussed in Example 1.2, p. 2), do an
analysis using whatever methods you
know at this time: use appropriate graphical,
numerical, and statistical methods to compare the response
variable (time to tumor onset) across the
two levels of the explanatory variable
(“conventional” and “germ-free”).