The probability framework for
statistical inference
Population
• The group or collection of entities of interest
• Here, "all possible" school districts
• "All possible" school districts means "all possible" circumstances that lead to specific values of STR (student-teacher ratio) and test scores
• The set of "all possible" school districts includes but is much larger than the set of 420 school districts observed in 1998.
• We will think of populations as infinitely large; the task is to make inferences from a sample from a large population
Random variable Y
• A random variable assigns a number to each member of the population in a particular way.
• The adjective "random" refers to the fact that the value the variable takes is determined by a drawing from the population.
• The district average test scores and the district STRs are random variables; their numerical values are determined once we choose a year/district to sample.
Characterizing Random Variables:
• Distribution
• Moments of the Distribution
• Joint Distributions
• Covariance; Correlation
• Conditional Moments of the Distribution
Population distribution of Y
• Discrete Random Variables: The probabilities of different values of Y that occur in the population
For ex. Pr[Y = 650], when Y is discrete. This probability is the proportion of elements of the population for which the value of Y is exactly equal to 650.
• Continuous Random Variables: The probabilities of sets of these values
For ex. Pr[Y ≤ 650], when Y is continuous. This probability is the proportion of elements of the population for which the value of Y is less than or equal to 650.
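These probabilities are just population proportions, which the following sketch computes for a small hypothetical population of test scores (illustrative numbers, not the actual 420-district data):

```python
# Hypothetical population of district test scores (assumed for illustration;
# the real population of "all possible" districts is far larger).
population = [630, 650, 650, 660, 650, 640, 620, 650, 670, 650]

# Discrete case: Pr[Y = 650] is the proportion of members with Y exactly 650.
p_eq_650 = sum(1 for y in population if y == 650) / len(population)

# Event of the form used for continuous Y: Pr[Y <= 650] is the proportion
# of members with Y at most 650.
p_le_650 = sum(1 for y in population if y <= 650) / len(population)

print(p_eq_650)  # 0.5
print(p_le_650)  # 0.8
```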
"Moments" of the population distribution
mean = expected value = E(Y) = μY
= long-run average value of Y over repeated realizations of Y
For a discrete random variable, the mean is a weighted average of each possible value of Y, where the weight assigned to a given value of Y is the probability of that Y.
For a continuous random variable, the mean is found by integrating over all possible values of Y, weighting each value of Y by the "density function" evaluated at that Y.
variance = E(Y – μY)² = σY²
= measure of the squared spread of the distribution
standard deviation = √variance = σY
Note that:
1. The variance is an expected value of a random variable.
2. The variance is in squared units of Y; the standard deviation is in the same units as Y.
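For a discrete Y, these moments can be computed directly as probability-weighted sums; a minimal sketch with an assumed (illustrative) distribution:

```python
import math

# Hypothetical discrete distribution of Y: value -> probability
# (assumed numbers for illustration; probabilities sum to 1).
pmf = {600: 0.2, 640: 0.5, 700: 0.3}

# Mean: weighted average of the possible values, weights = probabilities.
mean = sum(y * p for y, p in pmf.items())

# Variance: probability-weighted average of squared deviations from the mean.
var = sum((y - mean) ** 2 * p for y, p in pmf.items())

# Standard deviation: square root of the variance, in the same units as Y.
sd = math.sqrt(var)
```

With these numbers, mean = 650 and var = 1300, so sd ≈ 36.06, in the same units as Y.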
Joint distributions
• Corresponding to each member of the population there may be more than one value assigned. E.g., test score (Y) and STR (X).
• There is a probability distribution for Y (from which we can derive the mean and variance of Y) and a probability distribution for X (from which we can derive the mean and variance of X).
• The joint probability distribution for Y and X provides the probability that the random variables Y and X take on the values y and x, respectively (if Y and X are discrete random variables), i.e., Prob(Y = y and X = x), or the probability that the random variables Y and X lie in some subset of R² (if Y and X are continuous random variables), e.g., Prob(Y ≤ y and X ≤ x).
• For example, what is the probability of drawing a district from the population for which the average test score is 650 and the STR is 20?
• The marginal distributions of Y and X are simply the individual probability distributions of Y and X, which can be recovered from their joint distribution (although the reverse isn't true).
• The random variables Y and X are independent if (and only if) their joint distribution factors into the product of their marginal distributions, i.e.,
Prob(Y = y and X = x) = Prob(Y = y) × Prob(X = x)
Prob(Y ≤ y and X ≤ x) = Prob(Y ≤ y) × Prob(X ≤ x)
for all x and y.
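A small sketch of these ideas for discrete X and Y, using an assumed joint probability table (illustrative numbers): the marginals are recovered by summing the joint over the other variable, and independence is checked by the factorization condition.

```python
# Hypothetical joint distribution: joint[(x, y)] = Prob(X=x and Y=y),
# where X is an STR value and Y a test-score value (assumed for illustration).
joint = {
    (20, 650): 0.30, (20, 700): 0.20,
    (25, 650): 0.35, (25, 700): 0.15,
}

xs = {x for x, _ in joint}
ys = {y for _, y in joint}

# Marginal distributions: sum the joint over the other variable.
marg_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in xs}
marg_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in ys}

# X and Y are independent iff the joint factors into the product of marginals
# at every (x, y) pair.
independent = all(
    abs(joint[(x, y)] - marg_x[x] * marg_y[y]) < 1e-12 for x in xs for y in ys
)
```

Here Prob(X=20, Y=650) = 0.30 but Prob(X=20)·Prob(Y=650) = 0.5 × 0.65 = 0.325, so X and Y are not independent; note that the marginals alone could never reveal this, which is why the joint cannot be recovered from them.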
The covariance between r.v.'s X and Y is
cov(X,Y) = E[(X – μX)(Y – μY)] = σXY
• cov(X,Y) > 0: X and Y are positively related; when X is above (below) its mean, Y tends to be above (below) its mean. cov(X,Y) < 0: the reverse. (We hypothesize that the random variables test score and STR have a negative covariance.)
• If X and Y are independently distributed, then cov(X,Y) = 0 (but not vice versa!)
The correlation coefficient is defined in terms of the covariance:
corr(X,Y) = cov(X,Y) / √(var(X)·var(Y)) = σXY / (σX·σY) = rXY
• –1 ≤ corr(X,Y) ≤ 1
• corr(X,Y) = 1 means perfect positive linear association
• corr(X,Y) = –1 means perfect negative linear association
• corr(X,Y) = 0 means no linear association
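These population formulas can be applied directly to a small assumed population of (STR, test score) pairs; the illustrative numbers below are chosen so the linear association is perfectly negative, which makes the result easy to check.

```python
import math

# Hypothetical paired population values: X = STR, Y = test score
# (assumed numbers for illustration, with an exact linear relationship).
X = [15, 17, 19, 21, 23]
Y = [680, 670, 660, 650, 640]

n = len(X)
mx = sum(X) / n  # mu_X
my = sum(Y) / n  # mu_Y

# Population covariance: average of (X - mu_X)(Y - mu_Y).
cov = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / n

# Correlation: covariance scaled by the standard deviations, so -1 <= r <= 1
# and the result is unit-free.
sx = math.sqrt(sum((x - mx) ** 2 for x in X) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in Y) / n)
corr = cov / (sx * sy)
```

For these numbers cov(X,Y) = –40 (in STR-units × score-units) and corr(X,Y) = –1: a perfect negative linear association, as expected since Y was constructed as an exact decreasing linear function of X.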
Conditional distributions
• The distribution of Y, given value(s) of some other random variable, X
• So, conditional distributions are distributions of "subpopulations," created from the original population according to some criterion.
• Ex: the distribution of test scores, given that STR < 20. (Divide the population into two subpopulations according to their STRs. Then consider the distribution of test scores for each subpopulation.)
Moments of conditional distributions
• conditional mean = mean of conditional distribution
= E(Y|X = x) (important notation)
• conditional variance = variance of conditional distribution
• Example: E(Test scores|STR < 20), the mean of test scores for districts with small class sizes; Var(Test scores|STR < 20), the variance of test scores for districts with small class sizes.
The difference in means is the difference between the means of two conditional distributions:
Δ = E(Test scores|STR < 20) – E(Test scores|STR ≥ 20)
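The difference in conditional means Δ can be sketched directly: split an assumed population of districts by the conditioning criterion and subtract the two group means (illustrative numbers only).

```python
# Hypothetical district data: (STR, test score) pairs, assumed for illustration.
districts = [(18, 665), (19, 660), (21, 640), (22, 645), (17, 670), (24, 635)]

# Split the population into two subpopulations by the conditioning criterion.
small = [score for str_, score in districts if str_ < 20]   # STR < 20
large = [score for str_, score in districts if str_ >= 20]  # STR >= 20

# Conditional means are just group means; Delta is their difference.
mean_small = sum(small) / len(small)  # E(Test scores | STR < 20)
mean_large = sum(large) / len(large)  # E(Test scores | STR >= 20)
delta = mean_small - mean_large
```

With these numbers, E(Test scores|STR < 20) = 665 and E(Test scores|STR ≥ 20) = 640, so Δ = 25.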
Other examples of conditional means:
• Wages of all female workers (Y = wages, X = gender)
• One-year mortality rate of those given an experimental treatment (Y = live/die; X = treated/not treated)
The conditional mean is a new term for a familiar idea: the group mean.
Inference about means, conditional means, and differences in conditional means
We would like to know Δ (test score gap; gender wage gap; effect of experimental treatment), but we don't know it. (We don't know it? Didn't we calculate Δ last week?)
Therefore we must collect and use data by sampling from the population, permitting us to make statistical inferences about Δ.
• Experimental data
• Observational data
Simple random sampling
• Choose an individual (district, entity) at random from the population
Randomness and data
• Prior to sample selection, the value of Y is random because the individual selected is random
• Once the individual is selected and the value of Y is observed, then Y is just a number – not random
• The data set is (Y1, Y2, …, Yn), where Yi = value of Y for the ith individual (district, entity) sampled.
• Thus, the data set is made up of realized values of n random variables.
Implications of simple random sampling
Because individuals #1 and #2 are selected at random, the value of Y1 has no information content for Y2. Thus:
• Y1 and Y2 are independently distributed
• Y1 and Y2 come from the same distribution; that is, Y1 and Y2 are identically distributed
• That is, a consequence of simple random sampling is that Y1 and Y2 are independently and identically distributed (i.i.d.)
• More generally, under simple random sampling, {Yi}, i = 1, …, n, are i.i.d.
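A minimal simulation of this sampling scheme, treating the population as effectively infinite (so each draw is made independently from the same distribution, giving i.i.d. Y1, …, Yn); the population values here are assumed for illustration:

```python
import random

random.seed(0)  # fix the seed so the draws are reproducible

# Hypothetical population of district test scores (assumed for illustration).
# Treating the population as infinitely large, each draw is independent of
# the others and comes from the same distribution: the Y_i are i.i.d.
population = list(range(600, 701))

n = 5
sample = [random.choice(population) for _ in range(n)]

# Before the draws, each Y_i was random; these realized values are just numbers.
print(sample)
```

Each `random.choice` call is unaffected by the earlier draws (independence) and uses the same population (identical distribution), which is exactly the i.i.d. property the slide states.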