Psych 524, 10/10/05
p. 1/8
Random Variables and Probability Distributions (based on Kirk, Ch. 7)
Random Sampling
random sampling: method of drawing samples from a population so that every
possible sample of a particular size has the same probability of being selected
with replacement: sampled element is returned to the population and can
therefore be drawn again
without replacement: sampled element is not returned to the population (most
common sampling method)
examples of random sampling: flip a coin, draw out of a hat/jar, use a spinner,
use a table of random numbers
using a table of random numbers
refer to table D1 in Kirk
basic idea is to determine a random starting place (Kirk suggests tossing a
pencil to see where it lands, but I would suggest labeling 50 objects &
drawing out of a hat):
1st draw: 1 through 25 land you in the corresponding column on the first
page, 26-50 land you in the corresponding column (+25) on the second
page
2nd draw: what you draw here corresponds to the row
if you have up to 99 cases in the population, proceed across (and down) the
table, with each two-digit number on Table D.1 corresponding to a case
number; if you have fewer than 99 cases, skip all case numbers that do not
exist in your population (e.g., if you only have 57 cases, but the table gives
you an 89, go on to the next number in the table)
if you have up to 999 cases, proceed across (and down) the table so that you
use a 3-digit number to select your case, then skip to the next 3-digit
number, etc.; note that the spaces between numbers in the table are
irrelevant, so if the table reads 09 76 63, you would read this as “097” (or
97) and “663”
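The table-reading rules above can be sketched in code. This is only an illustration, not Kirk's procedure verbatim: the function name read_case_numbers, the short digit strings, and the seed are all made up for the example, and Python's random module stands in for the hat, spinner, or printed table.

```python
import random

def read_case_numbers(digits, width, population_size, n_cases):
    """Read `width`-digit case numbers from a run of table digits."""
    # Spaces between numbers in the table are irrelevant, so drop them.
    stream = "".join(ch for ch in digits if ch.isdigit())
    selected = []
    for start in range(0, len(stream) - width + 1, width):
        case = int(stream[start:start + width])
        # Skip numbers with no matching case, and skip repeats
        # (this sketch assumes sampling without replacement).
        if 1 <= case <= population_size and case not in selected:
            selected.append(case)
        if len(selected) == n_cases:
            break
    return selected

# "09 76 63" read three digits at a time gives cases 97 and 663.
three_digit = read_case_numbers("09 76 63", width=3, population_size=999, n_cases=2)

# With only 57 cases, the 89 is skipped and reading continues.
two_digit = read_case_numbers("89 12 57 03", width=2, population_size=57, n_cases=3)

# The same two sampling schemes using a pseudo-random generator.
population = list(range(1, 58))
rng = random.Random(524)  # arbitrary seed, only for repeatability
without_replacement = rng.sample(population, k=5)  # no case can repeat
with_replacement = rng.choices(population, k=5)    # a case can be drawn again

print(three_digit, two_digit)
print(without_replacement, with_replacement)
```

Note that the function treats 00 (or 000) as a number to skip, on the assumption that cases are numbered starting from 1.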
Random Variables
random variable: numerical quantity whose value is determined by the outcome
of a random experiment; a symbol for a set of numerical events, with each
having a probability
e.g., we could say that the random variable, X, was the symbol for the set of
values that would result from the roll of a die or the measurement of
randomly drawn women’s heights
random variables are quantitative
values of random variables are determined by chance
random variables can be discrete or continuous (but remember, we can only
measure continuous variables in discrete units)
probability distribution: a distribution that associates a probability with each
value of a random variable (like a frequency distribution but with probabilities
instead of frequencies)
e.g., regular 6-sided die
e.g., 6-sided die with sides labeled
1,2,2,3,3,4
e.g., roll 6-sided die twice in a row and calculate mean
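As a rough sketch, the three example distributions can be built by listing the equally likely outcomes and counting how often each value occurs. The helper name distribution is just for illustration, and the sketch assumes a fair die and independent rolls.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def distribution(outcomes):
    """Map each value to its probability, assuming equally likely outcomes."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}

regular_die = distribution([1, 2, 3, 4, 5, 6])
relabeled_die = distribution([1, 2, 2, 3, 3, 4])

# Mean of two rolls: list all 36 ordered pairs and average each pair.
mean_of_two_rolls = distribution(
    [Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2)]
)

for name, dist in [("regular die", regular_die),
                   ("relabeled die", relabeled_die),
                   ("mean of two rolls", mean_of_two_rolls)]:
    print(name)
    for value, prob in dist.items():
        print(f"  X = {float(value):>4}  p = {prob}")
```

In each case the probabilities sum to 1, which is a quick check that no outcome was missed.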
Expected value of discrete random variable
With frequency distributions, we can calculate several measures of central
tendency. We can calculate a similar index for probability distributions. We refer
to this as expected value, or E(X), which can be computed as:
$$E(X) = \sum_i p(X_i)\, X_i$$
This may look more daunting than it is. Basically, you take each possible value
of the random variable, multiply it by its associated probability, and sum the
results. This is very similar to what you do to get the mean from an
ungrouped frequency distribution. In that case, you multiplied the number of
cases observed for each score by that score, summed up, and divided by the
total number of cases. Essentially, all you were doing was weighting each
score by its relative frequency or proportion. That is all we are doing now.
Examples: take a look at each of the example distributions above (the regular
die, the relabeled die, and the mean of two rolls); first, make a guess as to
what the expected value should be; then, calculate E(X) based on the above
formula
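If you want to check your hand calculations, the formula can be sketched in a few lines. The function name expected_value and the dictionary layout (value mapped to probability) are choices made for this illustration only.

```python
from fractions import Fraction
from itertools import product

def expected_value(dist):
    """E(X): each value times its probability, summed."""
    return sum(p * x for x, p in dist.items())

regular_die = {x: Fraction(1, 6) for x in range(1, 7)}
relabeled_die = {1: Fraction(1, 6), 2: Fraction(2, 6),
                 3: Fraction(2, 6), 4: Fraction(1, 6)}

# Mean of two rolls: each of the 36 ordered pairs has probability 1/36.
mean_of_two_rolls = {}
for a, b in product(range(1, 7), repeat=2):
    m = Fraction(a + b, 2)
    mean_of_two_rolls[m] = mean_of_two_rolls.get(m, 0) + Fraction(1, 36)

e_regular = expected_value(regular_die)
e_relabeled = expected_value(relabeled_die)
e_mean = expected_value(mean_of_two_rolls)
print(e_regular, e_relabeled, e_mean)
```

Notice that the expected value of the regular die is not a value the die can actually show, and that averaging two rolls leaves the expected value unchanged.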
Expected value of continuous random variable
Calculating the expected value for a continuous random variable (e.g., height) is
not as straightforward (see the box on the next page for a hint).
Because of this, we must determine the area under portions of the curve defined
by certain intervals (we can say what portion of the population is between
5.9 and 6 feet tall). Calculus is necessary to do this, but most statistical
textbooks have tables that summarize this information (at least a reasonable
approximation); we will cover this in chapter 9.
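To make the idea of area under the curve concrete, here is a small sketch using Python's statistics.NormalDist. The normal shape, the mean of 5.4 feet, and the standard deviation of 0.25 feet are assumptions made up for the illustration; they do not come from Kirk or from real height data.

```python
from statistics import NormalDist

# Hypothetical height distribution in feet (illustrative values only).
heights = NormalDist(mu=5.4, sigma=0.25)

def proportion_between(dist, low, high):
    """Area under the curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)

# Proportion of this hypothetical population between 5.9 and 6 feet tall.
between = proportion_between(heights, 5.9, 6.0)

# An interval of zero width has zero area, so a single exact value
# of a continuous variable has probability 0.
exact = proportion_between(heights, 5.9173567, 5.9173567)

print(round(between, 4), exact)
```

The cdf call does the job that the textbook tables do: it returns the area to the left of a value, so the area over an interval is a difference of two such areas.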
Standard deviation of a discrete random variable
As with frequency distributions, we can also find the standard deviation of a
probability distribution. Recall that for a frequency distribution:
$$S = \sqrt{\sum_i f_i (X_i - \bar{X})^2 / n}$$
We can think of the formula for a probability distribution very similarly. If we
move n into the numerator so that f becomes f/n, it becomes easier to see the
association between the above formula and the formula for the standard
deviation of a probability distribution.
$$\sigma = \sqrt{\sum_i p(X_i)\,(X_i - E(X))^2}$$
Hint for page 3: What is the probability of sampling someone who is exactly 5.9173567 feet tall?
Examples: Compute the standard deviations for the distributions on page 2
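The standard deviation formula can be checked the same way as the expected value. This is a sketch under the same assumptions as before (fair die, independent rolls), with function names chosen only for illustration.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

def expected_value(dist):
    return sum(p * x for x, p in dist.items())

def standard_deviation(dist):
    """Square root of the probability-weighted squared deviations from E(X)."""
    mean = expected_value(dist)
    variance = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return sqrt(variance)

regular_die = {x: Fraction(1, 6) for x in range(1, 7)}
relabeled_die = {1: Fraction(1, 6), 2: Fraction(2, 6),
                 3: Fraction(2, 6), 4: Fraction(1, 6)}
mean_of_two_rolls = {}
for a, b in product(range(1, 7), repeat=2):
    m = Fraction(a + b, 2)
    mean_of_two_rolls[m] = mean_of_two_rolls.get(m, 0) + Fraction(1, 36)

sd_regular = standard_deviation(regular_die)
sd_relabeled = standard_deviation(relabeled_die)
sd_mean = standard_deviation(mean_of_two_rolls)
print(round(sd_regular, 3), round(sd_relabeled, 3), round(sd_mean, 3))
```

The mean of two rolls has the same expected value as a single roll but a smaller standard deviation, which previews the behavior of sampling distributions.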
Binomial Distribution
Now we turn to the case of a variable that is not quantitative.
Bernoulli trial
based on an experiment with only two outcomes (e.g., coin flip; success vs.
failure)
three properties:
trial can result in one of two outcomes
probability of “success” remains constant across trials
outcomes of trials are independent
note that the last two criteria are rarely met in real-life situations because
sampling is often done without replacement from a finite (not infinite)
population; but as long as the population is large relative to the sample size,
the basic properties described below are applicable
Distributions of Simple and Complex Events

single trial is…                                        Simple Events    Complex Events (Sampling Distributions)
Bernoulli trial (e.g., coin flip)                       “Bernoulli”      “binomial”
Discrete random variable (e.g., die toss)
Continuous random variable (e.g., height measurement)

Binomial Distribution
binomial distribution: probability distribution resulting from conducting two or
more Bernoulli trials
Example: construct a probability distribution for two tosses of a fair coin
(consider heads as successes)
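One way to build this distribution is to list every possible sequence of tosses and count the heads in each. The sketch below assumes a fair coin, so all four sequences are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered outcomes of two tosses: HH, HT, TH, TT.
sequences = list(product("HT", repeat=2))

# Count how many sequences contain 0, 1, or 2 heads (successes).
heads_counts = Counter(seq.count("H") for seq in sequences)

two_toss_distribution = {
    r: Fraction(count, len(sequences)) for r, count in sorted(heads_counts.items())
}

for r, prob in two_toss_distribution.items():
    print(f"p(X = {r}) = {prob}")
```

One head is twice as likely as either extreme because two different sequences (HT and TH) produce it.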
Although the probability of success (p) may equal the probability of a failure (q),
this is not necessary. Consider, for example, the case where success is defined
as correctly answering a 5-choice multiple-choice question. In this case, p is
1/5 = .2, and q is 4/5 = .8.
Constructing the probability distribution by listing outcomes becomes more
difficult when p and q are unequal (or when the number of trials is large).
Instead, we can use the following formula, which gives the
probability of obtaining a given number of successes:
$$p(X = r) = {}_nC_r\, p^r q^{n-r}$$
In this formula, r is the number of successes, so when you compute p(X=r), you
are computing the probability that you will observe r successes. Again, p in
the right hand portion of the equation refers to the probability of a success,
and q refers to the probability of a failure. Which outcome you call a success
is arbitrary; as long as you keep track of which is which, you will obtain the
same result (e.g., p can refer to the probability of heads or of tails; it can
refer to the probability of a correct response or of an incorrect response).
Finally, n refers to the number of trials. Note that the exponents (r and n-r)
will always sum to n. Also, p and q will always sum to 1.
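The formula translates almost directly into code. In this sketch, binomial_pmf is an illustrative name, math.comb supplies the number of combinations of n things taken r at a time, and the choice of 4 trials for the unequal case is arbitrary.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent Bernoulli trials."""
    q = 1 - p  # probability of a failure
    return comb(n, r) * p**r * q ** (n - r)

# Two tosses of a fair coin: matches the distribution built by listing outcomes.
coin = [binomial_pmf(r, n=2, p=0.5) for r in range(3)]

# Unequal p and q: guessing on 5-choice questions, here with 4 trials.
guessing = [binomial_pmf(r, n=4, p=0.2) for r in range(5)]

# Relabeling success and failure gives the same result:
# 1 correct out of 4 is the same event as 3 incorrect out of 4.
same_event = (binomial_pmf(1, 4, 0.2), binomial_pmf(3, 4, 0.8))

print(coin)
print([round(x, 4) for x in guessing])
```

Summing the probabilities over every possible r from 0 to n should always give 1, which is a handy check on hand calculations.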
Example: What’s the probability that a student who takes a multiple choice test
and simply guesses will correctly answer 3 or more questions? Assume that
each question has 5 choices.
Another way to obtain these probabilities is to use a binomial table (see separate
handout…also linked on website). Note that you obtain the same answer with
the table.
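A sketch of the "3 or more" calculation follows. The notes do not state how many questions are on the test, so the value of 10 used here is purely hypothetical; substitute the actual number of questions when working the example.

```python
from math import comb

def binomial_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def prob_at_least(r_min, n, p):
    """p(X >= r_min): add up the probabilities for r_min, r_min + 1, ..., n."""
    return sum(binomial_pmf(r, n, p) for r in range(r_min, n + 1))

n_questions = 10   # hypothetical test length, not given in the notes
p_guess = 1 / 5    # one correct choice out of five

answer = prob_at_least(3, n_questions, p_guess)

# Shortcut: 1 minus the probability of 0, 1, or 2 correct answers.
shortcut = 1 - sum(binomial_pmf(r, n_questions, p_guess) for r in range(3))

print(round(answer, 4), round(shortcut, 4))
```

The shortcut is usually less work by hand, because it needs only three terms instead of n - 2 of them.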
Expected Value and Standard Deviation of the Binomial Distribution
Formulas for the expected value (~mean) and standard deviation of the binomial
distribution can be derived from the more general formulas given above for
random variables. However, this derivation is somewhat complex, so it is
not shown here. The resulting formulas are very straightforward:
E(X) = np, where p is the probability of success
$\sigma = \sqrt{npq}$, where p and q are the probabilities of success and failure,
respectively
Example: compute the expected value and standard deviation for the multiple
choice problem above; do your answers make sense?
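Although the derivation is skipped, the shortcut formulas can at least be checked numerically against the general formulas for a discrete random variable. As before, the test length of 10 questions is a hypothetical value chosen for the sketch.

```python
from math import comb, sqrt

n = 10    # hypothetical number of questions, not given in the notes
p = 0.2   # probability of guessing a 5-choice question correctly
q = 1 - p

# Shortcut formulas for the binomial distribution.
expected = n * p
sd = sqrt(n * p * q)

# General formulas applied to the full binomial distribution.
pmf = [comb(n, r) * p**r * q ** (n - r) for r in range(n + 1)]
expected_general = sum(prob * r for r, prob in enumerate(pmf))
sd_general = sqrt(sum(prob * (r - expected_general) ** 2
                      for r, prob in enumerate(pmf)))

print(round(expected, 4), round(expected_general, 4))
print(round(sd, 4), round(sd_general, 4))
```

With these values a guesser expects to get about 2 of 10 questions right, which fits the intuition of one correct answer for every five guesses.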
Multinomial Distribution
This distribution is similar to the binomial distribution; however, instead of there
being two possible outcomes, the multinomial distribution results when there
are 3 or more possible outcomes (e.g., red, white, blue marbles).
Now, we are not interested in number of successes or failures but, instead, the
probability that some distinct outcome will occur:
Replacement is assumed.
$$p(X_1 = n_1 \text{ and } X_2 = n_2 \text{ and } \ldots \text{ and } X_k = n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}\,(p_1)^{n_1} (p_2)^{n_2} \cdots (p_k)^{n_k}$$
$n_k$ represents the number of each type that are drawn
$p_k$ represents the probability associated with each type
Example: What’s the probability of drawing one white, two blue, and one red pair
of socks from a drawer containing 5 white, 4 blue, and 1 red pair of socks, if
the socks are put back into the drawer after each draw?
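The sock example can be worked with a short sketch of the multinomial formula. The function name multinomial_probability is illustrative, and the sketch assumes each pair in the drawer is equally likely to be drawn.

```python
from fractions import Fraction
from math import factorial, prod

def multinomial_probability(counts, probs):
    """n! / (n1! n2! ... nk!) times p1^n1 * p2^n2 * ... * pk^nk."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # exact at every step
    return coefficient * prod(p**c for p, c in zip(probs, counts))

# Drawer: 5 white, 4 blue, 1 red pair (10 pairs); pairs are replaced after each draw.
probs = [Fraction(5, 10), Fraction(4, 10), Fraction(1, 10)]
counts = [1, 2, 1]  # one white, two blue, one red

p_socks = multinomial_probability(counts, probs)
print(p_socks, float(p_socks))
```

The coefficient counts the 12 different orders in which one white, two blue, and one red pair can come out of the drawer.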
The Chi-Square distribution is often used to approximate the multinomial
distribution. We will return to this topic later.
Hypergeometric Distribution
A similar problem can be imagined where replacement is not assumed.
$$p(X_1 = n_1 \text{ and } X_2 = n_2 \text{ and } \ldots \text{ and } X_k = n_k) = \frac{({}_{t_1}C_{n_1})({}_{t_2}C_{n_2}) \cdots ({}_{t_k}C_{n_k})}{{}_{t}C_{n}}$$
$n_k$ represents the number of each type that are drawn
$t_k$ represents the number of each type that are in the population
Example: What’s the probability of drawing one white, two blue, and one red pair
of socks from a drawer containing 5 white, 4 blue, and 1 red pair of socks, if
the socks are not put back into the drawer after each draw?
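The same drawer without replacement can be worked with a sketch of the hypergeometric formula. The function name is illustrative; t and n are taken to be the totals of the t_k and n_k values, and every set of 4 pairs is assumed equally likely.

```python
from fractions import Fraction
from math import comb, prod

def hypergeometric_probability(drawn, available):
    """Product of (t_k choose n_k) divided by (t choose n)."""
    t = sum(available)  # total number of items in the population
    n = sum(drawn)      # total number of items drawn
    ways = prod(comb(t_k, n_k) for t_k, n_k in zip(available, drawn))
    return Fraction(ways, comb(t, n))

available = [5, 4, 1]  # white, blue, red pairs in the drawer
drawn = [1, 2, 1]      # one white, two blue, one red

p_socks_no_replacement = hypergeometric_probability(drawn, available)
print(p_socks_no_replacement, float(p_socks_no_replacement))
```

Compare this result with the answer you get when the socks are put back; the difference comes entirely from the drawer changing after each draw.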
Application of the Binomial Distribution: The Sign Test
Assume that you are interested in determining whether brothers and sisters
instigate the same proportion of physical aggression when they are
interacting with each other. We will discuss null-hypothesis testing in
much more detail later, but, for now, assume that we are drawing random
samples of brother-sister pairs from the population of interest. The
observer decides whether the brother was more aggressive (let’s arbitrarily
call this a “success”) or the sister was more aggressive (let’s call this a
“failure”). We will now add up all of our successes and failures (these
will sum to the total number of brother-sister pairs). Assume that we
sample 20 brother-sister pairs and find that, in 13 cases, brothers instigate
physical aggression more often than sisters. Can we conclude that
brothers are more physically aggressive? Well, if brothers and sisters
were equally aggressive in the population (p = .5), We will establish a