Psyc 235: Introduction to Statistics
http://www.psych.uiuc.edu/~jrfinley/p235/

DON’T FORGET TO SIGN IN FOR CREDIT!

Independent vs. Dependent Events
• Independent Events: unrelated events that intersect at chance levels given relative probabilities of each event
• Dependent Events: events that are related in some way
• So... how to tell if two events are independent or dependent? Look at the INTERSECTION: P(AB)
    if P(AB) = P(A)*P(B) --> independent
    if P(AB) ≠ P(A)*P(B) --> dependent

Random Variables
• Random Variable: variable that takes on a particular numerical value based on outcome of a random experiment
• Random Experiment (aka Random Phenomenon): trial that will result in one of several possible outcomes
    can’t predict outcome of any specific trial
    can predict pattern in the LONG RUN

Random Variables
• Example:
• Random Experiment: flip a coin 3 times
• Random Variable: # of heads

Random Variables
• Discrete vs Continuous
    finite vs infinite # possible outcomes
• Scales of Measurement
    Categorical/Nominal
    Ordinal
    Interval
    Ratio

Data World vs. Theory World
• Theory World: Idealization of reality (idealization of what you might expect from a simple experiment)
    Theoretical probability distribution
    POPULATION
    parameter: a number that describes the population. fixed but usually unknown
• Data World: data that results from an actual simple experiment
    Frequency distribution
    SAMPLE
    statistic: a number that describes the sample (ex: mean, standard deviation, sum, ...)

So far...
• Graphing & summarizing sample distributions (DESCRIPTIVE)
• Counting Rules
• Probability
• Random Variables
• one more key concept is needed to start doing INFERENTIAL statistics: SAMPLING DISTRIBUTION

Binomial Situation
• Bernoulli Trial
    a random experiment having exactly two possible outcomes, generically called “Success” and “Failure”
    probability of “Success” = p
    probability of “Failure” = q = (1-p)
Examples:
    Coin toss: “Success” = Heads, p = .5 (outcomes: Heads, Tails)
    Robot Factory: “Success” = Good Robot, p = .75 (outcomes: Good Robot, Bad Robot)

Binomial Situation
• Binomial Situation:
    n: # of Bernoulli trials
    trials are independent
    p (probability of “success”) remains constant across trials
• Binomial Random Variable: X = # of the n trials that are “successes”

Binomial Situation: collect data!
Population: Outcomes of all possible coin tosses (for a fair coin)
Bernoulli Trial: one coin toss, Success = Heads, p = .5
Let’s do 10 tosses: n = 10 (sample size)
Sample: X = ....
Binomial Random Variable: X = # of the 10 tosses that come up heads (aka Sample Statistic)

[Figure: Binomial Distribution, p=.5, n=10; x-axis: # of successes (0 to 10), y-axis: probability]
This is the SAMPLING DISTRIBUTION of X!

Sampling Distribution
• Sampling Distribution: Distribution of values that your sample statistic would take on, if you kept taking samples of the same size, from the same population, FOREVER (infinitely many times).
• Note: this is a THEORETICAL PROBABILITY DISTRIBUTION

Binomial Situation: collect data!
Population: Outcomes of all possible coin tosses (for a fair coin)
Bernoulli Trial: one coin toss, Success = Heads, p = .5
Let’s do 10 tosses: n = 10 (sample size)
[Figure: Sampling Distribution; x-axis: # of successes (0 to 10), y-axis: probability]
Sample: X = 3 5 6 ....
Binomial Random Variable: X = # of the 10 tosses that come up heads (aka Sample Statistic)

Binomial Situation: collect data!
Population: Outcomes of all possible coin tosses (for a fair coin)
Bernoulli Trial: one coin toss, Success = Heads, p = .5
Let’s do 10 tosses: n = 10 (sample size)
[Figure: Sampling Distribution; x-axis: # of successes (0 to 10), y-axis: probability]
Sample: X = 3
Binomial Random Variable: X = # of the 10 tosses that come up heads (aka Sample Statistic)

Binomial Formula
$P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$
    P(X = k): P(exactly k many successes)
    X: Binomial Random Variable
    k: specific # of successes you could get
    p: probability of success
    n - k: specific # of failures
    1 - p: probability of failure
    $\binom{n}{k}$: combination, called the Binomial Coefficient
$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$

Binomial Formula
[Figure: Sampling Distribution, p=.5, n=10; x-axis: # of successes (0 to 10), y-axis: probability]
X = 3: p(X=3) =
Hmm... what if we had gotten X=0?... pretty unlikely outcome... fair coin? Remember this idea....

More on the Binomial Distribution
• X ~ B(n,p)
Expected Value and Variance for X~B(n,p)
    Expected Value: $\mu_X = np$
    Variance: $\sigma_X^2 = np(1-p)$
    Standard Deviation: $\sigma_X = \sqrt{np(1-p)}$
    these are the parameters for the sampling distribution of X
Ex: # heads in 5 tosses of a coin: X~B(5,1/2)
    Expectation: 2.5
    Variance: 1.25
    Std. Dev.: 1.12

Let’s see some more Binomial Distributions
• What happens if we try doing a different # of trials (n)?
• That is, try a different sample size...
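The Binomial Formula and the expected value / variance formulas above can be tried out with a short script. This is only a minimal sketch using Python's standard library; the function names `binomial_pmf` and `binomial_summary` are made up for illustration and are not part of the lecture.

```python
from math import comb, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_summary(n: int, p: float) -> tuple[float, float, float]:
    """Return (expected value, variance, standard deviation) of X ~ B(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, sqrt(variance)


# Sampling distribution of X = # of heads in 10 tosses of a fair coin.
n, p = 10, 0.5
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(distribution):
    print(f"P(X = {k:2d}) = {prob:.4f}")

# The slide's example: # of heads in 5 tosses, X ~ B(5, 1/2).
mean, variance, std_dev = binomial_summary(5, 0.5)
print(f"Expectation {mean}, Variance {variance}, Std. Dev. {std_dev:.2f}")
```

Comparing the printed values for X = 3 and X = 0 gives a feel for why getting zero heads in 10 tosses of a fair coin would be a surprising sample.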
[Figure: Binomial Distribution, p=.5, n=5; x-axis: # of successes, y-axis: probability]
[Figure: Binomial Distribution, p=.5, n=10; x-axis: # of successes, y-axis: probability]
[Figure: Binomial Distribution, p=.5, n=20; x-axis: # of successes, y-axis: probability]
[Figure: Binomial Distribution, p=.5, n=50; x-axis: # of successes, y-axis: probability]
[Figure: Binomial Distribution, p=.5, n=100; x-axis: # of successes, y-axis: probability]

Whoah.
• Anyone else notice those DISCRETE distributions starting to look smoother as sample size (n) increased?
• Let’s look at a few more binomial distributions, this time with a different probability of success...

Binomial Robot Factory
• 2 possible outcomes:
    Good Robot 90%
    Bad Robot 10%
You’d like to know about how many BAD robots you’re likely to get before placing an order...
p = .10 (...
“success”)
n = 5, 10, 20, 50, 100

[Figure: Binomial Distribution, p=.1, n=5; x-axis: # of successes, y-axis: probability]
[Figure: Binomial Distribution, p=.1, n=10; x-axis: # of successes, y-axis: probability]
[Figure: Binomial Distribution, p=.1, n=20; x-axis: # of successes, y-axis: probability]
[Figure: Binomial Distribution, p=.1, n=50; x-axis: # of successes, y-axis: probability]
[Figure: Binomial Distribution, p=.1, n=100; x-axis: # of successes, y-axis: probability]

Normal Approximation of the Binomial
If n is large, then X ~ B(n,p) {Binomial Distribution} can be approximated by a NORMAL DISTRIBUTION with parameters:
    $\mu = np$
    $\sigma = \sqrt{np(1-p)}$
[Figure; y-axis: probability]

Normal Distributions
• (aka “Bell Curve”)
• Probability Distributions of a Continuous Random Variable (smooth curve!)
• Class of distributions, all with the same overall shape
• Any specific Normal Distribution is characterized by two parameters:
    mean: $\mu$
    standard deviation: $\sigma$
[Figure panels: different means; different standard deviations]

Standardizing
• “Standardizing” a distribution of values results in re-labeling & stretching/squishing the x-axis
• useful: gets rid of units, puts all distributions on same scale for comparison
• HOWTO: simply convert every value to a Z SCORE: $z = \frac{x - \mu}{\sigma}$

Standardizing
• Z score: $z = \frac{x - \mu}{\sigma}$
• Conceptual meaning: how many standard deviations from the mean a given score is (in a given distribution)
• Any distribution can be standardized
• Especially useful for Normal Distributions...

Standard Normal Distribution
• has mean: $\mu = 0$
• has standard deviation: $\sigma = 1$
• ANY Normal Distribution can be converted to the Standard Normal Distribution...
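To see the normal approximation and the z score at work, here is a rough sketch that compares exact binomial probabilities with the height of the matching Normal curve for the robot factory case (p = .10, n = 100). The helper names `binomial_pmf`, `normal_pdf`, and `z_score` and the particular values of k are chosen here for illustration only; they do not come from the slides.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height at x of the Normal curve with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def z_score(x: float, mu: float, sigma: float) -> float:
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma


# Robot factory: "success" = bad robot, p = .10, order size n = 100.
n, p = 100, 0.10
mu = n * p                      # mean of the approximating Normal
sigma = sqrt(n * p * (1 - p))   # its standard deviation

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
for k in (4, 7, 10, 13, 16):    # illustrative values around the mean
    print(
        f"k = {k:2d}  z = {z_score(k, mu, sigma):+.2f}  "
        f"binomial = {binomial_pmf(k, n, p):.4f}  "
        f"normal = {normal_pdf(k, mu, sigma):.4f}"
    )
```

Rerunning with a smaller n (say 5 or 10) should show the two columns disagreeing more, which matches the skewed shape of the small-n plots for p = .1.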
Standard Normal Distribution

Normal Distributions & Probability
• Probability = area under the curve
    intervals
    cumulative probability
    [draw on board]
• For the Standard Normal Distribution: These areas have already been calculated for us (by someone else)

Standard Normal Distribution
So, if this were a Sampling Distribution, ...

Next Time
• More different types of distributions
    Binomial, Normal, t, Chi-square, F
• And then... how will we use these to do inference?
• Remember: biggest new idea today was: SAMPLING DISTRIBUTION
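The "probability = area under the curve" idea from the Normal Distributions & Probability slide can also be sketched in code instead of looked up in a table. The snippet below is one possible approach, assuming the standard identity that the standard normal cumulative probability equals 0.5 * (1 + erf(z / sqrt(2))); the function names are hypothetical.

```python
from math import erf, sqrt


def standard_normal_cdf(z: float) -> float:
    """Cumulative probability P(Z <= z) for the Standard Normal Distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def interval_probability(a: float, b: float) -> float:
    """Area under the standard normal curve between a and b: P(a <= Z <= b)."""
    return standard_normal_cdf(b) - standard_normal_cdf(a)


# Cumulative probabilities at a few z scores.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"P(Z <= {z:+.1f}) = {standard_normal_cdf(z):.4f}")

# Interval probabilities: within 1 and within 2 standard deviations of the mean.
print(f"P(-1 <= Z <= 1) = {interval_probability(-1.0, 1.0):.4f}")
print(f"P(-2 <= Z <= 2) = {interval_probability(-2.0, 2.0):.4f}")
```

Because any Normal Distribution can be standardized, the same two functions cover any mean and standard deviation once each value has been converted to a z score.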