Stats 3: Standardized Scores, Hypothesis Testing
Cohen Chpts 4/5

Joe got 70% on his physics test. Is that good?
A: s.d. about 5. B: s.d. about 20

How many standard deviations is X from the mean?
Class A: s.d. = 5, z=…..
Class B: s.d. = 20, z=….
What are the units of z?
z is a standardized score.
Which is better? 70 in (60,20) or 60 in (50,10)? [each pair is (mean, s.d.)]

Properties of Standardized Scores:
The mean of the distribution is 0 and the std. dev. is 1 (regardless of whether the original distribution was normal or not).
If the original distribution is normal, so is the standardized distribution.
The shape of a distribution is not changed by changing to standard scores.

The Standard Normal Distribution
Symmetrical
Mean 0, s.d. 1

z       Mean to z   Beyond z
0.98    .3365       .1635
0.99    .3389       .1611
1.00    .3413       .1587
1.01    .3438       .1562

Area under the curve (normal distribution) translates directly into probability measures.
The probability of X is a number between 0 and 1 (1 is certainty).
For continuous scales (real numbers), we can only provide probability estimates for ranges, not specific values (e.g. height between 60 and 62 inches).
Regarding the area under the curve as 1, we can identify some subset of it and ask how likely it is that a sample from this distribution will lie within that subset: what proportion of the area under the curve lies within that range?

Using tables to calculate area under the normal curve

Sampling Distribution of the Mean
Typically, we work with groups of subjects, rather than single subjects (when might this not be true?)
We wish our sample to stand in for the population.
We want our sample to be somehow typical of the underlying population.
How typical is the mean of our sample?
If we use a sample of 10 subjects, and we repeat our experiment 100 times, with a different 10 subjects each time, how much variability will there be in our results?
Does sample size matter?
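The standardized-score comparison, the table lookup, and the repeated-sampling question above can be tried directly in R. What follows is a minimal sketch and not part of the original notes: the simulated population is assumed to be standard normal, and `z_score` and `sd_of_means` are hypothetical helper names introduced only for illustration.

```r
# Standardized score: how many standard deviations x lies from the mean.
z_score <- function(x, mean, sd) (x - mean) / sd

# "Which is better? 70 in (60,20) or 60 in (50,10)?"
z_a <- z_score(70, 60, 20)
z_b <- z_score(60, 50, 10)

# The two table columns for z = 1.00, reproduced with pnorm
mean_to_z <- pnorm(1.00) - 0.5   # area between the mean and z
beyond_z  <- 1 - pnorm(1.00)     # area beyond z

# Repeat an experiment `reps` times with samples of size n and
# return the standard deviation of the resulting sample means.
# Assumption: the population is standard normal (mean 0, s.d. 1).
sd_of_means <- function(n, reps = 100) {
  means <- replicate(reps, mean(rnorm(n)))
  sd(means)
}

set.seed(1)  # arbitrary seed so the run is repeatable
spread_n10 <- sd_of_means(10)
spread_n40 <- sd_of_means(40)
```

Comparing `spread_n10` with `spread_n40` gives a feel for whether sample size matters: the means of larger samples should vary less from one repetition to the next.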
Standard Error of the Mean

Sampling Distribution of the Mean Approaches a Normal Distribution as N gets big = Central Limit Theorem

Finding raw scores corresponding to a given area
Joining Mensa requires an IQ in the top 2% of the population (and an arrogance quota in the top 1%).
If IQs are normally distributed, mean=100, sd=16, what IQ is required for entry?

Groups and Individuals
How typical is Group G1 compared to some population of groups? (teams, subject pools, etc.)
Same as standard scores for individuals, but we must use the SEM instead of the standard deviation.
Intuition: Any individual is liable to be ‘off the scale’. A group ought to be more reliable.
Why do casinos like you to place a lot of little bets?

Z-scores: some caveats….
We can go from z-score to area (probability) only if we know the underlying distribution is normal. (We usually don’t know this.)
Samples must be independent and random.

A note on probabilities
P(A or B) = P(A) + P(B) [mutually exclusive events]
P(A or B) = P(A) + P(B) - P(A and B) [overlapping events]
P(A and B) = P(A) * P(B) [independent events]
If one tenth of the people in the world are Chinese, one twentieth are Indian, and half are male, what is
P(Chinese)?
P(Chinese or Indian)?
P(Chinese and Indian)?
P(Chinese and Male)?
P(Chinese or Male)?

Independent Events
Two events are independent if P(A and B) = P(A) * P(B)
e.g. Two successive flips of a coin: P(HH) = 0.5 * 0.5 = 0.25
Gambler’s fallacy assumes non-independence

Non-independence: Conditional Probability
P(A and B) = P(A) * P(B|A)
P(B|A) = the probability of B, given A
e.g. Probability of being dealt 2 hearts in a row
(Compare this with sampling with replacement.)

End of Chpt 4

Basic Hypothesis Testing
We do an experiment and obtain a result, x.
We would like to know what the probability is that this arose by chance.
Fictitious example: Mathematical aptitude is measured in the USA using SAT scores, which have a mean of 500 and an s.d. of 100.
I have a psychic who can predict mathematical aptitude based on reading auras.
He selects 25 people who he claims will have higher average math aptitude.
The average aptitude in this group is 530.
Wow?

Meet the Skeptic: Dr Null
Dr Null is always the first to examine your results.
He always claims that you obtained your result by chance.
It is highly unlikely that any sample of 25 will have a mean SAT of exactly 500.
About half the time it will be higher, and half the time it will be lower.
You got 530 by chance.

How do we decide?
Dr Null could always be right.
How much risk do we take in rejecting Dr Null’s case?
Peculiarities of the present case:
We grant that our sampling is random (all members of the population equally likely to be selected).
We grant that our samples are independent (choosing P1 does not affect our choice of P2).
We grant that we know the mean and s.d. of the population (500, 100).

Dr Null’s plan of attack
Dr Null decides to go out and also sample 25 people.
He gets a mean SAT of 490. Rats.
So he does it again, and gets a mean SAT of 540.
Ah ha! I told you it was just chance.

What can we do?
Dr Null is sampling from the population, 25 people at a time.
His samples have an expected mean of 500 (the population mean) and an expected standard deviation of …… 20 (????)
Our score was 530. Which is a z-score of….. 1.5 (???)
How likely is it that we got a score this large (or larger) by chance, if Dr Null is correct?
About 6.68% of the time, we might expect a result this large or larger (p=0.0668)

Selecting alpha: Choosing your risk level
Scientists have a rule of thumb: If the chance of Dr Null beating you is less than 1 in 20, we will take you seriously and ignore his protests (for now).
We set α=0.05
Here, p>0.05, so we cannot reject the null hypothesis (drat).
We do not have a statistically significant result.
Caution: statistically significant does not mean significant!!
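The numbers in the psychic example can be checked step by step in R. This is a minimal sketch that assumes the standard error of the mean is the population s.d. divided by the square root of the sample size (the usual formula, which the notes do not spell out); the variable names are illustrative only.

```r
pop_mean <- 500   # SAT population mean (from the example)
pop_sd   <- 100   # SAT population s.d. (from the example)
n        <- 25    # people selected by the psychic
obs_mean <- 530   # their average score

# Assumed formula: SEM = population s.d. / sqrt(sample size)
sem <- pop_sd / sqrt(n)

# z-score of the observed group mean, using the SEM rather than the s.d.
z <- (obs_mean - pop_mean) / sem

# One-tailed p: chance of a group mean this large or larger under the null
p <- 1 - pnorm(z)

alpha <- 0.05
reject_null <- p < alpha
```

With these values `sem` is 20, `z` is 1.5 and `p` is roughly 0.0668, so `reject_null` is `FALSE`, in line with the conclusion above.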
Using a test statistic
Z here is a test statistic:
Based on one or more sample statistic(s)
Follows a well-known distribution
Large z-score --> lower p-level
Usually only report one of p<{0.05,0.01,0.001}
We do not report exact p values.
Large z-scores allow us to confidently reject the null hypothesis….
…but do not guarantee that the alternative is interestingly different.
Large z-scores are easier to get with large samples (why?)

Two Types of Error

                          The null hypothesis is…
Do we reject the null?    true                                 false
yes                       Type I Error (False alarm) p=α       Correct Rejection (Hit) p=1-β
no                        Correct Failure To Reject p=1-α      Type II Error (Miss) p=β

The choice of α is a trade-off between Type I and Type II errors.
What are the costs of a false alarm and of a miss for the following:
• A pilot emerges from the fog and estimates whether her position is suitable for landing
• A doctor estimates whether a fuzzy spot could be a tumor
• You receive a letter with some white powder inside

With our psychic, we had a clear directional hypothesis.
…We were only looking for a large SAT score
Often, we must be open to scores which are either larger or smaller than expected under the null hypothesis

Two tailed test

Doing a simple hypothesis test:
1. State your hypothesis
   1. Define the null:
   2. And its alternative
2. Select a test and significance level
3. Get some data
4. Find region of rejection (critical value)
5. Calculate test statistic (z) and compare to critical value
6. Interpret the result

Assumptions of the One-Sample z-Test
The sample is drawn randomly
The variable measured has a normal distribution in the population
Std dev of sample is same as std dev of population (impossible to test?)
…Rarely used

Reporting:
“A one-tailed test showed that SAT scores were less than the population mean (z=-2.76, p<0.05).”

One-sample z-Test in R
R does not provide a standard z-test.
To find the area under the normal curve corresponding to all scores below a specific value, use pnorm:
pnorm(1.96)
[1] 0.9750021
pnorm(-1.96)
[1] 0.02499790

Labs and R…… feedback?
Basic skill list:
1. Start R, enter commands
2. Change working directory
3. Get help(!!!)
4. Read in commands (source…)
5. Read in data (read.table)
6. Data subsetting: dat$x, dat[i], dat[i,j], dat[i,], dat[1:5,], etc
7. List creation: c(a,b,c)
8. Basic plotting: hist(…, breaks), boxplot(x,y,z,names=….)
9. Generation: rnorm(x,y,z), runif(x,y,z)
10. Useful functions: seq, rep, summary, par…..
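Returning to the one-sample z-test: since base R has no built-in version, pnorm and qnorm can be wrapped in a small function. The helper `z_test` below is a hypothetical name and a minimal sketch, not a standard R function. It assumes the population s.d. is known and that the sampling distribution of the mean is normal, and its one-tailed p-value assumes the observed difference lies in the predicted direction.

```r
# One-sample z-test (hypothetical helper, not part of base R).
# x_bar: sample mean; mu, sigma: population mean and s.d. under the null;
# n: sample size; tails: 1 (directional) or 2 (two-tailed).
z_test <- function(x_bar, mu, sigma, n, tails = 2, alpha = 0.05) {
  z <- (x_bar - mu) / (sigma / sqrt(n))   # test statistic
  p_one <- pnorm(-abs(z))                 # area beyond |z| in one tail
  # One-tailed p assumes the difference is in the predicted direction.
  p <- if (tails == 2) 2 * p_one else p_one
  list(z = z, p = p, reject = p < alpha)
}

# The psychic example, tested one-tailed and two-tailed
res_one <- z_test(530, 500, 100, 25, tails = 1)
res_two <- z_test(530, 500, 100, 25, tails = 2)

# Critical values of z for alpha = 0.05
crit_one <- qnorm(0.95)    # one-tailed
crit_two <- qnorm(0.975)   # two-tailed

# qnorm also answers "raw score for a given area" questions,
# e.g. the IQ marking the top 2% when mean = 100 and s.d. = 16
mensa_iq <- qnorm(0.98, mean = 100, sd = 16)
```

Doubling the one-tail area is the usual way to get a two-tailed p-value for a symmetric distribution, which is why the same z of 1.5 is further from significance when the test is two-tailed.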