Descriptive statistics 922

What do we need to run an experiment?
- Hypothesis (linguistic)
- Participants
- Task (stimuli = questions, responses = answers)
- Results
- Conclusions
Key terms: stimulus design, response measure

Example
- Show me the cat that bit the dog
- Show me the cat that the dog bit
Picture from: Friedmann & Novogrodsky (2001)

Design
- Number of conditions
- Within subject / between subject
- How many items to each participant
- Order of items

Measure
- Response
- Variables
- Scales

Analysis
- Descriptive
- Inferential

Variables
- Any experimental category that has a value that can vary.
- Anything that is not constant and can change over time, or differ between people, is a variable.
- Variables can take many forms.
- Variables can be manipulated and observed.

Properties of Variables
- Continuous variables: lie along a continuum with equal intervals (e.g., age, height, weight, grade on a test)
- Ordinal variables: ratings along a continuum with estimated intervals (e.g., evaluation)
- Discrete variables (categorical, nominal): divide into categories (e.g., language, yes/no, correct/incorrect)

Types of Variables
- Independent variables: characteristics of the subject (participant variables) and conditions chosen by the experimenter
- Dependent variables: what the experiment measures (e.g., degree of success)
- Intervening variables: variables that are not measured or manipulated but could influence the results (e.g., concentration, intelligence)

Scales
- Nominal: two things with the same number are similar (same name)
- Ordinal: four is more than three (but the step from three to four is not necessarily the same as the step from two to three)
- Interval: four is more than two (but not twice as much)
- Ratio: four is more than three, the step from three to four is the same as the step from two to three, and four is twice two

Which scale are the following variables rated on?
- Height
- Celsius degrees
- TV channel number
- Grades in an exam (1-100)
- Psychological rating (anxiety on a scale of 1-10)
- Time (13:00, 14:00)
- Time (one hour, two hours, three hours)
- Phone number
- Rating places in a race

Variables and Scales: summary
- Choose an appropriate task
- Measure responses
- Be aware of the variables and their properties
- Choose the mathematical operations appropriate for the scale

Factorial design
Tests all possible combinations, e.g., a 2x2 design: one participant variable and one independent variable, each with two conditions.

                     TLD    SLI
Subject relatives
Object relatives

Practical questions for offline tasks
- How many subjects? At least 25
- How many categories? 2x2
- How many items? More subjects >> fewer items. For 25 subjects, 6 items per category; for 50 subjects, 3 are enough.
- For case studies and within-subject analysis, at least 10.

SIMPLE NUMERICAL COMPUTATIONS

Ratio
The relation between two nominal variables

              N
Nouns         80
Verbs         60
Other words   50
Total         190

V/N ratio: 60/80 = 3/4
N/V ratio: 80/60 = 4/3

Example
Goofy said that the Troll had to put two hoops on the pole to win. Does the Troll win? Musolino (2004)

                N
Yes             8
No              12
Didn't answer   10
Total           37

Yes/no ratio: 8/12 = 2/3

Proportion
Relation between a group and its part (Verb/Word, Pronouns/Subject position): a ratio out of the total.
Verb/Word proportion: 60/190 ≈ 0.32 (roughly 1/3)

Percentage (%)
Relative proportion out of a hundred
Verb percentage (out of all words): 100 * (60/190) ≈ 32%

Rate
The relative frequency (for a population, out of 1000)
7% of children have SLI >> 0.07 * 1000 = 70
70 children out of 1000 have SLI

Frequency
Count the number of times a score occurs: how many times does a value of a variable occur?

Example
Show 10 pictures, and check the number of "correct" responses
Is every bunny eating a carrot?
Roeper, Strauss and Zurer Pearson (2004)

Picture   Correct
1         1
2         1
3         0
4         0
5         0
6         0
7         1
8         1
9         1
10        1
Total     6

Frequency
Count the number of times a score occurs

Raw scores:
Child   Score
1       8
2       8
3       6
4       6
5       6
6       6
7       2
8       2

Frequencies:
Score   Frequency
2       2
6       4
8       2

Frequency = how many children got this score

Frequency graph
- Score on the test is on the horizontal axis (X-axis)
- Frequency is on the vertical axis (Y-axis)

Percentile

Grade     Frequency   Cumulative frequency   Percentile
100       2           30                     100%
90        5           28                     93%
80        10          23                     77%
70        8           13                     43%
60        4           5                      17%
50        1           1                      3%
Total N   30

The cumulative frequency: how many scores are at or below a particular point in the distribution
Percentile = 100 * (Cumulative Frequency / Total N)

Frequency polygon (the curve)
[Figure: frequency distribution; x-axis: Grade (50-100), y-axis: N of students (0-12)]
The frequency polygon (the curve) is a picture of the data

Types of distributions (Fig. 4.3 & 4.4, pp. 113-116)
- Parts of the curve: peak, tails
- A bell-shaped curve: a symmetric, unimodal distribution (one midpoint, one peak); the normal distribution
- Pointy distribution (leptokurtic)
- Flat distribution (platykurtic)
- Skewed distribution: the tail is stretched in one direction
  - Positively skewed: most scores are low; the tail points towards the high (positive) scores
  - Negatively skewed: most scores are high; the tail points towards the low (negative) scores
- Bimodal distribution: a double-peaked curve

Descriptive Statistics - Some definitions
Min (the lowest score) and Max (the highest score)
Range: the span of observed values. Range = Max - Min
The range changes with the extreme scores, so it is an unstable but useful informal measure.
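The frequency, cumulative frequency, percentile and range computations described above can be sketched in a few lines of Python. This is only an illustration: the function and variable names (`percentile_table`, `grade_counts`, `child_scores`) are made up for this sketch, and the data are the grade table and the eight-child scores from the slides.

```python
from collections import Counter

# Grade -> frequency, copied from the percentile table in the slides.
grade_counts = {50: 1, 60: 4, 70: 8, 80: 10, 90: 5, 100: 2}


def percentile_table(counts):
    """Return rows of (grade, frequency, cumulative frequency, percentile),
    ordered from the lowest grade to the highest."""
    total = sum(counts.values())
    rows = []
    running = 0  # scores at or below the current grade
    for grade in sorted(counts):
        running += counts[grade]
        # Percentile = 100 * (cumulative frequency / total N), rounded
        rows.append((grade, counts[grade], running, round(100 * running / total)))
    return rows


# Raw scores of the eight children; Counter tallies how often each score occurs.
child_scores = [8, 8, 6, 6, 6, 6, 2, 2]
score_frequency = Counter(child_scores)

# Range = Max - Min
score_range = max(child_scores) - min(child_scores)

table = percentile_table(grade_counts)
for row in table:
    print(row)
```

Each printed row should match one line of the percentile table, read from the bottom (grade 50) to the top (grade 100).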
Mode: the most frequently obtained score
Mean (average): the average of a set of numbers
Median: the middle score of a group (when the number of scores is odd) or the average of the two middle scores (when it is even)
In a bell-curve (normal) distribution the mode, mean and median are the same

Mode

Grade   Frequency
50      1
60      4
70      8
80      10
90      5
100     2
Total   30

Which grade is most frequent? The one with the highest value in the "Frequency" column (here, 80).

Mean (average)
Using the same grade table:
- Compute the sum of all grades
- Divide by the number of grades

Grade x times   Product
50 x 1          50
60 x 4          240
70 x 8          560
80 x 10         800
90 x 5          450
100 x 2         200
Total           2300
Mean            2300/30 = 76.67

Median
Using the same grade table:
- Order all grades in a row according to value
- The grade in "the middle" of the row is the median
- We have a row of 30 grades: 50, 60, 60, 60, 60, 70, ...
- Half of 30 is 15, so look at the grade in the 15th position
- Slight complication: with an even number of grades there are 15 grades on each side of the middle, so compute the mean of the grades in the 15th and 16th positions (both are 80, so the median is 80)

Questions: Are both curves the same? How? Are they different? How?
We need to measure the accuracy of the mean.
(Figure from Hatch & Farhady 1982, p.56)

Variability

Coming attractions
- How to draw valid statistical inferences? We have to look at the relation between our sample and the population.
- Today we looked at where the 'center' of the data is: the big picture.
- Next, look at variance: how the data is distributed.

Deviation
The distance between a score and the Mean (see Table 4.2, p. 125): how much a score deviates from the average

Sum of squared errors (SS)
The sum of the squared deviations of all scores from the Mean

Variance
The average squared error, in the sample or in the population
Variance of the sample itself = SS/N: 33.7143/7 = 4.8163
Estimate of the population variance = SS/(N-1): 33.7143/6 = 5.6191

Why N-1?
Degrees of freedom (read box 4.5, page 129)

Standard deviation (SD)
Roughly the average distance between a score and the Mean (the square root of the Variance)
SD = √5.6191 = 2.37
What can the SD tell us about the distribution (pointy distribution vs. flat distribution)?

Standard Error (SE)
How well does the sample represent the population? Different samples of the population might yield different means. The SE is the standard deviation of the means of many such samples. A large value means the sample means differ a lot; a small value means they differ little.
SE = SD/√N

Confidence Interval
The limits within which about 95% of the sample means fall (for 99%, use about 2.58 SE on each side)
Lower boundary = Mean - 2SE
Upper boundary = Mean + 2SE

Inferential statistics

z-score and T-score
How can we use the standard deviation (SD) to compare two samples? two exams? two tests?
We translate the raw scores into distances from the mean measured in SDs, by subtracting the mean from the raw score and dividing by the SD. So for Table 4.2:

$$z = \frac{1 - 3.57}{2.37} = -1.08 \qquad z = \frac{8 - 3.57}{2.37} = 1.87$$

These scores are z-scores. Some z-scores are negative and some are positive. Why?
If you prefer a scale with only positive numbers, you can use the T-score:
T-score = 10 * z-score + 50
10 * (-1.08) + 50 = 39.2
10 * 1.87 + 50 = 68.7

A few words on Covariance and Pearson correlation
Covariance: how much do two variables co-vary? For a single pair of scores:

$$(X - \bar{X})(Y - \bar{Y})$$

But we are interested in sets of scores, so we sum all the individual products and divide, as always, by N-1:

$$\mathrm{COV}_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$$

What do we need covariance for? To measure correlations (the Pearson correlation coefficient is usually considered the standard way to estimate the linear correlation between X and Y). Since the two samples do not have the same SD, we must adjust the covariance to the amount of variation:

$$r = \frac{\mathrm{COV}_{xy}}{SD_x \cdot SD_y}$$

What does r mean?
- Positive r: positive correlation
- Negative r: negative correlation
- Small r (close to 0): weak correlation
- Big r (close to 1 or -1): strong correlation

inferential statistics.xls

Effect size
We can use correlations to measure experimental effect size.
r² (the coefficient of determination) is the fraction of the variance that is accounted for by a linear correlation.
- r = 0.1 (small effect): only 1% of the variance is accounted for by our task (r² = .01)
- r = 0.3 (medium effect): 9% of the variance is accounted for by our task (r² = .09)
- r = 0.5 (large effect): 25% of the variance is accounted for by our task (r² = .25)
- r = 1: a perfect effect

Probability
- How probable is it to get a certain correlation?
- How probable is it to get a certain score?
- How probable is it to get a certain mean?
- How probable is it that two samples are the same/different?
Examples: playing "Heads or tails?"; throwing a die.
Probability can be calculated by dividing the number of desired events by the number of possible outcomes, or by relying on the SD.
- What is the probability of getting a score above the mean?
- What is the probability of getting a score which is up to 1 SD above the mean? Up to 1 SD from the mean?
(For every z-score there is a probability)

Confidence Interval
The limits within which about 95% of the sample means fall
Lower boundary = Mean - 2SE
Upper boundary = Mean + 2SE

Hypothesis testing
How likely is it (how probable is it) that our hypothesis is right?
The probability that the results could have happened by chance is less than 5% (or 1%): p<0.05 (or p<0.01), the level of significance
- Null hypothesis: there is no difference between our sample and the population
- Positive hypothesis: the sample does better than the population
- Negative hypothesis: the sample does worse than the population
- Alternative hypothesis: the sample is different, but no direction is specified
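The probability questions posed above (a score above the mean, up to 1 SD above it, within 1 SD of it) can be answered numerically if we assume the scores are normally distributed. The sketch below makes that assumption explicit and uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `prob_between` is invented for this illustration.

```python
from statistics import NormalDist

# Assumption: scores follow a normal (bell-shaped) distribution, so every
# z-score maps to a probability through the standard normal curve.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_between(z_low, z_high):
    """Probability that a z-score falls between z_low and z_high."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)


p_above_mean = 1.0 - standard_normal.cdf(0.0)  # a score above the mean
p_up_to_1sd_above = prob_between(0.0, 1.0)     # between the mean and +1 SD
p_within_1sd = prob_between(-1.0, 1.0)         # within 1 SD on either side
p_within_2sd = prob_between(-2.0, 2.0)         # the region behind Mean +/- 2SE

print(f"above the mean:      {p_above_mean:.3f}")
print(f"up to 1 SD above:    {p_up_to_1sd_above:.3f}")
print(f"within 1 SD of mean: {p_within_1sd:.3f}")
print(f"within 2 SD of mean: {p_within_2sd:.3f}")
```

Under this assumption about 95% of values lie within 2 SD of the mean, which is where the Mean - 2SE and Mean + 2SE boundaries of the confidence interval come from.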
[Figures 8.5 and 8.6 from Hatch & Farhady 1982, p.87; regions labelled p<0.05 and p>0.05]

- If the data fall in the shaded area of Figure 8.5, the null hypothesis is retained (not rejected).
- If the data fall in the shaded area of Figure 8.6, the null hypothesis is rejected.
- If the data fall in the shaded higher tail of Figure 8.6, the scores are higher than the population and the null hypothesis is rejected.
- If the data fall in the shaded negative (lower) tail of Figure 8.6, the scores are lower than the population and the null hypothesis is rejected.

Since the null hypothesis specifies no direction, we must consider both tails, so we use a two-tailed test (with .025 in each tail). If we test a directional hypothesis, the level of significance applies to one tail only.

(Figures from Hatch & Farhady 1982, p.88)
A score in the shaded area in Figure 8.7 confirms the _____________ hypothesis
A score in the shaded area in Figure 8.8 confirms the _____________ hypothesis
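To make the difference between one-tailed and two-tailed tests concrete, here is a minimal sketch. It assumes a simple z-test, i.e. that the sample result has already been converted to a z-score and that the normal curve applies. The names `reject_null`, `two_tailed_cutoff` and `one_tailed_cutoff` are hypothetical and do not come from the slides or from Hatch & Farhady.

```python
from statistics import NormalDist

ALPHA = 0.05  # level of significance used in the slides (p < 0.05)
standard_normal = NormalDist()  # mean 0, SD 1

# Two-tailed test: alpha is split, .025 in each tail.
two_tailed_cutoff = standard_normal.inv_cdf(1 - ALPHA / 2)
# One-tailed (directional) test: all of alpha sits in a single tail.
one_tailed_cutoff = standard_normal.inv_cdf(1 - ALPHA)


def reject_null(z, direction="two-sided"):
    """Return True if the z-score falls in the rejection region.

    direction: "two-sided" (no direction specified),
               "higher" (sample expected to do better) or
               "lower" (sample expected to do worse).
    """
    if direction == "two-sided":
        return abs(z) > two_tailed_cutoff
    if direction == "higher":
        return z > one_tailed_cutoff
    if direction == "lower":
        return z < -one_tailed_cutoff
    raise ValueError(f"unknown direction: {direction!r}")


print(f"two-tailed cutoff: +/-{two_tailed_cutoff:.2f}")
print(f"one-tailed cutoff: {one_tailed_cutoff:.2f}")
print(reject_null(1.8), reject_null(1.8, "higher"))
```

A z-score of 1.8 illustrates why the choice matters: it lies beyond the one-tailed cutoff but inside the two-tailed one, so the same result is significant only under a directional hypothesis.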