Summary from last week: descriptive statistics exercises; descriptive data analysis in SPSS; mouse experiment continued. Exercise tomorrow, Tuesday 13-15, room 4a58 (same as normal).

4 AIMS OF SCIENCE
Reliability: results can be replicated by others.
Validity: results show what we intend them to show.
Generalizability: results have a wider application than merely the participants and the circumstances of the test.
Importance: results should be important (a subjective judgement). Results are never important if they are not reliable, valid and generalizable.

Experiments are a useful tool for establishing cause and effect, but other methods (e.g. observation) are also important in science. A good experimental design ensures that the only variable that varies is the independent variable chosen by the experimenter; the effects of alternative, confounding variables are eliminated (or at least rendered unsystematic by randomisation).

Disadvantages of the experimental method:
(i) It is intrusive: participants know they are being observed, and this may affect their behaviour.
(ii) Experimenter effects.
(iii) Not all phenomena are amenable to experimentation, for practical or ethical reasons (e.g. post-traumatic stress disorder, near-death experiences, effects of physical and social deprivation).
(iv) Some phenomena (e.g. personality, age or sex differences) can only be investigated by methods which are, strictly speaking, quasi-experimental.

Good experimental designs maximise validity.
Internal validity: the extent to which we can be sure that changes in the dependent variable are due to changes in the independent variable [the meteor kills the dinosaurs].
External validity (ecological validity, generalizability): the extent to which we can generalise from our participants to other groups (e.g. to real-life situations).

Research methods:
Observational methods: no manipulation of variables.
Quasi-experimental methods: used when we cannot do a true experiment.
True experimental methods: manipulation of IVs and objective measurement of the effect of that manipulation.

TRUE EXPERIMENTAL DESIGNS
Two types: between-groups versus within-subjects designs.
Between-groups (independent measures): each subject participates in only one condition of the study.
Within-subjects (repeated measures): each subject does all of the conditions in the study.
Mixed designs: a mixture of both approaches.

MULTIFACTORIAL DESIGNS
Used when there are two or more independent variables [factor = IV].
Advantage: we can observe how the IVs interact (e.g. meteors and mad sharks).
Disadvantages: for between-groups versions, lots of participants are needed, and with 4+ IVs the statistics become hugely complicated; it is often better to run several separate experiments.

Practical advice: whenever possible, use true experimental designs. Get at least one score per participant, and get ratio data. Use a repeated-measures design whenever possible, since it needs fewer participants than a between-groups design. Include an extra independent variable if possible, for more data. Don't get too ambitious!
Topics: populations and samples; frequency distributions; mode, median, mean; standard deviation; confidence intervals.

Descriptive statistics are used to describe datasets. They form the first analyses that we do when working with an unexplored dataset.

We are interested in answering questions about populations. A population is a collection of people, and can range from the very general to the very specific:
- everyone on the planet
- everyone with dark hair
- everyone living in Copenhagen, aged 21, playing the cello

It is not practical to collect data from everyone in our target population, so we sample it. Samples are used to make a guess about what results we would get if we used the entire population. The smaller the sample, the higher the chance of variation in its behaviour compared to the population.

One of the first operations we perform, having obtained new data from a sample of people, is to summarize them. This is done to figure out the general patterns within the data. Two choices:
- calculate a summary statistic, which tells us something about the scores collected
- draw a graph, for the same purpose

The simplest graph summarizes how many times each collected score occurs: a frequency distribution (or histogram).

[Figure: histogram of the frequency of errors made (x-axis: number of errors, 1-10; y-axis: frequency, 0-9), showing that most people made 6 or more errors.]

In this example we have scores from 40 participants, and we can calculate the frequency of each score (e.g. 8 of the 40 participants made 8 errors):

Number of errors:  1  2  3  4  5  6  7  8  9  10
Frequency:         2  4  2  2  2  4  4  8  6   6

Types of distributions: frequency distributions come in different shapes and sizes, and we need to be able to describe them. In an ideal world, all scores would be distributed symmetrically around the centre of all scores. This is called the normal distribution. It is characterized by a symmetrical, bell-shaped curve: the majority of the scores lie around the centre of the distribution, and the further we get from the centre, the lower the frequency with which a score occurs. At the far ends, the odds of a score occurring are very small indeed.

Two main deviations from the normal distribution:
- Skewed distributions are not symmetrical; they have the most frequent scores clustered towards one end (positive skew or negative skew).
- Distributions also vary in their pointiness. This is called kurtosis; it reflects how strongly scores cluster in the tails of the distribution versus the centre.

Apart from drawing graphs, we can calculate summary statistics. Frequency distributions indicate that the centre of the scores is important: we want a single value to sum up our data (to tell us, roughly, what the result of our experiment was).

The Range: the difference between the highest and lowest scores (range = highest - lowest).
Advantages: quick and easy to calculate, easy to understand.
Disadvantages: unduly influenced by extreme scores: 3, 4, 4, 5, 100 gives range = 100 - 3 = 97, while 3, 4, 4, 5, 5 gives range = 5 - 3 = 2. The range also conveys no information about the spread of scores between the highest and lowest scores: e.g. 2, 2, 2, 2, 2, 20 and 2, 20, 20, 20, 20, 20 have exactly the same range (18) but very different distributions.

The Mode: the most frequent score in a set of scores. E.g. for 6, 11, 22, 22, 96, 98 the mode is 22.
Advantages of the mode: (i) simple to calculate, easy to understand; (ii) it is the only average which can be used with nominal data.
Disadvantages of the mode: (i) it may be unrepresentative and hence misleading: in 3, 4, 4, 5, 6, 7, 8, 8, 96, 96, 96 the mode is 96, but most of the scores are low numbers; (ii) there may be more than one mode in a set of scores: 3, 3, 3, 4, 4, 4, 5, 7, 9 has two modes! (Bimodal or multimodal distributions.)
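To make the frequency table and the mode concrete, here is a minimal Python sketch (illustrative only, not part of the original exercise materials). The raw scores are not given in the notes, so the list is reconstructed to match the frequencies in the table above:

```python
from collections import Counter
from statistics import multimode

# Frequencies from the table above; the raw scores are reconstructed from them.
freq_table = {1: 2, 2: 4, 3: 2, 4: 2, 5: 2, 6: 4, 7: 4, 8: 8, 9: 6, 10: 6}
scores = [score for score, freq in freq_table.items() for _ in range(freq)]

counts = Counter(scores)                    # the frequency distribution
for score in sorted(counts):
    print(f"{score:2d} errors: {'#' * counts[score]}")   # crude text histogram

print("mode(s):", multimode(scores))        # most frequent score(s): [8]
```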
The Median: when the scores are arranged in order of size, the median is either (a) the middle score, if there is an odd number of scores (4, 5, 6, 7, 8, 8, 96: median = 7), or (b) the average of the middle two scores, if there is an even number of scores (4, 5, 6, 7, 8, 8, 96, 96: median = (7+8)/2 = 7.5).
Advantages of the median: (i) resistant to the distorting effects of extreme high or low scores.
Disadvantages of the median: (i) it ignores the scores' numerical values, which is wasteful if the data are interval or ratio; (ii) it is more susceptible to sampling fluctuations than the mean; (iii) it is less mathematically useful than the mean.

Quartiles: the three values that split the sorted data into four equal parts. The second quartile is the median; the lower quartile is the median of the lower half of the data; the upper quartile is the median of the upper half of the data.

The Mean: add all the scores together and divide by the total number of scores: $\bar{X} = \frac{\sum X}{N}$. E.g. (3+4+4+5+6) / 5 = 22 / 5 = 4.4.
Advantages of the mean: (i) it uses information from every single score; (ii) it is resistant to sampling fluctuation, i.e. it varies the least from sample to sample (important, since we normally want to extrapolate from samples to populations).
Disadvantages of the mean: (i) it is susceptible to distortion from extreme scores (4, 5, 5, 6: mean = 5; but 4, 5, 5, 106: mean = 30); (ii) it can only be used with interval or ratio data, not with ordinal or nominal data.

The mean is a model of what happens in the real world: the typical score. It is not a perfect representation of the data. How can we assess how well the mean represents reality?

How do we know if the mean is a good description of our dataset? Example: 10, 10, 10, 0.1, 0.1, 0.1 has mean 5.05, which is not very descriptive of the frequency distribution! Problem: the mean can be influenced by extreme scores.

[Figure: example scores plotted around the mean, showing how each score deviates from it.]

To evaluate the mean, we need to see how it relates to the actually recorded scores, i.e. how the scores deviate from the mean. The deviation of the scores from the mean allows us to estimate the accuracy of the mean as a representation of the scores. There are several ways of doing this:
- Sum of squared errors (SS): all the differences between the mean and the scores, squared and summed. A good mean produces a low SS. The problem is that the more scores we have, the larger SS becomes!
- We therefore divide by the number of scores, giving the variance ($s^2$). We can use the variance to compare the accuracy of the mean across samples with different numbers of observations. Problem: the variance is in "units squared".
- To get back to the units of our original scores, we take the square root of the variance, giving the standard deviation ($s$). The standard deviation shows the accuracy of the mean.

Sum of squared errors, variance and standard deviation all measure the same thing: the accuracy of the mean. The three are proportionate: a large SS results in a large $s^2$, which results in a large $s$. The mean is most accurate when the scores are similar, and less accurate when the scores are very dissimilar. A worked sketch follows below.
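As a worked sketch of the steps above (assuming nothing beyond the example scores 3, 4, 4, 5, 6 already used for the mean), this Python snippet computes the mean and median, then builds SS, variance and SD:

```python
from statistics import mean, median

scores = [3, 4, 4, 5, 6]           # example data from the notes: mean = 4.4

m = mean(scores)                   # (3+4+4+5+6) / 5 = 4.4
med = median(scores)               # middle score of the sorted data = 4

# Sum of squared errors: how far the scores deviate from the mean.
ss = sum((x - m) ** 2 for x in scores)

# Dividing by n gives the variance; its square root is the standard deviation.
n = len(scores)
variance = ss / n                  # in "units squared"
sd = variance ** 0.5               # back in the units of the original scores

print(m, med, ss, variance, sd)    # 4.4, 4, 5.2, 1.04, ~1.02
```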
Complications in using the mean and SD: we usually obtain the mean and SD from a sample, not from the parent population. Sometimes we are content to describe our sample per se, but sometimes we want to extrapolate from our sample to the population. A sample mean is a good estimate of the population mean, but a sample SD tends to underestimate the population SD. Therefore, when using the sample SD as a description of the sample, divide by n (the number of scores); when using the sample SD as an estimate of the population SD, divide by n-1 (to make the SD larger than it would otherwise have been).

Sample SD as a description of a sample: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
Sample SD as an estimate of the population SD: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}}$
Population SD, if you measure every member of the population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Sample mean, both as a description of a sample and as an estimate of the population mean: $\bar{X} = \frac{\sum X}{n}$
Population mean ("mu"): $\mu = \frac{\sum X}{N}$

The variance and SD tell us something about the frequency distribution: the mean is the centre of the distribution, and the smaller the SD, the closer the scores lie to the centre.

Imagine we collect 1,000,000 samples of data about how many meteors it takes to kill a T. rex, calculating the mean for each. From the means and SDs, we can calculate the boundaries within which those sample means lie, e.g. 2 to 25. We can now say that we are reasonably sure that any other sample will have a mean between 2 and 25. Often we want to describe how "sure" we are; typically we want to be 95% sure. Say that 95% of our sample means fall between 3 and 24: [3, 24] is then known as a confidence interval.

Calculating the confidence interval (SE is the standard error, defined below):
Lower boundary = mean - 2*SE
Upper boundary = mean + 2*SE
The mean is always at the centre of the confidence interval, and the more accurate the mean, the smaller the confidence interval.

Example: the mean meteor count from our 1 million samples is 10, and the standard error is 2.5. The 95% confidence interval is:
Lower boundary = 10 - (2*2.5) = 5
Upper boundary = 10 + (2*2.5) = 15
So 95% of all sample means should lie between 5 and 15 meteors.

We can now describe the sample, but how well does our sample represent the population?

[Figure: height of all adult women in England, a normal distribution of raw scores with µ = 63 in. and σ = 2 in.; beside it, a sample of N = 100 adult women from England with mean 64.2 in. and s = 2.5 in.]

If we take repeated samples, each sample has a mean height, a standard deviation (s), and a shape/distribution. Due to random fluctuations, each sample is different, both from other samples and from the parent population. These differences are predictable, however, and so we can use samples to make inferences about their parent populations.
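A short Python sketch of the two SD formulas and the confidence-interval arithmetic; the scores 4, 5, 5, 6 are reused from the earlier mean example, and the mean/SE values come from the meteor example above:

```python
from math import sqrt

scores = [4, 5, 5, 6]             # small illustrative sample from the notes
n = len(scores)
m = sum(scores) / n

ss = sum((x - m) ** 2 for x in scores)

sd_describe = sqrt(ss / n)        # divide by n: describes this sample
sd_estimate = sqrt(ss / (n - 1))  # divide by n-1: estimates the population SD
                                  # (slightly larger, correcting the bias)
print(sd_describe, sd_estimate)

# 95% confidence interval from the lecture example: mean 10, SE 2.5.
mean_meteors, se = 10, 2.5
lower, upper = mean_meteors - 2 * se, mean_meteors + 2 * se
print(lower, upper)               # 5.0, 15.0
```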
[Figure: five sample means drawn from the same population, e.g. 30, 25, 33, 30, 29.]

Often we have more than one sample from a population. This permits the calculation of several different sample means, whose values will vary, giving us a sampling distribution.

[Figure: a sampling distribution built from many sample means (M = 8 to 12), plotted as frequency against sample mean, with mean = 10 and SD = 1.22.]

The sampling distribution tells us about the behaviour of samples from the population. We can calculate the SD of the sampling distribution: this is called the Standard Error of the Mean (SE). The SE shows how much variation there is within a set of sample means, and therefore also how likely a specific sample mean is to be erroneous as an estimate of the true population mean.

SE = the SD of the sample means. We can estimate the SE from a single sample:
$SE = \frac{s}{\sqrt{n}}$
i.e. the SD of the sample divided by the square root of the sample size (n). If the SE is small, our obtained sample mean is more likely to be similar to the true population mean than if the SE is large.

Increasing n reduces the size of the SE: a sample mean based on 100 scores is probably closer to the population mean than a sample mean based on 10 scores. Variation between samples decreases as sample size increases, because extreme scores become less important to the mean. For example, with σ = 2 and n = 100:
$\sigma_{\bar{X}} = \frac{2}{\sqrt{100}} = \frac{2}{10} = 0.20$
Suppose n = 16 instead of 100:
$\sigma_{\bar{X}} = \frac{2}{\sqrt{16}} = \frac{2}{4} = 0.50$

The distribution of sample means is normally distributed, no matter what the shape of the original distribution of raw scores in the population. This is due to the Central Limit Theorem, and it holds true only for sample sizes of about 30 and greater. It means that the odds of sample means being similar to one another are very high. Example: the annual income of American citizens is positively skewed; many people are in the lower and middle income brackets, and very few are ultra-rich. Yet if we take many samples of size N = 50, the sampling distribution of the mean will be normal.

Given that the distribution is normal, we can do interesting things, because the normal distribution is symmetrical. For example, fixed proportions of scores fall within certain limits of the mean: 68% fall within the range of the mean +/- 1 standard deviation, 95% within +/- 2 standard deviations, etc. More on this next week!

Z-scores: standardising a score with respect to the other scores in the group. A z-score expresses a score in terms of how many standard deviations it is away from the mean. The distribution of z-scores has a mean of 0 and an SD of 1:
$z = \frac{X - \bar{X}}{s}$
where $\bar{X}$ is the sample mean and $s$ the sample SD.

Going beyond the data: using z-scores, we can represent a given score in terms of how different it is from the mean of its group. E.g. with µ = 63 and SD = 2, a score of $X_i = 64$ gives
$z = \frac{X_i - \mu}{\sigma} = \frac{64 - 63}{2} = 0.50$
i.e. 0.50 SD from the mean.

We can do the same thing to express the relationship of a sample mean to the population mean: (1) we obtain a particular sample mean; (2) we represent it in terms of how different it is from the mean of its parent population, using the standard error. With µ = 63, σ = 2 and N = 16:
$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}} = \frac{64 - 63}{2 / \sqrt{16}} = \frac{1}{0.5} = 2.00$

If we obtain a sample mean that is much higher or lower than the population mean, there are two possible reasons: (1) our sample mean is a rare "fluke" (a quirk of sampling variation); (2) our sample has not come from the population we thought it did, but from some other, different population. The greater the difference between the sample and population means, the more plausible (2) becomes. Example: the mean human I.Q. is 100, and a random sample of people has a mean I.Q. of 170.
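The SE and z-score arithmetic above can be checked with a few lines of Python (a sketch reproducing the lecture's numbers, not SPSS output):

```python
from math import sqrt

# Standard error: the SD divided by the square root of the sample size.
def standard_error(sd, n):
    return sd / sqrt(n)

print(standard_error(2, 100))   # 0.20 -- the n = 100 example
print(standard_error(2, 16))    # 0.50 -- the n = 16 example

# z-score of a single score: how many SDs it lies from the mean.
z_score = (64 - 63) / 2                     # 0.50 SD above the mean

# z-score of a sample mean: compare it to the population mean via the SE.
z_mean = (64 - 63) / standard_error(2, 16)  # 2.00 SEs above mu
print(z_score, z_mean)
```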
[Figure: distribution of sample mean I.Q.s around the population mean (100), with the obtained sample mean (170) far out in the upper tail.]

There are two explanations: (1) the sample is a fluke: by chance, our random sample contained a large number of highly intelligent people; (2) the sample does not come from the population we thought it did: our sample was actually drawn from a different population, e.g. aliens masquerading as humans.

This logic can be extended to the difference between two samples from the same population. We compare two groups of people: an experimental group and a control group. The experimental group gets a "wolfman" drug; the control group gets a harmless placebo. The dependent variable is the number of dog biscuits consumed. At the start of the experiment, the groups are two samples from the same population ("humans"). At the end of the experiment, are they (a) still two samples from the same population (i.e. still two samples of "humans"; our experimental treatment has left them unchanged), or (b) now samples from two different populations, one from the "population of humans" and one from the "population of wolfmen"?

We can decide between these alternatives as follows. The differences between any two sample means from the same population are normally distributed, around a mean difference of zero. Most differences will be relatively small, since the Central Limit Theorem tells us that most samples will have means similar to the population mean (and hence similar to each other). If we obtain a very large difference between our sample means, it could have occurred by chance, but this is very unlikely; it is more likely that the two samples come from different populations.
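This last claim can be illustrated by simulation. The following Python sketch uses made-up population parameters (mu = 10 biscuits, sigma = 3, samples of n = 30; none of these numbers come from the lecture): it repeatedly draws two samples from the same population and records the difference between their means, which clusters around zero with a predictable spread:

```python
import random
from statistics import mean, stdev

random.seed(1)

# Hypothetical population of "humans": mu = 10 dog biscuits, sigma = 3.
def sample_mean(n=30, mu=10.0, sigma=3.0):
    return mean(random.gauss(mu, sigma) for _ in range(n))

# Differences between pairs of sample means drawn from the SAME population.
diffs = [sample_mean() - sample_mean() for _ in range(10_000)]

print(round(mean(diffs), 3))   # mean difference: close to 0
print(round(stdev(diffs), 3))  # spread: about sqrt(2) * sigma / sqrt(n) ~ 0.77
# Very large differences between sample means are therefore rare; a huge
# observed difference suggests the samples come from different populations.
```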