School of Psychology
Dpt. Experimental Psychology
Design and Data Analysis in Psychology I
English group (A)
Salvador Chacón Moscoso, Susana Sanduvete Chaves, Milagrosa Sánchez Martín

Lesson 5. Sampling and sampling distribution

1. Introduction

Statistical inference has two branches:

- Estimation theory (lesson 6): given an index in the sample, the aim is to infer the value of that index in the population. There are two kinds of estimation:
  - Point estimation: it provides a single value.
  - Interval estimation: it provides a range of values.
- Decision theory (lesson 8): a procedure for making decisions in the field of statistical inference.

Estimation theory goes from statistics (sample) to parameters (population).

2. Phases of the inferential process

1. Obtain a sample randomly.
2. Calculate the statistics (indexes in the sample): $\bar{X}$, $S$, $p$.
3. Construct a sampling distribution (of means or proportions: the possible results that can be found by taking different samples).
4. Choose a probability model (e.g., if we throw a die, there are six possible results, and they are equiprobable). The most used in psychology is the normal distribution.
5. Estimate the corresponding parameters (indexes in the population) based on the statistics.

3. Sampling error

How close the value of the statistic is to the value of the parameter depends on how representative the sample is. For example, it depends on:

- The sample size.
- The similarity or difference between participants.
- The sampling procedure.

Nevertheless, there will always be some discrepancy between statistic and parameter. This is the sampling error.

Solution: the precise value of the sampling error is unknown. Using inference, we can know with a certain confidence that this error does not exceed a limit.

Notation:

| | Mean | Proportion | Standard deviation |
|---|---|---|---|
| Sample: statistics (Latin letters) | $\bar{X}$ | $p$ | $S$ |
| Population: parameters (Greek letters) | $\mu$ | $\pi$ | $\sigma$ |
Calculation: the sampling error is the difference (in absolute value) between a statistic and its corresponding parameter.

$$e = |\bar{X} - \mu| \qquad e = |p - \pi|$$

There are two main concepts related to the sampling error:

1. Accuracy: the precision with which a statistic represents the parameter.
2. Reliability: the constancy of a statistic when it is calculated for several samples of the same type and size.

Accuracy: example. Which estimator is more accurate?

$$\bar{X}_1 = 47 \qquad \mu = 50 \qquad \bar{X}_2 = 54$$

$$e_1 = |47 - 50| = 3 \qquad e_2 = |54 - 50| = 4$$

Since $e_1 < e_2$, $\bar{X}_1$ is more accurate.

Reliability: example. Which group of means is more reliable?

- First group: $\bar{X}_1 = 76$, $\bar{X}_2 = 78$, $\bar{X}_3 = 75$, $\bar{X}_4 = 77$
- Second group: $\bar{X}_1 = 20$, $\bar{X}_2 = 40$, $\bar{X}_3 = 60$, $\bar{X}_4 = 80$

The first group of means is more reliable because the variation between them is lower.

The lower the sampling error, the more probable it is that the estimator in a sample has the same value as the parameter.

4. Sampling distribution

Definition: it is a theoretical probability distribution that establishes a functional relation between the possible values of a statistic, based on a sample of size n, and the probability associated with each of these values, for all the possible samples of size n drawn from a particular population.

The construction of a sampling distribution has three phases:

PHASE 1. Collect all the samples of the same size n, drawn randomly from the population under study: Population → $S_1, S_2, S_3, \ldots, S_k$.

PHASE 2. Calculate the same estimator in each sample: $S_1 \to \bar{X}_1$, $S_2 \to \bar{X}_2$, $S_3 \to \bar{X}_3$, ..., $S_k \to \bar{X}_k$. We will find different values of the estimator (e.g., the mean) in the different samples.

PHASE 3. Group these measures in a new distribution: $\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$, whose mean is the mean of means.

In general, the sampling distribution will differ from the distribution of the population.
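The three phases can be illustrated with a small simulation. The Python sketch below rests on assumptions that are not part of the lesson: the population is hypothetical (10,000 simulated scores with mean about 50 and standard deviation about 10), the function name `simulate_sampling_distribution` is made up for this illustration, and instead of enumerating all possible samples of size n, as the definition requires, it draws k = 2,000 random samples. The result therefore only approximates the theoretical sampling distribution.

```python
import random
import statistics


def simulate_sampling_distribution(population, n, k, seed=0):
    """Approximate the sampling distribution of the mean with k random samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(k):
        sample = rng.sample(population, n)      # PHASE 1: random sample of size n
        means.append(statistics.fmean(sample))  # PHASE 2: same estimator in each sample
    return means                                # PHASE 3: the means form a new distribution


# Hypothetical population (an assumption made only for this illustration).
population_rng = random.Random(42)
population = [population_rng.gauss(50, 10) for _ in range(10_000)]

n = 25
means = simulate_sampling_distribution(population, n=n, k=2_000)

mu = statistics.fmean(population)
sigma = statistics.pstdev(population)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)

print(f"population mean = {mu:.2f}, mean of means = {mean_of_means:.2f}")
print(f"population SD   = {sigma:.2f}, SD of the means = {sd_of_means:.2f}")
print(f"sigma / sqrt(n) = {sigma / n ** 0.5:.2f}")
```

With these settings the mean of the sample means should fall close to the population mean, while the means vary much less than the individual scores do.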
The variance of the statistic measures the dispersion of the particular sample values around the expected value of the statistic, considering all the possible samples of size n. The standard deviation of the sampling distribution is called the standard error of the estimator.

We are only going to study the sampling distribution of two statistics:

4.1. The mean.
4.2. The proportion.

4.1. Sampling distribution of the mean

Mean or expected value:

$$\mu_{\bar{X}} = \mu$$

Standard error (based on $\sigma$, or on $S$ when $\sigma$ is unknown):

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \qquad \sigma_{\bar{X}} = \frac{S}{\sqrt{n-1}}$$

[Figure: distribution of the population compared with the sampling distribution of $\bar{X}$.]

Characteristics

1. The statistics obtained in the samples are grouped around the parameter of the population.
2. The bigger n is, the closer the statistics are to the parameter.
3. In large samples, the graphic representation has the following characteristics:
   a) It is symmetric. The central vertical axis is the parameter $\mu$.
   b) The bigger n is, the narrower the bell-shaped curve is.
   c) It takes the form of the normal curve.
4. Its mean matches the real mean in the population: $\mu_{\bar{X}} = \mu$.
5. It is more or less variable. If its variability is small (i.e., it has a small $\sigma_{\bar{X}}$), the means differ little from each other, and it is very reliable.

Standardization

| Distribution | Standardization |
|---|---|
| Sample | $Z = \dfrac{X - \bar{X}}{S}$ |
| Population | $Z = \dfrac{X - \mu}{\sigma}$ |
| Sampling distribution | $Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{\bar{X} - \mu}{S / \sqrt{n-1}}$ |

Standardization allows us to calculate probabilities (if the probability model of the distribution is known). We can assume a normal distribution when n ≥ 30.

Standard error of the mean ($\sigma_{\bar{X}}$):

| | Based on $\sigma$ | Based on $S$ |
|---|---|---|
| $N = \infty$ | $\dfrac{\sigma}{\sqrt{n}}$ | $\dfrac{S}{\sqrt{n-1}}$ |
| $N \neq \infty$ (with correction) | $\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}$ | $S \sqrt{\dfrac{N-n}{N(n-1)}}$ |
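The four standard-error formulas for the mean (infinite or finite population, based on σ or on S) can be gathered in one helper. This is a minimal Python sketch; the function name `standard_error_mean`, its parameters, and the numbers in the usage lines are assumptions chosen for illustration and do not come from the lesson.

```python
import math


def standard_error_mean(sd, n, N=None, from_sample=False):
    """Standard error of the mean.

    sd          -- sigma (population SD), or S (sample SD) when from_sample is True
    n           -- sample size
    N           -- population size; None means an infinite population
    from_sample -- True when sd is the sample standard deviation S
    """
    if N is None:
        # Infinite population: sigma / sqrt(n), or S / sqrt(n - 1).
        return sd / math.sqrt(n - 1) if from_sample else sd / math.sqrt(n)
    if from_sample:
        # Finite population, based on S: S * sqrt((N - n) / (N * (n - 1))).
        return sd * math.sqrt((N - n) / (N * (n - 1)))
    # Finite population, based on sigma: (sigma / sqrt(n)) * sqrt((N - n) / (N - 1)).
    return sd / math.sqrt(n) * math.sqrt((N - n) / (N - 1))


# Hypothetical values: SD = 10, samples of 100, population of 500.
print(standard_error_mean(10, 100))         # infinite population
print(standard_error_mean(10, 100, N=500))  # finite population, smaller error
```

The finite-population correction always shrinks the standard error, because a sample that covers a larger share of the population leaves less room for sampling error.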
Example 1

We applied a test to a population and obtained a mean (μ) of 18 points and a standard deviation (σ) of 3 points. Assuming that the variable is normally distributed in the population:

a) Which raw scores delimit the central 95% of the participants of that population?
b) Which raw scores delimit the central 99% of the average scores in random samples of 225 participants?

a) Which raw scores delimit the central 95% of the participants of that population?

The central 95% leaves an area of 0.475 on each side of the mean, so $Z_1 = -1.96$ and $Z_2 = 1.96$.

$$Z = \frac{X_i - \mu}{\sigma}$$

$$-1.96 = \frac{X_1 - 18}{3} \;\Rightarrow\; X_1 = 18 - 1.96 \cdot 3 = 18 - 5.88 = 12.12$$

$$1.96 = \frac{X_2 - 18}{3} \;\Rightarrow\; X_2 = 18 + 1.96 \cdot 3 = 18 + 5.88 = 23.88$$

The raw scores that delimit the central 95% of the participants are 12.12 and 23.88.

b) Which raw scores delimit the central 99% of the average scores in random samples of 225 participants?

The central 99% leaves an area of 0.495 on each side of the mean, so $Z_1 = -2.58$ and $Z_2 = 2.58$.

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{225}} = \frac{3}{15} = 0.2$$

$$Z = \frac{\bar{X}_i - \mu}{\sigma_{\bar{X}}}$$

$$-2.58 = \frac{\bar{X}_1 - 18}{0.2} \;\Rightarrow\; \bar{X}_1 = 18 - 2.58 \cdot 0.2 = 18 - 0.516 = 17.484$$

$$2.58 = \frac{\bar{X}_2 - 18}{0.2} \;\Rightarrow\; \bar{X}_2 = 18 + 2.58 \cdot 0.2 = 18 + 0.516 = 18.516$$

17.484 and 18.516 delimit the central 99% of the average scores in samples of 225 participants.

Example 2

Calculate the probability of drawing a sample of 81 participants with a mean equal to or lower than 42, from a population whose mean (μ) is 40 and standard deviation (σ) is 9.

$$Z = \frac{\bar{X}_i - \mu}{\sigma / \sqrt{n}} = \frac{42 - 40}{9 / \sqrt{81}} = \frac{2}{9/9} = \frac{2}{1} = 2$$

For $Z = 2$, the table gives an area of 0.4772 between the mean and $Z$; the area below the mean is 0.5.
$$P(\bar{X} \le 42) = P(Z \le 2) = 0.5 + 0.4772 = 0.9772$$

Example 3

In a sampling distribution of means with samples of 49 participants, the means of the central 90% of the samples are between 47 and 53 points. Calculate:

a) The raw scores that delimit the central 95% of the means.
b) The standard deviation of the population (σ).
c) The raw scores that delimit the central 95% of the means when the sample size is 81.

a) The raw scores that delimit the central 95% of the means.

The sampling distribution is symmetric, so its mean is the midpoint of the interval:

$$\mu = \frac{53 + 47}{2} = 50$$

The central 90% leaves an area of 0.45 on each side of the mean, so $Z_1 = -1.64$ at $\bar{X}_1 = 47$ and $Z_2 = 1.64$ at $\bar{X}_2 = 53$:

$$1.64 = \frac{53 - 50}{\sigma_{\bar{X}}} = \frac{3}{\sigma_{\bar{X}}} \;\Rightarrow\; \sigma_{\bar{X}} = \frac{3}{1.64} = 1.829$$

The central 95% leaves an area of 0.475 on each side of the mean, so $Z_1 = -1.96$ and $Z_2 = 1.96$:

$$-1.96 = \frac{\bar{X}_1 - 50}{1.829} \;\Rightarrow\; \bar{X}_1 = 50 - 1.96 \cdot 1.829 = 50 - 3.585 = 46.415$$

$$1.96 = \frac{\bar{X}_2 - 50}{1.829} \;\Rightarrow\; \bar{X}_2 = 50 + 1.96 \cdot 1.829 = 50 + 3.585 = 53.585$$

b) The standard deviation of the population (σ).

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \;\Rightarrow\; \sigma = \sigma_{\bar{X}} \sqrt{n} = 1.829 \cdot \sqrt{49} = 1.829 \cdot 7 = 12.803$$

c) The raw scores that delimit the central 95% of the means when the sample size is 81.

As before, $Z_1 = -1.96$ and $Z_2 = 1.96$; only the standard error changes:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12.803}{\sqrt{81}} = \frac{12.803}{9} = 1.423$$

$$-1.96 = \frac{\bar{X}_1 - 50}{1.423} \;\Rightarrow\; \bar{X}_1 = 50 - 1.96 \cdot 1.423 = 50 - 2.789 = 47.211$$

$$1.96 = \frac{\bar{X}_2 - 50}{1.423} \;\Rightarrow\; \bar{X}_2 = 50 + 1.96 \cdot 1.423 = 50 + 2.789 = 52.789$$

4.2. Sampling distribution of proportions

$p = x/n$, where x is the number of participants that present a characteristic and n is the sample size.

We can assume a normal distribution when $n\pi \ge 5$ and $n(1 - \pi) \ge 5$.

Mean or expected value:

$$\mu_p = \pi$$

Standard error (based on $\pi$, or on $p$ when $\pi$ is unknown):

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \qquad \sigma_p = \sqrt{\frac{p(1 - p)}{n}}$$

Standardization

$$Z = \frac{p_i - \pi}{\sigma_p} = \frac{p_i - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$

Standard error of the proportion ($\sigma_p$):

| | Based on the parameter ($\pi$) | Based on the statistic ($p$) |
|---|---|---|
| $N = \infty$ | $\sqrt{\dfrac{\pi(1-\pi)}{n}}$ | $\sqrt{\dfrac{p(1-p)}{n}}$ |
| $N \neq \infty$ (with correction) | $\sqrt{\dfrac{\pi(1-\pi)}{n}} \sqrt{\dfrac{N-n}{N-1}}$ | $\sqrt{p(1-p)} \sqrt{\dfrac{N-n}{N(n-1)}}$ |
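The normality condition and the standard errors of the proportion can be sketched in the same way. The Python snippet below is only an illustration: the names `normal_approximation_ok` and `standard_error_proportion` are hypothetical, and the usage values (a proportion of 0.20 with samples of 50, and a proportion of 0.02 with samples of 100) are invented for the example.

```python
import math


def normal_approximation_ok(pi, n, minimum=5):
    """Check the rule n * pi >= 5 and n * (1 - pi) >= 5."""
    return n * pi >= minimum and n * (1 - pi) >= minimum


def standard_error_proportion(prop, n, N=None, from_sample=False):
    """Standard error of a proportion.

    prop        -- pi (population proportion), or p (sample proportion) when from_sample is True
    n           -- sample size
    N           -- population size; None means an infinite population
    from_sample -- True when prop is the sample proportion p
    """
    if N is None:
        # Infinite population: the same formula is used with pi or with p.
        return math.sqrt(prop * (1 - prop) / n)
    if from_sample:
        # Finite population, based on p.
        return math.sqrt(prop * (1 - prop) * (N - n) / (N * (n - 1)))
    # Finite population, based on pi.
    return math.sqrt(prop * (1 - prop) / n) * math.sqrt((N - n) / (N - 1))


# Hypothetical values.
print(normal_approximation_ok(0.20, 50))   # 10 and 40 are both at least 5
print(normal_approximation_ok(0.02, 100))  # n * pi is only 2
print(round(standard_error_proportion(0.20, 50), 4))
```

Checking the condition first matters because the Z formula is only a reasonable approximation when both expected counts are large enough.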
Sampling distribution of proportions. Example 1

In a population, the proportion of smokers was 0.60. If we chose a sample of n = 200 from this population, what is the probability of finding 130 or fewer smokers in that sample?

$$\pi = 0.60 \qquad 1 - \pi = 0.40$$

Can we assume a normal distribution for these data?

$$n\pi \ge 5: \quad 200 \cdot 0.60 = 120$$

$$n(1 - \pi) \ge 5: \quad 200 \cdot 0.40 = 80$$

Both conditions hold, so the normal model can be used.

$$p = \frac{130}{200} = 0.65$$

$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.65 - 0.60}{\sqrt{\dfrac{0.60 \cdot 0.40}{200}}} \approx \frac{0.05}{0.035} = 1.43$$

For $Z = 1.43$, the table gives an area of 0.4236 between the mean and $Z$.

$$P(p \le 0.65) = P(Z \le 1.43) = 0.5 + 0.4236 = 0.9236$$

Example 2

In a presidential election, a candidate obtained 45% of the votes. If you chose a sample of 100 voters randomly and independently, what is the probability that the candidate received more than 50% of the votes in the sample?

$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.50 - 0.45}{\sqrt{\dfrac{0.45 \cdot 0.55}{100}}} \approx 1$$

$$P(p > 0.50) = P(Z > 1) = 0.50 - 0.3413 = 0.1587$$

Example 3

30% of the students in Seville passed a particular test. Drawing samples of 100 students from this population, calculate:

a) The values that delimit the central 99% of the proportions of these samples.
b) The percentage of samples that have a proportion equal to or higher than 0.35 of students that passed the test.

The normal model can be used:

$$n\pi \ge 5: \quad 100 \cdot 0.3 = 30$$

$$n(1 - \pi) \ge 5: \quad 100 \cdot 0.7 = 70$$

a) Calculate the values that delimit the central 99% of the proportions of these samples.

The central 99% leaves an area of 0.495 on each side of the mean, so $Z_1 = -2.58$ and $Z_2 = 2.58$.

$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$

$$\sigma_p = \sqrt{\frac{0.3(1 - 0.3)}{100}} \approx 0.0458, \text{ taken as } 0.045 \text{ in the calculations below}$$

$$-2.58 = \frac{p_1 - 0.3}{0.045} \;\Rightarrow\; p_1 = 0.3 - 2.58 \cdot 0.045 = 0.3 - 0.116 = 0.184$$

$$2.58 = \frac{p_2 - 0.3}{0.045} \;\Rightarrow\; p_2 = 0.3 + 2.58 \cdot 0.045 = 0.3 + 0.116 = 0.416$$
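Part a) can be checked numerically. The Python sketch below uses `statistics.NormalDist` from the standard library; the helper name `central_interval` is an assumption made for this illustration. Because it uses the unrounded standard error and the exact normal quantile instead of the table values 0.045 and 2.58, its limits differ slightly from the hand calculation.

```python
import math
from statistics import NormalDist


def central_interval(center, standard_error, coverage):
    """Limits of the central `coverage` share of a normal sampling distribution."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # e.g. about 2.576 for 99%
    half_width = z * standard_error
    return center - half_width, center + half_width


pi, n = 0.30, 100
se = math.sqrt(pi * (1 - pi) / n)  # unrounded standard error of the proportion
low, high = central_interval(pi, se, 0.99)
print(f"standard error = {se:.4f}")
print(f"central 99% of sample proportions: {low:.3f} to {high:.3f}")
```

The same helper applies to means: pass μ as the center and the standard error of the mean in place of the standard error of the proportion.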
Example 3 b) The percentage of samples that have a proportion equal to or higher than 0.35 of students that passed the test.

$$Z = \frac{p - \pi}{\sigma_p} = \frac{0.35 - 0.3}{0.045} = \frac{0.05}{0.045} = 1.11$$

For $Z = 1.11$, the table gives an area of 0.3665 between the mean and $Z$.

$$P(p \ge 0.35) = P(Z \ge 1.11) = 0.5 - 0.3665 = 0.1335$$

13.35% of the samples have a proportion equal to or higher than 0.35.
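The tail probability in part b) can also be obtained without a table. The following Python sketch relies on `statistics.NormalDist`; the function name `prob_at_least` is hypothetical. It computes the probability twice: once with the standard error 0.045 used in the hand calculation, and once with the unrounded standard error, which gives a slightly larger value.

```python
import math
from statistics import NormalDist


def prob_at_least(value, mean, standard_error):
    """P(statistic >= value) under a normal sampling distribution."""
    return 1 - NormalDist(mean, standard_error).cdf(value)


pi, n = 0.30, 100

# With the standard error used in the hand calculation (0.045).
p_hand = prob_at_least(0.35, pi, 0.045)

# With the unrounded standard error, sqrt(pi * (1 - pi) / n).
se = math.sqrt(pi * (1 - pi) / n)
p_exact = prob_at_least(0.35, pi, se)

print(f"with SE = 0.045:  {p_hand:.4f}")
print(f"with SE = {se:.4f}: {p_exact:.4f}")
```

The gap between the two results shows how much an early rounding of the standard error can move the final probability.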