Sampling and Sampling Distributions

Sampling
• A sample is a subset of a population used to infer something about the population.
• Probability sampling – the likelihood of selecting any member is known.
• Nonprobability sampling – the likelihood of selection is unknown.

Random sampling
• Each member of the population has an equal and independent chance of being selected.
• Equal – no bias toward choosing one person rather than another.
• Independent – the choice of one person does not influence the choice of the next.
• RANDOM and HAPHAZARD are NOT the same thing.
• True random sampling is a very systematic, structured selection.

Simple random sampling
• Define the population.
• List the members of the population.
• Assign a number to each member.
• Select at random (e.g. with a random number table).
• This works if you have a list of the whole population.

Systematic sampling
• Select every kth member, starting from a random start point.
• The population list should be in random order.
• Less random than simple random sampling: once the start point is chosen, the rest of the sample is determined, so members do not have an equal and independent chance of selection.

Stratified random sampling
• Use when some characteristic of the population needs to be considered (e.g. gender, religion).
• You need a profile of the population.
• Know the proportions in each category and select a sample to match, BUT selection within each category must be random.

Cluster sampling
• Units of individuals (e.g. a dorm, clinic, or school), rather than individuals, are selected at random.
• Selection of individuals is not independent.
• Bias is possible.

Nonprobability sampling
• Convenience – very common.
• Quota – selects to match a profile, but without random selection (e.g. the first 10 to sign up…).

Other samples
• Matched – precision match (e.g. twins).
• Range – categorize, then assign.
• Cohort samples – common in development studies.

Types of sampling

Type of sampling | When it should be used | Advantages | Disadvantages

Probability sampling
Simple random sampling | Population's members are similar to each other | Ensures a good representation | Time consuming and tedious
Systematic sampling | Population's members are similar to each other | Ensures a good representation; no random number table needed | Less random
Stratified random sampling | Heterogeneous population with several groups | Ensures a good representation of all strata in the population | Time consuming and tedious
Cluster sampling | Population consists of units rather than individuals | Easy and convenient | Members of units may differ from one another, decreasing sampling effectiveness

Nonprobability sampling
Convenience sampling | Sample is captive | Easy and inexpensive | Questionable representation
Quota sampling | Strata are present and stratified sampling is not possible | Some representation of all strata in the population | Questionable representation

Two factors count
• Random selection
• Size of sample

Landon vs FDR (1936)

Prediction | FDR's share of the vote | Sample size
Digest prediction of the election | 43% | 10 million surveys (2.4 million returned)
Gallup prediction of the Digest result | 44% | 3,000
Gallup prediction of the election | 56% | 50,000
Election result | 62% |

• When a selection procedure is biased, taking a large sample does not help.
• It just repeats the same mistake over and over.

Sampling Distribution of Means
The distribution of sample means is the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population.
• In probability terms, we have all possible outcomes and can determine the probability of any one outcome.
• The sample means clump around the population mean (as you would expect if the samples represent the population).

Central limit theorem
• For any population with mean μ (mu) and standard deviation σ (sigma), the distribution of sample means for sample size n will approach a normal distribution with mean μ and standard deviation σ/√n (the standard error) as n approaches infinity.

What does it mean?
• For any population, the distribution of sample means will approach normal (the original population does not need to be normal).
• The distribution of sample means approaches normal rapidly: n > 30 gives a good approximation.

Standard Error
• The standard deviation of the distribution of sample means: the typical distance between a sample mean and the population mean.
• Standard error = σ/√n
• What influences standard error?
• Population standard deviation – the more tightly the population's scores cluster around the mean, the closer a sample mean will tend to be to the population mean.
• Sample size – generally, the larger the sample, the more representative it is: σ/√n shrinks as n grows.

Figure: Histogram of Stroop times (consistent condition, Tues). Mean = 19.6. Axes: Time (seconds) against frequency.
Figure: Sample means for N = 2. One sample: mean = 16.7. 10 samples: mean = 19.06. 100 samples: mean = 19.97.
Figure: Sample means for N = 4. One sample: mean = 22.8. 10 samples: mean = 19.4. 100 samples: mean = 19.56.
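
The way sample means clump more tightly around the population mean as n grows can be checked with a small simulation. The sketch below is only an illustration, not the data behind the slides: the population is a hypothetical, deliberately skewed set of 100,000 made-up times, and the names `sample_means` and `results` are invented for this example. It draws repeated simple random samples of size n, then compares the observed spread of the sample means with the σ/√n predicted by the central limit theorem.

```python
import random
import statistics

def sample_means(population, n, num_samples, rng):
    """Draw num_samples simple random samples of size n and return their means."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption for illustration): skewed, not normal.
population = [rng.expovariate(1 / 20) for _ in range(100_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

results = {}
for n in (2, 4, 30):
    means = sample_means(population, n, 5_000, rng)
    observed_se = statistics.stdev(means)  # spread of the sample means
    predicted_se = sigma / n ** 0.5        # standard error, sigma / sqrt(n)
    results[n] = (statistics.fmean(means), observed_se, predicted_se)
    print(f"n={n:2d}  mean of means={results[n][0]:.2f}  "
          f"observed SE={observed_se:.2f}  predicted SE={predicted_se:.2f}")

print(f"population mean={mu:.2f}  population SD={sigma:.2f}")
```

For each n, the mean of the sample means should sit close to the population mean, and the observed spread should sit close to σ/√n, even though the population itself is far from normal. Swapping in a different population list is a quick way to see that the pattern does not depend on the population's shape.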