Random Sampling and Sampling Distributions
Chapter 6

“He stuck in his thumb, pulled out a plum, and said ‘What a good boy am I!’” (old nursery rhyme)

MGMT 242

Topics and Goals for Chapter 6
• Random Sampling
• Sample Statistics and Relation to Population Parameters
• Sampling Distribution for the Sample Mean: “The Central Limit Theorem”
• Checking Normality: The Normal Probability Plot
  – samples from normal distributions
  – positively skewed distributions
  – negatively skewed distributions
  – distributions with outliers

Populations and Samples
• A population is a large collection (theoretically, for the mathematician, infinite) of the individuals or items of interest (e.g., the consuming public, items from a machine production line, etc.).
• To measure characteristics of the population we have to take a sample (a smaller number of individuals or items).
• If we take a random sample, every member of the population is equally likely to be included in the sample.

Random Sampling
• A sample represents the population only if each member of the population is equally likely to be included in the sample.
• Types of random sampling (see also Chapter 16):
  – Simple Random Sampling (SRS): sample from the whole population.
  – Stratified Random Sampling: divide the population into groups and sample from each group; for example, in polls, divide the country into four geographical regions and sample from each.
  – Cluster Sampling: divide the population into groups and take a sample of a few groups from the total; e.g., when looking at hospital performance, sample patients in a few hospitals randomly chosen from all hospitals in the state.

Sample Statistics
• Sample Mean: $\bar{x} = (1/N) \sum_i x_i$, where $\bar{x}$ (“xbar”) is x with a bar over it; the sum is taken over all values of the random variable X measured in the sample of N units. $\bar{x}$ is an estimator of the population mean, $\mu$.
• Sample Standard Deviation: $s = \{[1/(N-1)] \sum_i (x_i - \bar{x})^2\}^{1/2}$. $s^2$ is an “unbiased estimate” of the population variance, $\sigma^2$, and s is the corresponding estimate of the population standard deviation, $\sigma$.
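As a minimal sketch of the two sample-statistics formulas, the following Python computes the sample mean and the sample standard deviation with the N-1 divisor. The function names and the data values are made up for illustration and do not come from the course material.

```python
import math

def sample_mean(xs):
    """xbar = (1/N) * sum of the x_i."""
    return sum(xs) / len(xs)

def sample_std(xs):
    """s = sqrt( [1/(N-1)] * sum of (x_i - xbar)^2 )."""
    n = len(xs)
    xbar = sample_mean(xs)
    squared_deviations = sum((x - xbar) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical sample of N = 8 measurements (illustrative values only).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("sample mean:", sample_mean(data))
print("sample standard deviation:", round(sample_std(data), 3))
```

The standard library functions `statistics.mean` and `statistics.stdev` compute the same two quantities and could be used instead of the hand-written versions.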
Note that for large samples (large N), $N - 1 \approx N$.

Sampling Distribution for Sample Means: The Central Limit Theorem--1
• In general (which means almost always), no matter what distribution the population follows, the distribution of the sample means approaches a normal distribution as the sample size grows, with
  – mean $\mu_{\bar{x}}$ (for the population of sample means) equal to $\mu$, the mean of the parent population, and
  – standard deviation of the sample means $\sigma_{\bar{x}} = \sigma/\sqrt{N}$.
• This means that the larger the sample size, the more accurately we estimate the mean.

Sampling Distribution for Sample Means: The Central Limit Theorem--2
• The histogram on the left is for a sample from a uniform distribution (0 to 100). The sample mean is 50.2 and the sample standard deviation is 29.3, close to the population value $100/\sqrt{12} \approx 28.9$.
[Figure: histogram of the sample; x-axis: value (0 to 100), y-axis: Frequency]

Sampling Distribution for Sample Means: The Central Limit Theorem--2 (cont.)
• The histogram on the left is for the means of 150 samples, each of size 9 (N = 9). The average of these 150 means is 49.4 and the standard deviation of these 150 sample means is 9.8, which is about $100/\sqrt{12 \cdot 9} \approx 9.6$, the population standard deviation of the mean.
[Figure: histogram of the 150 sample means; x-axis: value (0 to 100), y-axis: Frequency]

Normal Probability Plots (“P-plots”)
• The procedure to get this plot, which tests whether data follow a normal distribution, is the following:
  – 1) order the N data values;
  – 2) assign a rank from 1 (the lowest value) to N (the highest value);
  – 3) find the centile score of the mth data point from the relation centile score = m/(N+1); e.g., the 1st data point out of 100 has a fraction of approximately 1/101 of the data below it, and the 100th data point has a fraction 100/101 below it;
  – 4) find the z-value (standard normal variate) corresponding to the centile score (this is the z-score or N-score);
  – 5) plot the observed points versus the z-score.
• If the points fall approximately on a straight line, the distribution is approximately normal.

Normal Probability Plots (“P-plots”): Examples
Exam 2 scores were negatively skewed (range: 49-100, Q1 = 90, median = 92, Q3 = 94).

rank   centile score   Exam 2 (ordered value)   z-score
1      0.02            49                       -2.10
2      0.04            72                       -1.80
3      0.05            78                       -1.61
4      0.07            79                       -1.47
5      0.09            81                       -1.35
6      0.11            85                       -1.24
7      0.13            86                       -1.15
8      0.14            87                       -1.07
9      0.16            89                       -0.99
etc.

[Figure: P-plot of Exam 2 scores; x-axis: z-score (-3.00 to 3.00), y-axis: Exam 2 score (0 to 120)]

Normal Probability Plots (“P-plots”): Examples (cont.)
This P-plot for Exam 2 scores is from the “StatPlus” add-in; note that the axes are interchanged from the previous (conventional) order: Nscore is on the y-axis and the actual score is on the x-axis.

[Figure: Normal Probability Plot from StatPlus; x-axis: Exam 2 score (49 to 99), y-axis: Nscore (-5.00 to 3.00)]

Qualitative Appearance of P-plots

Distribution Type | Conventional Appearance | StatPlus Appearance
Normal Distribution | Approximately straight line | Approximately straight line
Positively Skewed (like Ex. 5.7a, lab) | Bends up at high end | Bends down at high end
Negatively Skewed (like Exam 2 scores) | Bends down at high end | Bends up at high end
Outliers (like rectangular dist.) | Bends up at low end, bends down at high end (S-shaped) | Bends down at low end, bends up at high end
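The five-step procedure for building a normal probability plot, described earlier, can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not the method used by StatPlus: `normal_scores` is a hypothetical helper name, and the nine values are the lowest Exam 2 scores from the example, treated here as if they were a complete sample of N = 9. The z-scores it prints therefore differ from those in the example table, which were computed from the full class.

```python
from statistics import NormalDist

def normal_scores(data):
    """Return (value, z) pairs for a normal probability plot."""
    ordered = sorted(data)                         # step 1: order the data
    n = len(ordered)
    std_normal = NormalDist()                      # standard normal: mean 0, sd 1
    pairs = []
    for m, value in enumerate(ordered, start=1):   # step 2: rank m = 1..N
        centile = m / (n + 1)                      # step 3: centile score
        z = std_normal.inv_cdf(centile)            # step 4: matching z-value
        pairs.append((value, z))
    return pairs                                   # step 5: plot value against z

# Assumption: these nine values stand in for a full data set.
sample = [49, 72, 78, 79, 81, 85, 86, 87, 89]

for value, z in normal_scores(sample):
    print(f"{value:3d}  {z:6.2f}")
```

Plotting the first element of each pair against the second gives the conventional orientation (z-score on the x-axis); swapping the axes gives the StatPlus orientation.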