5-1 Sampling Theory
Chapter 5, Theory & Problems of Probability & Statistics, Murray R. Spiegel

5-2 Outline: Chapter 5
Population: X, with mean and variance $\mu$, $\sigma^2$
Sample: mean and variance $\bar{X}$, $\hat{s}^2$
Sample statistics:
* $\bar{X}$, with mean and variance $\mu_{\bar{X}}$, $\sigma^2_{\bar{X}}$
* $\hat{s}^2$, with mean and variance $\mu_{\hat{s}^2}$, $\sigma^2_{\hat{s}^2}$

5-3 Outline: Chapter 5
Distributions of the population, of samples and of sample statistics:
* mean
* proportions
* differences and sums
* variances
* ratios of variances

5-4 Outline: Chapter 5
Other ways to organize samples:
* frequency distributions
* relative frequency distributions
* computation of statistics (mean, variance, standard deviation) for grouped data

5-5 Population Parameters
A population is described by a random variable X with probability distribution f(x):
* for a discrete variable, f(x) is a probability function;
* for a continuous variable, f(x) is a density function.
f(x) depends on several parameters, e.g. the mean $\mu$ and the variance $\sigma^2$. We want to know these parameters for each f(x).

5-6 Example of a Population
A department has 5 project engineers whose total experience X is 2, 3, 6, 8 and 11 years. The company is preparing a statistical report on employee expertise based on experience, and the survey must include the average experience, the variance and the standard deviation.

5-7 Mean of Population
Average experience: $\mu = \frac{2 + 3 + 6 + 8 + 11}{5} = \frac{30}{5} = 6$ years

5-8 Variance of Population
$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N} = \frac{(2-6)^2 + (3-6)^2 + (6-6)^2 + (8-6)^2 + (11-6)^2}{5} = \frac{16 + 9 + 0 + 4 + 25}{5} = \frac{54}{5} = 10.8$
where N = 5 is the population size.

5-9 Standard Deviation of Population
$\sigma = \sqrt{\sigma^2} = \sqrt{10.8} \approx 3.29$ years

5-10 Sample Statistics
What if we don't have the whole population? We take random samples from the population, use them to estimate the population parameters and make inferences. Let's see how. Suppose the company needs engineers for a feasibility study and a performance study: how much experience will the assigned team have?

5-11 Sampling Example
The manager assigns engineers at random: each time, she chooses the first engineer she sees, so the same engineer could do both studies. Say she picks (2, 2); the sample mean is $\bar{X} = (2 + 2)/2 = 2$. From samples like this we want to make inferences about the true $\mu$.

5-12 Samples of 2 with Replacement
She goes to the project department twice and picks an engineer at random each time. There are 5 × 5 = 25 possible teams, i.e. 25 samples of size two. Order matters: (6, 11) is different from (11, 6).

5-13 Population of Samples
All possible combinations are:
(2,2)    (2,3)    (2,6)    (2,8)    (2,11)
(3,2)    (3,3)    (3,6)    (3,8)    (3,11)
(6,2)    (6,3)    (6,6)    (6,8)    (6,11)
(8,2)    (8,3)    (8,6)    (8,8)    (8,11)
(11,2)   (11,3)   (11,6)   (11,8)   (11,11)

5-14 Population of Averages
The average experience of each team, i.e. the sample means $\bar{X}_i$, are:
2     2.5   4     5     6.5
2.5   3     4.5   5.5   7
4     4.5   6     7     8.5
5     5.5   7     8     9.5
6.5   7     8.5   9.5   11

5-15 Mean of Population of Means
The mean of the sampling distribution of means is
$\mu_{\bar{X}} = \frac{2 + 2.5 + 4 + 5 + \dots + 11}{25} = \frac{150}{25} = 6$
This confirms the theorem $E(\bar{X}) = \mu_{\bar{X}} = \mu = 6$.

5-16 Variance of Sample Means
The variance of the sampling distribution of means is built from the squared deviations $(\bar{X}_i - \mu_{\bar{X}})^2$:
(2-6)²     (2.5-6)²   (4-6)²     (5-6)²     (6.5-6)²
(2.5-6)²   (3-6)²     (4.5-6)²   (5.5-6)²   (7-6)²
(4-6)²     (4.5-6)²   (6-6)²     (7-6)²     (8.5-6)²
(5-6)²     (5.5-6)²   (7-6)²     (8-6)²     (9.5-6)²
(6.5-6)²   (7-6)²     (8.5-6)²   (9.5-6)²   (11-6)²

5-17 Variance of Sample Means
Calculated values:
16      12.25   4      1       0.25
12.25   9       2.25   0.25    1
4       2.25    0      1       6.25
1       0.25    1      4       12.25
0.25    1       6.25   12.25   25

5-18 Variance of Sample Means
The variance is
$\sigma^2_{\bar{X}} = \frac{\sum_i (\bar{X}_i - \mu_{\bar{X}})^2}{25} = \frac{135}{25} = 5.4$
so the standard deviation is $\sigma_{\bar{X}} = \sqrt{5.4} \approx 2.32$.

5-19 Variance of Sample Means
These results agree with the theorem $\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$, where n is the sample size. Here $\sigma^2_{\bar{X}} = \frac{10.8}{2} = 5.40$.

5-20 Math Proof: Mean of $\bar{X}$
$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$
$E(\bar{X}) = \frac{E(X_1) + E(X_2) + E(X_3) + \dots + E(X_n)}{n} = \frac{\mu + \mu + \mu + \dots + \mu}{n} = \frac{n\mu}{n} = \mu$

5-21 Math Proof: Variance of $\bar{X}$
For independent $X_1, \dots, X_n$, each with variance $\sigma^2$ (as when sampling with replacement):
$\mathrm{Var}(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2 + \sigma^2 + \sigma^2 + \dots + \sigma^2}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$

5-22 Sampling Means without Replacement
The manager picks two engineers at the same time, so order doesn't matter: (6, 11) is the same as (11, 6). The number of teams is "5 choose 2":
$\binom{5}{2} = \frac{5!}{2!\,(5-2)!} = 10$
i.e. 10 possible teams, or 10 samples of size two.

5-23 Sampling Means without Replacement
All possible combinations are:
(2,3) (2,6) (2,8) (2,11) (3,6) (3,8) (3,11) (6,8) (6,11) (8,11)
The corresponding sample means are:
2.5   4   5   6.5   4.5   5.5   7   7   8.5   9.5
The mean of the sample means is
$\mu_{\bar{X}} = \frac{2.5 + 4 + 5 + \dots + 9.5}{10} = \frac{60}{10} = 6$

5-24 Sampling Variance without Replacement
The variance of the sampling distribution of means is
$\sigma^2_{\bar{X}} = \frac{\sum_i (\bar{X}_i - \mu_{\bar{X}})^2}{10} = \frac{(2.5-6)^2 + (4-6)^2 + \dots + (9.5-6)^2}{10} = \frac{40.5}{10} = 4.05$
and the standard deviation is $\sigma_{\bar{X}} = \sqrt{4.05} \approx 2.01$.

5-25 Theorems on Sampling Distributions without Replacement
1. $\mu_{\bar{X}} = \mu = 6$
2. $\sigma^2_{\bar{X}} = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} = \frac{10.8}{2} \cdot \frac{5 - 2}{5 - 1} = \frac{10.8}{2} \cdot \frac{3}{4} = 4.05$

5-26 Sum Up: Theorems on Sampling Distributions
Theorem I: the expected value of the sample mean equals the population mean: $E(\bar{X}) = \mu_{\bar{X}} = \mu$, the mean of the population.
Theorem II: for an infinite population, or for sampling with replacement, the variance of the sample mean is $E[(\bar{X} - \mu)^2] = \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$, where $\sigma^2$ is the variance of the population.

5-27 Theorems on Sampling Distributions
Theorem III: if the population has size N, sampling is without replacement and the sample size is n, then the variance of the sample mean is
$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$

5-28 Theorems on Sampling Distributions
Theorem IV: if the population is normally distributed with mean $\mu$ and variance $\sigma^2$, then the sample mean is normally distributed with mean $\mu$ and variance $\sigma^2/n$:
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$

5-29 Theorems on Sampling Distributions
Theorem V: if samples are taken from a distribution with mean $\mu$ and variance $\sigma^2$ (not necessarily normal), the standardized variable
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
is asymptotically normal (central limit theorem).

5-30 Sampling Distribution of Proportions
Population properties:
* infinite
* binomially distributed (probability p of "success", q = 1 - p of "failure")
Consider all possible samples of size n; the statistic for each sample is the proportion P of successes.
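The engineer example of slides 5-6 to 5-25 is small enough to check by brute force. The sketch below is our own illustration, not part of the original slides, assuming Python 3 with only the standard-library `itertools` and `statistics` modules; the helper `summarize` is a hypothetical name introduced here. It enumerates every sample of size 2 with replacement (25 ordered pairs) and without replacement (10 unordered pairs), then compares the variance of the sample means with $\sigma^2/n$ (Theorem II) and with $\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$ (Theorem III).

```python
from itertools import combinations, product
from statistics import mean, pvariance

# Population from slide 5-6: years of experience of the 5 engineers
population = [2, 3, 6, 8, 11]
N, n = len(population), 2  # population size and sample size

mu = mean(population)           # population mean (slide 5-7)
sigma2 = pvariance(population)  # population variance, divides by N (slide 5-8)

# With replacement: ordered pairs, 5 * 5 = 25 samples (slides 5-12 to 5-19)
with_repl = [mean(s) for s in product(population, repeat=n)]
# Without replacement: unordered pairs, C(5, 2) = 10 samples (slides 5-22 to 5-25)
without_repl = [mean(s) for s in combinations(population, n)]


def summarize(label, sample_means, theory_var):
    # Hypothetical helper: mean and variance of the sampling distribution of
    # means, taken over the whole population of samples (divides by the count)
    print(f"{label}: {len(sample_means)} samples, "
          f"mean = {mean(sample_means):.2f}, "
          f"variance = {pvariance(sample_means):.2f} "
          f"(theory {theory_var:.2f})")


print(f"population: mu = {mu}, sigma^2 = {sigma2:.2f}")
summarize("with replacement", with_repl, sigma2 / n)
summarize("without replacement", without_repl,
          sigma2 / n * (N - n) / (N - 1))
```

Both sampling schemes should give a mean of 6, with variances 5.40 and 4.05, as on slides 5-18 and 5-24. `pvariance` (dividing by the number of items) is the right choice here because the enumerated samples are the entire population of samples, not a sample from it.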
5-31 Sampling Distribution of Proportions
The sampling distribution of proportions has
mean: $\mu_P = p$
standard deviation: $\sigma_P = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}$

5-32 Sampling Distribution of Proportions
For large values of n (n > 30), the sampling distribution of P approximates a normal distribution. The same results hold for a finite population sampled with replacement. The standardized P is
$Z = \frac{P - p}{\sqrt{pq/n}}$

5-33 Example: Proportions
An oil service company explores for oil. According to its geological department, each well has a 37% chance of finding oil. The company drills 150 wells. What is $P(0.4 < P < 0.6)$, the probability that between 40% and 60% of the wells find oil?

5-34 Example: Proportions
Standardize with $Z = \frac{P - p}{\sqrt{pq/n}}$, where $\sqrt{pq/n} = \sqrt{0.37 \cdot 0.63 / 150} \approx 0.0394$:
$P(0.4 < P < 0.6) = P\left(\frac{0.4 - 0.37}{\sqrt{0.37 \cdot 0.63/150}} < \frac{P - 0.37}{\sqrt{pq/n}} < \frac{0.6 - 0.37}{\sqrt{0.37 \cdot 0.63/150}}\right)$

5-35 Example: Proportions
$P(0.4 < P < 0.6) = P(0.76 < Z < 5.83)$ = NORMSDIST(5.83) - NORMSDIST(0.76) ≈ 1 - 0.777 ≈ 0.22
Exercise: work out the mean, variance and distribution of nP, the number of successes.

5-36 Sampling Distribution of Sums and Differences
Suppose we have two populations:
Population:          $X_A$     $X_B$
Sample size:         $n_A$     $n_B$
Statistic computed:  $S_A$     $S_B$
The samples are independent. The sampling distributions of $S_A$ and $S_B$ have means $\mu_{S_A}$, $\mu_{S_B}$ and variances $\sigma^2_{S_A}$, $\sigma^2_{S_B}$.

5-37 Sampling Distribution of Sums and Differences
Combining two samples from the two populations gives the sampling distribution of sums and differences, $S = S_A \pm S_B$. For this new sampling distribution:
mean: $\mu_S = \mu_{S_A} \pm \mu_{S_B}$
variance: $\sigma^2_S = \sigma^2_{S_A} + \sigma^2_{S_B}$ (the variances add for both sums and differences)

5-38 Sampling Distribution of Sums and Differences
For two populations $X_A$ and $X_B$, take the sample means $S_A = \bar{X}_A$ and $S_B = \bar{X}_B$:
mean: $\mu_{\bar{X}_A \pm \bar{X}_B} = \mu_{\bar{X}_A} \pm \mu_{\bar{X}_B} = \mu_A \pm \mu_B$
variance: $\sigma^2_{\bar{X}_A \pm \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}$
These hold when sampling from an infinite population or sampling with replacement.

5-39 Example: Sampling Distribution of Sums
You are leasing oil fields from two companies for two years. Each lease expires at the end of the year, and you are randomly assigned a new lease for the next year.
Company A: two oil fields, production $X_A$: 300 and 700 million barrels
Company B: two oil fields, production $X_B$: 500 and 1100 million barrels

5-40 Population Means
Average oil field size of company A: $\mu_A = \frac{300 + 700}{2} = 500$
Average oil field size of company B: $\mu_B = \frac{500 + 1100}{2} = 800$
$\mu_A + \mu_B = 500 + 800 = 1300$

5-41 Population Variances
Company A: two oil fields, production $X_A$: 300 and 700 million barrels
Company B: two oil fields, production $X_B$: 500 and 1100 million barrels
$\sigma_A^2 = \frac{(300 - 500)^2 + (700 - 500)^2}{2} = 40{,}000$
$\sigma_B^2 = \frac{(500 - 800)^2 + (1100 - 800)^2}{2} = 90{,}000$

5-42 Example: Sampling Distribution of Sums
We are interested in the total production $X_A + X_B$. Compute all possible lease assignments: two choices for $X_A$ and two for $X_B$ give four pairs $(X_{Ai}, X_{Bi})$:
{300, 500}   {300, 1100}   {700, 500}   {700, 1100}

5-43 Example: Sampling Distribution of Sums
Each of the 4 pairs can be assigned in year 1 and again in year 2: 4 choices in year 1 × 4 choices in year 2 = 16 samples.

5-44 Example: Sampling Distribution of Sums
Sample   Year 1 (X_A, X_B)   Year 2 (X_A, X_B)
1        (300, 500)          (300, 500)
2        (300, 500)          (300, 1100)
3        (300, 500)          (700, 500)
4        (300, 500)          (700, 1100)
5        (300, 1100)         (300, 500)
6        (300, 1100)         (300, 1100)
7        (300, 1100)         (700, 500)
8        (300, 1100)         (700, 1100)

5-45 Example: Sampling Distribution of Sums
Sample   Year 1 (X_A, X_B)   Year 2 (X_A, X_B)
9        (700, 500)          (300, 500)
10       (700, 500)          (300, 1100)
11       (700, 500)          (700, 500)
12       (700, 500)          (700, 1100)
13       (700, 1100)         (300, 500)
14       (700, 1100)         (300, 1100)
15       (700, 1100)         (700, 500)
16       (700, 1100)         (700, 1100)

5-46 Compute Sum and Mean of Each Sample
Sample   Year-1 total X_A+X_B   Year-2 total X_A+X_B   Mean
1        800                    800                    800
2        800                    1400                   1100
3        800                    1200                   1000
4        800                    1800                   1300
5        1400                   800                    1100
6        1400                   1400                   1400
7        1400                   1200                   1300
8        1400                   1800                   1600

5-47 Compute Sum and Mean of Each Sample
Sample   Year-1 total X_A+X_B   Year-2 total X_A+X_B   Mean
9        1200                   800                    1000
10       1200                   1400                   1300
11       1200                   1200                   1200
12       1200                   1800                   1500
13       1800                   800                    1300
14       1800                   1400                   1600
15       1800                   1200                   1500
16       1800                   1800                   1800

5-48 Mean of Sum of Sample Means
Population of sample means: {800, 1100, 1000, 1300, 1100, 1400, 1300, 1600, 1000, 1300, 1200, 1500, 1300, 1600, 1500, 1800}
$\mu_{\bar{X}_A + \bar{X}_B} = \frac{800 + 1100 + 1000 + 1300 + 1100 + 1400 + 1300 + 1600 + 1000 + 1300 + 1200 + 1500 + 1300 + 1600 + 1500 + 1800}{16} = \frac{20{,}800}{16} = 1300$

5-49 Mean of Sum of Sample Means
This illustrates the theorem on means:
$\mu_{\bar{X}_A + \bar{X}_B} = 1300 = \mu_A + \mu_B = 500 + 800 = 1300$
What about the variance of $\bar{X}_A + \bar{X}_B$?

5-50 Variance of Sum of Means
Population of sample means: {800, 1100, 1000, 1300, 1100, 1400, 1300, 1600, 1000, 1300, 1200, 1500, 1300, 1600, 1500, 1800}
$\sigma^2_{\bar{X}_A + \bar{X}_B} = \frac{(800-1300)^2 + (1100-1300)^2 + (1000-1300)^2 + (1300-1300)^2 + (1100-1300)^2 + (1400-1300)^2 + (1300-1300)^2 + (1600-1300)^2 + (1000-1300)^2 + (1300-1300)^2 + (1200-1300)^2 + (1500-1300)^2 + (1300-1300)^2 + (1600-1300)^2 + (1500-1300)^2 + (1800-1300)^2}{16} = \frac{1{,}040{,}000}{16} = 65{,}000$

5-51 Variance of Sum of Means
This illustrates the theorem on variances:
$\sigma^2_{\bar{X}_A + \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{40{,}000}{2} + \frac{90{,}000}{2} = 65{,}000$

5-52 Normalize to Make Inferences on Means
$Z = \frac{(\bar{X}_A \pm \bar{X}_B) - (\mu_A \pm \mu_B)}{\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}}$

5-53 Estimators for Variance
Two choices:
$S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n}$, used for populations;
$\hat{S}^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}$, which is unbiased, $E(\hat{S}^2) = \sigma^2$, and better for small samples.

5-54 Sampling Distribution of Variances
Take all possible random samples of size n. Each sample has a variance, and all these variances together give the sampling distribution of variances. We usually work with the related random variable
$\frac{nS^2}{\sigma^2} = \frac{(n-1)\hat{S}^2}{\sigma^2} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{\sigma^2}$

5-55 Example: Population of Samples
All possible teams are the 25 ordered pairs of slide 5-13, from (2,2) to (11,11).

5-56 Compute Variance for Each Sample
The variance $S^2$ (dividing by n = 2) of each of the 25 possible teams the manager can choose is:
0       0.25    4      9      20.25
0.25    0       2.25   6.25   16
4       2.25    0      1      6.25
9       6.25    1      0      2.25
20.25   16      6.25   2.25   0
For example, for the team (2, 11): $S^2 = \frac{(2 - 6.5)^2 + (11 - 6.5)^2}{2} = 20.25$. The unbiased estimates are $\hat{s}^2 = \frac{n}{n-1} S^2 = 2S^2$.

5-57 Sampling Distribution of Variance
The 25 values form the population of sample variances, which has its own mean, variance and distribution. For a normal population:
$\frac{(n-1)\hat{S}^2}{\sigma^2} \sim \chi^2_{n-1}$

5-58 What if the Population Variance Is Unknown?
If X is normal $(\mu, \sigma^2)$, to make inferences on means we normalize:
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

5-59 Unknown Population Variance
Since $\frac{(n-1)\hat{S}^2}{\sigma^2} \sim \chi^2_{n-1}$, dividing Z by $\sqrt{\chi^2_{n-1}/(n-1)}$ cancels the unknown $\sigma$:
$\frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\frac{(n-1)\hat{S}^2}{\sigma^2 (n-1)}}} = \frac{\bar{X} - \mu}{\hat{S}/\sqrt{n}} \sim t_{n-1}$

5-60 Unknown Population Variance
$P\left(t_{n-1, c_1} < \frac{\bar{X} - \mu}{\hat{S}/\sqrt{n}} < t_{n-1, c_2}\right)$
Use this in the same way as the normal distribution, except with different tables. For α = 0.05 and n = 25, TINV(0.05, 24) = 2.0639, so
$P\left(-2.0639 < \frac{\bar{X} - \mu}{\hat{S}/\sqrt{n}} < 2.0639\right) = 1 - 0.05 = 0.95$

5-61 Uses of t-Statistics
We use the t-statistic for testing means, sums and differences of means with small samples when the variable is normal, substituting the sample standard deviation for the true $\sigma$:
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \quad\rightarrow\quad t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$

5-62 Uses of t-Statistics
Sums and differences of means, known variances:
$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$
Unknown variances, assumed equal ($\sigma_1 = \sigma_2$):
$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{(n_1 - 1)\hat{s}_1^2 + (n_2 - 1)\hat{s}_2^2}{n_1 + n_2 - 2} \cdot \frac{n_1 + n_2}{n_1 n_2}}} \sim t_{n_1 + n_2 - 2}$

5-63 Uses of the $\chi^2$ Statistic
Inference on variance (large-sample test):
$\frac{(n-1)\hat{S}^2}{\sigma^2} \sim \chi^2_{n-1}$

5-64 F Statistic
Inferences on the ratio of variances $\sigma_2^2/\sigma_1^2$, with $\chi^2_{df_1} = \frac{(n_1 - 1)\hat{s}_1^2}{\sigma_1^2}$, $df_1 = n_1 - 1$ and $\chi^2_{df_2} = \frac{(n_2 - 1)\hat{s}_2^2}{\sigma_2^2}$, $df_2 = n_2 - 1$:
$F_{df_1, df_2} = \frac{\chi^2_{df_1}/df_1}{\chi^2_{df_2}/df_2} = \frac{\hat{s}_1^2/\sigma_1^2}{\hat{s}_2^2/\sigma_2^2}$

5-65 Other Tests
Tests on groups of coefficients.

5-66 Other Statistics: Medians
$\mu_{\text{med}} = \mu, \qquad \sigma_{\text{med}} = \sigma\sqrt{\frac{\pi}{2n}} \approx \frac{1.2533\,\sigma}{\sqrt{n}}$
For n > 30, the sampling distribution of medians is nearly normal if X is normal.

5-67 Frequency Distributions
If the sample or population is large, it is difficult to compute statistics (e.g. mean, variance). Organizing the raw data helps: arrange it into classes or categories and determine the number of items in each class. The result is the class frequency table, or frequency distribution.

5-68 Frequency Distributions: Example
A mid-size oil company has a portfolio of 100 small oil reservoirs, with reserves varying from 89 to 300 million barrels.

5-69 Frequency Distributions: Example
Arrange the data into categories: a table of reservoir size ranges and the number of reservoirs in each range.
Reserves (million barrels)   Number of fields
50-100                       4
101-150                      21
151-200                      42
201-250                      27
251-300                      6
Total                        100

5-70 Frequency Distributions: Example
The class intervals are ranges of 50 million barrels. Each class interval is represented by its midpoint, e.g. the class from 200 up to 250 is represented by 225. The data can be plotted as a histogram or a frequency polygon; this plot represents the frequency distribution.

5-71 Frequency Distributions Plotted: Example
[Figure: histogram of the number of fields against reserves (mmb) for the table on slide 5-69]

5-72 Relative Frequency Distributions and Ogives
* number of individuals in each class: frequency distribution
* percentage of individuals in each class: relative frequency distribution, an empirical probability distribution
* cumulative percentages: ogive, an empirical cumulative probability distribution

5-73 Percent Ogives
[Figure: cumulative frequency distribution (ogive) for the oil company's portfolio of reservoirs, percent of reservoirs against reserves (mmb)]
The ogive shows the percentage of reservoirs with reserves less than x.

5-74 Computation of Statistics for Grouped Data
The mean and variance can be calculated from grouped data:
Outcome     X_1   X_2   X_3   …   X_k   Total
Frequency   f_1   f_2   f_3   …   f_k   n

5-75 Computation of Statistics for Grouped Data
We take 420 samples of an ore body and measure the % concentration of zinc (Zn). The frequency distribution of the lab results is:

5-76 Computation of Statistics for Grouped Data
% weight   Frequency      % weight   Frequency
1.00       2              1.55       28
1.05       5              1.60       14
1.10       11             1.65       22
1.15       21             1.70       18
1.20       33             1.75       15
1.25       41             1.80       4
1.30       53             1.85       2
1.35       42             1.90       2
1.40       38             1.95       3
1.45       31             2.00       1
1.50       34             Total      420

5-77 Computation of Statistics for Grouped Data
The mean is
$\bar{x} = \frac{\sum_i f_i x_i}{n} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_k x_k}{n}, \qquad n = \sum_i f_i = f_1 + f_2 + \dots + f_k$
In our example:
$\bar{x} = \frac{1.00 \cdot 2 + 1.05 \cdot 5 + \dots + 1.45 \cdot 31 + \dots + 2.00 \cdot 1}{420} = \frac{590.6}{420} \approx 1.406$

5-78 Computation of Statistics for Grouped Data
The variance is
$S^2 = \frac{\sum_i f_i (x_i - \bar{x})^2}{n} = \frac{f_1 (x_1 - \bar{x})^2 + f_2 (x_2 - \bar{x})^2 + \dots + f_k (x_k - \bar{x})^2}{n}$

5-79 Computation of Statistics for Grouped Data
In our example:
$S^2 = \frac{2(1.00 - \bar{x})^2 + 5(1.05 - \bar{x})^2 + \dots + 1(2.00 - \bar{x})^2}{420} \approx 0.0365$
so the standard deviation is $S \approx 0.191$.

5-80 Computation of Statistics for Grouped Data
Similar formulas are available for higher moments. About the mean:
$m_r = \frac{\sum_i f_i (x_i - \bar{x})^r}{n} = \frac{f_1 (x_1 - \bar{x})^r + f_2 (x_2 - \bar{x})^r + \dots + f_k (x_k - \bar{x})^r}{n}$
About the origin:
$m'_r = \frac{\sum_i f_i x_i^r}{n} = \frac{f_1 x_1^r + f_2 x_2^r + \dots + f_k x_k^r}{n}$

5-81 Sum Up: Chapter 5
Population: X, with mean and variance $\mu$, $\sigma^2$, and a distribution.
Sample: statistics computed from the sample, usually the mean and variance $\bar{X}$, $\hat{s}^2$.

5-82 Sum Up: Chapter 5
Sample statistics:
* $\bar{X}$: mean and variance $\mu_{\bar{X}}$, $\sigma^2_{\bar{X}}$, and distribution
* $\hat{s}^2$: mean and variance $\mu_{\hat{s}^2}$, $\sigma^2_{\hat{s}^2}$, and distribution

5-83 Sum Up: Chapter 5
Sample statistics, mean: $\bar{X} \sim (\mu, \sigma^2/n)$
Distribution: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, or $t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$ when $\sigma$ is replaced by $\hat{s}$

5-84 Sum Up: Chapter 5
Sample statistics, proportions: $P \sim (p, p(1-p)/n)$, n > 30
Distribution: $Z = \frac{P - p}{\sqrt{pq/n}}$

5-85 Sum Up: Chapter 5
Sample statistics, differences and sums: $\bar{X}_1 \pm \bar{X}_2 \sim (\mu_1 \pm \mu_2, \sigma_1^2/n_1 + \sigma_2^2/n_2)$
Distribution:
$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$
$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{(n_1 - 1)\hat{s}_1^2 + (n_2 - 1)\hat{s}_2^2}{n_1 + n_2 - 2} \cdot \frac{n_1 + n_2}{n_1 n_2}}} \sim t_{n_1 + n_2 - 2}$ (assuming $\sigma_1 = \sigma_2$)

5-86 Sum Up: Chapter 5
Sample statistics, variances:
$\frac{(n-1)\hat{S}^2}{\sigma^2} \sim \chi^2_{n-1}$, with mean n - 1 and variance 2(n - 1)

5-87 Sum Up: Chapter 5
Sample statistics, ratios of variances:
$F_{df_1, df_2} = \frac{\hat{s}_1^2/\sigma_1^2}{\hat{s}_2^2/\sigma_2^2}$

5-88 Sum Up: Chapter 5
Other ways to organize samples:
* frequency distributions
* relative frequency distributions
* computation of statistics (mean, variance, standard deviation) for grouped data

5-89 That's all for Chapter 5. Thank you!
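Three of the chapter's worked examples can be recomputed with a short script: the oil-well proportion (slides 5-33 to 5-35), the lease sums (slides 5-39 to 5-51) and the zinc grouped data (slides 5-75 to 5-79). This is a sketch of our own, assuming Python 3 and only the standard library. `normal_cdf` is a hypothetical helper that plays the role of Excel's NORMSDIST via `math.erf`, and the proportion uses the normal approximation of slide 5-32.

```python
import math
from itertools import product
from statistics import mean, pvariance


def normal_cdf(z):
    # Hypothetical helper: standard normal CDF from the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Proportions (slides 5-33 to 5-35): p = 0.37, n = 150 wells
p, n_wells = 0.37, 150
sigma_p = math.sqrt(p * (1 - p) / n_wells)        # sqrt(pq/n)
z_lo, z_hi = (0.4 - p) / sigma_p, (0.6 - p) / sigma_p
prob = normal_cdf(z_hi) - normal_cdf(z_lo)        # normal approximation

# Sums (slides 5-39 to 5-51): field sizes of companies A and B
field_a, field_b = [300, 700], [500, 1100]
pairs = list(product(field_a, field_b))           # 4 possible (X_A, X_B) leases
# 16 two-year samples; each sample mean is the average yearly total X_A + X_B
sample_means = [mean([a1 + b1, a2 + b2])
                for (a1, b1), (a2, b2) in product(pairs, repeat=2)]
lease_mean = mean(sample_means)
lease_var = pvariance(sample_means)               # over all 16 samples
lease_theory = pvariance(field_a) / 2 + pvariance(field_b) / 2  # sA^2/nA + sB^2/nB

# Grouped data (slides 5-75 to 5-79): zinc % weight classes and frequencies
values = [round(1.00 + 0.05 * k, 2) for k in range(21)]  # 1.00, 1.05, ..., 2.00
freqs = [2, 5, 11, 21, 33, 41, 53, 42, 38, 31, 34,
         28, 14, 22, 18, 15, 4, 2, 2, 3, 1]
n_obs = sum(freqs)                                       # 420 samples
x_bar = sum(f * x for f, x in zip(freqs, values)) / n_obs
s2 = sum(f * (x - x_bar) ** 2 for f, x in zip(freqs, values)) / n_obs

print(f"P(0.4 < P < 0.6) = {prob:.3f} (z from {z_lo:.2f} to {z_hi:.2f})")
print(f"lease sums: mean = {lease_mean:.0f}, variance = {lease_var:.0f} "
      f"(theory {lease_theory:.0f})")
print(f"zinc: n = {n_obs}, mean = {x_bar:.3f}, "
      f"S^2 = {s2:.4f}, S = {math.sqrt(s2):.3f}")
```

With these inputs it should report a probability of about 0.22, a lease-sum mean of 1300 with variance 65,000 (equal to $\sigma_A^2/2 + \sigma_B^2/2$), and a zinc mean of about 1.406 with $S^2 \approx 0.0365$.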