IS 310 – Business Statistics, CSU Long Beach

Sampling and Sampling Distributions

In many instances, one cannot study an entire population. The main reasons are the cost, time and effort involved in studying the entire population. Often, it is not even necessary to study every element of the population.

Consider a manufacturing assembly line that produces thousands or millions of items of a product. To determine the quality of this product, is it necessary to inspect each item? The answer is obviously no. In such a case, one selects a subset of the population, called a sample, and inspects each item in the sample. Based on the findings from the sample, one draws a conclusion about the entire population. For example, if 3 percent of the items in the sample are found to be defective, one estimates that 3 percent of the items in the population are defective.

Consider another example. Goodyear, a tire manufacturer, wants to know the mean (or average) life of its new brand of tires. One way is to test and wear out every tire manufactured. Obviously, this does not make sense. Instead, Goodyear takes a sample of tires, tests and wears out each of these tires, and then calculates the mean (or average) life of the sampled tires. Suppose the mean life is calculated as 42,000 miles. Based on this sample, the mean life of all tires of the new brand (that is, the population) is estimated to be 42,000 miles.

In the previous two examples, we dealt with a proportion (percent defective) and a mean (average tire life).

How to Select a Sample

There are several methods to select a sample from a population. One of the most common sampling methods is simple random sampling.
Simple random sampling can be carried out in several ways, for example by using a random number table, or by putting all names in a hat and drawing names until the sample size is reached. Refer to Table 7.1 (10-Page 261; 11-Page 269), which is a random number table.

Simple Random Sampling: Finite Population

Finite populations are often defined by lists such as:
• Organization membership roster
• Credit card account numbers
• Inventory product numbers

A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.

Replacing each sampled element before selecting subsequent elements is called sampling with replacement. Sampling without replacement is the procedure used most often.

In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.

Sample and Point Estimation

Now that we know how to select a sample, let’s use the sample to estimate population characteristics (mean and proportion). Using sample data to estimate a population mean or proportion is known as point estimation.

Point Estimation

In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.

x̄ is the point estimator of the population mean µ.
s is the point estimator of the population standard deviation σ.
p̄ is the point estimator of the population proportion p.

Point Estimation Example Problem

Refer to Table 7.2 (10-Page 265; 11-Page 274). Using the sample data of this table, we can calculate the point estimates for the population mean, population standard deviation and population proportion:
x̄ = 51,814     s = 3,348     p̄ = 0.63

Sampling Distributions

If we take several samples and calculate the point estimates, these estimates will be different. Each sample will provide a different value for x̄, s and p̄. Refer to Table 7.4 (10-Page 268; 11-Page 277). Because these values vary from sample to sample, they are random variables. They have means (expected values), standard deviations and probability distributions.

If we consider the case of the mean (x̄), the probability distribution of x̄ is called the sampling distribution of x̄.

Sampling Distribution of x̄

Now that we know that x̄ takes different values from sample to sample, what are the expected value of x̄ and its standard deviation?

E(x̄) = µ          Formula 7.1 (10-Page 270; 11-Page 279)
σ_x̄ = σ/√n       Formula 7.2 (10-Page 271; 11-Page 280)

σ_x̄ is called the standard error of the mean.

Central Limit Theorem

The Central Limit Theorem is a very important concept in statistics. If we select random samples of size n from a population, the sampling distribution of the sample mean (x̄) can be approximated by a normal distribution as the sample size becomes large.

Use of Central Limit Theorem

Problem #26 (10-Page 279; 11-Page 288)

Given: µ = $939, σ = 245, n = 30 (1st case)

σ_x̄ = σ/√n = 245/√30 = 44.73

a. P(914 < x̄ < 964)
Convert 914 and 964 to z-values:
z = (914 – 939)/44.73 = -0.56
z = (964 – 939)/44.73 = 0.56
P(914 < x̄ < 964) = P(-0.56 < z < 0.56) = 0.4246

b. The probability value increases with a larger sample size.
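The Problem #26 calculation can be cross-checked with a short Python sketch. It assumes the sampling distribution of the sample mean is approximately normal (as the Central Limit Theorem suggests) and uses the standard library's statistics.NormalDist. The function name prob_sample_mean_between and the larger sample sizes 50, 100 and 400 are illustrative choices, not taken from the slides. Because the sketch does not round z to two decimals, its results can differ slightly from table-based values.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_between(low, high, mu, sigma, n):
    """Return P(low < sample mean < high) under a normal approximation."""
    standard_error = sigma / sqrt(n)  # Formula 7.2: sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


mu, sigma = 939, 245  # values given in Problem #26

# n = 30 is the first case from the problem; the other sizes are
# assumed here only to illustrate the effect of a larger sample.
for n in (30, 50, 100, 400):
    p = prob_sample_mean_between(914, 964, mu, sigma, n)
    print(f"n = {n:3d}: standard error = {sigma / sqrt(n):6.2f}, P = {p:.4f}")
```

For n = 30 the sketch gives a probability of roughly 0.42, in line with the table-based 0.4246, and the probability rises as n grows because the standard error shrinks.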
Differences Between Chapter 6 and Chapter 7

Chapter 6: P(100 < x < 200)
Use the following formula: z = (x - µ)/σ

Chapter 7: P(100 < x̄ < 200)
Use the following formula: z = (x̄ - µ)/(σ/√n)

Sample Problem: The regular savings accounts of a large bank have a mean balance of $750 (µ = 750) and a standard deviation of $120 (σ = 120). A sample of 36 accounts is selected. Find the following:
a. Probability of any single account balance being between $720 and $780.
b. Probability of the mean of a sample of 36 accounts being between $720 and $780.

In the first part of the problem, we deal with Chapter 6:
P(720 < x < 780) = ?
z = (720-750)/120 = -0.25
z = (780-750)/120 = 0.25
P(-0.25 < z < 0.25) = 0.1974

In the second part of the problem, we deal with Chapter 7:
P(720 < x̄ < 780) = ?
z = (720-750)/(120/√36) = -1.5
z = (780-750)/(120/√36) = 1.5
P(-1.5 < z < 1.5) = 0.8664

Relationship Between Sample Size and Sampling Distribution

Formula 7.2 (Page 280) shows that the standard error of the mean decreases as the sample size increases. The lower the standard error of the mean, the better the estimate of the population mean. Using the example of EAI managers, let’s use a sample size of 100 rather than 30. The standard error of the mean is reduced from 730.3 to 400.

Sample Problem

Problem #15 (10-Page 266-267; 11-Page 276)

a. Point estimate of the mean cost per treatment with Herceptin:
x̄ = (4376+4798+5578+6446+2717+4119+4920+4237+4495+3814) / 10 = 4550

b.
Point estimate of the standard deviation of the cost per treatment with Herceptin:

Cost per     Sample    Deviation    Squared deviation
treatment    mean      from mean    from mean
4376         4550        -174              30,276
4798         4550         248              61,504
5578         4550        1028           1,056,784
6446         4550        1896           3,594,816
2717         4550       -1833           3,359,889
4119         4550        -431             185,761
4920         4550         370             136,900
4237         4550        -313              97,969
4495         4550         -55               3,025
3814         4550        -736             541,696
Total                                   9,068,620

s² = 9,068,620 / (10 - 1) = 1,007,624.44
s = 1003.805

More Sample Problems

Problem #16 (10-Page 267; 11-Page 276)

Given: n = 50

a. Estimate of the proportion of Fortune 500 companies based in NY = 5/50 = 0.1 or 10 percent
c. Estimate of the proportion of Fortune 500 companies not based in NY, CA, MN or WI = 36/50 = 0.72 or 72 percent

Other Sampling Methods

Stratified Random Sampling
The population is divided into groups, called strata, and a sample is selected from each stratum. This is useful in applications where the population is diverse, such as household incomes.

Cluster Sampling
If the population is spread over a large geographical area, cluster sampling is ideal. Think about universities in the US. If we want to select a sample from all universities, cluster sampling can be employed.

Systematic Sampling
If we select every nth element from a population, we are using systematic sampling. It is useful on an assembly line, where every 10th or 15th element can be chosen to make up a sample.

Convenience Sampling
When we select a sample mainly for reasons of convenience, we are using convenience sampling. Think about a professor who chooses students to form the sample for a study.

Judgment Sampling
When an expert selects a sample using his or her judgment, this is known as judgment sampling.

End of Chapter 7, Part A
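As a cross-check on the Problem #15 arithmetic in this part, the following Python sketch recomputes the point estimates from the ten listed costs. It uses only the standard library, and the variable names are illustrative rather than taken from the slides. Note that statistics.variance and statistics.stdev divide by n - 1, which matches the formula used for s in the problem.

```python
from statistics import mean, stdev, variance

# Cost per treatment with Herceptin, as listed in Problem #15
costs = [4376, 4798, 5578, 6446, 2717, 4119, 4920, 4237, 4495, 3814]

x_bar = mean(costs)                                # point estimate of the population mean
sum_sq_dev = sum((c - x_bar) ** 2 for c in costs)  # sum of squared deviations from the mean
s_squared = variance(costs)                        # sample variance (divides by n - 1)
s = stdev(costs)                                   # point estimate of the population standard deviation

print(f"x-bar = {x_bar}")
print(f"sum of squared deviations = {sum_sq_dev}")
print(f"s^2 = {s_squared:.2f}")
print(f"s = {s:.3f}")
```

The same pattern applies to any small sample: replace the costs list with the observed values and read off the sample mean and s.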