Statistics 103: Probability and Statistical Inference
Instructions for Lab 3

Lab Objective
To explore data with histograms, boxplots, and summary statistics.

Lab Procedures
Open the file agpop from the course directory. This file is taken from the 1992 U.S. Census of Agriculture and contains data on agricultural characteristics of all 3,078 counties in the United States. Variables include:
   acres92 (number of acres devoted to farming in 1992)
   farms92 (number of farms in 1992)
   largef92 (number of farms with more than 1,000 acres)
   smallf92 (number of farms with fewer than 9 acres)
and similar variables for the 1987 and 1982 censuses. Also included are county and state names, and a variable indicating the county's region of the country (West, Northeast, North Central, South). For more information on the Census of Agriculture, including data from the 1997 census, visit the web site of the National Agricultural Statistics Service.

Normal Approximation
1. Construct a histogram of the number of farms in 1992 (farms92) across all counties. Describe the distribution by answering the following questions.
   a. What are the mean and median?
   b. Is the distribution right-skewed or left-skewed?
   c. What is the interquartile range?
2. We often want data that follow a Normal distribution. One way to handle skewed data is to transform them.
   a. Create a new variable that is the square root of farms92. Call it SqFarms92.
   b. Construct a histogram of the new variable and answer the following questions.
      i. What are the mean, median, and standard deviation of the transformed variable?
      ii. Is the transformed variable more symmetric? Which summary statistics support this claim?
   c. Transform the mean and median of SqFarms92 back to the original scale by squaring them. Are they similar to the mean and median calculated from the raw farms92 data? Why or why not?
3. Assume that SqFarms92 approximately follows a Normal distribution with the mean and standard deviation estimated in question 2(b)(i). Using this Normal approximation, answer the following questions. You will need a Z-table, either from the textbook or online.
   a. Approximately what proportion of counties had more than 500 farms? (Note that 500 farms is on the original scale, so convert it to the square-root scale first.)
   b. Approximately what proportion of counties had between 500 and 800 farms?
   c. Find an interval for the number of farms that includes the middle 50% of the distribution.
4. In question 3, we used the Normal distribution to approximate the distribution of farms92 after a square-root transformation. In this analysis, however, we actually have all the data!
   a. Repeat question 3(a) by looking at the percentiles of farms92. For example, one way to do this is to create a new variable, farms92_greater500, that takes the value 1 if farms92 is greater than 500 and 0 otherwise. Then calculate the proportion of counties with farms92_greater500 equal to 1.
   b. How well does the Normal approximation perform?

Binomial Probability
5. The counties are divided into four regions.
   a. How many counties are in the South? What proportion of all counties is this?
6. There are 3,078 counties in total, and we are interested in taking a sample of 5 counties without replacement.
   a. What is the probability of getting at least 1 county in the South?
7. Because the number of counties is so large, assume that the probability of a county being in the South stays constant at the proportion found in 5(a), even though we are sampling without replacement. Use the Binomial distribution formula to answer the following questions.
   a. In a sample of 5 counties, what is the probability of getting at least 1 county in the South?
   b. In a sample of 10 counties, what is the probability of getting exactly 4 counties in the South?
8. Let's draw some samples ourselves. Click on Table Subset, choose Random Sample Size, and type in 10. Make sure you select All Columns, then click OK. A new data table will appear.
   Repeat this 10 times and record the number of counties in the South in each sample.
   a. How often did you get exactly 4 counties in the South? Is this close to the probability in 7(b)?
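The calculations in questions 1 through 8 can be cross-checked outside the course software. Below is a minimal Python sketch using only the standard library (Python 3.8+ for `statistics.NormalDist`, `statistics.quantiles`, and `math.comb`). It is not part of the lab: all function names are hypothetical, it does not load agpop (you pass in the farms92 values or the region column yourself), and the region label to match (for example "South") is an assumption, since the handout does not say how regions are coded in the file. The Normal calculations use `NormalDist.cdf` in place of a printed Z-table, so answers may differ slightly from table lookups.

```python
import math
import random
from statistics import NormalDist, mean, median, quantiles


def summarize(values):
    """Mean, median, and interquartile range (questions 1 and 2b)."""
    # statistics.quantiles uses the 'exclusive' method by default; your
    # statistics package may use a different quartile rule, so its IQR can differ slightly.
    q1, _, q3 = quantiles(values, n=4)
    return {"mean": mean(values), "median": median(values), "iqr": q3 - q1}


def sqrt_transform(values):
    """Square-root transform, like SqFarms92 (question 2a)."""
    return [math.sqrt(v) for v in values]


def prop_above(cutoff, mu, sigma):
    """P(farms92 > cutoff) if sqrt(farms92) ~ Normal(mu, sigma) (question 3a).

    sqrt is increasing, so farms92 > cutoff exactly when sqrt(farms92) > sqrt(cutoff).
    """
    return 1 - NormalDist(mu, sigma).cdf(math.sqrt(cutoff))


def prop_between(lo, hi, mu, sigma):
    """P(lo < farms92 < hi) under the same Normal assumption (question 3b)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(math.sqrt(hi)) - dist.cdf(math.sqrt(lo))


def middle_50(mu, sigma):
    """Interval holding the middle 50% of farms92 (question 3c).

    Take the 25th and 75th percentiles on the square-root scale, then square
    them to return to the original scale (assumes the lower one is >= 0).
    """
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(0.25) ** 2, dist.inv_cdf(0.75) ** 2


def empirical_prop_above(values, cutoff):
    """Share of counties with farms92 > cutoff, i.e. the mean of the 0/1
    indicator farms92_greater500 (question 4a)."""
    return sum(v > cutoff for v in values) / len(values)


def hypergeom_at_least_one(n_total, n_south, n_sample):
    """P(at least 1 South county) when sampling without replacement (question 6)."""
    # P(no South county) = C(N - S, n) / C(N, n)
    return 1 - math.comb(n_total - n_south, n_sample) / math.comb(n_total, n_sample)


def binom_at_least_one(n, p):
    """Binomial P(X >= 1) = 1 - P(X = 0) = 1 - (1 - p)^n (question 7a)."""
    return 1 - (1 - p) ** n


def binom_pmf(k, n, p):
    """Binomial P(X = k) = C(n, k) p^k (1 - p)^(n - k) (question 7b)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_south_counts(regions, target, sample_size=10, reps=10, seed=None):
    """Mimic question 8: draw `reps` random samples of `sample_size` counties
    without replacement and count how many fall in region `target`."""
    rng = random.Random(seed)
    return [
        sum(r == target for r in rng.sample(regions, sample_size))
        for _ in range(reps)
    ]
```

For example, with `mu` and `sigma` set to the mean and standard deviation of SqFarms92 from question 2(b)(i), `prop_above(500, mu, sigma)`, `prop_between(500, 800, mu, sigma)`, and `middle_50(mu, sigma)` correspond to 3(a) through 3(c). With `p` the South proportion from 5(a), `binom_at_least_one(5, p)` and `binom_pmf(4, 10, p)` correspond to 7(a) and 7(b), and `hypergeom_at_least_one(3078, n_south, 5)` gives the exact without-replacement answer to 6(a) for comparison with the Binomial approximation.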