Survey
ST 432 HW1 SP2017

2.3 The sampling design depends on a careful definition of the population of interest. Because it would be almost impossible to obtain a listing of all cars owned by residents of a city, a better option is to restrict the population of cars to something like "cars that use city parking lots on a working day" or "cars that belong to people visiting the malls on a weekend." A listing of parking lots, or of sections of parking lots, could then serve as a frame for collections of cars.

2.23
(a) One rating point represents one percent of the households in the sampled population, or 95.1 million x 0.01 = 951,000 households, since the sampled population consists of households.
(b) As a percentage, a share is larger than a rating because the denominator of the rating is the total number of sampled households, while the denominator of a share is the number of sampled households that actually have a TV set turned on (viewing households).
(c) 95.1 million x 0.217 = 20.64 million households could have been viewing this show.
(d) Much of the data collected by Nielsen depends upon people in the sampled households either pushing a button on a People Meter or writing in a diary to record what they are watching. This is far from a foolproof system.

3.4 A sampling distribution is the distribution of all possible values of a statistic.

3.14
(a) $E(x) = \sum x\,p(x) = 2(.443) + 3(.229) + 4(.200) + 5(.086) + 6(.028) + 7(.014) = 3.069$. This approximation to the mean will be too large since the distribution of family size is right-skewed.
(b) $V(x) = \sum (x - \mu)^2\,p(x) = (2 - 3.069)^2(.443) + \cdots + (7 - 3.069)^2(.014) = 1.458$; $\sqrt{V(x)} = 1.207$.
(c) The distribution of the sample data would reflect that of the population. Most of the data values would pile up around 2 and 3, with a few larger values. The distribution of the sample would be skewed toward the larger values, with a center at approximately 3.07 and a standard deviation of approximately 1.21.
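The mean and variance computed in 3.14 can be checked numerically. The following is a minimal sketch, not part of the original solution; the names `pmf` and `mean_var` are illustrative, and the probabilities are the ones quoted in the expected-value calculation above.

```python
import math

# Family-size distribution as used in 3.14: size -> probability.
pmf = {2: 0.443, 3: 0.229, 4: 0.200, 5: 0.086, 6: 0.028, 7: 0.014}

def mean_var(pmf):
    """Return (mean, variance) of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var

mu, var = mean_var(pmf)
sd = math.sqrt(var)

# Print to two decimals, the precision used in the written discussion.
print(f"mean = {mu:.2f}, variance = {var:.2f}, sd = {sd:.2f}")
```

The results should agree with the hand calculation to the two decimal places quoted in the text (3.07, 1.46, and 1.21).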
(d) The sample mean $\bar{x}$ has approximately a normal distribution with mean $E(\bar{x}) = 3.07$ and standard deviation $SD(\bar{x}) = \sigma/\sqrt{n} = 1.21/20 = 0.0605$.

3.15
(a) The scatter plot shows that SAT and Percent are negatively correlated, with a slightly curved pattern: the average score drops quickly as the percentage begins to increase and then levels off at higher percentages. Decreasing scores with an increasing percentage taking the exam make practical sense; in states with small percentages, only the very best students are taking the exam.
(b) The correlation coefficient is -0.877, but it is not a good measure to use here because of the curvature in the pattern. Correlation measures the strength of a linear relationship between two variables.

[Figure: scatter plot, "Average State SAT vs Percent HS Seniors Taking SAT". Horizontal axis: Percent Seniors Taking SAT. Vertical axis: Average SAT Score.]

3.20 The first histogram shows the population, and the second shows 200 sample means from samples of size 5 each. The histogram of the sample means is somewhat skewed because the population distribution of teachers per state is highly skewed. Even so, the mean of the sampling distribution is 58,820, quite close to the population mean of 59,856. The standard deviation of the sampling distribution is 25,643, quite close to the theoretical value of 28,465.

[Figure: histogram of the population, the number of teachers in each of the 50 states.]

Results for this problem will vary from student to student.
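The kind of simulation described in 3.20 can be sketched as follows. This is only an illustration under stated assumptions: the population below is a hypothetical right-skewed stand-in generated at random, not the actual teacher counts, and the helper names `simulate_sample_means` and `theoretical_se` are invented for this example. The finite-population formula used for the theoretical standard deviation is one common choice for sampling without replacement and may not be the one behind the value quoted in the solution.

```python
import math
import random
import statistics

def simulate_sample_means(population, n, reps, seed=0):
    """Draw `reps` simple random samples of size n (without replacement)
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

def theoretical_se(population, n):
    """SD of the sample mean under simple random sampling without
    replacement: (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))."""
    N = len(population)
    sigma = statistics.pstdev(population)
    return sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

# Hypothetical stand-in population: 50 right-skewed values.
# These are NOT the real numbers of teachers per state.
pop_rng = random.Random(432)
population = [pop_rng.lognormvariate(10.5, 0.9) for _ in range(50)]

means = simulate_sample_means(population, n=5, reps=200)

print("population mean:     ", round(statistics.fmean(population)))
print("mean of sample means:", round(statistics.fmean(means)))
print("SD of sample means:  ", round(statistics.stdev(means)))
print("theoretical SD:      ", round(theoretical_se(population, 5)))
```

Changing `seed` gives a different set of 200 samples, which is why results for this kind of problem vary from student to student.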