HOMEWORK #3
CHAPTER 4: DESCRIBING NUMERICAL DATA

CHAPTER 4 PROBLEM 54
Cars. A column in this data file gives the engine displacement in liters of 509 vehicles sold in the United States. These vehicles are 2012 models, are not hybrids, have automatic transmissions, and lack turbochargers. Another column in this data file gives the rated combined fuel economy (in miles per gallon) for 509 vehicles sold in the United States.

CHAPTER 4 PROBLEM 54(A)
Produce a histogram of these data. Describe and interpret the histogram.
• The histogram extends from 10 to 37.5 mpg with bins of width 2.5. It peaks in the 20-22.5 bin and is right skewed.

CHAPTER 4 PROBLEM 54(B)
Compare the histogram to the boxplot. What does the histogram tell you that the boxplot does not, and vice versa?
• The boxplot shows the median, the quartiles (and so the IQR), the mean if the software marks it, and that there is an outlier. The histogram shows more about the shape of the distribution and where the observations actually fall.

CHAPTER 4 PROBLEM 54(C)
Find the mean and standard deviation of the rated mileages. How are these related to the histogram, if at all?
• The mean = 20.03 and the standard deviation = 4.81. The mean marks the center (balance point) of the histogram, and the SD measures the typical spread of the observations around the mean.

CHAPTER 4 PROBLEM 54(D)
Find the coefficient of variation and briefly interpret its value.
• CV = 24.03%, so the SD is about a quarter of the mean. A much higher CV (100% or more) would denote large variation relative to the mean. These data are not very spread out.

CHAPTER 4 PROBLEM 54(E)
Identify any unusual values (outliers). Do you think that these are coding errors?
• There is one outlier at 37 mpg, the Scion iQ. This probably isn’t a blunder or rogue value but a truly interesting outlier.

CHAPTER 4 PROBLEM 54(F)
Government standards call for cars to get 27.5 MPG. What percentage of these vehicles meet this goal? (Are all of these vehicles cars?)
• We have created a variable that is 1 when the mileage is at least 27.5 and 0 otherwise. Then we simply need to sum the variable and divide by the sample size.
This gives approximately 8%.

CHAPTER 4 PROBLEM 57
Information Industry. This data table includes several characteristics of 428 companies classified as being in the information industry in 2010. One column gives the total revenue of the company, in millions of dollars.

CHAPTER 4 PROBLEM 57(A)
Find the median, mean, and standard deviation of the total revenue of these companies. What units do these summary statistics share?
• Mean = 2240.16
• Median = 219.81
• SD = 9622.30
• They are all in millions of dollars, the units of the revenue column.

CHAPTER 4 PROBLEM 57(B)
Describe the shape of the histogram and boxplot. What does the White Space Rule have to say about the histogram?
• It is almost all white space. The data are highly concentrated at the lower end, and a few outliers are very high. These outliers stretch the scale and conceal much of the data.

CHAPTER 4 PROBLEM 57(C)
Do the data have any extreme outliers? Identify the company if there’s an extreme outlier.
• AT&T, Verizon, and Microsoft are all extreme outliers.

CHAPTER 4 PROBLEM 57(D)
What do these graphs of the distribution of net sales tell you about this industry? Is this industry dominated by a few companies, or is there a level playing field with many comparable rivals?
• A few dominant companies sit at the top, and many much smaller, less competitive companies fight at the bottom.

CHAPTER 4 PROBLEM 59
Tech Stocks. These data give the monthly returns on stocks in three technology companies: Dell, IBM, and Microsoft. For each month from January 1990 through the end of 2005 (192 months), the data give the return earned by owning a share of stock in each company. The return is the percentage change in the price, divided by 100.

CHAPTER 4 PROBLEM 59(A)
Describe and contrast histograms of the three companies. Be sure to use a common scale for the data axes of the histograms to make the comparison easier and more reliable.
• The histograms, boxplots, and violin plots show that Dell has the highest median and IQR.
Microsoft has the second highest median and IQR. IBM has the lowest median and IQR. Microsoft has the most outliers.

CHAPTER 4 PROBLEM 59(B)
Find the mean, SD, and coefficient of variation for each set of returns. Are means and SDs useful summaries of variables such as these?
• Means and standard deviations are regularly used to characterize the expected returns and risks of equity market data. Because this type of data often deviates from the assumptions of a normal distribution, we should exercise care when interpreting them.

CHAPTER 4 PROBLEM 59(C)
What does comparison of the coefficients of variation tell you about these three stocks?
• The CVs tell us that Dell varies least relative to its mean, then Microsoft, and IBM varies the most. In this case, however, because the means are so close to zero, the CVs are not good indicators of risk or scale. Coefficients of variation are valuable only when the means are not close to zero.

CHAPTER 4 PROBLEM 59(D)
Investors prefer stocks that grow steadily. In that case, what values are ideal for the mean and SD of the returns? For the coefficient of variation?
• Investors would prefer a positive mean with a small SD, and so a small CV that denotes less variability. Investors would also like to see positively skewed returns, which lean towards growth. As in part (C), because the means here are so close to zero, the CVs are not good indicators of risk or scale.

CHAPTER 4 PROBLEM 59(E)
It is common to find that stocks that have a high average return also tend to be more volatile, with larger swings in price. Is that true for these three stocks?
• Yes. The stocks with the highest means/medians also have the highest SDs.
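
The calculations these answers rely on (the coefficient of variation as 100 × SD / mean, and the 0/1 indicator-variable share from Problem 54(F)) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the numbers are made up rather than taken from the homework data files, and the sample SD (n - 1 divisor) is assumed.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV as a percentage: 100 * sample SD / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(values) / m


def share_at_least(values, threshold):
    """Percentage of values >= threshold, using a 0/1 indicator variable."""
    indicator = [1 if v >= threshold else 0 for v in values]
    return 100 * sum(indicator) / len(indicator)


# Hypothetical illustrative numbers, NOT the homework data files.
mpg = [16, 18, 20, 21, 22, 24, 28, 37]
returns = [0.04, -0.03, 0.02, -0.02, 0.01, -0.01]

print(share_at_least(mpg, 27.5))          # 25.0 (2 of the 8 values)
print(coefficient_of_variation(mpg))      # moderate: mean is far from zero
print(coefficient_of_variation(returns))  # very large: mean is close to zero
```

Plugging the rounded summary values from Problem 54 into the same ratio (100 × 4.81 / 20.03) gives about 24%, in line with the reported CV. The made-up returns show why the CV is a poor summary when the mean is near zero: dividing by a tiny mean inflates the ratio regardless of how risky the series actually is.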