Lab 2. Normal probability plots and scatterplots
www.nmt.edu/~olegm/283labs/Lab2stat.pdf

Note: the menus and other things you will read or type on the computer are in italics. Attach the printouts whenever needed.

In this Lab, we will discuss statistical methods based on scatterplots: the normal probability plot, and the scatterplot for exploring the relationship between two variables.

1 Normal probability plot

A normal probability plot (n.p.p.) helps check whether a distribution is close to normal, i.e. has the characteristic bell shape. (This is similar to the Normal quantile plot discussed in the book.) Any deviation from this shape shows up on the n.p.p. as a departure from a straight line. We can also check the shape informally, e.g. by looking at the histogram. However, some shapes (like the heavy-tailed distributions below) are hard to spot this way.

To make a normal probability plot, use Graph → Probability plot → Simple; in the Variables window, select the variable you need.

In the problems below, we'll examine some data and see how particular features of a distribution are reflected in the n.p.p.

Problem 1
Make a normal probability plot for the mounting holes problem (corresponds to 1.145 in the textbook), see holes.txt. The variable given is the distance between the holes in thousandths of an inch. Normality here would imply, for example, that the distance is equally likely to be too short or too long, and that very large errors are rare.
(a) What is the most notable deviation from normality in these data?
(b) How would you propose to fix it?

Problem 2
In cases (a)-(c), describe the departures from normality and how they are reflected in the n.p.p.
(a) Heavy-tailed distribution / outliers on both ends. Open the data set Internet.txt (monthly fees for Internet access in 2000). Make a histogram and a normal probability plot. Describe the behavior that you see.
(b) Uniform distribution.
Generate 100 “random numbers” from the uniform distribution on the interval [0,1]: Calc → Random Data → Uniform; generate 100 rows of data; store in column C2. Make a histogram and a normal probability plot. Describe.
(c) A skewed distribution. Open the data set Guinea.txt (survival times for Guinea pigs). Make a histogram and a normal probability plot. Describe.

2 Scatterplots and correlation

Scatterplots describe relationships between pairs of numerical variables. Data in the file wine.txt describe the relationship between wine consumption and heart disease death rates (deaths per 100,000 people) for 19 developed nations.

To make a scatterplot, use Graph → Scatterplot → Simple; select wine consumption into the X cell and heart disease into the Y cell.

Problem 3
Answer the questions below.
(a) Are there any outliers?
(b) Clusters of countries?
(c) Is there a linear pattern?
(d) How strong is the relationship?
(e) Italy’s wine consumption is 7.9 (liters of alcohol from wine, per person per year). What is its heart disease rate?
(f) Compute the correlation coefficient using Stat → Basic Statistics → Correlation and bringing both variables into the Variables box.
(g) Does it appear that drinking more wine would reduce a person’s risk of heart disease? [1]

2.1 Effect of Linear transformations

We will investigate the effect that linear transformations have on the correlation. Open the data set Sevilleta.txt. It contains average daily temperatures (in Celsius) at Sevilleta National Wildlife Refuge for the months of September and October, 2002.

Changing the unit of measurement
For example, to change temperatures from Celsius to Fahrenheit, we need to use the formula

°F = °C × 1.8 + 32

We will create a new variable called temp_F. Go to Calc → Calculator and type the arithmetic expression into the Expression window.

Problem 4
(a) Make a scatterplot of temp_C versus temp_F. (Do not print.) What is the value of the correlation coefficient? Explain why it is the way it is.
(b) Make a scatterplot of Y = temp_F versus X = Day. What kind of association do you observe? Describe in words what happens.
(c) Compute the correlations of both temp_C and temp_F with Day. What did you observe?

Problem 5: Exploration
The file DJIret.txt contains the values of returns on the Dow Jones Industrials index (DJI), where

Return = 100% × (New value − Old value) / Old value

(a) Make a histogram and a normal probability plot of the data. Would you describe this distribution as heavy-tailed? Symmetric? Skewed? [2] Is the nonnormality easy to spot using the histogram alone?
(b) Does today’s return affect tomorrow’s return? Make a scatterplot of the return series against itself, shifted by one day. [To obtain the shifted series, you can simply copy and paste the numbers into a new column.] Is it possible to predict tomorrow’s return based on today’s return?

[1] About cause and effect, read the discussion at http://www.nmt.edu/~olegm/283labs/SciAmWine.pdf
[2] A heavy-tailed distribution is generally cited as an extra source of risk when buying stocks. It means that large gains or large losses are more frequent when dealing with individual stocks or, in this case, with a stock index. For example, returns beyond ±2σ occur more often than the 5% of the time implied by the 95% rule for the Normal distribution.
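
For readers who want to check the Problem 5 constructions outside Minitab, here is a minimal Python sketch of the return formula and of the one-day shift described in part (b). It is only an illustration under stated assumptions: the function names (percent_returns, lagged_pairs, correlation) are hypothetical, and the index values are made up and are not the DJIret.txt data.

```python
from math import sqrt


def percent_returns(values):
    """Return = 100 * (new value - old value) / old value, for consecutive values."""
    return [100.0 * (new - old) / old for old, new in zip(values, values[1:])]


def lagged_pairs(series, lag=1):
    """Pair each value (today) with the value `lag` positions later (tomorrow).

    Requires lag >= 1. This mirrors copying the column and shifting it by one row.
    """
    return series[:-lag], series[lag:]


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up index values, for illustration only; these are NOT the DJIret.txt data.
index_values = [100.0, 101.0, 99.0, 102.0, 100.5, 103.0]

returns = percent_returns(index_values)
today, tomorrow = lagged_pairs(returns)

print("returns (%):", [round(r, 2) for r in returns])
print("lag-1 correlation:", round(correlation(today, tomorrow), 3))
```

Since DJIret.txt already contains returns, only the lagged_pairs and correlation steps would apply to it; the shifted columns are one row shorter than the original series, which is also what happens when you paste the shifted copy in Minitab.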