StatTools Assignment #1, Winter 2007

This assignment has three parts. Before beginning this assignment, be sure to carefully read the General Instructions document that is located on the StatTools Assignments page of the course web site: http://fisher.osu.edu/departments/management-sciences/courses/BM330.

Objectives:

1. Explore some of the StatTools features that you will need in the future to complete upcoming assignments.
   a. “Accessing” data: manual input, simulation, and file retrieval
   b. Creating graphs: scatter plots, histograms
   c. Transforming data using the data utilities
   d. Generating statistics from data

2. Review basic ideas of data exploration/summarization. Before any statistical analysis is performed, the analyst should have at least a basic understanding of the nature of the data being analyzed. This includes identification of the variable(s) involved and the appropriate units of measurement for the variable(s), as well as recognition of whether the available data constitute a population or a sample. For each variable it includes knowledge of the shape, center, and spread of the distribution of the data.

Data that are reasonably symmetric can be described using the average (mean) and standard deviation. However, the mean and standard deviation do not provide a good summary of data that are skewed or have outliers. Such data should generally be described using the five-number summary: Min, Q1, Median, Q3, Max. The linear relationship between two variables is best described by correlation.

Of course, the best method for initial identification of shape or pattern is to create a graph of the data. Appropriate graphs include, but are not limited to, histograms, probability or quantile plots, box plots, and scatter plots. If necessary, read chapters 1 and 2 for a basic review of these ideas before beginning this assignment.
We do not include data exploration/summarization as a self-contained topic for discussion in Business Management 330; you spent considerable time on this in Stat 133. However, the procedures and tools that we do discuss in Business Management 330 apply only to data with certain characteristics, so we will regularly require some form of data exploration to verify that our data have the required characteristics.

3. Draw conclusions from the information generated and effectively communicate those conclusions in written form.

Part I. Manual data entry, graphing, and transforming data using Data Utilities

Requirements:

1. ACCESS DATA. Manually input the following data set into an Excel spreadsheet as two columns. Accuracy matters. The data are displayed in row form here only to save space.

   Y   12.4  14.3  15.2  23.5  22.3  24.1  26.1  25.6  27.5  27.8  28.9  29.1
   X      1     1     1     2     2     2     3     3     3     4     4     4

Once you have entered this data into the Excel spreadsheet, declare it as a StatTools Data Set.

2. USE the DATA UTILITIES to TRANSFORM the data. Transform the X values into logX values and store them in C3 (StatTools automatically puts the result in the next available column). See below for directions. Label this new column as logX. You must include a printout of the spreadsheet containing the transformed data as part of your assignment.

   StatTools > Data Utilities > Transform

   [Screenshot: Transform dialog, callouts C and D]

Be sure to choose X as the variable to transform (see C.), and be sure that the transformation function is set to log (see D.). Then click “OK.”

3. GRAPH. Create scatter plots of Y vs. X and Y vs. logX, and add trend lines to both. To create a scatter plot (you will do two of these), select:

   StatTools > Summary Graphs > Scatter plot

   [Screenshot: Scatter plot dialog, callout E]

You must specify which variable will be on the vertical axis (Y) and which will be on the horizontal axis (X) (see E.). To add the trend line (again, you will do this on both graphs), select a plot by double-clicking on it. The option Chart will appear on the top tool bar.
   Chart > Add Trend Line

Then select “Linear” as the type of line to add and click “OK”. Do this for your second graph as well. This will add a regression line to the data that makes comparisons easier from graph to graph. Place these graphs close to each other for easy comparison in your assignment submission.

4. INTERPRET and COMMUNICATE CONCLUSIONS. What observations do you make when comparing the two plots? What did the transformation accomplish? Be sure your response is typed and is in sentence or paragraph form (one or two sentences should suffice). Language and grammar comprise a portion of your assignment grade.

Part II. Simulating data and graphing

Requirements:

1. ACCESS DATA.
   a. Simulate three random samples of size 15 taken from a Normal population with the mean of 100 and standard deviation of 10.

To simulate is to duplicate the features of a real phenomenon, usually in a simpler form. In this case we want to duplicate the results of random sampling without incurring the expense and time of manual sampling. The computer can quickly create many samples according to our instructions and place them in columns that we specify. We can then observe behaviors and characteristics of the samples.

To simulate random numbers drawn from a Normal population, we need to use the following Excel formula:

   =NORMINV(RAND(), mu, sigma)

where the parentheses following RAND are empty, mu is the numerical population mean, and sigma is the numerical population standard deviation. For this example, we would type the above formula into the function box of a given cell and specify the mu and sigma (these can also be cell references as well -- $A$5, for example). Once the cell has generated a randomly drawn number from a Normal population with a specified mean and standard deviation, we can use Excel’s click and drag feature to copy the formula to however many boxes we need (15 in this case).
To do this, we would click on the lower right-hand corner of the cell and drag down however many rows we need.

Note: The data in the spreadsheet will still be in function form, so every time we perform an operation on the data, Excel will redraw the random numbers and our data values will change. We can override this, however, by copying our entire data set and pasting it over itself as values: highlight the data set > Copy > Paste Special > Values. Don’t forget to declare your data once you have finished with this! Once you have generated three columns of data, label the columns appropriately (e.g., 15Sample1, 15Sample2, 15Sample3).

   b. Repeat the simulation process to obtain 3 random samples of size 100.

2. GRAPH.
   a. Create a histogram for each of the three samples of size 15. To do this, click: StatTools > Summary Graphs > Histogram
   b. Repeat the graphing process for the three samples of size 100.

3. INTERPRET and COMMUNICATE CONCLUSIONS. Summarize what you observed in the graphs. Compare the histograms to the parent shape (Normal) and compare the histograms from samples of size 15 to the histograms from samples of size 100. Be sure your responses are typed and in sentence or paragraph form. Language and grammar comprise a portion of your assignment grade.

Note: You do not have to print out and submit your data. You are creating random data. No two individuals should have the same results. We will know if you use someone else’s results. Do, however, include both sets of histograms and the interpretation in your report.

Part III. [Exercise 2.108 on page 164 of your text] Accessing data from a remote file and data summarization/exploration

Requirements:

1. INPUT DATA. Read the data from Table 2.13 into an Excel spreadsheet (it is labeled as “ta02_013.dat”). The data to be analyzed can be accessed from the disc that accompanies the textbook or from the publisher’s website (see “Supplements and Media for Students” on pages xxvi and xxvii of the textbook).
It may be posted on the Carmen Content page, as well.

2. GRAPH and GENERATE STATISTICS. Use StatTools to generate the following information (you can ignore the questions that the book poses; you need only answer the following):
   a. Draw a histogram of the selling price data and of the square footage data.
   b. Have StatTools calculate any appropriate summary measures for both selling price and square footage. You must choose which summary measures are most appropriate to report for each variable based on what you learn from the histograms in part a, and on what you learned in Stat 133.
   c. Compute the covariance and the coefficient of correlation to summarize the strength of the linear relationship between Selling Price and Square Footage.
      StatTools > Summary Statistics > Correlation and Covariance > OK
   d. Generate a scatter plot of the square footage of a house vs. the price.

3. INTERPRET and COMMUNICATE CONCLUSIONS. Type responses to the following questions. Be sure that your responses are in sentence or paragraph form. Again, you should only need a few sentences to answer each question. Language and grammar comprise a portion of your assignment grade.
   a. Using only the statistics and graphs generated in parts (a) and (b) above, what observations can we make about buying a new house in Ames, IA? (i.e., what have we learned from this analysis?)
   b. Using what we calculated in parts (c) and (d) above, what have you learned about the relationship between the square footage of a home and the price of that home?
   c. Which information would be more useful for someone in the market for a new home in Ames, IA: the information in parts (a) and (b) or the information in parts (c) and (d)? Why?

Final Requirement for the complete assignment: Compile and staple together your cover page, the typed responses to all questions, and the annotated StatTools results (be sure to include all spreadsheets, graphs, summary statistics, and typed answers).
You DO NOT need to include these instructions with what you turn in, but be sure to clearly label your responses. If answers are out of order or unlabeled, you may receive no credit for the given questions.

Note: Don’t forget that a detailed explanation of protocol for the StatTools assignments is posted on the course web page. Read it carefully and follow the instructions.
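
Supplementary illustration for Part II. The Excel formula =NORMINV(RAND(), mu, sigma) is an instance of inverse-transform sampling: a uniform random number between 0 and 1 is passed through the inverse of the Normal cumulative distribution function. The sketch below mirrors that idea outside of Excel using only the Python standard library (Python 3.8 or later is assumed). The function name simulate_normal_sample and the seed value are illustrative choices, not part of the assignment or of StatTools, and this sketch does not replace the Excel and StatTools work required above.

```python
import random
from statistics import NormalDist, mean, stdev


def simulate_normal_sample(n, mu, sigma, rng):
    """Draw n values by inverse-transform sampling, like =NORMINV(RAND(), mu, sigma)."""
    dist = NormalDist(mu, sigma)
    sample = []
    while len(sample) < n:
        p = rng.random()      # uniform on [0, 1), the role RAND() plays in Excel
        if p == 0.0:          # inv_cdf requires 0 < p < 1, so redraw in this rare case
            continue
        sample.append(dist.inv_cdf(p))  # the role NORMINV plays in Excel
    return sample


# Assumption: a fixed seed (value chosen arbitrarily) so the numbers do not change
# between runs, loosely analogous to Paste Special > Values in Excel.
rng = random.Random(2007)

small = [simulate_normal_sample(15, 100, 10, rng) for _ in range(3)]
large = [simulate_normal_sample(100, 100, 10, rng) for _ in range(3)]

for label, samples in (("n=15", small), ("n=100", large)):
    for i, s in enumerate(samples, start=1):
        print(f"{label} sample {i}: mean={mean(s):.1f}, sd={stdev(s):.1f}")
```

Changing the seed, or removing it, gives a different set of samples each time, which is the same behavior the spreadsheet shows before the formulas are pasted over as values.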