StatTools Assignment #1, Winter 2007 – This assignment has three parts.
Before beginning this assignment, be sure to carefully read the General Instructions
document that is located on the StatTools Assignments page of the course web site:
http://fisher.osu.edu/departments/management-sciences/courses/BM330.
Objectives:
1. Explore some of the StatTools features that you will need in the future to complete upcoming
assignments.
a. “Accessing” data – manual input, simulation, and file retrieval
b. Creating Graphs – scatter plots, histograms
c. Transforming data using the data utilities
d. Generating statistics from data
2. Review basic ideas of data exploration/summarization.
Before any statistical analysis is performed, the analyst should have at least a basic
understanding of the nature of the data being analyzed. This includes identification of the
variable(s) involved and the appropriate units of measurement for the variable(s), as well as
recognition of whether the available data constitutes a population or a sample. For each variable
it includes knowledge of the shape, center, and spread of the distribution of the data.
Data that are reasonably symmetric can be described using the average (mean) and
standard deviation. However, the mean and standard deviation do not provide a good summary
of data that is skewed or has outliers. Data that is skewed or has outliers should generally be
described using the five-number summary: Min, Q1, Median, Q3, Max. The strength of a linear
relationship between two variables is best described by the correlation. Of course, the best method for initial
identification of shape or pattern is to create a graph of the data. Appropriate graphs include, but
are not limited to, histograms, probability or quantile plots, box plots, and scatter plots. Read
chapters 1 and 2 for a basic review of these ideas before beginning this assignment if necessary.
We do not include data exploration/summarization as a self-contained topic for
discussion in Business Management 330; you spent considerable time on this in Stat 133.
However, the procedures and tools that we do discuss in Business Management 330 apply only to
data with certain characteristics, so we will regularly require some form of data exploration to
verify that our data has the required characteristics.
3. Draw conclusions from the information generated and effectively communicate those
conclusions in written form.
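For readers who want to see the summaries named in objective 2 outside of StatTools, the following is a minimal Python sketch. The data values are hypothetical (they are not part of this assignment), and the quartile convention used here (the "inclusive" method) is an assumption; StatTools may use a different convention and report slightly different values for Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (Min, Q1, Median, Q3, Max) for a list of numbers."""
    # Assumption: the "inclusive" quartile method; other tools may differ slightly.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical right-skewed data with one large outlier (not assignment data).
data = [2, 3, 3, 4, 5, 5, 6, 40]

print("mean:", statistics.mean(data))
print("sample standard deviation:", statistics.stdev(data))
print("five-number summary:", five_number_summary(data))
```

In this made-up example the single outlier pulls the mean well above the median, which is the reason the five-number summary is preferred for skewed data.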
Part I.
Manual data entry, graphing, and transforming data using Data Utilities
Requirements:
1. ACCESS DATA. Manually input the following data set into an Excel spreadsheet as two
columns. Accuracy matters. The data has been displayed in row form, here, only to save space.
Y:  12.4  14.3  15.2  23.5  22.3  24.1  26.1  25.6  27.5  27.8  28.9  29.1
X:  1     1     1     2     2     2     3     3     3     4     4     4
Once you have entered this data into the Excel spreadsheet, declare it as a StatTools Data Set.
2. USE the DATA UTILITIES to TRANSFORM the data. Transform the X values into logX
values and store them in C3 (StatTools automatically puts the result in the next available column).
See below for directions. Label this new column as logX. You must include a printout of the
spreadsheet containing the transformed data as part of your assignment.
StatTools > Data Utilities > Transform
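[Screenshot: Transform dialog box, with callout C marking the variable to transform and callout D marking the transformation function.]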
Be sure to choose X as the variable to transform (see C.), and be sure that the transformation
function is set to log (see D.). Then click “OK.”
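As a cross-check on the transformed column, the same calculation can be sketched in Python. This sketch assumes the log transformation is the natural log; that is an assumption about StatTools, not something stated in these instructions. If the tool uses base 10 instead, replace math.log with math.log10. The two differ only by a constant factor, so the shape of the scatter plot is the same either way.

```python
import math

# X values from the Part I data set.
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

# Assumption: natural log. Use math.log10 if the tool works in base 10.
log_x = [math.log(value) for value in x]

for original, transformed in zip(x, log_x):
    print(original, round(transformed, 4))
```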
3. GRAPH. Create scatter plots of Y vs. X and Y vs. logX, and add trend lines to both.
To create a scatter plot (you will do two of these),
StatTools > Summary Graphs > Scatter plot
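[Screenshot: Scatter plot dialog box, with callout E marking the choice of vertical (Y) and horizontal (X) variables.]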
You must specify which variable will be on the vertical axis (Y) and which will be on the
horizontal axis (X) (see E.).
To add the trend line (again, you will do this on both graphs), select a plot by double clicking on
it. The option Chart will appear on the top tool bar.
Chart > Add Trend Line
Then select “Linear” as the type of line to add and click “OK”. Do this for your second graph as
well. This will add a regression line to the data that makes comparisons easier from graph to
graph.
Place these graphs close to each other for easy comparison in your assignment submission.
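The linear trend line is a least-squares regression line. The following Python sketch shows how such a line and the accompanying correlation could be computed by hand for both plots. The helper name linear_fit is hypothetical, and the natural log is assumed for logX; this is an illustration, not the calculation StatTools or Excel performs internally.

```python
import math

# Part I data set.
y = [12.4, 14.3, 15.2, 23.5, 22.3, 24.1, 26.1, 25.6, 27.5, 27.8, 28.9, 29.1]
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

def linear_fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Assumption: natural log for the transformed variable.
log_x = [math.log(value) for value in x]

for label, xs in (("X", x), ("logX", log_x)):
    slope, intercept, r = linear_fit(xs, y)
    print(f"Y vs {label}: slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}")
```

Comparing the two values of r gives a numerical counterpart to the visual comparison of the two graphs.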
4. INTERPRET and COMMUNICATE CONCLUSIONS. What observations do you make
when comparing the two plots? What did the transformation accomplish? Be sure your response
is typed and is in sentence or paragraph form (one or two sentences should suffice). Language
and grammar comprise a portion of your assignment grade.
Part II.
Simulating data and graphing.
Requirements:
1. ACCESS DATA.
a. Simulate three random samples of size 15 taken from a Normal population with a mean
of 100 and a standard deviation of 10. To simulate is to duplicate the features of a real
phenomenon, usually in a simpler form. In this case we want to duplicate the results of
random sampling without incurring the expense and time of manual sampling. The
computer can quickly create many samples according to our instructions and place them
in columns that we specify. We can then observe behaviors and characteristics of the
samples.
To simulate random numbers drawn from a Normal population, we need to use the
following Excel formula:
=NORMINV(RAND(), mu, sigma)
where the parentheses following RAND are empty, mu is the numerical population mean,
and sigma is the numerical population standard deviation. For this example, we would
type the above formula into the function box of a given cell and specify the mu and sigma
(these can also be cell references, $A$5 for example). Once the cell has
generated a randomly drawn number from a Normal population with a specified mean
and standard deviation, we can use Excel’s click and drag feature to copy the formula to
however many boxes we need (15 in this case). To do this, we would click on the lower
right-hand corner of the cell and drag down however many rows we need.
Note – the data in the spreadsheet will still be in function form, so every time we perform
an operation on the data, it will re-draw random numbers and our data values will change.
We can override this, however, by copying our entire data set and pasting it over itself as
values: highlight the data set > Copy > Paste Special > Values. Don’t forget to declare
your data once you have finished with this!
Once you have generated three columns of data, label the columns appropriately (e.g.,
15Sample1, 15Sample2, 15Sample3).
b. Repeat the simulation process to obtain 3 random samples of size 100.
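The formula =NORMINV(RAND(), mu, sigma) works by feeding a uniform random number into the inverse of the Normal cumulative distribution function. The Python sketch below mirrors that idea using the standard library. The function name simulate_normal_sample and the seed value are hypothetical choices made for illustration; the column labels follow the pattern suggested above, with 100Sample1 and so on assumed for the larger samples.

```python
import random
from statistics import NormalDist

def simulate_normal_sample(size, mu, sigma, rng=random):
    """Draw `size` values the way =NORMINV(RAND(), mu, sigma) does."""
    dist = NormalDist(mu, sigma)
    sample = []
    while len(sample) < size:
        u = rng.random()      # uniform on [0, 1), the analogue of RAND()
        if u == 0.0:          # the inverse CDF is undefined at 0, so redraw
            continue
        sample.append(dist.inv_cdf(u))
    return sample

# Hypothetical seed: fixing it makes the draws repeatable from run to run.
rng = random.Random(2007)

samples_15 = {f"15Sample{i}": simulate_normal_sample(15, 100, 10, rng) for i in (1, 2, 3)}
samples_100 = {f"100Sample{i}": simulate_normal_sample(100, 100, 10, rng) for i in (1, 2, 3)}

for name, values in {**samples_15, **samples_100}.items():
    print(name, "mean =", round(sum(values) / len(values), 2))
```

Fixing the seed plays a role loosely similar to Paste Special > Values in Excel: the numbers stop changing each time the calculation is repeated.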
2. GRAPH.
a. Create a histogram for each of the three samples of size 15. To do this, click:
StatTools > Summary Graphs > Histogram
b. Repeat the graphing process for the three samples of size 100.
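A histogram is a count of how many observations fall into each of a set of intervals (bins). The Python sketch below shows one simple way to produce those counts with equal-width bins. The function name histogram_counts, the number of bins, and the data values are all hypothetical; StatTools chooses its own bins and may do so differently.

```python
def histogram_counts(values, num_bins):
    """Return (edges, counts) for num_bins equal-width bins spanning the data."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical; cannot form bins")
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts

# Hypothetical data, used only to show the mechanics.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]
edges, counts = histogram_counts(values, 4)

# Print a simple text histogram, one row per bin.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f} | " + "*" * count)
```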
3. INTERPRET and COMMUNICATE CONCLUSIONS. Summarize what you observed in
the graphs. Compare the histograms to the parent shape (Normal) and compare the histograms
from samples of size 15 to the histograms from samples of size 100. Be sure your responses are
typed and in sentence or paragraph form. Language and grammar comprise a portion of your
assignment grade.
Note – You do not have to print out and submit your data. You are creating random data. No
two individuals should have the same results. We will know if you use someone else’s results.
Do, however, include both sets of histograms and the interpretation in your report.
Part III. [Exercise 2.108 on page 164 of your text]
Accessing data from a remote file and data
summarization/exploration
Requirements:
1. INPUT DATA. Read the data from Table 2.13 into an Excel Spreadsheet (it is labeled as
“ta02_013.dat”). The data to be analyzed can be accessed from the disc that accompanies the
textbook or from the publisher’s website (see “Supplements and Media for Students” on pages xxvi
and xxvii of the textbook). It may be posted on the Carmen Content page, as well.
2. GRAPH and GENERATE STATISTICS. Use StatTools to generate the following
information. You can ignore the questions that the book poses; you need only answer the
following:
a. Draw a histogram of the selling price data and of square footage data.
b. Have StatTools calculate any appropriate summary measures for both selling price and
square footage. You must choose which summary measures are most appropriate to report
for each variable based on what you learn from the histograms in part a, and on what you
learned in Stat133.
c. Compute the covariance and the coefficient of correlation to summarize the strength of
linear relationship between Selling Price and Square Footage.
StatTools > Summary Statistics > Correlation and Covariance > OK
d. Generate a scatter plot of the square footage of a house vs. the price.
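To make the difference between covariance and correlation concrete, here is a minimal Python sketch of both calculations. The square footage and price values are made up for illustration and are not the Table 2.13 data; the function names are hypothetical, and the sample (n - 1) form of the covariance is an assumption about which version is reported.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Coefficient of correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical values, NOT the textbook data.
square_feet = [1200, 1500, 1800, 2100, 2400]
price_thousands = [150, 180, 200, 260, 275]
price_dollars = [p * 1000 for p in price_thousands]

print("covariance (price in $1000s):", covariance(square_feet, price_thousands))
print("covariance (price in $):     ", covariance(square_feet, price_dollars))
print("correlation (price in $1000s):", round(correlation(square_feet, price_thousands), 4))
print("correlation (price in $):     ", round(correlation(square_feet, price_dollars), 4))
```

Changing the units of price changes the covariance by the same factor but leaves the correlation unchanged, which is why the correlation is the easier of the two to interpret as a measure of strength.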
3. INTERPRET and COMMUNICATE CONCLUSIONS. Type responses to the following
questions. Be sure that your responses are in sentence or paragraph form. Again, you should
only need a few sentences to answer each question. Language and grammar comprise a portion
of your assignment grade.
a. Using only the statistics and graphs generated in parts (a) and (b) above, what
observations can we make about buying a new house in Ames, IA? (That is, what have we
learned from this analysis?)
b. Using what we calculated in parts (c) and (d) above, what have you learned about the
relationship between the square footage of a home and the price of that home?
c. Which information would be more useful for someone in the market for a new home in
Ames, IA, the information in parts (a) and (b) or the information in parts (c) and (d)?
Why?
Final Requirement for the complete assignment: Compile and staple together
your cover page, the typed responses to all questions, and the annotated StatTools results (be sure
to include all spreadsheets, graphs, summary statistics, and typed answers). You DO NOT need
to include these instructions with what you turn in, but be sure to clearly label your responses. If
answers are out of order or unlabeled, you may receive no credit for the given questions.
Note: Don’t forget that a detailed explanation of protocol for the StatTools assignments is posted
on the course web page. Read it carefully and follow the instructions.