Fathom and Curve of best fit for Normal Data
Investigating the Age of the Wealthy.
1. Use data from DASL by visiting http://lib.stat.cmu.edu/DASL/ and selecting Data Subjects >> Economics >>
Next 10 >> Billionaires 92 Datafile. (If these instructions do not work, use the following link:
http://lib.stat.cmu.edu/DASL/Datafiles/Billionaires92.html)
2. Import the data into Fathom (Method 2). Right click on the URL and copy it. In Fathom, from the File menu, select
Import from URL. Right click on the address box and select Paste. Select OK. A case box with gold balls in it
should appear. Open the table of values and check that the data was imported. If not, use Method 1 from the
cancer activity.
3. Graph wealth depending on age. To make an accurate scatter plot, the table of values must contain only
numbers. If you scroll down to case 98, there is an asterisk or a blank in the cell. This is another type of data
clean-up that you will have to check for. Delete all cases that have at least one asterisk or blank, and delete
the graph too.
I. Analysing One-Variable Data
4. Select a graph and place it on the desktop. Drag age onto the x-axis. Notice the shape of the dot plot. This
looks like a normal distribution. As a first check, let's calculate the mean and median of the data. If they are
approximately equal, the distribution is roughly symmetric, which is consistent with (but does not prove) a
normal distribution.
5. Right click on the graph. Select Plot Value. Type mean(age). Press OK. Repeat this process with median(age) to
plot the median. Notice that the two values are approximately the same and lie in the middle of the dot plot.
6. Change this graph to a histogram by clicking on Dot Plot and selecting Histogram. The mode is easy to see
if you double click on the x-scale and change binWidth to 1. The mode is 68. The age interval is shown at the
bottom of the screen. This age data is not exactly normal. To return to the default scaling, select Rescale
Graph Axes from the Graph menu.
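The checks in steps 4 to 6 can also be sketched outside Fathom. The following is a minimal Python sketch, not Fathom's own method. The `ages` list is a small made-up sample used only for illustration (it is not the Billionaires 92 data), and the helper names `summarize` and `histogram_counts` are hypothetical.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical ages for illustration only; NOT the Billionaires 92 data.
ages = [42, 50, 55, 58, 60, 62, 64, 64, 64, 66, 68, 70, 73, 77, 85]


def summarize(values):
    """Return the mean, the median, and the list of modes of a sample."""
    return mean(values), median(values), multimode(values)


def histogram_counts(values, bin_width=1):
    """Count values per bin; each bin is labelled by its left edge."""
    return Counter((v // bin_width) * bin_width for v in values)


avg, med, modes = summarize(ages)
counts = histogram_counts(ages, bin_width=1)

# A mean close to the median suggests symmetry, not normality by itself.
print(f"mean = {avg:.2f}, median = {med}, mode(s) = {modes}")
print("tallest bin starts at", max(counts, key=counts.get))
```

With a bin width of 1 and whole-number ages, the tallest bin is the mode, which is why step 6 changes binWidth before reading the mode from the histogram.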
II. To obtain a visual confirmation that the data follows a Normal Distribution, look at a Normal Quantile Plot
7. A Normal Quantile Plot shows the distribution of continuous numeric data. For each case, it plots the z-score
(the difference between a value and the mean, divided by the standard deviation) associated with that case's
percentile. If the data are Normal, the plot should show a straight line.
8. Change your histogram to a normal quantile plot by clicking on Histogram and selecting Normal Quantile Plot.
Notice that your age data are very close to the straight line shown on the plot.
Notice that if you interchange the axes, the
slope = S.D. and the vertical intercept = mean.
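The slope and intercept remark can be checked numerically. Below is a minimal Python sketch under stated assumptions: the plotting position (i - 0.5)/n is one common convention and may differ from the one Fathom uses, the `ages` list is made-up illustration data, and the helper names are hypothetical. The data values go on the vertical axis and the z-scores on the horizontal axis.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical ages for illustration only; NOT the Billionaires 92 data.
ages = [42, 50, 55, 58, 60, 62, 64, 64, 64, 66, 68, 70, 73, 77, 85]


def normal_quantile_points(values):
    """Pair the z-score of each case's percentile with its sorted value.

    Assumption: percentile of the i-th smallest case is (i - 0.5) / n.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), v)
        for i, v in enumerate(ordered, start=1)
    ]


def fit_line(points):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


points = normal_quantile_points(ages)
slope, intercept = fit_line(points)
print(f"slope = {slope:.2f} (sample S.D. = {stdev(ages):.2f})")
print(f"intercept = {intercept:.2f} (mean = {mean(ages):.2f})")
```

For roughly normal data the fitted slope lands near the sample standard deviation, and because the z-scores are symmetric about zero the intercept matches the mean.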
III. To check whether the distribution of the data has the following properties of a Normal Probability distribution:
50% of the data falls on each side of the mean
About 68% of the data falls within one standard deviation of the mean
About 95% of the data falls within two standard deviations of the mean
About 99.7% of the data falls within three standard deviations of the mean
9. Convert the graph to a dot plot, where each dot represents one data value. Select Plot Value from the Graph menu.
To compute the sample standard deviation of the data, enter the formula s(age). Select OK.
10. To sort the age values: Select the Age column in the table. Right click on the table and select Sort Ascending.
11. Count the data in each interval and divide by 225 to get the percentage within each interval. For
example, for the interval of data that falls within one standard deviation on either side of the mean,
50.5 ≤ age ≤ 77.56, shift-select the desired interval in the table: click on case 35, hold Shift, and click on case
193. Notice that the selected entries are counted at the bottom of the screen and highlighted on the graph. Divide
this count of 159 by 225. So, about 70.67% of the data are in the interval x̄ ± 1s. Notice again that this data is
not exactly normal. Repeat this process to calculate the percentage of data that are in the two larger
intervals, x̄ ± 2s and x̄ ± 3s.
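The counting in step 11 can be sketched in a few lines of Python. This is an illustrative sketch only: `fraction_within` is a hypothetical helper name, and the `ages` list is made-up data rather than the Billionaires 92 file. The last lines repeat the handout's own arithmetic for the one-standard-deviation interval.

```python
from statistics import mean, stdev

# Hypothetical ages for illustration only; NOT the Billionaires 92 data.
ages = [42, 50, 55, 58, 60, 62, 64, 64, 64, 66, 68, 70, 73, 77, 85]


def fraction_within(values, k):
    """Fraction of values within k sample standard deviations of the mean."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation, like s(age) in step 9
    low, high = x_bar - k * s, x_bar + k * s
    inside = sum(1 for v in values if low <= v <= high)
    return inside / len(values)


for k in (1, 2, 3):
    print(f"within {k} S.D.: {fraction_within(ages, k):.2%}")

# The handout's count for the real data: 159 of 225 cases within one S.D.
share = 159 / 225
print(f"159 / 225 = {share:.2%}")
```

Comparing each printed percentage with 68%, 95%, and 99.7% shows how far a sample departs from the normal pattern.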
IV. The Normal Curve: Plotting a normal curve on top of a histogram
12. With a histogram showing, choose Scale->Density from the Graph menu. This rescales the histogram so that its
total area is 1, so a normal curve will match it. Choose Plot Function from the Graph menu.
13. In the formula editor, type: normalDensity(x, mean(age), s(age)). Click OK.
14. Remember to save your work before attempting to print it.
Select the text tool: What bias may result from rounding off the mean and standard deviation for the
calculations in step 11? Reflect on how this data could be useful: What is the data telling us?
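The density scaling and curve from steps 12 and 13 can be sketched in Python. This is a sketch under assumptions, not Fathom's implementation: `normal_density` is a stand-in written from the standard normal density formula, `density_histogram` is a hypothetical helper, and the `ages` list is made-up illustration data.

```python
from collections import Counter
from math import exp, pi, sqrt
from statistics import mean, stdev

# Hypothetical ages for illustration only; NOT the Billionaires 92 data.
ages = [42, 50, 55, 58, 60, 62, 64, 64, 64, 66, 68, 70, 73, 77, 85]


def normal_density(x, mu, sigma):
    """Height of the normal curve with mean mu and S.D. sigma at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def density_histogram(values, bin_width):
    """Bar heights scaled so the bars' total area equals 1."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    n = len(values)
    return {left: c / (n * bin_width) for left, c in sorted(counts.items())}


bin_width = 5
bars = density_histogram(ages, bin_width)
mu, sigma = mean(ages), stdev(ages)

# Compare each bar height with the curve at the middle of the bin.
for left, height in bars.items():
    middle = left + bin_width / 2
    print(f"[{left}, {left + bin_width}): bar = {height:.4f}, "
          f"curve = {normal_density(middle, mu, sigma):.4f}")

total_area = sum(height * bin_width for height in bars.values())
print(f"total bar area = {total_area:.2f}")
```

Dividing each count by n times the bin width is what puts the bars on the same vertical scale as the curve, which is the purpose of the Density scale in step 12.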
V. Further: Using Sliders to Analyse Data
15. Create another graph on the desktop and put age on the x-axis and wealth on the y-axis.
16. Bring down 4 sliders from the toolbar (two icons left of the A). Label them a, b, k, and d.
17. Right click on the graph. Select Plot Function. Click on the + sign beside Functions. Click on the + sign beside
Distributions. Click on the + sign beside Normal. Double click on normalDensity (the description of this tool is at
the bottom of the Expression for function screen).
18. Type in the letters a, k, b, and d as shown on the screen capture below. Select OK.
19. Using the sliders, try to find a curve of best fit. Drag each slider along its scale to change the values of
a, b, k, and d. Remember to save your work before attempting to print it.
Select the text tool: Reflect on how this data could be useful: What is the data telling us?
Create a question that could be investigated by analysing the data.
Note 1: The current list for the wealthiest 100 folks: http://www.forbes.com/wealth/billionaires/list
Select Continue to site.