MATH 140 Lab 3: Normality, z-scores and Correlation.
Problem 1. SAT scores again
Obtain the SAT data file from the course web page. This data file is available at:
http://www.csub.edu/~sbehseta/lab140fall04.htm You need to double-click on the link: GA.sav.
(a) Create a histogram for "Verbal" scores. To generate histograms, in your SPSS Graphs
menu, select Histogram. Next, select "Verbal" and place it in the Variable: box. Hit
"OK". Interpret the resulting histogram.
(b) Repeat the same procedure, this time for "Math" scores.
(c) Repeat the same procedure for "GPA".
(d) Obtain boxplots for the three variables.
(e) Calculate the five-number summary (min, max, median, Q1 and Q3) for the three variables.
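If you want to cross-check the five-number summary outside SPSS, the following is a minimal Python sketch. It is not part of the lab: the function name and the example values are hypothetical, and the example values are not taken from GA.sav. Quartile conventions differ between programs, so Q1 and Q3 may not match the SPSS output exactly.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # It uses the "exclusive" method by default; SPSS may use another rule.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical values for illustration only (NOT the SAT data).
example_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(example_values)
print(summary)
```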
(f) It seems that the distribution of the Math scores is the closest to normal. Let’s assume
that the Math scores follow a normal distribution whose mean and standard deviation are
the sample mean and standard deviation of the second column. Given this assumption, and relying on
the 68-95-99.7 rule, answer the following questions:
(1) Approximately, what proportion of Math scores is below 588?
(2) Approximately, what proportion of Math scores is above 720?
(3) Approximately, what proportion of Math scores is between 588 and 720?
(4) Approximately, what proportion of Math scores is below 522?
(5) Approximately, what proportion of Math scores is above 786?
(6) Approximately, what proportion of Math scores is between 522 and 786?
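The reasoning behind the 68-95-99.7 rule can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the mean of 500 and standard deviation of 100 are placeholder values, not the Math statistics from GA.sav. The idea is that a cut-off lying exactly k standard deviations from the mean splits the curve into a central area (0.68, 0.95 or 0.997) and two equal tails.

```python
# Central areas given by the 68-95-99.7 rule, keyed by k standard deviations.
EMPIRICAL_WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}


def empirical_rule(mean, sd, k):
    """Approximate areas for the interval mean - k*sd to mean + k*sd."""
    within = EMPIRICAL_WITHIN[k]
    tail = (1 - within) / 2  # the normal curve is symmetric, so tails are equal
    return {
        "lower": mean - k * sd,
        "upper": mean + k * sd,
        "below_lower": tail,
        "between": within,
        "above_upper": tail,
    }


# Placeholder mean and sd (assumptions for illustration only).
result = empirical_rule(mean=500, sd=100, k=1)
print(result)
```

To use it for the questions above, first work out how many standard deviations each cut-off is from the mean you obtained in SPSS.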
(g) For the following parts, you need to calculate the appropriate z-values and look them up in Table A:
(1) Approximately, what proportion of Math scores is below 567?
(2) Approximately, what proportion of Math scores is above 750?
(3) Approximately, what proportion of Math scores is between 567 and 750?
(4) Approximately, what proportion of Math scores is below 540?
(5) Approximately, what proportion of Math scores is above 766?
(6) Approximately, what proportion of Math scores is between 500 and 700?
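When a cut-off is not a whole number of standard deviations from the mean, the usual route is to standardize it, z = (x − mean) / sd, and then read the area below z from Table A. A minimal Python sketch of that two-step calculation follows. The helper names are hypothetical, and the mean of 500 and standard deviation of 100 are placeholders rather than the values from GA.sav. `statistics.NormalDist` from the Python standard library plays the role of Table A here.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd


def proportion_below(x, mean, sd):
    # Area to the left of z, i.e. the number Table A reports.
    return STANDARD_NORMAL.cdf(z_score(x, mean, sd))


def proportion_above(x, mean, sd):
    # The total area is 1, so the right tail is the complement.
    return 1 - proportion_below(x, mean, sd)


def proportion_between(low, high, mean, sd):
    # Area below the upper cut-off minus area below the lower cut-off.
    return proportion_below(high, mean, sd) - proportion_below(low, mean, sd)


# Placeholder mean and sd (assumptions for illustration only).
z = z_score(650, mean=500, sd=100)
below = proportion_below(650, mean=500, sd=100)
print(z, round(below, 4))
```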
(h) Now let’s try to calculate the areas under the z-curve using SPSS. First, open a new SPSS data editor. To
do this, go to: File → New → Data. In the first row of the first column of
the data editor type: 1. This is just a formality so that the software recognizes our new data
environment.
Go to: Transform → Compute. In the Compute Variable box, scroll down the
Functions: list and choose CDF.NORMAL(q, mean, stddev). The function should appear
in your Numeric Expression box, with three question
marks in place of its arguments. Click on the first ? and type −1.96, making sure you delete the ?. In the second slot,
type: 0. In the third slot, type: 1. Make sure both ? are deleted. Next, in the Target
Variable box, type: test. Now, click OK. You just told the software to calculate the area below
−1.96 under a standard normal curve, whose mean and standard deviation are 0 and 1 respectively.
The result, which appears in the second column, should match the associated number
in your z-table. Note that you can see the full-precision result in the box at the top of the data
editor when you select the appropriate cell. Also note that SPSS tends
to round the numbers displayed in the spreadsheet. Double-check the SPSS number against your
reading from Table A.
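As an independent check of the value SPSS returns for CDF.NORMAL(−1.96, 0, 1), the same area can be computed in Python. This is a sketch outside the lab's SPSS workflow; the variable names are hypothetical.

```python
from statistics import NormalDist

# Area below -1.96 under a normal curve with mean 0 and standard deviation 1.
area = NormalDist(mu=0, sigma=1).cdf(-1.96)

# Print the full value and a 4-decimal rounding, the precision Table A uses.
print(area)
print(round(area, 4))
```

Comparing the full value with the rounded one shows why the number displayed in a spreadsheet cell can differ slightly from the stored result.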
(i) Now that you know how to calculate the areas under the z-curve, answer the following
questions using SPSS:
What proportion of a standard normal curve (z-curve) is found in each of the following
regions?
(1) z < −1.645
(2) z < −2
(3) z < −1.96
(4) z > −2.05
(5) z < 2.25
(6) −1 < z < 1.15
(7) |z| > 0.5
(8) |z| < 1.28
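Every region in the list above can be written in terms of the "area below" value that CDF.NORMAL returns. The Python sketch below spells out those identities. The function names are hypothetical, and the example uses z = 1.96, a value that is not one of the questions.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_below(a):
    return Z.cdf(a)


def area_above(a):
    # Complement of the area below a.
    return 1 - Z.cdf(a)


def area_between(a, b):
    # Area below b minus area below a (requires a < b).
    return Z.cdf(b) - Z.cdf(a)


def area_abs_greater(c):
    # |z| > c is two equal tails: z < -c and z > c.
    return 2 * Z.cdf(-abs(c))


def area_abs_less(c):
    # |z| < c is everything except the two tails.
    return 1 - area_abs_greater(c)


print(round(area_abs_less(1.96), 4), round(area_abs_greater(1.96), 4))
```

In SPSS the same identities apply: for example, the area above a value is 1 minus the CDF.NORMAL result for that value.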
(j) Now, answer the following questions regarding the Math scores using only SPSS:
(1) Approximately, what proportion of Math scores is below 500?
(2) Approximately, what proportion of Math scores is above 700?
(3) Approximately, what proportion of Math scores is between 540 and 740?
(4) Approximately, what proportion of Math scores is below 600?
(5) Approximately, what proportion of Math scores is above 654?
(6) Approximately, what proportion of Math scores is between 654 and 720?
We will use the SAT file again. Open GA.sav. We have already opened this data file, so
it is available but hidden somewhere! A simple way of bringing it back is to go to: File →
Recently Used Data → GA.sav. The other option, of course, is to go to the course webpage
and double-click on it.
(k) Next, we want to create scatterplots for the pairs of the three variables. To do this,
go to: Graphs → Scatter ... → Simple → Define. Choose Math for the Y Axis: and
Verbal for the X Axis:. Hit OK. You should have the scatterplot for Verbal versus Math.
(l) Repeat the procedure above for the pairs Math versus gpa and Verbal versus gpa. Make
sure that gpa is on the Y Axis.
(m) Based on the three scatterplots, comment on the linearity, the direction, and the strength
of the three relationships.
(n) Let’s measure the correlation between Verbal and gpa. A direct way of doing this is to go
to: Analyze → Correlate → Bivariate. Make sure that Pearson is checked in the
Bivariate Correlations box. Then move gpa and Verbal to the Variables: box. Hit OK.
The correlation coefficient is 0.485. Find it!
(o) Repeat the same procedure, this time to obtain the correlation coefficients between Math and
gpa, and also between Math and Verbal.
(p) Remember that Pearson’s correlation coefficient between any two variables can be obtained
using the formula:
$$ r = \frac{\sum Z_x Z_y}{n - 1}. $$
In that formula, $Z_x$ stands for the z-scores of the first variable, and $Z_y$ for the z-scores of
the second variable. We can calculate the formula manually. Remember that you created
z-scores in the previous lab. So, you should be able to handle this part on your own!
(q) Create z-scores for the Math variable and place them in the fourth column of the spreadsheet.
(r) Create z-scores for the gpa variable and place them in the fifth column of the spreadsheet.
(s) Multiply the fourth column by the fifth column and place the result in the sixth column. To
do this you should use the Transform → Compute option.
(t) Obtain the sum of the sixth column using Analyze → Descriptive Statistics → Frequencies, then click the Statistics ... button and choose Sum.
(u) Divide the sum you obtained in the previous step by 99 (why?). This should give you the
correlation coefficient of interest.
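Steps (q) through (u) can be mirrored in a short Python sketch, shown here only to make the arithmetic concrete. The function names and the five data pairs are hypothetical and are not taken from GA.sav. The sketch assumes the z-scores are built with the sample standard deviation (the one with n − 1 in its denominator).

```python
import statistics


def z_scores(values):
    """Standardize each value using the sample mean and sample sd."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in values]


def pearson_r_from_z(xs, ys):
    zx = z_scores(xs)                           # like step (q)
    zy = z_scores(ys)                           # like step (r)
    products = [a * b for a, b in zip(zx, zy)]  # like step (s)
    total = sum(products)                       # like step (t)
    return total / (len(xs) - 1)                # like step (u): divide by n - 1


# Hypothetical paired values for illustration only.
x_values = [520, 580, 610, 650, 700]
y_values = [2.4, 3.1, 2.9, 3.4, 3.8]
r = pearson_r_from_z(x_values, y_values)
print(round(r, 3))
```

A quick sanity check is that two perfectly linearly related lists, such as [1, 2, 3] and [2, 4, 6], give r = 1.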
(v) A strong linear relationship between two continuous variables is one for which the
correlation coefficient (r) is close to either −1 or 1. A weaker linear relationship has r closer to
0. Based on these explanations, comment on the strength of the linear relationship for each of the three pairs
of variables.
Problem 2. Egyptian Age-Height Example
(a) Obtain the Kalama data from the webpage (http://www.csub.edu/~sbehseta/lab140fall04.htm)
by clicking on Kalama.sav.
(b) Draw a scatterplot for the age and height variables.
(c) Calculate the correlation coefficient, r, between the two variables.
(d) Comment on the linearity, strength, and direction of the relationship using both the graph and the
value of r.
Problem 3. CEO Problem Again!
(a) Obtain the CEO data from the webpage (http://www.csub.edu/~sbehseta/lab140fall04.htm)
by clicking on CEO.sav.
(b) Draw a scatterplot for the age and salary variables.
(c) Calculate the correlation coefficient, r, between the two variables.
(d) Comment on the linearity, strength, and direction of the relationship using both the graph and the
value of r.