SOLUTIONS TO THE LAB 1 ASSIGNMENT
Question 1
Excel produces the following histogram of pull strengths for the 100 resistors:
[Figure: Histogram of Pull Strengths (lb). Horizontal axis: pull strength, 59 to 75 lb; vertical axis: frequency, 0 to 25.]
(a)
The histogram is one-peaked, bell-shaped, and approximately symmetric. Given the relatively
small spread, there is one observation (between 74 and 75) lying far above the main body of the
data. This observation may be considered an outlier. We will verify in Question 2 that indeed, the
single observation is an outlier in a formal sense. The tails of the distribution are relatively short.
(b)
The center of the distribution is at approximately 65 pounds. As the distribution is approximately
symmetric, we expect the mean and the median to be very similar, and both close to 65.
(c)
If all 100 PST values were overestimated by approximately the same small positive amount because of a
poorly calibrated measuring device, the histogram of the accurate values would have approximately the
same shape as the histogram of the overestimated values. However, its center (peak) would
be shifted to the left by the difference between the overestimated values and the accurate values.
The mean and the median would also be shifted to the left by that difference, but the standard deviation
and the interquartile range would not be affected (they would be the same as the values obtained for the
overestimated PST values).
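The effect of a constant calibration error can be checked numerically. The sketch below is only an illustration: the readings and the 0.5 lb offset are hypothetical values, not the lab data, and the quartiles use Python's inclusive interpolation rule, which is an assumption about how quartiles are computed.

```python
import statistics

# Hypothetical readings (lb) and calibration offset; these are NOT the lab data.
measured = [62.1, 63.4, 64.3, 64.9, 65.2, 66.0, 67.8, 70.3]
offset = 0.5  # assumed constant overestimate, in lb

# Accurate values are the measured values shifted left by the offset.
accurate = [x - offset for x in measured]


def iqr(values):
    """Interquartile range using inclusive linear interpolation."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Measures of center move by exactly the offset.
shift_mean = statistics.mean(measured) - statistics.mean(accurate)
shift_median = statistics.median(measured) - statistics.median(accurate)

# Measures of spread do not change (up to floating-point rounding).
sd_change = statistics.stdev(measured) - statistics.stdev(accurate)
iqr_change = iqr(measured) - iqr(accurate)

print(round(shift_mean, 6), round(shift_median, 6))
print(round(sd_change, 6), round(iqr_change, 6))
```

The first printed line shows the shift in the mean and the median; the second shows that the standard deviation and the interquartile range are unchanged.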
Question 2
(a)
The summary statistics for the pull strengths obtained with the Descriptive Statistics tool are
displayed below:
Summary Statistics
Mean                  64.859
Standard Error        0.29214323
Median                64.45
Mode                  64.3
Standard Deviation    2.921432297
Sample Variance       8.534766667
Kurtosis              0.566577167
Skewness              0.282186648
Range                 16.3
Minimum               58.2
Maximum               74.5
Sum                   6485.9
Count                 100
(b)
The Paste Function feature applied to our data returns the following values of the first quartile, the
third quartile, and the interquartile range:
First Quartile Q1     = 63.175
Third Quartile Q3     = 66.800
Interquartile range   = 3.625
(c)
As the distribution of pull strengths is approximately symmetric, the mean and standard deviation
are appropriate measures of center and variation. The median and the interquartile range are used
for skewed distributions.
Question 3
According to the 1.5*IQR criterion, an outlier is any data point that lies below Q1-1.5*IQR or above
Q3+1.5*IQR. Using the lower and upper quartiles and the interquartile range obtained in Question 2,
an outlier is any value below 57.7375 or above 72.2375. Only one observation
satisfies the condition: the value 74.5, the largest observation in the data set.
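The two cutoffs follow directly from the quartiles reported in Question 2. A minimal sketch of the arithmetic, using only values quoted in this document (the helper name `is_outlier` is introduced here for illustration):

```python
# Quartiles reported in Question 2(b).
q1 = 63.175
q3 = 66.800
iqr = q3 - q1                   # 3.625, up to floating-point rounding

# 1.5*IQR fences.
lower_fence = q1 - 1.5 * iqr    # about 57.7375
upper_fence = q3 + 1.5 * iqr    # about 72.2375

# Extremes from the Descriptive Statistics output in Question 2(a).
minimum, maximum = 58.2, 74.5


def is_outlier(x):
    """Return True if x falls outside the 1.5*IQR fences."""
    return x < lower_fence or x > upper_fence


print(round(lower_fence, 4), round(upper_fence, 4))
print(is_outlier(minimum), is_outlier(maximum))
```

Since the minimum lies inside the fences, there are no low outliers; the maximum lies above the upper fence.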
The outlier 74.5 lies far above the main body of the data. Thus we expect that the mean and the standard
deviation of the remaining 99 observations would decrease. We do not expect a significant change in the
value of the median.
The summary statistics for the data without the outlier are displayed below:
Summary Statistics (Outlier Removed)
Mean                  64.76161616
Standard Error        0.278230661
Median                64.4
Mode                  64.3
Standard Deviation    2.768360123
Sample Variance       7.66381777
Kurtosis              -0.109386988
Skewness              0.002956345
Range                 13.4
Minimum               58.2
Maximum               71.6
Sum                   6411.4
Count                 99
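The table confirms these expectations: the mean drops from 64.859 to 64.762 and the standard deviation
from 2.921 to 2.768, while the median changes only from 64.45 to 64.4.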
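The new mean and standard deviation can also be recovered from the Question 2 summaries alone, without the raw data. The sketch below assumes the identity SS = (n-1)*s^2 + n*mean^2 for the sum of squared observations; the variable names are chosen here for illustration.

```python
import math

# Values from the Question 2(a) table (all 100 observations).
n_all = 100
sum_all = 6485.9
var_all = 8.534766667   # sample variance
outlier = 74.5

# Removing one observation changes the count and the sum.
n_new = n_all - 1
sum_new = sum_all - outlier
mean_new = sum_new / n_new

# Recover the sum of squares from the variance, then drop the outlier's square.
mean_all = sum_all / n_all
ss_all = (n_all - 1) * var_all + n_all * mean_all ** 2
ss_new = ss_all - outlier ** 2

# Sample variance and standard deviation of the remaining 99 observations.
var_new = (ss_new - n_new * mean_new ** 2) / (n_new - 1)
sd_new = math.sqrt(var_new)

print(round(mean_new, 4), round(var_new, 4), round(sd_new, 4))
```

The results agree with the Mean, Sample Variance, and Standard Deviation entries of the table with the outlier removed, to the precision of the inputs.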
Question 4
To convert all 100 PST measurements to kilograms, it is necessary to multiply each value in the
column PST by 0.454. As a consequence, the new mean and the new median can also be obtained by
multiplying the mean and the median of the measurements expressed in pounds by 0.454.
Moreover, because the standard deviation is computed from deviations about the mean, which are all scaled
by the same factor, the new standard deviation is the standard deviation of the original data multiplied by
0.454. Likewise, the interquartile range for the data in kilograms is equal to the interquartile range on the
original scale of measurement multiplied by 0.454.
The histogram for the data expressed in kilograms will have the same shape as the histogram obtained in
Question 1. The peak of the new histogram will be at approximately 65*0.454 = 29.51 kg.
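As a quick illustration of the rescaling, the sketch below converts the Question 2 summaries to kilograms and checks the scaling rule for the standard deviation on a small sample. The five-value sample is hypothetical and is not taken from the lab data.

```python
import statistics

LB_TO_KG = 0.454  # conversion factor used in the text

# Summaries in pounds, as reported in Question 2.
summary_lb = {"mean": 64.859, "median": 64.45, "sd": 2.921432297, "iqr": 3.625}

# Each of these summaries scales by the same factor as the data.
summary_kg = {name: value * LB_TO_KG for name, value in summary_lb.items()}

# Check the rule on a small hypothetical sample (NOT the lab data).
sample_lb = [61.0, 63.5, 64.3, 65.1, 68.2]
sample_kg = [x * LB_TO_KG for x in sample_lb]
sd_ratio = statistics.stdev(sample_kg) / statistics.stdev(sample_lb)

for name, value in summary_kg.items():
    print(name, round(value, 4))
print("sd ratio:", round(sd_ratio, 6))
```

Note that the sample variance, being in squared units, would scale by 0.454 squared rather than by 0.454.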
Question 5
To decide whether the new ozone-friendly cleaning process produces similarly strong or
stronger solder joints on average, we look at the summary statistics for the distribution. The mean of the
pull strengths, with the outlier excluded, is 64.761616 lb, almost identical to the mean of pull strengths for the old
technology (64.8 lb). The small difference can be attributed to sampling variability. Thus the new technology produces
solder joints of similar strength on average.
Now we compare the variability of the two processes. The standard deviation for the old technology is 2.25 lb.
This value is smaller than the value of 2.7684 lb obtained in Question 3 (after excluding the outlier). Given the
large sample size that the new standard deviation is based on (99), the data suggest that the new process
results in slightly higher variability than the old process. More advanced statistical methods are required to
determine whether the difference is statistically significant. The new process can be examined thoroughly to
determine whether some sources of extra variation can be eliminated.
Question 6
The histogram of electrical resistance for the 100 boards is displayed below:
[Figure: Histogram of Electrical Resistances. Horizontal axis: bins labeled 0.2 to 3 in steps of 0.4, plus "More"; vertical axis: frequency, 0 to 25.]
(a)
The histogram is one-peaked, and skewed to the right. Most of the observations lie between 0 and
1, but there are several observations outside the range. The right tail is longer than the left tail of
the distribution. There is one outlier.
(b)
As the distribution is skewed, median and interquartile range are appropriate measures of center
and spread, respectively.
Question 7
The scatterplot of electrical resistance (RES) versus pull strength (PST) displays the relationship between
the two variables. It allows you to assess the type of relationship (linear, nonlinear), direction (positive,
negative), and its strength.
(a)
The scatterplot for the data is displayed below:
[Figure: Scatterplot of RES vs. PST. Horizontal axis: Pull Strength (in pounds), 55 to 75; vertical axis: Electrical Resistance (in teraohms), 0.00 to 3.50.]
(b)
There is no clear pattern in the plot. The points appear to be randomly scattered.
However, it is worth noting that the variation in pull strength for low electrical resistance values
differs substantially from the variation for high electrical resistance values. There
are no obvious outliers in the plot.
LAB 1 ASSIGNMENT MARKING SCHEMA
Proper Header and appearance: 10 points
1.
Correctly formatted histogram: 6 points.
(a) Analysis of the shape of the histogram: 3 points
(b) Center (estimates of the mean and the median): 2 points
(c) Histogram of accurate measurements: 2 points
Mean, Median, standard deviation and IQR of accurate values: 2 points
2.
Summary Statistics:
(a) Descriptive Statistics output (mean, median, standard deviation, IQR): 4 points
(b) First Quartile, Third Quartile, IQR: 3 points
(c) Discussion of appropriateness: 2 points
3.
Determining the lower and upper range for outliers: 2 points
Identifying the outlier: 2 points
Effect of removing the outlier on some summary statistics: 3 points
4.
Effect of expressing the PST values in kilograms on summaries: 2 points
Effect of expressing the PST values in kilograms on histogram: 2 points
5.
Comparing the average strength of resistors: 2 points
Comparing the variability of the two processes: 2 points
6.
Correctly formatted histogram: 6 points.
(a) Analysis of the shape of the histogram: 3 points
(b) Numerical measures to describe typical resistance and the spread: 2 points
7.
Relationship between pull strengths and resistance
(a) Correctly formatted scatterplot: 6 points
(b) Discussion of the pattern in the scatterplot: 3 points
Outliers: 1 point
TOTAL = 70