2.1
A.
Category   Count    Percentage
A          13/50    26%
B          28/50    56%
C          9/50     18%
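Each percentage in part A is the category count divided by the 50 responses. A minimal Python check, with the counts copied from the table above:

```python
# Category counts from the 2.1 table (50 responses in total)
counts = {"A": 13, "B": 28, "C": 9}
total = sum(counts.values())

# Percentage = count / total * 100 (multiply first so the results are exact)
percentages = {cat: count * 100 / total for cat, count in counts.items()}

for cat, pct in percentages.items():
    print(f"{cat}: {counts[cat]}/{total} = {pct:.0f}%")
```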
B.
C.
D.
2.15
A
91 94 97 100 102 102 103 108 111 112 115 115 116 116 117 117 117 122 122 123 124 128 129 130 132
B
The stem-and-leaf display gives more information because its layout makes it easier to see where the bulk of the values lie and which values occur most often. It is a good choice here only because the values are concentrated; the more dispersed the values, the less effective a stem-and-leaf display becomes.
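A stem-and-leaf display can be generated directly from the ordered array. The sketch below assumes non-negative whole numbers, uses the tens as the stem and the units digit as the leaf, and keeps empty stems so gaps stay visible; `stem_and_leaf` is an illustrative helper, not a library function.

```python
def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10).

    Empty stems between the smallest and largest stem are kept so that
    gaps in the data remain visible. Assumes non-negative whole numbers.
    """
    leaves = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves.setdefault(stem, []).append(leaf)
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}


gallons = [91, 94, 97, 100, 102, 102, 103, 108, 111, 112, 115, 115, 116,
           116, 117, 117, 117, 122, 122, 123, 124, 128, 129, 130, 132]

display = stem_and_leaf(gallons)
for stem, lv in display.items():
    print(f"{stem:>3} | {' '.join(map(str, lv))}")
# Prints:
#   9 | 1 4 7
#  10 | 0 2 2 3 8
#  11 | 1 2 5 5 6 6 7 7 7
#  12 | 2 2 3 4 8 9
#  13 | 0 2
```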
C
117 gallons is the amount most likely to be purchased, since it is the mode: 117 occurs three times, while no other value occurs more than twice (102, 115, 116, and 122 each occur twice).
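The mode can be confirmed by counting occurrences. A short sketch using `collections.Counter` on the same 25 values:

```python
from collections import Counter

gallons = [91, 94, 97, 100, 102, 102, 103, 108, 111, 112, 115, 115, 116,
           116, 117, 117, 117, 122, 122, 123, 124, 128, 129, 130, 132]

freq = Counter(gallons)
top = max(freq.values())                                  # highest count
modes = [v for v, c in freq.items() if c == top]          # values with that count
runners_up = sorted(v for v, c in freq.items() if c == 2) # values seen twice

print(modes, top)     # [117] 3
print(runners_up)     # [102, 115, 116, 122]
```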
D
Yes. The values are concentrated in the middle-to-upper part of the 110s (stem 11). Note, however, that there are no values of 118, 119, 120, or 121 before another cluster of values in the range 122-124.
2.19
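A
Ordered array: 35 85 110 120 170 180 240 260 300 380 380 460
B
Stem-and-leaf display (stem = tens, leaf = units; 3 | 5 represents 35):
3 | 5
4 |
5 |
6 |
7 |
8 | 5
9 |
10 |
11 | 0
12 | 0
13 |
14 |
15 |
16 |
17 | 0
18 | 0
19 |
20 |
21 |
22 |
23 |
24 | 0
25 |
26 | 0
27 |
28 |
29 |
30 | 0
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 | 0 0
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 | 0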
C
In this situation the ordered array is more useful to the reader, because the values are not concentrated. Neither display is a good fit for these data, though. A box-and-whisker plot, or another display that better represents sparse data and extreme values, would be more appropriate.
D
According to the data, 380 is the only battery life that occurs twice; every other value occurs once. This is misleading, though, because the mode sits near the top of the range. If we split the stems into two equal halves, 3-24 and 25-46 (22 stems each, beginning and ending with the extreme values), 7 of the 12 values fall in the lower half and only 5 in the upper half.
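The half-and-half comparison can be reproduced by splitting the stems 3 through 46 into two equal groups and counting the values in each. A minimal sketch, assuming the stem is the value divided by 10 as in the display above:

```python
battery_life = [35, 85, 110, 120, 170, 180, 240, 260, 300, 380, 380, 460]

stems = [v // 10 for v in battery_life]     # stem = value // 10
lo, hi = min(stems), max(stems)             # 3 and 46
n_stems = hi - lo + 1                       # 44 stems in the display
cut = lo + n_stems // 2                     # first stem of the upper half (25)

lower = [v for v, s in zip(battery_life, stems) if s < cut]
upper = [v for v, s in zip(battery_life, stems) if s >= cut]
print(len(lower), len(upper))               # 7 5
```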
2.25
A.
Class width = (1870 - 1522)/6 = 58
Insulator Force Frequency Distribution
Force Applied (lbs)        Frequency   Percentage
1580 but less than 1638    5           16.67%
1638 but less than 1696    6           20.00%
1696 but less than 1754    7           23.33%
1754 but less than 1812    8           26.67%
1812 but less than 1870    4           13.33%
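The arithmetic behind the table can be checked in a few lines: the class width from the part A formula, and each class's share of the 30 insulators. The raw breaking-force values are not in this transcript, so this sketch starts from the tabulated frequencies rather than re-binning the data.

```python
# Class width from part A: (largest - smallest) / number of classes
width = (1870 - 1522) / 6                     # 58.0

# Lower class boundaries and frequencies as they appear in the table
lower_bounds = [1580 + i * 58 for i in range(5)]
freqs = [5, 6, 7, 8, 4]
total = sum(freqs)                            # 30 insulators

for lo, f in zip(lower_bounds, freqs):
    print(f"{lo} but less than {lo + 58}: {f:>2}  {f / total:.2%}")
```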
B.
D. The insulators are of sufficient strength, since not a single value fell below the required breaking point. All but three of the insulators had breaking points above 1,600 lbs.
3.1
n = 5, sample data set {7,4,9,8,2}
A. Mean
Sample mean is the sum of the values divided by number of values (PG 97)
(7+4+9+8+2)/5= 6
Median
The median is the middle value in the ranked data set. With an odd number of values, the median is the middle value; with an even number of values, it is the average of the two middle values.
Median = (n+1)/2 ranked value (PG 99)
Ranked: 2, 4, 7, 8, 9. (5+1)/2 = 3, so the median is the third ranked value, 7.
Mode
The mode is the value that appears most often in a data set; often there is no mode (PG 100).
{7,4,9,8,2}: every value appears exactly once, so there is no mode for this data set.
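Python's standard `statistics` module can confirm the mean and median. It has no built-in "no mode" answer for data where every value is unique, so the sketch below checks the highest count itself and treats a top count of 1 as "no mode".

```python
import statistics
from collections import Counter

data = [7, 4, 9, 8, 2]

mean = statistics.mean(data)       # (7+4+9+8+2)/5
median = statistics.median(data)   # middle of the ranked values 2, 4, 7, 8, 9

# A mode exists only if some value occurs more than once
counts = Counter(data)
top = max(counts.values())
mode = [v for v, c in counts.items() if c == top] if top > 1 else None

print(mean, median, mode)          # 6 7 None
```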
B. Range
The range is equal to the largest value minus the smallest value (PG105)
9-2 =7
First Quartile (PG 101)
Q1 = (n+1)/4 ranked value
Third Quartile
Q3 = 3(n+1)/4 ranked value
Rules for Quartiles (PG 102)
Rule 1: If the result is a whole number, the quartile equals that ranked value.
Rule 2: If the result is a fractional half (e.g., 1.5 or 4.5), the quartile is the average of the two adjacent ranked values.
Rule 3: If the result is any other fraction, round to the nearest ranked value.
Interquartile Range
Also called the midspread, it is the difference between the third and first quartiles (PG 106).
Ranked data: 2, 4, 7, 8, 9
Q1 = (5+1)/4 = 1.5 ranked value: average of 2 and 4 = 3
Q3 = 3(5+1)/4 = 4.5 ranked value: average of 8 and 9 = 8.5
Interquartile range for this data set = 8.5 - 3 = 5.5
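The three quartile rules can be written as a small function. This is a sketch assuming 1-based ranked positions as in the textbook; `ranked_value` and `quartiles` are illustrative names, not library functions. Python's built-in `statistics.quantiles` interpolates between ranked values instead of rounding, so it can disagree with Rule 3.

```python
import math

def ranked_value(ordered, position):
    """Value at a 1-based ranked position, following the quartile rules.

    Rule 1: whole number    -> that ranked value
    Rule 2: fractional half -> average of the two adjacent ranked values
    Rule 3: other fraction  -> round to the nearest ranked value
    """
    whole = math.floor(position)
    frac = position - whole
    if frac == 0:
        return ordered[whole - 1]
    if frac == 0.5:
        return (ordered[whole - 1] + ordered[whole]) / 2
    nearest = whole + 1 if frac > 0.5 else whole
    return ordered[nearest - 1]

def quartiles(data):
    ordered = sorted(data)
    n = len(ordered)
    return ranked_value(ordered, (n + 1) / 4), ranked_value(ordered, 3 * (n + 1) / 4)

q1, q3 = quartiles([7, 4, 9, 8, 2])
print(q1, q3, q3 - q1)   # 3.0 8.5 5.5
```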
Sample Variance
The sum of the squared differences around the mean, divided by the sample size minus 1 (PG 107).
Data set {7,4,9,8,2}
In Excel, the VAR function returns 8.5.
By hand (mean = 6, sample size n = 5):
$$S^2 = \frac{(7-6)^2 + (4-6)^2 + (9-6)^2 + (8-6)^2 + (2-6)^2}{5-1} = \frac{(1)^2 + (-2)^2 + (3)^2 + (2)^2 + (-4)^2}{4} = \frac{1+4+9+4+16}{4} = \frac{34}{4} = 8.5$$
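The by-hand steps map directly onto a few lines of Python: sum the squared deviations from the mean, then divide by n - 1. The sketch compares that with `statistics.variance`, which also computes the sample (n - 1) variance.

```python
import statistics

data = [7, 4, 9, 8, 2]
n = len(data)
mean = sum(data) / n                          # 6.0

# Sum of squared deviations around the mean: 1 + 4 + 9 + 4 + 16
ss = sum((x - mean) ** 2 for x in data)       # 34.0
by_hand = ss / (n - 1)                        # sample variance divides by n - 1

print(ss, by_hand, statistics.variance(data))  # 34.0 8.5 8.5
```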
Standard deviation
The sample standard deviation is the square root of the sum of the squared differences around
the mean divided by the sample size minus one. (PG 107)
NOTE: To calculate it, take the square root of the variance.
(8.5)^(1/2) = 2.92
In Excel, use the STDEV function.
Coefficient of Variation
The coefficient of variation is equal to the standard deviation divided by the mean, multiplied by
100% (PG 110)
STD = 2.92
CV = 2.92/6*100% = 48.67%
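A quick check of the standard deviation and coefficient of variation with `statistics.stdev`. Carrying the unrounded standard deviation (about 2.9155) gives 48.59% rather than 48.67%; the difference comes only from rounding the standard deviation to 2.92 before dividing.

```python
import statistics

data = [7, 4, 9, 8, 2]
mean = statistics.mean(data)
sd = statistics.stdev(data)          # square root of the sample variance 8.5
cv = sd / mean * 100                 # coefficient of variation, in percent

print(round(sd, 2), round(cv, 2))    # 2.92 48.59
```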
C. Z Scores
An extreme value (outlier) is a value located far from the mean. A value is considered an outlier if its Z score is less than -3.0 or greater than +3.0.
Mean = 6, standard deviation = 2.915476
Value   Formula            Z Score     Outlier?
7       (7-6)/2.915476     0.342997    N
4       (4-6)/2.915476     -0.68599    N
9       (9-6)/2.915476     1.028991    N
8       (8-6)/2.915476     0.685994    N
2       (2-6)/2.915476     -1.37199    N
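The Z-score table can be produced by dividing each deviation by the same sample standard deviation and flagging any value with |Z| greater than 3. A minimal sketch:

```python
import statistics

data = [7, 4, 9, 8, 2]
mean = statistics.mean(data)
sd = statistics.stdev(data)               # sample standard deviation, about 2.915476

z_scores = [(x - mean) / sd for x in data]
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3.0]  # extreme-value rule

for x, z in zip(data, z_scores):
    print(f"{x}: Z = {z:.6f}, outlier = {'Y' if abs(z) > 3.0 else 'N'}")
```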
D. Shape of data set (PG 112, 113)
Symmetric: mean = median (e.g., a bell-shaped curve)
Negative or left-skewed: mean < median
Positive or right-skewed: mean > median
In our data set {7,4,9,8,2}, the mean is 6 and the median is 7, so the data are negatively (left) skewed.
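The mean-versus-median rule above can be expressed as a tiny classifier. This is a sketch of that textbook rule only, not a formal skewness statistic; `shape` is an illustrative name.

```python
import statistics

def shape(data):
    """Classify skew by comparing the mean with the median (textbook rule)."""
    mean, median = statistics.mean(data), statistics.median(data)
    if mean < median:
        return "left-skewed (negative)"
    if mean > median:
        return "right-skewed (positive)"
    return "symmetric"

print(shape([7, 4, 9, 8, 2]))   # left-skewed (negative)
```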
3.3
n = 7, {12, 7, 4, 9, 0, 7, 3}
A.
Mean = 42/7 = 6
Median = (7+1)/2 = 4th ranked value of {0, 3, 4, 7, 7, 9, 12} = 7
Mode = 7
B.
Range= 12-0=12
Q1= (7+1)/4=2 ranked value, which is 3
Q3=3(7+1)/4=6 ranked value, which is 9
Inter Quartile Range= 9-3=6
Variance:
$$S^2 = \frac{(0-6)^2+(3-6)^2+(4-6)^2+(7-6)^2+(7-6)^2+(9-6)^2+(12-6)^2}{7-1} = \frac{36+9+4+1+1+9+36}{6} = \frac{96}{6} = 16$$
Standard Deviation = 16^(1/2) = 4
Coefficient of Variation = 4/6*100% = 66.67%
C.
Value   Formula      Z Score   Outlier?
0       (0-6)/4      -1.5      N
3       (3-6)/4      -0.75     N
4       (4-6)/4      -0.5      N
7       (7-6)/4      0.25      N
7       (7-6)/4      0.25      N
9       (9-6)/4      0.75      N
12      (12-6)/4     1.5       N
D.
Shape of data set
Mean = 6, Median = 7
Because the mean is less than the median, the data are negatively (left) skewed.
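All of the 3.3 results can be checked in one pass. In this sketch the quartile positions (n + 1)/4 = 2 and 3(n + 1)/4 = 6 happen to be whole numbers, so Rule 1 applies and the ranked values are read directly; that shortcut would not work for fractional positions.

```python
import statistics

data = [12, 7, 4, 9, 0, 7, 3]
ordered = sorted(data)                  # [0, 3, 4, 7, 7, 9, 12]
n = len(ordered)

mean = statistics.mean(data)            # 42 / 7 = 6
median = statistics.median(data)        # 4th ranked value: 7
mode = statistics.mode(data)            # 7 appears twice
data_range = max(data) - min(data)      # 12

# Whole-number positions 2 and 6 (Rule 1), converted to 0-based indexes
q1 = ordered[(n + 1) // 4 - 1]          # 3
q3 = ordered[3 * (n + 1) // 4 - 1]      # 9

variance = statistics.variance(data)    # 96 / 6 = 16
sd = statistics.stdev(data)             # 4.0
cv = sd / mean * 100                    # about 66.67 percent

z_scores = [(x - mean) / sd for x in ordered]
print(mean, median, mode, data_range, q1, q3, q3 - q1)
print(variance, sd, round(cv, 2))
print(z_scores)
```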
3.7
A.
Mean = (31 + 33.75 + 35.05 + 36.15 + 40.25 + 43)/6 = 219.20/6 = $36.53
Median = (6+1)/2 = 3.5 ranked value: (35.05 + 36.15)/2 = 71.20/2 = $35.60
Q1 = (6+1)/4 = 1.75 ranked value, rounded to the 2nd ranked value = $33.75 (Rule 3)
Q3 = 3(6+1)/4 = 5.25 ranked value, rounded to the 5th ranked value = $40.25 (Rule 3)
B.
Variance
$$S^2 = \frac{(31-36.53)^2+(33.75-36.53)^2+(35.05-36.53)^2+(36.15-36.53)^2+(40.25-36.53)^2+(43-36.53)^2}{6-1}$$
$$= \frac{(-5.53)^2+(-2.78)^2+(-1.48)^2+(-0.38)^2+(3.72)^2+(6.47)^2}{5} = \frac{30.5809+7.7284+2.1904+0.1444+13.8384+41.8609}{5} = \frac{96.3434}{5} \approx 19.269$$
Excel formula answer: 19.26867
STD = 19.269^0.5 = 4.390
Excel formula answer: 4.389609
Range
$43.00 - $31.00 = $12.00
Interquartile Range
$40.25 - $33.75 = $6.50
Coefficient of Variation
STD/mean*100%
4.390/36.53*100% = 12.02%
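The part B measures for 3.7 can be verified with the `statistics` module, which works from the exact mean rather than the rounded $36.53. The interquartile range reuses the rounded-rule quartiles from part A, since `statistics.quantiles` would interpolate instead.

```python
import statistics

costs = [31.00, 33.75, 35.05, 36.15, 40.25, 43.00]

mean = statistics.mean(costs)
variance = statistics.variance(costs)   # sample variance, divides by n - 1
sd = statistics.stdev(costs)
data_range = max(costs) - min(costs)
iqr = 40.25 - 33.75                     # Q3 - Q1 from part A (rounding rule)
cv = sd / mean * 100

print(f"mean={mean:.4f} var={variance:.5f} sd={sd:.6f}")
print(f"range={data_range:.2f} IQR={iqr:.2f} CV={cv:.2f}%")
```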
C. Is the data skewed? If so, how?
Median = $35.60, Mean = $36.53
Since the mean is greater than the median, the data are positively (right) skewed.
D. Conclusions about going to the movies based on the information from part (a), and then based on the information from part (b).
The information in part A (mean, median, Q1, and Q3) describes the center of the data. It indicates what I might expect to spend on average if I went to randomly chosen movie theaters over a long enough period: about $36.53, with a median of $35.60.
The information in part B (variance, standard deviation, range, interquartile range, and coefficient of variation) describes how spread out the possible costs are: they span $12.00 in total, and the middle half of the values spans $6.50.
3.13
A.
Money Market Accounts {4.55, 4.50, 4.40, 4.38, 4.38}
Mean: 22.21/5 = 4.442
Variance:
$$S^2 = \frac{(4.55-4.442)^2+(4.50-4.442)^2+(4.40-4.442)^2+(4.38-4.442)^2+(4.38-4.442)^2}{5-1} = \frac{(0.108)^2+(0.058)^2+(-0.042)^2+(-0.062)^2+(-0.062)^2}{4} = \frac{0.011664+0.003364+0.001764+0.003844+0.003844}{4} = \frac{0.02448}{4} = 0.00612$$
Excel answer = 0.00612
Standard Deviation = 0.00612^0.5 = 0.07823
Q1 = (n+1)/4 = 6/4 = 1.5 ranked value: (4.38 + 4.38)/2 = 4.38
Q3 = 3(n+1)/4 = 18/4 = 4.5 ranked value: (4.50 + 4.55)/2 = 4.525
Interquartile range = 4.525 - 4.38 = 0.145
Coefficient of Variation
STD/mean*100%
0.07823/4.442*100% = 1.76%
One-Year CDs {4.94, 4.90, 4.85, 4.85, 4.85}
Mean = 24.39/5 = 4.878
Variance
$$S^2 = \frac{(4.94-4.878)^2+(4.90-4.878)^2+(4.85-4.878)^2+(4.85-4.878)^2+(4.85-4.878)^2}{5-1} = \frac{(0.062)^2+(0.022)^2+(-0.028)^2+(-0.028)^2+(-0.028)^2}{4} = \frac{0.003844+0.000484+0.000784+0.000784+0.000784}{4} = \frac{0.00668}{4} = 0.00167$$
Excel answer = 0.00167
Standard Deviation
= 0.00167^0.5 = 0.040866; Excel answer = 0.04086563
The difference between the answers is due to rounding.
Interquartile Range
Q1 = (n+1)/4 = 6/4 = 1.5 ranked value: (4.85 + 4.85)/2 = 4.85
Q3 = 3(n+1)/4 = 18/4 = 4.5 ranked value: (4.90 + 4.94)/2 = 4.92
Interquartile range = 4.92 - 4.85 = 0.07
Coefficient of Variation
STD/mean*100%
= 0.040866/4.878*100% = 0.84%
B. Money market accounts have more variation than one-year CDs. This shows in the variance (0.00612 vs. 0.00167), the standard deviation (0.07823 vs. 0.040866), the range (0.17 vs. 0.09), and the interquartile range (0.145 vs. 0.07). For each of these measures of dispersion, a smaller number means a more tightly concentrated data set, and the money market figures are larger in every case.
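The comparison in part B can be checked by computing the same dispersion measures for both data sets side by side. A sketch; `dispersion` is an illustrative helper that uses sample (n - 1) formulas, matching Excel's VAR and STDEV.

```python
import statistics

money_market = [4.55, 4.50, 4.40, 4.38, 4.38]
one_year_cd = [4.94, 4.90, 4.85, 4.85, 4.85]

def dispersion(rates):
    """Sample variance, standard deviation, range, and coefficient of variation (%)."""
    sd = statistics.stdev(rates)
    return {
        "variance": statistics.variance(rates),
        "std_dev": sd,
        "range": max(rates) - min(rates),
        "cv_pct": sd / statistics.mean(rates) * 100,
    }

mm, cd = dispersion(money_market), dispersion(one_year_cd)
for name in mm:
    print(f"{name:>9}: money market {mm[name]:.5f}  one-year CD {cd[name]:.5f}")
```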
3.21
The following is a data set for a population with N=10
{7,5,11,8,3,6,2,1,9,8}
A.
Compute the population mean:
The Population mean is the sum of all the values in the population divided by the population
size of N (PG118).
Sum of values is 60. Population mean is 60/10=6
B.
Compute the population standard deviation.
The population variance is the sum of the squared differences around the population mean
divided by the population size.
Differences between population variance and sample variance:
- Population variance divides by N; sample variance divides by n - 1.
- Population variance uses the population size (N) and population mean (µ); sample variance uses the sample size (n) and sample mean (X̄).
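Written as formulas in standard notation (with X_i the individual values), the two definitions differ only in which mean is subtracted and what the sum is divided by:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \qquad S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$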
Population Variance:
$$\sigma^2 = \frac{(1-6)^2+(2-6)^2+(3-6)^2+(5-6)^2+(6-6)^2+(7-6)^2+(8-6)^2+(8-6)^2+(9-6)^2+(11-6)^2}{10} = \frac{(-5)^2+(-4)^2+(-3)^2+(-1)^2+(0)^2+(1)^2+(2)^2+(2)^2+(3)^2+(5)^2}{10} = \frac{25+16+9+1+0+1+4+4+9+25}{10} = \frac{94}{10} = 9.4$$
Population STD
=9.4^.5
=3.065941943
EXCEL answer = 3.065941943
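Python's `statistics` module has separate functions for the population formulas (`pvariance`, `pstdev`, dividing by N) and the sample formulas (`variance`, `stdev`, dividing by n - 1), which makes the distinction above easy to see on the same ten values:

```python
import statistics

population = [7, 5, 11, 8, 3, 6, 2, 1, 9, 8]

mu = statistics.mean(population)             # 60 / 10 = 6
sigma_sq = statistics.pvariance(population)  # divides by N: 94 / 10 = 9.4
sigma = statistics.pstdev(population)        # square root of 9.4

# For contrast: treating the same values as a sample divides by n - 1 instead
s_sq = statistics.variance(population)       # 94 / 9

print(mu, sigma_sq, round(sigma, 9), round(s_sq, 4))  # 6 9.4 3.065941943 10.4444
```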
3.23
Solved Using Excel
A.
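Mean = 514/50 = 10.28 (thousands)
Variance = VAR(A2:A51) = 4.182041 (sample variance; the population variance, VARP, is 4.182041 × 49/50 ≈ 4.098400)
Standard Deviation = STDEVP(A2:A51) = 2.024451 (population standard deviation, used in part B; the sample standard deviation, STDEV, is √4.182041 ≈ 2.045004)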
B.
SDs from Mean   Value Range              Occurrences
+2 to +3        14.3289 to 16.35335      2
+1 to +2        12.30445 to 14.3289      8
-1 to +1        8.255549 to 12.30445     32
-2 to -1        6.231099 to 8.255549     7
-3 to -2        4.206648 to 6.231099     1

Within    Separate   Inclusive   %
±1 SD     32         32          0.64
±2 SD     15         47          0.94
±3 SD     3          50          1.00
C. The empirical rule (PG 120)
About 68% of results should be within plus/minus 1 standard deviation of the mean.
About 95% of results should be within plus/minus 2 standard deviations.
About 99.7% of results should be within plus/minus 3 standard deviations.
These results are close to the empirical rule, as shown in the table below:
Within    Empirical Rule %   Actual %
±1 SD     68%                64%
±2 SD     95%                94%
±3 SD     99.7%              100%
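The comparison with the empirical rule can be scripted from the part B band counts. The 50 raw values are not in this transcript, so this sketch starts from the tabulated occurrences rather than recounting the data; the band edges are in standard-deviation units.

```python
# Part A results (the SD is the population value from STDEVP)
mean, sd, n = 10.28, 2.024451, 50

# Occurrences per band, from the part B table; keys are (lower, upper) in SD units
band_counts = {(-3, -2): 1, (-2, -1): 7, (-1, 1): 32, (1, 2): 8, (2, 3): 2}
empirical = {1: 0.68, 2: 0.95, 3: 0.997}

within = {}
for k in (1, 2, 3):
    # A band lies inside plus/minus k SD when both of its edges do
    within[k] = sum(c for (a, b), c in band_counts.items() if -k <= a and b <= k)
    lo, hi = mean - k * sd, mean + k * sd
    print(f"±{k} SD ({lo:.4f} to {hi:.4f}): actual {within[k] / n:.2f}, "
          f"empirical rule {empirical[k]}")
```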
3.31
The five-number summary consists of the smallest value, first quartile, median, third quartile, and largest value of a series (PG 123).
A chart relating these numbers to the shape of the distribution is also on PG 123.
A box-and-whisker plot is a graphical display of the five-number summary (PG 124).
Figure 3.5 on PG 125 is an important chart for relating box-and-whisker plots to distribution shapes.
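Ordered data set (n = 20):
4 5 7 8 16 19 19 20 20 23 24 25 29 29 30 30 30 30 40 56
Smallest value: 4; largest value: 56
Median = (20+1)/2 = 10.5 ranked value: (23 + 24)/2 = 23.5
Q1 = (n+1)/4 = 21/4 = 5.25 ranked value, rounded to the 5th ranked value = 16
Q3 = 3(n+1)/4 = 63/4 = 15.75 ranked value, rounded to the 16th ranked value = 30
Five-number summary: 4, 16, 23.5, 30, 56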
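The five-number summary can be assembled from the same ranked-value rules used for quartiles. A sketch; `ranked_value` is an illustrative helper implementing the textbook's whole-number, half, and rounding rules.

```python
import math

def ranked_value(ordered, position):
    """1-based ranked value using the textbook quartile rules (whole / half / round)."""
    whole = math.floor(position)
    frac = position - whole
    if frac == 0:
        return ordered[whole - 1]
    if frac == 0.5:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[(whole + 1 if frac > 0.5 else whole) - 1]

data = [4, 5, 7, 8, 16, 19, 19, 20, 20, 23, 24, 25, 29, 29, 30, 30, 30, 30, 40, 56]
ordered = sorted(data)
n = len(ordered)

summary = (
    ordered[0],                               # smallest value
    ranked_value(ordered, (n + 1) / 4),       # Q1: 5.25 -> 5th ranked value
    ranked_value(ordered, (n + 1) / 2),       # median: 10.5 -> average of 10th and 11th
    ranked_value(ordered, 3 * (n + 1) / 4),   # Q3: 15.75 -> 16th ranked value
    ordered[-1],                              # largest value
)
print(summary)   # (4, 16, 23.5, 30, 56)
```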
Computer Assignments Section
Throughout the homework, I checked and verified my answers using Microsoft Excel. My answers always matched Excel's as long as I did the due diligence of carrying my work out to enough decimal places.
As noted earlier in 3.1, 3.3, and 3.21, computing the variance and standard deviation by hand takes a little over a page of simple arithmetic, while Excel needs only one formula and far less time, which also makes it a quick way to check your work. Every calculation on that page is a chance to make a simple mistake and arrive at a wrong answer. Working by hand builds a good understanding of the process and the expected results, but it greatly increases the chance of an incorrect answer and takes an enormous amount of time, which carries an opportunity cost.