540201
Statistics for Engineers
Content
1. Random Sampling
2. Stem-And-Leaf Diagrams
3. Histograms
4. Box Plots
5. Time Series Plots
6. Multivariate Data
Data type
• Attribute data
  – Discrete; proportions and counts of defects are the most common
  – We can count
• Variable data
  – Continuous data
  – We can measure variables
Variable data provide better information and require fewer data points.
Sources of Engineering Data
• A retrospective study
  ◦ Historical data
• An observational study
  ◦ Data from existing processes or operations
• A designed experiment
  ◦ Data from an experiment set up for a group of factors of interest
Parameters
                     Population    Sample
Mean                 $\mu$         $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Variance             $\sigma^2$    $S^2$ or $SD^2$
Standard deviation   $\sigma$      $S$ or $SD$
Standard score       $Z$
Graphing
Univariate Data
• Dot Plot
• Stem-and-Leaf Diagram
• Histogram
• Box Plot
• Time Series Plot
• Individual Value Plot
• Interval Plot
• Pareto
Multivariate Data
• Scatter Plot
• Matrix Plot
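Data Summary and Display

Sample mean
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, where n is the sample size.
Population mean
$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$, where N is the population size.
Dot Diagram
Useful data display for small samples, up to about 20 observations.
Sample Variance and Sample Standard Deviation
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, $S = \sqrt{S^2}$
Population variance
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$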
Ex: First yield strength (kN) data from an experiment on circular tubes with caps welded to the ends. Calculate the sample average and standard deviation.
 96   102   104   126   140   160
 96   102   108   128   156   164
102   104   126   128   160   170
EXCEL and Minitab’s results
Mean     Var      SD
126.2    683.2    26.1
Welcome to Minitab, press F1 for help.
Mean of C8
Mean of C8 = 126.222
Standard Deviation of C8
Standard deviation of C8 = 26.1389
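The same calculation can be reproduced outside Excel or Minitab. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original slides. Both `variance` and `stdev` use the sample (n - 1) divisor.

```python
import statistics

# First yield strength (kN) values from the example above.
yield_strength = [
    96, 102, 104, 126, 140, 160,
    96, 102, 108, 128, 156, 164,
    102, 104, 126, 128, 160, 170,
]

mean = statistics.mean(yield_strength)          # sample mean
variance = statistics.variance(yield_strength)  # sample variance, divisor n - 1
sd = statistics.stdev(yield_strength)           # sample standard deviation

print(f"mean = {mean:.3f}, variance = {variance:.1f}, SD = {sd:.4f}")
```

The printed values should agree with the Excel and Minitab results to the digits shown there.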
Ex: Calculate the sample mean and SD of the compressive strength (psi) of 80 Al-Li alloy specimens.
105    97   245   163   207   134   218   199   160   196
221   154   228   131   180   178   157   151   175   201
183   153   174   154   190    76   101   142   149   200
186   174   199   115   193   167   171   163    87   176
121   120   181   160   194   184   165   145   160   150
181   168   158   208   133   135   172   171   237   170
180   167   176   158   156   229   158   148   150   118
143   141   110   133   123   146   169   158   135   149
EXCEL and Minitab’s results
Mean     Var       SD
162.7    1140.6    33.8
Results for: Worksheet 2
Mean of C10
Mean of C10 = 162.662
Standard Deviation of C10
Standard deviation of C10 = 33.7732
Dot Plot
Stem-And-Leaf Diagrams
• A stem-and-leaf diagram is a good way to obtain an informative visual display of a data set.
• Each number must consist of at least two digits.
Steps for constructing
1. Divide each number into two parts: a stem and a leaf.
2. List the stem values in a vertical column.
3. Record the leaf for each observation beside its stem.
4. Write the units for stems and leaves on the display.
Example 2-4
Stem and Leaf Diagram
Stem-and-Leaf Display: Compressive Strength
Stem-and-leaf of Compressive Strength N = 80
Leaf Unit = 1.0
   1    7  6
   2    8  7
   3    9  7
   5   10  15
   8   11  058
  11   12  013
  17   13  133455
  25   14  12356899
  37   15  001344678888
 (10)  16  0003357789
  33   17  0112445668
  23   18  0011346
  16   19  034699
  10   20  0178
   6   21  8
   5   22  189
   2   23  7
   1   24  5
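The construction steps can be sketched in a few lines of Python. This is an illustrative sketch, not Minitab's algorithm: the helper name `stem_and_leaf` is hypothetical, the last digit is taken as the leaf, and the cumulative count column that Minitab prints on the left is omitted.

```python
from collections import defaultdict

# Compressive strength (psi) of the 80 Al-Li alloy specimens from the example.
strength = [
    105, 97, 245, 163, 207, 134, 218, 199, 160, 196,
    221, 154, 228, 131, 180, 178, 157, 151, 175, 201,
    183, 153, 174, 154, 190, 76, 101, 142, 149, 200,
    186, 174, 199, 115, 193, 167, 171, 163, 87, 176,
    121, 120, 181, 160, 194, 184, 165, 145, 160, 150,
    181, 168, 158, 208, 133, 135, 172, 171, 237, 170,
    180, 167, 176, 158, 156, 229, 158, 148, 150, 118,
    143, 141, 110, 133, 123, 146, 169, 158, 135, 149,
]

def stem_and_leaf(values, leaf_unit=1):
    """Map each stem to its sorted leaves; the last digit (in leaf units) is the leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = int(v // leaf_unit)
        stems[scaled // 10].append(scaled % 10)  # step 1: split into stem and leaf
    return dict(stems)

display = stem_and_leaf(strength)
for stem in sorted(display):                     # step 2: stems in a vertical column
    leaves = "".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>3} | {leaves}")               # step 3: leaves beside each stem
```

Each printed row should match the stem and leaves of the corresponding row in the Minitab display.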
Histograms
• Use the horizontal axis to represent the measurement scale for the data.
• Use the vertical scale to represent the counts, or frequencies.
Histogram
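To make the counting behind a histogram concrete, here is a small sketch that bins the first yield strength data from the earlier example. The bin width of 20 kN and the starting point of 90 kN are assumptions chosen only for illustration; the slides do not specify them. The text output is rotated, so the measurement scale runs down the page and the frequency is shown as bar length.

```python
# First yield strength (kN) values from the earlier example.
data = [
    96, 102, 104, 126, 140, 160,
    96, 102, 108, 128, 156, 164,
    102, 104, 126, 128, 160, 170,
]

# Assumed bins: width 20 kN starting at 90 kN (an illustrative choice).
bin_start, bin_width, n_bins = 90, 20, 5

counts = [0] * n_bins
for x in data:
    index = (x - bin_start) // bin_width  # bin that contains x; bins are [low, high)
    counts[index] += 1

# Text histogram: one row per bin, bar length equals the frequency.
for i, count in enumerate(counts):
    low = bin_start + i * bin_width
    print(f"{low:>3}-{low + bin_width:<3} | {'*' * count} ({count})")
```

A different bin width or starting point would give a different picture of the same data, which is why the choice of bins deserves some care.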
Box Plot
• Describes several features of a data set, such as center, spread, departure from symmetry, and identification of observations that lie unusually far from the bulk of the data.
• These observations are called “outliers.”
• The box encloses the interquartile range (IQR), with its left edge at the first quartile, q1, and its right edge at the third quartile, q3.
• A line, or whisker, extends from each end of the box.
• The lower whisker extends to the smallest data point within 1.5 interquartile ranges of the first quartile.
• The upper whisker extends to the largest data point within 1.5 interquartile ranges of the third quartile.
Box Plot
[Figure: box plot labeled with Q1, Q2 (median), Q3, the IQR, the whiskers, the 1.5 IQR limits, outliers, and extreme outliers]
Interquartile Range (IQR) = Q3 – Q1
Example
• Data: 63, 88, 89, 89, 95, 98, 99, 99, 100, 100
• The lower quartile is Q1 = 89.
• The upper quartile is Q3 = 99.
• Hence the box extends from 89 to 99, and the interquartile range is IQR = 99 - 89 = 10.
• An outlier is any data point that lies more than 1.5 times the IQR beyond either end of the box.
• 1.5 times the IQR is 1.5 × 10 = 15, so at the upper end an outlier is any data point greater than 99 + 15 = 114.
• There are no data points larger than 114, so there are no outliers at the upper end.
• At the lower end, an outlier is any data point less than 89 - 15 = 74. There is one such data point, 63, so 63 is an outlier.
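The outlier check in this example can be written as a short sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, which reproduces Q1 = 89 and Q3 = 99 here; statistical packages use several different quartile conventions and may give slightly different values. The function name `quartiles` is illustrative.

```python
import statistics

data = [63, 88, 89, 89, 95, 98, 99, 99, 100, 100]

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]  # leaves out the middle value when n is odd
    return statistics.median(lower), statistics.median(upper)

q1, q3 = quartiles(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # points below this are outliers
upper_fence = q3 + 1.5 * iqr  # points above this are outliers

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```

With this data the fences fall at 74 and 114, and 63 is the only point flagged.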
Box Plot
Time Series Plot
Individual Value Plot
Interval Plot
Pareto Chart
• This chart is widely used in quality and process improvement studies.
• The data usually represent different types of defects, failure modes, or other categories.
• The chart usually exhibits “Pareto’s law.”
Pareto
Example 2-8
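Multivariate Data
• Multivariate data are collected and analyzed when several variables are observed on each unit.
• The objective is to determine the relationships among the variables.
• The corrected sum of cross-products: $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$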
Scatter Diagrams
• A scatter diagram is a simple descriptive tool for multivariate data.
• The diagram is useful for examining the pairwise (two variables at a time) relationships between the variables.
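Correlation Coefficient; r
\[ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \]
\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \]
[Figure: scatter plot panels illustrating r = 1, r = 0.92, r = 0, r = -0.92, r = -1, and r = 0]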
Ex: Wire bond data showing pull strength, wire length, and die height

Observation   Pull Strength   Wire Length   Die Height
 1             9.95            2             50
 2            24.45            8            110
 3            31.75           11            120
 4            35.00           10            550
 5            25.02            8            295
 6            16.86            4            200
 7            14.38            3            375
 8             9.60            3             52
 9            24.35            9            100
10            27.50            8            300
11            17.08            4            412
12            37.00           11            400
13            41.95           12            500
14            11.66            2            360
15            21.65            4            205
16            17.89            4            400
17            69.00           20            600
18            10.30            1            585
19            34.93           10            540
20            46.59           15            250
21            44.88           15            290
22            54.12           16            510
23            56.63           17            590
24            22.13            6            100
25            21.15            5            400
Scatter Plot
Correlations: Pull Strength, Wire Length
Pearson correlation of Pull Strength and Wire Length = 0.982
P-Value = 0.000

Correlations: Pull Strength, Die Height
Pearson correlation of Pull Strength and Die Height = 0.493
P-Value = 0.012
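The Pearson correlations reported by Minitab can be checked directly from the definition r = S_xy / sqrt(S_xx S_yy). The sketch below is illustrative only: the function name `correlation` is hypothetical, the values are transcribed from the wire bond table above, and the P-values are not computed.

```python
import math

# Wire bond data transcribed from the table above (25 observations).
pull_strength = [
    9.95, 24.45, 31.75, 35.00, 25.02, 16.86, 14.38, 9.60, 24.35, 27.50,
    17.08, 37.00, 41.95, 11.66, 21.65, 17.89, 69.00, 10.30, 34.93, 46.59,
    44.88, 54.12, 56.63, 22.13, 21.15,
]
wire_length = [
    2, 8, 11, 10, 8, 4, 3, 3, 9, 8, 4, 11, 12,
    2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5,
]
die_height = [
    50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
    360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400,
]

def correlation(x, y):
    """Pearson r computed from the corrected sums S_xy, S_xx, and S_yy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

r_length = correlation(wire_length, pull_strength)
r_height = correlation(die_height, pull_strength)
print(f"Pull Strength vs Wire Length: r = {r_length:.3f}")
print(f"Pull Strength vs Die Height:  r = {r_height:.3f}")
```

The results should be close to the Minitab output; small differences are possible if any table value was transcribed differently from the data Minitab used.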
Correlation
• A measure of linear association between two variables.
• The correlation coefficient describes both the strength and the direction of the relationship.
• The correlation coefficient ranges from -1 to 1.
Scatter Plot
Matrix Plot
Q & A