Download 2-1 Data Summary and Display

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

Data mining wikipedia , lookup

Time series wikipedia , lookup

Transcript
1
2
2-1 Data Summary and Display
3
2-1 Data Summary and Display
4
2-1 Data Summary and Display
5
2-1 Data Summary and Display
Population Mean
For a finite population with N measurements, the mean is
The sample mean is a reasonable estimate of the
population mean.
6
2-1 Data Summary and Display
Sample Variance and Sample Standard Deviation
7
2-1 Data Summary and Display
8
2-1 Data Summary and Display
9
2-1 Data Summary and Display
The sample variance is
The sample standard deviation is
10
2-1 Data Summary and Display
Computational formula for s2
Why?
11
2-1 Data Summary and Display
Population Variance
When the population is finite and consists of N values,
we may define the population variance as
The sample variance is a reasonable estimate of the
population variance.
12
2-2 Stem-and-Leaf Diagram
Steps for Constructing a Stem-and-Leaf Diagram
13
2-2 Stem-and-Leaf Diagram
14
2-2 Stem-and-Leaf Diagram
15
2-2 Stem-and-Leaf Diagram
16
2-2 Stem-and-Leaf Diagram
17
2-2 Stem-and-Leaf Diagram
18
2-2 Stem-and-Leaf Diagram
quartiles
19
2-3 Histograms
A histogram is a more compact summary of data than a
stem-and-leaf diagram. To construct a histogram for
continuous data, we must divide the range of the data into
intervals, which are usually called class intervals, cells, or
bins. If possible, the bins should be of equal width to
enhance the visual information in the histogram.
20
2-3 Histograms
21
2-3 Histograms
22
2-3 Histograms
10 bins
23
2-3 Histograms
24
2-3 Histograms
An important variation of the histogram is the Pareto chart.
This chart is widely used in quality and process
improvement studies where the data usually represent
different types of defects, failure modes, or other categories
of interest to the analyst. The categories are ordered so that
the category with the largest number of frequencies is on
the left, followed by the category with the second largest
number of frequencies, and so forth.
Ordered by the frequency
25
2-3 Histograms
26
2-4 Box Plots
• The box plot is a graphical display that
simultaneously describes several important features of
a data set, such as center, spread, departure from
symmetry, and identification of observations that lie
unusually far from the bulk of the data.
• Whisker
• Outlier
• Extreme outlier
27
2-4 Box Plots
28
2-4 Box Plots
29
Table 2-2
30
2-4 Box Plots
31
2-5 Time Series Plots
• A time series or time sequence is a data set in
which the observations are recorded in the order in
which they occur.
• A time series plot is a graph in which the vertical
axis denotes the observed value of the variable (say x)
and the horizontal axis denotes the time (which could
be minutes, days, years, etc.).
• When measurements are plotted as a time series, we
often see
•trends,
•cycles, or
•other broad features of the data
32
2-5 Time Series Plots
A cyclic variability
Upward trend
33
2-5 Time Series Plots
34
2-5 Time Series Plots
35
2-6 Multivariate Data
• The dot diagram, stem-and-leaf diagram, histogram,
and box plot are descriptive displays for univariate
data; that is, they convey descriptive information
about a single variable.
•Many engineering problems involve collecting and
analyzing multivariate data, or data on several
different variables.
•In engineering studies involving multivariate
data, often the objective is to determine the
relationships among the variables or to build an
empirical model.
36
2-6 Multivariate Data
37
2-6 Multivariate Data
38
2-6 The Corrected Sum of Cross-Products
n
S xy   ( xi  x)( yi  y )
i 1
 n  n 
  xi yi    xi   yi  n
i 1
 i 1  i 1 
n
Inner product!
39
2-6 Multivariate Data
Sample Correlation Coefficient
• Two variables are strong
if 0.8  r  1,
•
moderate if 0.5< r < 0.8, and
•
weak
if 0  r  0.5.
40
2-6 Multivariate Data
41
2-6 Multivariate Data
42
2-6 Multivariate Data
Pairwise correlations
43
2-6 Multivariate Data
44
2-6 Multivariate Data
Interaction between foam and region
45
2-6 Multivariate Data
46
47