Download Introduction to Engi.. - Department of Computer Science

Introduction to
Engineering
Fall 2006
Lecture 13: Statistics
1
Review

Introduction to Dimensions & Units

Other Systems

Dimensions in Equations
2
Review - Definitions

Dimensions are properties that can be measured,
such as length, time, mass, or temperature, or
calculated by multiplying or dividing other
dimensions, such as velocity (length/time).

Units are the means of expressing dimensions, such
as feet or meters for length and hours or seconds for time.

Every valid equation must be dimensionally
homogeneous: that is, all additive terms on both
sides of the equation must have the same dimensions.
3
Review - Important???
4
Outline

Introduction to Statistics

Describing Data

Measures of Central Tendency

Statistics in MatLab
5
Introduction to Statistics
6
Introduction 1

Statistics is the science of collecting,
analyzing, and drawing conclusions from data

Population

The collection of all responses, measurements, or
counts that are of interest.

Sample

A portion or subset of the population.
7
Introduction 2

Parameter:

A number that describes a population
characteristic.
For example, the average age of all people in the
US.

Statistic:

A number that describes a sample characteristic.
For example, the average age of people from a
sample of three states.
8
Branches of Statistics

Inferential Statistics

Involves using sample data to draw conclusions
about a population.

Descriptive Statistics

Involves organizing, summarizing, and displaying
data.
9
Use in Engineering

Engineers are often asked to draw
conclusions using uncertain, inconsistent or
incomplete sets of data

Statistics is also useful to describe and
understand the variability in the data that
could come from differences in process
variables such as temperature or time.
10
Applications in Engineering

Statistical signal processing
Communications
Systems and control
Decision and resource allocation under uncertainty
Reliability (dealing with noise, error control, failures)
Thermodynamics
11
Describing Data
12
Example Data

The amount of time (in seconds) that 25 jobs were in
control of a large mainframe computer’s central
processing unit (CPU):

0.02  0.75  1.17  1.61  2.59
0.15  0.82  1.23  1.94  3.07
0.19  0.92  1.38  2.01  3.35
0.47  0.96  1.40  2.16  3.76
0.71  1.16  1.59  2.41  4.75

We could describe this data using a graphical
method or a numerical method

Graphical Methods: histogram, stem and leaf, time series plot
Numerical Methods: Central Tendency, Variation, Relative Standing
13
Histogram

Steps for constructing a histogram

Calculate the range of the data:
max(data) – min(data)
Divide the range into classes (bins, intervals) of equal width
   If there are fewer than 25 observations, use 5 or 6 classes
   If there are between 25 and 50 observations, use 7 to 14 classes
   If there are more than 50 observations, use 15 to 20 classes
For each class, calculate the class frequency, which is the
number of observations in that class
The histogram is a bar graph in which the categories are the
classes and the heights of the bars are determined by the
class frequency
14
Example

0.02  0.75  1.17  1.61  2.59
0.15  0.82  1.23  1.94  3.07
0.19  0.92  1.38  2.01  3.35
0.47  0.96  1.40  2.16  3.76
0.71  1.16  1.59  2.41  4.75

The range of data is: 4.75 – 0.02 = 4.73

Divide the data into 7 intervals of 0.7 each,
beginning with 0.015

0.015 to 0.715: 5
0.715 to 1.415: 9
1.415 to 2.115: 4
2.115 to 2.815: 3
2.815 to 3.515: 2
3.515 to 4.215: 1
4.215 to 4.915: 1

[Figure: histogram of the class frequencies (vertical axis 0 to 9) over the seven classes from 0.015 to 4.915]
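
The class frequencies in this example can be checked with a few lines of code. The following is a minimal sketch in Python (not MatLab), assuming the 25 CPU times, the starting edge 0.015, and the class width 0.7 from the slide; the variable names are illustrative only.

```python
# Class frequencies for the CPU-time example (25 observations, 7 classes).
cpu_times = [
    0.02, 0.75, 1.17, 1.61, 2.59,
    0.15, 0.82, 1.23, 1.94, 3.07,
    0.19, 0.92, 1.38, 2.01, 3.35,
    0.47, 0.96, 1.40, 2.16, 3.76,
    0.71, 1.16, 1.59, 2.41, 4.75,
]

start = 0.015      # lower edge of the first class (from the example)
width = 0.7        # class width (from the example)
num_classes = 7    # number of classes (from the example)

# Step 1: range of the data = largest value minus smallest value.
data_range = max(cpu_times) - min(cpu_times)

# Steps 2 and 3: assign each observation to its class and count.
frequencies = [0] * num_classes
for t in cpu_times:
    k = int((t - start) / width)   # index of the class containing t
    frequencies[k] += 1

# Step 4: a crude text histogram, one row of stars per class.
print(f"range = {data_range:.2f}")
for k, count in enumerate(frequencies):
    lower = start + k * width
    upper = lower + width
    print(f"{lower:.3f} to {upper:.3f}: {count:2d} " + "*" * count)
```

Starting the first class at 0.015 rather than 0.02 keeps every class edge between possible data values, so no observation falls exactly on a boundary.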
15
Stem and Leaf Display

Steps to construct a stem-and-leaf display

Divide each observation into two parts: stem and leaf
List the stems in order in a column
Place the leaf for each observation in the
appropriate stem row – arrange the leaves in
each row in ascending order
16
Example

Student scores on an exam: 12 15 20 27 31 36
37 44 46 48 49 50 51 55
Create the stems:

1 | 2 5
2 | 0 7
3 | 1 6 7
4 | 4 6 8 9
5 | 0 1 5
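
The same display can be built programmatically. This is a small Python sketch, assuming two-digit scores where the tens digit is the stem and the ones digit is the leaf; the dictionary name `stems` is just an illustrative choice.

```python
# Stem-and-leaf display for the exam scores in the example.
scores = [12, 15, 20, 27, 31, 36, 37, 44, 46, 48, 49, 50, 51, 55]

stems = {}
for score in sorted(scores):          # sorting keeps leaves in ascending order
    stem, leaf = divmod(score, 10)    # tens digit is the stem, ones digit the leaf
    stems.setdefault(stem, []).append(leaf)

for stem in sorted(stems):            # list the stems in order in a column
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```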
17
Time Series Plot

Some data sets are a time series

That is, measurements taken at regular intervals over time
These plots often reveal important features of the data set
For example, the time series plot of the number of live births
per 10,000 23-year-old women in the US between 1917 and
1975:
18
Measures of Central Tendency
19
Central Tendency

Numerical measures describe where the data are
centered and how spread out they are about
that central point

Some are better descriptions than others
Measures of central tendency include

Mean
Median
Mode

Measures of variation (spread) include

Range
Variance
20
Mean

The mean (or average) is the simplest
measure of central tendency to calculate

Given a set of n measurements $x_1, x_2, \ldots, x_n$, the mean is:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
21
Median

The median of a sample is defined as the value at
which half of the measurements are lower and
half are higher.

A simple way to calculate the median is to order all the
measurements from lowest to highest.
The number in the middle is the median if n is odd. If n is
even, the median is the average of the middle two values.
The median is sometimes more useful than the
mean, particularly in cases where one or two values
are significantly different from the rest of the values.
22
Example

Given the following data:

3 5 12 17 18 22 25 26 30 31

The Mean is:
3 + 5 + 12 + 17 + 18 + 22 + 25 + 26 + 30 + 31 = 189
189 / 10 = 18.9
The Median is:
(18 + 22)/2 = 20

Now add a new data element: 200
The new Mean is:
389 / 11 = 35.36
The new Median is: 22
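
The effect of the single large value on the mean and the median can be reproduced in code. This is a minimal Python sketch using the standard `statistics` module and the data from this example; the variable names are illustrative.

```python
from statistics import mean, median

data = [3, 5, 12, 17, 18, 22, 25, 26, 30, 31]

mean_before = mean(data)
median_before = median(data)     # n is even: average of the middle two values

with_outlier = data + [200]      # add the new data element

mean_after = mean(with_outlier)
median_after = median(with_outlier)   # n is odd: the middle value

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
```

The mean nearly doubles while the median moves only from 20 to 22, which is why the median is often preferred when a few values are far from the rest.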
23
Mode

The mode of the sample is the most
probable value of the n measurements,
i.e., the one that occurs most frequently.


Mode is not used as often as mean or median
because it can be a misleading quantity,
especially if the sample size is small and/or the
distribution of measurements is not purely
random.
If none of the measurements are repeated, the
mode is undefined.
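
One way to compute the mode while handling the undefined case is sketched below in Python. The helper name `modes` and the second data set are hypothetical, chosen only for illustration; the first data set is the list of exam scores from the stem-and-leaf example, in which no value repeats.

```python
from collections import Counter

def modes(measurements):
    """Return the most frequent value(s); an empty list means the mode is undefined."""
    counts = Counter(measurements)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:                  # no measurement is repeated
        return []
    # More than one value can share the highest count.
    return sorted(value for value, c in counts.items() if c == highest)

exam_scores = [12, 15, 20, 27, 31, 36, 37, 44, 46, 48, 49, 50, 51, 55]
repeated = [2, 3, 3, 5, 7, 7, 7, 9]   # hypothetical measurements

print(modes(exam_scores))   # no repeats, so the mode is undefined
print(modes(repeated))
```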
24
Deviation

The deviation of a measurement is defined as the
difference between a particular measurement
and the mean, i.e., for measurement i:
$d_i = x_i - \bar{x}$

When considering a group or sample of measurements,
the deviation of one particular measurement is the same as
the precision error or random error of that measurement.
Deviation is not the same as accuracy error, since
accuracy error (inaccuracy) is defined as the difference
between a particular measurement and the true value of
the quantity being measured.
Because of bias (systematic) error, $x_{true}$ is often not even
known, and the mean is not equal to $x_{true}$ if there are bias
errors.
25
Average Deviation

To get some feel for how much deviation is
represented in the sample, we might first think of
averaging all the deviations to obtain some kind of
mean or average deviation.

It turns out that the average of all the deviations is always zero.
By definition of the mean, the negative deviations of the
measurements below the average exactly cancel the positive
deviations of those above it, so the average deviation is a
meaningless calculation.
A better kind of average is the average absolute
deviation, defined as the average of the absolute
value of each deviation.
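
A short numerical check makes this concrete. The Python sketch below reuses the ten values from the mean and median example; that choice of data is an assumption made for illustration.

```python
data = [3, 5, 12, 17, 18, 22, 25, 26, 30, 31]
n = len(data)
mean = sum(data) / n

# Deviation of each measurement from the mean.
deviations = [x - mean for x in data]

# Plain average of the deviations: zero, up to floating-point rounding.
average_deviation = sum(deviations) / n

# Average absolute deviation: the average of |d_i|.
average_abs_deviation = sum(abs(d) for d in deviations) / n

print(f"average deviation          = {average_deviation:.6f}")
print(f"average absolute deviation = {average_abs_deviation:.2f}")
```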
26
Sample Standard Deviation

An even better, and more accepted, measure
of how much deviation or scatter is in the
data is obtained by calculating the sample
standard deviation.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
27
Observations

s is kind of like an average of the deviations, but it is
constructed by taking the square root of the average
of the squared deviations

Notice that the denominator is n – 1, not simply n.
It turns out that for small sample size (n small),
n – 1 yields a better estimate of the actual standard
deviation than does n itself.
As n gets big, the difference between using n or
n – 1 in the denominator becomes negligible.
The sample variance is the square of the sample
standard deviation
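
The difference between the two denominators can be seen numerically. This Python sketch again assumes the ten values from the mean and median example, and compares a hand-written calculation with `statistics.stdev`, which also divides by n – 1.

```python
import math
from statistics import stdev

data = [3, 5, 12, 17, 18, 22, 25, 26, 30, 31]
n = len(data)
x_bar = sum(data) / n

# Sum of the squared deviations from the mean.
sum_sq = sum((x - x_bar) ** 2 for x in data)

s = math.sqrt(sum_sq / (n - 1))   # sample standard deviation (n - 1 in the denominator)
s_n = math.sqrt(sum_sq / n)       # same calculation with n in the denominator
sample_variance = s ** 2          # the sample variance is the square of s

print(f"s (n - 1)       = {s:.3f}")
print(f"with n          = {s_n:.3f}")
print(f"sample variance = {sample_variance:.2f}")
print(f"library stdev   = {stdev(data):.3f}")
```

With only ten measurements the two results differ by about 5 percent; the gap shrinks as n grows.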
28
Statistics in MatLab
29
MatLab

Several of the most common statistical
operators are directly available in MatLab

M = mean(x)
Calculates the sample average of a vector or the
mean of each column of a matrix

M = median(x)
Calculates the median of a vector or the
median of each column of a matrix

Y = std(x)
Y = var(x)
Calculate the standard deviation and the variance
(by default normalized by n – 1)
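
To illustrate what "each column of a matrix" means, here is a rough Python analogue of the vector and column-wise behavior. It is a sketch rather than MatLab code: the matrix values are hypothetical, and `zip(*matrix)` is used here to walk the columns of a list of rows.

```python
from statistics import mean, median

# A vector: one result for the whole list.
x = [3, 5, 12, 17, 18, 22, 25, 26, 30, 31]
vector_mean = mean(x)
vector_median = median(x)

# A matrix stored as a list of rows (hypothetical values).
matrix = [
    [1, 10],
    [2, 20],
    [6, 60],
]

# zip(*matrix) yields the columns, giving one result per column.
column_means = [mean(col) for col in zip(*matrix)]
column_medians = [median(col) for col in zip(*matrix)]

print(vector_mean, vector_median)
print(column_means, column_medians)
```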
30
Sample Program
31
Sample Run
32
Possible Quiz

Remember that even though each quiz is
worth only 5 to 10 points, the points do add
up to a significant contribution to your overall
grade

If there is a quiz it might cover these issues:



What is a statistic?
Define the mode.
Why is the median sometimes a better measure
than the mean?
33