ISP 121
Week 4
Intro to Statistics
Descriptive Statistics
• Average, or mean
• Median – the middle score
• Percent Rank – gives the position of a
datapoint within a data set. More precisely,
it tells you approximately what percentage
of the data is less than the datapoint.
• Range – difference between the maximum
and minimum values in the data set
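The four measures above can be sketched in a few lines of Python. This is a minimal illustration, not the course's method: the `scores` list is made-up data, and `percent_rank` and `data_range` are hypothetical helper names. `percent_rank` follows the wording of the slide (share of values below the datapoint), so it may not match Excel's =percentrank result exactly.

```python
from statistics import mean, median

def percent_rank(data, value):
    """Percentage of data points that are strictly less than `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

def data_range(data):
    """Difference between the maximum and minimum values."""
    return max(data) - min(data)

# Made-up example scores (an assumption, not from the slides).
scores = [55, 60, 70, 70, 85, 90, 95, 100]

print(mean(scores))              # 78.125
print(median(scores))            # 77.5
print(percent_rank(scores, 85))  # 50.0
print(data_range(scores))        # 45
```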
Descriptive Statistics
• Lower quartile – or first quartile, it is the
median of the data values in the lower half
of a data set
• Middle quartile – or second quartile, this is
the overall median
• Upper quartile – or third quartile, it is the
median of the data values in the upper half
of a data set
• Quartiles may help in seeing the variation
in a data set
Quartiles
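• For example (bank waiting times):
Big Bank: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Best Bank: 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8
In each sorted list of 11 values, the lower quartile is the 3rd value, the median is the 6th, and the upper quartile is the 9th
Big Bank: lower quartile 5.6, median 7.2, upper quartile 8.5
Best Bank: lower quartile 6.7, median 7.2, upper quartile 7.7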
Big Bank range: 11.0 – 4.1 = 6.9
Best Bank range: 7.8 – 6.6 = 1.2
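The quartiles and ranges for the two banks can be checked with a short Python sketch that applies the "median of each half" definition directly. The function name `quartiles` is hypothetical, and leaving the middle value out of both halves when the count is odd is an assumption; the slides do not say how to handle that case.

```python
from statistics import median

def quartiles(data):
    """Lower quartile, median, upper quartile via the median-of-halves rule.

    Assumption: with an odd number of values, the middle value is left
    out of both halves. Needs at least two values.
    """
    values = sorted(data)
    n = len(values)
    lower_half = values[: n // 2]
    upper_half = values[(n + 1) // 2 :]
    return median(lower_half), median(values), median(upper_half)

big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]

for name, waits in [("Big Bank", big_bank), ("Best Bank", best_bank)]:
    spread = round(max(waits) - min(waits), 1)
    print(name, "quartiles:", quartiles(waits), "range:", spread)
```

Both banks have the same median (7.2), but Big Bank's quartiles sit much farther apart, which is the variation the range also reflects.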
Descriptive Statistics
• The five number summary consists of:
– The minimum value
– The lower quartile (first quartile)
– The median (second quartile)
– The upper quartile (third quartile)
– The maximum value
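As a rough sketch, the five number summary can be assembled with Python's standard `statistics.quantiles`. The helper name `five_number_summary` is hypothetical. The choice of `method="exclusive"` is an assumption: for the 11-value Best Bank list it lands on the same quartiles as the median-of-halves definition, but other quartile methods (and other data sizes) can interpolate differently.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    values = sorted(data)
    # n=4 splits the data into quarters and returns the three cut points.
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return (values[0], q1, q2, q3, values[-1])

best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]
print(five_number_summary(best_bank))  # (6.6, 6.7, 7.2, 7.7, 7.8)
```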
Standard Deviation
• Quartiles are OK for characterizing data,
but standard deviation is preferred by
statisticians
• It is a measure of how far data values are
spread around the mean of a data set
• Std dev = sqrt( sum of (deviations from the
mean)^2 / (total number of data values – 1) )
• Just use Excel’s =stdev function
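To see what the formula does step by step, here is a small Python sketch that spells it out and compares the result with the standard library's `statistics.stdev`, which also divides by the number of values minus 1. The function name `sample_std_dev` is hypothetical; the data is the Big Bank waiting times from the quartiles example.

```python
from math import sqrt
from statistics import stdev

def sample_std_dev(data):
    """Standard deviation using the (n - 1) formula from the slide."""
    n = len(data)
    mean = sum(data) / n
    # Sum of squared deviations from the mean.
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return sqrt(squared_deviations / (n - 1))

big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

print(round(sample_std_dev(big_bank), 2))  # 1.96
print(round(stdev(big_bank), 2))           # 1.96, same formula built in
```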
Standard Deviation
• A simple way to estimate standard
deviation is the range rule of thumb
• Divide range by 4
• Watch for outliers. These can ruin your
range rule of thumb
Standard Deviation
• Go back to Big Bank / Best Bank example
• Big Bank: range = 6.9
• 6.9 / 4 = 1.7
• Actual standard deviation is 1.96
• Best Bank: range = 1.2
• 1.2 / 4 = 0.3
• Actual standard deviation is 0.44
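The comparison between the range rule of thumb and the actual standard deviation can be reproduced with a short Python sketch. `range_rule_estimate` is a hypothetical helper name; the actual value comes from `statistics.stdev`.

```python
from statistics import stdev

def range_rule_estimate(data):
    """Range rule of thumb: estimate the standard deviation as range / 4."""
    return (max(data) - min(data)) / 4

banks = {
    "Big Bank": [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0],
    "Best Bank": [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8],
}

for name, waits in banks.items():
    estimate = range_rule_estimate(waits)
    actual = stdev(waits)
    print(f"{name}: estimate {estimate:.1f}, actual {actual:.2f}")
```

In both cases the estimate is in the right neighborhood but somewhat below the actual value, which is what one would expect from a rule of thumb.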
Standard Deviation
• Another basic rule of thumb:
– If a data point lies more than two standard
deviations from the mean, it can be called an
outlier
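A minimal Python sketch of this two-standard-deviation rule follows. The function name `find_outliers` is hypothetical, and the second data set is made up for illustration: it is the Big Bank list with the longest wait changed to 20.0 minutes.

```python
from statistics import mean, stdev

def find_outliers(data, threshold=2.0):
    """Values lying more than `threshold` standard deviations from the mean."""
    center = mean(data)
    spread = stdev(data)
    return [x for x in data if abs(x - center) > threshold * spread]

big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
# Made-up variant (an assumption, not from the slides): one very long wait.
with_slow_day = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 20.0]

print(find_outliers(big_bank))       # []
print(find_outliers(with_slow_day))  # [20.0]
```

Note that the extreme value also inflates the mean and standard deviation it is being compared against, so this rule is only a rough screen.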
Histograms
• Nice way to view a data set
• A histogram is a chart similar to a dotplot
created by defining a set of bins and
counting how many data points lie in each
bin. Bars are drawn with height
proportional to the number of data points
in each bin.
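The bin-and-count step described above can be sketched in Python as follows. The function name `histogram_counts` and the bin edges are assumptions chosen for illustration, as is the convention that each bin includes its lower edge but not its upper edge (except the last bin, which includes both); spreadsheet tools may use a different convention.

```python
def histogram_counts(data, bin_edges):
    """Count how many data points fall in each bin.

    Bin i covers bin_edges[i] <= x < bin_edges[i + 1]; the last bin
    also includes its upper edge. Values outside all bins are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for x in data:
        for i in range(len(counts)):
            low, high = bin_edges[i], bin_edges[i + 1]
            if low <= x < high or (i == last and x == high):
                counts[i] += 1
                break
    return counts

big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
edges = [4, 6, 8, 10, 12]  # assumed bin edges, in minutes

counts = histogram_counts(big_bank, edges)
print(counts)  # [3, 5, 2, 1]

# Text version of the bars: bar length is proportional to the count.
for i, count in enumerate(counts):
    print(f"{edges[i]:>2}-{edges[i + 1]:<2} {'#' * count}")
```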
Example Histogram
[Figure: histogram. Vertical axis: Frequency, 0 to 50 in steps of 10. Horizontal axis: Bin, 40 to 110 in steps of 10.]
Excel Functions
• Most, if not all, of these functions are built
into Excel
=average(range) where range can be like C3:C40
=median(range)
=percentrank(range,value)
=quartile(range,which quartile) where which quartile is 1, 2, or 3
=min(range)
=max(range)