Statistics 312 – Uebersax
http://www.john-uebersax.com/stat312/
06 Measures of Central Tendency & Dispersion
Old Business
• Picking up pace
• Mac issues
Topics to Cover
• Frequency distributions/histograms in Mac Excel
• Review last homework
• Central tendency: mean, median, mode
• Percentiles and quartiles
• Descriptive statistics in JMP
• Measures of dispersion: the variance
• Homework assignment
1. Frequency Distributions/Histograms with Mac
Mac only: Status of Excel Data Analysis ToolPak
• Not available for Mac as of 2008
• Alternative: StatPlus:MacLE
• Download and install: http://www.analystsoft.com/en/products/statplusmacle/
• May not make frequency distributions (see the FREQUENCY() method below)
Frequency Distributions/Histograms via Excel FREQUENCY() function
• Place the data in one column (e.g., a1:a10).
• Place the bins (bin upper limits) in another column (e.g., b1:b4).
• In another column, select a vertical range of blank cells containing one more cell than the bin range (e.g., c1:c5).
• Type the formula =FREQUENCY(a1:a10, b1:b4), then press COMMAND+ENTER (Mac) or CONTROL+SHIFT+ENTER (PC).
• Note: double-check the bin range address; the formula editor may obscure the first cell.
Explained here: http://support.microsoft.com/kb/100122
Class demonstration: Female Dover sole lengths
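For readers without Excel, the counting rule behind FREQUENCY() can be sketched in Python. This is an illustrative re-implementation, not Excel's own code, and it assumes the bins are listed as upper limits in ascending order: each bin counts the values greater than the previous limit and less than or equal to its own limit, and one extra final count holds the values above the last limit. That extra count is why the output range needs one more cell than the bin range. The function name `frequency` and the bins [5, 7, 9] are hypothetical choices for illustration.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Sketch of an Excel-style FREQUENCY(): one count per bin, plus an overflow count.

    Assumes `bins` holds bin upper limits in ascending order.
    """
    counts = [0] * (len(bins) + 1)   # one more cell than there are bins
    for x in data:
        # bisect_left finds the first limit >= x, i.e. the bin where
        # previous limit < x <= this limit; len(bins) means "above all limits"
        counts[bisect_left(bins, x)] += 1
    return counts

books = [10, 4, 7, 5, 7, 8, 9]
print(frequency(books, [5, 7, 9]))   # [2, 2, 2, 1]
```

Here 4 and 5 fall in the first bin, the two 7s in the second, 8 and 9 in the third, and 10 lands in the overflow cell.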
2. Review Last Homework
1. For males and females separately, make a distribution table:
In separate columns show range, bin, frequency, percentage, cumulative percentage.
Label the columns and include units (mg/dL).
2. For females only, make a histogram:
• Resize the histogram to look nice
• Place the legend on the bottom (not the side)
• Label the x-axis: Upper Limit of Range (Cholesterol, mg/dL)
• Add the chart title: Distribution of Cholesterol for Females (mg/dL)
• If necessary, fix the secondary y-axis to range from 0 to 100%
3. Make a final report comparing male and female distributions:
Col 1: Range
Col 2: Male frequency
Col 3: Male percentage
Col 4: Female frequency
Col 5: Female percentage
Remember to save your worksheet.
4. Based on the results, what conclusions can you reach concerning differences between male
and female patients?
Place the results of step 1 (male and female), the histogram (female), the final comparison table, and the answer to question 4 into a Word document.
Using JMP: Enter the data for females into a Data Table; produce a histogram and basic statistics; cut and paste the results into the same Word document as above.
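The percentage and cumulative percentage columns in step 1 follow directly from the frequency column. A minimal Python sketch, assuming the frequencies are already listed bin by bin in ascending order; the counts below are hypothetical placeholders, not the homework's cholesterol data, and `distribution_table` is an illustrative name.

```python
def distribution_table(frequencies):
    """Return (frequency, percentage, cumulative percentage) rows for ordered bins."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("frequencies sum to zero")
    rows = []
    cumulative = 0
    for f in frequencies:
        cumulative += f                       # running count up to and including this bin
        rows.append((f, 100 * f / total, 100 * cumulative / total))
    return rows

# Hypothetical counts for four bins (not the homework data)
for f, pct, cum in distribution_table([3, 7, 8, 2]):
    print(f, f"{pct:.1f}%", f"{cum:.1f}%")
```

The last cumulative percentage should always come out to 100%, which is a handy check on the spreadsheet version as well.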
3. Measures of Central Tendency: Mean, Median, Mode
Review: Watch Khan Academy video on Average, Median, Mode
http://www.youtube.com/watch?v=uhxtUt_-GyM
The Arithmetic Mean
$$\mu = \frac{\sum x}{N} \quad \text{(for a population)}$$
$$\bar{x} = \frac{\sum x}{n} \quad \text{(for a sample)}$$
Ex: The data represent the number of textbooks purchased by a sample of seven students:
10 4 7 5 7 8 9
$$\bar{x} = \frac{10 + 4 + 7 + 5 + 7 + 8 + 9}{7} = \frac{50}{7} = 7.14$$
Excel AVERAGE() FUNCTION
The mean is affected by outliers and skewed data; that is, it is a nonresistant measure. Alternative measures, such as the median, are more resistant to outliers and skew.
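One way to see the nonresistance described above is to compare the mean and the median after appending a single extreme value. The sketch below uses Python's standard `statistics` module; the eighth value (50 books) is a hypothetical outlier, not part of the handout's data.

```python
from statistics import mean, median

books = [10, 4, 7, 5, 7, 8, 9]
print(round(mean(books), 2))   # 7.14, matching the worked example (50/7)

# Hypothetical outlier: an eighth student who bought 50 books
with_outlier = books + [50]
print(mean(with_outlier))      # 12.5: the mean jumps by more than 5 books
print(median(with_outlier))    # 7.5: the median moves only half a book from 7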
The Median
The median is a resistant measure of central tendency that occupies the middle position of data
placed in order of magnitude.
If n is odd, the median is the middle number of the data placed in order of magnitude. It occupies the $\left(\frac{n+1}{2}\right)$th position.
If n is even, the median is the average of the middle two numbers of the data placed in order of magnitude. It is the average of the numbers in the $\left(\frac{n}{2}\right)$th and $\left(\frac{n+2}{2}\right)$th positions.
Ex Reordering the sample of books: 4 5 7 7 8 9 10.
The median is 7. If there were an eighth person who purchased 12 books, the median would
be 7.5.
Excel MEDIAN() FUNCTION
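The odd/even position rules above translate directly into code. This sketch, with the hypothetical helper `median_by_position`, converts the 1-based positions in the formulas to Python's 0-based list indices; in practice, Python's built-in `statistics.median` gives the same result.

```python
def median_by_position(values):
    """Median via the (n+1)/2 rule (n odd) or the n/2 and (n+2)/2 rule (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]   # ((n+1)/2)th value; subtract 1 for 0-based index
    lower = ordered[n // 2 - 1]            # (n/2)th value
    upper = ordered[(n + 2) // 2 - 1]      # ((n+2)/2)th value
    return (lower + upper) / 2

books = [4, 5, 7, 7, 8, 9, 10]
print(median_by_position(books))           # 7
print(median_by_position(books + [12]))    # 7.5
```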
The Mode
The mode, by definition, is the most frequently occurring value in a series.
• There can be more than one mode
• There can be no mode
Excel MODE() FUNCTION
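The two caveats about the mode (several modes, or none at all) are worth handling explicitly. A sketch that returns every mode, using `collections.Counter`; it adopts the common textbook convention that when no value repeats there is no mode, and `modes` is an illustrative name.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, sorted; [] means there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []   # no value repeats, so (by this convention) there is no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([10, 4, 7, 5, 7, 8, 9]))   # [7]
print(modes([4, 5, 5, 8, 8]))          # [5, 8]  (two modes)
print(modes([1, 2, 3]))                # []      (no mode)
```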
4. Percentiles and Quartiles
The kth percentile, Pk, is such that no more than k percent of the data are less than Pk and no
more than (100 - k) percent are greater than Pk. Usually used with large data sets.
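The percentile definition can be checked mechanically for any candidate value. In this sketch, the hypothetical `satisfies_percentile` counts the observations strictly below and strictly above a candidate p and compares those shares with k and 100 - k percent; integer arithmetic keeps the comparison exact. Note that more than one candidate can satisfy the definition for a given k (for example, both 2 and 3 qualify as the 50th percentile of 1, 2, 3, 4).

```python
def satisfies_percentile(values, p, k):
    """True if at most k% of values are below p and at most (100 - k)% are above p."""
    n = len(values)
    below = sum(1 for x in values if x < p)
    above = sum(1 for x in values if x > p)
    # below/n <= k/100 is rewritten as 100*below <= k*n to avoid floating-point rounding
    return 100 * below <= k * n and 100 * above <= (100 - k) * n

books = [4, 5, 7, 7, 8, 9, 10]
print(satisfies_percentile(books, 5, 25))   # True: 1 of 7 below, 5 of 7 above
print(satisfies_percentile(books, 7, 25))   # False: 2 of 7 (28.6%) below
print(satisfies_percentile(books, 9, 75))   # True
```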
The first quartile (Q1) is the point that separates the lower 25 percent of the values from the upper 75 percent; it is the value corresponding to the $\frac{n+1}{4}$th ordered observation.
The third quartile (Q3) is the point that separates the upper 25 percent of the values from the lower 75 percent; it is the value corresponding to the $\frac{3(n+1)}{4}$th ordered observation.
Ex Books: 4 5 7 7 8 9 10.
$\frac{n+1}{4} = \frac{8}{4} = 2$, so Q1 = 5; $\frac{3(n+1)}{4} = \frac{24}{4} = 6$, so Q3 = 9.
(If the position ends in .5, average the two nearest values; otherwise, if it is not an integer, round it to the nearest integer.)
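The quartile positions and the rounding rule above can be combined into one small routine. This sketch follows the handout's (n+1)/4 and 3(n+1)/4 convention with its ".5 means average, otherwise round" rule; the helper names are hypothetical, and other software may use different quantile conventions, so results can differ on other data.

```python
def value_at_position(ordered, pos):
    """Value at a 1-based position: .5 averages the two neighbours, other fractions round."""
    whole = int(pos)
    frac = pos - whole          # exact here, since positions are multiples of 1/4
    if frac == 0:
        return ordered[whole - 1]
    if frac == 0.5:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[round(pos) - 1]   # .25 rounds down, .75 rounds up

def quartiles(values):
    """Return (Q1, Q3) using the (n+1)/4 and 3(n+1)/4 positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    return (value_at_position(ordered, (n + 1) / 4),
            value_at_position(ordered, 3 * (n + 1) / 4))

print(quartiles([4, 5, 7, 7, 8, 9, 10]))   # (5, 9)
print(quartiles([1, 2, 3, 4, 5]))          # (1.5, 4.5): positions 1.5 and 4.5
```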
5. Descriptive Statistics in JMP
Method 1: Distribution Function
• Enter the data into a Data Table (Important: do not mix character and numerical values in a column!)
• Highlight the column (this takes some practice; hint: to refresh the selection, use Rows > Clear Row States)
• Analyze > Distribution > OK
More statistics are available by clicking the red arrow beside Summary Statistics.
Method 2: Summary Function
Tables > Summary
[Figure: JMP Summary Statistics menu]
From the Statistics drop-down menu (see above), select statistics one at a time. The selected statistics then appear in the box to the right. (Note: the drop-down menu does not appear in the picture below.)
For Q1 and Q3, choose the Quantile statistic twice, specifying 25% and 75% in this box:
Click: OK
For more info: http://www.jmp.com/support/help/Summarize_Columns.shtml
6. Measures of Dispersion: the Variance
Range
Range = Maximum - Minimum
Ex Books: 4 5 7 7 8 9 10
Range = 10 - 4 = 6
Interquartile Range
IQR = Q3 - Q1
Ex The sample of books:
Q1 = 5, Q3 = 9,
IQR = 9 - 5 = 4
Variance (Population and Sample)
The variance is the average squared distance of observations from the mean.
Population variance formula:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
The square root of the variance is the standard deviation.
Spreadsheet calculation of population variance:
Ex Books: 4 5 7 7 8 9 10
Variance = Average[(X - mu)^2] = 26.857/7 = 3.84, where 26.857 is the sum of the squared deviations from the mean.
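The spreadsheet steps for the variance (deviation from the mean, square, average) can be reproduced in a few lines of Python. The sketch below mirrors those columns and cross-checks the result against the standard library's `statistics.pvariance`; for comparison, `statistics.variance` divides by n - 1 and gives the sample variance instead.

```python
from math import isclose, sqrt
from statistics import pvariance, variance

books = [4, 5, 7, 7, 8, 9, 10]
N = len(books)
mu = sum(books) / N                            # population mean = 50/7
squared_devs = [(x - mu) ** 2 for x in books]  # (X - mu)^2 for each observation
ss = sum(squared_devs)                         # sum of squared deviations
pop_var = ss / N                               # population variance: divide by N
pop_sd = sqrt(pop_var)                         # standard deviation = square root of variance

assert isclose(pop_var, pvariance(books))      # agrees with the standard library
print(round(ss, 3), round(pop_var, 2), round(pop_sd, 2))  # 26.857 3.84 1.96
print(round(variance(books), 2))               # sample variance (divide by n - 1): 4.48
```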
Video: Variance of a Population
http://www.youtube.com/watch?v=6JFzI1DDyyk
7. Homework
Read pp. 104-117; Problems 3.1, 3.2a [skip (4), (6), (10)], 3.2b
Data for 3.b (bolts.xls) on course website