Download Confidence Intervals

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

Association rule learning wikipedia , lookup

Misuse of statistics wikipedia , lookup

Time series wikipedia , lookup

Transcript
GrowingKnowing.com © 2011
GrowingKnowing.com © 2011
1
Percentile
 What ‘s the difference between percentile and percent?
 Percent measures ratio

90% percent on a test shows you got 9 out 10 questions right.
 Percentile measures position


90 percentile shows you beat 9 out of 10 students on the test,
even if the whole class failed that test.
 Business uses percentiles to reward the best
employees, best customers, or possibly drop the worst.
GrowingKnowing.com © 2011
2
Formula
 There are 3 popular formulas for percentiles
 Check with your instructor to ensure you use the right formula.
 In Excel, =Percentile(A1:A9, percentile)

where your data is in cells A1 to A9, and percentile is written as a
decimal (i.e. 90 percentile is written as .9)
 Our practice questions uses the more popular formula below
 which is different from Excel’s formula.
 Formula i = percentile/100 x n
 Sort data
 If i is a decimal, round up to the next data item in the sorted list
 If i is a whole number, take the average of data item i plus the next
data item in the sorted list.
 n is the number of data items in your list
GrowingKnowing.com © 2011
3
Example 1
 Find the 50th percentile for : 1, 4, 2, 5, and 4
 Sort the data 1st: 1, 2, 4, 4, 5
 n = 5 because we have 5 data items
 Calculate i = 50/100 x 5 = 2.5
 Since i is a decimal, we round up to the next data item
which is position 3 in the sorted list.
 The number 4 is in position 3 of our sorted list.
 Note: Median is another name for 50th percentile or the
middle position, and as you can see the formula we are
using confirms our Median calculations.
GrowingKnowing.com © 2011
4
Example 2: percentile.
 Find 25th percentile for numbers: 3, 9, -1, and 0
 Sort the data: -1, 0, 3, 9
 i = 25/100 x 4 = 1
 i is whole number, so take average of data item in
position 1 and the next data item in position 2.
 (-1 + 0) / 2 = -0.5
 25th percentile = -0.5
GrowingKnowing.com © 2011
5
Interquartile Range (IQR)
 TIP: Review percentiles before you study IQR
 IQR range shows you the middle 50 percentile of your
data from the 25th percentile to 75th percentile.
25th
50th
75th
IQR
 We often want to know the middle 50 percentile of the
data: we design a chair or other new product to fit the
middle segment of people and ignore the extremes.
GrowingKnowing.com © 2011
6
Calculate IQR
 There is no Excel function for IQR
 Use Excel like a fancy calculator to determine IQR.
 Example: What is the IQR if Q1 is 75 and Q3 is 200
 Answer: IQR = Q3 – Q1 = 200 – 75 = 125




What is the IQR for this data: 6, 2, 3, 4, 9, 10, 5, 7, 8?
Calculate Q1: the 25th percentile = 4
Calculate Q3: the 75th percentile = 8
IQR = Q3 – Q1
=8–4
=4
 TIP: There are a few ways to calculate percentiles so ensure you are
using the method recommended by your teacher.
GrowingKnowing.com © 2011
7
Outliers
 IQR is very important in the calculation of outliers.
 An outlier calculation is at least 1.5 times the IQR
 Lower outlier = Q1 – 1.5(IQR)
 Upper outlier = Q3 + 1.5(IQR)
 Outliers indicate the exceptional data
 Book have been written about outliers as the exceptional
performers
 Who are our outstanding sales people?
 What bacteria is exceptionally lethal?
 Performance can be exceptionally good or bad!
 The outlier calculation helps you determine what exactly is the
point that signals outstanding performance.
 Is 10 good enough, or do you need to score 12.3 ?
GrowingKnowing.com © 2011
8
Outlier calculation
 If first quartile is 10, third quartile is 30, what are the
outliers?
 IQR = 30 – 10 = 20
 Upper outlier = Q3+ 1.5(IQR) = 30 + 1.5(20) > 60
 Lower outlier = Q1 – 1.5(IQR) = 10 - 1.5(20) < -20
 Any data value more than 60 or less than -2o is an
outlier (which is an extreme value)
 Some books use 3(IQR) as a more extreme outlier.
 You can now calculate if your friend is an extreme
video gamer or just plays a lot.
GrowingKnowing.com © 2011
9
Box Plot
 A box plot shows the quartiles using a visual picture of a
box.
 The left side of the box is Q1,
 a line in the middle of the box is Q2 (i.e. median),
 and Q3 is the right side of the box
 Lines, called whiskers, that are drawn from the box to the
minimum and maximum values.
Q1
Q2
Q3
22
93
Symmetrical data
47
57
GrowingKnowing.com © 2011
67
10
Box Plot
 If the whiskers are longer on one side than another,
that indicates the distribution skewed to the long side.
 If the median (Q2) is to the left of the middle, the
distribution is skewed right.
 If Q2 is to the right of the middle, the distribution is
skewed left
Skewed left
GrowingKnowing.com © 2011
Skewed right
11
z score
 Z score measures position by the number of standard
deviations (S.D.) above or below the mean.
 Negative z score is S.D. below the mean
 Positive z score is S.D. above the mean
 Formula
z = (x – mean) / standard deviation

where x is any data value you select or want
 Brackets are needed or calculators/computers will divide
first then subtract which gives an incorrect answer.
GrowingKnowing.com © 2011
12
z score
 Excel: =STANDARDIZE(x, mean, standard deviation)
 Since a z score is a subtraction and a division, it may be
faster to calculate z mentally than use a computer.
 What is the z score for 50 if you have a mean of 100
and standard deviation of 10?
z score = (50 - 100) / 10: = -5
 What is the z score for 150 if the mean is 100 and
standard deviation is 25?
z score = (150 - 100) / 25: = +2
GrowingKnowing.com © 2011
13