Download Q 1 - ISpatula

Measures of Central Tendency for Grouped Data

Mean – Grouped Data

• The mean is often confused with the median, mode or range. The mean is the arithmetic average of a set of values, or distribution.

Example: The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company. Calculate the mean.

Solution:

Number of orders    f         x     fx
10 – 12             4         11    44
13 – 15             12        14    168
16 – 18             20        17    340
19 – 21             14        20    280
Total               n = 50          Σfx = 832

x is the midpoint of the class, found by adding the two class limits and dividing by 2.

\[ \bar{x} = \frac{\sum fx}{n} = \frac{832}{50} = 16.64 \]

Median and Interquartile Range – Grouped Data

• The median is the numerical value separating the higher half of a sample, a population, or a probability distribution from the lower half.

Step 1: Construct the cumulative frequency distribution.
Step 2: Decide which class contains the median. The median class is the first class whose cumulative frequency is at least n/2.
Step 3: Find the median by using the following formula:

\[ \text{Median} = L_m + \left( \frac{\frac{n}{2} - F}{f_m} \right) i \]

Where:
n = the total frequency
F = the cumulative frequency before the median class
f_m = the frequency of the median class
i = the class width
L_m = the lower boundary of the median class

Example: Based on the grouped data below, find the median.

Time to travel to work (min)    Frequency
1 – 10                          8
11 – 20                         14
21 – 30                         12
31 – 40                         9
41 – 50                         7

Solution:

1st Step: Construct the cumulative frequency distribution.

Time to travel to work (min)    Frequency    Cumulative frequency
1 – 10                          8            8
11 – 20                         14           22
21 – 30                         12           34
31 – 40                         9            43
41 – 50                         7            50

n/2 = 50/2 = 25, so the median class is the 3rd class (21 – 30), the first class whose cumulative frequency reaches 25.

So F = 22, f_m = 12, L_m = 20.5 and i = 10. Therefore,

\[ \text{Median} = L_m + \left( \frac{\frac{n}{2} - F}{f_m} \right) i = 20.5 + \left( \frac{25 - 22}{12} \right) 10 = 23 \]

Thus, 25 persons take less than 23 minutes to travel to work and another 25 persons take more than 23 minutes to travel to work.
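The two worked examples above can be checked with a short Python sketch. The function names and the (lower limit, upper limit, frequency) layout are illustrative choices, not part of the lecture, and the median function assumes equal-width classes whose limits are separated by a constant gap (1 unit in both examples).

```python
def grouped_mean(classes):
    """Mean of grouped data; classes is a list of (lower, upper, frequency)."""
    n = sum(f for _, _, f in classes)
    # Each class contributes midpoint * frequency to the sum of fx.
    total_fx = sum(((lo + hi) / 2) * f for lo, hi, f in classes)
    return total_fx / n


def grouped_median(classes):
    """Median of grouped data using L_m + ((n/2 - F) / f_m) * i.

    Assumes equal-width classes with a constant gap between class limits.
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    # Gap between one class's upper limit and the next class's lower limit.
    gap = classes[1][0] - classes[0][1] if len(classes) > 1 else 0
    cumulative = 0  # F: cumulative frequency before the current class
    for lo, hi, f in classes:
        if cumulative + f >= half:
            lower_boundary = lo - gap / 2   # e.g. 21 - 0.5 = 20.5
            width = hi - lo + gap           # e.g. 30 - 21 + 1 = 10
            return lower_boundary + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("no classes supplied")


# Data taken from the two examples in the text.
orders = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]
travel = [(1, 10, 8), (11, 20, 14), (21, 30, 12), (31, 40, 9), (41, 50, 7)]

print(grouped_mean(orders))    # mean number of orders per day
print(grouped_median(travel))  # median travel time in minutes
```

Working from class limits rather than boundaries keeps the input identical to the tables; the boundary and width are derived inside the function.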
Mode – Grouped Data

• The mode is the value that has the highest frequency in a data set.
• For grouped data, the modal class (class mode) is the class with the highest frequency.
• To find the mode for grouped data, use the following formula:

\[ \text{Mode} = L_{mo} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i \]

Where:
i is the class width
Δ1 is the difference between the frequency of the modal class and the frequency of the class before it
Δ2 is the difference between the frequency of the modal class and the frequency of the class after it
L_mo is the lower boundary of the modal class

Calculation of Grouped Data - Mode

Example: Based on the grouped data below, find the mode.

Time to travel to work (min)    Frequency
1 – 10                          8
11 – 20                         14
21 – 30                         12
31 – 40                         9
41 – 50                         7

Solution: Based on the table, the modal class is 11 – 20, so L_mo = 10.5, i = 10, Δ1 = (14 – 8) = 6 and Δ2 = (14 – 12) = 2.

\[ \text{Mode} = 10.5 + \left( \frac{6}{6 + 2} \right) 10 = 18 \]

Biostatistics Lecture 4
Descriptive Statistics: “Indicators of dispersion”
Dr. Alkilany 2012

Indicators of dispersion

• If all observations are the same, there is no variability. If they are not all the same, then dispersion is present in the data.
• Variation is an inherent characteristic of experimental observations, for several reasons.
• It is always important to get an estimate of how much individual observations tend to differ from the central tendency.

Dispersion (variability)

• In any experiment, variation will depend on:
  • The instrument used for analysis.
  • The analyst performing the assay.
  • The particular sample chosen.
  • Unidentified error, commonly known as noise.

Indicators of dispersion (why do we need them?)

[Figure: three data sets A, B and C with the same central tendency]

1. A, B and C have the same mean.
2. Based on the similarity of the means, can we say the data sets are the same?
3. What are the differences between these data sets?
4. How can we describe these differences?

Indicators of dispersion

• Standard deviation
• Coefficient of variation
• Quartiles
• Variance
• Box and whisker plot
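To make the A, B, C question concrete, here is a small Python sketch. The three data sets are hypothetical values invented for illustration (the figure's actual data are not available); they are built to share one mean while differing in spread.

```python
import statistics

# Hypothetical data sets: same mean (50), very different variability.
data_sets = {
    "A": [50, 50, 50, 50, 50],
    "B": [40, 45, 50, 55, 60],
    "C": [10, 30, 50, 70, 90],
}

summary = {}
for name, values in data_sets.items():
    summary[name] = {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 in the denominator
    }
    print(name, summary[name])
```

The means alone cannot tell the three sets apart; the range and the standard deviation can, which is the job of an indicator of dispersion.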
Standard deviation

• Two tabletting machines, Alpha and Bravo, produce erythromycin tablets with a nominal content of 250 mg.
• 500 tablets were randomly selected from each machine and their erythromycin content was assayed.

[Figure: distribution of erythromycin content per tablet for the Alpha and Bravo machines; mean ≈ 250 mg/tablet for both]

Although tablets from both machines have the same mean, do you think the two machines still differ? How?

• The two machines are very similar in terms of average drug content, both producing tablets with a mean very close to 250 mg. However, the two products clearly differ.
• With the Alpha machine, a considerable proportion of tablets have a content differing by more than 20 mg from the nominal dose (i.e. below 230 mg or above 270 mg), whereas with the Bravo machine such outliers are much rarer.
• An ‘indicator of dispersion’ is required in order to convey this difference in variability and to decide which machine performs better.

\[ SD = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]

• This is the standard deviation (SD) for a sample.
• For a population it is usually denoted σ.
• It has the same unit as the mean.
• Standard deviation is a widely used measure of variability (dispersion around the mean).

Let us go back to the tabletting machines (raw data).

[Table: raw tablet contents for each machine, with columns X_i, X_i − X̄ and (X_i − X̄)², and the mean X̄]

• The Alpha machine produces rather variable tablets, so several of the tablets deviate considerably from the overall mean.
• These relatively large deviations then feed through the rest of the calculation, producing a high final SD (8.72 mg).
• In contrast, the Bravo machine is more consistent and individual tablets never have a drug content much above or below the overall average.
• The small figures in the column of individual deviations lead to a lower SD (3.78 mg).
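The column-by-column calculation described above can be sketched in Python. The five tablet contents below are hypothetical values chosen for illustration, not the lecture's raw data, and `sample_sd` is an illustrative name.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]          # column X_i - mean
    squared = [d ** 2 for d in deviations]           # column (X_i - mean)^2
    return math.sqrt(sum(squared) / (n - 1))


# Hypothetical erythromycin contents (mg) for five tablets.
tablets = [248, 252, 250, 247, 253]

sd_manual = sample_sd(tablets)
sd_library = statistics.stdev(tablets)  # standard-library sample SD

print(round(sd_manual, 2), round(sd_library, 2))
```

The manual version mirrors the table columns; `statistics.stdev` uses the same n − 1 denominator, so the two results should agree.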
Standard deviation

• Reporting the SD:
  • The ± symbol is used in reporting the SD.
  • The ± symbol is reasonably interpreted as meaning ‘more or less’.
  • ± is used to indicate variability.
• With the tablets from our two machines, we would report their drug contents as:
  • Alpha machine: 248.7 ± 8.72 mg (mean ± SD)
  • Bravo machine: 251.1 ± 3.78 mg (mean ± SD)
• These figures summarize the true situation. The two machines produce tablets with almost identical mean contents, but those from the Alpha machine are two to three times more variable.

Standard deviation and Coefficient of variation

- Elephant tail = 150 ± 10 cm
- Mouse tail = 7 ± 3 cm

With this in mind, which is more variable: the elephant tail length results or those for the mouse?

\[ CV = \frac{SD}{\text{Mean}} \times 100 \]

• Elephant tail: CV = 10/150 × 100 = 6.7%
• Mouse tail: CV = 3/7 × 100 = 42.9%

• The coefficient of variation (CV) expresses variation relative to the magnitude of the data.
• It is useful for comparing variation in two or more sets of data with different mean values.
• CV has no unit (it is a ratio).

Variance

• The variance, σ² (population) or S² (sample), is a measure of spread that is related to the deviations of the data values from their mean.
• Variance = SD²

Population:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

Sample:
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]

• Unit: same as the mean, but squared. If the mean is in mg, the variance will be in mg².

Quartiles

[Figure: ordered data divided into four equal groups by Q1, Q2 (median) and Q3]

• Quartiles: the three points that divide the data set into four equal groups, each representing a fourth of the population being sampled.
• The median = Q2.

First quartile (Q1): cuts off the lowest 25% of data; 25th percentile.
Second quartile (Q2): cuts the data set in half; 50th percentile.
Third quartile (Q3): cuts off the highest 25% of data, or the lowest 75%; 75th percentile.

Interquartile range: the difference between the upper and lower quartiles.

IQR = Q3 – Q1
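As a quick check on the CV arithmetic and on the two variance denominators, here is a minimal Python sketch. The function name `coefficient_of_variation` and the eight-value data set are illustrative assumptions; only the tail and tablet figures come from the text.

```python
import statistics


def coefficient_of_variation(sd, mean):
    """CV as a percentage: SD / mean * 100 (unitless)."""
    return sd / mean * 100


# Figures quoted in the text.
cv_elephant = coefficient_of_variation(10, 150)
cv_mouse = coefficient_of_variation(3, 7)
print(round(cv_elephant, 1), round(cv_mouse, 1))

# Hypothetical data set to contrast the two variance formulas.
values = [2, 4, 4, 4, 5, 5, 7, 9]
population_variance = statistics.pvariance(values)  # divides by N
sample_variance = statistics.variance(values)       # divides by n - 1
print(population_variance, sample_variance)
```

The sample variance is always somewhat larger than the population variance computed on the same numbers, because it divides the same sum of squares by n − 1 instead of N.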
Finding Quartiles

• To find the quartiles for a set of data, do the following:
1. Arrange the data from smallest to largest (ordered array).
2. Locate the median (Q2).
3. For the half to the left of the median, locate its median (Q1).
4. For the half to the right of the median, locate its median (Q3).

[Figure: ordered data with the half to the left (median Q1), the overall median Q2, and the half to the right (median Q3)]

Finding Quartiles

• Example with odd n
Times needed for 15 tablets to disintegrate, in minutes:
5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60

1. The data are already ordered from smallest to largest.
2. The median is the ((n+1)/2)th value = 8th value = 20 minutes.
3. For the half to the left: n = 7, median = 4th value = 10 minutes.
4. For the half to the right: n = 7, median = 4th value = 30 minutes.
5. Q1 = 10 minutes; Q2 = 20 minutes; Q3 = 30 minutes. IQR = Q3 – Q1 = 20 minutes.
6. This means that about 25% of tablets need 10 minutes or less to disintegrate, and about 50% need 20 minutes or less. By 30 minutes, about 75% of all tablets have disintegrated; only about 25% of the tablets need more than 30 minutes.

This question can also come in this form:

Disintegration time (min)    Frequency
5                            1
10                           4
12                           1
15                           1
20                           2
25                           1
30                           2
40                           2
60                           1
Total                        15

Finding Quartiles

• Example with even n
Times needed for 20 capsules to disintegrate, in minutes:
5, 10, 10, 15, 15, 15, 15, 20, 20, 20, 25, 30, 30, 40, 40, 45, 60, 60, 65, 85

1. The data are already ordered from smallest to largest.
2. The median is the mean of the two middle values, the (n/2)th and ((n/2) + 1)th = 10th and 11th = (20 + 25)/2 = 22.5 minutes.
3. For the half to the left: n = 10, median = mean of 5th and 6th values = 15 minutes.
4. For the half to the right: n = 10, median = mean of 5th and 6th values = 42.5 minutes.
5. Q1 = 15 minutes; Q2 = 22.5 minutes; Q3 = 42.5 minutes. IQR = ?

Disintegration time (min)    Frequency
5                            1
10                           2
15                           4
20                           3
25                           1
30                           2
40                           2
45                           1
60                           2
65                           1
85                           1
Total                        20

Quartiles

• Consider the elimination half-lives of two synthetic steroids, determined using two groups, each containing 15 volunteers.
• The results are shown in the following table, with the values ranked from lowest to highest for each steroid.

[Table: ranked elimination half-lives for steroid 1 and steroid 2]

Quartiles and IQR as a measurement for data spread

The IQR for the half-life of steroid 2 is only half that for steroid 1, duly reflecting its less variable nature. Just as the median is a robust indicator of central tendency, the interquartile range is a robust indicator of dispersion. The interquartile range is a more useful measure of spread than the range, as it describes the middle 50% of the data values and is thus less affected by outliers.

Box and whisker plot

• A box-and-whisker plot can be useful for handling many data values.
• It shows only certain statistics rather than all the data.
• A box-and-whisker plot is a visual representation of the five-number summary.
• The five-number summary consists of the median, the two quartiles, and the smallest and greatest values in the distribution (not including outliers).
• A box-and-whisker plot immediately shows the center, the spread, and the overall range of the distribution.

Box and whisker plot

• The first step in constructing a box-and-whisker plot is to find the median (Q2), the lower quartile (Q1) and the upper quartile (Q3) of the data set.
• Example: The following numbers are the weights of 10 patients in hospital (kg):

Weight (kg)    Rank    Ordered weight (kg)
75             1       62
62             2       67
78             3       73
96             4       75
73             5       78
93             6       79
85             7       81
81             8       85
67             9       93
79             10      96

[Figure: box-and-whisker plot of the ordered weights, marking the smallest value (S), Q1, the median Q2 = 78.5, Q3 and the largest value (L)]

FYIP

• Outliers (extreme values) are values that are much bigger or smaller than (distant from) the rest of the data.
• In order to be an outlier, the data value must be:
  • larger than Q3 by at least 1.5 times the interquartile range (IQR), or
  • smaller than Q1 by at least 1.5 times the IQR.
• An outlier is represented by a dot on the box-and-whisker plot.
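The median-of-halves procedure, the five-number summary and the 1.5 × IQR outlier rule can be sketched together in Python. The function names are illustrative. The quartile method here follows the steps in the text (for odd n, the median is left out of both halves), which can give slightly different values from the interpolation methods used by some software. The five-number summary below takes the smallest and largest values without first excluding outliers.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Q1, Q2, Q3 using the median of each half of the ordered data."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


def five_number_summary(values):
    q1, q2, q3 = quartiles(values)
    return min(values), q1, q2, q3, max(values)


def outliers(values):
    """Values at least 1.5 * IQR below Q1 or above Q3, as stated in the text."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x <= low or x >= high]


# Data from the examples in the text.
tablets = [5, 10, 10, 10, 10, 12, 15, 20, 20, 25, 30, 30, 40, 40, 60]
capsules = [5, 10, 10, 15, 15, 15, 15, 20, 20, 20,
            25, 30, 30, 40, 40, 45, 60, 60, 65, 85]
weights = [75, 62, 78, 96, 73, 93, 85, 81, 67, 79]

print(quartiles(tablets))
print(quartiles(capsules))
print(five_number_summary(weights))
print(outliers(weights))
```

Applying `outliers` to the weights with one extra hypothetical value of 130 kg would flag that value, because it lies above Q3 + 1.5 × IQR for the extended data set.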
