Unit I Frequency Distribution

Statistics: The numerical facts in the preceding statements ($165,000, 79%, 25.3, 11%, $4.00, $201,449,289, $5,000,000 and 8721) are called statistics. The term statistics refers to numerical facts such as averages, medians, percents, and index numbers that help us understand a variety of business and economic situations.

Applications in business and economics:

1) Accounting: Public accounting firms use statistical sampling procedures when conducting audits for their clients.
2) Finance: Financial analysts use a variety of statistical information to guide their investment recommendations. In the case of stocks, the analysts review a variety of financial data including price/earnings ratios and dividend yields.
3) Marketing: Electronic scanners at retail checkout counters collect data for a variety of marketing research applications.
4) Production: Today’s emphasis on quality makes quality control an important application of statistics in production. A variety of statistical quality control charts are used to monitor the output of a production process.
5) Economics: Economists frequently provide forecasts about the future of the economy or some aspect of it. They use a variety of statistical information in making such forecasts. For instance, in forecasting inflation rates, economists use statistical information on such indicators as the Producer Price Index, the unemployment rate, and manufacturing capacity utilization. Often these statistical indicators are entered into computerized forecasting models that predict inflation rates.

Data: Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation. All the data collected in a particular study are referred to as the data set for the study.

Element: Elements are the entities on which data are collected.

Variables: A variable is a characteristic of interest for the elements.
Observations: Measurements collected on each variable for every element in a study provide the data. The set of measurements obtained for a particular element is called an observation.

Scales of Measurement:

1. Nominal scale: When the data for a variable consist of labels or names used to identify an attribute of the element, the scale of measurement is considered a nominal scale.
2. Ordinal scale: The scale of measurement for a variable is called an ordinal scale if the data exhibit the properties of nominal data and the order or rank of the data is meaningful.
3. Interval scale: The scale of measurement for a variable is an interval scale if the data have all the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric.
4. Ratio scale: The scale of measurement for a variable is a ratio scale if the data have all the properties of interval data and the ratio of two values is meaningful. Variables such as distance, height, weight, and time use the ratio scale of measurement.

Quantitative data: Numeric values that indicate how much or how many of something. Quantitative data are obtained using either the interval or ratio scale of measurement.

Quantitative variable: A variable with quantitative data.

Descriptive statistics: Tabular, graphical, and numerical summaries of data.

Frequency: The number of times a particular value or class occurs in a data set.

Relative Frequency: $\text{Relative frequency} = \frac{\text{Frequency of the class}}{\text{Total frequency}}$

Percent Frequency: Percent frequency of a class = Relative frequency × 100.

Cumulative Frequency: Cumulative frequency is the running total of frequencies, that is, the sum of the frequencies of all classes up to and including the current one.

Frequency Density: It is defined as the number of observations in a class per unit of its width. Frequency density gives the rate of concentration of observations in a class.
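The frequency measures defined above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `frequency_table` and `frequency_density` are made up for this example, and the counts are the soft-drink frequencies used in a worked example later in this unit.

```python
from itertools import accumulate


def frequency_table(counts):
    """Return rows of (label, frequency, cumulative, relative, percent).

    `counts` maps each class label to its frequency; the order of the
    mapping is the order in which cumulative frequencies are accumulated.
    """
    total = sum(counts.values())
    running_totals = accumulate(counts.values())
    rows = []
    for (label, freq), cumulative in zip(counts.items(), running_totals):
        relative = freq / total  # relative frequency = class frequency / total
        rows.append((label, freq, cumulative,
                     round(relative, 2), round(relative * 100, 1)))
    return rows


def frequency_density(freq, width):
    """Frequency density: observations per unit of class width."""
    return freq / width


# Illustrative counts (hypothetical variable name, data from the soft-drink example)
counts = {"Coke": 19, "Diet Coke": 8, "Pepsi": 5, "Thums Up": 13, "Sprite": 5}

for row in frequency_table(counts):
    print(row)
```

Each printed row holds the class label followed by its frequency, cumulative frequency, relative frequency, and percent frequency, so the last row's cumulative frequency equals the total number of observations.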
$\text{Frequency density} = \frac{\text{Frequency of the class}}{\text{Width of the class}}$

Tabular and Graphical Representation of Data:

1. Bar Charts: A bar chart is a graphical device for depicting categorical data summarized in a frequency, relative frequency, or percent frequency distribution. On one axis of the graph (usually the horizontal axis), we specify the labels that are used for the classes (categories). A frequency, relative frequency, or percent frequency scale can be used for the other axis of the chart.

2. Pie Chart: The pie chart provides another graphical device for presenting relative frequency and percent frequency distributions for categorical data. To construct a pie chart, we first draw a circle to represent all the data. Then we use the relative frequencies to subdivide the circle into sectors, or parts, that correspond to the relative frequency for each class. (Use the data from the last example.)

Summarizing Quantitative Data

Width of class interval: The approximate size of a class interval can be decided by the use of the following formula:

$\text{Class interval} = \frac{\text{Largest observation} - \text{Smallest observation}}{\text{Number of class intervals}}$

Class Limits: Class limits are the smallest and largest observations (data, events etc.) in each class. Therefore, each class has two limits: a lower and an upper. Class limits for the various class intervals can be set in two ways:

1) Exclusive method: the upper limit of one class is taken to be equal to the lower limit of the next class (for example 0-10, 10-20); an observation equal to the upper limit is placed in the next class.
2) Inclusive method: all observations with magnitude greater than or equal to the lower limit and less than or equal to the upper limit of a class are included in it.

Class Mark or Mid Value of a class or central value: The average of the values of the class limits for a given class. A class mark is also called a mid value or central value.

1. In the exclusive type of class intervals the mid value of a class is defined as the arithmetic mean of its lower and upper limits.
2. In the case of inclusive class intervals, there is a gap between the upper limit of one class and the lower limit of the next, which is eliminated by using the class boundaries.

Class Boundaries: Class boundaries are the midpoints between the upper class limit of a class and the lower class limit of the next class in the sequence. Therefore, each class has an upper and a lower class boundary.

Dot Plot: One of the simplest graphical summaries of data is a dot plot. A horizontal axis shows the range for the data. Each data value is represented by a dot placed above the axis.

Histogram: A common graphical presentation of quantitative data is a histogram. This graphical summary can be prepared for data previously summarized in either a frequency, relative frequency, or percent frequency distribution. A histogram is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the vertical axis. (Use the data from the last example.)

Ogive: A graph of a cumulative distribution, called an ogive, shows data values on the horizontal axis and either the cumulative frequencies, the cumulative relative frequencies, or the cumulative percent frequencies on the vertical axis.

Example: Calculate the cumulative frequency, relative frequency, and percent frequency of the following data:

Soft Drink   Frequency
Coke         19
Diet Coke    8
Pepsi        5
Thums Up     13
Sprite       5

Solution:

Soft Drink   Frequency   Cumulative frequency   Relative frequency   Percent frequency
Coke         19          19                     0.38                 38
Diet Coke    8           27                     0.16                 16
Pepsi        5           32                     0.10                 10
Thums Up     13          45                     0.26                 26
Sprite       5           50                     0.10                 10
Total        50                                 1.00                 100

The Stem-and-Leaf Display

The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw graphs that can be used to summarize data quickly. One such technique is the stem-and-leaf display.

To develop a stem-and-leaf display, we first arrange the leading digits of each data value to the left of a vertical line. To the right of the vertical line, we record the last digit for each data value. Based on the top row of data in Table 2.8 (112, 72, 69, 97, and 107), the first five entries in constructing a stem-and-leaf display would be as follows:

 6 | 9
 7 | 2
 8 |
 9 | 7
10 | 7
11 | 2

Crosstabulations: A crosstabulation is a tabular summary of data for two variables.

Scatter Diagrams: The scatter diagram is one of the tools of quality. A scatter diagram is a graphical technique used to analyze the relationship between two variables. It shows whether or not there is correlation between two variables. Correlation refers to the measure of the relationship between two sets of numbers or variables.

[Figure: different types of scatter diagram]

Measure of Central Tendency

Average: “Average is a value which is typical or representative of a set of data.” (Murray R. Spiegel)

Various measures of average:

A) Mathematical averages
1) Arithmetic mean
2) Geometric mean
3) Harmonic mean
4) Quadratic mean

B) Positional averages
1) Median
2) Mode

Arithmetic mean: It is defined as the sum of the observations divided by the number of observations. It can be computed in two ways:
1) Simple arithmetic mean
2) Weighted arithmetic mean

Simple Arithmetic Mean:

a) When individual observations are given:

1) Direct method: $\bar{X} = \frac{\sum X}{n}$
2) Shortcut method: $\bar{X} = A + \frac{\sum d}{n}$, where $d = X - A$
3) Step deviation method: $\bar{X} = A + \frac{\sum d}{n} \times i$, where $d = \frac{X - A}{i}$

Example: The following figures relate to the monthly output of cloth of a factory in a given year. Calculate the average (arithmetic mean) monthly output.

Month:   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Output:  80   88   92   84   96   92   96   100  92   94   98   86

Solution:

Direct method:

$\bar{X} = \frac{80+88+92+84+96+92+96+100+92+94+98+86}{12} = \frac{1098}{12} = 91.5$ metres

Shortcut method: where A = assumed mean, subtract A from every observation.
Take A = 90.

Month   Output   d = X - 90
Jan     80       -10
Feb     88       -2
Mar     92       2
Apr     84       -6
May     96       6
Jun     92       2
Jul     96       6
Aug     100      10
Sep     92       2
Oct     94       4
Nov     98       8
Dec     86       -4
                 $\sum d = 18$

$\bar{X} = 90 + \frac{18}{12} = 91.5$ metres

b) When data are in the form of an ungrouped frequency distribution:

1) Direct method: $\bar{X} = \frac{\sum fX}{N}$
2) Shortcut method: $\bar{X} = A + \frac{\sum fd}{N}$, where $d = X - A$
3) Step deviation method: $\bar{X} = A + \frac{\sum fd}{N} \times i$, where $d = \frac{X - A}{i}$

Example: The following is the frequency distribution of age of 670 students of a school. Compute the arithmetic mean of the data.

X:  5   6   7   8    9    10  11  12  13  14
f:  25  45  90  165  112  96  81  26  18  12

Solution:

Direct method:

X     f     fX
5     25    125
6     45    270
7     90    630
8     165   1320
9     112   1008
10    96    960
11    81    891
12    26    312
13    18    234
14    12    168
      $\sum f = 670$   $\sum fX = 5918$

$\bar{X} = \frac{\sum fX}{N} = \frac{5918}{670} = 8.83$ years

Shortcut method: d = X - A; here A = 8.

X     f     d = X - A   fd
5     25    -3          -75
6     45    -2          -90
7     90    -1          -90
8     165   0           0
9     112   1           112
10    96    2           192
11    81    3           243
12    26    4           104
13    18    5           90
14    12    6           72
      $\sum f = 670$    $\sum fd = 558$

$\bar{X} = A + \frac{\sum fd}{N} = 8 + \frac{558}{670} = 8 + 0.83 = 8.83$ years

c) When data are in the form of a grouped frequency distribution or continuous series with exclusive classes:

Question: Calculate the arithmetic mean of the following distribution.

Class intervals:  0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency:        3     8      12     15     18     16     11     5

Solution: Here A = 35.

Class interval   Mid value X   Frequency f   d = X - 35   fd
0-10             5             3             -30          -90
10-20            15            8             -20          -160
20-30            25            12            -10          -120
30-40            35            15            0            0
40-50            45            18            10           180
50-60            55            16            20           320
60-70            65            11            30           330
70-80            75            5             40           200
                               $\sum f = 88$              $\sum fd = 660$

$\bar{X} = A + \frac{\sum fd}{N} = 35 + \frac{660}{88} = 42.5$

d) When data are in the form of a grouped frequency distribution or continuous series with inclusive classes:

Example:

Class intervals:  240-269  270-299  300-329  330-359  360-389  390-419  420-449
Frequency:        7        19       27       15       12       12       8

Solution: Here A = 344.5.

Class interval   Mid value X   Frequency f   d = X - 344.5   fd
240-269          254.5         7             -90             -630
270-299          284.5         19            -60             -1140
300-329          314.5         27            -30             -810
330-359          344.5         15            0               0
360-389          374.5         12            30              360
390-419          404.5         12            60              720
420-449          434.5         8             90              720
                               $\sum f = 100$                $\sum fd = -780$

$\bar{X} = A + \frac{\sum fd}{N} = 344.5 + \frac{-780}{100} = 336.7$

e) Step deviation method:

$\bar{X} = A + \frac{\sum fd}{N} \times i$, where i = class interval and $d = \frac{X - A}{i}$

Example:

Class interval:  0-5  5-10  10-15  15-20  20-25  25-30
Frequency:       20   7     2      9      10     5

Solution: Take A = 12.5 and i = 5.

Class interval   f    Mid value X   d = (X - A)/i   fd
0-5              20   2.5           -2              -40
5-10             7    7.5           -1              -7
10-15            2    12.5          0               0
15-20            9    17.5          1               9
20-25            10   22.5          2               20
25-30            5    27.5          3               15
                 $\sum f = 53$                      $\sum fd = -3$

$\bar{X} = A + \frac{\sum fd}{N} \times i = 12.5 + \frac{-3}{53} \times 5 = $ Rs. 12.22

MEDIAN: It is another measure of central location for a variable. The median is the value in the middle when the data are arranged in ascending order.

(A) When individual observations are given:

a) For an odd number of observations (terms) the median is the middle value, i.e. the $\frac{N+1}{2}$th term.
b) For an even number of observations the median is $\frac{1}{2}\left\{\frac{N}{2}\text{th term} + \left(\frac{N}{2}+1\right)\text{th term}\right\}$.

Example: Find the median of the following observations: 20, 15, 25, 28, 18, 16, 30.

Solution: Arranging the observations in ascending order we get 15, 16, 18, 20, 25, 28, 30. Since the number of terms is 7, i.e. odd, the median is the size of the $\frac{7+1}{2}$th, i.e. 4th, observation. Hence the median, denoted by Md, is 20.

Example: Find the median of the following: 245, 230, 265, 236, 220, 250.

Solution: Arrange the observations in ascending order: 220, 230, 236, 245, 250, 265. Since the number of terms is 6, i.e. even, the median is

$Md = \frac{1}{2}\left\{\frac{6}{2}\text{th term} + \left(\frac{6}{2}+1\right)\text{th term}\right\} = \frac{1}{2}\{3\text{rd term} + 4\text{th term}\} = \frac{1}{2}(236 + 245) = 240.5$

(B) When an ungrouped frequency distribution is given: in this case calculate the cumulative frequency.

Example: Locate the median of the following frequency distribution.

Variable (X):   10  11  12  13  14  15  16
Frequency (f):  8   15  25  20  12  10  5

Solution:

X     f    c.f.
10    8    8
11    15   23
12    25   48
13    20   68
14    12   80
15    10   90
16    5    95

The median is the size of the $\frac{95+1}{2}$ = 48th item. The cumulative frequency first reaches 48 at X = 12, so the median is 12.

(C) Median in continuous series:

$Md = L + \frac{\frac{N}{2} - c.f.}{f} \times i$

where L = lower boundary of the median class, c.f. = cumulative frequency of the class preceding the median class, f = frequency of the median class, and i = class interval.

Example: The following table shows the distribution of marks obtained by 500 students in an examination. Obtain the median from the following data.

Marks:  0-9  10-19  20-29  30-39  40-49  50-59  60-69  70-79
f:      30   40     50     48     24     162    132    14

Solution:

Class interval   Class boundaries   f     c.f.
0-9              -0.5 - 9.5         30    30
10-19            9.5 - 19.5         40    70
20-29            19.5 - 29.5        50    120
30-39            29.5 - 39.5        48    168
40-49            39.5 - 49.5        24    192
50-59            49.5 - 59.5        162   354
60-69            59.5 - 69.5        132   486
70-79            69.5 - 79.5        14    500

Since N/2 = 250, the median class is 49.5-59.5 and therefore L = 49.5, i = 10, f = 162, c.f. = 192. Substituting the values in the formula:

$Md = L + \frac{\frac{N}{2} - c.f.}{f} \times i = 49.5 + \frac{250 - 192}{162} \times 10 = 53.08$

Example: The weekly wages of 1000 workers of a factory are shown in the following table. Calculate the median.

Weekly wages (less than):  425  475  525  575  625  675  725  775  825  875
No. of workers:            2    10   43   123  293  506  719  864  955  1000

Ans = 673.59

Mode: The mode is the value that occurs most frequently, i.e. the value which has the greatest frequency. If no number is repeated, then there is no mode for the list.

Question: Calculate the mode from the following data of marks obtained by students.

Roll no:  1   2   3   4   5   6   7   8   9   10
Marks:    20  30  31  32  25  25  30  21  30  32

Solution:

Size of item   Number of times it occurs
20             1
21             1
25             2
30             3
31             1
32             2
Total          10

Since the item 30 occurs the maximum number of times, i.e. 3, the modal marks are 30.

Mode in continuous series:

$\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$

where
L = lower limit of the modal class
$f_1$ = frequency of the modal class
$f_0$ = frequency of the previous class (pre-modal class)
$f_2$ = frequency of the next class (post-modal class)
i = class interval

Question: The frequency distribution of marks obtained by 60 students of a class in a college is given below. Find the mode of the distribution.

Marks:      30-34  35-39  40-44  45-49  50-54  55-59  60-64
Frequency:  3      5      12     18     14     6      2

Solution: Convert the inclusive classes into exclusive classes.

Marks:      29.5-34.5  34.5-39.5  39.5-44.5  44.5-49.5  49.5-54.5  54.5-59.5  59.5-64.5
Frequency:  3          5          12         18         14         6          2

The highest frequency is 18, i.e. the modal class is 44.5-49.5. Here L = 44.5, $f_1$ = 18, $f_0$ = 12, $f_2$ = 14, i = 5. Substituting the values in the formula:

$\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 44.5 + \frac{18 - 12}{2 \times 18 - 12 - 14} \times 5 = 47.5$ marks

Quartiles: The values that divide a distribution into 4 equal parts are known as quartiles.

Q1: known as the first (lower) quartile.
Q2: known as the second or middle quartile; it is the median.
Q3: known as the third (upper) quartile.

Q1 ≤ Q2 ≤ Q3

Computation of quartiles:

1. In the case of individual and discrete series (after arranging the items in ascending order of size):
a. Q1 = size of the $\frac{N+1}{4}$th item
b. Q2 = size of the $\frac{2(N+1)}{4}$th item
c. Q3 = size of the $\frac{3(N+1)}{4}$th item

2. In the case of continuous series (i.e. frequencies with class intervals):
a. $Q_1 = L + \frac{\frac{N}{4} - c.f.}{f} \times i$
b. $Q_2 = L + \frac{\frac{2N}{4} - c.f.}{f} \times i$
c. $Q_3 = L + \frac{\frac{3N}{4} - c.f.}{f} \times i$

where
L = lower limit of the class
c.f. = cumulative frequency of the previous class
f = frequency of the class
i = class interval

Coefficient of quartile deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

Range: It is the difference between the largest and smallest values. It is usually denoted by R.

a) For individual or discrete series: R = L - S, where L = largest value and S = smallest value.
b) For continuous series: R = UL - LS, where UL = upper limit of the highest class and LS = lower limit of the lowest class.
c) Coefficient of range for individual or discrete series = $\frac{L - S}{L + S}$
d) Coefficient of range for continuous series = $\frac{UL - LS}{UL + LS}$
e) Interquartile range = Q3 - Q1

Note: Frequencies are not considered at all when computing the range and the coefficient of range.

Percentile: The value of a variate which divides a given series or distribution into 100 equal parts.
It is denoted by $P_j$, where j = 1 to 99.

a) In the case of individual or discrete series, after arrangement in order: $P_j$ = size of the $\frac{j(N+1)}{100}$th item.
b) In the case of continuous series: $P_j = L + \frac{\frac{jN}{100} - c.f.}{f} \times i$

Example: Calculate Q1, Q3, P70, P10, P90 and the interquartile range.

Marks:            0-10  10-20  20-30  30-40  40-50  50-60
No. of students:  10    20     30     50     40     30

Solution:

Marks    No. of students   c.f.
0-10     10                10
10-20    20                30
20-30    30                60
30-40    50                110
40-50    40                150
50-60    30                180

Calculation of Q1: N/4 = 180/4 = 45, and the 45th item lies in the class 20-30. So L = 20, c.f. = 30, f = 30, i = 10.

$Q_1 = L + \frac{\frac{N}{4} - c.f.}{f} \times i = 20 + \frac{45 - 30}{30} \times 10 = 25$

Calculation of Q3: 3N/4 = 3 × 180/4 = 135, and the 135th item lies in the class 40-50. So L = 40, c.f. = 110, f = 40, i = 10.

$Q_3 = L + \frac{\frac{3N}{4} - c.f.}{f} \times i = 40 + \frac{135 - 110}{40} \times 10 = 46.25$

Calculation of P70: 70N/100 = 70 × 180/100 = 126, and the 126th item lies in the class 40-50. So L = 40, c.f. = 110, f = 40, i = 10.

$P_{70} = L + \frac{\frac{70N}{100} - c.f.}{f} \times i = 40 + \frac{126 - 110}{40} \times 10 = 44$

Calculation of P10: 10N/100 = 10 × 180/100 = 18, and the 18th item lies in the class 10-20. So L = 10, c.f. = 10, f = 20, i = 10.

$P_{10} = L + \frac{\frac{10N}{100} - c.f.}{f} \times i = 10 + \frac{18 - 10}{20} \times 10 = 14$

Calculation of P90: 90N/100 = 90 × 180/100 = 162, and the 162nd item lies in the class 50-60. So L = 50, c.f. = 150, f = 30, i = 10.

$P_{90} = L + \frac{\frac{90N}{100} - c.f.}{f} \times i = 50 + \frac{162 - 150}{30} \times 10 = 54$

Interquartile range = Q3 - Q1 = 46.25 - 25 = 21.25

Standard Deviation: The deviations of the observations from the arithmetic mean are squared, and the positive square root of the arithmetic mean of these squared deviations is taken as a measure of dispersion. This measure of dispersion is known as the standard deviation or root mean square deviation. It is denoted by the Greek letter small sigma ($\sigma$). The square of the standard deviation is known as the variance.
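The definition above can be sketched directly in Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function name `grouped_std` and its `assumed_mean` parameter are hypothetical, the data are an illustrative grouped distribution given as class mid-values and frequencies, and the standard deviation is the population form (dividing by N) used in these notes. It computes the result both from the definition and from deviations about an assumed mean, which should agree.

```python
import math


def grouped_std(mid_values, freqs, assumed_mean=None):
    """Population standard deviation of a frequency distribution.

    Without `assumed_mean`, deviations are taken from the arithmetic mean
    (the definition). With it, deviations d = X - A are taken from the
    assumed mean A and the result is sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2).
    """
    n = sum(freqs)
    if assumed_mean is None:
        mean = sum(f * x for x, f in zip(mid_values, freqs)) / n
        variance = sum(f * (x - mean) ** 2
                       for x, f in zip(mid_values, freqs)) / n
    else:
        deviations = [x - assumed_mean for x in mid_values]
        mean_d = sum(f * d for d, f in zip(deviations, freqs)) / n
        variance = sum(f * d * d for d, f in zip(deviations, freqs)) / n - mean_d ** 2
    return math.sqrt(variance)


# Illustrative data: mid-values of the classes 20-25, 25-30, ..., 45-50
mid_values = [22.5, 27.5, 32.5, 37.5, 42.5, 47.5]
freqs = [170, 110, 80, 45, 40, 35]

sd_definition = grouped_std(mid_values, freqs)
sd_shortcut = grouped_std(mid_values, freqs, assumed_mean=32.5)
print(round(sd_definition, 3), round(sd_shortcut, 3))
```

The choice of assumed mean does not change the answer; it only keeps the intermediate numbers small when working by hand.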
$\sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{N}}$

OR

$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$

where $d = X - A$ and $\sum f = N$.

Coefficient of Variation: $\left(\frac{S.D.}{\text{Mean}} \times 100\right)\%$

Note: The coefficient of variation is useful when we wish to compare the variability of two data sets relative to the general level of values (and thus relative to the mean) in each set.

Question: Find the standard deviation, variance, and coefficient of variation from the following data.

Age under:             10  20  30  40  50   60   70   80
No. of persons dying:  15  30  53  75  100  110  115  125

Solution: The given figures are cumulative, so the class frequencies are obtained by successive subtraction. Take A = 35.

Age     Mid value X   f     d = X - A   fd     fd^2    fX
0-10    5             15    -30         -450   13500   75
10-20   15            15    -20         -300   6000    225
20-30   25            23    -10         -230   2300    575
30-40   35            22    0           0      0       770
40-50   45            25    10          250    2500    1125
50-60   55            10    20          200    4000    550
60-70   65            5     30          150    4500    325
70-80   75            10    40          400    16000   750
                      $\sum f = 125$    $\sum fd = 20$   $\sum fd^2 = 48800$   $\sum fX = 4395$

Substituting the values in the formula:

$\sigma = \sqrt{\frac{48800}{125} - \left(\frac{20}{125}\right)^2} = \sqrt{390.4 - 0.0256} = 19.758$

Variance = square of S.D. = $(19.758)^2$ = 390.37 (approx.)

Mean = $\frac{\sum fX}{N} = \frac{4395}{125} = 35.16$

Coefficient of variation = $\frac{S.D.}{\text{Mean}} \times 100 = \frac{19.758}{35.16} \times 100 = 56.19\%$ (approx.)

Example: Find the standard deviation of the following distribution. Take the assumed average as 32.5.

Age:             20-25  25-30  30-35  35-40  40-45  45-50
No. of persons:  170    110    80     45     40     35

Ans. 7.936 (approx.)

Weighted Mean: The weighted mean or weighted average is an arithmetic mean in which each value is weighted according to its importance in the overall group. The formulas for the population and sample weighted means are identical.

In the case of individual series: $\bar{X} = \frac{\sum WX}{\sum W}$

In the case of a frequency distribution: $\bar{X} = \frac{\sum W(fX)}{\sum W}$

Example: Comment on the performance of the students of the three colleges given below using simple and weighted averages.

Course   College A                    College B                    College C
         Pass %   No. of students     Pass %   No. of students     Pass %   No. of students
                  (in hundreds)                (in hundreds)                (in hundreds)
M.A      71       3                   82       2                   81       2
M.Com    83       4                   76       3                   76       3.5
B.A      73       5                   73       6                   74       4.5
B.Com    74       2                   76       7                   58       2
B.Sc     65       3                   65       3                   70       7
M.Sc     66       3                   60       7                   73       2

Solution: Let X = pass % and W = number of students (in hundreds).

Course   College A            College B            College C
         X     W     WX       X     W     WX       X     W     WX
M.A      71    3     213      82    2     164      81    2     162
M.Com    83    4     332      76    3     228      76    3.5   266
B.A      73    5     365      73    6     438      74    4.5   333
B.Com    74    2     148      76    7     532      58    2     116
B.Sc     65    3     195      65    3     195      70    7     490
M.Sc     66    3     198      60    7     420      73    2     146
Total    432   20    1451     432   28    1977     432   21    1513

Simple arithmetic mean:

College A: $\bar{X} = \frac{\sum X}{N} = \frac{432}{6} = 72$
College B: $\bar{X} = \frac{\sum X}{N} = \frac{432}{6} = 72$
College C: $\bar{X} = \frac{\sum X}{N} = \frac{432}{6} = 72$

Weighted arithmetic mean:

College A: $\bar{X} = \frac{\sum WX}{\sum W} = \frac{1451}{20} = 72.55$
College B: $\bar{X} = \frac{\sum WX}{\sum W} = \frac{1977}{28} = 70.61$
College C: $\bar{X} = \frac{\sum WX}{\sum W} = \frac{1513}{21} = 72.05$

On simple averages the three colleges perform identically (72%). On weighted averages College A performs best (72.55%), followed by College C (72.05%) and College B (70.61%).

Measure of Association Between Two Variables:

Covariance: $cov(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$

where $(X - \bar{X})$ is the deviation of X from its mean and $(Y - \bar{Y})$ is the deviation of Y from its mean.

Correlation: Correlation measures the degree of linear association between two variables, say X and Y.

Karl Pearson’s Coefficient of Correlation: $r_{xy} = \frac{cov(x, y)}{\sigma_x \sigma_y}$

Example: Calculate Karl Pearson’s coefficient of correlation from the following data.

Height of father:  66  68  69  72  65  59  62  67  61  71
Height of son:     65  64  67  69  64  60  59  68  60  64

Solution: Let X = height of father, Y = height of son, $dx = X - \bar{X}$ and $dy = Y - \bar{Y}$.

$\bar{X} = 660/10 = 66$, $\bar{Y} = 640/10 = 64$

X     Y     dx    dy    dx^2   dy^2   dx*dy
66    65    0     1     0      1      0
68    64    2     0     4      0      0
69    67    3     3     9      9      9
72    69    6     5     36     25     30
65    64    -1    0     1      0      0
59    60    -7    -4    49     16     28
62    59    -4    -5    16     25     20
67    68    1     4     1      16     4
61    60    -5    -4    25     16     20
71    64    5     0     25     0      0
660   640               166    108    111

$cov(x, y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N} = \frac{111}{10} = 11.1$

Standard deviation of x: $\sigma_x = \sqrt{\frac{166}{10}} = 4.07$

Standard deviation of y: $\sigma_y = \sqrt{\frac{108}{10}} = 3.28$

Karl Pearson’s coefficient of correlation: $r_{xy} = \frac{cov(x, y)}{\sigma_x \sigma_y} = \frac{11.1}{4.07 \times 3.28} = 0.83$

Question: Calculate the coefficient of correlation between age group and rate of mortality from the following data.

Age group:          0-20  20-40  40-60  60-80  80-100
Rate of mortality:  350   280    540    760    900

Hint: find the mid value of each class.

Ans: 0.95 (approx.)
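The father and son heights example above can be checked with a short Python sketch. The function name `pearson_r` is hypothetical and chosen only for this illustration; it follows the formulas in these notes, dividing by N for both the covariance and the standard deviations.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's coefficient: cov(x, y) / (sigma_x * sigma_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: mean of the products of deviations from the two means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Population standard deviations of each variable
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


fathers = [66, 68, 69, 72, 65, 59, 62, 67, 61, 71]
sons = [65, 64, 67, 69, 64, 60, 59, 68, 60, 64]

r = pearson_r(fathers, sons)
print(round(r, 2))  # 0.83
```

Because the factor N cancels between the covariance and the two standard deviations, the same value is obtained from $\frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$ without computing the intermediate quantities.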