Business Statistics

Course Index

S. No.  Reference No.  Particulars
1.      Chapter 1      Introduction to Business Statistics
2.      Chapter 2      Descriptive Statistics: Collection, Processing and Presentation of Data
3.      Chapter 3      Measures of Central Tendency
4.      Chapter 4      Measures of Dispersion
5.      Chapter 5      Skewness and Kurtosis
6.      Chapter 6      Correlation Analysis
7.      Chapter 7      Regression Analysis
8.      Chapter 8      Theory of Probability
9.      Chapter 9      Probability Distribution
10.     Chapter 10     Use of Excel Software for Statistical Analysis

Course Introduction

Managerial decision-making can be made efficient and effective by analyzing available data using appropriate statistical tools. Statistical tools have application not only in research (marketing research included) but also in other functional areas like quality management, inventory management, financial analysis, human resource planning and so on.

The word statistics is derived from the Italian word ‘Stato’, which means ‘state’; ‘Statista’ refers to a person involved with the affairs of state. Thus, statistics was originally meant for the collection of facts useful for affairs of the state, like taxes, land records, population demography, etc.

Significant contributions have also been made by Indians in the field of statistics. Prof. Prasanta Chandra Mahalanobis pioneered the study of statistical science in India. He founded the Indian Statistical Institute (ISI) in 1931. Mahalanobis viewed statistics as a tool for increasing the efficiency of all human efforts and also concentrated on sample surveys.

“Statistics are the classified facts representing the conditions of the people in the state…. specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement”. – Webster

Statistical methods are broadly divided into five categories.
These are Descriptive Statistics, Analytical Statistics, Inductive Statistics, Inferential Statistics and Applied Statistics.

Statistics is an indispensable tool of production control and market research. Statistical tools are extensively used in business for time and motion study, consumer behaviour study, investment decisions, performance measurements and compensations, credit ratings, inventory management, accounting, quality control, distribution channel design, etc.

Statistical analysis is a vital component in every aspect of research. Social surveys, laboratory experiments, clinical trials, marketing research, human resource planning, inventory management, quality management, etc., require statistical treatment before arriving at valid conclusions. The functions of statistics are Condensation, Comparison, Forecast, Testing of Hypotheses, Preciseness and Expectation.

Statistical techniques, because of their flexibility and economy, have become popular and are used in numerous fields. But statistics is not a cure-all technique and has limitations. It cannot be applied to all kinds of situations and cannot be made to answer all queries.

Introduction to Business Statistics

S. No.  Reference No.  Particulars
1.                     Learning Objectives
2.      Topic 1        Introduction
3.      Topic 2        Development of Statistics
4.      Topic 3        Definitions of Statistics
5.      Topic 4        Importance of Statistics
6.      Topic 5        Classification of Statistics
7.      Topic 6        Role of Statistics
8.      Topic 7        Functions of Statistics
9.      Topic 8        Limitations of Statistics
10.     Topic 9        Summary

Learning Objectives

After studying this chapter, you should be able to:
* Understand the development, importance and role of statistics
* Explain the basic concept of statistical studies
* Understand the application of statistics in business and management
* Learn about functions and limitations of statistics

Introduction

Information derived from good statistical analysis is always precise and never useless. One of the primary tasks of a manager is decision-making. Statistical techniques offer powerful tools in the decision-making process. These tools have the power to interpret quantitative information in a scientific and objective manner.

Development of Statistics

The word statistics is derived from the Italian word ‘Stato’, which means ‘state’; ‘Statista’ refers to a person involved with the affairs of state. Statistics was originally meant for the collection of facts useful for affairs of the state, like taxes, land records, population demography, etc. During ancient times, even before 300 BC, rulers and kings like Chandragupta Maurya used statistics to maintain land and revenue records, collection of taxes and registration of births and deaths.

Definitions of Statistics

“Statistics are the classified facts representing the conditions of the people in the state…. specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement”. – Webster

“By statistics we mean quantitative data affected to a marked extent by multiplicity of causes”. – Yule and Kendall

“Statistics may be defined as the science of collection, presentation, analysis and interpretation of data”. – Croxton and Cowden

Importance of Statistics

* Identify what information or data is worth collecting
* Decide when and how judgments may be made on the basis of partial information
* Measure the extent of doubt and risk associated with the use of partial information and stochastic processes
Classification of Statistics

Role of Statistics
* Role of Statistics in Business
* Role of Statistics in Decision Making
* Role of Statistics in Research

Functions of Statistics
* Condensation
* Comparison
* Forecast
* Testing of Hypotheses
* Preciseness
* Expectation

Laws of Statistics

Limitations of Statistics
* Common Statistical Issues
* Distrust of Statistics
* Misuse of Statistics

Summary

Managerial decision-making can be made efficient and effective by analyzing available data using appropriate statistical tools. Statistical tools have application not only in research (marketing research included) but also in other functional areas like quality management, inventory management, financial analysis, human resource planning and so on.

The word statistics is derived from the Italian word ‘Stato’, which means ‘state’; ‘Statista’ refers to a person involved with the affairs of state. Thus, statistics was originally meant for the collection of facts useful for affairs of the state, like taxes, land records, population demography, etc.

Significant contributions have also been made by Indians in the field of statistics. Prof. Prasanta Chandra Mahalanobis pioneered the study of statistical science in India. He founded the Indian Statistical Institute (ISI) in 1931. Mahalanobis viewed statistics as a tool for increasing the efficiency of all human efforts and also concentrated on sample surveys.

Statistics are the classified facts representing the conditions of the people in the state…. specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement.

Statistical methods are broadly divided into five categories. These are Descriptive Statistics, Analytical Statistics, Inductive Statistics, Inferential Statistics and Applied Statistics.

Statistics is an indispensable tool of production control and market research.
Statistical tools are extensively used in business for time and motion study, consumer behaviour study, investment decisions, performance measurements and compensations, credit ratings, inventory management, accounting, quality control, distribution channel design, etc.

Statistical analysis is a vital component in every aspect of research. Social surveys, laboratory experiments, clinical trials, marketing research, human resource planning, inventory management, quality management, etc., require statistical treatment before arriving at valid conclusions. The functions of statistics are Condensation, Comparison, Forecast, Testing of Hypotheses, Preciseness and Expectation.

Statistical techniques, because of their flexibility and economy, have become popular and are used in numerous fields. But statistics is not a cure-all technique and has limitations. It cannot be applied to all kinds of situations and cannot be made to answer all queries.

More dangerous than distrust is the misuse of statistics to draw convenient conclusions to satisfy selfish or ulterior motives. Arguments and analysis supported by facts, figures, charts, graphs, index numbers, etc. are indeed very appealing and convincing. They can be used to intimidate opposing views. Hence, statistics is open to manipulation.

Descriptive Statistics: Collection, Processing and Presentation of Data

S. No.  Reference No.  Particulars
1.                     Learning Objectives
2.      Topic 1        Introduction
3.      Topic 2        Descriptive and Inferential Statistics
4.      Topic 3        Collection of Data
5.      Topic 4        Editing and Coding of Data
6.      Topic 5        Classification of Data
7.      Topic 6        Tabulation of Data
8.      Topic 7        Diagrammatic and Graphical Presentation of Data
9.      Topic 8        Summary

Learning Objectives

After studying this chapter, you should be able to:
* Describe descriptive and inferential statistics
* Explain collection, editing and classification of primary and secondary data
* Define tabulation and presentation of data
* Understand diagrammatic and graphical presentation
* Understand Bar Diagram, Histogram, Pie Diagram, Frequency Polygons and Ogives

Introduction

Success of any statistical investigation depends on the availability of accurate and reliable data. These depend on the appropriateness of the method chosen for data collection. Data collection is a very basic activity in decision-making. Data may be classified either as primary data or secondary data. Successful use of the collected data depends to a great extent upon the way it is arranged, displayed and summarized.

Descriptive and Inferential Statistics

Descriptive Statistics: Descriptive statistics is the type of statistics that probably comes to most people's minds when they hear the word “statistics.”

Inferential Statistics: Inferential statistics studies a statistical sample, and from this analysis we are able to say something about the population from which the sample came.

Collection of Data

Editing and Coding of Data
* Editing Primary Data: Completeness, Consistency, Accuracy, Homogeneity
* Editing Secondary Data
* Field Editing
* Central Editing
* Coding of Data: Coding is the process of assigning some symbols, either alphabetical or numeral or both, to the answers so that the responses can be recorded into a limited number of classes or categories.

Classification of Data

Subtopics: Bases of Classification, Rules of Classification.

Classification refers to the grouping of data into homogeneous classes and categories. It is the process of arranging things in groups or classes according to their resemblances and affinities.
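The grouping of observations into classes described above can be sketched in a few lines of Python. This is only an illustration: the marks, the class width of 10 and the helper name `frequency_distribution` are assumptions made for the example, not part of the course material.

```python
from collections import Counter


def frequency_distribution(values, start, width, n_classes):
    """Count how many values fall into each class [lower, upper)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)  # which class the value belongs to
        if 0 <= index < n_classes:
            counts[index] += 1
    table = []
    for i in range(n_classes):
        lower = start + i * width
        table.append((lower, lower + width, counts[i]))
    return table


# Hypothetical marks of 12 students (illustrative data only).
marks = [12, 25, 37, 41, 18, 29, 33, 45, 22, 38, 27, 31]
table = frequency_distribution(marks, start=10, width=10, n_classes=4)

for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

Each row of the resulting table gives a class interval and the number of observations in it, which is the raw material for a histogram or an ogive.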
Frequency Distribution

Tabulation of Data

Tabulation is arranging the data in a flat table format (two-dimensional arrays) by grouping the observations. A table is a spreadsheet with rows and columns, with headings and stubs indicating the class of the data.

Types of Tabulation
* One-Way Tabulation
* Two-Way Tabulation
* Multi-Way Tabulation

Advantages of Tabulation

Diagrammatic and Graphical Presentation of Data

Difference between Diagrams and Graphs

No.  Diagram                                                    Graph
1.   Can be drawn on ordinary paper.                            Can be drawn on graph paper.
2.   Easy to grasp.                                             Needs some effort to grasp.
3.   Not capable of analytical treatment.                       Capable of analytical treatment.
4.   Can be used only for comparisons.                          Can be used to represent a mathematical relation.
5.   Data are represented by bars, rectangles, pictures, etc.   Data are represented by lines and curves.

Types of Diagrams
* Bar Diagram
* Histogram
* Pie Diagram
* Frequency Polygon
* Ogives

Summary

There are two major divisions of the field of statistics, namely descriptive and inferential statistics. Both segments of statistics are important, and they accomplish different objectives.

Data can be obtained through a primary source or a secondary source according to need, situation, convenience, time, resources and availability. The most important method for primary data collection is the questionnaire. Data must be objective and fact-based so that it helps a decision-maker to arrive at a better decision.

Statistical data is a set of facts expressed in quantitative form. Data is collected through various methods. Sometimes our data set consists of the entire population we are interested in. In other situations, data may constitute a sample from some population.

The type of research, its purpose and the conditions under which the data are obtained will determine the method of collecting the data.
If relatively few items of information are required quickly, and funds are limited, telephonic interviews are recommended. If respondents are industrial clients, the Internet could also be used. If depth interviews and probing techniques are to be used, it is necessary to employ investigators to collect data.

The quality of information collected through the filling of a questionnaire depends, to a large extent, upon the drafting of its questions. Hence, it is extremely important that the questions be designed or drafted very carefully and in a tactful manner.

Before any processing of the data, editing and coding of data is necessary to ensure the correctness of data. In any research study, the voluminous data can be handled only after classification. Data can be presented through tables and charts.

Classification refers to the grouping of data into homogeneous classes and categories. It is the process of arranging things in groups or classes according to their resemblances and affinities.

A frequency distribution is the principal tabular summary of either discrete data or continuous data. The frequency distribution may show actual, relative or cumulative frequencies. Actual and relative frequencies may be charted as either a histogram (a bar chart) or a frequency polygon. Two commonly used graphs of cumulative frequencies are the less than ogive and the more than ogive.

Once the raw data is collected, it needs to be summarized and presented to the decision-maker in a form that is easy to comprehend. Tabulation not only condenses the data, but also makes it easy to understand. Tabulation is the fastest way to extract information from the mass of data and hence popular even among those not exposed to the statistical method.

Charts help in grasping the data and analyzing it qualitatively. This also helps managers to effectively present the data as a part of reports.
Various types of chart are the bar diagram, multiple bar diagram, component bar diagram, deviation bar diagram, sliding bar diagram, histogram and pie chart.

A graphic presentation is another way of representing the statistical data in an intelligible form. There are two types of graphs which we have discussed: simple line graphs and ogives.

Measures of Central Tendency

S. No.  Reference No.  Particulars
1.                     Learning Objectives
2.      Topic 1        Introduction
3.      Topic 2        Characteristics of Central Tendency
4.      Topic 3        Arithmetic Mean
5.      Topic 4        Median
6.      Topic 5        Mode
7.      Topic 6        Empirical Relationship between Mean, Median and Mode
8.      Topic 7        Limitations of Central Tendency
9.      Topic 8        Summary

Learning Objectives

After studying this chapter, you should be able to:
* Understand the concept and characteristics of central tendency
* Describe all the measures of central tendency: mean, median and mode
* Explain merits and demerits of all measures of central tendency
* Discuss partition values or positional measures like quartiles, deciles and percentiles

Introduction

The concept of central tendency plays a dominant role in the study of statistics. In many frequency distributions, the tabulated values show a distinct tendency to cluster or to group around a typical central value. This behaviour of the data to concentrate the values around a central part of the distribution is called the ‘Central Tendency’ of the data.

Characteristics of Central Tendency

A good measure of central tendency should possess, as far as possible, the following characteristics:
* Easy to understand
* Simple to compute
* Based on all observations
* Uniquely defined
* Possibility of further algebraic treatment
* Not unduly affected by extreme values
Common Measures of Central Tendency
* Mean
* Median
* Mode

Arithmetic Mean

The arithmetic mean of a series is the quotient obtained by dividing the sum of the values by the number of items. In algebraic language, if X1, X2, X3, ..., Xn are the n values of a variate X, then:

    Mean = (X1 + X2 + X3 + ... + Xn) / n

Subtopics:
* Properties of Arithmetic Mean
* Calculation of Simple Arithmetic Mean
* Merits and Demerits of Arithmetic Mean
* Weighted Arithmetic Mean

Median

Median is the value which divides the distribution of data, arranged in ascending or descending order, into two equal parts. Thus, the ‘Median’ is the value of the middle observation.

Subtopics:
* Calculation of Median
* Merits and Demerits of Median
* Partition Values or Positional Measures: Quartiles, Deciles, Percentiles

Mode

Mode is the value which has the greatest frequency density. Mode is denoted by Z.

Subtopics:
* Calculation of Mode
* Merits and Demerits of Mode
* Graphic Location of Mode

Empirical Relationship between Mean, Median and Mode

A distribution in which the mean, the median and the mode coincide is known as a symmetrical (bell-shaped) distribution. The normal distribution is one such symmetric distribution, which is very commonly used. If the distribution is skewed, the mean, the median and the mode are not equal. In a moderately skewed distribution, the distance between the mean and the median is approximately one third of the distance between the mean and the mode. This can be expressed as:

    Mean – Median = (Mean – Mode) / 3
    Mode = 3 * Median – 2 * Mean

Limitations of Central Tendency

* In case of highly skewed data.
* In case of uneven or irregular spread of the data.
* In open-end distributions.
* When average growth or average speed is required.
* When there are extreme values in the data.

Except in these cases, the arithmetic mean (AM) is widely used in practice.

Summary

Measures of central tendency give one of the very important characteristics of the data. According to the situation, one of the various measures of central tendency may be chosen as the most representative.
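The three measures and the empirical relationship above can be checked on a small data set. The sketch below uses Python's standard `statistics` module; the observations, the weights and the helper `weighted_mean` are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical observations, chosen only for illustration.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]

mean = statistics.mean(data)      # sum of the values / number of items
median = statistics.median(data)  # middle value of the ordered data
mode = statistics.mode(data)      # value with the greatest frequency

# Empirical relationship for a moderately skewed distribution.
# It is an approximation, so it need not equal the observed mode.
empirical_mode = 3 * median - 2 * mean


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


print(mean, median, mode, empirical_mode)
print(weighted_mean([10, 20, 30], [1, 2, 2]))
```

The single large value (14) pulls the mean above the median and mode, which is the situation in which the median or mode may be the more representative figure.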
Arithmetic mean is widely used and understood. What characterizes the three measures of centrality, and what are the relative merits of each in the given situation, is the question.

Mean summarizes all the information in the data. Mean can be visualized as a single point where all the mass (the weight) of the observations is concentrated. It is like a centre of gravity in physics. Mean also has some desirable mathematical properties that make it useful in the context of statistical inference.

To simplify manual calculation, we may sometimes use shift of origin and change of scale. Shifting of origin is achieved by adding or subtracting a constant to all observations. In the case of discrete data, we add or subtract (usually subtract) a constant to the individual observations, whereas for grouped data we add or subtract (usually subtract) the constant to the class mark values.

There are cases where the relative importance of the different items is not the same. In such a case, we need to compute the weighted arithmetic mean. The procedure is similar to the grouped data calculations studied earlier, when we consider frequency as a weight associated with the class mark.

Median is the middle value when the data is arranged in order. The median is resistant to extreme observations. Median is like the geometric centre in physics. In case we want to guard against the influence of a few outlying observations (called outliers), we may use the median.

Quantiles are related positional measures of central tendency. These are useful and frequently employed measures. The most familiar quantiles are Quartiles, Deciles and Percentiles.

Quartiles are position values similar to the Median. There are three quartiles, denoted by Q1, Q2 and Q3. Q1 is called the lower quartile or first quartile. The second quartile Q2 is nothing but the median.
In a distribution, one fourth of the items are less than Q1 and the other three fourths are greater than Q1. Q3 is called the upper quartile or the third quartile. Inter-quartile range is defined as the difference between the first and third quartile. It is a measure of spread of the data.

D1, D2, D3, ..., D9 are the nine deciles. They divide a series into 10 equal parts. One tenth of the items are less than or equal to D1, one tenth of the items are more than or equal to D9, and one tenth of the items lie between any successive pair of deciles when all the items are in ascending order.

The Pth percentile of a group of observations is that observation below which lie P% (P percent) of the observations. The position of the Pth percentile is given by (n + 1)P/100, where n is the number of data points. If the value of (n + 1)P/100 is a fraction, we need to interpolate the value.

The Mode of a data set is the value that occurs most frequently. There are many situations in which arithmetic mean and median fail to reveal the true characteristics of the data (the most representative figure), for example, most common size of shoes, most common size of garments, etc. In such cases, mode is the best-suited measure of central tendency.

A distribution in which the mean, the median and the mode coincide is known as a symmetrical (bell-shaped) distribution. The normal distribution is one such symmetric distribution, which is very commonly used. In a moderately skewed distribution, the distance between the mean and the median is approximately one third of the distance between the mean and the mode. This can be expressed as:

    Mean – Median = (Mean – Mode) / 3
    Mode = 3 * Median – 2 * Mean

No single average can be regarded as the best or most suitable under all circumstances. Each average has its merits and demerits and its own particular field of importance and utility. A proper selection of an average depends on (1) the nature of the data and (2) the purpose of enquiry or requirement of the data.

Measures of Dispersion

S. No.  Reference No.  Particulars
1.                     Learning Objectives
2.      Topic 1        Introduction
3.      Topic 2        Characteristics of Measures of Dispersion
4.      Topic 3        Absolute and Relative Measures of Dispersion
5.      Topic 4        Range
6.      Topic 5        Inter-quartile Range and Deviations
7.      Topic 6        Variance and Standard Deviation
8.      Topic 7        Summary

Learning Objectives

After studying this chapter, you should be able to:
* Understand absolute and relative measures of variation
* Learn about range and inter-quartile range
* Discuss variance, standard deviation, mean deviation and coefficient of variation
* Study the empirical relationship between different measures of variation

Introduction

A measure of dispersion or variation in any data shows the extent to which the numerical values tend to spread about an average.

Data is useful:
* To compare the current results with the past results.
* To compare two or more sets of observations.
* To suggest methods to control variation in the data.

Characteristics of Measures of Dispersion

* It should be simple to understand.
* It should be easy to compute.
* It should be rigidly defined.
* It should be based on each individual item of the distribution.
* It should be capable of further algebraic treatment.
* It should have sampling stability.
* It should not be unduly affected by the extreme items.

Absolute and Relative Measures of Dispersion

‘Relative’ dispersion or ‘Coefficient’ of dispersion is the ratio or the percentage of a measure of absolute dispersion to an appropriate average.

A precise measure of dispersion is one which gives the magnitude of the variation in a series, i.e. it measures, in numerical terms, the extent of the scatter of the values around the average.
Absolute and Relative Measures of Dispersion

Measures of Dispersion
* The Range
* The Quartile Deviation
* The Mean Deviation
* The Median Deviation
* The Standard Deviation

Relative Variability
* Relative Range
* Relative Quartile Deviation
* Relative Mean Deviation
* Coefficient of Variation
* Graphical Method

Range

The ‘Range’ of the data is the difference between the largest value of data and the smallest value of data.

Merits and Demerits of Range

Merits
* Range is the simplest method of studying dispersion.
* It takes less time to compute the ‘absolute’ and ‘relative’ range.

Demerits
* Range does not take into account all the values of a series, i.e. it considers only the extreme items, and middle items are not given any importance.
* Range cannot be computed in the case of an ‘open-end’ distribution, i.e. a distribution where the lower limit of the first group and the upper limit of the highest group are not given.

Inter-quartile Range and Deviations

Inter-quartile Range: Inter-quartile range is the difference between the upper quartile (third quartile) and the lower quartile (first quartile).

Quartile Deviation: Quartile Deviation is half of the difference between the upper quartile and the lower quartile, i.e. (Q3 – Q1) / 2.

Mean Deviation: Mean deviation is the arithmetic mean of the absolute deviations of the values about their arithmetic mean or median or mode.

Variance and Standard Deviation

Variance is defined as the average of the squared deviations of data points from their mean.

Subtopics:
* Different Formulae for Calculating Variance
* Calculation of Standard Deviation
* Properties of Standard Deviation
* Merits and Demerits of Standard Deviation
* Standard Deviation of Combined Means
* Coefficient of Variation
* Empirical Relationship Between Different Measures of Variation

Summary

Study of distribution is very important for decision-making. Usually, measures of central tendency and variability are adequate for taking decisions.
However, if the data is quite different from the normal distribution, then measures of skewness and kurtosis need to be considered. We discussed measures of variability: Range, Variance and Standard Deviation.

A measure of dispersion gives an idea about the extent of lack of uniformity in sizes and qualities of the items in a series. It helps us to know the degree of uniformity and consistency in the series. If the difference between items is large, the dispersion or variation is large, and vice versa.

The measures of dispersion can be either ‘absolute’ or ‘relative’. Absolute measures of dispersion are expressed in the same units in which the original data are expressed. For example, if the series is expressed as marks of the students in a particular subject, the absolute dispersion will provide the value in marks. The only difficulty is that if two or more series are expressed in different units, the series cannot be compared on the basis of dispersion.

The ‘Range’ of the data is the difference between the largest value of data and the smallest value of data. This is an absolute measure of variability. However, if we have to compare two sets of data, ‘Range’ may not give a true picture. In such a case, a relative measure of range, called the coefficient of range, is used.

Inter-quartile range is the difference between the upper quartile (third quartile) and the lower quartile (first quartile). Quartile Deviation is half of the difference between the upper quartile and the lower quartile.

The average used for calculating deviation can be the mean, the median or the mode. However, usually the mean is used. There is also an advantage in taking deviations from the median, because the ‘Mean Deviation’ from the median is the lowest as compared to any other ‘Mean Deviation’. Since absolute values of deviations (ignoring sign) are taken for calculating Mean Deviation, the mean deviation is not amenable to further algebraic treatment.
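The measures discussed in this summary can be computed side by side on one small data set. The following is a minimal sketch using Python's standard `statistics` module. The data are hypothetical, the coefficient of range is taken in its usual textbook form (L – S) / (L + S), which is an assumption because the slides do not state it, and `mean_deviation` is a helper written for this example.

```python
import statistics

# Hypothetical observations, chosen only for illustration.
data = [4, 7, 8, 10, 12, 15, 21]

largest, smallest = max(data), min(data)
data_range = largest - smallest                                     # absolute measure
coefficient_of_range = (largest - smallest) / (largest + smallest)  # assumed textbook form

# Quartiles; the default 'exclusive' method places them at (n + 1) * i / 4.
q1, q2, q3 = statistics.quantiles(data, n=4)
interquartile_range = q3 - q1
quartile_deviation = (q3 - q1) / 2


def mean_deviation(values, centre):
    """Arithmetic mean of the absolute deviations about a chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)


md_about_mean = mean_deviation(data, statistics.mean(data))
md_about_median = mean_deviation(data, statistics.median(data))

population_variance = statistics.pvariance(data)  # divides by N
sample_variance = statistics.variance(data)       # divides by n - 1
sample_sd = statistics.stdev(data)                # square root of the sample variance

# Shifting the origin (adding a constant) leaves the variance unchanged.
shifted_variance = statistics.variance([x + 100 for x in data])

print(data_range, coefficient_of_range, interquartile_range, quartile_deviation)
print(md_about_mean, md_about_median)
print(population_variance, sample_variance, sample_sd, shifted_variance)
```

On this data the mean deviation about the median comes out smaller than the mean deviation about the mean, in line with the remark above.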
The variance is the average squared deviation of the data from their mean. For sample data, we take the average by dividing by (n – 1), where n is the sample size. This is to cater for the degrees of freedom. For population data, we take the average by dividing by the population size N.

The Standard Deviation (SD) of a set of data is the positive square root of the variance of the set. This is also referred to as the Root Mean Square (RMS) value of the deviations of the data points. The SD of a sample is the square root of the sample variance.

There is no effect of shifting the origin on standard deviation or variance.

The measures of deviation are very effective in reports and presentations made by business executives to present their data to the general public, who do not understand statistical methods.

Variance analysis also helps in managing budgets by controlling budgeted versus actual costs. Without the standard deviation, you can’t compare two data sets effectively.

Skewness and Kurtosis

S. No.  Reference No.  Particulars
1.                     Learning Objectives
2.      Topic 1        Introduction
3.      Topic 2        Karl Pearson’s Coefficient of Skewness (SKP)
4.      Topic 3        Bowley’s Coefficient of Skewness (SKB)
5.      Topic 4        Kelly’s Coefficient of Skewness (SKK)
6.      Topic 5        Measures of Kurtosis
7.      Topic 6        Moments
8.      Topic 7        Summary

Learning Objectives

After studying this chapter, you should be able to:
* Understand the concept and different types of skewness
* Discuss various measures of kurtosis
* Learn about moments, their properties and coefficients based on moments

Introduction

Skewness is a measure that studies the degree and direction of departure from symmetry.

Nature of Skewness

Skewness can be positive, negative or zero.
* When the values of mean, median and mode are equal, there is no skewness.
* When mean > median > mode, skewness will be positive.
* When mean < median < mode, skewness will be negative.
Characteristics of a Good Measure of Skewness

* It should be a pure number, in the sense that its value should be independent of the unit of the series and also of the degree of variation in the series.
* It should have zero value when the distribution is symmetrical.
* It should have a meaningful scale of measurement so that we could easily interpret the measured value.

Mathematical measures of skewness can be calculated by:
* Karl Pearson’s Method
* Bowley’s Method
* Kelly’s Method

Karl Pearson’s Coefficient of Skewness (SKP)

Karl Pearson has suggested two formulae:
* Where the relationship of mean and mode is established;
* Where the relationship between mean and median is not established.

Bowley’s Coefficient of Skewness (SKB)

Bowley’s method of skewness is based on the values of the median and the lower and upper quartiles. This method suffers from the same limitations as the median and quartiles. Wherever positional measures are given, skewness should be measured by Bowley’s method. This method is also used in the case of ‘open-end series’, where the importance of extreme values is ignored.

    Absolute skewness = Q3 + Q1 – 2 * Median
    Coefficient of Skewness (SkB) = (Q3 + Q1 – 2 * Median) / (Q3 – Q1)

Where Q is quartile.

Kelly’s Coefficient of Skewness (SKK)

Kelly’s coefficient of skewness is defined as:

    Skk = (P90 + P10 – 2 * P50) / (P90 – P10)

Where P is percentile.

Example: Calculate Kelly’s coefficient of skewness from the following data:

Measures of Kurtosis

Kurtosis is a measure of the peakedness of a distribution. The larger the kurtosis, the more peaked the distribution will be. The kurtosis is calculated either as an absolute or a relative value. Absolute kurtosis is always a positive number. A negative relative kurtosis indicates a flatter distribution than the normal distribution, and is called platykurtic. A positive relative kurtosis means a more peaked curve, called leptokurtic. The peakedness of the normal distribution is called mesokurtic.
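The three coefficients of skewness can be compared on one data set. The sketch below is illustrative only. It assumes the usual textbook forms: Pearson's coefficients as (Mean – Mode) / SD and 3 * (Mean – Median) / SD, Bowley's as (Q3 + Q1 – 2 * Median) / (Q3 – Q1), and Kelly's as (P90 + P10 – 2 * P50) / (P90 – P10). The data are hypothetical, and the population standard deviation is used as an assumption.

```python
import statistics

# Hypothetical observations with one large value (a right-skewed set).
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
sd = statistics.pstdev(data)  # population SD; an assumption for this sketch

# Karl Pearson's coefficients (assumed textbook forms).
pearson_mode_based = (mean - mode) / sd
pearson_median_based = 3 * (mean - median) / sd

# Bowley's coefficient, based on quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
bowley = (q3 + q1 - 2 * q2) / (q3 - q1)

# Kelly's coefficient, based on the 10th, 50th and 90th percentiles.
deciles = statistics.quantiles(data, n=10)  # nine cut points: P10, P20, ..., P90
p10, p50, p90 = deciles[0], deciles[4], deciles[8]
kelly = (p90 + p10 - 2 * p50) / (p90 - p10)

print(pearson_mode_based, pearson_median_based, bowley, kelly)
```

All four values are positive for this data set, consistent with a mean that lies above the median. The coefficients differ in size because each one looks at a different part of the distribution.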
Moments

The arithmetic mean of the various powers of the deviations in any distribution is called the moments of the distribution about the mean.

PROPERTIES OF MOMENTS

COEFFICIENTS BASED ON MOMENTS

Summary

Measures of skewness and kurtosis, like measures of central tendency and dispersion, study the characteristics of a frequency distribution. Averages tell us about the central value of the distribution, and measures of dispersion tell us about the concentration of the items around a central value.

When two or more symmetrical distributions are compared, the difference in them is studied with 'kurtosis'. On the other hand, when two or more asymmetrical distributions are compared, they will give different degrees of skewness. In this treatment the two measures are regarded as mutually exclusive, i.e. the presence of skewness implies absence of kurtosis and vice-versa.

Bowley's method of skewness is based on the values of the median and the lower and upper quartiles. This method suffers from the same limitations as the median and quartiles themselves. Wherever positional measures are given, skewness should be measured by Bowley's method. This method is also used in the case of 'open-end series', where the importance of extreme values is ignored.

Kelly's coefficient of skewness is defined as:

$$Sk_K = \frac{P_{90} + P_{10} - 2P_{50}}{P_{90} - P_{10}}$$

Where P is percentile.

Kurtosis is a measure of the peakedness of a distribution. The larger the kurtosis, the more peaked the distribution. Kurtosis is calculated either as an absolute or as a relative value. Absolute kurtosis is always a positive number. The absolute kurtosis of the normal distribution (symmetric bell-shaped distribution) is taken as 3. It is taken as the datum to calculate relative kurtosis as follows:

$$\text{Absolute kurtosis} = \beta_2 = \frac{\mu_4}{\mu_2^2}$$

$$\text{Relative kurtosis} = \text{Absolute kurtosis} - 3$$

Moments about the mean are generally used in statistics. We use the Greek letter μ (read as 'mu') for these moments. Consider a mass attached at each point proportional to its frequency and take moments about the mean. The first, second, third and fourth moments can be used as measures of central tendency, variation (dispersion), asymmetry and peakedness of the curve respectively.

Correlation Analysis

- Learning Objectives
- Topic 1: Introduction
- Topic 2: Types of Correlation
- Topic 3: Methods of Calculating Correlation
- Topic 4: Scatter Diagram Method
- Topic 5: Co-variance Method – The Karl Pearson's Correlation Coefficient
- Topic 6: Rank Correlation Method
- Topic 7: Correlation Coefficient using Concurrent Deviation
- Topic 8: Summary

Learning Objectives

After studying this chapter, you should be able to:
- Understand the concept of correlation
- Study different types of correlation
- Describe various methods of calculating correlation, such as the scatter diagram method
- Discuss various types of correlation coefficients, viz. the Karl Pearson correlation coefficient, rank correlation and the coefficient based on concurrent deviations

Introduction

Croxton and Cowden say, "When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation".

The study of correlation helps managers in the following ways:
- To identify the relationship of various factors and decision variables.
- To estimate the value of one variable for a given value of the other if both are correlated, e.g. estimating sales for a given advertising and promotion expenditure.
- To understand economic behaviour and market forces.
- To reduce uncertainty in decision-making to a large extent.
Types of Correlation

- Positive or Negative Correlation
- Simple or Multiple Correlation
- Partial or Total Correlation
- Linear and Non-linear Correlation

Methods of Calculating Correlation

- Scatter Diagram Method
- Karl Pearson's Coefficient of Correlation
- Rank Method
- Concurrent Deviation Method

Scatter Diagram Method

The pattern of points obtained by plotting the observed points is known as a scatter diagram. It gives us two types of information:
- Whether the variables are related or not.
- If so, what kind of relationship or estimating equation describes the relationship.

Co-variance Method – The Karl Pearson's Correlation Coefficient

The correlation coefficient measures the degree of association between two variables X and Y. Karl Pearson's formula for the correlation coefficient is given as,

$$r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

Where r is the 'Correlation Coefficient' or 'Product Moment Correlation Coefficient' between X and Y.

- Assumptions Underlying Karl Pearson's Correlation Coefficient
- Interpretation of r
- Estimation of Probable Error

Rank Correlation Method

- RANK CORRELATION WHEN RANKS ARE GIVEN
- RANK CORRELATION WHEN RANKS ARE NOT GIVEN
- RANK CORRELATION WHEN EQUAL RANKS ARE GIVEN

Correlation Coefficient using Concurrent Deviation

This is the easiest method to find the correlation between two variables. Although the method is effective in giving the direction of the correlation as positive or negative, it fails to give the accurate strength of the correlation. In this method we check the fluctuation in each data series as increasing (+), decreasing (-) or equal values. Then we count the number of items that increase, decrease or remain equal concurrently and denote it as c. The correlation coefficient is then calculated as,

$$r_c = \pm\sqrt{\pm\frac{2c - n}{n}}$$

Where,
n = total number of pairs
c = number of concurrent changes

Example: The data of advertisement expenditure (X) and sales (Y) of a company for the past 10-year period is given below. Determine the correlation coefficient between these variables and comment on the correlation.

Summary

In this chapter the concept of correlation, or the association between two variables, has been discussed. A scatter plot of the variables may suggest that the two variables are related, but the value of the Pearson correlation coefficient r quantifies this association.

Correlation is a degree of linear association between two random variables. We do not differentiate these two variables as dependent and independent variables. It may be the case that one is the cause and the other is an effect, i.e. independent and dependent variables respectively. On the other hand, both may be dependent on a third variable.

In business, correlation analysis often helps managers to take decisions by estimating the effects of changing the values of decision variables like promotion, advertising, price and production processes on objective parameters like costs, sales, market share, consumer satisfaction and competitive price. The decision becomes more objective by removing subjectivity to a certain extent.

The correlation coefficient r may assume values between -1 and 1. The sign indicates whether the association is direct (+ve) or inverse (-ve). A numerical value of r equal to unity indicates perfect association, while a value of zero indicates no association.

The correlation is said to be positive when an increase (decrease) in the value of one variable is accompanied by an increase (decrease) in the value of the other variable also. Negative or inverse correlation refers to the movement of the variables in opposite directions. Correlation is said to be negative if an increase (decrease) in the value of one variable is accompanied by a decrease (increase) in the value of the other.

In simple correlation the variation is between only the two variables under study, and the variation is hardly influenced by any external factor.
In other words, if one of the variables remains the same, there won't be any change in the other variable.

In the case of multiple correlation analysis there are two approaches to study the correlation. In the case of partial correlation, we study the variation of two variables excluding the effects of other variables by keeping them under controlled conditions.

When the amount of change in one variable tends to keep a constant ratio to the amount of change in the other variable, the correlation is said to be linear. But if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be non-linear.

Correlation analysis may also be necessary to eliminate a variable which shows low or hardly any correlation with the variable of our interest.

In statistics, there are a number of measures to describe the degree of association between variables. These are Karl Pearson's correlation coefficient, Spearman's rank correlation coefficient, the coefficient of determination, Yule's coefficient of association, the coefficient of colligation, etc.

The correlation coefficient measures the degree of association between two variables X and Y. Karl Pearson's formula for the correlation coefficient is given as,

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

The purpose of computing a correlation coefficient for ranked data is to determine the extent to which the two sets of rankings are in agreement. The coefficient that is determined from these ranks is known as Spearman's rank coefficient, r_s. This is defined by the following formula:

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where d is the difference between the two ranks of an item and n is the number of pairs.

Although the concurrent deviation method is effective in giving the direction of the correlation as positive or negative, it fails to give the accurate strength of the correlation. In this method we check the fluctuation in each data series as increasing (+), decreasing (-) or equal values. Then we count the number of items that increase, decrease or remain equal concurrently and denote it as c. The correlation coefficient is then calculated as,

$$r_c = \pm\sqrt{\pm\frac{2c - n}{n}}$$

Where,
n = total number of pairs
c = number of concurrent changes

Regression Analysis

- Learning Objectives
- Topic 1: Introduction
- Topic 2: Regression Analysis
- Topic 3: Simple Linear Regression
- Topic 4: Coefficient of Regression
- Topic 5: Non-linear Regression Models
- Topic 6: Correlation Analysis vs Regression Analysis
- Topic 7: Summary

Learning Objectives

After studying this chapter, you should be able to:
- Understand the concept of regression analysis
- Discuss the applicability of regression
- Describe the simple linear regression and non-linear regression models
- Learn about the coefficient of regression and linear regression equations

Introduction

In regression analysis we develop an equation, called an estimating equation, used to relate known and unknown variables. Then correlation analysis is used to determine the degree of the relationship between the variables. In this chapter we will learn how to calculate the regression line mathematically.

Regression Analysis

According to Morris Myers Blair, "regression is the measure of the average relationship between two or more variables in terms of the original units of the data."

Applicability of Regression Analysis

Regression analysis is a branch of statistical theory which is widely used in all the scientific disciplines. It is a basic technique for measuring or estimating the relationship among economic variables that constitute the essence of economic theory and economic life.

Simple Linear Regression

This model is used if we have a bivariate distribution, i.e. only two variables are considered, and the 'best fit' curve is approximated to a straight line. The highest power of x is called the order of the model.

Simple Linear Regression Model

The linear regression model uses a straight-line relationship. The equation of a straight line is of the form,

$$\hat{y} = a + bx \qquad (1)$$

Where ŷ is the predicted value of Y corresponding to x, and a and b are constants. Now if we assume the error (deviation) in the Y direction is e, we can write the relationship of X and Y in data points as,

$$y = a + bx + e$$

Error e is the amount by which an observation will fall off the regression line. Error e is due to random error. 'a' and 'b' are called parameters of the linear regression model, whose values are found from the observed data.

Linear Regression Equation

Suppose the data points are (x1, y1), (x2, y2), ….. (xn, yn). Then we can write from the regression equation,

$$y_i = a + b x_i + e_i \qquad (2)$$

Thus, the sum of squares of errors is,

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

To have the minimum sum of squares of errors (SSE) we must have the condition,

$$\frac{\partial\, SSE}{\partial a} = 0, \qquad \frac{\partial\, SSE}{\partial b} = 0$$

Coefficient of Regression

The coefficients of regression are bYX and bXY. They have the following implications:
- The slopes of the regression lines of Y on X and X on Y, viz. bYX and bXY, must have the same sign (because r² cannot be negative).
- The correlation coefficient is the geometric mean of bYX and bXY.
- If both slopes bYX and bXY are positive, the correlation coefficient r is positive. If both bYX and bXY are negative, the correlation coefficient r is negative.
- Both regression lines intersect at the point (x̄, ȳ); when the two lines coincide, the correlation is perfect.

Properties of Regression Coefficients

- The coefficient of correlation is the geometric mean of the two regression coefficients.
- Both regression coefficients are either positive or negative. This means that they always have identical signs.
- The coefficient of correlation and the regression coefficients will also have the same sign.
- Regression coefficients are independent of a change in the origin but not of the scale.
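The least-squares fit and the link between the two regression coefficients and r can be checked numerically. The sketch below assumes the standard closed-form solution of the least-squares conditions, b = Sxy / Sxx and a = ȳ - b·x̄. The data values and the function name `regression_summary` are hypothetical.

```python
import math


def regression_summary(xs, ys):
    """Least-squares line of Y on X, both regression coefficients, and r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b_yx = sxy / sxx              # slope of the regression line of Y on X
    b_xy = sxy / syy              # slope of the regression line of X on Y
    a = mean_y - b_yx * mean_x    # intercept, so the line passes through the means
    r = sxy / math.sqrt(sxx * syy)
    return {"a": a, "b_yx": b_yx, "b_xy": b_xy, "r": r,
            "mean_x": mean_x, "mean_y": mean_y}


# Hypothetical paired observations, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

fit = regression_summary(xs, ys)
# r is the geometric mean of the two coefficients, carrying their common sign.
geometric_mean = math.copysign(math.sqrt(fit["b_yx"] * fit["b_xy"]), fit["b_yx"])

print(fit)
print(geometric_mean)
```

Because the intercept is computed from the means, the fitted line of Y on X passes through (x̄, ȳ), the point where the two regression lines meet.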
Non-Linear Regression Models

- Second Degree Model
- Other Regression Models
- Seasonal Model
- Seasonal Model with Trend
- Coefficient of Determination

Correlation Analysis vs Regression Analysis

- Degree and Nature of Relationship
- Cause and Effect Relationship

Like correlation, regression analysis can also be studied as 'simple and multiple', 'total and partial', 'linear and non-linear', etc. In correlation, there is no distinction between independent and dependent variables.

Summary

In this chapter, the concept of regression between dependent and independent variables has been discussed. Regression provides us a measure of the relationship and also helps to predict one variable for a given value of the other variable. Unlike correlation analysis, in regression analysis one variable is independent and the other dependent. Please note that this relationship need not be a cause-effect relationship.

Regression analysis is a branch of statistical theory which is widely used in all the scientific disciplines. It is a basic technique for measuring or estimating the relationship among economic variables that constitute the essence of economic theory and economic life. The uses of regression analysis are not confined to economic and business activities. Its applications extend to almost all the natural, physical and social sciences.

The simple linear regression model is used if we have a bivariate distribution, i.e. only two variables are considered, and the 'best fit' curve is approximated to a straight line. This describes the linear relationship between two variables. Although it appears to be too simplistic, in many business situations it is adequate. At least, an initial study can be based on this model for any decision-making situation.

We have studied simple linear, non-linear and multiple regression models. For multiple regression and non-linear regression models, MS Excel or any other computer package would help in reducing voluminous calculations. We also discussed the coefficient of determination as a measure of the strength of the relationship.

The least square principle can also be applied to the fitting of a second degree polynomial, which may be useful in a business situation if we have some idea that the relationship between two variables is parabolic. In any case, a second degree polynomial fit is more likely to be a better approximation of the actual relationship. We may use the second order model (parabolic trend) if we feel that the variation is parabolic.

The least square approximation can be calculated easily for low degree polynomials, like linear, parabolic, cubic, etc. But for higher degrees (more than three), the system of normal equations becomes ill-conditioned. This causes large errors in the values of the coefficients. Then the approximation becomes incorrect. To avoid these problems, 'orthogonal polynomials' are used for approximation.

Mean Square Error (MSE) is an estimate of the variance of the regression error. MSE depends on the values of the data and their scales. Hence we need a measure that calculates the relative degree of variation so that it can be compared for the fits obtained from different models and for different data sets. The coefficient of determination is such a measure.

The coefficient of determination is a measure of the strength of the regression fit. It is an estimator of the population parameter of correlation and can be obtained directly from a decomposition of the variation in Y into two components, viz. due to error and due to regression. Error is a deviation of a data point from its respective group mean. Thus error is the deviation of a data point from its predicted value explained by the regression line.

Theory of Probability

- Learning Objectives
- Topic 1: Introduction
- Topic 2: Important Terms in Probability
- Topic 3: Kinds of Probability
- Topic 4: Simple Propositions of Probability
- Topic 5: Addition Theorem of Probability
- Topic 6: Multiplication Theorem of Probability
- Topic 7: Conditional Probability
- Topic 8: Law of Total Probability
- Topic 9: Independence of Events
- Topic 10: Combinatorial Concept
- Topic 11: Summary

Learning Objectives

After studying this chapter, you should be able to:
- Understand the meaning and important terms of probability
- Learn about the addition theorem and multiplication theorem of probability
- Understand the concept of independence of events and combinatorial concepts like permutation and combination
- Solve problems of conditional probability, Bayes' Theorem and other concepts of probability

Introduction

A probability is a quantitative measure of risk. This chapter provides exposure to fundamental concepts, since probability is inseparable from statistical methods.

Important Terms in Probability

Probability and sampling are inseparable parts of statistics.

Random Experiment: A random experiment is an experiment whose outcome is not predictable in advance.

- Sample Space
- Event
- Event Space
- Union of events
- Intersection of events
- Mutually exclusive events
- Collectively exhaustive events
- Complement of event

Kinds of Probability

- Classical Probability
- Relative Frequency Probability
- Axiomatic Probability
- Subjective Probability

Simple Propositions of Probability

Proposition 1: $P(E^C) = 1 - P(E)$

Probability of complement: Let $E^C$ denote the complement of the event E. By definition of the complement, $E^C$ has all elements of the sample space S that are not in E. Thus, E and $E^C$ are mutually exclusive and collectively exhaustive. Therefore, by axioms 2 and 3 we have,

$$1 = P(S) = P(E \cup E^C) = P(E) + P(E^C)$$

or,

$$P(E^C) = 1 - P(E)$$
Proposition 2: If $E \subset F$, then $P(E) \le P(F)$

If the event E is contained in the event F, that is, $E \subset F$, then we can express $F = E \cup (E^C \cap F)$. However, as the events E and $(E^C \cap F)$ are mutually exclusive, we get,

$$P(F) = P(E) + P(E^C \cap F)$$

But, by axiom 1, $P(E^C \cap F) \ge 0$. Therefore, we have proved the proposition, $P(E) \le P(F)$.

Proposition 3: $P(E \cup F) = P(E) + P(F) - P(E \cap F)$

Probability of unions: The event $E \cup F$ can be written as the union of two disjoint events, namely E and $(E^C \cap F)$. Thus, from axiom 3,

$$P(E \cup F) = P[E \cup (E^C \cap F)] = P(E) + P(E^C \cap F) \qquad (1)$$

Also, $F = (E \cap F) \cup (E^C \cap F)$, hence,

$$P(F) = P(E \cap F) + P(E^C \cap F) \qquad (2)$$

From (1) and (2) we get Proposition 3 as,

$$P(E \cup F) = P(E) + P(F) - P(E \cap F)$$

The extended statement of this proposition for n events is also called the inclusion-exclusion principle. For three events,

$$P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(E \cap F) - P(F \cap G) - P(E \cap G) + P(E \cap F \cap G)$$

Proposition 4

Mutually exclusive events: When the sets corresponding to two events are disjoint (have no common elements, or the intersection is null), the two events are called mutually exclusive.

$$E \cap F = \Phi$$

Therefore, $P(E \cap F) = P(\Phi) = 0$. Also, for mutually exclusive events E and F,

$$P(E \cup F) = P(E) + P(F)$$

Proposition 5: $P(E^C \cap F) = P(F) - P(E \cap F)$

From set theory, F can be written as a union of the two disjoint events $E \cap F$ and $E^C \cap F$. Hence, by Axiom 3, we have,

$$P(F) = P(E \cap F) + P(E^C \cap F)$$

By re-arranging the terms we get the result.

Addition Theorem of Probability

The addition theorem in probability is the process of determining the probability that either event 'A' or event 'B' occurs, or both occur. The notation for the addition between two events 'A' and 'B' is '∪', pronounced as Union.

Let A and B be two events defined in a sample space. The union of events A and B is the collection of all outcomes that belong either to A or to B or to both A and B, and is denoted by A ∪ B (A or B).
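The propositions above can be verified by enumeration on a small sample space. The sketch below assumes the classical definition of probability with equally likely outcomes, using one roll of a fair six-sided die. The particular events E, F, G, A and B are hypothetical choices for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (an assumed, equally likely model).
S = frozenset(range(1, 7))


def P(event):
    """Classical probability: favourable outcomes divided by total outcomes."""
    return Fraction(len(event & S), len(S))


def complement(event):
    """All outcomes of S that are not in the event."""
    return S - event


E = frozenset({2, 4, 6})   # "even number"
F = frozenset({4, 5, 6})   # "at least four"
G = frozenset({4, 6})      # contained in F
A = frozenset({1})         # A and B are mutually exclusive
B = frozenset({2})

checks = {
    "Proposition 1": P(complement(E)) == 1 - P(E),
    "Proposition 2": G <= F and P(G) <= P(F),
    "Proposition 3": P(E | F) == P(E) + P(F) - P(E & F),
    "Proposition 4": P(A & B) == 0 and P(A | B) == P(A) + P(B),
    "Proposition 5": P(complement(E) & F) == P(F) - P(E & F),
}

for name, holds in checks.items():
    print(name, holds)
```

Exact fractions are used instead of floating-point numbers so that the equalities can be tested without rounding error.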
The result of the addition theorem is generally written using set notation,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Where,
P(A) = probability of occurrence of event 'A'
P(B) = probability of occurrence of event 'B'
P(A ∪ B) = probability of occurrence of event 'A' or event 'B'
P(A ∩ B) = probability of occurrence of event 'A' and event 'B'

The addition theorem of probability can be defined and proved as follows. Let 'A' and 'B' be subsets of a finite non-empty set 'S'. Then, according to the addition rule for counting,

$$n(A \cup B) = n(A) + n(B) - n(A \cap B)$$

On dividing both sides by n(S), we get

$$\frac{n(A \cup B)}{n(S)} = \frac{n(A)}{n(S)} + \frac{n(B)}{n(S)} - \frac{n(A \cap B)}{n(S)} \qquad (1)$$

which is the addition theorem, since each ratio is the probability of the corresponding event.

Multiplication Theorem of Probability

Probability is the branch of mathematics which deals with the chance of occurrence of events. The basic form of the multiplication theorem of probability for two events 'x' and 'y' can be stated as,

$$P(x \cap y) = P(y) \cdot P(x / y)$$

Here P(x) and P(y) are the probabilities of occurrence of events 'x' and 'y' respectively. P(x / y) is the conditional probability of 'x', the condition being that 'y' has occurred before 'x'. P(x / y) is always calculated after 'y' has occurred. Here, the occurrence of 'x' depends on 'y': 'y' has already changed the situation, so the probability of occurrence of 'x' also changes.

Conditional Probability

Conditional probability is the probability that an event will occur given that another event has already occurred. If A and B are two events, then the conditional probability of A given B is written as P(A/B) and read as "the probability of A given that B has already occurred." For P(B) > 0,

$$P(A/B) = \frac{P(A \cap B)}{P(B)}$$

Law of Total Probability

Consider two events, E and F. Whatever the events, we can always say that the probability of E is equal to the probability of the intersection of E and F, plus the probability of the intersection of E and the complement of F. That is,

$$P(E) = P(E \cap F) + P(E \cap F^C)$$

Bayes' Formula

Let E and F be events. Then

$$E = (E \cap F) \cup (E \cap F^C)$$

since any element of E must be either in both E and F, or in E but not in F. $(E \cap F)$ and $(E \cap F^C)$ are mutually exclusive, since the former must be in F and the latter must not be in F. Hence, by Axiom 3,

$$P(E) = P(E \cap F) + P(E \cap F^C) = P(E/F)\,P(F) + P(E/F^C)\,P(F^C) = P(E/F)\,P(F) + P(E/F^C)\,[1 - P(F)]$$

Independence of Events

Combinatorial Concept

1. Product Rule of Counting
2. Sum Rule of Counting
3. Permutation
4. Combination

Summary

In this chapter, we discussed the basic idea of probability. We defined probability in different ways and pointed out serious limitations of each definition. Then we discussed the axioms of probability, which are the backbone of the theory of probability. Then we studied a number of useful propositions of probability. We also defined conditional probability, the law of total probability, and Bayes' Theorem. We also defined mutually exclusive events and independence of events. Lastly, we discussed a few important concepts of combinatorial analysis, which come in very handy while calculating the probability of an event.

Probability Distribution

- Learning Objectives
- Topic 1: Introduction
- Topic 2: Random Variable
- Topic 3: Probability Distributions of Standard Random Variables
- Topic 4: Bernoulli Distribution
- Topic 5: Binomial Distribution
- Topic 6: Poisson Distribution
- Topic 7: Normal Distribution
- Topic 8: Summary

Learning Objectives

After studying this chapter, you should be able to:
- Differentiate between discrete and continuous random variables
- Discuss probability distributions of standard random variables
- Understand discrete probability distributions, which include the Binomial and Poisson distributions
- Explain continuous probability distributions, which include the Normal distribution

Introduction

We will study a few common distributions in this chapter.
The normal distribution has extensive use in statistical tools, and therefore readers are advised to study it in detail. Knowledge of sequences, series and calculus is expected.

Random Variable

A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon.

- Discrete and Continuous Random Variables
- Probability Mass Function (P.M.F.)
- Probability Density Function
- Cumulative Distribution Function
- Expectation Value of Random Variables
- Expected Value of a Function of a Random Variable
- Variance and Standard Deviation of Random Variable

Probability Distributions of Standard Random Variables

1. Bernoulli Distribution
2. Binomial Distribution
3. Poisson Distribution
4. Normal Distribution

Bernoulli Distribution

- It is the basis of many discrete random variables, as it deals with an individual trial.
- It is a building block for other random variables.
- It is a single trial distribution.

Application of Bernoulli Distribution

The Bernoulli trial is fundamental to many discrete distributions like Binomial, Poisson, Geometric, etc. Situations where the Bernoulli distribution is commonly used are:
- Sex of a newborn child; say, Male = 0, Female = 1.
- Items produced by a machine are defective or non-defective.
- During the next flight an engine will fail or remain serviceable.
- A student appearing for an examination will pass or fail.

Binomial Distribution

A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The probability distribution of a binomial random variable is called a binomial distribution. The Bernoulli distribution is the special case with a single trial (n = 1).

Applications of Binomial Distribution

The binomial distribution applies when:
- Trials are finite (and not very large), performed repeatedly 'n' times.
- Each trial (random experiment) is a Bernoulli trial, one that results in either success or failure.
- The probability of success in any trial is 'p' and is constant for each trial.
- All trials are independent.

Following are some real-life examples of applications of the binomial distribution:
- Number of defective items in a lot of n items produced by a machine.
- Number of male births out of n births in a hospital.
- Number of correct answers in a multiple-choice test.
- Number of seeds germinated in a row of n planted seeds.
- Number of re-captured fish in a sample of n fishes.
- Number of missiles hitting the targets out of n fired.

Poisson Distribution

A random variable X, taking one of the values 0, 1, 2, … is said to be a Poisson random variable with parameter λ if, for some λ > 0,

$$P(X = i) = e^{-\lambda}\,\frac{\lambda^i}{i!}, \qquad i = 0, 1, 2, \dots$$

P(X = i) is the probability mass function (p.m.f.) of the Poisson random variable. Its expected value and variance are,

$$\mu = E[X] = \lambda, \qquad \operatorname{Var}[X] = \lambda$$

Some of the common examples where a Poisson random variable can be used to define the probability distribution are:
- Number of accidents per day on an expressway.
- Number of earthquakes occurring over a fixed time span.
- Number of misprints on a page.
- Number of arrivals of calls at a telephone exchange per minute.
- Number of interrupts per second on a server.

Normal Distribution

- Equation for Normal Probability Curve
- Standard Normal Distribution
- Properties of Normal Distribution
- Areas Under Standard Normal Probability Curve
- Importance of Normal Distribution

[Figure: Area under the Normal Curve]

Summary

A random variable is a real-valued function defined over a sample space with a probability associated with it. The value of the random variable is the outcome of an experiment. Random variables are neither 'random' nor 'variable'. In this chapter we discussed several important random variables, the associated formulae, and problem solving using the formulae.

A discrete random variable is one that takes at most countably many values. A continuous random variable can take any real value.

We also discussed probability distributions of random variables.
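As a numerical companion to the binomial and Poisson distributions discussed in this chapter, the following sketch evaluates both probability mass functions and checks their means and variances. The binomial p.m.f. used here, C(n, x) p^x (1 - p)^(n - x), is the standard formula and is stated as an assumption because the slides do not reproduce it. The parameter values and function names are hypothetical.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for x successes in n independent Bernoulli trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(i, lam):
    """P(X = i) for a Poisson random variable with parameter lam > 0."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


def mean_and_variance(values, probabilities):
    """Expected value and variance of a discrete distribution."""
    mean = sum(v * pr for v, pr in zip(values, probabilities))
    variance = sum((v - mean) ** 2 * pr for v, pr in zip(values, probabilities))
    return mean, variance


# Hypothetical parameters: 10 trials with success probability 0.3.
n, p = 10, 0.3
binomial_values = list(range(n + 1))
binomial_probs = [binomial_pmf(x, n, p) for x in binomial_values]

# Hypothetical rate: 2 events per interval. The infinite support is
# truncated at 60 terms, beyond which the probabilities are negligible.
lam = 2.0
poisson_values = list(range(60))
poisson_probs = [poisson_pmf(i, lam) for i in poisson_values]

print(sum(binomial_probs), mean_and_variance(binomial_values, binomial_probs))
print(sum(poisson_probs), mean_and_variance(poisson_values, poisson_probs))
```

For the Poisson case both the mean and the variance come out equal to λ (up to truncation and rounding), matching the result stated for the Poisson random variable.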
The binomial distribution is used if an experiment is carried out for a finite number n of independent trials, all trials being Bernoulli trials with constant probability of success p.

A random variable will follow the Poisson distribution if it is the number of occurrences of a rare event during a finite period. The waiting time for a rare event is exponentially distributed. The negative binomial distribution is used if a number of Bernoulli trials are made to achieve a desired number of successes.

One of the continuous random variables often required is the uniform random variable. The waiting time for an event that occurs periodically follows the uniform distribution.

The normal probability distribution is the most important distribution in statistics. We defined the normal distribution with parameters (μ, σ), where μ is the mean and σ is the standard deviation. Further, we defined the standard normal distribution, which is a special case of the normal distribution with parameters (0, 1).

We also discussed the transformation of a normal random variable X to the standard random variable Z using

$$z = \frac{x - \mu}{\sigma}$$

The Z distribution is very convenient for manual calculation, as we can use standard normal tables, which are extensively tabulated, to find probabilities and intervals. The normal distribution is used as a model in many real-world situations, both as a continuous distribution and as an approximation to discrete distributions like the binomial or Poisson.

Use of Excel Software for Statistical Analysis

- Learning Objectives
- Topic 1: Introduction
- Topic 2: Introduction to Excel
- Topic 3: Entering Data in Excel
- Topic 4: Descriptive Statistics
- Topic 5: Basic Built-in Functions (Average, Mean, Mode, Count, Max and Min)
- Topic 6: Statistical Analysis
- Topic 7: Normal Distribution
- Topic 8: Brief about SPSS
- Topic 9: Summary

Learning Objectives

After studying this chapter, you should be able to:
- Understand the basic concepts of using Microsoft Excel
- Discuss how to enter data in Excel and use basic built-in functions
- Gain knowledge about SPSS

Introduction

The most popular software in the MS Office Suite includes the following:
- Microsoft Word
- Microsoft Excel
- Microsoft PowerPoint
- Microsoft Access
- Microsoft Project Plan
- Microsoft Outlook

MICROSOFT OFFICE SUITE

Suite Product      Home and Student   Home and Business   Professional
Word 2010          Included           Included            Included
Excel 2010         Included           Included            Included
PowerPoint 2010    Included           Included            Included
OneNote 2010       Included           Included            Included
Outlook 2010       -                  Included            Included
Access 2010        -                  -                   Included
Publisher 2010     -                  -                   Included

Introduction to Excel

Opening A Document

Click on File-Open (Ctrl+O) to open/retrieve an existing workbook; change the directory area or drive to look for files in other locations. To create a new workbook, click on File-New-Blank Document.

Saving And Closing A Document

To save your document with its current filename, location and file format, click on File - Save. When you have finished working on a document you should close it. Go to the File menu and click on Close.

Excel Screen

[Figure: Menu Bar in Excel]

[Figure: Excel Screen]

Workbooks and Worksheets

- Cell
- Row
- Column
- Spreadsheet
- Workbook

[Figure: Cell Name Box]

[Figure: Spreadsheet Tabs in Excel]

Moving Around the Worksheet

- Margins
- Orientation
- Paper Size
- Print Area

[Figure: Margin Options in Excel]

[Figure: Orientation Options in Excel]

[Figure: Print Area Selection]

Moving between Cells

While working with any Office productivity tool, the clipboard functions are invaluable. The most common clipboard functions are 'Cut', 'Copy' and 'Paste'.
In the Microsoft Office suite, there are keyboard shortcuts for these functions.

Keyboard Shortcuts

Cut      Ctrl + X
Copy     Ctrl + C
Paste    Ctrl + V

Entering Data in Excel

A new worksheet is a grid of rows and columns. The rows are labeled with numbers, and the columns are labeled with letters. Each intersection of a row and a column is a cell.
* Entering Labels
* Entering Values
* Rounding Numbers that Meet Specified Criteria
* Sorting by Columns

Descriptive Statistics

Excel includes elaborate and customisable toolbars, for example the “standard” toolbar. Some of the icons are useful for mathematical computation:
* The “Autosum” icon enters the formula “=SUM()” to add up a range of cells.
* The “Function Wizard” icon gives you access to all the functions available.
* The “Graph Wizard” icon gives access to all graph types available.

Excel can be used to generate measures of location and variability for a variable. Suppose we wish to find descriptive statistics for the sample data 2, 4, 6 and 8.

Step 1: Select the Tools pull-down menu. If you see Data Analysis, click on this option; otherwise, click on the Add-Ins option to install the Analysis ToolPak.
Step 2: Click on the Data Analysis option.
Step 3: Choose Descriptive Statistics from the Analysis Tools list.
Step 4: When the dialog box appears, enter A1:A4 in the Input Range box. A1 is the value in column A and row 1; in this case, this value is 2. Using the same technique, enter the other values until you reach the last one.
Step 5: Select an output range, in this case B1. Click on Summary Statistics to see the results. Select OK.

Basic Built-in Functions (Average, Mean, Mode, Count, Max and Min)

[Figure: Manual Equation Entry]

[Figure: Functions in Excel (Arithmetic Function, Syntax and Description)]

SUM Function

The SUM function is probably the most commonly used function in Excel.
It comes in three flavours in Excel, namely:
1. SUM()
2. SUMIF()
3. SUMIFS()

Logical Functions
* AND()
* OR()
* NOT
* IF()
* IFERROR()
* TRUE
* FALSE

Statistical Functions

Statistical functions are invaluable in any mathematical calculation. They can provide insights into trends, provide data for detailed analysis, and help identify gaps that need to be plugged. Excel provides a wide range of functions that can be used to perform basic statistical analyses.

Statistical Analysis

Creating Charts

Select the data range (only numbers) for which the chart needs to be created. Under the Insert Ribbon, in the Chart section, click on the type of chart you want to create and the category. Here the clustered chart has been used. Select the chart and click on the Select Data button in the Data section of the Design Layout. In the Select Data Source dialog, select ‘Series 1’ and click on the Edit button.

[Figure: Select Data Source]

This opens the Edit Series dialog, which allows you to change the range of values in the series and provide a series name. For the series name, click on the icon to select the column title of Series 1.

[Figure: Edit Series]

Histogram

Now follow the steps given below to draw a histogram.

Select the first two columns, i.e. class interval and frequency, in the Excel sheet. Click on the ‘Chart Wizard’ icon on the tool bar or select [Insert → Chart…] from the Insert drop-down menu. A dialogue box with the title ‘Chart Wizard - Step 1 of 4 - Chart Type’ will appear. In the menu ‘Standard Type’, select ‘Column’. Click on the ‘Next’ button.

Now the next menu with the title ‘Chart Wizard - Step 2 of 4 - Chart Source Data’ will appear. Since we have already selected the source data, select ‘Next’. Don’t forget to check that column is selected in data series.

Now the next menu with the title ‘Chart Wizard - Step 3 of 4 - Chart Options’ will appear.
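The histogram steps above start from two columns, class interval and frequency. As a rough illustration of how such a table can be built from raw observations, here is a minimal Python sketch. The marks data, the class width of 10 and the helper names `frequency_table` and `text_histogram` are assumptions made for this example only; they do not come from the course material, and the text bars merely stand in for Excel's column chart.

```python
from collections import Counter


def frequency_table(values, start, width, n_classes):
    """Tally values into equal-width class intervals [lower, upper)."""
    counts = Counter()
    for v in values:
        idx = int((v - start) // width)  # index of the class the value falls in
        if 0 <= idx < n_classes:         # values outside all classes are ignored
            counts[idx] += 1
    return [
        ((start + i * width, start + (i + 1) * width), counts[i])
        for i in range(n_classes)
    ]


def text_histogram(table):
    """Render one row per class interval, with one '#' per observation."""
    rows = []
    for (lower, upper), freq in table:
        rows.append(f"{lower:>3}-{upper:<3} | {'#' * freq} ({freq})")
    return "\n".join(rows)


# Hypothetical marks of 15 students (illustrative data only).
marks = [12, 25, 27, 31, 38, 38, 41, 44, 47, 49, 52, 55, 58, 63, 71]

table = frequency_table(marks, start=10, width=10, n_classes=7)
print(text_histogram(table))
```

The two elements of each entry in `table` correspond to the class interval and frequency columns that would be selected in the worksheet before opening the Chart Wizard.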
Correlation Plot and Regression Analysis

Using MS Excel for calculating Karl Pearson’s correlation coefficient

Calculating Karl Pearson’s correlation coefficient using MS Excel is very simple. The steps are as follows:
* Open an Excel worksheet and enter the data values of the X and Y variables as two arrays (columns or rows). Keep these contiguous if possible.
* Select the cell where you want to store the result r.
* Enter the formula with the syntax ‘=CORREL(array1, array2)’, where ‘array1’ is a cell range of values and ‘array2’ is a second cell range of values.

Normal Distribution

NORMDIST returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing.

Syntax: NORMDIST(x,mean,standard_dev,cumulative)
* X is the value for which you want the distribution.
* Mean is the arithmetic mean of the distribution.
* Standard_dev is the standard deviation of the distribution.
* Cumulative is a logical value: TRUE returns the cumulative distribution function, and FALSE returns the probability density function.

Brief about SPSS

SPSS Statistics is a software package used for statistical analysis.

SPSS Files

SPSS uses several types of files. First, there is the file that contains the data view and the variable view. These have been entered using the SPSS Data Editor Window. It is known as an SPSS system file.

[Figure: SPSS Data Editor Window, Data View]

[Figure: Data Editor Window, Variable View]

[Figure: Define Variable Dialog Box. Student Motivation: Not willing, Undecided, Willing]

[Figure: Value Labels Dialog Box. Value labels coded with Value and Value Label]

[Figure: SPSS Data Editor Window with all Records Entered]

Summary

Microsoft Office is one of the most powerful office productivity tools in the market today. The entire suite is vast and covers a wide range of software solutions catering to various aspects of modern businesses. Microsoft Excel is a powerful accounting and calculation solution.
It has a standard tabular layout and it supports a wide range of arithmetic, accounting and statistical functions. Microsoft Outlook is the mail client that can be set up to download mails from a mail server as well as send and receive emails as desired. Being a part of the Microsoft Office suite, this tool is compatible with other applications in the suite.

One of the most popular and widely used Microsoft Office suites is MS Office 2003. Later Microsoft released two other versions of Office, namely Office 2007 and Office 2010. Although Office 2010 is the latest of these versions, many businesses still continue to use Office 2003. From Office 2003 to Office 2007, Microsoft radicalised the overall look and feel of the office suite.

Excel is built on the concept of cells, rows, columns, spreadsheets and workbooks. The entire structure is hierarchical, and this allows it to be scalable and versatile enough to adapt to the varying needs of users from different specialisations. Understanding these concepts is useful in developing complex reports and models.

As long as you work on soft copies, page layouts are not really important, since you can scroll a spreadsheet to view the contents. However, when it comes to printouts, it is important that one gets the page layouts sorted out. Excel 2010 has all the page layout options under the Page Layout menu item.

While working with any Office productivity tool, the clipboard functions are invaluable. The most common clipboard functions are ‘Cut’, ‘Copy’ and ‘Paste’. In the Microsoft Office suite, there are keyboard shortcuts for these functions. Once you become conversant with the Excel functions, you would prefer to use the keyboard shortcuts as they are faster and easier to use than the mouse.

A new worksheet is a grid of rows and columns. The rows are labelled with numbers, and the columns are labelled with letters. Each intersection of a row and a column is a cell.
Each cell has an address, which is the column letter and the row number. The arrow on the worksheet to the right points to cell A1, which is currently highlighted, indicating that it is an active cell. A cell must be active to enter information into it.

Excel is a very powerful accounting tool, but before going to the really complex functions, let us see how to use Excel for simple calculations. There are two ways of using Excel for simple calculations: you can enter the actual arithmetic equations in the cell or use pre-defined Excel formulas to do the same.

Statistical calculations for exponential random variables can be performed using the statistical functions available in MS Excel.

NORMDIST returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing.

Syntax: NORMDIST(x,mean,standard_dev,cumulative)

SPSS Statistics is a software package used for statistical analysis. Long produced by SPSS Inc., it was acquired by IBM in 2009. The current versions (2014) are officially named IBM SPSS Statistics. Companion products in the same family are used for survey authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS Modeler), text analytics, and collaboration and deployment (batch and automated scoring services).
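To make the NORMDIST syntax summarized above more concrete, the following Python sketch mirrors its four arguments using only the standard library. It first standardizes x to z = (x − mean) / standard_dev and then evaluates either the cumulative probability or the density. The function name `normdist` and the example values (mean 60, standard deviation 10) are assumptions chosen for illustration; this is not Excel's own implementation.

```python
import math


def normdist(x, mean, standard_dev, cumulative):
    """Sketch of a NORMDIST-style calculation for a normal distribution.

    cumulative=True  -> P(X <= x), the cumulative distribution function
    cumulative=False -> the probability density at x
    """
    if standard_dev <= 0:
        raise ValueError("standard_dev must be positive")
    # Transform X to the standard normal variable Z.
    z = (x - mean) / standard_dev
    if cumulative:
        # Standard normal CDF expressed through the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Normal density, rescaled by the standard deviation.
    return math.exp(-0.5 * z * z) / (standard_dev * math.sqrt(2.0 * math.pi))


# Hypothetical example: scores with mean 60 and standard deviation 10.
p_below_70 = normdist(70, 60, 10, True)    # z = 1, about 0.8413
density_at_mean = normdist(60, 60, 10, False)
print(round(p_below_70, 4), round(density_at_mean, 5))
```

With cumulative set to True, the result for z = 1 agrees with the value read from a standard normal table, which is the manual calculation this chapter's Z transformation is meant to support.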