Statistics

• The systematic and scientific treatment of quantitative measurement is known as statistics.
• Statistics may be called the science of counting.
• Statistics is concerned with the collection, classification (or organization), presentation, analysis and interpretation of data that are measurable in numerical terms.

Stages of Statistical Investigation

1. Collection of data
2. Organization of data
3. Presentation of data
4. Analysis
5. Interpretation of results

Statistics

• It is divided into two major parts: descriptive and inferential statistics.
• Descriptive statistics is a set of methods to describe the data that we have collected, i.e. summarization of data.
• Inferential statistics is a set of methods used to make a generalization, estimate, prediction or decision. It is used when we want to draw conclusions about a distribution.

Statistics: Functions & Uses

• It simplifies complex data.
• It provides techniques for comparison.
• It studies relationships.
• It helps in formulating policies.
• It helps in forecasting.
• It is helpful for the common man.
• Statistical methods merged with the speed of computers can work wonders: SPSS, STATA, MATLAB, MINITAB, etc.

Scope of Statistics

• In Business Decision Making
• In Medical Sciences
• In Actuarial Science
• In Economic Planning
• In Agricultural Sciences
• In Banking & Insurance
• In Politics & Social Science

Distrust & Misuse of Statistics

• Statistics is like clay, of which one can make a God or a Devil.
• Statistics are liars of the first order.
• Statistics can prove or disprove anything.

Measure of Central Tendency

• It is a single value that represents the entire mass of data.
• Generally, it lies in the central part of the distribution.
• It facilitates comparison and decision-making.

There are mainly three types of measure:
1. Arithmetic mean
2. Median
3. Mode

Arithmetic Mean

This single representative value can be determined by:

\[ \text{A.M.} = \frac{\text{Sum of observations}}{\text{No. of observations}} \]

Properties:
1. The sum of the deviations from the AM is always zero.
2. If every value of the variable is increased or decreased by a constant, the AM increases or decreases by the same constant.
3. If every value of the variable is multiplied or divided by a constant, the AM is multiplied or divided by the same constant.
4. The sum of squares of deviations is minimum when taken from the AM.
5. The combined AM of two related groups of sizes \( n_1 \) and \( n_2 \) with means \( \bar{X}_1 \) and \( \bar{X}_2 \) is defined as

\[ \bar{X}_{12} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2} \]

and the same weighted form extends to more than two groups.

Median

Mode

• The mode is the value that occurs most often in the series.
• It is the value around which the items tend to be most heavily concentrated.
• It is an important average when we talk about the "most common size of shoe or shirt".

Relationship among Mean, Median & Mode

• For a symmetric distribution: Mode = Median = Mean
• The empirical relationship between mean, median and mode for an asymmetric distribution is:

\[ \text{Mode} = 3\,\text{Median} - 2\,\text{Mean} \]

Advantages and disadvantages

Mean
• Advantage: More sensitive than the median, because it makes use of all the values of the data.
• Disadvantage: It can be misrepresentative if there is an extreme value.

Median
• Advantage: It is not affected by extreme scores, so it can give a representative value.
• Disadvantage: It is less sensitive than the mean, as it does not take into account all of the values.

Mode
• Advantage: It is useful when the data are in categories, such as the number of babies who are securely attached.
• Disadvantage: It is not a useful way of describing data when there are several modes.

Same center, different variation

• The range ignores the way in which the data are distributed.

[Figure: two data sets plotted on a scale from 7 to 12; for each, Range = 12 - 7 = 5.]

When the arithmetic mean is a fractional value (not an integer), the variance is computed with the formula:

\[ \sigma^2 = \frac{1}{n}\sum X^2 - \left(\frac{\sum X}{n}\right)^2 \]

Calculate the S.D. of x: 10, 11, 17, 25, 7, 13, 21, 10, 12, 14

Formulae for Frequency Distribution

By definition:

\[ \sigma^2 = \frac{1}{\sum f}\sum f\,(x - \bar{x})^2 \]

For computation:

\[ \sigma^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2 \]

Example

• An analysis of production rejects resulted in the following figures. Calculate the mean and variance of the number of rejects per operator.

No. of rejects per operator    No. of operators
21-25                          5
26-30                          15
31-35                          23
36-40                          42
41-45                          12
46-50                          3

Example

• Calculate the variance from the following data. (Sale is given in thousand Rs.)

Sale     No. of days
10-20    3
20-30    6
30-40    11
40-50    3
50-60    2

An analysis of production rejects resulted in the following observations. Calculate the mean and standard deviation.

No. of rejects per operator    No. of operators
20-25                          5
25-30                          15
30-35                          28
35-40                          42
40-45                          15
45-50                          12
50-55                          3

Coefficient of Variation

• Measures relative variation
• Always expressed as a percentage (%)
• Shows variation relative to the mean
• Is used to compare two or more sets of data measured in different units

\[ CV = \frac{S}{\bar{X}} \times 100\% \]

Comparing Coefficient of Variation

• Stock A: average price last year = $50, standard deviation = $5
• Stock B: average price last year = $100, standard deviation = $5

Coefficient of variation:

\[ \text{Stock A: } CV = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\% \]

\[ \text{Stock B: } CV = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\% \]

An investment 'A' has an expected return of Rs. 1,000 and a standard deviation of Rs. 300. Another investment 'B' has a standard deviation of returns of Rs. 400, but its expected return is Rs. 4,000. Determine which investment is more risky.

Example

2. A quality control laboratory received samples of electric bulbs from two companies for testing their lives. The results were as follows:

Length of life (in hrs.)    Company A    Company B
15-20                       16           18
20-25                       26           22
25-30                       8            8

(a) Which company's bulbs have the greater length of life?
(b) Which company's bulbs are more uniform with respect to their lives?

The share prices of a company in the Mumbai and Kolkata markets during the last 10 months are recorded below:

Month    Mumbai    Kolkata
Jan      105       108
Feb      120       117
March    115       120
April    118       130
May      130       100
June     127       125
July     109       125
Aug      110       120
Sep      104       110
Oct      112       135

Determine the mean and standard deviation of the share prices. In which market are the share prices more stable?
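The coefficient-of-variation comparisons above can be checked with a short Python sketch. It is only an illustration: the function names `coefficient_of_variation` and `summarize` are hypothetical, and it assumes the population standard deviation (dividing by n) is the one intended for the share-price data.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation in percent: (S / mean) * 100."""
    return sd * 100 / mean


def summarize(prices):
    """Return (mean, S.D., CV in percent) for a list of prices."""
    mean = fmean(prices)
    # Assumption: population S.D. (divide by n), not the sample S.D. (n - 1).
    sd = pstdev(prices)
    return mean, sd, coefficient_of_variation(mean, sd)


# Stock A and Stock B from the slide: same S.D., different means.
cv_stock_a = coefficient_of_variation(50, 5)
cv_stock_b = coefficient_of_variation(100, 5)
print(f"Stock A CV = {cv_stock_a:.1f}%, Stock B CV = {cv_stock_b:.1f}%")

# Share prices, Jan to Oct, copied from the table in the exercise.
mumbai = [105, 120, 115, 118, 130, 127, 109, 110, 104, 112]
kolkata = [108, 117, 120, 130, 100, 125, 125, 120, 110, 135]

mumbai_stats = summarize(mumbai)
kolkata_stats = summarize(kolkata)

for name, (mean, sd, cv) in (("Mumbai", mumbai_stats), ("Kolkata", kolkata_stats)):
    print(f"{name}: mean = {mean:.1f}, S.D. = {sd:.2f}, CV = {cv:.2f}%")
```

The market with the smaller CV is the more stable one. Under the stated assumption, the Mumbai prices have the lower CV (about 7.2% against about 8.5% for Kolkata).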
Shape of a Distribution

• Describes how data are distributed
• Measures of shape: symmetric or skewed

Left-skewed:     Mean < Median < Mode
Symmetric:       Mean = Median = Mode
Right-skewed:    Mode < Median < Mean

Skewness

• For a positively skewed distribution: Mean > Median > Mode
• For a negatively skewed distribution: Mean < Median < Mode

Measure of Skewness

• Karl Pearson coefficient of skewness:

\[ Sk = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}}, \qquad -3 \le Sk \le 3 \]

Calculate the Karl Pearson coefficient of skewness for the given data and comment on the result: 7, 9, 15, 16, 17, 22, 25, 27, 33, 39.

Advantages and disadvantages

Range
• Advantage: Quick and easy to calculate.
• Disadvantages: Affected by extreme values (outliers). Does not take into account all the values.

Standard deviation
• Advantage: More precise measure of dispersion, because all values are taken into account.
• Disadvantage: Much harder to calculate than the range.
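For the Karl Pearson skewness exercise above, every value in the data occurs exactly once, so there is no single mode to plug into the formula. One possible approach, shown here as a sketch rather than the intended solution, is to estimate the mode with the empirical relation Mode = 3 Median - 2 Mean. The function name `pearson_skewness` is hypothetical, and the population standard deviation (dividing by n) is an assumption.

```python
from statistics import fmean, median, pstdev


def pearson_skewness(data):
    """Karl Pearson coefficient of skewness, (Mean - Mode) / S.D.

    Assumption: the mode is estimated as 3 * Median - 2 * Mean,
    which makes the result equal to 3 * (Mean - Median) / S.D.
    """
    mean = fmean(data)
    med = median(data)
    sd = pstdev(data)  # assumption: population S.D. (divide by n)
    mode_estimate = 3 * med - 2 * mean
    return (mean - mode_estimate) / sd


data = [7, 9, 15, 16, 17, 22, 25, 27, 33, 39]
sk = pearson_skewness(data)
print(f"Sk = {sk:.3f}")
```

With these assumptions the coefficient is positive (roughly 0.46), which points to a moderately right-skewed distribution, consistent with Mean > Median for this data.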