THEME: VARIATION ROWS. AVERAGES

TOPIC: Averages and variation rows are very important in everyday medical practice and in scientific work. We use them to assess the quality of work of medical establishments, to estimate physical development (average height, average weight) and to calculate social and demographic indices. Averages are used to find the central tendency of a phenomenon and to draw conclusions about how the phenomenon is dispersed.

SEMINAR GOAL
LEARNING: Students must be able to find the central tendency of a phenomenon, analyze the dispersion of events and use averages in practical work.
EDUCATIONAL: Generalizing data by means of averages is very important in health care: only the examination of general patterns allows us to draw conclusions about a population.

LEARNING OBJECTIVES
Students must know:
• the practical use of descriptive statistics;
• the main rules for compiling variation rows;
• the main average values and the methods of their calculation;
• the practical use of averages.
Students must be able to:
• build and describe a variation row;
• calculate the arithmetic mean;
• find the standard deviation and the coefficient of variation.

GENERAL INFORMATION
The information gathered in a study can take different forms, such as frequency data (for example, the number of votes cast for a candidate in an election) and scale data. These data are often initially arranged in a way that makes them difficult to read and interpret. Descriptive statistics offers procedures that represent the data in a readable and useful form. Some of these procedures give a graphical representation of the data, while others give a set of parameters that summarize important properties of the data. Observations of a variable that share the same qualitative characteristic, each occurring with the same or a different frequency, can be organized (grouped) into a table or into a variation row.
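As an illustration, grouping raw observations into a ranked variation row (each distinct value x with its frequency f, and n = Σf) can be sketched in Python. This is a minimal sketch, not a prescribed procedure: the function name `variation_row` is hypothetical, and the data are simply the first ten fever-period values from Task № 10.

```python
from collections import Counter


def variation_row(observations, descending=False):
    """Group raw observations into a ranked variation row.

    Returns a list of (x, f) pairs ordered by the value x, and n = sum of f.
    """
    counts = Counter(observations)  # f: how many times each value x occurs
    row = sorted(counts.items(), reverse=descending)  # rank by numerical value
    n = sum(f for _, f in row)  # number of observations, n = sum of f
    return row, n


# First ten fever-period values (days) from Task № 10
days = [3, 8, 14, 14, 7, 6, 4, 12, 13, 3]
row, n = variation_row(days)
for x, f in row:
    print(f"x = {x:2d}  f = {f}")
print("n =", n)
```

Passing `descending=True` gives the same row in decreasing order; an unranked row would simply keep the values in the order in which they were recorded.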
A simple variation row is one in which every value of the variable occurs once (frequency f = 1) and the total number of observations is no more than 30.
A grouped variation row is one in which values occur more than once (f > 1) and the total number of observations is more than 30. In other words, every value has its own weight within the variation row.

Every variation row has:
• Variable (x): a measurable characteristic of the data, properly measured.
• Frequency (f): shows how often each value of the variable occurs in the group.
• Number of observations (n): mathematically, n = Σf.

There are the following kinds of variation rows:
• Ranked: the values are arranged in increasing or decreasing order of their numerical value.
• Unranked: the values are grouped but not arranged in increasing or decreasing order.
• Interval: each value of the variable is represented by an interval.
• Non-interval: each value of the variable is represented without an interval (as a single value).
• Discrete: the values are obtained by counting and are represented by whole numbers.
• Continuous: the values are results of measurement and can be fractional numbers.

Range (amplitude) of the row: consider a set of observations of a quantitative variable X. If X_max denotes the highest observation and X_min the lowest, the range is
$$\text{range} = X_{\max} - X_{\min}$$
When observations are grouped into classes, the range is the difference between the centers of the two extreme classes. If S_1 is the center of the first class and S_k the center of the last class, the range is
$$\text{range} = S_k - S_1$$

CENTRAL LIMIT THEOREM
The central limit theorem states that, under repeated sampling from a population, the means of random samples tend to have an approximately normal distribution.
This holds both for population distributions that are normal and for those that are decidedly not normal. The central limit theorem is a fundamental theorem of statistics: it states that the sum of a sufficiently large number of independent, identically distributed random variables (with finite variance) approximately follows a normal distribution.

NORMAL DISTRIBUTION
A simple way of organizing the data is to list all possible values between the lowest and the highest in order, recording the frequency (f) with which each value occurs. This forms a frequency distribution. The normal distribution plays a central role in probability theory and its statistical applications. Many measurements, such as the height or weight of individuals, IQ, etc., approximately follow a normal distribution. The normal distribution is frequently used as an approximation, either when normality is assumed for a distribution in the construction of a model or when a known distribution …

NORMAL DISTRIBUTION CURVE
A Brief History of the Normal Curve
The discovery of the normal curve, also known as the "bell-shaped" curve or the Gaussian curve, can be dated to the 17th century, when Galileo Galilei, an Italian physicist and astronomer, noted that the measurement errors in astronomical observations were systematic and that small errors were more likely to occur than large ones. In 1778, Pierre-Simon Laplace, while working on his famous central limit theorem, noted that the sampling distribution of the sample mean approximated a normal distribution and that the larger the sample size, the closer the distribution was to normal, whatever the population distribution. Also in the 18th century, the French mathematician Abraham de Moivre, who was often asked to do statistical consulting for gamblers, found that as the number of events (e.g., coin flips) increased, the shape of the binomial distribution approached a symmetrical, smooth curve.
However, the mathematical formula for this curve was not discovered until the 19th century, by Robert Adrain in 1808 and Carl Friedrich Gauss in 1809. The German 10-Deutsche-Mark banknote bore Gauss's portrait, together with the bell-shaped normal curve and its formula.

Important Properties of a Normal Curve (normal distribution curve)
This curve has the following characteristics, which are important to know:
• The mode, the mean and the median are all at the same point on the abscissa (the horizontal axis of the curve); that is, mode = mean = median for a normal distribution.
• The curve is symmetrical about the point on the abscissa that denotes the mean, the mode or the median, with equal numbers of observations above and below that point.
• About 95% of the distribution falls within approximately ±2 standard deviations of the mean (more precisely, ±1.96). The remaining 5% is split into two equal parts in the two tails of the distribution, because the normal distribution is symmetrical; therefore only 2.5% of the distribution lies more than 2 standard deviations above the mean, and another 2.5% lies more than 2 standard deviations below it.

The standard deviation is particularly useful for normal distributions, because the proportion of elements (i.e., the proportion of the area under the curve) lying within a given number of standard deviations above or below the mean is constant: approximately 68% of the distribution falls within ±1 standard deviation of the mean, approximately 95% within ±2 standard deviations and approximately 99.7% within ±3 standard deviations. Because these proportions hold for every normal distribution, they should be memorized.

Averages
Measure of central tendency
A measure of central tendency is a statistic that summarizes a set of data relative to a quantitative variable.
More precisely, it determines a fixed value, called the central value, around which the data tend to group. We use averages to measure central tendency: an entire distribution can be characterized by one typical value that represents all the observations, the measure of central tendency. The principal measures of central tendency are:
• the arithmetic mean (appropriate when the values form an arithmetic progression);
• the median;
• the mode;
• the geometric mean (appropriate when the values form a geometric progression).

Arithmetic mean (mean, µ or X̄)
The arithmetic mean characterizes the center of the frequency distribution of a quantitative variable by giving every observation the same weight (in contrast to the weighted arithmetic mean). It is calculated by summing the observations and dividing by the number of observations:
$$\bar{X} = \frac{\sum x}{n}$$
where X̄ is the simple arithmetic mean, x is each observation and n is the number of observations.

The simple arithmetic mean is used when the frequency of every value is 1, in other words when all observations have the same importance.
The weighted arithmetic mean is used when the frequencies of the values are greater than 1, in other words when the observations do not all have the same importance. We must assign a weight to each observation according to its importance relative to the other observations. The weighted arithmetic mean equals the sum of the observations multiplied by their weights, divided by the sum of the weights:
$$\bar{X} = \frac{\sum x f}{n}, \qquad n = \sum f$$
where X̄ is the weighted arithmetic mean, x is each value, f is its frequency (weight, or relative importance) and n is the total number of observations.

The main properties of the arithmetic mean are:
• It depends on the values of all observations.
• It is simple to interpret.
• It is the most familiar and most widely used measure.
• It is frequently used as an estimator of the population mean.
• Its value can be distorted by outliers.
• The sum of squared deviations of the observations x_i from a value α is minimal when α equals the arithmetic mean:
$$\sum_{i=1}^{n}(x_i - \alpha)^2 \ \text{is minimal when}\ \alpha = \bar{X}$$

Median (Md or Xe)
The median is a measure of central tendency defined as the value in the center of a set of observations arranged in increasing or decreasing order, so that 50% of the observations lie on each side of it. When n is even, the median is taken as the mean of the two central values.
The median:
• is easy to determine, because only an ordering of the data is needed;
• is easy to understand but less used than the arithmetic mean;
• is not influenced by outliers, which gives it an advantage over the arithmetic mean when the series really does contain outliers;
• is used as an estimator of the central value of a distribution, especially when it is asymmetric or has outliers;
• minimizes the sum of absolute deviations:
$$\sum_{i=1}^{n}\lvert x_i - \alpha \rvert \ \text{is minimal when}\ \alpha = \text{median}$$

Mode (X0 or Mo)
The mode is a measure of central tendency. The mode of a set of observations is the value with the highest frequency. A distribution can have a single mode (a unimodal distribution) or, in some situations, several modes (a bimodal, trimodal or multimodal distribution).
The mode:
• has practical interest because it is the most frequently represented value of the set;
• is in any case a rarely used measure;
• is little influenced by outliers;
• is strongly influenced by sampling fluctuations and can vary considerably from one sample to another;
• may not be unique: a data set can have many modes, or none.

Geometric mean
The geometric mean is defined as the nth root of the product of n non-negative numbers.
$$\bar{X}_{\text{geom}} = \sqrt[n]{x_1 x_2 \cdots x_n}$$
or, for grouped observations with frequencies f_i (n = Σf),
$$\bar{X}_{\text{geom}} = \sqrt[n]{x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}}$$
Note that the logarithm of the geometric mean of a set of positive numbers is the arithmetic mean of the logarithms of these numbers (or the weighted arithmetic mean in the case of grouped observations):
$$\log \bar{X}_{\text{geom}} = \frac{1}{n}\sum_{i=1}^{n}\log x_i \quad\text{or}\quad \log \bar{X}_{\text{geom}} = \frac{1}{n}\sum_{i=1}^{k} f_i \log x_i$$

Dispersion
A measure of dispersion describes a set of data on a particular variable by indicating how variable the values in the data set are. It completes the description given by the measure of central tendency of a distribution. Comparing different distributions, we see that in some of them all the data lie a short distance from the central value, while in others the distance is much greater. The criteria of dispersion are:
• the standard deviation;
• the coefficient of variation.

Standard Deviation (σ or S)
The standard deviation is a measure of dispersion. It is the positive square root of the variance, where the variance is the mean of the squared deviations of the observations from the mean of the set of observations. It is usually denoted by σ when it refers to a population and by S when it refers to a sample. In practice, the standard deviation σ of a population is estimated by the standard deviation S of a sample from that population.

For the simple variation row:
$$\sigma = \sqrt{\frac{\sum d^2}{n-1}}$$
For the grouped variation row:
$$\sigma = \sqrt{\frac{\sum d^2 f}{n-1}}$$
where d = x - X̄ is the difference between each value and the arithmetic mean (d_i = x_i - X̄). The denominator n - 1 is more appropriate for small samples, whereas n is preferable for large samples.

Replacing d by (x_i - X̄) gives equivalent working formulas.
For the simple variation row:
$$\sigma = \sqrt{\frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right)}$$
For the grouped variation row:
$$\sigma = \sqrt{\frac{\sum x^2 f - n\bar{X}^2}{n-1}}$$

Coefficient of variation
The coefficient of variation is a measure of relative dispersion. It expresses the standard deviation as a percentage of the arithmetic mean.
This coefficient can be used to compare the dispersions of quantitative variables that are not expressed in the same units (for example, salaries in different countries given in different currencies), or of variables with very different means. The coefficient of variation (CV) is defined as the ratio of the standard deviation to the arithmetic mean of a set of observations; it is independent of the unit of measurement used for the variable:
$$CV = \frac{S}{\bar{X}} \times 100\% \quad\text{or}\quad CV = \frac{\sigma}{\mu} \times 100\%$$
where S is the standard deviation of the sample, σ the standard deviation of the population, X̄ the arithmetic mean of the sample and µ the mean of the population.

A CV of n% means that the standard deviation equals n% of the arithmetic mean; the lower the CV, the more homogeneous the distribution.
• CV < 10%: low level of dispersion
• CV = 10-20%: medium level of dispersion
• CV > 20%: high level of dispersion

PRACTICAL SKILLS
THEORETICAL QUESTIONS
1. Describe the notion of a "variation row" and name its components.
2. Describe the types of variation rows. What is the difference between them?
3. Methods of calculating the arithmetic mean and the weighted arithmetic mean.
4. Name the main features of the mean.
5. Standard deviation: notion and practical use.
6. Calculating the standard deviation for the mean and for the weighted mean.
7. Coefficient of variation: notion and practical use.
8. What are the median and the middle (central) value? How do we find them?
9. Dispersion: explain the term and how we use it in practice.
10. How do we find the central value of an interval in an interval variation row?
11. Normal distribution: explain it.

SITUATION MODEL TASKS
TASK № 1
Describe the variation row; calculate the mode, median and mean; find the standard deviation; draw conclusions.
Population of the districts served by clinic № 1: 1240, 1350, 1210, 1305, 1116.
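The formulas for a simple variation row can be checked with a short Python sketch, shown here on the Task № 1 data. It assumes the n - 1 denominator for the standard deviation and the CV thresholds listed above; the function names are hypothetical, while `statistics.median` and `statistics.multimode` come from the Python standard library (Python 3.8 or later).

```python
import math
import statistics


def describe_simple_row(xs):
    """Mode, median, mean, standard deviation and CV of a simple variation row."""
    n = len(xs)
    mean = sum(xs) / n  # X̄ = Σx / n
    sum_d2 = sum((x - mean) ** 2 for x in xs)  # Σd², where d = x - X̄
    sd = math.sqrt(sum_d2 / (n - 1))  # σ = √(Σd² / (n - 1))
    modes = statistics.multimode(xs)  # every value with the highest frequency
    # If several distinct values all share the highest frequency, none stands out: no mode
    no_mode = len(set(xs)) > 1 and len(modes) == len(set(xs))
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(xs),  # mean of the two middle values if n is even
        "mode": None if no_mode else modes,
        "sd": sd,
        "cv": sd / mean * 100,  # CV = σ / X̄ × 100 %
    }


def dispersion_level(cv):
    """Classify CV (%) with the thresholds used in this handout."""
    if cv < 10:
        return "low"
    if cv <= 20:
        return "medium"
    return "high"


task1 = [1240, 1350, 1210, 1305, 1116]
result = describe_simple_row(task1)
print(result)
print("dispersion:", dispersion_level(result["cv"]))
```

For this row the mean is 1244.2, the median is 1240 and no value repeats, so there is no mode; the CV is about 7.2 %, i.e. a low level of dispersion.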
TASK № 2
Describe the variation row; calculate the mode, median and mean; find the standard deviation; draw conclusions.
Body weight of 35 newborn boys (kg): 4.0, 3.2, 3.7, 4.5, 4.4, 3.0, 4.3, 3.3, 3.2, 4.1, 3.8, 3.8, 4.2, 4.1, 4.0, 3.3, 3.1, 2.5, 2.8, 3.2, 4.2, 3.5, 2.9, 3.5, 3.2, 3.1, 5.0, 2.7, 3.1, 3.3, 3.2, 3.0, 3.0, 3.2, 3.8.

TASK № 3
Describe the variation row; calculate the mode, median and mean; find the standard deviation; draw conclusions.
Height of 7-year-old schoolboys in school № K:
Height (cm)    Number of boys
114-115.9      4
116-117.9      7
118-119.9      9
120-121.9      12
122-123.9      16
124-125.9      14
126-127.9      8
128-129.9      6
130-131.9      3

TASK № 4
Describe the variation row; calculate the mode, median and mean; find the standard deviation; draw conclusions.
Number of calls per day to the emergency medical service of city N, one value for each of the 12 months of a calendar year: 165, 161, 167, 164, 163, 142, 143, 137, 156, 151, 147, 149.

TASK № 5
Describe the variation row; calculate the mode, median and mean; find the standard deviation; draw conclusions.
Body weight of seven-year-old first-grade girls in school № 5 m.:
Body weight (kg)    Number of girls
20.0-21.9           6
22.0-23.9           8
24.0-25.9           12
26.0-27.9           16
28.0-29.9           9
30.0-31.9           5
32.0-33.9           4

TASK № 6
Describe the variation row; calculate the mode, median and mean; find the standard deviation; draw conclusions.
Patients with fractures of the lower jaw by duration of treatment in the first-aid dental hospital L.:
Duration of treatment (days)    Number of patients
38-40                           3
41-43                           6
44-46                           10
47-49                           12
50-52                           11
53-55                           6
56-58                           2

TASK № 7
Describe the variation row; calculate the mode, median and mean; find the standard deviation; draw conclusions.
Number of children over one year of age under the care of each of 34 pediatricians: 58, 62, 60, 55, 62, 63, 65, 48, 49, 52, 54, 42, 51, 59, 57, 47, 48, 48, 40, 45, 51, 60, 39, 50, 58, 40, 51, 42, 54, 49, 47, 38, 45, 45.
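For an interval (grouped) row such as Task № 3, a common approach is to represent each class by its midpoint x and apply the weighted formulas. The sketch below assumes that a class such as 114-115.9 cm is treated as [114, 116), so its midpoint is 115; the function name is hypothetical. It also checks that the deviation formula Σd²f and the working formula Σx²f - nX̄² give the same standard deviation.

```python
import math


def describe_interval_row(classes):
    """classes: list of (lower, upper, f); each class is represented by its midpoint."""
    xs = [(lower + upper) / 2 for lower, upper, _ in classes]  # class midpoints x
    fs = [f for _, _, f in classes]
    n = sum(fs)  # n = Σf
    mean = sum(x * f for x, f in zip(xs, fs)) / n  # weighted mean X̄ = Σxf / n
    sum_d2f = sum((x - mean) ** 2 * f for x, f in zip(xs, fs))  # Σd²f
    sd = math.sqrt(sum_d2f / (n - 1))  # σ = √(Σd²f / (n - 1))
    # Working formula σ = √((Σx²f - nX̄²) / (n - 1)); should agree with sd
    sum_x2f = sum(x * x * f for x, f in zip(xs, fs))
    sd_check = math.sqrt((sum_x2f - n * mean**2) / (n - 1))
    modal_class = max(classes, key=lambda c: c[2])  # class with the highest frequency
    return n, mean, sd, sd_check, sd / mean * 100, modal_class


# Task № 3: heights of 7-year-old boys, classes assumed to be [lower, lower + 2)
task3 = [(114, 116, 4), (116, 118, 7), (118, 120, 9), (120, 122, 12), (122, 124, 16),
         (124, 126, 14), (126, 128, 8), (128, 130, 6), (130, 132, 3)]
n, mean, sd, sd_check, cv, modal_class = describe_interval_row(task3)
print(f"n = {n}, mean = {mean:.2f} cm, sd = {sd:.2f} cm, CV = {cv:.1f} %")
print("modal class:", modal_class)
```

With these assumptions n = 79, the modal class is 122-123.9 cm and the CV is about 3.3 %, i.e. a low level of dispersion. A different midpoint convention (for example (114 + 115.9) / 2) would shift the mean slightly but leave the method unchanged.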
TASK № 8
Describe the variation row; calculate the mode, median and mean; find the standard deviation; draw conclusions.
Duration of treatment (days) of patients with pneumonia in the pulmonology department of hospital № 2 m.: 25, 11, 12, 13, 24, 23, 23, 24, 21, 22, 21, 14, 14, 22, 20, 20, 15, 15, 16, 20, 20, 16, 16, 20, 17, 17, 19, 19, 19, 18, 18, 18, 18, 19, 19, 17, 17, 18, 18, 19, 26.

TASK № 9
Describe the variation row; calculate the mode, median and mean; find the standard deviation; draw conclusions.
Patients in the burn department by length of stay:
Length of treatment (days)    Number of patients
1-5                           1297
6-10                          718
11-15                         884
16-20                         658
21-25                         297
26-30                         200
31-35                         118
36-40                         54
41-45                         40

TASK № 10
Describe the variation row; calculate the mode, median and mean; find the standard deviation; draw conclusions.
Fever period (number of days with high temperature) in 32 patients with pneumonia: 3, 8, 14, 14, 7, 6, 4, 12, 13, 3, 4, 5, 10, 11, 5, 10, 10, 11, 12, 8, 9, 7, 7, 8, 9, 9, 7, 8, 12, 6, 10, 9.

TASK № 11
Describe the variation row; calculate the mode, median and mean; find the standard deviation; draw conclusions.
Patients with strokes (paralysis of the whole body) by duration of treatment in the neurological department of hospital L.:
Duration of treatment (days): 38-40, 41-43, 44-46, 47-49, 50-52, 53-55, 56-58
Number of patients: 5 8 12 18 15 1 7 3

RECOMMENDED LITERATURE
1. Voronenko Y.V., Ruden V.V. Social Medicine and Health Care Organization: Recommendations for the Practical Lessons. Lviv: Danylo Halytsky Lviv National Medical University, 2004.
2. Everitt B.S. (Institute of Psychiatry, King's College, University of London). Medical Statistics from A to Z: A Guide for Clinicians and Medical Students. Cambridge, 2006.
3. Bowers (Honorary Lecturer, School of Medicine, University of Leeds, UK). Medical Statistics from Scratch: An Introduction for Health Professionals. John Wiley & Sons, Ltd, 2008.
4. Salkind N.J., Rasmussen K. (University of Kansas). Encyclopedia of Measurement and Statistics. SAGE Publications, Inc, 2007.
5. Dodge Y. (Honorary Professor, University of Neuchâtel, Switzerland). The Concise Encyclopedia of Statistics. Springer Science+Business Media, 2008.

GRAPHICAL STRUCTURE OF THE THEME OF THE PRACTICAL LESSON

VARIATION ROW: a row of values (x) of a variable; observations with the same qualitative characteristic and the same or different frequencies are grouped into the row.

TYPES OF VARIATION ROW
• Simple: every value occurs once (f = 1) and the total number of observations is no more than 30.
• Grouped: values occur more than once (f > 1) and the total number of observations is more than 30.

CHARACTERISTICS OF A VARIATION ROW
• Variable (x): a separate element (value); a measurable characteristic of the data, properly measured.
• Frequency (f): shows how often each value occurs in the group.
• Number of observations (n): the sum of frequencies, n = Σf.

KINDS OF VARIATION ROWS
• Ranked: values are arranged in increasing or decreasing order of their numerical value.
• Unranked: values are grouped but not arranged in increasing or decreasing order.
• Interval: each value is represented by an interval.
• Non-interval: each value is represented without an interval.
• Discrete (discontinuous): values are obtained by counting and are represented by whole numbers.
• Continuous: values are results of measurement and can be fractional numbers.

STAGES OF COMPILING A GROUPED VARIATION ROW
1. Estimate the number of groups.
2. Estimate the interval.
3. Find the group limits and the midpoints.
4. Classify the observations into groups.

PRACTICAL USE OF VARIATION ROWS
• To characterize a distribution.
• To find averages (average values).

AVERAGES: we use averages to measure the central tendency of a phenomenon. A measure of central tendency is a statistic that summarizes a set of data relative to a quantitative variable; more precisely, it determines a fixed value, called the central value, around which the data tend to group. The measures of central tendency (types of averages) are:
• Mode (X0): the value with the highest frequency.
• Median (Xe): the value in the center of a set of observations arranged in increasing or decreasing order.
• Arithmetic mean (X̄): characterizes the center of the frequency distribution of a quantitative variable.
• Geometric mean (X̄geom): the nth root of the product of n non-negative numbers.

TYPES OF ARITHMETIC MEAN
• Simple arithmetic mean: used when the frequency of every value is 1, i.e., all observations have the same importance:
$$\bar{X} = \frac{\sum x}{n}$$
• Weighted arithmetic mean: used when the frequencies are greater than 1, i.e., the observations do not all have the same importance. Each observation is assigned a weight according to its importance relative to the other observations, and the mean equals the sum of the observations multiplied by their weights, divided by the sum of the weights:
$$\bar{X} = \frac{\sum x f}{n}$$

THE MAIN PROPERTIES OF THE ARITHMETIC MEAN
• It depends on the values of all observations.
• It is simple to interpret.
• It is the most familiar and most widely used measure.
• It is frequently used as an estimator of the population mean.
• Its value can be distorted by outliers.
• The sum of squared deviations of the observations x_i from a value α is minimal when α equals the arithmetic mean.

PRACTICAL USE OF THE ARITHMETIC MEAN
• To find a general characteristic of a phenomenon.
• To compare an individual characteristic with the average and find the deviation (e.g., to compare a child's weight with the age standards).
Examples: average pulse rate, average blood pressure, average newborn weight, average bed occupancy, average duration of treatment.

DISPERSION CRITERIA: a measure of dispersion describes a set of data on a particular variable by indicating how variable the values in the data set are. It completes the description given by the measure of central tendency. In some distributions all the data lie a short distance from the central value; in others the distance is much greater.

STANDARD DEVIATION (σ or S)
For a simple variation row:
$$\sigma = \sqrt{\frac{\sum d^2}{n-1}}$$
For a grouped variation row:
$$\sigma = \sqrt{\frac{\sum d^2 f}{n-1}}$$
where n is the number of observations (if n > 30, n - 1 can be replaced by n), f is the frequency of each value, d = x - X̄ is the difference between each value and the arithmetic mean, and x is the value of the variable. σ is the positive square root of the variance (the mean of the squared deviations of the observations from their mean) and characterizes dispersion: the greater the dispersion, the greater σ.

COEFFICIENT OF VARIATION (CV or C): a measure of relative dispersion that expresses the standard deviation as a percentage of the arithmetic mean. It can be used to compare the dispersions of quantitative variables that are not expressed in the same units (for example, salaries in different countries given in different currencies), or of variables with very different means. It is defined as the ratio of the standard deviation to the arithmetic mean and is independent of the unit of measurement:
$$CV = \frac{\sigma}{\bar{X}} \times 100\%$$
• CV < 10%: low level of dispersion
• CV = 10-20%: medium level of dispersion
• CV > 20%: high level of dispersion
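The statement that the coefficient of variation does not depend on the unit of measurement can be checked numerically. The sketch below is a minimal illustration using the Python standard library's `statistics.stdev` (n - 1 denominator) and `statistics.mean`; it takes the Task № 2 newborn weights in kilograms and the same weights converted to grams. The standard deviation grows 1000-fold, while the CV stays the same. The helper name `cv_percent` is hypothetical.

```python
import math
import statistics


def cv_percent(xs):
    """Coefficient of variation in %, CV = S / X̄ × 100 %, with the sample SD (n - 1)."""
    return statistics.stdev(xs) / statistics.mean(xs) * 100


# Task № 2: body weight of 35 newborn boys, kg
weights_kg = [4.0, 3.2, 3.7, 4.5, 4.4, 3.0, 4.3, 3.3, 3.2, 4.1,
              3.8, 3.8, 4.2, 4.1, 4.0, 3.3, 3.1, 2.5, 2.8, 3.2,
              4.2, 3.5, 2.9, 3.5, 3.2, 3.1, 5.0, 2.7, 3.1, 3.3,
              3.2, 3.0, 3.0, 3.2, 3.8]
weights_g = [w * 1000 for w in weights_kg]  # the same data expressed in grams

print("SD, kg:", statistics.stdev(weights_kg), " SD, g:", statistics.stdev(weights_g))
print("CV, kg:", cv_percent(weights_kg), " CV, g:", cv_percent(weights_g))
print("same CV:", math.isclose(cv_percent(weights_kg), cv_percent(weights_g)))
```

The same reasoning is what allows the CV to compare, for example, the dispersion of birth weight (kg) with that of the fever period (days).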