Class 2
• Submit Homework
• Questions
• Quiz
• Introduction to Web CT
• Descriptive Statistics
• Introduction to Excel

Web CT
• Address is http://webct.liu.edu (no www)
• Or you can click on it from my web pages

Logging Into Web CT
• Your I.D. is first name.last name
• Use all lower case & no spaces
• Password is your soc. sec. #, no spaces

Descriptive Statistics
• Organize
• Summarize
• Clarify
• Present Data

Measures of Central Tendency

Mean
Group of friends ages: 29, 27, 22, 31, 32, 27
Number of individuals = n = 6
Each age = X, i.e. X1, X2, X3, X4…
General term is Xi
Sum of the ages = Σ Xi
Mean = Σ Xi / n

Mean
Add the ages = 168
Divide by 6
Mean (Average) = 28
Is this a good representation of the ages?
29, 27, 22, 31, 32, 27

Second Example of Mean
One friend is replaced by a great-grandpa
Ages: 29, 27, 22, 31, 32, 90
Mean = 231/6 = 38.5

Hey!!!
The answer is 38.5
Older than 5 of the 6 people there!
Doesn’t look so representative of the data to me.
What d’ya think?

Property of the Mean
The mean gives a lot of weight to extreme values.

The Median
Middle Value of an Array of Values
An array????
The values in ascending or descending order.

How do we get the Median?
We have to pick one of the values.
Which one?
Pick it by its location in the array.
Formula for the location:
Location = (n+1)/2

Same Two Groups
29, 27, 22, 31, 32, 27
29, 27, 22, 31, 32, 90
There are 6 values
Location = (6+1)/2 = 7/2 = 3.5
Halfway between the 3rd & 4th values

The Median Value
Halfway between the 3rd & 4th value
29, 27, 22, 31, 32, 27
29, 27, 22, 31, 32, 90
OOOPS! What 3rd and 4th value???
Do the arrays first
22, 27, 27, 29, 31, 32   Median = 28
22, 27, 29, 31, 32, 90   Median = 30

Commentary
The medians are 28 and 30.
Do these represent the data any better than the mean did?
Depends on your opinion and on what you want to convey and the circumstances, etc.

More about the location
• Another example: 7, 9, 22, 27, 33, 34, 80
• How many values?
• 7
• What is (n+1)/2?
• 4
• The median is the 4th value.
• It's easier to find when there are an odd number of values.
• What is the median?
• 27

Property of the Median
Completely ignores the extremes
Two more examples
24, 27, 29, 30, 34, 36, 39
2, 8, 10, 30, 40, 66, 90
Median 1 = 30
Median 2 = 30

Mode
Number that occurs most frequently
29, 27, 22, 31, 32, 27
What’s the mode? 27
29, 26, 22, 31, 32, 27
Mode? There is none.

Is the Mode Useful?
• Sometimes there is no mode
• 24, 24, 27, 29, 30, 30, 80
• The mode?
• There are 2 modes (24 and 30).
• So the usefulness of the mode is limited

Is the Mode Representative of the Data?
29, 27, 22, 31, 32, 27
What’s the mode? 27
29, 27, 22, 32, 80, 80
What’s the mode? 80

Advantage of the Mode
• Can be used for qualitative data
• Family picnic group: 3 children, 1 single adult, 1 widow, 2 married
  What’s the mode?
• Children
• Patients’ diseases in Dr. Jones’ practice: 10 diabetes, 14 coronary artery disease, 3 asthma
  What’s the mode?
• Coronary artery disease

What you should know about Measures of Central Tendency
• How to determine each of them.
• Properties, advantages, disadvantages of each.

How Good is our Measure?
• Whenever we summarize data, for example showing central tendency, we always show an approximation.
• We are always underestimating or overestimating or even just throwing away some data.
• One method of judging our estimate of central tendency is to look at how closely the individual values are clustered around it.

Measures of Dispersion
• Indicate whether the values are compressed or widely spread.
• 3 patients, ages 23, 24, 27   Mean = 24.67
• 3 patients, ages 1, 33, 40   Mean = 24.67

Measures of Dispersion
RANGE
Highest Value minus Lowest Value
Patients’ ages: 23, 24, 27.   X̄ = 24.67
Patients’ ages: 1, 33, 40.   X̄ = 24.67
Ranges?
27 – 23 = 4
40 – 1 = 39

Properties of the Range
Easy to use: a quick indication.
Immediately showed the difference in our two sets of patients.
BUT…
• Ignores most of the values
• Uses only the two extremes

Weakness of the Range
7, 20, 33, 48, 60, 70, 80
What’s the Range? 80 – 7 = 73
But look, a very different group
7, 25, 25, 26, 28, 29, 80
Range? 80 – 7 = 73

Interquartile Range
• IQR = Quartile 3 – Quartile 1
• Find quartiles
• Data has to be in an array
• Divide data into 4 equal parts
• Dividing value between the top ¼ and the bottom ¾ of the values = Q3 = 75th percentile
• Between the top ¾ and the bottom ¼ = Q1 = 25th percentile

Finding the Quartiles
• Have to find locations for Q3 and Q1
• For the Q3 location, find n × 3/4
• If it’s not an integer, take the next higher whole number.
• Find it in the array. That is Q3
• For the Q1 location, find n/4. If it’s not an integer, take the next higher whole number.
• Find it in the array. That is Q1

Interquartile Range Example
• 29, 27, 22, 31, 32, 27, 31, 43
• What first?
• Array: 22, 27, 27, 29, 31, 31, 32, 43
• n = 8, so the Q1 location is 8/4 = 2 and the Q3 location is 8 × 3/4 = 6
• Q1 = 27   Q3 = 31
• IQR = 31 – 27 = 4

Spread around the Mean
• Suppose we look at how far each individual value is from the mean
• And take a sum of Xi – X̄
• Would give a good idea of the spread
• Could even make it into an average spread by dividing by n

Variance
The Spread Around the Mean
7, 20, 30, 33, 50, 60, 80
The mean is 40

Variance
Difference between each value and the mean
Squared (otherwise the positive and negative differences cancel out)
Totaled
Divided by n – 1

Calculate the Variance
7, 20, 30, 33, 50, 60, 80
Mean is 40
Differences from the mean:
(7 – 40), (20 – 40), (30 – 40), (33 – 40), (50 – 40), (60 – 40), (80 – 40)

Finish Calculating the Variance
• Square the differences
• Take the sum of the squares
• Divide by n – 1.
• Why not divide by n? Discuss later.
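
For readers who want to check the hand calculations, the quartile location rule and the variance steps above can be sketched in Python. This is only an illustration: the function name `quartile` and the variable names are made up here, and the "next higher whole number" rule from the slides is one of several quartile conventions, so other software may report slightly different quartiles.

```python
import math

def quartile(array, k):
    """Return quartile k (1 or 3) using the slide rule: location = k*n/4,
    rounded up to the next whole number when it is not an integer."""
    n = len(array)
    location = math.ceil(k * n / 4)   # 1-based position in the array
    return array[location - 1]        # Python lists are 0-based

# Interquartile range example: the data must be put in an array (sorted) first.
ages = sorted([29, 27, 22, 31, 32, 27, 31, 43])
q1 = quartile(ages, 1)
q3 = quartile(ages, 3)
iqr = q3 - q1

# Variance example: difference from the mean, squared, totaled, divided by n - 1.
values = [7, 20, 30, 33, 50, 60, 80]
n = len(values)
mean = sum(values) / n
squared_differences = [(x - mean) ** 2 for x in values]
variance = sum(squared_differences) / (n - 1)

print(q1, q3, iqr)       # 27 31 4
print(mean, variance)    # 40.0 623.0
```

The squared differences total 3738, and dividing by n – 1 = 6 gives a variance of 623.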
n – 1 = degrees of freedom

A Change of Pace
We have more to do on Descriptive Statistics, but let’s do some calculations using Excel

Excel & Descriptive Stats
Turn on Computers
• Go to Start (lower left)
• All Programs
• Office
• Excel

Excel
• Use as a calculator
• Use the statistical package (requires an “add-in”)

Getting the Statistical Pkge
• Open Excel, click Tools, look for “Data Analysis”
• If it’s not listed, click Add-Ins (if you don't see Add-Ins, click on the arrows at the bottom of the Tools menu)
• Click the box next to Data Analysis and then click OK
• (You may be told that you have to insert the CD with the Excel program on it. If so, insert it.)
• You should notice the computer working to install the add-in
• Click on Tools again and you should see Data Analysis in the list.

Using Excel as a Calculator
1. List the ages in a column
2. Click on the cell after the last age
3. Click Σ on the toolbar
4. Write n in the next column & 6 under it.
5. Write Mean in another column
6. In the cell under Mean, type =, click on the cell containing the total, type /, click on the cell containing 6, Enter.

To do an Array
So-o-o Easy
• Type ages in a column
• Copy and paste into the next column
• Highlight the values in the copy
• On the toolbar, click the A→Z sort button

To locate the median value
• Remember the formula for location is Location = (n+1)/2
• In Excel, start all formulas with the equal sign
• Type in “Location”. In the next cell, type =(6+1)/2, Enter

To Calculate the Variance
• Put ages in a column & get the sum
• Make a labeled cell for n
• Make a labeled cell for n – 1
• Put the label “Mean” & under it put =, click on the cell with the sum, then /, click on the cell with the value for n, Enter. (The n – 1 cell is the divisor for the variance itself.)

Very Important Hint for Excel
• I will be putting models for calculation on the web site.
• You will see the values, not the formulas
• BUT, you can change to view the formulas
• Hold down Control and hit the accent key (upper left, above Tab and under Esc).
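
For anyone who wants to cross-check the spreadsheet results outside Excel, the calculator steps above (total, n, mean, array, median location) can be sketched in Python. This is just an illustration using the friends' ages from earlier in the class; the variable names are made up here and are not part of the Excel models.

```python
ages = [29, 27, 22, 31, 32, 27]

total = sum(ages)        # same job as the Σ button
n = len(ages)            # number of individuals
mean = total / n         # total divided by n

array = sorted(ages)     # ascending order, like the sort button
location = (n + 1) / 2   # 3.5 means halfway between the 3rd and 4th values

if location.is_integer():
    # Odd number of values: the median is the value at that location.
    median = array[int(location) - 1]
else:
    # Even number of values: average the two values around the location.
    lower = array[int(location) - 1]
    upper = array[int(location)]
    median = (lower + upper) / 2

print(total, mean, location, median)   # 168 28.0 3.5 28.0
```

Swapping 27 for 90 in the list reproduces the great-grandpa example: the mean jumps to 38.5 while the median only moves to 30.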
Copying Formulas
• To copy to the next cells, highlight the cell and drag the little box in the lower right of the cell.
• Remember that the cell references used in the formula will be adjusted as you copy
• To keep the formula pointing at the same cell instead of an adjusted location, put a $ before the column letter, before the row number, or before each (for example, $B$8).

Formatting Excel
• To make a column wide enough for the label or values, double-click on the right border
• To highlight cells adjacent to each other, click on one, hold down the Shift key and move to the other cells.
• To underline a cell, not just the letters in it, use the little box on the toolbar
• To merge several cells and center the label, use the little box with 2 arrows

Excel Doing All the Work
• You now know the summation sign on the toolbar. Click on the little arrow next to it. We will find lots of “functions” there.
• Find the Mean. Click on an empty cell, probably the one just under the mean you already calculated. Click on the functions arrow.
• Click Average & highlight the column of ages, Enter.
• You should have the same answer.

Variance
• Click the cell under the variance you calculated
• Click Autofunction (the little arrow by the summation sign)
• Click More Functions & in “Select a category”, choose Statistical
• Go down to VAR
• Next to the Number 1 box, click on the little graph, highlight the ages data, Enter, Enter.
• Should have the same answer.

Standard Deviation
• May be the most widely used measure of dispersion
• Related to the Variance
• It’s just the square root of the variance.

Root-Mean-Square
• Descriptive name for the St Dev
• $s = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n - 1}}$
• Go back to Excel and get the standard deviation two ways: by taking the square root of the variance, and by letting Excel do it directly from the data

To Compare s from 2 Samples
• Use the Coefficient of Variation
• s/X̄ times 100. Answer is a percentage.
• Why use CV?
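
As a cross-check outside Excel, the two routes to the standard deviation described above, and the coefficient of variation formula, can be sketched with Python's standard `statistics` module. The data set is the one used for the variance example; the helper name `coefficient_of_variation` is made up for this illustration.

```python
import math
import statistics

values = [7, 20, 30, 33, 50, 60, 80]

# Route 1: take the square root of the variance.
variance = statistics.variance(values)    # sample variance, divides by n - 1
sd_from_variance = math.sqrt(variance)

# Route 2: let the library compute it directly from the data.
sd_direct = statistics.stdev(values)      # sample standard deviation

def coefficient_of_variation(s, mean):
    """Return s divided by the mean, times 100 (a percentage)."""
    return s / mean * 100

cv = coefficient_of_variation(sd_direct, statistics.mean(values))

print(variance, round(sd_from_variance, 2), round(sd_direct, 2))
print(round(cv, 1))
```

Both routes should agree at about 24.96, since the variance of this data set is 623. For a sample with s = 10 and a mean of 80, the same helper gives 12.5%.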
CV
• Removes differences due to units of measurement
• We might want to know whether serum cholesterol levels, measured in mg/100 ml, are more variable than body weight, measured in pounds

Another Advantage of CV
• Variance or st. dev., if applied to 2 groups with very different means, may give a misleading idea of variability
• Weights of 11-year-old boys compared to weights of 25-year-olds
  11’s: X̄ = 80, s = 10
  25’s: X̄ = 145, s = 10
  Are they equally variable?
• CV’s = 12.5% & 6.9%
• Young boys’ weights are more variable

End of Ungrouped Descriptives
• Continue these topics using Grouped Data

Grouped Data
We do this all the time
Change in my pocket
3 dimes, 2 nickels, 4 quarters, 6 pennies
(3 × 10) + (2 × 5) + (4 × 25) + (6 × 1)
Sum = $1.46

Group Mean or Weighted Mean
Average Age of Dr. Jones’ Patients
12, 14, 15, 14, 8, 7, 8, 9, 14, 12
n = 10
∑ = (2 × 12) + (3 × 14) + (2 × 8) + 7 + 9 + 15
X̄ = (24 + 42 + 16 + 31)/10 = 113/10 = 11.3

Frequency Distribution
• Group values within intervals
• All intervals of equal size
• Intervals not overlapping

Midpoint of Intervals
• Use the midpoint as though it were the actual value for the individuals in the interval
• Calculate midpoint: (LL + UL)/2

The End for now . . .
• Let’s just try a frequency table on Excel. On Course Outline, go to Excel & Descriptive Stats
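
The grouped-data arithmetic above (pocket change, the weighted mean of the patients' ages, and the interval midpoint) can also be sketched in Python. This is an illustration only: the variable names are made up here, and the interval limits 20 and 29 passed to `midpoint` are hypothetical, since the slides do not give a specific interval.

```python
from collections import Counter

# Pocket change: coin value in cents -> number of coins.
coins = {10: 3, 5: 2, 25: 4, 1: 6}
total_cents = sum(value * count for value, count in coins.items())

# Weighted (group) mean: count how often each age occurs, then weight by it.
ages = [12, 14, 15, 14, 8, 7, 8, 9, 14, 12]
frequencies = Counter(ages)                 # age -> frequency
n = sum(frequencies.values())
weighted_sum = sum(age * count for age, count in frequencies.items())
group_mean = weighted_sum / n

def midpoint(lower_limit, upper_limit):
    """Midpoint of an interval: (LL + UL) / 2."""
    return (lower_limit + upper_limit) / 2

print(total_cents)                 # 146, i.e. $1.46
print(n, weighted_sum, group_mean) # 10 113 11.3
print(midpoint(20, 29))            # 24.5 for the hypothetical interval 20 to 29
```

With real grouped data, each interval's midpoint would stand in for the individual values, weighted by the interval's frequency in the same way.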