Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Class Schedule • • • • Class next week Holiday the next week (Monday schedule) Exam the next week So only one more class before the first exam • Will cover Chapters 6, 3, 2. Only material included in class as shown in Course Outline. Class 3 • • • • • Return homewk from first class Get soc sec # from 2 students Review Assignment 2 Collect Assignment 2 Quiz • Continue Ungrouped Data • Start Grouped Data Very Important Hint for Excel • I will be putting models for calculation on the web site. • You will see the values, not the formulas • BUT, you can change to view the formulas • Hold down control and hit the accent key. • It’s the same key, upper left of the keyboard, that we use for the tilde ~. Copying Formulas • To copy to the next cells, highlight the cell and drag the little box in the lower right of the cell. • Remember that the cells to be used in the formula will be adjusted • To copy the value in the cell not the location, use a $ before the column letter or before the row number or before each. Formatting Excel • To make a column wide enough for the label or values, double-click on the right border • To highlight cells adjacent to each other, click on one, hold down shift key and move to the other cells. • To underline a cell, not just the letters in it, use the little box on the toolbar • To merge several cells and center the label, use the little box with 2 arrows Excel Doing All the Work • You now know the summation sign on the toolbar. Click on the little arrow next to it. We will find lots of “functions” there. • Find the Mean. Click on an empty cell, probably the one just under the mean you already calculated. Click on the functions arrow. • Click Average & highlight the column of ages, Enter. • You should have the same answer. To Calculate the Variance • • • • Put ages in a column & get the sum Make a labeled cell for n Make a labeled cell for n-1 Put label “Mean” & under it put =, click on the cell with sum, then /, click on cell with value for n, Enter. Deviations from the Mean • Go to the column next to the values, label it X-X • In the top cell, put = ,click on the cell with the first value, put minus sign, put in the row and column for the mean. This last value needs dollar signs, e.g. $b$4. Hit enter. • Highlight that cell and pull down the corner box. • When all cells are filled, go to below the column and get the autosum. Take the squares of the dev’ns • Label the next column “squares” • In the top cell, put =, click on the top deviation, put *, click on the top dev’n again. • Pull down to all cells in the column • Take an autosum. Variance • The sum of the deviations squared, divided by n-1is the variance • So in an appropriate cell, put =, highlight the cell with the sum, /, highlight the n-1 cell • Put the label “variance” or s2 next to it. Variance • Click cell under the variance you calculated • Click Autofunction (the little arrow by the summation sign) • Click More Functions & In select a category, choose statistical • Go down to VAR • Next to the Number 1 box, click on the little graph, highlight the ages data, Enter, Enter. • Should have same answer. Standard Deviation • May be the most widely used measure of dispersion • Related to the Variance • It’s just the square root of the variance. Root-Mean-Square • Descriptive name for the St Dev • (∑(Xn-X)2/n-1)-1/2 • We used the square to get rid of neg signs in the sum but then we go back to the same range by taking a square root later. • Go back to excel and get the standard deviation by taking square root and by letting excel do it directly from data Standard Deviation • Find a suitable empty cell. • Go to autofunctions and choose math & trig functions and click on SQRT. • Highlight the variance so that it will be used by the function. • Put the label st. dev., or s, in an empty cell above or next to your answer Autofunction Standard Deviation • Choose another appropriate empty cell • Label it • In the empty cell, go to autofunction, choose statistical, click on STDEV. • Choose the range of the original data. • Should get same answer as the calculated one. To Compare s from 2 Samples • Use Coefficient of Variation • s/X times 100. Answer is a percentage. • Why use CV? CV • Removes differences due to units of measurement • We might want to know whether serum cholesterol levels, measured in mg/100ml, are more variable than body weight, measured in pounds Another Advantage of CV • Variance or st.dev. if applied to 2 groups with very different means may give misleading idea of variability • Wts of 11year-old boys compared to weights of 25 yr-olds 11’s, X = 80, s = 10 25’s, X = 145, s = 10 Are they equally variable? • CV’s = 12.5% & 6.9% • Young boys wt more variable Finding the Quartiles • Have to find locations for Q3 and Q1 • For Q3 loc, find n*3/4 • If it’s not an integer, take the next higher whole number. • Find it in the array. That is Q3 • For Q1 loc, find n/4. If it’s not an integer, take the next higher whole number. • Find it in the array. That is Q1 Interquartile Range • Q3 – Q1 • Use autofunction to check your answer. Percentiles • Location = n * p/100 • e.g., the 90th percentile is located at n * 90/100. If its not an integer, go to the next higher integer. • Pick that value from the array • 10% of the values will be higher than P90 and 90% of the values will be lower. Special Problem • Biostatistics books give these directions for finding the percentiles • However, excel autofunction for percentile uses an interpolation method • Our answers will not agree End of Ungrouped Descriptives • Continue these topics using Grouped Data Grouped Data Grouped Data We do this all the time Change in my pocket 3 dimes, 2 nickels, 4 quarters, 6 pennies (3X10) + 2(5) + (4X25) + (6X1) Sum = $1.46 Group Mean or Weighted Mean Average Age of Dr. Jones’ Patients 12, 14, 15, 14, 8, 7, 8, 9, 14, 12 n = 10 ∑ = (2X12) + (3X14) + (2X8) + 7 + 9 + 15 X = (24 + 42 + 16 + 31)/10 = 113/10 = 11.3 Frequency • No. of times each value occurs is called its frequency • Instead of n, use ∑fi Grouped Variance • Use exactly the same concept as for Grouped Mean Types of Data • Before we continue with grouping, let’s go back and look at types of data • Ch. 2 Types of Values • Nominal • Ordinal • Ranked • Discrete • Continuous Nominal • • • • Comes from “name” Qualitative Classes: diabetes, asthma … Types or categories, may be coded in numbers: diabetes =1, asthma = 2 … • May be completely qualitative or semiquantitative, e.g over 45, over 65 … • In statistics, use the frequency of occurrence of each class Ordinal • When classes progress whether in ascending order or descending order • Staging of Cancer Stage 1 is least serious. Each more serious. Stage 2, 3 to Stage 4 • An order exists but there is no intrinsic magnitude, e.g. the difference between Stage 2 and Stage 3 may be much greater than the difference between Stage 3 and Stage 4 Ranked • Very similar to ordinal data • Difference is that the order depends on a quantitative difference • e.g. Rank students according to highest average • Rank diseases according to number of deaths caused by each • But still, differences between the sequential ranks may not be equal Quantitative Data • Has intrinsic magnitude • A count or a weight • Types: Discrete & Continuous • Types: Interval & Ratio Discrete vs Continuous • • • • • Discrete Eggs. We quantify them by counting. Number of births. Cannot be a fraction. Continuous Butter. We quantify it by weight. Can be fractional We can make differences infinitesimally small Continuous Can make intervals infinitesimally small Height: 5 feet 6 inches, or 5 ft. 6 1/4 in. or 66.21 inches Ages: Even if given in years, it is a continuous variable. 10 yrs or 10 ½ yrs or 122 months or even months, days, etc. Interval Data • Differences are equal • Temperature Difference between 20 and 30 degrees equals difference between 30 and 40 • These do not have a real zero Interval vs Ratio Data • This distinction not made in your text. Interval • Temperature. The scale we use does not have a true zero. Intervals are equal but ratios are not. 400 is not twice as warm as 200. Ratio • Weight or height. They do have a true zero. 4 inches is twice as long as 2 inches. Data Presentation • Start with Frequency Tables. • Next class, graphs. Frequency Distribution • Group values within intervals • All intervals of equal size • Intervals contiguous, not overlapping • Not too many groups, not too few Look at a Frequency Distribution • Pagano page 12, Table 2.6 • How many intervals? • Does it look like a good number? • Look at the distribution of data. Is it clear? Interval Sizes • Table 2.6 of Pagano • Look at left hand column. LL of first interval is • 80 • LL of second interval is • 120 • What is the interval size • 120-80 Class Limits • We can call the groups “classes” as well as “intervals”. • Why doesn’t the first class go from 80 to 120? • Would be ambiguous. Where would we put levels of 120, in the first class or the second? Using a Frequency Distribution • Can we use it to calculate measures of central tendency? • Measures of dispersion? • Yes to both questions, of course. • But how, we no longer have the actual values. • Usually, use midpoints of each class Midpoint of Intervals • Use midpoint as though it were the actual value for the individuals in the interval • Calculate midpoint: (LL + UL)/2 • (80 + 119)/2 = 99.5 • For each of the next classes, just add 40 to 99.5. Write out Mid-pts • Or even re-write the table using mid-pts • 99.5, 139.5, 179.5, 219.5, 259.5, 299.5, 339.5, 379.5 • Use this to find the mean Mean from F.D. • Mid-pt., mi, times frequency, fi, within that class • Take the sum • Also take the sum of the frequencies. • Mean = ∑ fi, mi, / ∑ fi, • Before we do an example, let’s take a look at calculating variance or st. dev. Variance from Grouped Data • From ungrouped data ∑(Xn-X)2/n-1. Review this. Does it make sense? • For grouped data (∑fi(mi-X)2/(∑fi-1) • What did we do? • Substitute m for X and the sum of fi for n. How get Standard Deviation? • Just take square root of the variance. Other Measures • Mode very easy, just the class with highest frequency. • Range, just highest limit minus lowest limit • Median and quartiles, I think they’re not in your book for grouped data. If we don’t come across it, will omit. Calculating from Frequency Distributions • Go to Course Outline on Internet. • Click on Calculations from Grouped. End of Class 3 • Hope we get this far. • Check assignments if we do not get this far in Class 3. • You only have to do the topics covered in class. Get a Data Set • Later we will use the cd from back of book • For now, go to our course outline, lect 3, data set 1, death rates. Click & open. • Save onto your computer hard drive. Put it into a place you can find again. • At home, do this also. • In fact, save under two names: one call death_crude”, other call “death_freqdist” Making the Frequency Distribution • Make another copy of the data. Highlight both columns & pull the copy square. • Insert a column between the raw data and the new data. • Make an array. Highlight the column of data. Click on the Sort Ascending & when it asks about the adjoining data, click to include it. Making a Frequency Table • Have to get data. Take cd from back of book. • Put in cd slot. If it doesn’t open automatically, go to My Computer & choose that drive.