Statistical Analysis - Chapter 2 “Organizing and Analyzing Data”
Fashion Institute of Technology
Dr. Roderick Graham

Showing Data Graphically
When we collect a sample, we initially want to get a picture of how the data “looks”.
Graphs let us easily show our “stakeholders” what the patterns in the data are.
What do we mean by “stakeholders”?
Three of the ways to show data are Histograms, Frequency Polygons, and Circle Graphs.

Showing Data Graphically
Look at the listing of numbers on p.17. This is called “ungrouped” data.
Sometimes it is better to “group” data into categories…this makes it easier to represent data graphically (p.18).

Histograms
Let’s look at the move from “ungrouped data” to the construction of a histogram in your textbook (pp. 17 – 18).
1. Start with a survey of numbers…or “ungrouped data”
2. Decide on the categories you want to use and “group” each number into the category that fits it
3. Now the data has been changed from a series of ages to GROUPS of ages
4. We can compute statistics for both grouped and ungrouped data

Histograms
Let’s figure out this Histogram (taken from actual data I am using)…
1 = 18 – 24
2 = 25 – 34
3 = 35 – 44
4 = 45 – 54
5 = 55 – 64
6 = 65+
How many people are between ages 45 and 54?

Frequency Polygon (Line Graph)
This is a line graph representing the shape of a histogram.
Usually, when you have “too many bars” (categories), you may want to use a line graph.
This can show trends more easily than a histogram.

Circle Graph
These graphs are used to show what percentage (proportion) of a sample falls into each category.
Your textbook goes into some detail about how to create circle graphs with a protractor…lucky for us we have Excel!
Below is an example from the CDC showing the percentages of how people have become infected with HIV…

Key Points
It is up to you (the researcher) to decide which graph is best for presenting your data. For me…
1. If I am showing a small number of categories, I use a histogram
2. If I am showing trends through time, or a large number of categories, I use a line graph
3. If I want to show percentages, I use a circle graph (this is always the best way to show percentages)

Our first “statistics”
Remember that statistics are values that we compute from the sample of data that we have collected.
We will learn two basic and important types of statistics:
Measures of Central Tendency – What are the middle values for our data?
Measures of Dispersion or Spread – How diverse is our data…or how widely scattered is our data?
You can compute these statistics for both grouped and ungrouped data.

Measures of Central Tendency (ungrouped)
What if we had collected data about one measure, and we wanted to know what the middle value was for this measure?
Ex. What is the middle value, in age, for those who listen to Lady Gaga?
Ex. How many times do young Hispanic women report shopping at H&M?
Knowing this middle, or central, value is important for describing our data.
There are three measures of central tendency…

Measures of Central Tendency (ungrouped)
Mean (p.24): This is the mathematical average of a set of numbers
Median (p.26): This is the middle value of a set of data that has been arranged from lowest to highest
Mode (p.27): The value that occurs the most in a set of data
We can use income as a good way of discussing these three measures.
Imagine that we wanted to know the average income for FIT students.
Imagine that we took a random sample of incomes for FIT students…

Measures of Central Tendency (ungrouped)
The sample gives these values:
5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500
The Mean
This is the average…
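If you would rather check the arithmetic on a computer than by hand, here is a minimal Python sketch of the mean for the income sample above. The variable names (incomes, total, n, mean) are my own and do not come from the textbook.

```python
# Sample of FIT student incomes, copied from the slide.
incomes = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
           12000, 11000, 8000, 6000, 15000, 6000, 11500]

total = sum(incomes)   # add up every value in the sample
n = len(incomes)       # how many values we collected
mean = total / n       # the mean is the sum divided by the count

print(total, n, mean)  # prints: 271500 15 18100.0
```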
Sum of values = 271500
Total N = 15
Mean = 271500 / 15 = 18100

Measures of Central Tendency (ungrouped)
The sample gives these values:
5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500
The Median
This is the middle value once the data are arranged from lowest to highest:
5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
The median here is 11500.
In cases where there are two middle values, we average the two.

Measures of Central Tendency (ungrouped)
The sample gives these values:
5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500
The Mode
This is the most numerous value:
5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
The mode here is 6000.
Sometimes there is no mode…or even two modes!

Measures of Central Tendency (ungrouped)
So given these values…
5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
…what is the best measure of central tendency for this random sample of FIT students?
Mean?...18100
Median?...11500
Mode?...6000

Measures of Dispersion or Spread (ungrouped)
Range (p.29)
The highest value minus the lowest value. From our last example, the range would be: 110000 – 5000 = 105000
Standard Deviation (pp. 29 – 35)
This is, roughly, the average distance your values have from the mean. Best shown through example…

Measures of Dispersion or Spread (ungrouped)
Standard Deviation
Let’s return to our FIT random sample…
5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
Follow the steps below while we (I) calculate the standard deviation as a class on the board.
1. Calculate the mean…which is 18100
2. Find the distance that each value has from the mean
3. Square each distance
4. Add up these squared distances and divide by the sample size – 1 (at this point, this number is called the variance)
5. Then take the square root of this number

Standard Deviation
X        Mean (x-bar)   X – x-bar   (X – x-bar)²
5000     18100          -13100      171610000
6000     18100          -12100      146410000
6000     18100          -12100      146410000
6000     18100          -12100      146410000
6000     18100          -12100      146410000
8000     18100          -10100      102010000
11000    18100          -7100       50410000
11500    18100          -6600       43560000
12000    18100          -6100       37210000
13000    18100          -5100       26010000
15000    18100          -3100       9610000
15000    18100          -3100       9610000
17000    18100          -1100       1210000
30000    18100          11900       141610000
110000   18100          91900       8445610000

Standard Deviation
We sum (X – x-bar)², divide by the sample size – 1, and take the square root of the result. This is the standard deviation.
Sum of squared distances = 9624100000
Variance = 9624100000 / 14 ≈ 687435714
What is the square root of the variance? Appx. 26,219
Right now, this number means very little…but in the following chapters we will gain a better understanding of the standard deviation.

Measures of Central Tendency and Dispersion (Grouped Data)
Remember that grouped data is a collection of data that has been placed into categories…
Thus we need to calculate the mean and standard deviation differently, but the idea is the same.
Pp. 36 – 39 show the formulas for these measures.

Calculating the Mean for Grouped Data
Let’s say we took a random sample of FIT students and asked them their GPA. We decided to group GPA into categories. Here is the data:
GPA Category    Number of Students
3.5 – 4.0       15
3.0 – 3.49      25
2.0 – 2.9       50
Below 2.0       11
So…what is the mean? Look at pages 36 – 38 and I will wait for someone to tell me how to go about answering this question.

Calculating the Mean for Grouped Data
X = the midpoint of each category (the average of its lowest and highest values)
f = number of students in the category
So can someone answer this question on the board (with help from classmates)?
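As a rough check on the board work, here is a minimal Python sketch of the grouped mean, computed as the sum of f × X divided by the sum of f. The midpoints are the X values used in these slides, including the assumption that “Below 2.0” means 0 – 1.9. The names groups and grouped_mean are my own, not the textbook’s.

```python
# Grouped GPA data from the slides: (category label, midpoint X, frequency f).
# Assumption carried over from the slides: "Below 2.0" is treated as 0 - 1.9.
groups = [
    ("3.5 - 4.0", 3.75, 15),
    ("3.0 - 3.49", 3.245, 25),
    ("2.0 - 2.9", 2.45, 50),
    ("Below 2.0", 0.95, 11),
]


def grouped_mean(rows):
    """Return sum(f * X) / sum(f) for rows of (label, X, f)."""
    total_f = sum(f for _, _, f in rows)        # total number of students
    total_fx = sum(x * f for _, x, f in rows)   # each midpoint weighted by its count
    return total_fx / total_f


mean_gpa = grouped_mean(groups)
print(f"grouped mean = {mean_gpa:.2f}")  # prints: grouped mean = 2.68
```

Each midpoint stands in for every student in its category, so the result is an estimate of the mean GPA rather than the exact mean of the individual GPAs.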
GPA Category          X       Number of Students (f)
3.5 – 4.0             3.75    15
3.0 – 3.49            3.245   25
2.0 – 2.9             2.45    50
Below 2.0 (0 – 1.9)   0.95    11

Calculating the Standard Deviation of Grouped Data
Now let’s calculate the standard deviation for this same set of data…
GPA Category    Number of Students
3.5 – 4.0       15
3.0 – 3.49      25
2.0 – 2.9       50
Below 2.0       11
Who can do this one on the board?

Writing Research Reports (pp. 48 – 50)
Background Statement (5 pts.)
I will give you data…use your imagination
Why was the study performed (why was the data collected)?
Design and Procedures of the Study (10 pts.)
How did you conduct the study?
How was the study internally valid/externally valid?
These two sections are not the most important…simply use your imagination to complete these two sections

Writing Research Reports (pp. 48 – 50)
Results (55 pts.)
The most important section. For this first report, this is where you present your data graphically and show measures of dispersion and central tendency
Analysis and Discussion (10 pts.)
What is interesting to you about the results?
Conclusions and Recommendations (20 pts.)
(You will not write this section for your report…this is where you present your results and analysis to the class. The class can ask you questions, so be on point!)

END