Question: What are data and what do they mean to a scientist?

Dinner at the Urquhart House
Brought to you by the Briggs Multiracial Alliance
Sunday night
All food provided (probably Chinese)
Contact Mimi Reddy for details

Data, Statistics, and Spreadsheets
• What are data?
• What are statistics?
• What are spreadsheets?
• How can you analyze data with spreadsheets?

Data
• Data are pieces of information
• Data can be numbers, words, or descriptions
• Data have UNITS
• The word data is PLURAL; datum is singular

Data about Willoughby:
• Age: 5 (years)
• Height: 47 (inches)
• Weight: 66 (pounds)
• Eyes: Blue
• Favorite word: Wrestle
• Favorite letter: W

Types of Data
• Numbers (two types)
  – Real numbers: can have a fractional (decimal) part, e.g., 28.75 lbs
  – Integers: whole numbers, e.g., 18 months
• Letters: called characters in programming; W is a character
• Words: called strings in programming; “No thanks” is a string. Strings can be individual words or phrases

Statistics and Data
Test Scores:
  – Jeff: 88
  – Mollie: 92
  – Marcie: 88
  – Dave: 47
  – Karim: 99
  – Willoughby: 42
  – Benjamin: 0
• What statistics can you calculate to describe these data?
  – Try to think of four things that describe the data

Statistics
• Statistics are derived from the data
• Statistics are descriptions of data
• Statistics are meant to simplify the data
• Statistics can be misleading

Typical Statistics
• Sample size: number of individuals measured = n
• Sum = S
• Average or mean = S/n
• Median: the value at the 50th percentile; half of the values fall above it, half below
• Maximum, minimum, and range (max - min)
• Mode: the most common value
• Standard deviation (SD)
• Variance (SD²)

Analyze these data...
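The statistics on the Typical Statistics slide can also be computed in code, which is a handy way to check answers worked out by hand. A minimal Python sketch using the standard-library `statistics` module, applied to the test scores from the Statistics and Data slide; the `describe` helper is hypothetical, and it assumes the sample (n - 1) definitions of SD and variance, the ones Excel's STDEV.S and VAR.S use.

```python
import statistics

# Test scores from the "Statistics and Data" slide
scores = {
    "Jeff": 88, "Mollie": 92, "Marcie": 88, "Dave": 47,
    "Karim": 99, "Willoughby": 42, "Benjamin": 0,
}

def describe(values):
    """Hypothetical helper: the 'typical statistics' from the slide, as a dict."""
    values = list(values)
    n = len(values)
    total = sum(values)  # S
    return {
        "n": n,
        "sum": total,
        "mean": total / n,  # S/n
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        # sample (n - 1) versions, matching Excel's STDEV.S and VAR.S
        "sd": statistics.stdev(values),
        "variance": statistics.variance(values),
    }

if __name__ == "__main__":
    for name, value in describe(scores.values()).items():
        # floats (mean, SD, variance) rounded for display; counts printed as-is
        shown = f"{value:.2f}" if isinstance(value, float) else value
        print(f"{name:>8}: {shown}")
```

The same `describe` call works for the nine values in the exercise that follows: `describe([18, 33, 4, 47, 49, 38, 29, 4, 55])`.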
Mean, max, min, sample size (n), range, median, mode
Values: 18, 33, 4, 47, 49, 38, 29, 4, 55
• Sum = S
• mean = average = S/n, denoted x̄
• median = halfway point
• mode = most common value

Spreadsheets
• Spreadsheets are tables

                Rainforest    Dry Forest        Total
  Costa Rica       625,000        50,000      675,000
  Nicaragua      3,712,000       300,000    4,012,000

• Spreadsheets allow calculations and manipulations of data
  – Calculations: mean, standard deviation
  – Manipulations: sort

Make a data table:
  Fly 1, length 13.4 mm, velocity 27 kph, age 21 days
  Fly 2, length 9.4 mm, velocity 0 kph, age 220 days
  Fly 3, length 9.3 mm, velocity 44 kph, age 1 day
  Fly 4, length 13.4 mm, velocity 17 kph, age 32 days
  Fly 5, length 17.4 mm, velocity 33 kph, age 11 days
• How many columns? How many rows?
• Do the numbers go down or across?

Data Table
  Fly #   Length (mm)   Velocity (kph)   Age (days)
  1       13.4          27               21
  2       9.4           0                220
  3       9.3           44               1
  4       13.4          17               32
  5       17.4          33               11

Microsoft Excel
• Typical spreadsheet program
  – Lotus 1-2-3 was an early, widely used commercial spreadsheet (VisiCalc was the first)
• Has controls similar to MS Word
• Now allows graphing (charts)
  – very restricted formats; it is hard to get exactly what you want
• Excel tables and graphs can be copied into MS Word

Friday’s Assignment
• We will work with Microsoft Excel to analyze some data
• Groups of two will submit one finished spreadsheet for the assignment

Graphs
• Many different types of graphs
  – Points
  – Lines
  – Bars
  – Pies

Point Graphs
• Called X-Y Scatter in MS Excel
• Plot points based on their X and Y values
• Can fit a “REGRESSION LINE” to the data
  – the line that best fits the data
[Figure: X-Y Scatter]

Bar Graphs
• Categorize data into counts or percents
• Categories can be descriptive categories (Windows 98, Windows 2000, …)
• Can also be numeric categories
  – Height: 60-63, 63-66, etc.
    or just 61, 62, 63…
  – Count up the number of people in each group
• Histograms are a particular type of bar graph
[Figure: Bar graph of Starting Salary by year, 1988 to 1994]

Histogram
• X axis is the categories
• Y axis is the number or proportion of observations in each category
[Figure: Histogram bar graph, Number of Crashes]

Regular Bar Graph vs. Histogram Bar Graph
[Figure: Regular bar graph of Starting Salary by year, 1988 to 1994]

Distributions
• A special type of histogram with a continuous numeric scale along the bottom
• The normal distribution is a key concept in statistics
• A skewed distribution is one that is unbalanced

Sample distribution histograms
(Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt; Robert D. Duval, PS 400 Lecture, www.polsci.wvu.edu/duval/ps400/Notes/400Notes.ppt)

The NORMAL Distribution
• A NORMAL DISTRIBUTION is the theoretical distribution of values given natural variation around a MEAN
• It is a balanced (symmetric), single-humped, bell-shaped distribution

Distributions
• Skew is an imbalance in the distribution
(Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt)

Hypothesis Testing
• Statistical tests are how scientists decide whether data support their hypothesis (they do NOT PROVE the hypothesis)
• Four major statistical tests: t-test, χ² (chi-square) test, regression, ANOVA

Hypothesis
• Processor speed has an effect on the performance of the computer.
• Null hypothesis (H₀): Processor speed has NO EFFECT on the performance of a computer.
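Returning to the bar graph and histogram slides: a histogram is built by counting how many observations fall into each category (bin). A minimal Python sketch, assuming equal-width numeric bins; the `histogram` helper and the 10-point bin width are illustrative choices, not from the slides. It reuses the test scores from the Statistics and Data slide.

```python
from collections import Counter

# Test scores from the "Statistics and Data" slide
scores = [88, 92, 88, 47, 99, 42, 0]

def histogram(values, bin_width):
    """Hypothetical helper: count values per equal-width bin.

    The bin labeled s covers s <= v < s + bin_width.
    """
    counts = Counter(int(v // bin_width) for v in values)
    first, last = min(counts), max(counts)
    # include empty bins so gaps in the data stay visible along the x axis
    return {k * bin_width: counts[k] for k in range(first, last + 1)}

if __name__ == "__main__":
    width = 10
    for start, count in histogram(scores, width).items():
        # one '#' per observation, like a bar graph turned on its side
        print(f"{start:>3}-{start + width - 1:<3} {'#' * count}")
```

The counts bunch up in the 80s and 90s with a long tail down toward 0, the kind of imbalance the Distributions slide calls skew.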
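Statistical Tests and Probability
• Statistical tests give a value (the test statistic)
• That value can be converted to a probability, the P-value
• The P-value is the probability of getting results at least as extreme as yours if the NULL hypothesis were true; it is NOT the probability that the null hypothesis is correct
• If P < 0.05 (1/20), you reject the NULL hypothesis and call the result statistically significant

T-Test
• Compares the difference between two means
• Formula: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{SE}$
  – SE is the standard error of the difference between the means; one common form (which does not assume equal variances) is $SE = \sqrt{SEM_1^2 + SEM_2^2}$, where each standard error of the mean is $SEM = SD/\sqrt{n}$
• t values: the difference between the means relative to the amount of spread in your data

T-Values
• If |t| is larger than roughly 2 (large samples) to 3 or more (small samples), the difference is usually significant at P < 0.05; the exact cutoff depends on your sample sizes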
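The t formula on the T-Test slide can be sketched in a few lines of Python. This assumes the Welch (unequal-variance) form of the standard error of the difference; the function names are hypothetical, and the benchmark times are invented purely to exercise the formula, not real measurements.

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def two_sample_t(a, b):
    """t = (mean_a - mean_b) / SE of the difference (Welch form)."""
    se_diff = math.sqrt(sem(a) ** 2 + sem(b) ** 2)
    return (statistics.mean(a) - statistics.mean(b)) / se_diff

if __name__ == "__main__":
    # HYPOTHETICAL benchmark times in seconds, made up for illustration only
    slow_cpu = [12.1, 11.8, 12.5, 12.0, 12.3]
    fast_cpu = [10.2, 10.6, 10.1, 10.4, 10.3]
    t = two_sample_t(slow_cpu, fast_cpu)
    print(f"t = {t:.2f}")
```

Turning t into an exact P-value requires the t distribution and its degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same Welch statistic together with a P-value.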