Download Introduction to Statistics

Central Tendency
Harry R. Erwin, PhD
School of Computing and Technology
University of Sunderland

Resources
• Crawley, MJ (2005) Statistics: An Introduction Using R. Wiley.
• Gonick, L., and Smith, W. (1993) The Cartoon Guide to Statistics. HarperResource (for fun).

Clustering
• Most data cluster around an intermediate value.
• If the values you measure are actually sums of many independent random variables, you can prove this is the case.
• This is known as the Central Limit Theorem: the sum of a large number of independent random variables, each with finite variance, has an approximately normal (bell-shaped) distribution.
• In particular, this is why estimates of the mean (or ‘average’) obtained from repeated experiments are approximately normally distributed.

Example: Normal Distribution

Other Measures of Clustering
• The median is the middle value of a sample or a distribution.
• The mode is the most frequent value in a sample or a distribution.
• These can be convenient to use, especially if the data are not normally distributed.

Application to Experimental Design
• One way to reject a null hypothesis:
– Show that the mean (average) value of your experimental data is so far from the mean value implied by the null hypothesis that its chance of occurring under that hypothesis is very small.
– You first need to check that your data are normally distributed to be able to estimate this chance.

To Check the Data are Normal
• yvals <- read.table("c:\\wherever\\yvalues.txt", header = TRUE)
• attach(yvals)
• hist(y)
• qqnorm(y)
• qqline(y, lty = 2)

What it Looks Like

Normal Data
• y <- rnorm(1000)
• hist(y)
• qqnorm(y)
• qqline(y, lty = 2)

Appearance of Normal Data

Non-Normal Data
• y <- seq(0, 1.0, 0.001)
• hist(y)
• qqnorm(y)
• qqline(y, lty = 2)

Appearance of Non-Normal Data

Geometric Mean
• This is used when the data are generated as the product rather than the sum of independent random variables. An example might be a series of risks, each being the product of a rate, a probability of success, and an estimate of the consequences.
• The geometric mean is calculated as $\left(\prod_{i=1}^{n} y_i\right)^{1/n}$, where $n$ is the number of elements being averaged over.
• In R, you calculate this as exp(mean(log(data)))

Harmonic Mean
• If your concern is not the absolute value of the random variables, but rather their ratios, the mean of interest is the harmonic mean. An example might be current population relative to the ‘carrying capacity’ of a region.
• This is the ‘reciprocal of the average of the reciprocals’.
• To calculate this in R, use 1/mean(1/data)

R Demonstrations of all this…
• From the book.
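The Central Limit Theorem claim on the Clustering slide can be checked by simulation. The following is a minimal sketch, not taken from the slides or the book: the choice of uniform variables, the seed, and the names n_terms, n_sums, draws and sums are all assumptions made for illustration.

```r
# Sketch: sums of independent Uniform(0, 1) draws look approximately normal.
set.seed(1)        # arbitrary seed, only so the run is repeatable
n_terms <- 30      # assumed number of independent variables in each sum
n_sums  <- 5000    # assumed number of repeated experiments

# Each column holds n_terms independent draws; colSums() adds each column up.
draws <- matrix(runif(n_terms * n_sums), nrow = n_terms)
sums  <- colSums(draws)

# A Uniform(0, 1) variable has mean 1/2 and variance 1/12, so each sum
# should have mean n_terms / 2 and standard deviation sqrt(n_terms / 12).
expected_mean <- n_terms / 2
expected_sd   <- sqrt(n_terms / 12)

# The same visual checks the slides use for normality.
hist(sums)
qqnorm(sums)
qqline(sums, lty = 2)
```

If the theorem applies, the histogram should be bell-shaped and the points in the Q-Q plot should lie close to the dashed line, even though each individual draw is uniform rather than normal.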
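The Application to Experimental Design slide describes comparing the sample mean with the mean implied by the null hypothesis, but does not name a specific test. One common choice is a one-sample t-test, sketched below with R's t.test(). The data are simulated stand-ins, and mu0, the seed, and the sample parameters are hypothetical values chosen only for illustration.

```r
# Sketch: compare a sample mean with the mean implied by a null hypothesis.
set.seed(2)                          # arbitrary seed for repeatability
mu0 <- 10                            # hypothetical mean under the null hypothesis
y   <- rnorm(50, mean = 11, sd = 2)  # simulated stand-in for experimental data

# First check that the data look normal, as the slides recommend.
qqnorm(y)
qqline(y, lty = 2)

# One-sample t-test of the null hypothesis "the true mean equals mu0".
result <- t.test(y, mu = mu0)

# Chance of a sample mean at least this far from mu0 if the null were true.
print(result$p.value)
```

A small p-value means the observed mean would be unlikely if the null hypothesis held, which is the sense in which the slide speaks of the chance being very small.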
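The measures discussed in the slides can be compared side by side on a small example. This is a sketch using made-up numbers. The vectors data and sample_vals are hypothetical, and since base R has no built-in function for the statistical mode, the table() approach shown is just one simple way to obtain it.

```r
# Sketch: arithmetic, geometric and harmonic means of hypothetical positive values.
data <- c(1, 2, 4, 8)

arithmetic <- mean(data)             # sum divided by n
geometric  <- exp(mean(log(data)))   # n-th root of the product, as in the slides
harmonic   <- 1 / mean(1 / data)     # reciprocal of the average of the reciprocals

print(c(arithmetic = arithmetic, geometric = geometric, harmonic = harmonic))

# Median and mode of a second hypothetical sample.
sample_vals <- c(2, 3, 3, 5, 7)
middle <- median(sample_vals)        # middle value of the sorted sample

# Count each distinct value and pick the most frequent one.
counts        <- table(sample_vals)
most_frequent <- as.numeric(names(counts)[which.max(counts)])
```

For positive data the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean, so the three printed values appear in decreasing order here.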
