What are Data?

Quantitative data
- Sets of measurements or objective descriptions of physical and behavioural events; susceptible to statistical analysis.

Qualitative data
- Descriptions, views, actions and activities, non-verbal behaviour and interactions; susceptible to interpretation.

The Research Question (Randomised Controlled Trials (RCTs))
P = Population: Who is the question about?
I = Intervention: What is happening to, or being done to, ‘P’?
C = Comparison: What could be done instead of ‘I’?
O = Outcome(s): What happens to ‘P’ as a result of ‘I’?

The Research Question (Non-RCTs)
P = Population: Who is the question about?
I = Intervention: The group with the disease / characteristic of interest.
C = Comparison: The group without the disease / characteristic of interest.
O = Outcome(s): The variable we are measuring in both the ‘I’ and ‘C’ groups.

Descriptive statistics
Data and methods that describe the group the data were collected from (e.g. a complete population).

Inferential statistics
Data and methods that use a sample to say something about a larger population; the conclusions are probably, but not certainly, true.

What are we measuring?
We need to know what we are measuring and how it is being measured. How we measure the variables will influence the types of analysis we can carry out on our data.

There are 2 main types of variable:
- Categorical: categories, e.g. age ranges, gender, cat/dog.
- Metric: actual values, not grouped, e.g. weight, time.

Levels of measurement
There are 4 types of scale for measuring variables. Nominal and ordinal scales are categorical; continuous and discrete scales are metric.
- Nominal: categories and lists, e.g. dog, cat, mouse; yes, no.
- Ordinal: ordered or ranked positions, not true numbers, e.g. educational achievements, income bandings.
- Continuous: values can lie anywhere within the possible range and are true numbers, e.g. height, which can be any point on a scale.
- Discrete: whole numbers that arise from counting things, e.g. number of decayed or missing teeth.

Identifying data type
- Can the data be put in order? No: nominal. Yes: ask the next question.
- Do the data have units (including numbers of things)? No: ordinal. Yes: metric; ask the next question.
- Do the data come from measuring or counting things? Measuring: continuous. Counting: discrete.

How do you describe data? The role of summary statistics

Central tendency: the typical values in a set of scores. For example, for the scores 1 1 2 2 2 3 4 4 5 5 5 5 6:
- Mode: the most frequently occurring category or score (here 5).
- Median: the mid-point in a set of ordered scores (here 4, the 7th of 13 scores).
- Mean: the average score, $\text{Mean} = \dfrac{\sum X}{N}$, where $\sum X$ is the sum of the scores and $N$ is the number of scores (here $45/13 \approx 3.5$).

Summarising Data
Percentage: the frequency of people with a given characteristic expressed as a number out of 100. E.g. 52 people out of every 100 studied had blue eyes, which can also be expressed as 52%. A percentage can also be defined as a rate per 100.
Rate: the frequency of people with a given characteristic expressed as a number out of a stated population size, usually 100, 1000, 10,000 and so on. E.g. the rate of people with blue eyes can be expressed as 52 per 100, 520 per 1000 or 5200 per 10,000.

What can we do with our data?
Prevalence: the proportion of individuals with a particular disease.
$P = \dfrac{\text{total number of cases at a given time}}{\text{total population at that time}}$
Prevalence is measured at a particular point in time, and as such may be referred to as point prevalence.

Incidence: the proportion of new cases, in a population previously without the disease, in a specified period.
$I = \dfrac{\text{number of new cases in a period of time}}{\text{population at risk}}$
N.B. the time period involved must always be specified when presenting incidence rates. This is also referred to as cumulative incidence.

Which summary statistic should I use?
- Nominal: mode / percentage
- Ordinal: median
- Metric: are the data normally distributed? Yes: mean. No: median.

The Normal Distribution
A lot of biological measures follow a normal distribution. In a normal distribution (a symmetrical, bell-shaped curve when plotted), the mean, median and mode are the same.

Negative and positive skews
- Negative skew: the mean is less than the median.
- Positive skew: the mean is greater than the median.

How do you know if the data are normally distributed?
To test for this you can do any of the following:
- Plot a frequency diagram and see whether it is symmetrical and bell-shaped.
- See whether mean = median = mode.
- For measurements that cannot be negative: if twice the standard deviation does not fit into the mean (i.e. the mean is less than 2 SD), the data are probably skewed rather than normally distributed. This is a good tip when looking at research papers.

Measures of Dispersion
When the observations are placed in ascending order, the quartiles divide them into four groups, each containing 25% of the observations: from the minimum to Q1 (the lower quartile), from Q1 to Q2 (the median), from Q2 to Q3 (the upper quartile), and from Q3 to the maximum. The inter-quartile range (IQR) is Q3 - Q1 and contains the middle 50% of the observations.

Examples of median and IQR
Two sets of data:
- Set 1: 13, 2, 16, 1, 17
- Set 2: 13, 14, 13, 14, 13
First sort them into numerical order:
- Set 1: 1, 2, 13, 16, 17
- Set 2: 13, 13, 13, 14, 14
The median is the middle value, so for both sets it is 13.
We can calculate the position of the lower quartile as $(n+1) \times 0.25$ and the position of the upper quartile as $(n+1) \times 0.75$, where $n$ is the number of values. Here $n = 5$, so the quartiles lie at positions 1.5 and 4.5, i.e. halfway between the 1st and 2nd values and halfway between the 4th and 5th values.
- Set 1: LQ = 1.5, UQ = 16.5 (IQR = 15)
- Set 2: LQ = 13, UQ = 14 (IQR = 1)
This shows that although the two sets have the same median, they have very different inter-quartile ranges. If both the median and the IQR are presented, we can see that the data are different: the values are much more dispersed in the first data set.

Standard Deviation
The standard deviation (SD) is very useful because, in a normally distributed population, about 68% of observations lie within 1 SD of the mean, approximately 95% within 2 SD and about 99.7% within 3 SD. These percentages hold for any normal distribution, but the tables used to look them up are written for the standard normal distribution, which has a mean of 0 and an SD of 1. The obvious problem is that we rarely collect data with a mean of 0 and an SD of 1; often the data we collect only have positive values. For example, the mean assessment score in a class may be 55 with an SD of 12, and nobody can achieve a mark less than 0. Statistics programs such as SPSS can convert the data so that they have a mean of 0 and an SD of 1 by generating Z scores, i.e. normalised (standardised) scores:
$z = \dfrac{X - \text{mean}}{\text{SD}}$

The standard deviation:
- is the measure of dispersion used when the mean is the measure of central tendency;
- is based on all the individual scores;
- describes how individual scores typically vary from the mean;
- is larger the more spread out the scores are about the mean.
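The conversion to Z scores described in the Standard Deviation section can be sketched in Python as a rough illustration; it is not how SPSS works internally. It assumes the sample standard deviation (dividing by n - 1, as `statistics.stdev` does), and the helper names `z_scores` and `z_from_summary` and the list of marks are hypothetical, made up for the example. The final loop uses `statistics.NormalDist` to recover the 68 / 95 / 99.7% figures.

```python
import statistics


def z_scores(values):
    """Standardise raw scores: z = (x - mean) / SD, using the sample SD (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


def z_from_summary(score, mean, sd):
    """Z score of one score when only the group mean and SD are known."""
    return (score - mean) / sd


# Class with mean mark 55 and SD 12: a mark of 67 is 1 SD above the mean,
# a mark of 31 is 2 SD below it.
print(z_from_summary(67, 55, 12))  # 1.0
print(z_from_summary(31, 55, 12))  # -2.0

# Hypothetical marks: once standardised they have mean 0 and SD 1.
marks = [40, 48, 55, 55, 62, 70]
z = z_scores(marks)
print([round(v, 2) for v in z])

# Share of a normal distribution lying within 1, 2 and 3 SD of the mean.
standard_normal = statistics.NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {share:.1%}")
```

A positive Z score lies above the mean and a negative one below it, so Z scores from different tests can be compared on the same scale.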
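The ‘Identifying data type’ questions and the ‘Which summary statistic should I use?’ list earlier in these notes are both small decision trees. A minimal Python sketch of them follows; the function names `data_type` and `summary_statistic` and their yes/no arguments are hypothetical. In practice, deciding whether metric data are normally distributed needs the checks listed under ‘How do you know if the data are normally distributed?’.

```python
def data_type(can_be_ordered, has_units, from_measuring=True):
    """Follow the 'Identifying data type' questions in order."""
    if not can_be_ordered:
        return "nominal"
    if not has_units:  # ordered, but no units: ranks or bands
        return "ordinal"
    # Metric data: measured values are continuous, counts are discrete.
    return "continuous" if from_measuring else "discrete"


def summary_statistic(dtype, normally_distributed=False):
    """Follow 'Which summary statistic should I use?'."""
    if dtype == "nominal":
        return "mode / percentage"
    if dtype == "ordinal":
        return "median"
    # Continuous and discrete are both metric.
    return "mean" if normally_distributed else "median"


print(data_type(False, False))                      # nominal, e.g. cat, dog, mouse
print(data_type(True, False))                       # ordinal, e.g. income bandings
print(data_type(True, True, from_measuring=True))   # continuous, e.g. height
print(data_type(True, True, from_measuring=False))  # discrete, e.g. decayed teeth
print(summary_statistic("continuous", normally_distributed=False))  # median
```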
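The three measures of central tendency for the example scores used earlier (1 1 2 2 2 3 4 4 5 5 5 5 6) can be checked with Python's standard-library `statistics` module:

```python
import statistics

scores = [1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 5, 6]

print(statistics.mode(scores))            # 5   (occurs four times)
print(statistics.median(scores))          # 4   (7th of 13 ordered scores)
print(round(statistics.mean(scores), 1))  # 3.5 (45 / 13 = 3.46...)
```

The mean is pulled below the median here by the cluster of low scores, which is the pattern the notes describe as a negative skew.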
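The prevalence and incidence formulas, and the idea of expressing a proportion as a rate per 100, 1000 or 10,000, translate directly into code. This is a minimal sketch: the helper names `prevalence`, `cumulative_incidence` and `per` are hypothetical, and the town and case counts are invented purely for illustration. Note that incidence uses the population at risk (people without the disease at the start of the period) as its denominator, and the period has to be reported with the result.

```python
def prevalence(cases_at_time, population_at_time):
    """Point prevalence: all existing cases / total population, at one point in time."""
    return cases_at_time / population_at_time


def cumulative_incidence(new_cases, population_at_risk):
    """Cumulative incidence: new cases in a stated period / population at risk."""
    return new_cases / population_at_risk


def per(proportion, base):
    """Express a proportion as a rate per `base` people (e.g. 100, 1000, 10000)."""
    return proportion * base


# Blue-eyes example: 52 out of 100 people, as a rate per 100 and per 1000.
print(f"{per(52 / 100, 100):.0f} per 100, {per(52 / 100, 1000):.0f} per 1000")

# Hypothetical town of 20,000 people on 1 January, 500 of whom already have the disease.
p = prevalence(500, 20_000)
print(f"Point prevalence: {per(p, 1000):.1f} per 1000")  # 25.0 per 1000

# Over the next 12 months, 390 of the 19,500 people at risk develop the disease.
i = cumulative_incidence(390, 20_000 - 500)
print(f"12-month incidence: {per(i, 1000):.1f} per 1000")  # 20.0 per 1000
```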
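The quick normality checks listed earlier (compare the mean and median; see whether twice the SD fits into the mean) can be automated roughly as below. This is a hypothetical screening helper, not a formal test: the 2 SD check only makes sense for measurements that cannot be negative, the tolerance of 0.1 SD for calling the mean and median ‘close’ is an arbitrary assumption, and both data sets are invented. A frequency plot is still needed for a firmer answer.

```python
import statistics


def quick_normality_checks(values, tolerance=0.1):
    """Rough screening of non-negative data for normality (hypothetical helper).

    tolerance: largest gap between mean and median, as a fraction of the SD,
    still treated as 'close' (an arbitrary assumption).
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    if mean > median:
        direction = "mean > median (positive skew)"
    elif mean < median:
        direction = "mean < median (negative skew)"
    else:
        direction = "mean = median"
    return {
        # Symmetrical data have a mean close to the median.
        "mean_close_to_median": abs(mean - median) <= tolerance * sd,
        # If 2 SD does not fit into the mean, a normal curve would put many
        # values below zero, which non-negative data cannot have.
        "two_sd_fit_into_mean": mean >= 2 * sd,
        "mean_vs_median": direction,
    }


# Hypothetical lengths of hospital stay (days): one long stay skews the data.
print(quick_normality_checks([1, 1, 2, 2, 2, 3, 3, 4, 5, 14]))
# Hypothetical, roughly symmetrical measurements.
print(quick_normality_checks([50, 52, 54, 55, 55, 56, 58, 60]))
```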
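The $(n+1) \times 0.25$ and $(n+1) \times 0.75$ position rule from the median and IQR example can be written as a small Python function. It is a sketch: the names `quantile_position` and `median_and_iqr` are hypothetical, and when a position falls between two observations (1.5 means halfway between the 1st and 2nd values) it interpolates linearly, which reproduces the quartiles quoted in the example. Other software may define quartiles slightly differently and so give slightly different answers for small data sets.

```python
def quantile_position(values, p):
    """Value at 1-based position (n + 1) * p of the sorted data, interpolating
    linearly when the position falls between two observations."""
    data = sorted(values)
    n = len(data)
    pos = (n + 1) * p
    whole = int(pos)   # observation at or just below the position
    frac = pos - whole  # how far towards the next observation
    if whole < 1:       # position before the first value: use the minimum
        return data[0]
    if whole >= n:      # position at or beyond the last value: use the maximum
        return data[-1]
    return data[whole - 1] + frac * (data[whole] - data[whole - 1])


def median_and_iqr(values):
    """Return (median, lower quartile, upper quartile, IQR)."""
    lq = quantile_position(values, 0.25)
    med = quantile_position(values, 0.5)
    uq = quantile_position(values, 0.75)
    return med, lq, uq, uq - lq


for sample in ([13, 2, 16, 1, 17], [13, 14, 13, 14, 13]):
    print(median_and_iqr(sample))
# (13.0, 1.5, 16.5, 15.0)
# (13.0, 13.0, 14.0, 1.0)
```

Python's `statistics.quantiles(data, n=4)` uses the same $(n+1)$ convention by default (`method='exclusive'`), so it returns the same lower quartile, median and upper quartile for these two data sets.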