Practice & Communication of Science
Measurement

What is Measurement?
Assigning comparative labels to things to help explain their relationships…
  sounds a bit abstract…
  …but things/relationships is all there is…
  …and that’s all that science is about!
  so measurement is rather central to science
Measurements are typically, but not exclusively, numerical
Not all types of measurement are equivalent
  four different levels of measurement: nominal, ordinal, interval and ratio

Nominal Scales
The ‘lowest’ level of measurement
  nominal implies ‘names’
Just labels to stick things into categories and separate them
No implicit order
eg yes/no
eg shirt numbers of footballers
  1 – goalie, 10 – striker (but not ‘10x better’!)
  can serve to separate and provide some info
  if you refer to #10, it’s likely to be about a striker, not a goalie
eg blue, yellow, red, green etc
  no implicit order (though the underlying spectrum has one)

Ordinal Scales
The measurements can be ‘ordered’ (ranked)
eg the order of finishers in a race (first, second, etc)
  but the time between each can vary dramatically
  equal gaps not implied
eg the Likert scale (1 to 5)
  agree strongly, agree, neutral, disagree, disagree strongly
  again, the ‘gaps’ between each are not equal
  agree - neutral doesn’t ‘equal’ neutral - disagree

Interval Scales
The ‘gaps’ (intervals) between units of measurement are equal
eg the Centigrade scale
  the temp difference between 20 and 30 °C is the same as between 10 and 20 °C
  though they might not ‘feel’ that way!
There is no absolute reference point
  0 °C is arbitrary
  water’s freezing point used to define the baseline
Much more common in science

Ratio Scales
An interval scale that has an absolute reference point
eg the Kelvin temperature scale
  0 K is -273.15 °C
  the reference point is absolute
  absolute zero (0 K) is, well, absolute!
eg for our everyday lives, time is a ratio scale
  zero time is absolute
Like interval scales, ratio scales are common in science
These measurement scales are important as they determine the types of data handling/statistics that can be performed

Summaries of Data
Scientists seldom take single measurements
Need repeated measurements to…
  minimise error
  permit extrapolation to the general case
  eg my eyes are blue
     32 out of 100 subjects studied had blue eyes
     32% of the general population have blue eyes
Data is plural (datum is singular)
Could just report all measurements…
  contains unadulterated ‘info’ about what you did
  but doesn’t carry a ‘message’ about the findings
  can’t see the wood for the trees

Summaries of Data
Here is a set of ordered data…
  17, 18, 18, 18, 19, 19, 20, 21, 21
Mode: the most common value(s) of a list of data (18)
Median: the central value in the ordered data (19)
Mean: sum of values / sample size (171 / 9 = 19)
Range (or maybe Maximum and Minimum): highest minus lowest (21 – 17 = 4)
  starts to indicate variability, but biased by extremes

Indicating Variability
This is an important aspect of measurement
Need a way to summarise data both in terms of ‘central tendency’ and ‘spread’
  17, 18, 18, 18, 19, 19, 20, 21, 21
  and 18, 19, 19, 19, 19, 19, 19, 19, 20
  and 19, 19, 19, 19, 19, 19, 19, 19, 19
  all have the same mean
Common pairings
  mean and standard deviation
  median and quartiles
Measures of variation covered in detail elsewhere

Summary
Measurements are labels assigned to things to explain relationships
Four levels…
  Nominal – names; no inherent order
  Ordinal – ordered; ‘gaps’ not equal
  Interval – ordered; ‘gaps’ are equal
  Ratio – ordered; equal gaps; absolute ref point
Summaries of data needed to ease interpretation
  eg mode, mean, median, range
Need indicators of ‘spread’ as well as ‘centre’
  eg range, max, min, standard deviation

Practice & Communication of Science
Probability

What is Probability?
We cannot know everything about everything
  we cannot measure everything
  our measurements are prone to error
So uncertainty is a central feature of science
  uncertainty in observations/measurements
  uncertainty in explanations
  uncertainty in predictions
Probability is a way of quantifying (un)certainty
  scale of 0 to 1 (or 0% to 100%)
Probability reflects random influences
  ‘randomness’ reflects our lack of knowledge

Randomness Predictable Rules
Individual outcomes cannot be predicted, but repeated runs are very predictable
  eg individual coin-toss: H or T
  ‘infinite’ repeats: 50:50 H:T (if fair)
Modelling a system in terms of probabilities can be done through observation or from theory
  throwing dice: theory
  Red vs Blue in sport: observation (Hill & Barton)

Frequencies and Probabilities
Red and Blue football teams
  in 140 matches, Red won 60 and drew 30
  relative frequency of a Red win = 60/140
  probability of Red winning (in the future) is also 60/140 = 0.429
For throwing a die
  relative frequency of getting a ‘3’ is 1/6
  probability is also 1/6 = 0.167

Combining Probabilities
For independent events, eg probability of throwing a five and then a two
  multiply the individual probabilities
  P(A and B) = P(A) x P(B)
  eg 1/6 x 1/6 = 1/36 = 0.028
For incompatible (mutually exclusive) events, eg probability of throwing a five or a two
  add the individual probabilities
  P(A or B) = P(A) + P(B)
  eg 1/6 + 1/6 = 1/3 = 0.333

Probability and common sense
In playing the lottery, which choice of numbers is more likely to win?
  3, 5, 15, 27, 29, 44
  1, 2, 3, 4, 5, 6
I won the lottery last week; the chances of me winning the lottery this week are…
  less, the same, greater?
I won the lottery last week; the chances of me winning the lottery twice in a row are…
  less, the same, greater?

Probability and common sense
What are the chances of two people on a football field sharing the same birthday?
  1%, 11%, 21%, 31%, 41%, 51%
For 23 people, prob of not sharing b’day with previously considered people is…
  person 1: 365/365
  person 2: 364/365
  …
  person 23 (includes ref!): 343/365
Multiply them all together: 0.493
1 – 0.493 = 0.507 = 51%

Probability and common sense
Linda is thirty-one, single, outspoken and very bright
She studied political science at Uni; she was concerned with discrimination and social justice, and took part in CND demonstrations
Which of the following statements about Linda is more likely?
  Linda works as an estate agent
  Linda works as an estate agent and is active in the feminist movement

Probability is context-sensitive
In tossing a coin ten times, which sequence is most probable?
  HTTHHTHTHT
  HHHHHHHHHH
In coin-tossing, which sequence is more probable?
  a mix of heads and tails
  10 heads in a row: 1 in 1024

Probability is context-sensitive
Derren Brown can toss a coin heads 10x in a row
Incredible motor control over ‘random’ variables?

Probability can be counter-intuitive
Flip a coin three times to get HH or TH
Are the two outcomes equally probable?
  HHH: HH first
  HHT: HH first
  HTH: TH first
  HTT: neither
  TTT: neither
  TTH: TH first
  THT: TH first
  THH: TH first

Probability can be counter-intuitive
Defendant’s DNA match was 1 in 1 billion
Lab’s false positive error rate (not disclosed) is 1%
What is the probability of the defendant being falsely convicted on that evidence?
  1 in 1 billion
  1 in (1 billion x 1%) = 1 in 10 million
  1 in (1 billion x 99%) = 1 in 990,000,000
  1 in 100.000000001

Probability can be counter-intuitive
The Monty Hall conundrum
  You are on a game show
  You have a choice of three doors
  Behind one is a car, behind the other two are goats
  You choose a door
  The host (who knows where the goats are) opens one of the other doors to show you a goat
Should you now change the door you have chosen?

Probability can be counter-intuitive
A mother gives birth to twins (not identical)
What is the chance they will be of different sexes?
  25% 33% 50%
A mother gives birth to twins (not identical)
One is a girl
What is the chance that they will both be girls?
  25% 33% 50%

Conditional Probability
Previous questions not conditional/conditional…
Not cond - A mum gives birth to twins (not identical). What is the chance that they will both be girls?
Cond - A mum gives birth to twins (not identical). What is the chance that they will both be girls if one is a girl?
The if clause makes all the difference
  it provides extra information that alters the odds
The influence of additional info on odds was developed by Thomas Bayes (b 1702)
  Bayesian odds
  prior prob (1 in 4) and posterior prob (1 in 3)

Three variations on a theme…
A family has two children; what are the chances that both children are girls?
  1 in 4, 1 in 3, 1 in 2?
A family has two children; what are the chances that both children are girls if one is a girl?
  1 in 4, 1 in 3, 1 in 2?
A family has two children; what are the chances that both children are girls, if one is a girl named Florida?
  1 in 4, 1 in 3, 1 in 2?

Three variations on a theme…
2 children; P that both children are girls?
  GB, BG, BB, GG: 1 in 4
2 children; P both girls if one is a girl?
  GB, BG, BB, GG with BB excluded: 1 in 3
2 children; P both girls if 1 girl named Florida?
  (GF = girl named Florida, GNF = girl not named Florida)
  BB, BGF, BGNF, GFB, GNFB, GFGNF, GNFGF, GNFGNF, GFGF
  with BB, BGNF, GNFB and GNFGNF excluded (and GFGF vanishingly rare): 1 in 2

Conspiracy Theories and Probability
Are these equivalent?
  1) The P of a series of events happening if due to a huge conspiracy
  2) The P of a huge conspiracy existing if a series of events happened
P of 1) > P of 2)
  a ‘single’ explanation vs ‘many’ other explanations
  think 9/11, moon landings, paranoia, confabulation
Bayes’ theorem supports this non-correspondence

Conditional Probability and Testing
A test for disease ‘X’ comes back positive
And the false-positive rate is low at only 1 in 1000
Only a 0.1% chance of not having the disease?!
But is…
  the chance of not having the disease if I tested positive
  …the same as…
  the chance of testing positive if I didn’t have the disease?
No. Think of the ‘sample space’
  ‘categories’ of people tested

The Test’s Sample Space
1) tested +ve and have ‘X’ (true positives)
2) tested +ve but don’t have ‘X’ (false positives)
3) tested –ve and don’t have ‘X’ (true negatives)
4) tested –ve and have ‘X’ (false negatives)
Reported false positive rate is 1 in 1000
Incidence rate: say 1 in 10,000 tested have the disease (and false neg effectively 0)
  incidence rate not usually mentioned/considered
For 10,000 tested, there will be about 10 false positives and only 1 true positive
  so 10/11 chance of not having the disease!

Summary
Probability estimates the odds of future events based on theory or observation
Probability cannot predict an individual event
Probability can predict the pattern of events
Probability, P, runs from 0 to 1 or 0% to 100%
Probability is often not ‘intuitive’, it fools us
Combining probabilities
  Independent events: P(A and B) = P(A) x P(B)
  Incompatible events: P(A or B) = P(A) + P(B)
Conditional probabilities (prob of A if B)
Bayesian probability
  prior probability + extra info gives posterior probabilities
  prob of A if B often different to prob of B if A
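
Worked check (not from the original slides)
The arithmetic behind several of the results above can be checked with a short Python sketch. It uses exact fractions and makes the same simplifying assumptions as the slides: a fair die, 365 equally likely birthdays, boys and girls equally likely, and no false negatives in the disease test. The function and variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import prod


def p_all_birthdays_differ(n_people, days=365):
    """Probability that n_people all have different birthdays (leap years ignored)."""
    return prod(Fraction(days - k, days) for k in range(n_people))


def p_healthy_given_positive(incidence, false_positive_rate):
    """P(no disease | positive test), assuming every true case tests positive."""
    true_pos = incidence  # false negatives taken as 0, as on the slide
    false_pos = (1 - incidence) * false_positive_rate
    return false_pos / (true_pos + false_pos)


def p_both_girls(condition):
    """P(two girls | condition) over the equally likely families BB, BG, GB, GG."""
    families = [f for f in product("BG", repeat=2) if condition(f)]
    return Fraction(sum(f == ("G", "G") for f in families), len(families))


# Combining probabilities with a fair six-sided die
p_five = Fraction(1, 6)
p_two = Fraction(1, 6)
p_five_then_two = p_five * p_two  # independent events: multiply
p_five_or_two = p_five + p_two    # incompatible events: add

# Birthday question for 23 people on the pitch
p_shared_birthday = 1 - p_all_birthdays_differ(23)

# Disease test: incidence 1 in 10,000, false-positive rate 1 in 1000
p_false_alarm = p_healthy_given_positive(Fraction(1, 10_000), Fraction(1, 1000))

print("five then two:", p_five_then_two)
print("five or two:", p_five_or_two)
print("shared birthday:", round(float(p_shared_birthday), 3))
print("both girls:", p_both_girls(lambda f: True))
print("both girls if one is a girl:", p_both_girls(lambda f: "G" in f))
print("healthy despite positive test:", round(float(p_false_alarm), 3))
```

The exact value for the disease test is 9999/10999, which the slide rounds to 10/11. The gap arises because 9,999 healthy people in 10,000 give 9.999 expected false positives, not a whole 10.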