Experiments: Method and Methodology

Mícheál Ó Foghlú
Executive Director Research, TSSG, WIT
March 2009

Revised Schedule

Sessions 01-05 to be delivered by Mícheál Ó Foghlú

Mon 12th Jan    Thomas Magedanz - Guest Lecture on IMS [DONE]
Wed 14th Jan    Presentations [DONE]
Wed 21st Jan    Presentations [DONE]
Wed 28th Jan    IPv6 Summit (Dublin Castle) [DONE]
Wed 4th Feb     EMPTY
Wed 11th Feb    Session 01 [DONE]
Wed 25th Feb    EMPTY
Wed 4th Mar     Session 02 [DONE]
Wed 11th Mar    Session 03 [DONE]
Wed 18th Mar    EMPTY
Wed 25th Mar    EMPTY
Wed 1st Apr     Session 04 [Today]
Wed 22nd Apr    Session 05

Copyright © Mícheál Ó Foghlú 2009

Schedule Detail

01 What is research?
– Philosophy, Epistemology, Methodology and Method
02 How to write academically?
– Some simple language rules
– Some simple structure rules
03 What’s the big deal with plagiarism?
– Bibliographies, references and citations, …
– Doing it in Word
– Doing it with other tools like LaTeX/BibTeX
04 Results - how to do experiments
– Support tools: simulation, data analysis, …
05 Discussion

Structure

Experimental Design (basics)
Statistical Analysis (basics)

Experimental Design

How to conduct a valid experiment.
http://www.slideshare.net/mrmularella/experimental-design

A Good Experiment

– Tests one variable at a time. If more than one thing is tested at a time, it won’t be clear which variable caused the end result.
– Must be fair and unbiased. This means that the experimenter must not allow his or her opinions to influence the experiment.
– Does not allow any outside factors to affect the outcome of the experiment.
– Is valid. The experimental procedure must test your hypothesis to see if it is correct. If the procedure does not test your hypothesis, the experiment is not valid and the data will make no sense!
– Has repeated trials.
Repeating the trials in the experiment will reduce the effect of experimental errors and give a more accurate conclusion.

Variables

A variable is anything in an experiment that can change or vary. It is any factor that can have an effect on the outcome of the experiment. There are three main types of variables.

3 Kinds of Variables: Independent Variable (IV)

Something that is intentionally changed by the scientist.
– What is tested
– What is manipulated
– Also called a “Manipulated Variable”
– You can only change ONE independent variable at a time in an experiment!

To determine the independent variable, ask yourself: “What is being changed?”
Finish this sentence… “I will change the _____________”

Independent Variable: Levels of the IV

These are the different ways you will change the independent variable.
Example: Assume you are testing five brands of popcorn to see which has the most unpopped kernels. The IV would be the brand of popcorn. The five different brands would be the different levels of the IV.

3 Kinds of Variables: Dependent Variable (DV)

Something that might be affected by the change in the independent variable.
– What is observed and measured
– The data collected during the investigation
– Also called a “Responding Variable”

To determine the dependent variable, ask yourself: “What will I measure and observe?”
Finish this sentence… “I will measure and observe ________________”

Dependent Variable: Operational Definition

Define exactly how the dependent variable will be measured.
Example: Assume your DV in an experiment is “plant growth.” How will you measure this? It could be… Height (cm), mass (g), # of leaves, etc.
Be specific and include all necessary units!

3 Kinds of Variables: Controlled Variable (CV)

A variable that is not changed and is kept the same.
– Also called constants
– Allows for a “fair test”
– NOT the same as a “control”!
– Any given experiment will have many controlled variables

To determine the controlled variables, ask yourself: “What should not be allowed to change?”
Finish this sentence… “I will not allow the ______________ to change.”

Control

A group or individual in the experiment that does not receive the treatment being tested, but is used for comparison as a reference for what “normal” would be like.
Not all experiments have a control (though all experiments have controlled variables).
Example: If you tested different pollutants to see their effect on plant growth, the control would only receive water.

Example

Students of different ages were given the same jigsaw puzzle to put together. They were timed to see how long it took to finish the puzzle.

Identify the variables in this investigation!

What was the independent variable?
Ages of the students
– Different ages were tested by the scientist

What was the dependent variable?
The time it took to put the puzzle together
– The time was observed and measured by the scientist

What was a controlled variable?
Same puzzle
– All of the participants were tested with the same puzzle.
– It would not have been a fair test if some had an easy 30 piece puzzle and some had a harder 500 piece puzzle.

Another Example:

An investigation was done with an electromagnetic system made from a battery and wire wrapped around a nail. Different sizes of nails were used.
The number of paper clips the electromagnet could pick up was measured.

What are the variables in this investigation?

Independent variable: Sizes of nails
– These were changed by the scientist.
– They used different sizes of nails in their experiment to see what effect that would have.

Dependent variable: Number of paper clips picked up
– The number of paper clips was observed and counted (measured)

Controlled variables: Battery, wire, type of nail
– None of these items were changed
– They used the same battery, same wire, and same type of nail.
– Changing any of these things would have made it an unfair test.

Here’s another:

The temperature of water was measured at different depths of a pond.

Independent variable – depth of the water
Dependent variable – temperature
Controlled variables – same pond; same thermometer

Last one:

Students modified paper airplanes by cutting pieces off, adding tape, or adding paper clips to increase the distance thrown.

Independent variable – weight of plane, center of gravity, or air resistance (depended on student choice, but only one was tested)
Dependent variable – distance thrown
Controlled variables – same plane design; same paper; same throwing technique

Now let’s take what we know about these variables and use them in an experiment!

We are going to test how many drops of water will fit on different sized coins. Let’s think about how we could test this.
– Identify the variables
– What exactly will be changed? How will it be changed?
– What exactly will be measured? How will it be measured?

What are my variables?
Independent variable – size of the coin (penny, nickel, dime, quarter)
Dependent variable – amount of water held on coin (# of drops)
Controlled variables
– Same eye dropper
– Same water
– Same side of coin (pick heads or tails)
– Same technique (height/angle of dropper)

Statistical Analysis

http://www.slideshare.net/sababutt/statistical-analysis-of-datafinal-presentation

SIGNIFICANCE OF STATISTICS FOR ANALYSIS AND RESEARCH

STATISTICS IS NECESSARY FOR ALL FIELDS OF LIFE REQUIRING RESEARCH AND DATA ANALYSIS

In all fields of life we have to analyze facts and interpret them to draw conclusions. The analysis needs statistics to compare qualities and quantities and so help reach a conclusion, which leads to decision making in business, government, industry, etc., and to the development of theories in science.

BIOSTATISTICS IS A DISCIPLINE THAT IS CONCERNED WITH:

– designing experiments and other data collection,
– summarizing information to aid understanding,
– drawing conclusions from data, and
– estimating the present or predicting the future.

In making predictions, Statistics uses the companion subject of Probability, which models chance mathematically and enables calculations of chance in complicated cases.

SOME IMPORTANT DEFINITIONS

POPULATION AND SAMPLE

POPULATION: A population consists of an entire set of objects, observations, or scores that have something in common. For example, a population might be defined as all males between the ages of 15 and 18.

SAMPLE: A sample is a subset of a population. Since it is usually impractical to test every member of a population, a sample from the population is typically the best approach available.
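As a minimal sketch of the population/sample distinction, the Python snippet below builds a synthetic population and draws a random sample from it without replacement. The scores are made up purely for illustration, and the names `population` and `sample` are assumptions, not part of the original slides.

```python
import random
import statistics

# Hypothetical population: 1,000 synthetic scores (illustrative only,
# not data from the lecture).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(25.0, 4.0) for _ in range(1000)]

# A sample is a subset of the population, drawn here without replacement.
sample = rng.sample(population, 50)

population_mean = statistics.mean(population)  # computed over every member
sample_mean = statistics.mean(sample)          # computed over the subset only

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means will usually be close but not identical, which is why a value computed from a sample is treated as an estimate of the corresponding population value.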
PARAMETER AND STATISTIC

PARAMETER: A parameter is a numerical quantity measuring some aspect of a population of scores. For example, the population mean is a parameter that measures central tendency.

STATISTIC: A statistic is a numerical quantity (such as the mean) calculated from a sample.

MEASURES OF CENTRAL TENDENCY

Mean (Arithmetic Mean) – Average value of a sample or population
Median – Middle value of a sample or population
Mode – The value repeated most

The Arithmetic Mean, or Mean, is what is commonly called the average. When the word "mean" is used without a modifier, it can be assumed that it refers to the arithmetic mean. The mean is the sum of all the scores divided by the number of scores.

The formula for the population mean is $\mu = \Sigma X / N$, where $\mu$ = population mean and $N$ = number of scores.

If the scores are from a sample, then the symbol $\bar{X}$ refers to the mean and $n$ refers to the sample size, and the formula is written as $\bar{X} = \Sigma X / n$.

Median: The median is the middle of a distribution: half the scores are above the median and half are below the median. The median is less sensitive to extreme scores than the mean, and this makes it a better measure than the mean for highly skewed distributions.
Example scores: 5, 3, 4, 2.5, 6 (median = 4)

Mode: The mode is the most frequently occurring score in a distribution and is used as a measure of central tendency. The advantage of the mode as a measure of central tendency is that its meaning is obvious.
Example scores: 5, 3, 4, 5, 6 (mode = 5)

MEASURES OF DISPERSION

After measuring the central value (i.e., the mean), the next step is to find out how well this central value represents all the values, that is, to measure the scattering or dispersion of the data. There are certain measures which give values of dispersion.
The most important and widely used of these in research are:
– Variance
– Standard Deviation
– Standard Error of Mean

HYPOTHESIS TESTING

– T test
– F test
– ANOVA
– Correlation
– Regression

EXAMPLE OF DATA ANALYSIS

Comparison of weight-to-height ratio, expressed as Body Mass Index (BMI), in a population. BMI is calculated as weight in kg / (height in m)^2.
General surveys in the USA and Europe showed that the young population is overweight, which increases the risk of disease.
We surveyed the young female population of Punjab University for BMI. We measured the BMI of 400 randomly selected students.

Subject No.   BMI      Subject No.   BMI
M-1           36.66    F-1           30.11
M-2           20.21    F-2           28.00
M-3           30.29    F-3           16.87
M-4           29.33    F-4           38.94
M-5           31.97    F-5           35.63
M-6           27.58    F-6           32.69
M-7           25.33    F-7           23.92
M-8           26.90    F-8           25.55
M-9           27.74    F-9           30.87
M-10          27.01    F-10          43.43
M-11          26.82    F-11          35.34
M-12          22.65    F-12          19.65
M-13          31.90    F-13          36.45
M-14          30.81    F-14          34.35
M-15          20.84    F-15          34.15
M-16          25.19    F-16          38.86
M-17          22.98    F-17          26.28
M-18          28.68    F-18          29.52
M-19          22.73    F-19          24.99
M-20          22.86    F-20          29.75
M-21          27.73    F-21          34.58

ARITHMETIC MEAN

We have two tables of data: one giving the BMI of girls, the other the BMI of boys. These are long data tables, and we have to analyze them to conclude something from the data. What do we need now?
We need a measure of central tendency to indicate the average BMI, so that we can compare it with other populations, between boys and girls, and with the normal range. The most common and useful measure for this purpose is the Arithmetic Mean.
The Arithmetic Mean is calculated by taking the sum of all values and dividing it by the number of observations.

SAMPLING ERROR

Next, we have an average value, but is this average really representative of all the values? Is it possible that some values are very large and some very small? If so, the Mean is not representative of the whole data.
This contributes to sampling error: some students may have a strong genetic tendency to be overweight, so their values differ somewhat from the rest of the population. A sample that happens to include them can make our result erroneous, i.e., our Mean does not represent all the data.

EXAMPLE

We have four values: 2, 3, 4, 10
Mean = Sum of values / No. of observations
(2 + 3 + 4 + 10) / 4 = 19 / 4 = 4.75
This is far from three of the four values in the data, because of the one large value in the data, i.e. 10.

STANDARD DEVIATION

Now we need a statistical measure that tells us how well the Mean represents the individual values. This is the standard deviation: a measure of how much the individual values vary from the average value, i.e., the Mean.

Standard Deviation of that Data

$SD = s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Descriptive Statistics from MINITAB

Variable   N   Mean   Median   StDev   SE Mean
C1         4   4.75   3.50     3.59    1.80

T Test

Two Sample T-Test and Confidence Interval

Two sample T for BMI-F vs BMI-M

         N    Mean    StDev   SE Mean
BMI-F    30   31.35   6.26    1.1
BMI-M    21   26.96   4.11    0.90

95% CI for mu BMI-F - mu BMI-M: (1.5, 7.31)
T-Test mu BMI-F = mu BMI-M (vs not =): T = 3.02  P = 0.0040  DF = 48

Other Issues

Covered
– Basics of experimental design
– Basics of statistical analysis

Not covered - experimental design
– Block structured design (e.g. Latin Squares)
– Understanding experimental errors

Not covered - statistical analysis
– Understanding the T Test and the large battery of other tests (e.g. ANOVA)
– Assumptions of tests (e.g. that observations are normally distributed) and when it is invalid to use a test
– Discussion of significance

So this talk just scratched the surface!
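The MINITAB figures quoted in the statistical analysis slides can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `welch_t` is hypothetical. It assumes the two-sample test was run without pooling the variances (the reported DF = 48 is consistent with that, since a pooled test would give 30 + 21 - 2 = 49), and it works from the rounded summary statistics on the slide rather than the raw data. The p-value is not computed because the standard library has no t-distribution function.

```python
import math
import statistics

# Descriptive statistics for the four-value example (2, 3, 4, 10).
values = [2, 3, 4, 10]
n = len(values)
mean = statistics.mean(values)      # sum of values / number of values
median = statistics.median(values)  # middle of the sorted values
stdev = statistics.stdev(values)    # sample SD, divides by n - 1
se_mean = stdev / math.sqrt(n)      # standard error of the mean

print(f"N={n} Mean={mean:.2f} Median={median:.2f} "
      f"StDev={stdev:.2f} SE Mean={se_mean:.2f}")
# prints: N=4 Mean=4.75 Median=3.50 StDev=3.59 SE Mean=1.80


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and degrees of freedom, variances not pooled."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics as reported on the T Test slide (BMI-F vs BMI-M).
t_stat, df = welch_t(31.35, 6.26, 30, 26.96, 4.11, 21)
print(f"T={t_stat:.2f} DF={math.floor(df)}")
# prints: T=3.02 DF=48
```

Rounding the degrees of freedom down to a whole number is a choice made here to match the reported output; other tools may report the fractional value instead.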