Data Analysis
Does the data support the hypothesis?
• Once the appropriate inquiry method has been selected and the data gathered, the last and most important step is to determine what the data is telling you.
• The key method of data analysis is statistics.

Data analysis
• Statistics – the use of mathematics to organize, summarize, and interpret numerical data. There are two kinds:
• Descriptive statistics – used to organize and summarize data.
• Inferential statistics – used to interpret data and draw conclusions.

Descriptive statistics
• Statistics can be used to organize data so it clearly describes what has occurred.
• There are three main descriptive statistics:
– Central tendency
– Variability
– Coefficient of correlation

Central tendency
• Central tendency – patterns of frequency, predictability, or typical results in a set of numerical data. There are three measures:
• Mode
• Median
• Mean

Mode – the most frequent score in a distribution.
• The mode is not affected by extreme scores.
• The mode is a quick method of describing central tendency; however, it is not particularly useful or descriptive of the data.
• What is the mode for the following set of scores?
• 32 32 35 36 38 38 39 39 39 40 40 42 45

On a frequency graph, the mode would be the highest point.
If the two most frequently occurring scores occur the same number of times, then the distribution is bimodal.
• E.g.:
• 32 32 32 36 37 38 38 39 39 39 40 40 42 45

Median – the score that falls exactly in the centre of a distribution of scores.
• It is the score value that cuts the distribution set (n) in half: half the scores fall above it and half fall below.
• Rank order the data set from lowest to highest.
• If there is an odd number of values (n), then the median is simply the middle number (here the 8th of 15 values, 38):
32 32 35 36 36 37 38 38 39 39 39 40 40 45 46
• If there is an even number of values (n), then the midpoint can be found by adding the two middle values together and dividing by 2 (here (38 + 39) / 2 = 38.5):
32 35 36 36 37 38 38 39 39 39 40 40 42 45

Mean – the arithmetic average of the scores in a distribution.
• The sum of the scores divided by the number of scores.
• Sensitive to extreme scores.
• Most commonly used way of describing a data set.
• Mean is not necessarily the average – mode and median can also be called the average.
• As is commonly known, kiwi birds are native to New Zealand. They are born exactly one foot tall and grow in one-foot intervals. That is, one moment they are one foot tall and the next they are two feet tall. They are also very rare. An investigator goes to New Zealand and finds four birds. The mean height of the four birds is 4, the median is 3, and the mode is 2. What are the heights of the four birds?

Skew
• When the mean, median and mode fall at the same point in the distribution, the results are symmetrical.
• If the results are asymmetrical, the distribution is said to be skewed. The mean will be pulled away from the median and the mode by the extreme scores.
• Positive skew – asymmetrical distribution with the tail pointing in the positive direction.
• Negative skew – asymmetrical distribution with the tail pointing in the negative direction.

Variability
• Is what you have recorded accurate? Does it support your hypothesis?
• Variability – how much the scores in a data set vary from each other and from the mean. It is the dispersion, or spread, of scores.

Standard Deviation
• Standard deviation – an index of the amount of variability in a data set.
• When the standard deviation is large, the variability is great; when it is small, the variability is small as well.

Correlation
• Is there a relationship between two variables?
If there is, how reliable is it, and can I make certain predictions with this data?
• Correlation – when two variables are related to each other. Correlations can be positive (the variables move in the same direction) or negative (they move in opposite directions), and weak or strong.
• Scatterplot – a graph with one variable (conventionally the independent variable) represented on the X axis and the other variable on the Y axis. Individual results are plotted to observe a correlation.
• Positive correlation – a high value of variable X means a high value of Y.
• Negative correlation – a high value of X means a low value of Y.
• Correlation coefficient – a numerical index of the degree of relationship between two variables. It indicates which direction (positive or negative) the relationship works and how strongly (0 to 1.0 in absolute value) the two variables are related.

Causal fallacy
• Causation – where one variable causes a change in another. Even though two variables are strongly correlated, we do not know HOW they are related; they may not have a causal relationship.

Testing the hypothesis
• Null hypothesis (H₀) – the statement of a zero (or null) difference that is statistically tested. It is the negative version of the original claim of a study. We use this because, statistically, we test the negative version of the hypothesis to see if we accept or reject it.
• Alternative hypothesis (H₁) – the statement we accept if the statistical test leads us to reject the null hypothesis.

Error
• Type I error – the mistake of rejecting the null hypothesis when it is true (symbol – α)
• Type II error – the mistake of failing to reject the null hypothesis when it is false (symbol – β)
• To decrease both α and β, we should increase the sample size
• We also use this error judgement to determine our acceptable error levels (.01, .05)
• We should form claims so that the most serious error is the Type I error
• The condition of equality should become the H₀

                               Null hypothesis is true    Null hypothesis is false
We decide to reject the null   Type I error               Right!
We decide to accept the null   Right!                     Type II error

Confirm the null hypothesis
• Test statistic – a computed value based on the sample data that is used to make the final determination confirming or rejecting the null hypothesis
• Critical region – the set of all values of the test statistic that would cause us to reject the null hypothesis
• Critical values – the value or values that separate the critical region from the values of the test statistic that would not lead us to reject the null hypothesis. This will depend on:
– the type of hypothesis (one or two tailed)
– the sampling distribution (normal or skewed)
– the level of significance (type of possible error and consequence)
α t,p,z σ – determined error value

Test statistic
• z – calculation:
• t – calculation:
• The difference between both is the size of the sample or population under investigation

Nested vs. Crossed designs
• When the means of two groups of scores are calculated and compared for critical difference…
• If different subjects are used in each group, then the design is nested, or between subjects
• If the same subjects are used for different levels of treatment, the design is crossed, or within subjects

Degrees of freedom
• Based on the size of the sample
• Affects the height of the distribution
• Along with the type of hypothesis, used to calculate the critical region
• df = N − 1

Is our experiment statistically significant?
What we need:
• N – number of subjects
• X̄ – mean
• SD – standard deviation (or s² – variance)
• Step 1 – find the standard deviation of the sample set
• Step 2 – calculate the difference in variance
• Step 3 – calculate the t-value
• Step 4 – Using an error level of .01 and a one-tailed t-test, the value of t found in the t-tables would be:

Apply this formula to our experiment…
• In groups, you are to compare the mean of the silence scores to the mean of one other assigned group of scores.
• 6 trials, 1 silent = 5 comparisons
• ×2 groups for each comparison (remember inter-rater reliability)
• Report your t-obs data on the white board.
• The t score you calculate

Chance…
• Statistical significance – is said to exist when the probability that the observed findings are due to chance is very low.
• For psychological findings, results are considered statistically significant when this probability (called the p-value) falls below the threshold of 5%, or .05. That is, no more than 5 chance results in 100.

Apply these principles to the Memory study…
• Is there a statistically significant difference between the mean of the silence scores and the mean of all the other music types?
• What should be done about our results?

Your homework…
• Identify one study you would like to perform on the population of the school. You must explain:
• The hypothesis
• The type of study
• The independent and dependent variables
• Your anticipated result
• How that result would change the way the school is run.
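
Worked sketch (not from the slides): the t-obs comparison between the silence scores and one other group could be computed roughly as below. This is only an illustration under stated assumptions. The transcript does not preserve the t formula, so the sketch assumes a nested (between-subjects) design and uses the unpooled form t = (mean A − mean B) / √(s²A/nA + s²B/nB). The function name t_observed and both score lists are hypothetical, made up for the example.

```python
import math
import statistics


def t_observed(group_a, group_b):
    """Return t for two independent groups, using an unpooled standard error.

    Assumption: different subjects in each group (nested design).
    """
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    # statistics.variance returns the sample variance s² (divides by N - 1).
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


# Made-up recall scores for illustration only (not data from the study).
silence = [12, 14, 11, 15, 13, 13]
music = [10, 11, 9, 12, 10, 8]

t_obs = t_observed(silence, music)
print(f"t-obs = {t_obs:.3f}")
```

The resulting t-obs would then be compared with the critical value from the t-tables for the chosen error level and number of tails. If the same subjects were tested under both conditions (a crossed design), a paired calculation would be needed instead.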