Lecture 8/02
Population = a group of organisms of a given species that live in a particular area at a given time.
Sample = A subset of organisms extracted from a population. What is a random sample?
Data type
1. Quantitative (Numerical) variables
a. discrete = can take only whole-number values (counts of units).
Examples: number of children in a family, number of attacks of asthma per week
b. continuous = can take any value within a given range.
Examples: height, weight
2. Qualitative (Categorical) variables that can be
a. ordinal (Ordered categories) – examples: exam results (pass or fail), socio-economic status (low, middle or high)
b. nominal (Unordered categories) – examples: eye color, hair color
The Concept of "Distribution"
Measurements on any variable, even the same variable on the same subject, will always vary. The pattern of
variation of a variable is called its distribution. The distribution records all possible numerical values of a variable
and how often each value occurs (its frequency). The most common example of a distribution is the bell-shaped Normal distribution.
The Normal distribution is defined by two parameters, which are the mean and the standard deviation of the
population. The curves are symmetrically bell shaped, but the extent to which the bell is compressed or flattened
out depends on the standard deviation of the population.
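As a rough illustration of how the two parameters shape the curve, the Python sketch below evaluates the standard formula for the Normal density. The function name normal_pdf and the example values (mean 0, standard deviations 1 and 2) are assumptions chosen for illustration; they do not come from the lecture.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the Normal curve at x for a given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Illustrative values only: same mean, two different standard deviations.
peak_narrow = normal_pdf(0, mean=0, sd=1)  # taller, more compressed bell
peak_wide = normal_pdf(0, mean=0, sd=2)    # lower, flatter bell

# The curve is symmetric: points equally far from the mean have the same height.
left = normal_pdf(-1.5, mean=0, sd=1)
right = normal_pdf(1.5, mean=0, sd=1)

print(peak_narrow, peak_wide)
print(left, right)
```

Doubling the standard deviation halves the height of the peak while the total area under the curve stays the same, which is why the bell flattens out.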
The distribution of a variable is of biological interest because it can tell us whether there is selection for or against
organisms falling in one of the tails of the distribution.
If, in a sample of immature insects, we discover that the measurements are bimodally distributed (with two peaks), this
could indicate that different species are mixed together in the sample.
Descriptive statistics
A. Statistics of location – describe the position of a sample along a given dimension representing a variable.
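Mean = a widely used measure of central tendency. It is the sum of the measurements divided by the number of
measurements in the set.
Median = the middle value when the measurements are arranged in order of magnitude. It seems to be the preferred
measure for describing economic, sociological and educational data.
Why? Many of the frequency distributions of these measurements are skewed, and the median is less affected by
extreme values than the mean.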
Calculate the median:
Example: A sample of 7 students was given a reading achievement test. Find the median for the test scores:
86 78 90 62 73 89 95
First, we must arrange the scores in order of magnitude:
62 73 78 86 89 90 95
We have an odd number of measurements (7), so the median is the middle score: 86.
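Example: Suppose 3 more students took the achievement test and scored: 73 75 91. Determine the median.
62 73 73 75 78 86 89 90 91 95
We now have an even number of measurements (10), so the median is the average of the two middle scores:
Median = (78 + 86)/2 = 82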
Mode = the measurement that occurs most often in the set. It is useful in business for identifying products
that are in greatest demand.
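The three statistics of location can be checked with a short Python sketch using the standard-library statistics module. It reuses the reading scores from the worked examples above; the variable names are only illustrative.

```python
import statistics

scores = [86, 78, 90, 62, 73, 89, 95]       # first example: 7 students
more_scores = scores + [73, 75, 91]         # second example: 3 more students

mean_7 = statistics.mean(scores)            # sum of the measurements / number of measurements
median_7 = statistics.median(scores)        # odd count: the middle score
median_10 = statistics.median(more_scores)  # even count: average of the two middle scores
mode_10 = statistics.mode(more_scores)      # the score that occurs most often

print(sorted(scores), median_7)
print(sorted(more_scores), median_10, mode_10)
print(round(mean_7, 2))
```

With these scores the medians agree with the hand calculations (86 and 82), and the mode of the ten scores is 73, the only score that appears twice.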
B. Statistics of dispersion – describe how spread out the data are around the mean.
Range – the distance between the lowest and the highest reading.
Standard deviation – a summary measure of the differences of each observation from the mean; it is
the square root of the Variance.
Variance – the average of the squared deviations of each score from the mean (for a sample, the sum is divided by n - 1).
S2 = x-m)2 / n-1
add up
x= value for the variable
m = mean of the sample
n = total number of measurements or size of the sample
/ = divided by
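The statistics of dispersion can be worked through step by step in Python. This is a minimal sketch that assumes the seven reading scores from the median example as data; the variable names are illustrative, and the standard-library call at the end is only a cross-check.

```python
import math
import statistics

scores = [62, 73, 78, 86, 89, 90, 95]

# Range: distance between the lowest and the highest reading.
value_range = max(scores) - min(scores)

n = len(scores)                              # size of the sample
m = sum(scores) / n                          # mean of the sample
sum_sq = sum((x - m) ** 2 for x in scores)   # add up the squared deviations
variance = sum_sq / (n - 1)                  # sample variance, S^2
std_dev = math.sqrt(variance)                # standard deviation = square root of the variance

# Cross-check: statistics.variance also divides by n - 1.
lib_variance = statistics.variance(scores)

print(value_range, round(variance, 2), round(std_dev, 2))
```

Dividing by n - 1 rather than n is what makes this the sample variance; statistics.pvariance would divide by n instead.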
Hypothesis testing
1. Restate the research hypothesis as a statistical hypothesis and an alternative hypothesis.
2. The statistical hypothesis is called the Null Hypothesis, which is symbolized by (H0);
the alternative hypothesis is called the Alternative Hypothesis (H1).
3. You will test the Null Hypothesis.
4. Specify a statistical test to be used on the data.
5. Compare the numerical value calculated from the statistical test to a critical value derived from the specific
distribution.
6. Reject the Null Hypothesis if your calculated value is greater than or equal to the critical or tabled value.
7. State your results in statistical terms using probability: p < .05 (reject the Null Hypothesis) or p > .05
(fail to reject the Null Hypothesis), where p means probability.
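The steps above can be sketched in Python. The lecture does not name a particular test, so this example assumes a one-sample t-test, a hypothetical Null Hypothesis that the population mean reading score is 80, and the seven reading scores from the median example as data. The critical value 2.447 is the tabled two-tailed t value for a .05 significance level with 6 degrees of freedom.

```python
import math
import statistics

scores = [62, 73, 78, 86, 89, 90, 95]
hypothesized_mean = 80   # assumption: H0 states that the population mean is 80
critical_value = 2.447   # tabled t value, two-tailed, alpha = .05, df = n - 1 = 6

n = len(scores)
sample_mean = statistics.mean(scores)
standard_error = statistics.stdev(scores) / math.sqrt(n)

# Calculated value of the test statistic.
t_value = (sample_mean - hypothesized_mean) / standard_error

# Reject H0 if the calculated value is greater than or equal to the critical value.
reject_null = abs(t_value) >= critical_value

if reject_null:
    print(f"t = {t_value:.3f}: reject the Null Hypothesis (p < .05)")
else:
    print(f"t = {t_value:.3f}: fail to reject the Null Hypothesis (p > .05)")
```

The absolute value of t is compared with the critical value because the assumed test is two-tailed: a sample mean far above or far below 80 would count against the Null Hypothesis.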