GLOSSARY FOR THE SCIENTIFIC PROCESS
Data – Pieces of information about the subject of an investigation. (data = plural;
datum = singular)
Discrete Data – Each point can be only a whole number. Cats would be discrete units
because there is no possibility of a fraction of a cat. We count or tally these data.
Chi-square analyses are based on discrete data.
Continuous Data – Points taken along a scale that can be infinitely subdivided. Time,
weight, and temperature are examples. We measure these data.
Categorical Data – Each point falls into a non-numeric group or category, e.g., male or
female.
Distribution – The pattern of occurrence of a set of data.
Class – A group of data points whose limits are set by an upper and a lower value. Used
in making a frequency distribution. We divide the overall range of the values in our data
set into a number of classes and count the number of data points that fall into each of
these classes.
Frequency Distribution – The pattern of number of measurements that fall into each
class.
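The classing procedure described above can be sketched in Python; the measurements, class limits, and number of classes below are hypothetical, chosen only for illustration:

```python
# Divide the overall range of a data set into classes and count the
# number of data points that fall into each class (a frequency distribution).
data = [2.1, 3.4, 3.7, 4.0, 4.2, 4.5, 5.1, 5.3, 6.0, 6.8]  # hypothetical values
low, high, n_classes = 2.0, 7.0, 5
width = (high - low) / n_classes  # each class spans 1.0 unit

counts = [0] * n_classes
for x in data:
    i = min(int((x - low) / width), n_classes - 1)  # top value goes in the last class
    counts[i] += 1

for i, c in enumerate(counts):
    print(f"class {low + i * width:.1f}-{low + (i + 1) * width:.1f}: {c}")
```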
Normal Distribution – Data in a frequency distribution spread out equally on either
side of a central high point. Its appearance is that of a symmetrical bell-shaped curve
with the tails of the curve extending infinitely far in both positive and negative
directions. Central to probability in analytical statistics. (See below.)
Experimental Design – The formal design worked out to test the prediction of a hypothesis.
It designates the independent variables to be manipulated so that the resulting responses
in the dependent variables can be analyzed.
Independent Variable – The factor to be varied by direct manipulation by the
investigator or by natural categorization in the experiment. It is expected to cause an
effect in the dependent variable. In a graph this variable occurs on the x (horizontal)
axis.
Dependent Variable – The variable whose response we measure in the experiment. It
is expected to result from variation in the independent variable. In a graph this variable
occurs on the y (vertical) axis. It may be a continuous or discrete variable.
Treatment – One of the categories varied in a categorical independent variable.
Control – A special treatment in a manipulative experiment. It is the standard, the
group left unmanipulated, and provides baseline comparative data to evaluate the effect
of the manipulated treatment.
Item Measured/Counted – The item that is measured or counted in an experiment (e.g.
a tree, a leaf, a quadrat, a population).
Sampling Unit – The number of items measured/counted included in one datum point.
(e.g. number per sampling unit; number of trees/quadrat; number of seeds/dish)
Replicates (N) – The number of sampling units in each treatment.
Experimentation – Methods used to test predictions of hypotheses.
Manipulation – Alterations in the independent variable are created by the investigator.
Observation – Natural variation in the independent variable occurs, requiring no
alteration, only direct observation of the dependent variable by the investigator.
Measurement – Performed on continuous variables.
Count or Tally – Performed on discrete and categorical variables.
Graph (Figure) (Chart) – A diagram that represents the variation of a variable in
comparison with that of one or more other variables.
Axes – Horizontal (x axis, abscissa) for independent variable(s).
Vertical (y axis, ordinate) for dependent variable(s).
Bar Graph – Used when the independent variable is categorical or divided into classes
(otherwise known as Column Graph).
Box and Whisker Graph – Used to display differences among treatments in means,
ranges, and standard deviations.
Column Graph – (see Bar Graph).
Histogram – Used to display frequency distributions. Classes of measurement occur on
the x axis and frequency in each class on the y axis.
Line Graph – Used when the x axis represents a continuous variable. Sometimes the x
variable is the independent variable. Other times, as in showing a correlation, neither x
nor y variables are designated as independent or dependent variables.
Hypothesis – A formal statement of a possible explanation for an observed phenomenon. A
“might-be” about the way the world works. It leads to predictions.
Null Hypothesis – A statistical hypothesis stating that there is no association between two
variables or no difference among means. (e.g. H0: A=B)
Alternative Hypothesis – A statistical hypothesis stating the pattern in the data that is
expected if the prediction holds true. (e.g. HA: A>B)
Speculation – A first informal attempt at explaining an observed phenomenon.
Prediction – A consequence expected by the logic of the hypothesis. An experiment
arises out of the predictions.
If…,then logic – A formal conditional statement of the hypothesis and prediction that
uses deductive logic. If the phenomenon I observed can be explained in this way, then
these consequences should occur. The “if” clause contains the hypothesis and the “then”
clause the prediction that is to be tested.
Cause…Effect – In a manipulation experiment, we test whether the independent
variable causes an effect (response) in the dependent variable.
Testability – The hypothesis must be susceptible to testing through the scientific
process, where science is limited to the study of the physical world.
Assumption – A fact that is taken for granted in the experiment. If the experiment fails
to falsify the hypothesis, the assumption may not have been true and now itself would
need to be tested.
Population – Any set of individuals or objects having some common observable
characteristic. The unit from which the data sample is taken.
Sample or Sample Set (N) – The sub-set of the population measured or counted in the
experiment.
Random Sample – A sample taken with no bias.
Scientific Process or Method – The logical process by which scientific information is
gathered by asking and answering questions about the physical world.
Statistics – A tool used 1) to describe trends and relationships and 2) to decide whether to
accept or reject a hypothesis based on the probability of whether the results of an
experiment could have occurred by chance or not.
Descriptive Statistics – A summary of data in a variable that provides information about its
central tendencies and dispersion.
Parameter – A measurable characteristic of a given distribution, e.g., mean, variance,
standard deviation.
Central Tendency – A measurement that represents the center point of a data set.
Mean (x̄) – The numerical average of a data set. Calculated by adding up the values of
a sample and dividing by the number of observations, N. A mean can be strongly
affected by extreme readings. When reported alone, the mean may not be very
meaningful. If the data are very skewed or bimodal, the mean might be deceptive.
Always report a measure of dispersion as well.
Median – The middlemost value of a data set. If all data points are organized from
smallest to largest, the median is the middle point. It is less susceptible to distortion by
an extreme reading.
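A quick sketch of the mean's sensitivity to an extreme reading, compared with the median; the data set is hypothetical:

```python
import statistics

data = [2, 3, 3, 4, 5, 100]  # hypothetical data with one extreme reading

mean = statistics.mean(data)      # pulled far toward the extreme value
median = statistics.median(data)  # middle of the ordered set: (3 + 4) / 2
print(mean, median)  # 19.5 3.5
```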
Dispersion – A measurement of the spread of the data around the mean.
Range – The distance between the lowest and the highest values.
Variance (s²) – The square of the standard deviation (see below).
Standard Deviation (s) or S.D. – A sort of average of the deviation of all observed
values from the mean. If the S.D. is small, then most of the sample values lie quite close
to the sample mean, but if the S.D. is large, then many of the sample values lie rather far
from the sample mean. In a data set that fits a normal distribution, the S.D. can be
found graphically: drop a vertical line to the horizontal axis from the point of
inflection of the normal curve and another from the mean point; the S.D. equals the
distance between these two points on the baseline. Of the data from a normally
distributed population, about 34% falls within 1 S.D. on one side of the mean,
47.5% within 2 S.D., and 49.87% within 3 S.D. One S.D. on either side of the mean
includes 68% of the data, 2 S.D. includes 95%, and 3 S.D. includes 99.74% of the data.
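The 68%/95% figures above can be checked numerically. The sketch below draws a large random sample from a normal distribution and counts the fraction of values within 1 and 2 S.D. of the sample mean (the sample size and seed are arbitrary choices):

```python
import random
import statistics

random.seed(42)  # arbitrary seed, for reproducibility
sample = [random.gauss(0, 1) for _ in range(100_000)]  # normal: mean 0, S.D. 1

m = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation

within_1 = sum(abs(x - m) <= s for x in sample) / len(sample)
within_2 = sum(abs(x - m) <= 2 * s for x in sample) / len(sample)
print(f"within 1 S.D.: {within_1:.3f}")  # close to 0.68
print(f"within 2 S.D.: {within_2:.3f}")  # close to 0.95
```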
Standard Error (S.E.) – A measure of how much the sample mean varies from sample to
sample; the interval of one S.E. on either side of the sample mean contains the
population mean about 68% of the time.
S.E. = standard deviation divided by the square root of N, the sample size.
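The formula reads directly as code; the sample below is hypothetical:

```python
import math
import statistics

sample = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 12.0, 10.0]  # hypothetical data
s = statistics.stdev(sample)     # sample standard deviation
se = s / math.sqrt(len(sample))  # S.E. = s / sqrt(N)
print(f"mean = {statistics.mean(sample)}, S.E. = {se:.3f}")
```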
Analytical (Comparative) Statistics – A series of tests used to examine different kinds of
data to determine whether or not to accept or reject a hypothesis.
Null hypothesis (H0) – A statistical hypothesis stating that there is no difference
between treatments (populations) in an experiment. Statistical tests are designed to see
whether or not you can reject your null hypothesis.
Alternative Hypothesis (H1) – A statistical hypothesis stating that there is a difference
between treatments (populations) in an experiment. The statistical test is designed to
provide support for the alternative hypothesis only if the null hypothesis is rejected.
There may be more than one alternative hypothesis to explain an observed phenomenon.
Probability – A mathematical theory that provides a basis for the evaluation of the
reliability of the conclusions and inference based on the data.
Level of Significance (probability value) (alpha level) – The probability of making a
Type I error. It furnishes the probability basis upon which we accept or reject a
hypothesis. The size of the discrepancy between the value of the null hypothesis and the
alternative hypothesis provides the basis of judging the probability of obtaining the
discrepancy. Small discrepancies from a valid hypothesis due to sampling error
(chance) are common; large discrepancies are rare. If we assign values to the level of
discrepancy, we say that with a true null hypothesis large discrepancies occur only 5%
of the time and small discrepancies the remaining 95% of the time. Using this arbitrary
criterion, we can then propose to reject the null hypothesis if the discrepancy is so large
that it occurs only 5% of the time by chance. The 5% frequency value that enables us
to reject the null hypothesis is called the 5% level of significance. When alpha is set at
.05, the chances are 1 out of 20 that a true null hypothesis will be accidentally rejected.
A small value for alpha is used to provide protection against rejecting true null
hypotheses.
Degrees of Freedom – The number of independent classes. The number of classes
about which you need information in order to know the distribution of data points in all
classes.
Statistically Significant – The discrepancy between null and alternative hypotheses is
so large (occurs less than 5% of the time by chance) that it causes us to reject the null
hypothesis.
Statistically Non-significant – The discrepancy between the null and alternative
hypotheses is so small (occurs more than 5% of the time by chance) that it causes us to
accept the null hypothesis.
Sampling Error – The fact that by chance, and chance alone, the characteristics of a
sample differ from those of the population.
Type I Error – The rejection of a true null hypothesis. If you flip a fair coin and get
ten heads in a row, your H0 : 10:0=50:50 would be rejected and you would conclude that
the coin was unfair.
Type II Error – The failure to reject a false H0. If you flip a coin and throw 4 heads
and 6 tails, you would fail to reject the null hypothesis that 4:6=50:50. A much larger
sample size, giving results of 400:600 would have caused you to reject the null. If the
coin really was unfair but you had concluded that it was fair on the basis of your toss of
4:6, you would have made a Type II error by failing to reject a false null hypothesis.
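The coin examples above can be worked out exactly with the binomial distribution. This sketch computes the probability, under a true null hypothesis of a fair coin, of a head count at least as far from 50:50 as the one observed:

```python
from math import comb

def two_sided_p(heads, flips):
    """Probability, under H0 (a fair coin), of a head count at least as far
    from the expected flips/2 as the observed one."""
    expected = flips / 2
    dev = abs(heads - expected)
    return sum(comb(flips, k) * 0.5 ** flips
               for k in range(flips + 1)
               if abs(k - expected) >= dev)

print(two_sided_p(10, 10))  # 10 heads in a row: ~0.002 < .05, reject H0
print(two_sided_p(4, 10))   # 4 heads, 6 tails: ~0.754 > .05, fail to reject H0
```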
Proof – The scientific process and statistical analysis are designed to reject (falsify)
hypotheses. There is no such thing as proof or proving a hypothesis. No matter how
many times a hypothesis is confirmed, the next observation may prove it to be false. We
“accept” a hypothesis as if it were true if it has been reasonably confirmed and there is
no evidence to contradict it; but it may be shown to be false at any time.
Falsify – The scientific process and statistical analysis are designed to indicate whether
the hypothesis is accepted (not proved) or falsified (shown to be incorrect).
Conclusion – The final step of the scientific process. Based upon statistical evidence, a
decision is made as to whether the hypothesis is accepted or falsified by the data sets.
Variable – Some factor that can have more than one value.
Categorical Variable – A discrete variable in which a value takes on a certain
non-numeric state or category. Here, the mean of categories usually doesn’t make much
sense, e.g., a flower is caged or non-caged. Used in contingency tables and chi-square
(χ²) tests.
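A minimal sketch of the chi-square statistic for categorical counts; the observed counts and the expected 50:50 split are hypothetical:

```python
# Chi-square goodness-of-fit: sum of (observed - expected)^2 / expected.
observed = [60, 40]  # hypothetical counts, e.g., caged vs. non-caged
expected = [50, 50]  # expected under H0 (no difference between categories)

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)  # 4.0; exceeds the 3.84 critical value (1 d.f., alpha = .05)
```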
Continuous Variable – An analytical variable that can take any value limited only by
our ability to differentiate values. Here, the mean of a sample has some meaning, e.g.,
length of time a bee visits a flower. Used in t-tests, ANOVA, regression, and
correlation.
Discrete Variable – A variable that can take a limited number of values, those with a
separate identity that cannot be subdivided, e.g., population size.