Statistics in Theses
Dr. John P. Abraham
Professor
University of Texas Pan American
Describe an egg
 Students try to do this
Differences in description
 Children’s view
 Adults’ view
 Shopper’s view
 Seller’s view
 Producer’s view
 Chicken’s view
 Biologist’s view
 Dietician’s view
 Chemist’s view
Measurements
 You need to describe using some measurements
 Errors in measurements
Descriptive statistics
 Summarizing a collection of data in a clear and
understandable way.
 Numerical
 Graphical
Numerical descriptive statistics
 Spread
    Range
    Semi-interquartile range
    Standard deviation
 Central tendency
    Mean
    Median
    Mode
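The measures listed above can be computed with Python's standard library. This is a minimal sketch: the sample in `data` is made up for illustration, and quartile conventions differ between textbooks, so the semi-interquartile range may vary slightly with the method used.

```python
import statistics

# Hypothetical sample of measurements (not taken from the slides).
data = [2, 4, 4, 5, 7, 9, 11]

# Central tendency
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Spread
data_range = max(data) - min(data)           # largest minus smallest value
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points (default method)
semi_iqr = (q3 - q1) / 2                     # half the interquartile range
std_dev = statistics.stdev(data)             # sample standard deviation (n - 1 divisor)

print(mean, median, mode, data_range, semi_iqr, round(std_dev, 3))
```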
Inferential Statistics
 Infer about a population based on a sample
 Infer about the future based on the past
Hypothesis testing using variables
 A variable is a characteristic of an object of study that can be
measured.
 The measurements will be different for different objects.
 Can be quantitative or qualitative
 Can be independent or dependent
 Can be continuous or discrete (for example, a 1 to 5 ranking is discrete)
Necessity for control
 What is a control group?
 A control group study uses a control group to compare to an
experimental group in a test of a causal hypothesis.
 The control and experimental groups must be identical in all
relevant ways except for the introduction of a suspected causal
agent into the experimental group.
 For example, if 'C' causes 'E', when we introduce 'C' into the
experimental group but not into the control group, we should
find 'E' occurring in the experimental group at a significantly
greater rate than in the control group.
 Significance is measured by relation to chance: if an event is not
likely due to chance, then its occurrence is significant.
Double blind study
 A control group test where neither the evaluator nor the
subject knows which items are controls.
 A randomized test is one that randomly assigns items to the
control and the experimental groups.
 The purpose of controls, double-blind, and randomized
testing is to reduce error, self-deception and bias.
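Random assignment can be sketched in a few lines. This is only an illustration: the function name `assign_groups`, the subject labels, and the fixed seed are assumptions introduced here so the example is reproducible; they are not part of the slides.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a control group and an experimental group."""
    rng = random.Random(seed)   # a seeded generator makes the split reproducible
    shuffled = list(subjects)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject identifiers S01 .. S20.
subjects = [f"S{i:02d}" for i in range(1, 21)]
control, experimental = assign_groups(subjects, seed=42)

print("Control:     ", control)
print("Experimental:", experimental)
```

In a double-blind setting the mapping from subject to group would be held by someone other than the evaluator until the evaluation is complete.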
Placebo
 Many control group studies use a placebo in control groups
to keep the subjects in the dark as to whether they are being
given the causal agent that is being tested.
 For example, both the control and experimental groups will
be given identical looking pills in a study testing the
effectiveness of a new drug. Only one pill will contain the
agent being tested; the other pill will be a placebo.
 In a double-blind study, the evaluator of the results would not
know which subjects got the placebo until his or her
evaluation of observed results was completed. This is to avoid
evaluator bias from influencing observations and
measurements.
Inferential statistics
 We use inferential statistics to make inferences from our data
to more general conditions; we use descriptive statistics
simply to describe what's going on in our data.
 We use inferential statistics to make judgments of the
probability that an observed difference between groups is a
dependable one or one that might have happened by chance
in this study.
T-test
 Compares the average performance of two groups on a single
measure to see if there is a difference.
 You might want to know whether there is a difference
between girls and boys in their math abilities.
 Whenever you wish to compare the average performance
between two groups you should consider the t-test for
differences between groups.
Is there a difference?
How about here?
T-test example
 The Acme Company has developed a new battery. The engineer in
charge claims that the new battery will operate continuously for at least
7 minutes longer than the old battery.
 To test the claim, the company selects a simple random sample of 100
new batteries and 100 old batteries. The old batteries run continuously
for 190 minutes with a standard deviation of 20 minutes; the new
batteries, 200 minutes with a standard deviation of 40 minutes.
 Test the engineer's claim that the new batteries run at least 7 minutes
longer than the old. Use a 0.05 level of significance. (Assume that there
are no outliers in either sample.)
 See next slide
4 steps needed
 (1) State the hypotheses, (2) formulate an analysis plan, (3) analyze
sample data, and (4) interpret results.
 State the hypotheses. The first step is to state the null hypothesis and
an alternative hypothesis, where μ1 is the mean running time of the new
batteries and μ2 that of the old batteries.
Null hypothesis: μ1 - μ2 >= 7
Alternative hypothesis: μ1 - μ2 < 7
 Formulate an analysis plan. For this analysis, the significance level is
0.05. Using sample data, we will conduct a two-sample t-test of the null
hypothesis.
 Analyze sample data. Using sample data, we compute the standard
error (SE), degrees of freedom (DF), and the t-score test statistic (t),
where x1 and x2 are the sample means and d is the hypothesized difference.
SE = sqrt[ (s1^2 / n1) + (s2^2 / n2) ] = sqrt[ (40^2 / 100) + (20^2 / 100) ] = 4.472
t = [ (x1 - x2) - d ] / SE = [ (200 - 190) - 7 ] / 4.472 = 3 / 4.472 = 0.67
 Interpret results. Since the P-value (0.75) is greater than the
significance level (0.05), we cannot reject the null hypothesis.
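The arithmetic of the battery example can be checked with a short script. This is a sketch under stated assumptions: the degrees of freedom use the Welch formula for unequal variances, and the P-value is approximated with the standard normal distribution (reasonable here because DF is large) rather than the exact t distribution. The variable names are introduced for illustration.

```python
import math
from statistics import NormalDist

# Sample summaries from the battery example (1 = new, 2 = old).
n1, mean1, sd1 = 100, 200.0, 40.0
n2, mean2, sd2 = 100, 190.0, 20.0
d = 7.0        # hypothesized difference in minutes
alpha = 0.05   # significance level

v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # variance of each sample mean
se = math.sqrt(v1 + v2)                 # standard error of the difference
# Welch approximation to the degrees of freedom (assumption: unequal variances).
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
t = ((mean1 - mean2) - d) / se          # t-score test statistic

# Lower-tail P-value, approximated with the normal distribution.
p_approx = NormalDist().cdf(t)
reject_null = p_approx < alpha

print(f"SE={se:.3f} DF={df:.2f} t={t:.2f} P~{p_approx:.2f} reject={reject_null}")
```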
Standard Score
 Problem
 A national achievement test is administered annually to 3rd
graders. The test has a mean score of 100 and a standard
deviation of 15. If Jane's z-score is 1.20, what was her score
on the test?
 From the z-score equation, we know
z = (X - μ) / σ
where z is the z-score, X is the value of the element, μ is the
mean of the population, and σ is the standard deviation.
 Solving for Jane's test score (X), we get
X = (z * σ) + μ = (1.20 * 15) + 100 = 18 + 100 = 118
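The z-score equation and its inverse can be written as two small functions. The function names are chosen here for illustration only.

```python
def z_from_score(x, mu, sigma):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def score_from_z(z, mu, sigma):
    """Invert the z-score equation to recover the raw score."""
    return z * sigma + mu

# Jane's case from the problem: mean 100, standard deviation 15, z = 1.20.
jane_score = score_from_z(1.20, mu=100, sigma=15)
print(jane_score)
print(z_from_score(jane_score, mu=100, sigma=15))
```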
Probability
 Mathematically, the probability that an event will occur is expressed as a
number between 0 and 1.
 Notationally, the probability of event A is represented by P(A).
 A coin is tossed three times. What is the probability that it lands on
heads exactly one time?
 If you toss a coin three times, there are a total of eight possible
outcomes. They are: HHH, HHT, HTH, THH, HTT, THT, TTH, and
TTT. Of the eight possible outcomes, three have exactly one head. They
are: HTT, THT, and TTH. Therefore, the probability that three flips of a
coin will produce exactly one head is 3/8 or 0.375.
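The coin example can be reproduced by listing every outcome and counting the favorable ones. This sketch assumes a fair coin, so that all eight outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# All sequences of three tosses, each toss being H or T.
outcomes = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Outcomes with exactly one head.
favorable = [o for o in outcomes if o.count("H") == 1]

# Probability = favorable outcomes / all outcomes (fair coin assumed).
probability = Fraction(len(favorable), len(outcomes))

print(outcomes)
print(favorable)
print(probability, float(probability))
```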

ANOVA (Analysis of Variance)
 Gives a statistical test of whether the means of several groups
are all equal.
 Multivariate analysis of variance (MANOVA) is used when there
is more than one dependent variable.
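For a one-way ANOVA, the test statistic F is the ratio of the variation between group means to the variation within groups. The sketch below computes F by hand; the function name and the three groups of scores are hypothetical, and the resulting F would still need to be compared with an F table (or a statistics package) to obtain a P-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups (not from the slides).
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

A large F means the group means differ by more than the within-group scatter would suggest.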
Correlation
 Statistical correlation is a technique that tells us whether
two variables are related.
 If a change in one variable is accompanied by a change in
the other, then the variables are said to be correlated. For
example, family income and family expenditure, or price and
demand, are correlated.
 Correlate only variables that have a plausible connection: one
could compute 'r' between shoe size and intelligence, or between
height and income, but irrespective of the value of 'r' such a
correlation makes no sense and is hence termed a chance or
nonsense correlation.
r Value
 In general, r > 0 indicates a positive relationship, r < 0
indicates a negative relationship, and r = 0 indicates no
linear relationship (the variables may still be related in a
nonlinear way). Here r = +1.0 describes a perfect positive
correlation and r = -1.0 describes a perfect negative
correlation.
 Value of r                      Strength of relationship
   -1.0 to -0.5 or 0.5 to 1.0      Strong
   -0.5 to -0.3 or 0.3 to 0.5      Moderate
   -0.3 to -0.1 or 0.1 to 0.3      Weak
   -0.1 to 0.1                     None or very weak
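The value of r can be computed directly from paired data and then read against the strength table. This is a minimal sketch: the income and expenditure figures are invented for illustration, the function names are not from the slides, and values that fall exactly on a boundary (such as 0.5) are assigned to the stronger category as an arbitrary choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label |r| using the ranges from the table."""
    a = abs(r)
    if a >= 0.5:
        return "strong"
    if a >= 0.3:
        return "moderate"
    if a >= 0.1:
        return "weak"
    return "none or very weak"

# Hypothetical family income and expenditure (same units).
income = [30, 40, 50, 60, 70]
expenditure = [25, 33, 41, 46, 55]

r = pearson_r(income, expenditure)
print(round(r, 3), strength(r))
```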
Analysis of Covariance
 ANOVA combined with regression analysis
 ANCOVA tests whether certain factors have an effect on the
outcome variable after removing the variance for which
quantitative predictors (covariates) account.
 Suppose you analyze the results of a clinical trial of three
types of treatment of a disease - "Placebo", "Drug 1", and
"Drug 2". The results are three sets of survival times,
corresponding to patients from the three treatment groups.
The question of interest is whether there is a difference
between the three types of treatment in the average survival
time.
ANCOVA cont.
 You might use analysis of variance to answer this question.
But, if you have supplementary information, for example,
each patient's age, then analysis of covariance allows you to
adjust the treatment effect (survival time, in this case) to a
particular age, say, the mean age of all patients. Age in this
case is a "covariate" - it is not related to treatment, but can
affect the survival time. This adjustment allows you to reduce
the observed variation between the three groups caused not
by the treatment itself but by variation of age.
Regression Analysis
 Regression analysis provides a "best-fit" mathematical
equation for the relationship between the dependent variable
(response) and independent variable(s) (covariates).
 In linear regression, the function is a linear (straight-line)
equation. For example, if we assume the value of an
automobile decreases by a constant amount each year after its
purchase, and for each mile it is driven, we can create a
formula to find the value.
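A "best-fit" straight line can be found with ordinary least squares. The sketch below uses a single predictor (age in years) to keep it short, whereas the automobile example also mentions miles driven. The function name and the car values are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: car age in years and observed value in dollars.
age = [0, 1, 2, 3, 4]
value = [20000, 18200, 15900, 14100, 11800]

slope, intercept = fit_line(age, value)

# Use the fitted equation to estimate the value at age 5.
predicted_at_5 = intercept + slope * 5
print(slope, intercept, predicted_at_5)
```

The slope is the estimated loss in value per year, and the intercept is the estimated value at purchase.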
Summarize the course
 Why use SharePoint services?
 You will have several faculty members on your committee
 All will have to comment on and correct your thesis.
 Best way to make appointments with many people
 One central repository for all your files.
 Different versions are kept, so you can go back after a
mistaken edit.
Why review different theses
 Discussed style
 Discussed chapters
 Discussed content
 How to get ideas for your research from suggestions
References
 Discussed different types of references and what is acceptable
and what is not.
 Discussed plagiarism at length
 Discussed how to quote and how to cite
Theses and Projects
 Differences
 Similarities
 Report writing
Formal research studies
 Hypothesis formulation
 Collect raw data
 Conduct statistical analysis
 Draw conclusions
 Report