Q1:- State the properties of Chi-Square Distribution.
Ans:-
For a chi-square distribution with n degrees of freedom:
i) $0 < \chi^2 < \infty$
ii) Mean $= n$ & Variance $= 2n$
iii) Moment generating function is $M_0(t) = (1 - 2t)^{-n/2}$, for $t < 1/2$
iv) The curve of a chi-square distribution is positively skewed.
v) $\chi^2$-distribution $\to$ normal distribution as d.f. $\to \infty$.
vi) Additive property:- If X & Y are independent $\chi^2$ random variables with $n_1$ & $n_2$ d.f. respectively, then the sum X + Y is a $\chi^2$ random variable with $n_1 + n_2$ d.f.
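vii) Partitioning property:-
$$\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} (X_i - \bar{x})^2 + n(\bar{x} - \mu)^2$$
Dividing each term by $\sigma^2$ (for a random sample from a normal population with mean $\mu$ and variance $\sigma^2$) gives
$$\chi^2(n) = \chi^2(n-1) + \chi^2(1)$$
viii) An important approximation to the $\chi^2$-distribution is given by R.A. Fisher: for sufficiently large n, the random variable $\sqrt{2\chi^2}$ is approximately normally distributed as $N(\sqrt{2n-1}, 1)$, i.e. with mean $\sqrt{2n-1}$ and unit variance.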
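Several of these properties can be checked numerically. The following Python sketch is only an illustration (standard library, Python 3.8+; all function names are hypothetical): it builds $\chi^2$ variates as sums of squared standard normals, compares the simulated mean and variance with n and 2n, checks the additive property by simulation, and compares Fisher's approximation with the exact distribution function, which has the closed form $1 - e^{-x/2}\sum_{k=0}^{n/2-1} (x/2)^k/k!$ when n is even. Simulated values depend on the seed and are approximate.

```python
import math
import random
import statistics


def chi2_draw(n, rng):
    """One chi-square(n) variate, built as a sum of n squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


def check_moments(n, reps=20000, seed=1):
    """Simulated mean and variance of chi-square(n); theory gives n and 2n."""
    rng = random.Random(seed)
    draws = [chi2_draw(n, rng) for _ in range(reps)]
    return statistics.fmean(draws), statistics.pvariance(draws)


def check_additive(n1, n2, reps=20000, seed=2):
    """Simulated mean of X + Y for independent chi-square(n1) and chi-square(n2)."""
    rng = random.Random(seed)
    sums = [chi2_draw(n1, rng) + chi2_draw(n2, rng) for _ in range(reps)]
    return statistics.fmean(sums)


def chi2_cdf_even(x, n):
    """Exact P(chi-square(n) <= x) for even n, via the Poisson-sum closed form."""
    if n % 2:
        raise ValueError("this closed form needs an even number of d.f.")
    half = x / 2.0
    term, total = 1.0, 0.0
    for k in range(n // 2):
        if k > 0:
            term *= half / k  # term now equals (x/2)**k / k!
        total += term
    return 1.0 - math.exp(-half) * total


def fisher_cdf(x, n):
    """Fisher's approximation: sqrt(2*chi2) is roughly N(sqrt(2n - 1), 1)."""
    z = math.sqrt(2.0 * x) - math.sqrt(2.0 * n - 1.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mean, var = check_moments(5)
print(f"chi-square(5): mean {mean:.2f} (theory 5), variance {var:.2f} (theory 10)")
print(f"chi-square(3) + chi-square(4): mean {check_additive(3, 4):.2f} (theory 7)")
for n in (10, 50, 100):
    print(n, round(chi2_cdf_even(n, n), 4), round(fisher_cdf(n, n), 4))
```

As n grows the exact and approximate distribution-function columns should move closer together, which is what property viii) predicts.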
Q2:- State the important applications of the $\chi^2$ (chi-square) statistic.
Ans:-
The important applications of $\chi^2$ are as follows:
(i) A test of goodness of fit
(ii) A test of independence
(iii) A test of homogeneity
(i) A test of goodness of fit:-
A π’™πŸ statistic can be applied when the cell probabilities are not known and they
depend upon the unknown parameters of a specified distribution such as Binomial
distribution, the Poisson distribution & the normal distribution.
When there are k classes/categories and the class probabilities are known, the
number of degrees of freedom is (k-1). When the probabilities depend upon m
parameters, the degrees of freedom would be K-1-m i.e. d.f=number of classess-1number of parameters estimated from the sample.
For example:- In normal distribution, there are two parameters µ & Ο¬ , therefore the
degree of freedom is (K-1-2), i.e (K-3).
(ii) Independence of two variables:-
The $\chi^2$ statistic can also be used to test the hypothesis of independence of two variables, each of which is classified into a number of categories or attributes, i.e. qualitative data such as male or female, tall or short, satisfied or dissatisfied, high or low, healthy or diseased, positive or negative. The attributes are designated A, B, C, …, and the absence of these attributes $\alpha$, $\beta$ and $\gamma$.
(iii) A test of homogeneity:-
The chi-square statistic can also be used when each row of a table that looks like a contingency table represents a different sample or set of observations. The hypothesis to be tested is that two or more different random samples come from the same population, i.e. that the samples are homogeneous (in statistics the word homogeneous is often used to mean "the same" or "equal"). The $\chi^2$-test applied in such a situation is called a test of homogeneity.
A simpler method proposed by Brandt & Snedecor is used to calculate the value of $\chi^2$ for two random samples.
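For both the test of independence and the test of homogeneity, the statistic is usually computed by comparing each observed cell count with the expected count (row total × column total)/n, and it has (r − 1)(c − 1) d.f. The Python sketch below assumes that the Brandt & Snedecor method refers to the familiar shortcut for a table with two rows, $\chi^2 = \frac{\sum a_i p_i - A\bar{p}}{\bar{p}\bar{q}}$, where sample i has $a_i$ "successes" out of $n_i$, $p_i = a_i/n_i$, $A = \sum a_i$, $\bar{p} = A/\sum n_i$ and $\bar{q} = 1 - \bar{p}$; it checks that the shortcut gives the same value as the full computation. The function names and counts are hypothetical.

```python
def contingency_chi2(table):
    """Pearson chi-square and d.f. for an r x c table of observed counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # expected count under H0
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def brandt_snedecor(successes, totals):
    """Shortcut chi-square for a two-row table: a_i successes out of n_i in sample i."""
    big_a = sum(successes)
    big_n = sum(totals)
    p_bar = big_a / big_n
    q_bar = 1.0 - p_bar
    sum_ap = sum(a * a / n for a, n in zip(successes, totals))  # sum of a_i * p_i
    return (sum_ap - big_a * p_bar) / (p_bar * q_bar)


# Hypothetical counts: successes out of the sample sizes in three samples.
successes = [30, 45, 25]
totals = [50, 60, 50]
failures = [n - a for a, n in zip(successes, totals)]

full, df = contingency_chi2([successes, failures])
short = brandt_snedecor(successes, totals)
print(f"full computation: {full:.4f} with {df} d.f.; Brandt-Snedecor: {short:.4f}")
```

The two values agree because, in a two-row table, the deviation in the second row of each column is the negative of that in the first row, so the full sum collapses to the shortcut form.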
Q3:- Discuss the $\chi^2$-test of goodness of fit. What are the assumptions in the application of these tests to practical problems?
Ans:-
A test of goodness of fit:- A $\chi^2$ statistic can be applied when the cell probabilities are not known but depend upon the unknown parameters of a specified distribution, such as the binomial, the Poisson or the normal distribution.
When there are k classes/categories and the class probabilities are known, the number of degrees of freedom is (k − 1). When the probabilities depend upon m parameters estimated from the sample, the degrees of freedom are k − 1 − m, i.e. d.f. = number of classes − 1 − number of parameters estimated from the sample.
For example:- The normal distribution has two parameters, $\mu$ & $\sigma$; therefore the degrees of freedom are (k − 1 − 2), i.e. (k − 3).
Assumptions
A goodness-of-fit test is a test of hypothesis concerned with determining whether the results of a sample conform to a hypothesized distribution, which may be the uniform, binomial, Poisson, normal or any other distribution. This kind of test is used for problems where we do not know the probability distribution of the random variable under consideration, say X, and we wish to test the hypothesis that X follows a particular distribution.
In the test procedure, the range of all possible values of the random variable assumed to follow a particular distribution is divided into k mutually exclusive classes, and the probabilities $\hat{p}_i$ are calculated for each of the classes using the estimates of the parameters of the probability distribution specified in $H_0$.
Then $n\hat{p}_i$ represents the expected number of observations that fall in the ith class, and $n_i$ represents the observed number of observations in that class. The difference between the observed and expected numbers of observations can arise from sampling error or from $H_0$ being false. These differences are combined into the statistic $\chi^2 = \sum_{i=1}^{k} \frac{(n_i - n\hat{p}_i)^2}{n\hat{p}_i}$, which has approximately k − 1 − m d.f. Small differences are generally attributed to sampling error, in which case the hypothesized distribution gives a satisfactory fit to the sample data ($H_0$ true); large differences make $H_0$ unlikely.
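The procedure above can be sketched in Python for a Poisson hypothesis, where the single parameter $\lambda$ is estimated by the sample mean, so m = 1 and the statistic has k − 1 − 1 d.f. The data, the class boundaries and the function names are hypothetical. In practice, classes with small expected counts (a common rule of thumb is below 5) are usually pooled before the statistic is computed; this sketch does not do that.

```python
import math
from collections import Counter


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_goodness_of_fit(sample, top):
    """Chi-square goodness of fit to a Poisson law whose mean is estimated.

    Classes are 0, 1, ..., top - 1 plus an open class 'top or more'.
    """
    n = len(sample)
    lam_hat = sum(sample) / n                      # one estimated parameter: m = 1
    counts = Counter(min(x, top) for x in sample)  # values >= top go to the last class
    observed = [counts[c] for c in range(top + 1)]
    probs = [poisson_pmf(c, lam_hat) for c in range(top)]
    probs.append(1.0 - sum(probs))                 # tail probability P(X >= top)
    expected = [n * p for p in probs]              # n * p_hat_i for each class
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 1                     # k - 1 - m with m = 1
    return lam_hat, observed, expected, stat, df


# Hypothetical data: number of defects found on each of 50 items.
sample = [0] * 12 + [1] * 20 + [2] * 10 + [3] * 5 + [4] * 2 + [5] * 1
lam_hat, observed, expected, stat, df = poisson_goodness_of_fit(sample, top=4)
print(f"lambda_hat = {lam_hat:.2f}, observed = {observed}")
print("expected =", [round(e, 2) for e in expected])
print(f"chi-square = {stat:.3f} with {df} d.f.")
```

The computed value would then be compared with the $\chi^2$ table value for the stated d.f. at the chosen level of significance.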
Q4:- What is the coefficient of contingency for an r × c contingency table? Describe its limits.
Ans:-
The chi-square statistic shows only whether the sample data do or do not conform to the hypothesis. It tells us nothing about the strength of the association, which we sometimes wish to measure. For this purpose Karl Pearson defined a coefficient C by the relation
$$C = \sqrt{\frac{\chi^2}{n + \chi^2}}$$
where n = sample size.
When C = 0, it shows complete independence of the classification.
When $C = \sqrt{\frac{K-1}{K}}$, it shows perfect association of the classification, where K is the smaller of r & c.
So $0 \le C \le \sqrt{\frac{K-1}{K}}$.
e.g. In a 2 × 3 contingency table, K = 2, so
Max value of $C = \sqrt{\frac{2-1}{2}} = 0.707$.
The larger the value of C, the stronger the association (i.e. the further the classification is from independence).
Another measure, known as Cramér's coefficient of contingency, is defined as
$$Q = \frac{\chi^2}{n(K-1)}$$
where n = sample size and K is the smaller of r & c.
If the variables are completely independent, then Q = 0.
If the variables are completely associated, then Q = 1.
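A small Python sketch of the two measures above: it computes $\chi^2$ from an r × c table and then C, its upper limit $\sqrt{(K-1)/K}$, and Q exactly as defined in this answer. The function names and tables are hypothetical: one table shows no association and one shows perfect association.

```python
import math


def chi2_statistic(table):
    """Pearson chi-square for an r x c table of observed counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def contingency_measures(table):
    """Pearson's C, its upper limit sqrt((K-1)/K), and Q = chi2 / (n(K-1))."""
    n = sum(sum(row) for row in table)
    chi2 = chi2_statistic(table)
    k = min(len(table), len(table[0]))  # K = smaller of r and c
    c = math.sqrt(chi2 / (n + chi2))
    c_max = math.sqrt((k - 1) / k)
    q = chi2 / (n * (k - 1))
    return c, c_max, q


print(contingency_measures([[10, 10], [10, 10]]))  # no association
print(contingency_measures([[10, 0], [0, 10]]))    # perfect association
print(contingency_measures([[5, 5, 5], [5, 5, 5]])[1])  # upper limit of C for a 2 x 3 table
```

Because the upper limit of C depends on K, values of C from tables of different sizes are not directly comparable, whereas Q always runs from 0 to 1.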
Q5:- Define the chi-square random variable. Explain how you determine a confidence interval estimate of $\sigma^2$ of a normal population.
Ans:-
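Let $Z_1, Z_2, Z_3, \ldots, Z_n$ be normally and independently distributed variables with zero means and unit variances. Then the random variable
$$\chi^2 = \sum_{i=1}^{n} Z_i^2$$
is said to have a chi-square distribution with n degrees of freedom.
Let $\bar{x}$ and $S^2$ be the mean and variance of a random sample $X_1, X_2, X_3, \ldots, X_n$ of size n drawn from a normal population with variance $\sigma^2$, where $S^2 = \frac{1}{n}\sum (X - \bar{x})^2$. Then the statistic
$$\frac{nS^2}{\sigma^2} = \frac{\sum (X - \bar{x})^2}{\sigma^2}$$
has a chi-square distribution with (n − 1) d.f.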
To construct a two-sided confidence interval for $\sigma^2$, we find two values of the $\chi^2$-distribution with (n − 1) d.f., say a and b, such that
$$\int_0^a f(\chi^2)\, d(\chi^2) = \frac{\alpha}{2} \quad \text{and} \quad \int_b^\infty f(\chi^2)\, d(\chi^2) = \frac{\alpha}{2}$$
[Figure: $\chi^2$ density with central area $1 - \alpha$ between a and b]
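Now
$$P\left[a < \frac{nS^2}{\sigma^2} < b\right] = 1 - \alpha$$
Dividing by $nS^2$:
$$P\left[\frac{a}{nS^2} < \frac{1}{\sigma^2} < \frac{b}{nS^2}\right] = 1 - \alpha$$
Taking reciprocals (which reverses the inequalities):
$$P\left[\frac{nS^2}{b} < \sigma^2 < \frac{nS^2}{a}\right] = 1 - \alpha$$
i.e.
$$P\left[\frac{\sum (X - \bar{x})^2}{b} < \sigma^2 < \frac{\sum (X - \bar{x})^2}{a}\right] = 1 - \alpha$$
where $a = \chi^2_{1-\alpha/2}$ and $b = \chi^2_{\alpha/2}$, both with (n − 1) d.f., and $\chi^2_p$ denotes the value with area p to its right. Hence $\left(\frac{nS^2}{\chi^2_{\alpha/2}}, \frac{nS^2}{\chi^2_{1-\alpha/2}}\right)$ is a $100(1-\alpha)\%$ confidence interval for $\sigma^2$.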
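The interval can also be computed without printed tables by evaluating the $\chi^2$ distribution function numerically. The Python sketch below is one way to do it, under stated assumptions: the distribution function uses the standard series for the regularized lower incomplete gamma function, the points a and b are found by bisection, and the data and function names are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """P(chi-square(df) <= x) from the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    s, t = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    k = 1
    while term > total * 1e-16 and k < 10000:
        term *= t / (s + k)  # next series term t**k / (s (s+1) ... (s+k))
        total += term
        k += 1
    return min(1.0, math.exp(s * math.log(t) - t - math.lgamma(s)) * total)


def chi2_quantile(p, df):
    """Point q with P(chi-square(df) <= q) = p, found by bisection on chi2_cdf."""
    lo, hi = 0.0, max(1.0, 2.0 * df)
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_interval(sample, alpha=0.05):
    """100(1 - alpha)% interval (nS^2/b, nS^2/a) for sigma^2 of a normal population."""
    n = len(sample)
    xbar = sum(sample) / n
    ns2 = sum((x - xbar) ** 2 for x in sample)  # n*S^2 = sum of squared deviations
    a = chi2_quantile(alpha / 2, n - 1)         # chi^2_{1-alpha/2}: area alpha/2 to its left
    b = chi2_quantile(1 - alpha / 2, n - 1)     # chi^2_{alpha/2}: area alpha/2 to its right
    return ns2 / b, ns2 / a


# Hypothetical measurements, assumed to come from a normal population.
data = [4.1, 5.0, 5.3, 4.7, 5.9, 5.2, 4.4, 5.6, 4.9, 5.1]
low, high = variance_interval(data)
print(f"95% confidence interval for sigma^2: ({low:.4f}, {high:.4f})")
```

For n = 10 (9 d.f.) and $\alpha = 0.05$, the two points should agree with the usual table values 2.700 and 19.023.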
Q:- What is meant by analysis of variance and degrees of freedom?
Ans:- The analysis of variance (ANOVA) is a technique that partitions the total variation (a term distinct from variance, measured by the sum of squares of deviations from the mean) into its component parts, each of which is associated with a different source of variation.
These component parts of variation are then analyzed in such a manner that certain hypotheses can be tested. This technique is based on the facts that
(i) the more the sample means differ, the larger the variance between samples becomes, and
(ii) the separate components provide independent and unbiased estimates of the common population variance.
The ANOVA procedure therefore compares two different estimates of variance, using the F-distribution to determine whether the population means are equal. ANOVA is a powerful technique whenever the statistical data can be classified into groups.
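A minimal one-way ANOVA sketch in Python, assuming the usual layout of k groups: it splits the total sum of squares into between-group and within-group parts and forms $F = \text{MSB}/\text{MSW}$ with k − 1 and n − k d.f. The function name and data are hypothetical, and the resulting F would still have to be compared with an F-table value.

```python
def one_way_anova(groups):
    """Split total variation into between- and within-group parts and form F."""
    all_obs = [x for g in groups for x in g]
    n_total, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    ms_between = ss_between / (k - 1)       # k - 1 d.f.
    ms_within = ss_within / (n_total - k)   # n - k d.f.
    return ss_between, ss_within, ss_total, ms_between / ms_within


# Hypothetical data: three small groups.
ssb, ssw, sst, f_ratio = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"SSB = {ssb}, SSW = {ssw}, SST = {sst}, F = {f_ratio}")
```

The identity SST = SSB + SSW is the partition of total variation described above; a large F means the variation between group means is large relative to the variation within groups.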
Degrees of freedom:-
The number of degrees of freedom is the number of observations in a sample minus the number of population parameters that must be estimated from the sample data.
In π’™πŸ , it contains only one parameter, called the number of degree of freedom.
In other words the term degree of freedom represents the number of independent
random variables that express the Chi-square.
If the random variables entering a Chi-Square are subjected to linear restrictions, then
the number of degree of freedom is reduced by the number of restrictions involved.
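The loss of one degree of freedom through a linear restriction can be illustrated by simulation. The deviations $X_i - \bar{x}$ always sum to zero, which is one linear restriction, so $\sum (X_i - \bar{x})^2/\sigma^2$ has n − 1 d.f. (mean n − 1), while $\sum (X_i - \mu)^2/\sigma^2$ has n d.f. (mean n). The Python sketch below assumes arbitrary values $\mu = 10$, $\sigma = 2$ and n = 5; the function name is hypothetical.

```python
import random
import statistics


def average_sums_of_squares(n, reps=20000, mu=10.0, sigma=2.0, seed=3):
    """Averages of sum((X-mu)^2)/sigma^2 and sum((X-xbar)^2)/sigma^2 over many samples."""
    rng = random.Random(seed)
    about_mu, about_mean = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(xs)
        # The deviations x - xbar sum to zero: one linear restriction.
        about_mu.append(sum((x - mu) ** 2 for x in xs) / sigma ** 2)
        about_mean.append(sum((x - xbar) ** 2 for x in xs) / sigma ** 2)
    return statistics.fmean(about_mu), statistics.fmean(about_mean)


m_mu, m_mean = average_sums_of_squares(5)
print(f"about mu: {m_mu:.2f} (n = 5 d.f.); about xbar: {m_mean:.2f} (n - 1 = 4 d.f.)")
```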
Q:- Discuss why using multiple two-sample t-tests is not an appropriate alternative to the analysis of variance.
Ans:- We compare two population means by using a two-sample t-test. However, we are often required to compare more than two population means simultaneously. If we then apply t-tests to compare the means pairwise:
If we compare 4 population means, there will be $\binom{4}{2} = 6$ separate pairs.
If we compare 10 population means, there will be $\binom{10}{2} = 45$ separate pairs.
Running multiple two-sample t-tests in this way to compare means has two disadvantages:
(i) First, the procedure is tedious and time consuming.
(ii) The overall level of significance greatly increases as the number of t-tests increases.
Thus a series of two-sample t-tests is not an appropriate procedure for testing the equality of several means simultaneously.
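The growth in the number of tests and in the overall level of significance can be sketched in Python. The formula $1 - (1 - \alpha)^m$ used here assumes the m tests are independent, which pairwise t-tests on shared data are not, so it is only a rough illustration; $\alpha = 0.05$ per test is an assumed value and the function names are hypothetical.

```python
from math import comb


def number_of_pairs(k):
    """Number of two-sample t-tests needed to compare k means pairwise."""
    return comb(k, 2)


def overall_significance(alpha, m):
    """Chance of at least one false rejection in m tests, if they were independent."""
    return 1.0 - (1.0 - alpha) ** m


for k in (4, 10):
    m = number_of_pairs(k)
    print(f"{k} means: {m} t-tests, overall level about {overall_significance(0.05, m):.3f}")
```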