Q1:- State the properties of the Chi-Square distribution.
Ans:-
i) 0 < χ² < ∞.
ii) Mean = n and Variance = 2n.
iii) The moment generating function is M₀(t) = (1 − 2t)^(−n/2), for t < 1/2.
iv) The curve of a Chi-Square distribution is positively skewed.
v) The χ²-distribution tends to the Normal distribution as d.f. → ∞.
vi) Additive property: if X and Y are independent χ² random variables with n₁ and n₂ d.f. respectively, then the sum X + Y is a χ² random variable with n₁ + n₂ d.f.
vii) Partitioning property: Σᵢ(Xᵢ − µ)²/σ² = Σᵢ(Xᵢ − X̄)²/σ² + n(X̄ − µ)²/σ², i.e. χ²(n) = χ²(n−1) + χ²(1).
viii) An important approximation to the χ²-distribution was given by R. A. Fisher: for sufficiently large n, the random variable √(2χ²) is approximately normally distributed as N(√(2n − 1), 1).
Q2:- State the important applications of the χ² (Chi-Square) statistic.
Ans:- The important applications of χ² are:
(i) A test of goodness of fit.
(ii) A test of independence.
(iii) A test of homogeneity.
(i) A test of goodness of fit:- A χ² statistic can be applied when the cell probabilities are not known and depend upon the unknown parameters of a specified distribution, such as the Binomial distribution, the Poisson distribution and the Normal distribution. When there are k classes/categories and the class probabilities are known, the number of degrees of freedom is (k − 1). When the probabilities depend upon m parameters, the degrees of freedom are k − 1 − m, i.e. d.f. = number of classes − 1 − number of parameters estimated from the sample. For example, the Normal distribution has two parameters, µ and σ, so the degrees of freedom are (k − 1 − 2), i.e. (k − 3).
(ii) A test of independence:- The χ² statistic can also be used to test the hypothesis of independence of two variables, each of which is classified into a number of categories or attributes (qualitative data such as male or female, tall or short, satisfied or dissatisfied, high or low, healthy or diseased, positive or negative). The presence of the attributes is denoted by A, B, C, …
and the absence of these attributes by α, β and γ.
(iii) A test of homogeneity:- The Chi-Square statistic can also be used when the rows of a table that looks like a contingency table each represent a different sample or set of observations. The hypothesis to be tested is that two or more random samples come from the same population, or that the samples are homogeneous ("homogeneous" in statistics is often used to mean "the same" or "equal"). The χ²-test applied in such a situation is called a test of homogeneity. A simpler method, proposed by Brandt and Snedecor, is used to calculate the value of χ² for two random samples.
Q3:- Discuss the χ² test of goodness of fit. What are the assumptions in the application of this test to practical problems?
Ans:- (i) A test of goodness of fit:- A χ² statistic can be applied when the cell probabilities are not known and depend upon the unknown parameters of a specified distribution, such as the Binomial distribution, the Poisson distribution and the Normal distribution. When there are k classes/categories and the class probabilities are known, the number of degrees of freedom is (k − 1). When the probabilities depend upon m parameters, the degrees of freedom are k − 1 − m, i.e. d.f. = number of classes − 1 − number of parameters estimated from the sample. For example, the Normal distribution has two parameters, µ and σ, so the degrees of freedom are (k − 1 − 2), i.e. (k − 3).
Assumptions:- A goodness-of-fit test is a hypothesis test concerned with determining whether the results of a sample conform to a hypothesized distribution, which may be the Uniform, Binomial, Poisson, Normal or any other distribution. This kind of hypothesis arises in problems where we do not know the probability distribution of the random variable under consideration, say X, and we wish to test the hypothesis that X follows a particular distribution.
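As an illustration, the two χ² computations described above (the goodness-of-fit statistic with d.f. = k − 1 − m, and the contingency-table statistic used by the tests of independence and homogeneity) can be sketched in Python. The counts below are hypothetical, chosen only to show the arithmetic:

```python
from math import fsum

def chi_square_gof(observed, expected, params_estimated=0):
    """Goodness-of-fit statistic sum((O - E)^2 / E) with
    d.f. = number of classes - 1 - number of estimated parameters."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    statistic = fsum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - params_estimated
    return statistic, df

def chi_square_contingency(table):
    """Chi-square statistic for an r x c contingency table, with
    expected counts e_ij = (row_i total)(column_j total) / n and
    d.f. = (r - 1)(c - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = fsum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical goodness-of-fit data: 100 observations over k = 5 equally
# likely classes (expected count 20 each, no parameters estimated).
print(chi_square_gof([18, 22, 20, 25, 15], [20] * 5))  # (2.9, 4)

# Hypothetical 2 x 2 table (presence/absence of attributes A and B);
# all expected counts are 25, so the statistic is 4.0 with 1 d.f.
print(chi_square_contingency([[30, 20], [20, 30]]))  # (4.0, 1)
```

The computed statistic would then be compared against the tabulated χ² value for the stated degrees of freedom at the chosen significance level.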
In the test procedure, the range of all possible values of the random variable, assumed to follow a particular distribution, is divided into k mutually exclusive classes and the probabilities pᵢ are calculated for each class, using the estimates of the parameters of the probability distribution specified in H₀. Then np̂ᵢ represents the expected number of observations that fall in the i-th class and nᵢ represents the observed number of observations in that class. The difference between the observed and expected numbers of observations can arise either from sampling error or from H₀ being false. Small differences are generally attributed to sampling error, indicating that the hypothesized distribution gives a satisfactory fit to the sample data (H₀ true); large differences are unlikely to arise from sampling error alone and point to H₀ being false.
Q4:- What is the coefficient of contingency for an r × c contingency table? Describe its limits.
Ans:- The Chi-Square statistic shows only whether the sample data do or do not conform to the hypothesis; it does not tell us anything about the strength of the association, which we sometimes wish to measure. For this purpose Karl Pearson defined a coefficient C by the relation
C = √( χ² / (n + χ²) ), where n = sample size.
C = 0 shows complete independence of the classifications.
C = √((K − 1)/K), where K is the smaller of r and c, shows perfect association of the classifications.
So 0 ≤ C ≤ √((K − 1)/K). For example, in a 2 × 3 contingency table the maximum value of C is √((2 − 1)/2) = 0.707.
The larger the value of C, the stronger the association between the classifications.
Another measure, known as Cramér's coefficient of contingency, is defined as
Q = √( χ² / (n(K − 1)) ), where n = sample size and K is the smaller of r and c.
If the variables are completely independent then Q = 0; if the variables are completely associated then Q = 1.
Q5:- Define a Chi-Square random variable. Explain how you determine a confidence interval estimate of σ² of a normal population.
Ans:- Let Z₁, Z₂, Z₃, …, Zₙ be normally and independently distributed random variables with zero means and unit variances; then the random variable χ² = Σᵢ₌₁ⁿ Zᵢ² has a Chi-Square distribution with n degrees of freedom.
Let X̄ and S² be the mean and variance of a random sample X₁, X₂, X₃, …, Xₙ of size n drawn from a normal population with variance σ². Then the statistic
χ² = nS²/σ² = Σ(Xᵢ − X̄)²/σ²
has a Chi-Square distribution with (n − 1) degrees of freedom.
To construct a two-sided confidence interval for σ², we find two values of the χ²-distribution with (n − 1) d.f., say a and b, such that
∫₀ᵃ f(χ²) d(χ²) = α/2 and ∫₀ᵇ f(χ²) d(χ²) = 1 − α/2,
i.e. a = χ²_{1−α/2} and b = χ²_{α/2} in upper-tail notation. Now
P[ a < nS²/σ² < b ] = 1 − α.
Inverting the inequality (dividing through by nS² and taking reciprocals),
P[ nS²/b < σ² < nS²/a ] = 1 − α,
so that
P[ nS²/χ²_{α/2} < σ² < nS²/χ²_{1−α/2} ] = 1 − α
gives the (1 − α)100% confidence interval for σ².
Q:- What is meant by analysis of variance and degrees of freedom?
Ans:- The analysis of variance is a technique that partitions the total variation (a term distinct from variance, measured by the sum of squares of deviations from the mean) into its component parts, each of which is associated with a different source of variation. These component parts are then analyzed in such a manner that certain hypotheses can be tested. The technique is based on the facts that (i) the more the sample means differ, the larger the variance between samples becomes, and (ii) the separate components provide independent and unbiased estimates of the common population variance. The ANOVA procedure therefore compares two different estimates of variance, using the F-distribution, to determine whether the population means are equal. ANOVA has proved to be a most powerful technique whenever the statistical data can be categorized into groups.
Degrees of freedom:- The number of degrees of freedom is given as the number of observations in a sample minus the number of population parameters that must be estimated from the sample data.
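A minimal numerical sketch of the interval nS²/χ²_{α/2} < σ² < nS²/χ²_{1−α/2} derived above. To keep the sketch dependency-free it approximates the χ² quantiles with the Wilson–Hilferty formula (in practice one would read them from a table or use a library routine such as scipy.stats.chi2.ppf); the sample data are hypothetical:

```python
from math import fsum
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile function.
    Accurate to a few hundredths for moderate df; a table or library
    routine would be used in practice."""
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * (2 / (9 * df)) ** 0.5) ** 3

def ci_for_variance(sample, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for sigma^2:
    n*S^2 / chi2_{alpha/2}  <  sigma^2  <  n*S^2 / chi2_{1-alpha/2},
    where n*S^2 = sum((x - xbar)^2) and the quantiles have n - 1 d.f."""
    n = len(sample)
    xbar = fsum(sample) / n
    nS2 = fsum((x - xbar) ** 2 for x in sample)
    b = chi2_quantile(1 - alpha / 2, n - 1)  # upper-tail chi2_{alpha/2}
    a = chi2_quantile(alpha / 2, n - 1)      # upper-tail chi2_{1-alpha/2}
    return nS2 / b, nS2 / a

# Hypothetical sample of n = 11 measurements:
sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.1, 9.9]
lower, upper = ci_for_variance(sample)
print(lower, upper)  # a positive interval with lower < upper
```

Note that the interval is not symmetric about S², because the χ²-distribution itself is positively skewed.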
In χ², there is only one parameter, called the number of degrees of freedom. In other words, the term degrees of freedom represents the number of independent random variables that make up the Chi-Square. If the random variables entering a Chi-Square are subject to linear restrictions, then the number of degrees of freedom is reduced by the number of restrictions involved.
Q:- Discuss why running multiple two-sample t-tests is not an appropriate alternative to the analysis of variance.
Ans:- We can compare two population means using a two-sample t-test. However, we are often required to compare more than two population means simultaneously. Applying t-tests to all pairwise comparisons of means:
If we compare 4 population means, there will be C(4, 2) = 6 separate pairs.
If we compare 10 population means, there will be C(10, 2) = 45 separate pairs.
Running multiple two-sample t-tests to compare means has two disadvantages:
(i) The procedure is tedious and time-consuming.
(ii) The overall level of significance greatly increases as the number of t-tests increases.
Thus a series of two-sample t-tests is not an appropriate procedure to test the equality of several means simultaneously.
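The pair counts above, and the inflation of the overall significance level mentioned in (ii), can be checked numerically. The familywise-error formula below assumes the m tests are independent, which is only approximate for pairwise comparisons that share samples:

```python
from math import comb

def pairwise_tests(k):
    """Number of two-sample t-tests needed to compare k means pairwise."""
    return comb(k, 2)

def familywise_error(alpha, m):
    """Probability of at least one Type I error across m independent
    tests, each run at level alpha: 1 - (1 - alpha)^m."""
    return 1 - (1 - alpha) ** m

print(pairwise_tests(4), pairwise_tests(10))  # 6 45
print(round(familywise_error(0.05, 6), 3))    # 0.265
print(round(familywise_error(0.05, 45), 3))   # 0.901
```

So at α = 0.05, comparing 10 means pairwise gives roughly a 90% chance of at least one false rejection, which is why a single ANOVA F-test is preferred.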