Statistical Decision Theory
J. McNames, Portland State University, ECE 4/557, Decision Theory, Ver. 1.19

Overview
• Definitions
• Estimation
• Central Limit Theorem
• Test Properties
• Parametric Hypothesis Tests
• Examples
• Parametric vs. Nonparametric
• Nonparametric Tests

Definitions
Experiment: process of following a well-defined procedure where the outcome is not known prior to the experiment.
Population: collection of all elements (N) under investigation.
Target Population: the population about which information is wanted.
Sample Population: the population to be sampled.
Sample: collection of some elements (n) of a population.
Random Sample: a sample in which each element of the population has an equal probability of being selected. Alternatively, a sequence of independent and identically distributed (i.i.d.) random variables X1, X2, …, Xn.
• Theoretically, random samples must be drawn with replacement. However, for large populations (n ≤ N/10, where n is the size of the sample and N is the size of the sample population), there is little difference.

Target vs. Sampled
• The target population and the sample population are usually different
• It is difficult to collect unbiased samples
• Many studies are self-selecting
• This is less of an issue for engineering problems
• Be careful to collect "training" data and "test" data under the same conditions

What is a Point Statistic?
• A point statistic is a number computed from a random sample X1, X2, …, Xn
• It is also a random variable
• Example: the sample average
  X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ
• Conveys a type of summary of the data
• Is a function of multiple random variables — specifically, a function that assigns real numbers to the points of a sample space
More Definitions
Order Statistic of rank k, X(k): the statistic that takes as its value the kth smallest element x(k) in each observation (x1, x2, …, xn) of (X1, X2, …, Xn).
pth Sample Quantile: a number Qp that satisfies
1. The fraction of Xi's that are strictly less than Qp is ≤ p
2. The fraction of Xi's that are strictly greater than Qp is ≤ 1 − p
• If more than one value meets these criteria, choose the average of the smallest and largest such values
• There are other estimates of quantiles that do not assign 0% and 100% to the smallest and largest observations

Example 1: MATLAB's Prctile Function
[Figure: pth percentile of the data 1:10 plotted versus p (%), 0 ≤ p ≤ 100]

Example 1: MATLAB Code

```matlab
function [] = PrctilePlot();
FigureSet(1,'LTX');
p = 0:0.1:100;
y = prctile(1:10,p);
h = plot(p,y);
set(h,'LineWidth',1.5);
xlim([0 100]);
ylim([0 11]);
xlabel('p (%)');
ylabel('pth percentile');
box off;
grid on;
AxisSet(8);
print -depsc PrctilePlot;
```

Sample Mean and Variance
Sample Mean:
  X̄ = μ̂X = (1/n) Σᵢ₌₁ⁿ Xᵢ
Sample Variance:
  s²X = σ̂²X = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²
Sample Standard Deviation:
  sX = σ̂X = √(s²X)

Point vs. Interval Estimators
• Our discussion so far has generated point estimates
• Given a random sample, we estimate a single descriptive statistic

Biased Estimation
• An estimator θ̂ is an unbiased estimator of the population parameter θ if E[θ̂] = θ
• The sample mean of a random sample is an unbiased estimate of the population (true) mean
• The sample variance s²X = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)² is an unbiased estimate of the true (population) variance
  – Why 1/(n−1)?
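The sample mean, (n−1)-normalized sample variance, and sample standard deviation defined above can be computed directly. A minimal sketch in Python (the slides use MATLAB); the data values here are made up purely for illustration:

```python
import statistics

# Hypothetical sample (made-up values for illustration only).
x = [4.9, 5.1, 4.7, 5.3, 5.0]
n = len(x)

x_bar = sum(x) / n                                  # sample mean
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)   # sample variance (n-1 in denominator)
s = s2 ** 0.5                                       # sample standard deviation

# Python's statistics module uses the same n-1 convention.
assert abs(s2 - statistics.variance(x)) < 1e-12
print(x_bar, s2)   # → 5.0 0.05000000000000001 (approximately)
```

Note that `statistics.variance` (not `statistics.pvariance`) matches the unbiased n−1 definition used on the slide.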
  – We lose one degree of freedom by estimating E[X] with X̄
  – This is one of your homework problems
• sX is a biased estimate of the true population standard deviation
• Interval estimates are usually preferred to point estimates
  – Example: "We are 95% confident the unknown mean lies between 1.3 and 2.7."
  – Usually more difficult to obtain
  – Consist of 2 statistics (each endpoint of the interval) and a confidence coefficient
  – Also called a confidence interval

Central Limit Theorem
Let Yn be the sum of n i.i.d. random variables X1, X2, …, Xn, let μYn be the mean of Yn, and σ²Yn be the variance of Yn. As n → ∞, the distribution of the z-score
  Z = (Yn − μYn)/σYn
approaches the standard normal distribution.

Central Limit Theorem Continued
• For large sums, the normal approximation is frequently used instead of the exact distribution
• Also "works" empirically when the Xi are not identically distributed
• The Xi must be independent
• For most data sets, n = 30 is generally accepted as large enough for the CLT to apply
• In many cases, it is assumed that the random sample was drawn from a normal distribution; this is justified by the central limit theorem (CLT)
• This is remarkable considering the theorem only applies as n → ∞
• The center of the distribution becomes normally distributed more quickly (i.e., with smaller n) than the tails
• There are many variations of the theorem. Key conditions:
  1. The RVs Xi must have a finite mean
  2. The RVs must have a finite variance
• The RVs can have any distribution as long as they meet these criteria
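The CLT statement above can be illustrated empirically. A minimal sketch in Python (the slides use MATLAB): sum n uniform random variables, normalize by the exact mean and standard deviation of the sum, and check that the z-scores look standard normal. The trial counts and the choice of a uniform distribution are arbitrary illustration choices:

```python
import random
import statistics

random.seed(0)

n = 30          # summands per trial (the "large enough" size cited on the slide)
trials = 2000   # number of z-scores to generate

# Each Xi ~ Uniform(0,1): mean 1/2, variance 1/12, so Yn has
# mean n/2 and variance n/12.
mu_y = n * 0.5
sigma_y = (n / 12) ** 0.5

zs = []
for _ in range(trials):
    y = sum(random.random() for _ in range(n))
    zs.append((y - mu_y) / sigma_y)

# The z-scores should be approximately standard normal: mean near 0,
# standard deviation near 1, roughly 68% of values within one sigma.
frac_1sd = sum(abs(z) < 1 for z in zs) / trials
print(statistics.mean(zs), statistics.pstdev(zs), frac_1sd)
```

The same experiment with a heavier-tailed (but finite-variance) distribution converges more slowly in the tails, consistent with the bullet above.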
Example 2: Central Limit Theorem for a Binomial RV
• Xi are Bernoulli random variables that take on either a 1 with probability p or a 0 with probability 1 − p
• A sum of Bernoulli random variables, Y = Σᵢ₌₁ⁿ Xᵢ, has a binomial distribution with
  E[Y] = np    σ²Y = np(1 − p)
• Define
  Z = (Y − E[Y])/σY
• Let us approximate the PDF of Z from a random sample of 1000 points
• How well is the PDF of Z approximated by a normal distribution?

Example 3: CLT Applied to a Binomial RV
[Figures: normalized binomial histograms of z for N = 2, 5, 10, 20, and 30 with NS = 1000 samples each, overlaid with the standard Gaussian PDF]
[Figures: normalized binomial histograms of z for N = 100, 1000, and 5000 with NS = 1000 samples each, overlaid with the standard Gaussian PDF]

Example 3: MATLAB Code

```matlab
function [] = CLTBinomial();
close all;
FigureSet(1,'LTX');
NP = [2,5,10,20,30,100,1000,5000]; % No. histograms
NS = 1000;                         % No. samples
p  = 0.5;                          % Probability of 1
for cnt = 1:length(NP),
  np = NP(cnt);
  r  = binornd(np,p,NS,1);         % Random sample of NS sums
  mu = np*p;
  s2 = np*p*(1-p);
  z  = (r-mu)/sqrt(s2);
  figure;
  FigureSet(1,'LTX');
  Histogram(z,0.1,0.20);
  hold on;
  x  = -5:0.02:5;
  y1 = 1/sqrt(2*pi).*exp(-x.^2/2);
  h  = plot(x,y1,'r');
  hold off;
  st = sprintf('Normalized Binomial Histogram for z N:%d NS:%d',np,NS);
  title(st);
  box off;
  AxisSet(8);
  h = get(gca,'Children');
  legend([h(2) h(1)],'Binomial Estimated','Gaussian');
  st = sprintf('print -depsc CLTBinomial%04d',np);
  eval(st);
  drawnow;
end;
```

Example 4: CLT Applied to an Exponential RV
[Figure: normalized exponential sum histogram of z for N = 1, NS = 1000, overlaid with the standard Gaussian PDF]
[Figures: normalized exponential sum histograms of z for N = 2, 5, 10, 20, 30, and 100 with NS = 1000 samples each, overlaid with the standard Gaussian PDF]
[Figures: normalized exponential sum histograms of z for N = 1000 and 5000 with NS = 1000 samples each, overlaid with the standard Gaussian PDF]

Example 4: MATLAB Code

```matlab
function [] = CLTExponential();
close all;
FigureSet(1,'LTX');
NP = [1,2,5,10,20,30,100,1000,5000]; % No. histograms
NS = 1000;                           % No. samples
lambda = 2;                          % Exponential parameter
for cnt = 1:length(NP),
  np = NP(cnt);
  r  = exprnd(1/lambda,np,NS);       % Random sample of NS sums
  if np~=1,
    r = sum(r);
  end;
  mu = np/lambda;
  s2 = np/lambda^2;
  z  = (r-mu)/sqrt(s2);
  figure;
  FigureSet(1,'LTX');
  Histogram(z,0.1,0.20);
  hold on;
  x  = -5:0.02:5;
  y1 = 1/sqrt(2*pi).*exp(-x.^2/2);
  h  = plot(x,y1,'r');
  hold off;
  set(gca,'XLim',[-5 5]);
  st = sprintf('Normalized Exponential Sum Histogram for z N:%d NS:%d',np,NS);
  title(st);
  AxisSet(8);
  box off;
  h = get(gca,'Children');
  legend([h(2) h(1)],'Exponential Sum Estimated','Gaussian');
  st = sprintf('print -depsc CLTExponential%04d',np);
  eval(st);
  drawnow;
end;
```

Approximate Confidence Intervals
Let Yn be the sum of n i.i.d. random variables X1, X2, …, Xn,
  Yn = Σᵢ₌₁ⁿ Xᵢ
and let μX be the mean and σ²X be the variance of Xi. As n → ∞, the distribution of
  Z = (Yn − nμX)/(√n σX) = (X̄ − μX)/(σX/√n)
approaches the standard normal distribution.
• σX is seldom known
• It is usually approximated with the usual sample estimate σ̂X = sX

Approximate Confidence Intervals Continued
Thus, we may approximate the probability that Z is within an interquantile range:
  Pr( −z1−α/2 ≤ (X̄ − μX)/(σX/√n) ≤ z1−α/2 ) = 1 − α
Note that −z1−α/2 = zα/2 because the normal distribution is symmetric about the mean.
This can be rearranged as
  Pr( X̄ − z1−α/2 σX/√n ≤ μX ≤ X̄ + z1−α/2 σX/√n ) = 1 − α

Approximate Confidence Intervals Continued
[Figure: standard normal PDF with the quantiles zα/2 and z1−α/2 marked and the tail regions beyond them shaded]
• α controls the probability of making a mistake
• Recall −z1−α/2 = zα/2
• The probability of a point falling in the shaded tail region is α
• α is called the level of significance
• This is an important concept for hypothesis testing

Example 5: Approximate Confidence Intervals
A random sample of 32 parts at a manufacturing plant had an average diameter of 3.2 mm and a sample standard deviation of 1.7 mm. What is the approximate 95% confidence interval for the true mean diameter of the parts? Hint: norminv(1-0.05/2) = 1.96.

Hypothesis Testing
• Hypothesis testing, or statistical decision theory, is an important part of statistical inference
• Hypothesis testing is the process of inferring from a sample whether a given statement is true
• The statement is called the hypothesis
• There are always two hypotheses
• Usually, the statement that we would like to prove is called the alternative hypothesis or research hypothesis (denoted H1)
• The negation of the alternative hypothesis is called the null hypothesis (denoted H0)
• Examples
  – "Women are more likely than men to have automobile accidents"
  – "Machine A is more likely to produce faulty parts than Machine B"
  – "Rabbits can distinguish between red flowers and blue flowers"
  – "The defendant is guilty"
• The test is always biased in favor of the null hypothesis
  – If the data strongly disagree with H0, we reject H0
  – If the sample doesn't conflict with H0, or if there is insufficient data, H0 is not rejected
  – Failure to reject H0 does not imply H0 is true
  – Sometimes the phrase "Accept the null hypothesis" is used.
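Example 5 above can be worked numerically from the rearranged probability statement. A minimal sketch in Python (the slides use MATLAB's `norminv`; `statistics.NormalDist.inv_cdf` plays the same role here):

```python
from math import sqrt
from statistics import NormalDist

# Example 5: n = 32 parts, sample mean 3.2 mm, sample std 1.7 mm, 95% confidence.
n, x_bar, s = 32, 3.2, 1.7
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96, matching the norminv hint

half_width = z * s / sqrt(n)              # z_{1-a/2} * s / sqrt(n)
lo, hi = x_bar - half_width, x_bar + half_width
print(f"95% CI: ({lo:.3f}, {hi:.3f}) mm")   # → 95% CI: (2.611, 3.789) mm
```

The sample standard deviation stands in for the unknown σX, as the slide notes; for n = 32 the normal approximation is reasonable.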
  Don't be misled by this
• The hypothesis is always a true/false statement
• The test does not determine the degree of truth of the statement

Hypothesis Testing: Step 1
Step 1: State the hypotheses in terms of the population.

Hypothesis Testing: Steps 2–4
Step 2: Select a test statistic T.
• We want the test statistic to take on some values when H0 is true and others when H1 is true
• We want it to be a sensitive indicator of whether the data agree or disagree with H0
• The test statistic is usually chosen such that its distribution is known (at least approximately) when H0 is true
Step 3: Pick the decision rule.
• Choose a decision rule in terms of the possible values of T
• Usually accompanied by the level of significance α
Step 4: Based on the random sample, evaluate T and make a decision.

Hypothesis Testing Example
The average lifetime of a sample of 100 integrated circuits (ICs) is 523 days with a standard deviation of 97 days. If μ is the mean lifetime of all the ICs produced by the company, test the hypothesis μ = 500 days against the alternative hypothesis μ ≠ 500 days using levels of significance 0.05 and 0.01.
Step 1: The hypotheses are stated in terms of the population.
  H0: μ = 500 days
  H1: μ ≠ 500 days
Step 2: Select a test statistic T. Since n = 100, we can assume the CLT applies and use the test statistic
  T = (X̄ − μ)/(s/√n)
By the CLT, we know its distribution is approximately normal.
Step 3: Pick the decision rule. The necessary quantiles of the normal distribution are
  z0.05/2 = −1.9600    z1−0.05/2 = 1.9600
  z0.01/2 = −2.5758    z1−0.01/2 = 2.5758
Step 4: Evaluate T and make a decision.
  T = (X̄ − μ)/(σ/√n) = (523 − 500)/(97/√100) = 2.3711
We reject H0 at the 0.05 significance level. We fail to reject H0 at the 0.01 significance level.

Hypothesis Testing: Definitions
[Figures: standard normal PDFs illustrating the rejection regions of an upper-tailed test (beyond z1−α), a lower-tailed test (below zα), and a two-tailed test (beyond zα/2 and z1−α/2)]
• Critical Region: the set of all points in the sample space that result in the decision to reject H0
• Acceptance Region: the set of all points in the sample space not in the critical region
• Upper-Tailed Test: H0 is rejected for large values of T
• Lower-Tailed Test: H0 is rejected for small values of T
• Two-Tailed Test: H0 is rejected for large or small values of T

Hypothesis Testing: More Definitions
• Type I Error: the error of rejecting H0 when it is true
  – Denoted α
  – Sometimes called a false positive
• Type II Error: the error of accepting H0 when it is false
  – Denoted β
  – Sometimes called a false negative
• Level of Significance: the maximum probability of rejecting H0 when it is true (type I error)
  – Denoted α
  – The user selects α
  – α = 0.05: 5 times in 100 we would make a type I error
  – α = 0.01: 1 time in 100 we would make a type I error
  – Generally, α = 0.05 is interpreted as probably significant; α = 0.01 as highly significant
• Power: the probability of rejecting H0 when it is false
  – Denoted 1 − β
  – This is the probability of not making a type II error

Hypothesis Testing: Error Table Summary

                      H0 is True              H0 is False
  Fail to Reject H0   1 − α (TN)              β (type II error, FN)
  Reject H0           α (type I error, FP)    1 − β (power, TP)
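The IC-lifetime example above can be checked numerically. A minimal sketch in Python (the slides use MATLAB); the two critical values are the normal quantiles quoted in Step 3:

```python
from math import sqrt

# H0: mu = 500 days vs H1: mu != 500 days (two-tailed).
n, x_bar, s, mu0 = 100, 523, 97, 500
t = (x_bar - mu0) / (s / sqrt(n))
print(round(t, 4))   # → 2.3711

# Compare |T| against z_{1-0.05/2} and z_{1-0.01/2} from the slide.
print(abs(t) > 1.9600, abs(t) > 2.5758)   # → True False
```

So H0 is rejected at the 0.05 level but not at the 0.01 level, matching the slide's conclusion.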
  – Sometimes called the true positive rate
• Often β is unknown
• Null Distribution: the PDF of T when H0 is true

Hypothesis Testing: p-Value
• p-value:
  – The smallest significance level at which H0 would be rejected for the observed T
  – The probability that the sample outcome could have been more extreme than the observed one when H0 is true
• Especially small p-values indicate H0 is strongly rejected
• Especially large p-values indicate the data are consistent with H0
• Usually stated along with the name of the test and the level of significance α
• In two-tailed tests the p-value can be stated as twice the smaller of the two one-tailed p-values
• Selection of the critical region depends only on the user's preference

Testing Sample Means
• Let T = X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ
• The Xi are i.i.d.
• Let n be the size of the sample
Then
  T = z = (X̄ − μX)/(σX/√n) = (X̄ − μX̄)/σX̄
has a distribution that is approximately normal.
• E[X̄] = E[X]
• σ²X̄ = σ²X/n
• s²X̄ = s²X/n ≈ σ²X̄
• If σX̄ is not known, the estimated standard deviation sX̄ can be used
• s²X = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²

Example 6: Sample Means
An IC manufacturer knows that their integrated circuits have a mean maximum operating frequency of 500 MHz with a standard deviation of 50 MHz. With a new process, it is claimed that the maximum operating frequency can be increased. To test this claim, a sample of 50 ICs was tested and the average maximum operating frequency was found to be 520 MHz. Can we support the claim at the 0.01 significance level? Hints: z1−0.01 = 2.3263 and z0.9977 = 2.8284.
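Example 6 above can be worked as an upper-tailed sample-mean test. A minimal sketch in Python (the slides use MATLAB); the critical value 2.3263 is the z1−0.01 quantile from the slide's hint:

```python
from math import sqrt
from statistics import NormalDist

# Example 6: H0: mu = 500 MHz vs H1: mu > 500 MHz (upper-tailed).
n, x_bar, sigma, mu0 = 50, 520, 50, 500
z = (x_bar - mu0) / (sigma / sqrt(n))
print(round(z, 4))                  # → 2.8284

# One-tailed p-value and decision at the 0.01 level.
p_value = 1 - NormalDist().cdf(z)
print(round(p_value, 4))            # → 0.0023
print(z > 2.3263)                   # → True
```

Since z = 2.8284 exceeds 2.3263, the claim is supported at the 0.01 level; the p-value agrees with the slide's z0.9977 = 2.8284 hint (1 − 0.9977 = 0.0023).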
Testing Proportions
• Consider a binary criterion or test that yields either a success or a failure
• Let T = p̂, where p̂ is the proportion of successes
• Let n be the size of the sample
• Let p denote the proportion of "true" successes (if the test were applied to the entire population)
Then
  z = (p̂ − p)/√(p(1 − p)/n)
has a distribution that is approximately normal.
• σ² = p(1 − p)/n

Example 7: Testing Proportions
A student in the ECE 4/557 class creates an algorithm to predict whether Intel's stock will increase or decrease. The algorithm is tested on the closing price over a period of 31 days (1 month). The algorithm correctly predicted increases and decreases on 20 of the 31 days. Determine whether the results are significant (better than chance) at the 0.05 and 0.01 significance levels. Hints: z1−0.05/2 = 1.96, z1−0.01/2 = 2.5758, z1−0.05 = 1.6449, z1−0.01 = 2.3263, z0.9470 = 1.6164.

Small Sampling Theory
• Test statistics are often chosen as sums of values in a sample so that the CLT applies and the normal distribution can be assumed
• This is only considered valid for large samples (say n > 30)
• For small samples, many other tricks can be used
• If the values being recorded are known to come from a Gaussian distribution, the t tests can be used

Small Sample Mean Tests
Suppose we have a random sample of n observations X1, X2, …, Xn that is drawn independently from a normal population with mean μ and standard deviation σ.
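Example 7 above can be worked with the proportion z statistic. A minimal sketch in Python (the slides use MATLAB); the one-tailed critical values 1.6449 and 2.3263 are taken from the slide's hints:

```python
from math import sqrt

# Example 7: 20 successes in n = 31 trials; H0: p = 0.5 (chance),
# upper-tailed test for better-than-chance prediction.
n, successes, p0 = 31, 20, 0.5
p_hat = successes / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
print(round(z, 4))               # → 1.6164

# Compare against z_{1-0.05} and z_{1-0.01}.
print(z > 1.6449, z > 2.3263)    # → False False
```

The statistic matches the slide's z0.9470 = 1.6164 hint and falls just short of the 0.05 one-tailed critical value, so the result is not significant at either level under this normal approximation.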
As before, the sample mean and standard deviation are given by
  X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ    sX = √((1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²)
and the estimated standard deviation of X̄ is given by sX̄ = sX/√n. Then the normalized random variable
  T = (X̄ − μ)/sX̄
is distributed as Student's t distribution with n − 1 degrees of freedom.

Small Sample Mean Tests Comments
• The approach is the same as before
• The only difference is the distribution
• The t distribution is symmetric
• MATLAB functions: tinv, tcdf, & tpdf
• Confidence intervals for μ: X̄ ± t(1 − α/2; n − 1) sX̄
• For large n, the t distribution is approximately normal

Small Sample Mean Tests Concise Summary
  T = (X̄ − μ)/sX̄
H0: μ = μ0 vs. H1: μ ≠ μ0
  If |T| ≤ t(1 − α/2; n − 1), conclude H0
  If |T| > t(1 − α/2; n − 1), conclude H1
H0: μ ≥ μ0 vs. H1: μ < μ0
  If T ≥ t(α; n − 1), conclude H0
  If T < t(α; n − 1), conclude H1
H0: μ ≤ μ0 vs. H1: μ > μ0
  If T ≤ t(1 − α; n − 1), conclude H0
  If T > t(1 − α; n − 1), conclude H1

Example 8: Small Sample Mean Tests
Students in ECE 4/557 chose 10 pairs of numbers "close to 5." The mean of the first set of numbers was 4.8837 with a sample standard deviation of 0.3165. The second set had X̄ = 5.1198 and sX = 0.3157. Assuming each set was drawn from a normal distribution, determine whether each set was drawn from a distribution with a mean of 5. Hints: tinv(1-0.05/2,9) = 2.2622, 1-tcdf(abs((4.8837-5)/(0.3165/sqrt(10))),9) = 0.1376, 1-tcdf(abs((5.1198-5)/(0.3157/sqrt(10))),9) = 0.1304.

Example 9: Small Sample Mean Tests
Choose between the alternatives
  H0: μ ≤ 20
  H1: μ > 20
when α is to be controlled at 0.05, n = 13, X̄ = 24, and sX = 5. Hints: tinv(0.95,12) = 1.7823 and 1-tcdf(2.88,12) = 0.0069.

Example 10: Small Sample Mean Tests
Choose between the alternatives
  H0: μ = 10
  H1: μ ≠ 10
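Example 9 above is an upper-tailed small-sample test. A minimal sketch in Python (the slides use MATLAB's `tinv`/`tcdf`); the critical value 1.7823 is tinv(0.95,12) from the slide's hint:

```python
from math import sqrt

# Example 9: H0: mu <= 20 vs H1: mu > 20, alpha = 0.05.
n, x_bar, s, mu0 = 13, 24, 5, 20
t = (x_bar - mu0) / (s / sqrt(n))
print(round(t, 2))      # → 2.88

# Compare against t(1 - 0.05; 12) = 1.7823.
print(t > 1.7823)       # → True
```

Since T = 2.88 > 1.7823, conclude H1; the one-tailed p-value 1-tcdf(2.88,12) = 0.0069 from the hint confirms significance at the 0.05 level.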
when α is to be controlled at 0.02, n = 15, X̄ = 14, and sX = 6. Hints: tinv(1-0.02/2,14) = 2.6245 and 2*(1-tcdf(2.582,14)) = 0.0217.

Comparing Two Population Means
Let there be two normal populations with means μX and μY and the same standard deviation σ. The means μX and μY are to be compared. Define estimators of the two sample means and the common variance as follows:
  X̄ = (1/nx) Σᵢ₌₁ⁿˣ Xᵢ    Ȳ = (1/ny) Σᵢ₌₁ⁿʸ Yᵢ
  s² = ( Σᵢ₌₁ⁿˣ (Xᵢ − X̄)² + Σᵢ₌₁ⁿʸ (Yᵢ − Ȳ)² ) / (nx + ny − 2)
The variance of the difference, σ²X̄−Ȳ, can be estimated as
  s²X̄−Ȳ = s² (1/nx + 1/ny)
Then
  T = (X̄ − Ȳ)/sX̄−Ȳ
has a t distribution with nx + ny − 2 degrees of freedom.

Comparing Two Population Means Summary
Confidence Limits:
  (X̄ − Ȳ) ± t(1 − α/2; nx + ny − 2) sX̄−Ȳ
Hypothesis Tests:
H0: μX = μY vs. H1: μX ≠ μY
  If |T| ≤ t(1 − α/2; nX + nY − 2), conclude H0
  If |T| > t(1 − α/2; nX + nY − 2), conclude H1
H0: μX ≥ μY vs. H1: μX < μY
  If T ≥ t(α; nX + nY − 2), conclude H0
  If T < t(α; nX + nY − 2), conclude H1
H0: μX ≤ μY vs. H1: μX > μY
  If T ≤ t(1 − α; nX + nY − 2), conclude H0
  If T > t(1 − α; nX + nY − 2), conclude H1

Example 11: Two Population Means
Obtain a 95% confidence interval for μX − μY when
  X̄ = 14    Σ(Xᵢ − X̄)² = 105    nX = 10
  Ȳ = 8     Σ(Yᵢ − Ȳ)² = 224    nY = 20

Example 12: Two Population Means
Students in ECE 4/557 chose 10 pairs of numbers "close to 5." The mean of the first set of numbers was X̄ = 4.8837 with a sample standard deviation of sX = 0.3165. The second set had Ȳ = 5.1198 and sY = 0.3157.
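Example 11 above can be worked with the pooled-variance formulas. A minimal sketch in Python (the slides use MATLAB); the critical value 2.0484 is the 97.5% quantile of the t distribution with nX + nY − 2 = 28 degrees of freedom:

```python
from math import sqrt

# Example 11: 95% CI for mu_X - mu_Y with a pooled variance estimate.
nx, ny = 10, 20
x_bar, y_bar = 14, 8
ssx, ssy = 105, 224                      # sums of squared deviations

s2 = (ssx + ssy) / (nx + ny - 2)         # pooled variance s^2
s_diff = sqrt(s2 * (1 / nx + 1 / ny))    # estimated std of (x_bar - y_bar)

t_crit = 2.0484                          # t(1 - 0.05/2; 28)
lo = (x_bar - y_bar) - t_crit * s_diff
hi = (x_bar - y_bar) + t_crit * s_diff
print(round(lo, 2), round(hi, 2))        # → 3.28 8.72
```

The pooled variance here is s² = 329/28 = 11.75, so the 95% interval for μX − μY is roughly (3.28, 8.72).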
Assuming each set was drawn from a normal distribution, determine whether each was drawn from a distribution with the same mean. Hints: pooled s² = 0.0999 and tinv(1-0.05/2,18) = 2.101. (Hint for Example 11: tinv(0.05/2,28) = -2.0484.)

Example 13: Two Population Means
Choose between the alternatives
  H0: μ1 = μ2
  H1: μ1 ≠ μ2
with a 0.10 level of significance. Same data as Example 11. Hints: tinv(0.95,28) = 1.7011 and 1-tcdf(4.52,28) = 0.000051.

Population Variance Inference
When sampling from a normal population with sample variance s²,
  (n − 1)s²/σ²
is distributed as χ² with n − 1 degrees of freedom.
• The χ² is just another distribution
• It is also the distribution of Σᵢ₌₁ⁿ Xᵢ², where Xᵢ ~ N(0, 1) (i.e., the Xᵢ are RVs drawn from a standard normal distribution) and n is the degrees of freedom
• χ² is not symmetric
• The rules and concepts are the same as before

Population Variance Inference Summary
  T = (n − 1)s²/σ0²
Confidence Limits:
  (n − 1)s²/χ²(1 − α/2; n − 1) ≤ σ² ≤ (n − 1)s²/χ²(α/2; n − 1)
Hypothesis Tests:
H0: σ² = σ0² vs. H1: σ² ≠ σ0²
  If χ²(α/2; n − 1) ≤ T ≤ χ²(1 − α/2; n − 1), conclude H0; otherwise, conclude H1
H0: σ² ≥ σ0² vs. H1: σ² < σ0²
  If T ≥ χ²(α; n − 1), conclude H0
  If T < χ²(α; n − 1), conclude H1
H0: σ² ≤ σ0² vs. H1: σ² > σ0²
  If T ≤ χ²(1 − α; n − 1), conclude H0
  If T > χ²(1 − α; n − 1), conclude H1

Example 14: Population Variance Inference
Obtain a 98% confidence interval for σ². The data consist of 10 points with s = 4. Hints: chi2inv(0.01,9) = 2.088 and chi2inv(0.99,9) = 21.666.
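Example 14 above follows directly from the confidence-limit formula. A minimal sketch in Python (the slides use MATLAB's `chi2inv`); the two χ² quantiles are taken from the slide's hints:

```python
# Example 14: 98% CI for sigma^2 with n = 10 points and s = 4.
n, s = 10, 4
ss = (n - 1) * s ** 2      # (n-1) s^2 = 144

# chi-square quantiles with n-1 = 9 degrees of freedom (from the hints):
chi2_lo = 2.088            # chi2inv(0.01, 9)
chi2_hi = 21.666           # chi2inv(0.99, 9)

# Larger quantile gives the lower limit, smaller gives the upper limit.
lo = ss / chi2_hi
hi = ss / chi2_lo
print(round(lo, 2), round(hi, 2))   # → 6.65 68.97
```

Note the asymmetry of the interval about s² = 16, a consequence of the χ² distribution not being symmetric.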
Comparing Two Population Variances
Let independent samples be drawn from two normal populations with means and variances μX, σ²X and μY, σ²Y, respectively. The sample variances are
  s²X = (1/(nX − 1)) Σᵢ₌₁ⁿˣ (Xᵢ − X̄)²    s²Y = (1/(nY − 1)) Σᵢ₌₁ⁿʸ (Yᵢ − Ȳ)²
Then
  T = (s²X/σ²X)/(s²Y/σ²Y)
is distributed as F(nX − 1, nY − 1). The F test is sensitive to departures from the normality assumption.

Comparing Two Population Variances Summary
Confidence Limits:
  (s²X/s²Y) / F(1 − α/2; nX − 1, nY − 1) ≤ σ²X/σ²Y ≤ (s²X/s²Y) / F(α/2; nX − 1, nY − 1)
Hypothesis Tests (with T = s²X/s²Y under H0):
H0: σ²X = σ²Y vs. H1: σ²X ≠ σ²Y
  If F(α/2; nX − 1, nY − 1) ≤ T ≤ F(1 − α/2; nX − 1, nY − 1), conclude H0; otherwise, conclude H1
H0: σ²X ≥ σ²Y vs. H1: σ²X < σ²Y
  If T ≥ F(α; nX − 1, nY − 1), conclude H0
  If T < F(α; nX − 1, nY − 1), conclude H1
H0: σ²X ≤ σ²Y vs. H1: σ²X > σ²Y
  If T ≤ F(1 − α; nX − 1, nY − 1), conclude H0
  If T > F(1 − α; nX − 1, nY − 1), conclude H1

Example 15: Two Population Variances
Obtain a 90% confidence interval for σ²X/σ²Y when the data are
  nX = 16    s²X = 54.2
  nY = 21    s²Y = 17.8
Hints: finv(0.05,15,20) = 0.4296, finv(0.95,15,20) = 2.2033.

Example 16: Two Population Variances
Choose between the alternatives
  H0: σ²X = σ²Y
  H1: σ²X ≠ σ²Y
with α controlled at 0.02 when
  nX = 16    s²X = 17.8
  nY = 21    s²Y = 54.2
Hints: finv(0.01,15,20) = 0.2966, finv(0.99,15,20) = 3.09.
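Example 15 above applies the F-based confidence limits directly. A minimal sketch in Python (the slides use MATLAB's `finv`); the two F quantiles are taken from the slide's hints:

```python
# Example 15: 90% CI for the variance ratio sigma_X^2 / sigma_Y^2.
nx, ny = 16, 21
s2x, s2y = 54.2, 17.8

ratio = s2x / s2y
# F quantiles with (nx-1, ny-1) = (15, 20) degrees of freedom (from the hints):
f_lo = 0.4296              # finv(0.05, 15, 20)
f_hi = 2.2033              # finv(0.95, 15, 20)

# Dividing by the larger quantile gives the lower limit, and vice versa.
lo = ratio / f_hi
hi = ratio / f_lo
print(round(lo, 2), round(hi, 2))   # → 1.38 7.09
```

Since the entire interval lies above 1, the data also suggest σ²X > σ²Y at the corresponding significance level.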
Parametric versus Nonparametric Introduction
• After a model for an experiment has been defined, the probabilities associated with the model must be found in order to conduct a test
• This can be very hard
• Statisticians have often changed the model slightly to be able to solve for the probabilities
• The change is slight enough to ensure the model is still realistic
• We can then find exact solutions to approximate problems
• This is parametric statistics
• It includes the tests we have discussed

Parametric versus Nonparametric Introduction Continued
• Nonparametric statistics caught momentum in the late 1930s
• Simple methods to find good approximations of the probabilities
• Makes few assumptions
• Can then find approximate solutions to exact problems
• Safer than parametric methods; use when the price of making a wrong decision is high
• Often requires less computation
• Most of the theory is fairly simple
• Often more powerful than parametric methods

Parametric vs. Nonparametric
• Parametric methods
  – Depend on knowledge of F(x); the parameter values may be unknown
  – How can we be certain F(x) is really what we think it is?
  – If we are certain, the parametric test is probably the most powerful
  – All tests discussed so far are parametric
  – Good for light tails, poor for heavy tails (outliers)
• Nonparametric = distribution free
  – No assumption about F(x)
  – Apply to all distributions
  – Always work well
  – Usually best with heavy tails (outliers)

Nonparametric Defined
A statistical method is nonparametric if it satisfies at least one of the following criteria:
1. It may be used on nominal data
2. It may be used on ordinal data
3. It may be used on interval or ratio data where the distribution is unspecified
More Definitions
Unbiased Test: a test in which the probability of rejecting H0 when H0 is false is always greater than or equal to the probability of rejecting H0 when H0 is true
Consistent Test: a test in which the power approaches 1 as the sample size approaches infinity, for some fixed level of significance α > 0
Conservative Test: a test in which the actual level of significance is smaller than the stated level
Robust Test: a test that is approximately valid even if the assumptions behind the method are not true
• The t test is robust against departures from normality

The Binomial Test
• Goal: Is p the probability of Class 1?
• Data: each observation belongs to one of two classes, Class 1 or Class 2 (not both)
• Definitions
  – n1 = No. of observations in Class 1
  – n2 = No. of observations in Class 2
  – n = total number of observations
• Assumptions
  – The n trials are independent
  – Each trial has a probability p of generating Class 1
• Test statistic: n1
• Null Distribution: binomial with parameters n and p

Example 17: Binomial Test
A company estimates that 20% of the dies they manufacture are faulty. A new method is developed to reduce the number of faulty dies. Out of 29 dies, only 3 were faulty. Is it safe to conclude that the new method reduces the number of faulty dies at a 0.05 level of significance? What is the p value?
Hint: binocdf(3,29,0.20) = 0.1404.

Example 18: Binomial Test
A circuit is expected to produce a true output (1) 80% of the time. The technician suspects the circuit is not working. Out of 682 clock cycles, the circuit produced 520 true outputs. With a 0.01 level of significance, can we say the circuit is not working? What is the p value?
Hint: binocdf(520,682,0.8) = 0.0091.
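The lower-tail p values quoted in the hints for Examples 17 and 18 can be checked in Python; `scipy.stats.binom.cdf` is the analogue of MATLAB's `binocdf`. A sketch added for illustration:

```python
from scipy import stats

# Example 17: H0: p = 0.20 vs. H1: p < 0.20; observed n1 = 3 faulty dies out of n = 29
p17 = stats.binom.cdf(3, 29, 0.20)    # P(N1 <= 3) under H0

# Example 18: H0: p = 0.80 vs. H1: p < 0.80; observed 520 true outputs out of 682
p18 = stats.binom.cdf(520, 682, 0.80)  # P(N1 <= 520) under H0

print(f"Example 17 p value: {p17:.4f}")  # > 0.05: cannot conclude the method helps
print(f"Example 18 p value: {p18:.4f}")  # < 0.01: conclude the circuit is not working
```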
Example 19: Binomial Test
A researcher finds what they believe is a significant leading indicator of the outcome of Trail Blazers’ games. The indicator was used to prospectively generate a correct outcome in 11 out of 15 games. Determine whether the results are significant (better than chance) at the 0.05 significance level.
Hints: 1-binocdf(10,15,0.5) = 0.0592, 1-binocdf(11,15,0.5) = 0.0176.

Example 20: Binomial Test
A student in the ECE 4/557 class creates an algorithm to predict whether Intel’s stock will increase or decrease. The algorithm is tested on the closing price over a period of 31 days (1 month). The algorithm correctly predicted increases and decreases on 20 of the 31 days. Determine whether the results are significant (better than chance) at the 0.05 significance level.
Hints: 1-binocdf(19,31,0.5) = 0.0748, 1-binocdf(20,31,0.5) = 0.0354.

Binomial Confidence Intervals on p
• Same data and assumptions as before
• p is unknown
• What is the x% confidence interval on p?
• Can be obtained directly from MATLAB’s binofit function

Example 21: Binomial Confidence Intervals
Twenty students were selected at random to see if they could do nodal analysis in ECE 221. Seven students could successfully complete the problems. What is a 95% confidence interval for p, the proportion of students in the class who could do nodal analysis?
Answer: [phat,pci] = binofit(7,20)
p̂ = 0.35 and, with 95% confidence, 0.1539 ≤ p ≤ 0.5922

The Quantile Test
• Goal: Is x* the p* quantile of F(x)?

Example 22: Quantile Test
It has been well established that the upper quartile of exam scores at Portland State is 85%.
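MATLAB's binofit returns an exact (Clopper-Pearson) interval, which can be sketched in Python with `scipy.stats.beta`. This mirrors, but is not, the original MATLAB answer to Example 21:

```python
from scipy import stats

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    lo = stats.beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    hi = stats.beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return lo, hi

# Example 21: 7 successes out of 20 students
phat = 7 / 20
lo, hi = clopper_pearson(7, 20)
print(f"phat = {phat:.2f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```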
A statistics professor is concerned about grade inflation and randomly selects one exam score from each of 15 classes. She obtains the following data:
72 93 46 87 82 87 82 85 86 91 79 94 63 78 60

Hypotheses
H0: The upper quartile is 85%
H1: The upper quartile is not 85%

Hints: binocdf(8,15,0.75) = 0.0566, binocdf(9,15,0.75) = 0.1484.

The Quantile Test Continued
• Data: x1, x2, ..., xn
• Definitions:
  – n = No. of observations
  – p* = stated probability
• Assumptions:
  – The data are i.i.d.
  – The measurement scale is at least ordinal
• Test statistic: T = No. of observations ≤ x*
• Null Distribution: binomial with probability p*

The Sign Test
• Goal: Does one variable in a pair (X, Y) tend to be larger than the other?
• Data:
  – If X > Y, classified as +
  – If X < Y, classified as −
  – If X = Y, classified as 0
• Definitions:
  – n = No. of untied pairs
  – p = stated P(X > Y)
• Assumptions:
  – The data are i.i.d.
  – The measurement scale is at least ordinal
• Test statistic: T = No. of +’s
• Null Distribution: binomial with probability 1/2

Example 23: Sign Test
A student develops a new method of analyzing DC circuits containing op amps. He wishes to determine whether the new method is faster than the method described in class. He selects 10 problems and measures the time required to solve each problem using both methods. He obtains the following results:
• 8 = No. of +’s
• 1 = No. of −’s
• 1 = No. of ties
Is the new method faster at a 0.05 level of significance? What is the p value?
Hints: 1-binocdf(8,9,0.5) = 0.002, 1-binocdf(7,9,0.5) = 0.0195.

Other Binomial Tests
Other nonparametric tests are available based on the binomial distribution. See me for details.
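The sign test of Example 23 reduces to a binomial tail probability. A Python sketch, with `scipy.stats.binom.cdf` standing in for MATLAB's `binocdf`:

```python
from scipy import stats

n_plus, n_minus = 8, 1   # ties are dropped from the analysis
n = n_plus + n_minus     # 9 untied pairs

# One-sided p value: P(T >= 8) under T ~ Binomial(9, 1/2)
p_value = 1 - stats.binom.cdf(n_plus - 1, n, 0.5)
print(f"p value = {p_value:.4f}")  # 0.0195 < 0.05: conclude the new method is faster
```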
• Confidence Interval for a Quantile: given data, find P(X(r) ≤ xp* ≤ X(s)) = 1 − α, where r, s, p*, and α are specified
• Tolerance Limits:
  – How large should n be to ensure with 95% confidence that 90% of the data lies between X(r) and X(s)?
  – What proportion of the population lies between X(r) and X(s) with 95% confidence?
• McNemar Test for Significance of Changes: all data pairs (X, Y) are binary. Does P(0,1) = P(1,0)?
  – Let X = state before a treatment
  – Let Y = state after a treatment
  – Did the treatment have an effect?

Kolmogorov Test
• Goal: Was the data drawn from a distribution with CDF F*(x)?
• Data: X1, X2, ..., Xn
• Definitions:
  – n = No. of points
  – S(x) is the empirical distribution function
• Assumptions:
  – The sample is a random sample
• Test statistic:
  – Two-sided: T = max_x |F*(x) − S(x)|
  – Lower: T+ = max_x (F*(x) − S(x))
  – Upper: T− = max_x (S(x) − F*(x))
• Null Distribution: Ask MATLAB

Example 24: Kolmogorov Test
MATLAB was used to generate an exponential random variable. The Kolmogorov-Smirnov test was used as follows.

lambda = 2;
N = 25;
rand('state',5);
R = exprnd(1/lambda,N,1);
F = 1-exp(-lambda*R);
CDF = [R F];
[H,P] = kstest(R,CDF,0.05)

MATLAB returned H = 0 (do not reject the null hypothesis) and P = 0.0747 (p value).

Example 24: Kolmogorov Test Plot
[Figure: empirical distribution function S(x) of the N = 25 exponential samples, plotted for x from 0 to 1.]

Smirnov Test
• Goal: Was the data (X, Y) drawn from the same distribution?
• Data: (X1, Y1), (X2, Y2), 
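A rough Python equivalent of Example 24 uses `scipy.stats.kstest` with the hypothesized exponential CDF F*(x) = 1 − exp(−λx). NumPy's generator differs from MATLAB's `rand('state',5)`, so the seed is arbitrary and H and P will not match the slide exactly:

```python
import numpy as np
from scipy import stats

lam, N = 2, 25
rng = np.random.default_rng(5)            # seed chosen arbitrarily
R = rng.exponential(scale=1 / lam, size=N)

# Two-sided test of R against the hypothesized CDF F*(x) = 1 - exp(-lam*x)
result = stats.kstest(R, lambda x: 1 - np.exp(-lam * x))
print(f"T = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```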
. . . , (Xn, Yn)
• Definitions:
  – n = No. of points
  – SX(x) and SY(x) are the empirical distribution functions of X and Y
• Assumptions:
  – The sample is a random sample
• Test statistic:
  – Two-sided: T = max_x |SX(x) − SY(x)|
  – Lower: T+ = max_x (SX(x) − SY(x))
  – Upper: T− = max_x (SY(x) − SX(x))
• Null Distribution: Ask MATLAB

Example 25: Smirnov Test
MATLAB was used to generate 50 points from an exponential distribution and a Gaussian distribution. Both had zero mean and unit variance.

lambda = 2;
N = 50;
rand('state',5);
R1 = exprnd(1/lambda,N,1);
R1 = R1 - 1/lambda;
R1 = R1*lambda;
R2 = randn(N,1);
[H,P] = kstest2(R1,R2)

MATLAB returned H = 1 (reject the null hypothesis) and P = 0.0089 (p value).

Example 25: Smirnov Test Distribution Functions Plot
[Figure: empirical distribution functions of the exponential and normal samples (N = 50), plotted for x from −2 to 1.5.]

Tests for Normality
• See lillietest for testing against normal populations
• Also popular is the chi-squared goodness-of-fit test
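A Python sketch of the two-sample (Smirnov) test in Example 25 uses `scipy.stats.ks_2samp`. As in the previous sketch, the seed is arbitrary and the RNG differs from MATLAB's, so the exact T and p will differ from the slide:

```python
import numpy as np
from scipy import stats

lam, N = 2, 50
rng = np.random.default_rng(5)  # seed chosen arbitrarily

# Exponential sample, shifted and scaled to zero mean and unit variance
R1 = (rng.exponential(scale=1 / lam, size=N) - 1 / lam) * lam
# Standard normal sample: also zero mean and unit variance
R2 = rng.standard_normal(N)

stat, p = stats.ks_2samp(R1, R2)
print(f"T = {stat:.4f}, p = {p:.4f}")
```

Because the two distributions share their first two moments, only the difference in shape drives the statistic, which is exactly what the Smirnov test detects.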