Stochastic Simulation

Notes by Kevin D. Donohue
Department of Electrical Engineering
University of Kentucky
Lexington, KY 40506

These notes were developed to provide basic understanding, rules, and practical tricks for obtaining reliable answers from computer simulations, particularly for evaluating the performance of communication and signal processing systems. Performance measures examined include integral performance measures such as mean square error, bias, and variance/efficiency, and probability measures such as probability of detection and false alarm.

Stochastic Simulation:

Purpose: Stochastic simulations can establish relationships between system inputs, outputs, and parameters in a statistically reliable manner. (Simulations are especially helpful when a closed-form relationship cannot be derived.) For signal processing and communication systems, simulations can be useful for:
- comparing performances of systems under statistically similar conditions
- optimizing processing and system parameters
- developing an algorithm through identifying its limitations in a given application

Components of Simulation Design:

Problem Definition: Be as quantitative as possible. Identify quantities to be determined. Determine system conditions and inputs affecting these quantities.

System Model: Develop a system model suitable for computer implementation that includes the interactions of all critical phenomena whose inputs, outputs, and model parameters are consistent with ones in the defined problem.

Experimental Design: Statistically design/analyze the simulation (i.e. ensemble of runs) to determine the reliability of the estimated quantities from a finite number of runs.

Implementation: Design programs to implement the simulation model. Consider required computing resources (processing time and memory) in developing code so the program will finish in a reasonable amount of time.
Example of Monte Carlo Simulation:

The classic example of a Monte Carlo simulation is evaluating a multiple integral. The approach is most useful when the number of dimensions is so high that a direct iterative technique would take too long to compute and a closed-form solution cannot be found. However, for the sake of illustration and comparison, the following example can be solved quite easily by all 3 methods (closed-form solution, iteration, and Monte Carlo).

Problem Definition: Find the probability that 2 identical and independent Rayleigh distributed signal values, $x_1$ and $x_2$, are bounded such that

$$0 \le x_1^2 + x_2^2 \le r^2$$

System Model: Express the probability in terms of the Rayleigh distribution and the limits indicated in the problem statement.

$$\Pr[0 \le x_1^2 + x_2^2 \le r^2] = \int_0^r \int_0^{\sqrt{r^2 - x_2^2}} \frac{x_1}{b}\exp\left(-\frac{x_1^2}{2b}\right) \frac{x_2}{b}\exp\left(-\frac{x_2^2}{2b}\right) dx_1\, dx_2$$

Solution 1 (Exact Solution): Evaluate the integral in terms of b and r.

$$\Pr[0 \le x_1^2 + x_2^2 \le r^2] = \frac{2b - 2b\exp\left(-\frac{r^2}{2b}\right) - r^2\exp\left(-\frac{r^2}{2b}\right)}{2b}$$

Solution 2 (Numerical Solution):

[Figure: surface representing the integrand for b = 1]

Develop an iterative program to break the double integral down into smaller volumes, using $\Delta_1$ and $\Delta_2$ increments along the $x_1$ and $x_2$ axes, respectively:

$$\Pr[0 \le x_1^2 + x_2^2 \le r^2] \approx \sum_{n_2=0}^{\lfloor r/\Delta_2 \rfloor} \; \sum_{n_1=0}^{\left\lfloor \sqrt{r^2-(n_2\Delta_2)^2}/\Delta_1 \right\rfloor} \frac{n_1\Delta_1}{b}\, e^{-\frac{(n_1\Delta_1)^2}{2b}}\; \frac{n_2\Delta_2}{b}\, e^{-\frac{(n_2\Delta_2)^2}{2b}}\; \Delta_1 \Delta_2$$

Evaluating the integrand on a finer-resolution grid results in a more accurate solution (provided there are no singularities over the region of integration); however, the number of iterations increases as a function of grid resolution according to:

$$\text{Number of iterations} \approx \frac{\sqrt{2}}{2}\cdot\frac{r}{\Delta_1}\cdot\frac{r}{\Delta_2} = \frac{\sqrt{2}\, r^2}{2\,\Delta_1\Delta_2}$$

Solution 3 (Stochastic or Monte Carlo Simulation): The integration problem can be reformulated as follows. The volume under the integrand's surface equals its mean value times the area of the integration region. (This is true! Think about it for a while, if it is not obvious.
You may want to think in terms of some simple one-dimensional examples first, or recall the mean-value theorem from Calculus). So now the problem can be reformulated in terms of estimating the mean based on a finite number of samples taken from an infinite (continuous surface) population. In some sense, this is what the numerical approach does, with a sampling strategy that is a uniform grid over the region of integration. For the Monte Carlo approach, however, points are chosen at random in the integration region (random sampling). Thus, a critical simulation issue relates to the design of the random experiment (i.e. how reliable is the answer for the given number of samples taken?). This will be addressed later; for now, let's compare the 3 solutions for accuracy and efficiency.

Solve Problem for r = 1 and b = 1.

Exact solution: Evaluate the closed-form expression to obtain 0.09020401043105

Numerical solution:

Matlab Code:

    % The following code numerically evaluates the double integral of:
    % x1*x2*exp(-((x1)^2+(x2)^2)/2) over the range 0 < x1^2 + x2^2 < r^2
    % It uses different increments on the evaluation grid to compare accuracy
    r = 1;   % Bound on integration region
    b = 1;   % Distribution parameter (the integrand below is written for b = 1)
    true_value = (2*b-2*b*exp(-r^2/(2*b))-r^2*exp(-r^2/(2*b)))/(2*b);  % Closed-form solution
    % Loop through exponentially decreasing increments for evaluation grid
    for k=1:3
        d1 = .1^k;   % Increment along x1
        d2 = .1^k;   % Increment along x2
        probability_value(k) = 0;   % Clear value to accumulate summation of volumes
        % Step through x2 axis until limit reached
        n2=0;   % Initialize increment on x2 axis
        while (n2 <= floor(r/d2))
            % Step through x1 axis and evaluate function and accumulate
            % function values until limit reached
            n1=0;   % Initialize increment on x1 axis
            while (n1 <= floor((sqrt(r^2-(n2*d2)^2))/d1))
                function_value = (n1*d1)*(n2*d2)*exp(-((n1*d1)^2+(n2*d2)^2)/2);
                probability_value(k) = probability_value(k) + function_value;
                n1=n1+1;
            end
            n2=n2+1;
        end
        % Scale by incremental area
        probability_value(k) = probability_value(k)*d1*d2
    end
    % Compute error normalized by the true value
    nme = (probability_value-true_value)/true_value;

Results:

| Increment | Number of Iterations | Result | Percent Error |
|---|---|---|---|
| 10^-1 | 70.7 | 0.08723514188359 | 3.29 |
| 10^-2 | 7.07×10^3 | 0.09021074482542 | 7.5×10^-3 |
| 10^-3 | 7.07×10^5 | 0.09020434991276 | 3.7×10^-4 |

Stochastic Solution:

Matlab Code:

    % The following code stochastically evaluates the double integral of
    % x1*x2*exp(-((x1)^2+(x2)^2)/2) over the range 0 < x1^2 + x2^2 < r^2.
    % It generates 10^1, 10^2, ... 10^5 random points over the function domain
    % to compute the integral "ntrials" times. The mean and variance of
    % the estimate errors are computed to examine the performance/convergence
    % properties of this estimator.
    r = 1;          % Radius limit
    b = 1;          % Distribution parameter
    ntrials = 100;  % Number of times to estimate the area for a fixed number of points
    true_value = (2*b-2*b*exp(-r^2/(2*b))-r^2*exp(-r^2/(2*b)))/(2*b);  % Closed-form solution
    % Loop to increase the number of random points used to determine sample mean
    for kn = 1:5
        % Loop to try stochastic integration many times to statistically assess error
        for k=1:ntrials
            n = round((4/pi)*10^kn);  % Number of random points over r by r region
            x2 = r*rand(1,n);         % Generate random numbers over rectangular region
            x1 = r*rand(1,n);         % Generate random numbers over rectangular region
            % Find those values that fall in the region of integration (sector)
            at = find((x1.^2 + x2.^2) <= r^2);
            % Evaluate function at those points
            psamp = x1(at).*x2(at).*exp(-(x1(at).^2+x2(at).^2)/2);
            % Find mean value and scale to estimate volume
            probability_value(kn,k) = mean(psamp)*(pi/4)*r^2;
        end
    end
    % Compute the error and normalize by the true value. Note that the square is
    % applied after averaging over trials, so this is the magnitude of the mean
    % error rather than a true root mean square error.
    rmspe = sqrt(mean((probability_value - true_value),2).^2)/true_value;
    % Compute mean value of all the area estimates
    me = mean(probability_value,2);
    % Compute the plus/minus 95% confidence limit for the mean values given above
    se = 2*std(probability_value,0,2)/sqrt(ntrials);

Results:

| Number of random samples | 10^2 | 10^3 | 10^5 |
|---|---|---|---|
| Mean estimate from 100 trials | 0.09036053355664 | 0.09022315422829 | 0.09019620490268 |
| 95% confidence interval (±) | 0.0013512814137 | 0.0004370462649 | 0.0000387358233 |
| Normalized magnitude of the mean error over 100 trials (rmspe in the code) | 1.735×10^-3 | 2.122×10^-4 | 8.653×10^-5 |

For Monte Carlo simulation the description of the error is more involved. Probabilities and confidence intervals describe the nature of the error. The error given in the table indicates that we are 95% confident of the true value being within the specified interval around the mean value (the assumption here is that the error is Gaussian).

Monte-Carlo Integration: Statistical Analysis

Many statistical simulations compute an expected value, given by:

$$E[g(x)] = \int g(x)\, f_X(x)\, dx$$

- $g(x)$ may be unknown, or known but intractable or costly to compute
- $f_X(x)$ is the probability density function of x

The expected value for many common distributions can be estimated from N independent samples $x_i$ drawn from $f_X(x)$:

$$E[g(x)] \approx \hat{g}(x) = \frac{1}{N}\sum_{i=1}^{N} g(x_i)$$

where $\hat{g}(x)$ is referred to as the sample mean.
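The sample-mean estimator can be illustrated with a minimal sketch. The integrand g(x) = x^2 and the uniform density on [0, 1] are illustrative assumptions that do not come from the notes; they are chosen only because the exact expected value, 1/3, is known.

```matlab
% Sketch: sample-mean estimate of E[g(x)] for an assumed test case
% (g and the uniform density are illustrative choices, not from the notes)
g = @(x) x.^2;               % hypothetical integrand; E[g(x)] = 1/3 for x ~ U(0,1)
Nvals = 10.^(2:5);           % numbers of independent samples to try
ghat = zeros(size(Nvals));   % storage for the sample means
for k = 1:numel(Nvals)
    x = rand(Nvals(k),1);    % independent samples drawn from f_X (uniform on [0,1])
    ghat(k) = mean(g(x));    % sample mean of g over the samples
end
abs_err = abs(ghat - 1/3);   % absolute error relative to the exact value
disp([Nvals(:) ghat(:) abs_err(:)])   % columns: N, estimate, absolute error
```

Repeated runs give different errors for the same N, which is why the reliability of the estimate has to be described statistically.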
The estimate can be shown to be unbiased by taking the expected value of the estimator:

$$E[\hat{g}(x)] = \frac{1}{N}\sum_{i=1}^{N} E[g(x_i)] = \frac{N}{N}\int g(x)\, f_X(x)\, dx = E[g(x)] \quad\Rightarrow\quad \text{unbiased}$$

Estimator efficiency and consistency can be observed through its variance:

$$\begin{aligned}
\mathrm{var}[\hat{g}(x)] &= E\left[\left(\hat{g}(x) - E[g(x)]\right)^2\right] = E\left[\left(\frac{1}{N}\sum_{i=1}^{N} g(x_i) - \int g(x) f_X(x)\,dx\right)^2\right] \\
&= E\left[\left(\frac{1}{N}\sum_{i=1}^{N} g(x_i)\right)^2 - \frac{2}{N}\sum_{i=1}^{N} g(x_i)\int g(x) f_X(x)\,dx + \left(\int g(x) f_X(x)\,dx\right)^2\right] \\
&= \frac{1}{N^2} E\left[\sum_{i=1}^{N}\sum_{k=1}^{N} g(x_i)\, g(x_k)\right] - 2\left(\int g(x) f_X(x)\,dx\right)^2 + \left(\int g(x) f_X(x)\,dx\right)^2 \\
&= \frac{1}{N^2} E\left[\sum_{i=1}^{N} g^2(x_i) + \sum_{i=1}^{N}\sum_{k \ne i} g(x_i)\, g(x_k)\right] - \left(\int g(x) f_X(x)\,dx\right)^2 \\
&= \frac{1}{N^2}\left[N\int g^2(x) f_X(x)\,dx + (N^2 - N)\left(\int g(x) f_X(x)\,dx\right)^2\right] - \left(\int g(x) f_X(x)\,dx\right)^2 \\
&= \frac{1}{N}\int g^2(x) f_X(x)\,dx - \frac{1}{N}\left(\int g(x) f_X(x)\,dx\right)^2 \\
&= \frac{1}{N}\int \left(g(x) - E[g(x)]\right)^2 f_X(x)\,dx
\end{aligned}$$

Therefore, the variance of the estimate is a function of N: the variance of the function g(x) is reduced by a factor of N:

$$\mathrm{var}(\hat{g}(x)) = \frac{1}{N}\mathrm{var}(g(x))$$

The standard error (or standard deviation) decreases proportionally to $1/\sqrt{N}$.

To determine N beforehand for a specified level of precision, var(g(x)) must be known. However, this information is rarely available, so initial estimates of this value have to be obtained. Once a reasonable estimate of var(g(x)) is obtained, the precision expressed in confidence intervals is given by:

$$\hat{g}(x) \pm k\left[\frac{\mathrm{var}(g(x))}{N}\right]^{1/2}$$

The true value can be anywhere in this interval about $\hat{g}(x)$ with probability $1-\alpha$, where, if $\hat{g}(x)$ is assumed Gaussian, $\alpha$ is computed from:

$$\alpha = 2\int_{\hat{g}(x) + k\left[\frac{\mathrm{var}(g(x))}{N}\right]^{1/2}}^{\infty} \mathcal{N}\left(t;\ \hat{g}(x),\ \frac{\mathrm{var}(g(x))}{N}\right) dt = \mathrm{erfc}\left(\frac{k}{\sqrt{2}}\right)$$

Here $\mathcal{N}(t; m, s^2)$ denotes the Gaussian density with mean m and variance $s^2$.

Note that if $f_X(x)$ is a uniform distribution over the region of integration A, the expected value integral for g(x) becomes:

$$\int g(x)\, f_X(x)\, dx = \frac{1}{D(A)}\int_A g(x)\, dx \quad\text{where}\quad f_X(x) = \begin{cases} \dfrac{1}{D(A)} & \text{for } x \in A \\ 0 & \text{elsewhere} \end{cases}$$

Multiply the above integral by the size of the region, D(A), to obtain:

$$\int_A g(x)\, dx$$

Example: Find the mean-square error (MSE) between the limit spectrum and the estimated spectrum magnitude of transfer function H(f) from the system output driven by white noise (assume H(f) represents a linear time-invariant system). Plot the error as a function of frequency resolution for a fixed data length.

Simulation design questions:
- What level of precision is required in the result?
- What model(s) should be used for H(f)?
- What length should the data segments be for performing the spectrum estimate?
- What estimators should be used?
- What should vary over each run?
- How many runs should be made for a given frequency point?

Precision is typically addressed through confidence intervals.

Confidence Interval Estimation

A confidence interval is a random interval whose end points $\theta_L$ and $\theta_U$ are functions of the observed random variables, such that the inequality $\theta_L \le \theta \le \theta_U$, where $\theta$ is the quantity being estimated, is satisfied with some predetermined probability $(1-\alpha)$:

$$\Pr[\theta_L \le \theta \le \theta_U] = 1 - \alpha \quad \text{(confidence level)}$$

A useful distribution for mean estimates from a finite number of samples is the Student's t distribution.

Student's t Distribution

Let $x_1, x_2, x_3, \ldots, x_N$ be independent Gaussian random variables with mean $\mu$ and standard deviation $\sigma$, with sample mean

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

and sample variance

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$$

Then

$$t = \frac{\bar{x} - \mu}{S/\sqrt{N}}$$

has the Student's t distribution with $\nu = N-1$ degrees of freedom.

Let $t_{\alpha/2,\nu}$ and $t_{1-\alpha/2,\nu}$ denote values of the distribution with probability density function areas to the left of these values being $\frac{\alpha}{2}\cdot 100\%$ and $\left(1-\frac{\alpha}{2}\right)\cdot 100\%$, respectively.

[Figure: Student's t distribution for ν = 5, with area 1 - α between $t_{\alpha/2,\nu}$ and $t_{1-\alpha/2,\nu}$ and area α/2 in each tail]

By symmetry, $t_{\alpha/2,\nu} = -t_{1-\alpha/2,\nu}$. Note that for α = .05, the area between the t values is 0.95.

Therefore, the probability that the true mean is in a neighborhood of the sample mean is given by:

$$\Pr\left[\bar{x} - t_{1-\alpha/2,\nu}\frac{S}{\sqrt{N}} \le \mu \le \bar{x} + t_{1-\alpha/2,\nu}\frac{S}{\sqrt{N}}\right] = 1 - \alpha$$

Comparison of .95 = 1-α points for t and Gaussian distributions.

    % Generate degree-of-freedom axis
    new = [3:500];
    % Plot number of standard deviations required to achieve 95% critical
    % points for t statistics and normal statistics
    plot(new,tinv(.975,new),'b',new,norminv(.975,0,1)*ones(size(new)),'r--')
    title('Standard deviations for 95% confidence interval (Blue/Solid- t distribution, Red/Dashed- Normal)')
    xlabel('Degrees of freedom')
    ylabel('Number of standard deviations')
    axis([0 500 1 3])

Determination of Sample Size and Stopping Rules

The level of confidence (precision) desired for the output measure ultimately determines the number of runs and sample size. To determine the output variable mean $\bar{x}$ to within $\pm\delta$ units at confidence level $1-\alpha$, use the following formula:

$$\Pr\left[\,|\bar{x} - \mu| \le \delta\,\right] = 1 - \alpha \quad\text{where}\quad \delta = t_{1-\alpha/2,N-1}\frac{S}{\sqrt{N}}, \quad\text{therefore}\quad N = \left(t_{1-\alpha/2,N-1}\right)^2 \frac{S^2}{\delta^2}$$

$t_{1-\alpha/2,N-1}$ can be obtained from the Student's t-distribution table. $S^2$ is the variance of the integrand and can be approximated from preliminary sample runs while observing the convergence for increasing N, as shown in the next example.

Example: Determine the performance of a PSD-based peak detector that estimates the center frequency of a band-pass channel from measurements taken at the output of a channel driven by white noise. The center frequency for this channel has a lower bound of 100 Hz and an upper bound of 1000 Hz. The bandwidth of each channel is 20% of its center frequency. The error is defined as the actual center frequency minus the estimate (i.e. $f_c - \hat{f}_c$).

Simulation design questions:

What level of precision is required in the result?
The precision level is application dependent.
Since the problem does not specify one, we can choose a reasonable value. So let there be good precision in the first 2 significant digits. Therefore, set the 95% confidence limits to 0.5% of the smallest center frequency that may occur. In this case it is (100 Hz * 0.005) = 0.5 Hz.

What model(s) should be used for the channel?
Since it is not specified, we will use an IIR filter (Butterworth filter) driven with white noise (one of my favorites).

How long should the data segments be for estimating the spectrum?
The length of the data segment may be related to physical constraints (i.e. lack of stationarity, hardware, throughput demands, resolution requirements, ...). For this problem assume that for the smallest bandwidth, 20 Hz (at 100 Hz center frequency), we want at least 5 independent points (4 Hz resolution), so the segment length must be at least 1/(4 Hz) = 0.25 seconds. Note that a stochastic precision/resolution better than 4 Hz will not provide any benefit (Why?). Sufficient averaging for the PSD estimation, again, depends on the application and SNR. Let's set an arbitrary limit of 10 independent segments. So 2.5 seconds of data must be collected per estimate. With an effective maximum frequency of about 1000 + 3*.2*1000 = 1600 Hz, let's sample at 4000 Hz.

What estimators should be used?
Since nothing is specified, let's use a standard time-hopping window approach (Welch's method), which Matlab uses in its PSD function.

What should vary over each run?
For a fixed channel characteristic, white noise sequences should be independently generated for each run (measurement).

How many runs should be made for a given frequency point?
Consider the performance measures to be computed:

$$\mathrm{MSE} = \frac{1}{T}\int_0^T \left(\hat{f}_c(t) - f_c(t)\right)^2 dt$$

$$\mu_E = \frac{1}{T}\int_0^T \left(\hat{f}_c(t) - f_c(t)\right) dt \quad \text{(bias)}$$

$$\mu_{\hat{f}} = \frac{1}{T}\int_0^T \hat{f}_c(t)\, dt$$

$$\sigma_{\hat{f}}^2 = \frac{1}{T}\int_0^T \left(\hat{f}_c(t) - \mu_{\hat{f}}\right)^2 dt \quad \text{(estimator variance, efficiency)}$$

The precision was originally described for $\mu_{\hat{f}}$, so a variance estimate for this value is needed before the relationship between precision and number of independent runs can be determined. So, preliminary runs can be generated to get an idea of the variance magnitude. The worst case will be the broadest bandwidth, at 1000 Hz. Note that the other error measures are also dependent on this number. If we want similar precision on all the above quantities, the variance of all the integrand values should be examined. The one that would likely have the highest variance is either $\sigma_{\hat{f}}^2$ or $\mu_{\hat{f}}$.

Code example:

    % This creates an IIR filter to model a band-pass channel, excites it
    % with white noise, estimates the PSD from the output, and estimates the
    % center frequency of the channel from the maximum peak position on the psd.
    % This will happen "runs" number of times and the variance of the estimate
    % will be plotted as a function of the number of runs.
    runs = 200;    % Number of simulation runs (independent estimates)
    fs = 4000;     % Sampling frequency in Hz
    fc = 1000;     % Center frequency in Hz
    bw = 0.2*fc;   % Bandwidth is 20% of center frequency
    fu = -(-bw-sqrt(bw^2+4*fc^2))/2;   % Find upper frequency limit
    fl = fu-bw;                        % Find lower frequency limit
    [b,a] = butter(2, 2*[fl fu]/fs);   % Create Butterworth filter
    siglength = 2.5;                   % Length of signal (seconds) from which to do the estimation
    psdlen = round(.25*fs);            % Segment length for time-hopping psd window
    t = [0:round(siglength*fs)]/fs;    % Time axis
    % Round up length to next power of 2.
    nfft = 2;
    while nfft < psdlen; nfft=nfft*2; end
    % Loop to increase number of trials incrementally and observe variability
    % in estimate for increasing trials
    ncount = 0;   % Initialize counter for each increment in trial size
    for n=5:5:runs
        n   % Display the current number of trials
        % Loop to excite channel, estimate and store results for n trials
        for k = 1:n
            sig = filter(b,a,randn(1,length(t)));   % Create channel output
            % psd is a legacy Signal Processing Toolbox function; in recent releases
            % use [p,f] = pwelch(sig,hamming(psdlen),floor(psdlen/2),nfft,fs);
            [p,f] = psd(sig,nfft,fs,hamming(psdlen),floor(psdlen/2));   % Compute PSD
            [~,np] = max(abs(p));   % Find maximum peak
            fest(k) = f(np);        % Assign its corresponding frequency to the estimate
        end
        % Estimate the standard deviation of the output for each pass in loop
        % to examine the likelihood of convergence and a reasonable variance
        ncount = ncount+1;   % Increment index to store standard deviation results
        festvar(ncount) = std(fest(1:n));   % Standard deviation of actual estimate
        fvarvar(ncount) = std((fest(1:n)-mean(fest(1:n))).^2);   % Standard deviation of variance integrand
    end
    % Plot results. If variability significantly reduces as the trials increase,
    % a reasonable standard deviation for the integrand can be obtained
    figure(1)
    plot(runs*[1:length(festvar)]/length(festvar),festvar)
    xlabel('Number of runs')
    ylabel('standard deviation')
    title('Convergence of standard deviation of the estimate')
    figure(2)
    plot(runs*[1:length(fvarvar)]/length(fvarvar),fvarvar)
    xlabel('Number of runs')
    ylabel('standard deviation')
    title('Convergence of standard deviation of the estimate`s variance')

Round the standard deviation of the center frequency estimate up to 40 (it actually looks like 34 on the graph) and predict the required number of runs:

$$N = \left(t_{1-\alpha/2,N-1}\right)^2\frac{S^2}{\delta^2} = t(N)^2\,\frac{40^2}{0.5^2} = t(N)^2 \cdot 6400 \quad\text{or}\quad t(N) = \sqrt{\frac{N}{6400}}$$

For α = .05, plot t(N) and (N/6400)^0.5 as a function of N:

[Figure: t(N) and (N/6400)^0.5 versus N]

The above graph suggests that about 25000 runs will be within the confidence limit requirements. The same analysis can be done for the variance estimate.
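The run-count prediction for the center frequency estimate can also be checked numerically. The following is a minimal sketch that assumes N is large enough for the t critical value to be replaced by its Gaussian limit (an assumption made here so that only base Matlab functions are needed; the variable names are illustrative).

```matlab
% Sketch: predicted number of runs for the center frequency estimate,
% assuming the large-N (Gaussian) limit of the t critical value
alpha = 0.05;                  % 1 - confidence level
S = 40;                        % rounded-up standard deviation of the estimate (Hz)
delta = 0.5;                   % required half-width of the confidence limits (Hz)
z = sqrt(2)*erfcinv(alpha);    % two-sided Gaussian critical value (about 1.96)
N_required = ceil((z*S/delta)^2);   % N = z^2 * S^2 / delta^2, rounded up
fprintf('critical value = %.4f, predicted runs = %d\n', z, N_required);
```

Under this assumption the prediction is roughly 24,600 runs, consistent with the value of about 25000 read from the graph.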
If the variance magnitude is on the order of 1000, then a confidence interval of 50 would provide precision in the first 2 digits. If the standard deviation of the variance integrand is rounded up to 1400 (it is actually around 1150), the predicted number of runs required is:

$$N = \left(t_{1-\alpha/2,N-1}\right)^2\frac{S^2}{\delta^2} = t(N)^2\,\frac{1400^2}{50^2} = t(N)^2 \cdot 784 \quad\text{or}\quad t(N) = \sqrt{\frac{N}{784}}$$

By comparing the equations, it can be inferred that satisfying the requirement for the estimate of the center frequency will exceed the requirements for the variance in this case. So now we can run the simulation to determine bias, variance (for efficiency studies and computing confidence limits), and MSE for an overall error value.

Code Example:

    % This will create an IIR filter to model a band-pass channel, excite it
    % with white noise, estimate the PSD from the output, and estimate the
    % center frequency of the channel from the maximum peak position on the psd.
    % This will happen "runs" number of times. The mean value of the estimate
    % for each center frequency is computed along with the bias, variance, and
    % mean square error. These values are saved to a mat file when finished.
    runs = 25000;            % Number of simulation runs (independent estimates)
    fs = 4000;               % Sampling frequency in Hz
    fca = [100:100:1000];    % Center frequencies in Hz
    siglength = 2.5;         % Length of signal (seconds) from which to do the estimation
    psdlen = round(.25*fs);  % Segment length for time-hopping psd window
    % Raise to next power of 2.
    nfft = 2;
    while nfft < psdlen; nfft=nfft*2; end
    t = [0:round(siglength*fs)]/fs;   % Time axis
    % This loop updates the center frequency value and does the estimation
    for kf = 1:length(fca)
        fc = fca(kf)     % Assign center frequency (displayed to show progress)
        bw = 0.2*fc;     % Bandwidth is 20% of center frequency
        fu = -(-bw-sqrt(bw^2+4*fc^2))/2;   % Find upper frequency limit
        fl = fu-bw;                        % Find lower frequency limit
        [b,a] = butter(2, 2*[fl fu]/fs);   % Create Butterworth filter
        % Loop to excite channel, do and store estimation
        for k = 1:runs
            sig = filter(b,a,randn(1,length(t)));   % Create channel output
            % psd is a legacy Signal Processing Toolbox function; in recent releases
            % use [p,f] = pwelch(sig,hamming(psdlen),floor(psdlen/2),nfft,fs);
            [p,f] = psd(sig,nfft,fs,hamming(psdlen),floor(psdlen/2));   % Compute PSD
            [~,np] = max(abs(p));   % Find maximum peak
            fest(k) = f(np);        % Assign its corresponding frequency to the estimate
        end
        fce(kf) = mean(fest);             % Actual frequency estimate
        fcsde(kf) = std(fest);            % Standard deviation of the estimate
        bias(kf) = fce(kf)-fc;            % Bias of the estimate
        msetot(kf) = mean((fest-fc).^2);  % Mean square error of the estimate
    end
    save sim2run.mat fce fcsde bias msetot   % Save results to a file

Results:

| Center Frequency (Hz) | Estimate | Bias | Standard Deviation of the Estimate | 95% Confidence Interval | RMSE |
|---|---|---|---|---|---|
| 100 | 100.25 | 0.2450 | 3.6882 | 0.0457 | 3.6963 |
| 200 | 200.38 | 0.3756 | 7.2622 | 0.0900 | 7.2718 |
| 300 | 300.71 | 0.7128 | 10.715 | 0.1328 | 10.739 |
| 400 | 400.75 | 0.7470 | 14.050 | 0.1741 | 14.069 |
| 500 | 501.05 | 1.0464 | 17.304 | 0.2145 | 17.335 |
| 600 | 601.84 | 1.8431 | 20.603 | 0.2554 | 20.685 |
| 700 | 702.22 | 2.2159 | 23.885 | 0.2961 | 23.987 |
| 800 | 802.93 | 2.9262 | 27.059 | 0.3354 | 27.216 |
| 900 | 903.88 | 3.8839 | 30.323 | 0.3759 | 30.57 |
| 1000 | 1005.1 | 5.1055 | 33.432 | 0.4144 | 33.819 |

What conclusions can be drawn from these results?

Using Probability of Error as a Performance Measure:

Assume 2 phenomena are present in your measurement space:
- Example: Noise (processes not of interest); Signal (process of interest)
- Example: Symbol 1 (binary 0); Symbol 2 (binary 1)

Let the pdf of noise measurements be $f(x \mid H_0)$ and the pdf of signal measurements be $f(x \mid H_1)$.

[Figure: $f(x \mid H_0)$ and $f(x \mid H_1)$ versus x with a detection threshold; the area of $f(x \mid H_0)$ above the threshold is the probability of false alarm, $P_{fa}$, and the area of $f(x \mid H_1)$ below the threshold is the probability of missed detection, $P_{md}$]

$$\text{Total probability of error} = \rho\, p_{fa} + (1-\rho)\, p_{md} \quad\text{where}\quad \rho = \Pr(H_0)$$

For noise and signal being present an equal amount of time, $\rho = 0.5$.

To determine $p_{fa}$ and $p_{md}$ through simulation:
1. Generate noise-only output, i.e. N samples.
2. Apply the threshold to the data and let K be the number of times the noise crosses the threshold.
3. $p_{fa} \approx \dfrac{K}{N}$

Do an analogous procedure for $p_{md}$ with the signal present:

$$p_{md} \approx \frac{N-K}{N} = \frac{\text{number of times it didn't cross the threshold}}{N}$$

The simulation design question is: how many independent samples are needed to get a reliable estimate?

A threshold applied to a random population results in a binary population modeled by the binomial distribution:

$$\Pr[x = k] = \binom{N}{k} p^k (1-p)^{N-k}, \qquad p = \text{probability of success (crossing the threshold)}$$

Apply the rules for deriving the maximum likelihood estimator to estimate p to obtain:

$$\hat{p}_{mle} = \frac{K}{N}$$

The variance of this estimate is:

$$\mathrm{VAR}[\hat{p}_{mle}] = \frac{p(1-p)}{N}$$

Therefore, choose N such that the standard deviation is less than c*100% of the estimated p value. c = .1 will typically provide precision in the most significant digit:

$$\sqrt{\frac{p(1-p)}{N}} \le c\,p \quad\text{or}\quad N \ge 10^{2d}\,\frac{(1-p)}{p}$$

where $10^{-d} = c$. d will be approximately equal to the number of significant digits allowed by the precision of the simulation. For d=2, plot the relationship between N and p.
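The relationship between N and p can be evaluated directly from the bound above. This is a minimal sketch; the range of probabilities and the variable names are illustrative assumptions.

```matlab
% Sketch: required number of independent runs versus expected probability
d = 2;                            % target number of significant digits (c = 10^-d)
p = logspace(-10, -1, 100);       % assumed range of expected probabilities
N = 10^(2*d)*(1-p)./p;            % lower bound on the number of independent runs
loglog(p, N)
xlabel('Expected Probability')
ylabel('Independent Runs')
title(sprintf('Required number of runs, d = %d', d))
```

For example, at p = 0.1 the bound gives N = 10^4 * 0.9/0.1 = 9 x 10^4 runs.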
[Figure: Log-log plot of the required number of independent runs versus expected probability, d = 2]

Note that for estimating small probabilities, N can be quite large. On a log-log scale the curve has a slope of about -1 for small p, and twice the number of significant digits (2d) sets its vertical offset. Small probabilities (on the order of 10^-10) are typical for false alarm rates in radar or bit error rates in digital communication systems.

Example: Consider a radar system where a series of independent target and clutter (echoes from nontargets) returns are received for N (N=8) samples. Assume a Rayleigh distribution (b=1) for the clutter and a Rician distribution for the target with 2 degrees of freedom and increasing power based on its centrality parameter (in Matlab the Rician distribution is called the non-central chi-squared distribution). For a threshold for a fixed false alarm rate of 0.001, compute the probability of detection for increasing power ratios between target and non-target cases starting with 9 dB up to 15 dB.

Since the specification for FA probability is given in terms of one significant digit, let's use that for the desired precision on the probability of detection results. Use the following code to determine the threshold for the fixed FA probability:

    % This code will generate Rayleigh random variables for a fixed power level
    % and estimate a false alarm threshold.
    % This code will tend to run fast in Matlab if enough fast memory is
    % available. If not, then the problem should be broken into loops.
    p = 10E-3;    % Required false alarm probability (note: 10E-3 evaluates to 0.01)
    pt = p/10;    % We will compute this probability for testing threshold sensitivity
    runs = ceil(10^2*(1-pt)/pt);   % Number of simulation runs (independent estimates)
    b = 1;        % Scale parameter in Rayleigh distribution
    n = 8;        % Number of returns to average before testing with threshold value
    val = raylrnd(b,runs,n);       % Generate random numbers
    test_stat = mean(val,2);       % Compute mean over each run
    test_stat = sort(test_stat);   % Sort numbers for easy ratio computation (k/runs)
    k = round(runs*(1-p));         % Apply probability to find position of threshold in array
    threshr = test_stat(k);        % Actual threshold
    % Check on threshold sensitivity
    kl = round(runs*(1-p*10));   % Position of the lower threshold (probability 10*p)
    ku = round(runs*(1-p/10));   % Position of the upper threshold (probability p/10)
    % Look for changes in the significant digits to make sure you don't
    % truncate the threshold too early.
    tu = test_stat(ku)
    tl = test_stat(kl)

The results: tu = 2.0477, tl = 1.5561, threshr = 1.8344 (desired threshold)

Note there is a clear distinction in the first 2 significant digits of the thresholds for order-of-magnitude changes in the probabilities. If these were too close, the threshold would not be sensitive enough to discriminate probabilities between p/10 and 10*p. More runs would be needed to improve this.

Now using the above threshold, generate Rician distributed parameters according to the specified power ratios, perform the N sequence average to get the test statistics, and compute the probability of a miss for each case. (Why would this be more convenient or better than computing the probability of detection directly?)

The power in the Rayleigh distribution for b = 1 volt is 2 watts. The power in the Rician distribution is given in terms of its degrees of freedom and centrality parameter:

$$W = (\nu + \delta)^2 + 2(\nu + 2\delta)$$

where $\delta$ is the centrality parameter and $\nu$ denotes the degrees of freedom.

Matlab Code:

    % This code will generate Rician random variables for various power levels
    % and compute the probability that the values will not exceed the given
    % threshold.
    p = .01;    % Expected probability of error in the best case
    runs = ceil(10^2*(1-p)/p);   % Number of simulation runs (independent estimates)
    new = 2;                     % Degrees of freedom
    plevdb = [9 12 15];          % Power in dB for each distribution
    plev = 2*10.^(plevdb/10);    % Actual power for the distribution
    t = 1.83436436061758;        % False alarm threshold
    n = 8;    % Number of returns to average before testing the threshold value
    for kf = 1:length(plev)
        % Compute centrality parameter based on required power
        delt = (-(4+2*new)+sqrt((4+2*new)^2-4*(new^2+2*new-plev(kf))))/(2)
        % Generate RVs
        val = zeros(runs,1);
        for kk=1:n
            val = val + ncx2rnd(new, delt, runs, 1);
        end
        miss = find(val/n < t);
        pm(kf) = length(miss)/runs;
    end

Results:

| SNR (dB) | Pr(Missed Detection) | Pr(Detection) |
|---|---|---|
| 9 | 0.13222222222222 | 0.86777777777778 |
| 12 | 0.00898989898990 | 0.99101010101010 |
| 15 | 0.00050505050505 | 0.99949494949495 |

A probability value of p=.01 (runs = 9900) was chosen as the expected probability result. What would be your conclusions concerning the results above? Note that the standard deviation, based on the estimated Pr(Miss), is given by:

$$\mathrm{STD}[\hat{p}_{mle}] = \sqrt{\frac{p(1-p)}{N}}$$

For large N, the binomial distribution can be approximated by a Gaussian distribution, thus the 95% confidence limits are (recall that for Gaussian distributions this is 1.96 ~ 2 times the standard deviation):

| SNR (dB) | Pr(Missed Detection) | 95% confidence limits |
|---|---|---|
| 9 | 0.13222222222222 | ±0.006808 |
| 12 | 0.00898989898990 | ±0.001897 |
| 15 | 0.00050505050505 | ±0.000451 |

In this case p is the probability of missed detection. How would the confidence limits change if the probability of detection was used?
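The confidence limits in the last table can be recomputed from the tabulated miss probabilities. This is a minimal sketch; it assumes the factor of 2 (rather than 1.96) was applied to the standard deviation, which is inferred from the tabulated values rather than stated in the notes. Because p(1-p) is unchanged when p is replaced by 1-p, the same calculation is also run for the probability of detection.

```matlab
% Sketch: 95% confidence limits from the binomial standard deviation
% (the factor k = 2 is an assumption inferred from the tabulated limits)
N = 9900;                                   % number of runs used for each SNR
pm = [0.13222222222222 0.00898989898990 0.00050505050505];   % Pr(missed detection)
k = 2;                                      % approximate 95% Gaussian factor
ci_miss = k*sqrt(pm.*(1-pm)/N);             % limits for the miss probability
pd = 1 - pm;                                % Pr(detection)
ci_det = k*sqrt(pd.*(1-pd)/N);              % limits for the detection probability
disp([pm(:) ci_miss(:) pd(:) ci_det(:)])    % columns: Pmd, +/- limit, Pd, +/- limit
```

The computed limits agree with the table to within rounding of the last digit.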