Dana Joaquin
ECE 3522: Stochastics
Department of Electrical and Computer Engineering, Temple University, Philadelphia, PA 19121

I. PROBLEM STATEMENT

The goal of this assignment is to demonstrate how the central limit theorem works and to apply it to compute and display the PDF of a sum of multiple random variables. The theorem is verified by comparing the sum's distribution with a Gaussian distribution using root-mean-squared error. The assignment also explores the Box-Muller transform.

II. APPROACH AND RESULTS

The assignment called for 100 random variables, each of length 10,000, uniformly distributed on [-1, 1]. A for loop running 100 times generated the random variables, and a custom MATLAB function, "updf", created each random variable within the specified range. The function implements the uniform random number generator given on the MathWorks website:

    r = a + (b - a).*rand(N, 1)

Equation 1. Random number generator within the interval [a, b].

The sum of each random variable for each sample size n was computed using MATLAB's sum function, along with the mean, variance, and standard deviation for each n. The sums of the random variables were plotted as a histogram using MATLAB's histogram function. Each sample size's mean and variance were stored, and Gaussian fit distributions for n = 1, 10, and 100 were plotted on top of the histogram for comparison, shown in Figure 1a. As n increases, the Gaussian distribution fits the histogram more closely, which makes sense: more samples are being summed and the spread of the data grows. Figure 1b provides a clearer view of what the sum of random variables, Sn, looks like on its own.

Figure 1a. Histogram of the sum of random variables with Gaussian distributions for n = 1, 10, and 100.

Figure 1b. Histogram of the sum Sn's PDF without the Gaussian distributions of Figure 1a.

Next, the root-mean-squared (RMS) error is plotted with respect to n, displayed in Figure 2. The plot shows that as the number of random variables increases, the error between the PDF of Sn and a Gaussian distribution decreases. This agrees with the central limit theorem: as the number of random variables grows, the sum's PDF looks more and more Gaussian.

Figure 2. Root-mean-squared error with respect to the number of samples being computed.

The Box-Muller technique takes uniformly distributed random variables and transforms them into independent random variables with a standard normal distribution. The algorithm was taken from the Box-Muller transform page on Wikipedia. The equations used are:

    z0 = sqrt(-2*ln(U1)) * cos(2*pi*U2)
    z1 = sqrt(-2*ln(U1)) * sin(2*pi*U2)

z0 and z1 are the independent random variables with standard normal distributions. Because they are independent, they can be multiplied together to obtain their joint PDF. Figure 3 displays the joint PDF of the computed independent random variables; a custom function, "boxmuller", was written to produce it. Lastly, both the Box-Muller technique and the random-variable-sum technique were timed using MATLAB's tic and toc functions. Table I displays the timing results for both methods; based on the table, the Box-Muller technique is the faster algorithm by about 0.1 seconds.

Table I. Elapsed Time for Each Method

Method                                          Time Elapsed
Box-Muller Technique                            0.174884 seconds
Sum of Uniformly Distributed Random Variables   0.070228 seconds

III.
MATLAB CODE

% CA 8
% Dana Joaquin
function ca08
clear; clc
close all

% parameters
n = 100;       % # of RVs
N = 1e4;       % samples in each RV
minnie = -1;   % start of interval
maxie = 1;     % end of interval

for i=1:n
    ss = updf(N, minnie, maxie);  % get random variable N-long vector
    sssum(i) = sum(ss);           % get sum of random variable vector
    smean(i) = mean(ss)*i;        % estimated mean
    svar(i) = var(ss)*i;          % estimated variance
    sstdev(i) = sqrt(svar(i));    % estimated standard deviation
end

% plot Sn PMF
hlim = length(sssum)/2;
figure(1)
hs = histogram(sssum, 'Normalization', 'pdf');
oh_so_edgy = -hlim:1:hlim;
hs.BinEdges = oh_so_edgy;
sdata = hs.Values;

% plot n = 1, 10, 100 Gaussian Dist.
for i=1:3
    a = [1 10 100];
    hold on
    x = linspace(-hlim, hlim, length(sdata));
    norm = normpdf(x, smean(a(i)), sstdev(a(i)));
    plot(x, norm)
end
legend('Histogram', 'n = 1', 'n = 10', 'n = 100')
title('Sum of Random Variables PDF')
xlabel('Random Variables')
ylabel('Probability')

% computing root mean-squared error
for i=1:n
    x = linspace(-hlim, hlim, length(sdata));
    norm = normpdf(x, smean(i), sstdev(i));
    rmserr(i) = compute_rmse(sdata, norm, n);
end

% plotting RMSE with respect to # random variables
figure(2)
plot(1:n, rmserr)
title('Root Mean Square Error')
xlabel('Samples (n)')
ylabel('Error')

% Box-Muller function
ss = 1e4;
rvs = 10;
figure(3)
boxmuller(ss, rvs);
title('Box-Muller PDF')
xlabel('Samples')
ylabel('Probability')
end

% function name: compute_rmse
% coded by Dana Joaquin
% input argument(s):
%   (1) sig_a: signal data (array)
%   (2) sig_b: signal data (array)
%   (3) sig_length: length of the signal data arrays (scalar)
% output argument(s):
%   (1) rmse_val: scalar value of root mean-squared error
% objective:
%   This function computes the root mean-squared error of the two
%   data arrays brought in by sig_a and sig_b using the formula:
%
%   n     : # of samples (sig_a and sig_b must be the same size)
%   Y-hat : predicted values (sig_b is the predicted array)
%   Y     : true values (sig_a is the true array)
%
%   MSE  = (1/n) * sum_{i=1}^{n} (Yhat_i - Y_i)^2
%   RMSE = sqrt(MSE)
%
function rmse_val = compute_rmse(sig_a, sig_b, sig_length)
n = sig_length;
for i=1:n
    data_array(i) = (sig_b(i) - sig_a(i))^2;
end
rmse_val = sqrt((1/n)*sum(data_array));
end

% function name: updf
% coded by Dana Joaquin (equation taken from the MathWorks website)
% input argument(s):
%   (1) samples: amount of random numbers desired
%   (2) minum: min. range value
%   (3) maxum: max. range value
% output argument(s):
%   (1) unipdf: a vector of uniformly distributed random variables
%       within the range minum < x < maxum
% objective:
%   This function creates a vector of uniformly distributed random
%   variables.
function unipdf = updf(samples, minum, maxum)
unipdf = minum + (maxum - minum).*rand(samples, 1);
end

% function name: boxmuller
% coded by Dana Joaquin
% input argument(s):
%   (1) samples: length of random variable vector
%   (2) rvs: number of random variables
% output argument(s):
%   (1) histogram plot of the data
% objective:
%   This function computes and plots the Gaussian distribution of two
%   independent random variables using the Box-Muller transform.
%   The equations used are taken from the Wikipedia page about the
%   Box-Muller transform:
%   http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transfor
function boxmul = boxmuller(samples, rvs)
for i=1:rvs
    % 2 uniformly distributed random variables
    x = rand(samples, 1);
    y = rand(samples, 1);
    a(:, i) = sqrt(-2.*log(x));
    b(:, i) = 2*pi.*y;
    % computing 2 independent random variables
    z0(:, i) = a(:, i).*cos(b(:, i));
    z1(:, i) = a(:, i).*sin(b(:, i));
    % computing joint pdf
    zz(:, i) = rvs.*(z0(:, i).*z1(:, i));
end
% plotting pdf
histogram(zz, 'Normalization', 'pdf')
end

IV.
CONCLUSIONS

The central limit theorem says that the more random variables are summed together, the more Gaussian the distribution of their sum appears, no matter what the original probability distribution looked like. For example, if 100 random variables all had an exponential PDF, the distribution of their sum would still start to look Gaussian. A rule of thumb from the textbook is that once the number of samples/random variables exceeds 30, the sum starts to look Gaussian. For the first part of the assignment, 100 random variables, all with uniform distributions, were added together; based on the theorem, their sum's distribution should appear Gaussian. In the second part of the assignment, Sn's distribution is plotted, and the figure shows it slowly taking on a Gaussian shape, with the histogram starting to center itself on the apparent mean value. Plotting the Gaussian distributions for n = 1, 10, and 100 compares the actual data to a Gaussian fit and shows how close the PDF of Sn is to Gaussian. At n = 1, since only one sample is involved, the fit is nowhere near the actual distribution because it accounts for the probability of just one sample. As the number of samples grows, the fit moves closer to the probability values of the actual PDF, and by n = 100 the Gaussian distribution nearly matches them. As the sample size increases further, this trend continues until the actual PDF and the Gaussian PDF differ by very little, which is exactly what the central limit theorem states: the more random variables being added, the more their sum's PDF takes on a Gaussian shape. The third part of the assignment asked for the root-mean-squared error between the actual fit and a Gaussian fit with respect to the number of random variables/samples, n. Based on the results in Figure 2, the error dropped exponentially toward zero as the number of random variables and samples increased.
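The shrinking error described above can also be sketched outside of MATLAB. The following Python/NumPy snippet is an illustrative sketch, not the report's code: the bin count, sample counts, and random seed are arbitrary choices. It sums n uniform [-1, 1] variables and measures the RMS error between the histogram PDF and the Gaussian with the matching mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)

def rmse_sum_vs_gaussian(n, N=100_000, bins=60):
    """RMSE between the histogram PDF of a sum of n Uniform[-1, 1]
    variables and the Gaussian with the same mean and variance."""
    s = rng.uniform(-1.0, 1.0, size=(n, N)).sum(axis=0)  # N draws of S_n
    density, edges = np.histogram(s, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mu, sigma = 0.0, np.sqrt(n / 3.0)  # Var of Uniform[-1, 1] is 1/3 per term
    gauss = np.exp(-0.5 * ((centers - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(np.sqrt(np.mean((density - gauss) ** 2)))

errs = [rmse_sum_vs_gaussian(n) for n in (1, 10, 100)]
print(errs)  # error falls sharply once a few variables are summed
```

The n = 1 case (a flat uniform density against a bell curve) gives by far the largest error, matching the trend of Figure 2.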
This plot helps reiterate the central limit theorem. The Box-Muller technique accomplishes the same thing as the summing approach, but it starts from uniformly distributed random variables in the range 0 to 1. The technique is well suited to applications restricted to those limits because it computes the PDF much faster than summing many random variables; because of that input requirement, however, it cannot always be used. The sum method takes a little longer than Box-Muller, but it can be used for any range of values.
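Although the report's implementation is in MATLAB, the Box-Muller equations themselves are easy to sanity-check. This Python/NumPy sketch (the sample count and seed are arbitrary choices of this illustration) generates z0 and z1 from two uniform inputs and confirms that they behave like uncorrelated standard normal variables:

```python
import numpy as np

rng = np.random.default_rng(42)

def box_muller(n):
    """Box-Muller transform: two Uniform(0, 1) vectors in,
    two independent standard normal vectors out."""
    u1 = 1.0 - rng.uniform(size=n)  # shift [0, 1) to (0, 1] to avoid log(0)
    u2 = rng.uniform(size=n)
    r = np.sqrt(-2.0 * np.log(u1))
    z0 = r * np.cos(2.0 * np.pi * u2)
    z1 = r * np.sin(2.0 * np.pi * u2)
    return z0, z1

z0, z1 = box_muller(200_000)
print(z0.mean(), z0.var(), np.corrcoef(z0, z1)[0, 1])
```

With 200,000 samples, the means are near 0, the variances near 1, and the sample correlation between z0 and z1 near 0, as the standard-normal claim in Section II requires.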