Digital Signal Processing
1. Define statistical variance and covariance.
In probability theory and statistics, the variance is a measure of how far a set of numbers is
spread out. It is one of several descriptors of a probability distribution, describing how far the
numbers lie from the mean (expected value). In particular, the variance is one of the moments of a
distribution; in that context, it forms part of a systematic approach to distinguishing between
probability distributions. While other such approaches have been developed, those based on
moments are advantageous in terms of mathematical and computational simplicity. The concept of
variance extends to continuous data sets as well: instead of summing the individual squared
deviations from the mean, we integrate them. This approach is also useful when the number of data
points is very large, such as the population of a country. Variance is used extensively in
probability theory, where more general conclusions must be drawn from a smaller sample set. This
is because the variance gives an idea of how the data are distributed around the mean, and from
this distribution we can work out where to expect an unknown data point.
The variance is a parameter describing in part either the actual probability distribution of an
observed population of numbers, or the theoretical probability distribution of a sample (a
not-fully-observed population) of numbers. In the latter case a sample of data from such a
distribution can be used to construct an estimate of its variance; in the simplest cases this
estimate is the sample variance.
In probability theory and statistics, covariance is a measure of how much two random
variables change together. If the greater values of one variable mainly correspond with the greater
values of the other variable, and the same holds for the smaller values, i.e. the variables tend to
show similar behavior, the covariance is a positive number. In the opposite case, when the
greater values of one variable mainly correspond to the smaller values of the other, i.e. the
variables tend to show opposite behavior, the covariance is negative. The sign of the covariance
therefore shows the tendency of the linear relationship between the variables. The magnitude of
the covariance is not as easy to interpret; the normalized version of the covariance,
the correlation coefficient, however, shows by its magnitude the strength of the linear relation. A
distinction has to be made between the covariance of two random variables, a population parameter
that can be seen as a property of the joint probability distribution, on the one hand, and the
sample covariance, which serves as an estimate of that parameter, on the other.
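As a rough illustration of these definitions, the following sketch (assuming Python with NumPy;
the data values are made up) computes the unbiased sample variance and sample covariance directly
from their defining sums:

import numpy as np

def sample_variance(x):
    # Unbiased sample variance: sum of squared deviations from the mean / (n - 1)
    x = np.asarray(x, dtype=float)
    return np.sum((x - x.mean()) ** 2) / (len(x) - 1)

def sample_covariance(x, y):
    # Unbiased sample covariance between two equal-length samples
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])
print(sample_variance(x))       # agrees with np.var(x, ddof=1)
print(sample_covariance(x, y))  # agrees with np.cov(x, y)[0, 1]

The positive covariance printed here reflects that larger x values tend to occur with larger
y values, matching the sign interpretation described above.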
2. How do you compute the energy of a discrete signal in time and frequency domains?
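For a finite-length discrete signal x[n], the energy is the sum of |x[n]|² in the time domain and,
by Parseval's relation for the DFT, (1/N) times the sum of |X[k]|² in the frequency domain. A
minimal sketch (assuming Python with NumPy, using an arbitrary test sequence) checking that the
two computations agree:

import numpy as np

x = np.array([1.0, 2.0, 0.5, -1.0, 3.0])   # arbitrary finite-length sequence
N = len(x)

energy_time = np.sum(np.abs(x) ** 2)       # time-domain energy
X = np.fft.fft(x)                          # N-point DFT
energy_freq = np.sum(np.abs(X) ** 2) / N   # Parseval: same energy from the DFT

print(energy_time, energy_freq)            # the two numbers agree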
3. Define sample autocorrelation function. Give the mean value of this estimate.
Autocorrelation is the cross-correlation of a signal with itself. Informally, it is the similarity
between observations as a function of the time separation between them. It is a mathematical tool
for finding repeating patterns, such as the presence of a periodic signal which has been buried
under noise, or identifying the missing fundamental frequency in a signal implied by
its harmonic frequencies. It is often used in signal processing for analyzing functions or series of
values, such as time-domain signals. Different fields of study define autocorrelation differently,
and not all of these definitions are equivalent; in some fields, the term is used interchangeably
with autocovariance. In statistics, the autocorrelation of a random process describes
the correlation between values of the process at different points in time, as a function of the two
times or of the time difference. Let X be some repeatable process, and i be some point in time
after the start of that process (i may be an integer for a discrete-time process or a real number
for a continuous-time process). Then Xi is the value (or realization) produced by a given run of
the process at time i. Suppose that the process is further known to have defined values
for mean μi and variance σi² for all times i. Then the autocorrelation between times s and t is
defined as

R(s, t) = E[(X_t − μ_t)(X_s − μ_s)] / (σ_t σ_s),
where "E" is the expected value operator. Note that this expression is not well-defined for all
time series or processes, because the variance may be zero (for a constant process) or
infinite. If the function R is well-defined, its value must lie in the range [−1, 1], with 1
indicating perfect correlation and −1 indicating perfect anti-correlation. If Xt is a second-order
stationary process, then the mean μ and the variance σ² are time-independent, and further the
autocorrelation depends only on the difference between t and s: the correlation depends only
on the time distance between the pair of values, not on their position in time. It is
common practice in some disciplines, other than statistics and time series analysis, to drop
the normalization by σ² and use the term "autocorrelation" interchangeably with
"autocovariance". However, the normalization is important both because the interpretation of
the autocorrelation as a correlation provides a scale-free measure of the strength of statistical
dependence, and because the normalization affects the statistical properties of the
estimated autocorrelations.
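As a sketch of one common estimator, the biased sample autocorrelation of a zero-mean sequence is
r̂(k) = (1/N) Σ x[n] x[n+k]; its mean value is (1 − k/N) r(k), so it is biased but asymptotically
unbiased. A minimal illustration, assuming Python with NumPy and an illustrative white-noise test
signal:

import numpy as np

def sample_autocorrelation(x, max_lag):
    # Biased sample autocorrelation r_hat(k) = (1/N) * sum_n x[n] x[n+k],
    # for lags k = 0 .. max_lag, assuming x is (approximately) zero-mean.
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.array([np.dot(x[:N - k], x[k:]) / N for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)           # zero-mean white noise test sequence
print(sample_autocorrelation(x, 5))     # close to [1, 0, 0, 0, 0, 0]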
4. What is the basic principle of Welch method to estimate power spectrum?
In physics, engineering, and applied mathematics, Welch's method, named after P.D. Welch,
is used for estimating the power of a signal at different frequencies: that is, it is an approach
to spectral density estimation. The method is based on the concept of
using periodogram spectrum estimates, which are the result of converting a signal from the time
domain to the frequency domain. Welch's method is an improvement on the
standard periodogram spectrum estimating method and on Bartlett's method, in that it reduces
noise in the estimated power spectra in exchange for reducing the frequency resolution. Due to
the noise caused by imperfect and finite data, the noise reduction from Welch's method is often
desired. The Welch method is based on Bartlett's method and differs in two ways:
1. The signal is split up into overlapping segments: the original data segment is split up
   into L data segments of length M, overlapping by D points.
      1. If D = M / 2, the overlap is said to be 50%.
      2. If D = 0, the overlap is said to be 0%. This is the same situation as in Bartlett's
         method.
2. The overlapping segments are then windowed: after the data is split up into overlapping
   segments, the individual L data segments have a window applied to them (in the time
   domain).
      1. Most window functions afford more influence to the data at the center of the set
         than to data at the edges, which represents a loss of information. To mitigate that
         loss, the individual data sets are commonly overlapped in time (as in the above
         step).
      2. The windowing of the segments is what makes the Welch method a
         "modified" periodogram.
After doing the above, the periodogram is calculated by computing the discrete Fourier
transform, and then computing the squared magnitude of the result. The
individual periodograms are then time-averaged, which reduces the variance of the individual
power measurements. The end result is an array of power measurements vs. frequency "bin".
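The following sketch follows the steps above: split the signal into overlapping segments, window
each one, compute the squared magnitude of its DFT, and average the resulting periodograms. It
assumes Python with NumPy; the segment length M, overlap D, window choice, and test signal are
illustrative, not values from the source.

import numpy as np

def welch_psd(x, M=256, D=128, fs=1.0):
    # Averaged modified periodogram (one-sided scaling factor omitted for brevity).
    x = np.asarray(x, dtype=float)
    w = np.hanning(M)                       # window applied in the time domain
    U = np.sum(w ** 2)                      # window power, used for normalization
    step = M - D                            # hop between overlapping segments
    periodograms = []
    for s in range(0, len(x) - M + 1, step):
        seg = x[s:s + M] * w                # windowed overlapping segment
        X = np.fft.rfft(seg)
        periodograms.append(np.abs(X) ** 2 / (U * fs))
    Pxx = np.mean(periodograms, axis=0)     # averaging reduces the variance
    f = np.fft.rfftfreq(M, d=1.0 / fs)
    return f, Pxx

rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 0.1 * np.arange(4096)) + rng.standard_normal(4096)
f, Pxx = welch_psd(x)                        # spectral peak appears near f = 0.1

For comparison, scipy.signal.welch implements the same idea with more complete scaling options.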
5. How do you find the ML estimate?
6. Give the basic principle of Levinson recursion.
Ans. The Levinson recursion is a simplified method for solving the normal equations. It may be
shown to be equivalent to a recurrence relation in orthogonal polynomial theory. The
simplification in Levinson's method is possible because the matrix actually has
only N different elements, whereas a general matrix could have N² different elements. Levinson
developed his recursion with a single time series in mind (the basic idea was presented in Section
3.3). It is very little extra trouble to do the recursion for multiple time series. Let us begin with
the prediction-error normal equations. With multiple time series, unlike a single time series, the
prediction problem changes if time is reversed. We may write both the forward and the
backward prediction-error normal equations as one equation, in the form of (36).
Since end effects play an important role, we show how, given the solution of (36) for three-term
filters, to find the solution of (37) for four-term filters by forming linear combinations of the
forward and backward three-term solutions. This is done by choosing constant matrices in (38):
the forward solution is obtained by choosing the constants so that the bottom element on the
right-hand side of (38) vanishes, and the backward solution by choosing them so that the top
element on the right-hand side vanishes.
Of course, one will want to solve more than just the prediction-error problem. We will also want
to go from 3 x 3 to 4 x 4 in the solution of the filter problem with an arbitrary right-hand side.
This is accomplished by choosing the constants in construction (39) so that the required
right-hand side is produced.
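As a minimal sketch of the single-time-series case that this recursion generalizes, the
Levinson-Durbin form below solves the Toeplitz normal equations using only the distinct
autocorrelation values. It assumes Python with NumPy; the function name and the toy test data are
illustrative, not taken from the source.

import numpy as np

def levinson_durbin(r, p):
    # Solve the order-p Toeplitz normal equations given the autocorrelation
    # values r[0], ..., r[p]; returns (a, err) with a = [1, a1, ..., ap] and
    # err the final prediction-error power.
    r = np.asarray(r, dtype=float)
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, p + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])  # inner product with past lags
        k = -acc / err                              # reflection coefficient
        a[1:m] = a[1:m] + k * a[m - 1:0:-1]         # update interior coefficients
        a[m] = k
        err *= (1.0 - k * k)                        # error power shrinks each step
    return a, err

# Toy check: with r[k] = 0.5**k the solution satisfies the 3 x 3 Toeplitz system.
r = np.array([1.0, 0.5, 0.25, 0.125])
a, err = levinson_durbin(r, 3)
R = np.array([[r[abs(i - j)] for j in range(3)] for i in range(3)])
print(np.allclose(R @ a[1:], -r[1:]))               # True: normal equations hold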
7. Why are FIR filters widely used for adaptive filters?
8. Express the Widrow-Hoff LMS adaptive algorithm. State its properties.
Ans. Least mean squares (LMS) algorithms are a class of adaptive filters used to mimic a
desired filter by finding the filter coefficients that produce the least mean
square of the error signal (the difference between the desired and the actual signal). It is
a stochastic gradient descent method, in that the filter is only adapted based on the error at
the current time. It was invented in 1960 by Stanford University professor Bernard
Widrow and his first Ph.D. student, Ted Hoff.
LMS algorithm summary
The LMS algorithm for a pth-order filter can be summarized as:
Parameters:     p = filter order
                μ = step size
Initialisation: ĥ(0) = 0 (the zero vector of length p)
Computation:    For n = 0, 1, 2, ...
                x(n) = [x(n), x(n−1), ..., x(n−p+1)]^T
                e(n) = d(n) − ĥ^H(n) x(n)
                ĥ(n+1) = ĥ(n) + μ e*(n) x(n)
where ĥ^H(n) denotes the Hermitian transpose of ĥ(n) and e*(n) the complex conjugate of e(n).
The main drawback of the "pure" LMS algorithm is that it is sensitive to the scaling of its
input x(n). This makes it very hard (if not impossible) to choose a learning rate μ that guarantees
stability of the algorithm (Haykin 2002). The Normalised least mean squares filter (NLMS) is a
variant of the LMS algorithm that solves this problem by normalising with the power of the
input. The NLMS algorithm can be summarised as:
Parameters:     p = filter order
                μ = step size
Initialization: ĥ(0) = 0
Computation:    For n = 0, 1, 2, ...
                x(n) = [x(n), x(n−1), ..., x(n−p+1)]^T
                e(n) = d(n) − ĥ^H(n) x(n)
                ĥ(n+1) = ĥ(n) + μ e*(n) x(n) / (x^H(n) x(n))
Optimal learning rate
It can be shown that if there is no interference (v(n) = 0), then the optimal learning rate for the
NLMS algorithm is

μopt = 1,

and it is independent of the input x(n) and of the real (unknown) impulse response h(n). In the
general case with interference (v(n) ≠ 0), the optimal learning rate is

μopt = E[|y(n) − ŷ(n)|²] / E[|e(n)|²],

where y(n) = h^H(n) x(n) is the undisturbed output of the unknown system and ŷ(n) = ĥ^H(n) x(n) is
the filter's estimate of it. These results assume that the signals v(n) and x(n) are uncorrelated
with each other, which is generally the case in practice.
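As a rough sketch of the LMS update summarized above, applied to a simple system-identification
setup, the following assumes Python with NumPy and real-valued signals; the "unknown" filter, the
filter order, and the step size are illustrative choices, not values from the source.

import numpy as np

def lms(x, d, p, mu):
    # Run the LMS recursion; return the coefficient estimates and the error signal.
    h_hat = np.zeros(p)
    e = np.zeros(len(x))
    for n in range(p, len(x)):
        x_vec = x[n - p + 1:n + 1][::-1]     # [x(n), x(n-1), ..., x(n-p+1)]
        e[n] = d[n] - h_hat @ x_vec          # error against the desired signal
        h_hat = h_hat + mu * e[n] * x_vec    # gradient-descent coefficient update
    return h_hat, e

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)                # input signal
h_true = np.array([0.8, -0.3, 0.1])          # "unknown" system to be mimicked
d = np.convolve(x, h_true)[:len(x)]          # desired signal (no added noise)
h_hat, e = lms(x, d, p=3, mu=0.05)
print(h_hat)                                 # approaches h_true as n grows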