CHAPTER 2
LITERATURE REVIEW
2.1 Music Fundamentals
2.1.1 Music Terminologies
2.1.1.1 Note Frequency
In music, a note refers to a specific frequency. There are 12
notes in an octave, represented by C, C#, D, D#, E, F, F#, G, G#,
A, A#, and B. There is a simple relationship between two
successive notes: if the frequencies of two successive notes are
F(i) and F(i+1), then F(i+1) = 2^(1/12) * F(i). The notes are
standardized around the A note of octave 4 (A4), which is
440.00 Hz. (Chu, 2011: 682)
Table 2.1 Note Frequencies of Nine Octaves (Chu, 2011: 682)
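As a small illustration of this relationship, the following Java sketch (an illustrative helper, not part of any cited work) computes a note's frequency from its semitone distance to A4:

    public class NoteFrequency {
        // Frequency of the reference note A4 in Hz.
        private static final double A4_HZ = 440.0;

        // Frequency of the note `semitones` above (positive) or below
        // (negative) A4, applying F(i+1) = 2^(1/12) * F(i) repeatedly.
        static double frequency(int semitonesFromA4) {
            return A4_HZ * Math.pow(2.0, semitonesFromA4 / 12.0);
        }

        public static void main(String[] args) {
            System.out.println(frequency(0));   // A4 = 440.00 Hz
            System.out.println(frequency(3));   // C5 ~ 523.25 Hz
            System.out.println(frequency(-9));  // C4 ~ 261.63 Hz
        }
    }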
2.1.1.2 Chord
In musical terms, a chord is three or more different notes played
at the same time. A “chord progression” describes how chords
change during a piece of music. (Bielawski, 2010: 4)
Figure 2.1 C and F Chords on Piano (Weissman, 1992: 4)
2.1.1.2.1 Major Chords
A major chord consists of a root, a major third, and
a perfect fifth. For example, the C Major chord
includes the notes C, E, and G. The E is a major third
above the C; the G is a perfect fifth above the C.
Here’s a quick look at how to build major chords on
every note of the scale: (Miller, 2005: 113)
Figure 2.2 Major Chords (Miller, 2005: 113)
2.1.1.2.2 Minor Chords
The main difference between a major chord and a
minor chord is the third. Whereas a major chord
uses a major third, a minor chord flattens that
interval to create a minor third; the fifth is the same.
In other words, a minor chord consists of a root, a
minor third, and a perfect fifth. For example, the C minor
chord includes the notes C, E♭, and G. Here’s a quick look at
how to build minor chords on every note of the scale:
(Miller, 2005: 113)
Figure 2.3 Minor Chords (Miller, 2005: 113)
2.2 Windowing
In a signal analyzer the time record length is adjustable, but it must be
selected from a set of predefined values. Since most signals are not periodic in
the predefined data-block time periods, a window must be applied to correct for
leakage. A window is shaped so that it is exactly zero at the beginning and end of
the data block and has some special shape in between. This function is then
multiplied with the time data block, forcing the signal to be periodic.
(http://www.physik.uni-urwuerzburg.de)
Table 2.2 Windowing Type (http://www.physik.uni-urwuerzburg.de)
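As a sketch of this idea, the following Java fragment applies a Hann window (one common window shape, chosen here purely for illustration) to a data block:

    // Multiplies a data block by a Hann window so the block starts and
    // ends at exactly zero, which reduces spectral leakage.
    static double[] applyHannWindow(double[] block) {
        int n = block.length;
        double[] windowed = new double[n];
        for (int i = 0; i < n; i++) {
            double w = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
            windowed[i] = block[i] * w;
        }
        return windowed;
    }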
2.3 Fourier Transform
The discrete Fourier transform (DFT) is a fundamental transform in
digital signal processing, with applications in frequency analysis, fast
convolution, image processing, etc. Moreover, fast algorithms exist that make it
possible to compute the DFT very efficiently. The algorithms for the efficient
computation of the DFT are collectively called fast Fourier transforms (FFTs).
The historic paper by Cooley and Tukey made well known an FFT of complexity
N log2 N, where N is the length of the data vector. A sequence of early papers
still serves as a good reference for the DFT and FFT. In addition to texts on
digital signal processing, a number of books devote special attention to the DFT
and FFT. The importance of Fourier analysis in general is put forth very well by
Leon Cohen:
Bunsen and Kirchhoff observed (around 1865) that light spectra can
be used for recognition, detection, and classification of substances because they
are unique to each substance. This idea, along with its extension to other
waveforms and the invention of the tools needed to carry out spectral
decomposition, certainly ranks as one of the most important discoveries in the
history of mankind. (Yip, 2000: 37)
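To make the transform concrete, here is a minimal, unoptimized Java sketch of the DFT definition (a direct O(N^2) computation, not an FFT):

    // Direct DFT of a real-valued signal x:
    // X[k] = sum over t of x[t] * e^(-j*2*pi*k*t/N).
    // Returns the magnitude spectrum |X[k]|.
    static double[] dftMagnitude(double[] x) {
        int n = x.length;
        double[] magnitude = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            magnitude[k] = Math.sqrt(re * re + im * im);
        }
        return magnitude;
    }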
2.3.1 Short Time Fourier Transforms
The short-time Fourier transform can be computed at every instant in time or at
specific intervals of time. For the case of specified intervals, the transform
is said to be decimated in time. Decimation is appropriate when the transform
is either not changing quickly in time or unable to track rapid changes.
One way to generate an undecimated TFR (time-frequency representation) display
of a temporal signal x(t) is by placing the signal into a bank of band-pass
filters, each filter being tuned to a different frequency. It is in this spirit
that the short-time Fourier transform is formulated.
(Mark, 2009: 413)
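Combining the windowing and DFT sketches above, a decimated short-time Fourier transform can be approximated by sliding a window along the signal and transforming each frame (frame length and hop size are illustrative choices, not values prescribed by the cited source):

    // Magnitude spectra of overlapping, Hann-windowed frames of x,
    // computed with the applyHannWindow and dftMagnitude sketches above.
    static double[][] shortTimeDft(double[] x, int frameLength, int hopSize) {
        int frames = 1 + (x.length - frameLength) / hopSize;
        double[][] spectra = new double[frames][];
        for (int f = 0; f < frames; f++) {
            double[] frame = new double[frameLength];
            System.arraycopy(x, f * hopSize, frame, 0, frameLength);
            spectra[f] = dftMagnitude(applyHannWindow(frame));
        }
        return spectra;
    }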
2.4 Artificial Intelligence
Definitions of Artificial Intelligence (AI) vary along two dimensions:
• Systems that think and act like humans
• Systems that think and act rationally
(Russell and Norvig, 2003: 1)
Figure 2.4 Two Main Dimensions of AI (Russell and Norvig, 2003: 2)
For these reasons, the study of AI as rational-agent design has at least
two advantages. First, it is more general than the "laws of thought" approach,
because correct inference is just one of several possible mechanisms for
achieving rationality. Second, it is more amenable to scientific development
than are approaches based on human behavior or human thought, because the
standard of rationality is clearly defined and completely general. Human
behavior, on the other hand, is well adapted for one specific environment and
is the product, in part, of a complicated and largely unknown evolutionary
process that still is far from producing perfection. (Russell and Norvig, 2003:
5)
2.5 Neural Network
An artificial neural network is an information-processing system that has
certain performance characteristics in common with biological neural
networks. Artificial neural networks have been developed as generalizations
of mathematical models of human cognition or neural biology, based on the
assumptions that:
1. Information processing occurs at many simple elements called
neurons.
2. Signals are passed between neurons over connection links.
3. Each connection link has an associated weight, which, in a typical
neural net, multiplies the signal transmitted.
4. Each neuron applies an activation function (usually nonlinear) to its
net input (sum of weighted input signals) to determine its output
signal. (Fausett, 1994: 3)
Figure 2.5 Biological Neural Network (Fausett, 1994: 3)
A neural net consists of a large number of simple processing elements
called neurons, units, cells, or nodes. Each neuron is connected to other
neurons by means of directed communication links, each with an associated
weight. The weights represent information being used by the net to solve a
problem. Neural nets can be applied to a wide variety of problems, such as
storing and recalling data or patterns, classifying patterns, performing general
mappings from input patterns to output patterns, grouping similar patterns, or
finding solutions to constrained optimization problems.
Each neuron has an internal state, called its activation or activity level,
which is a function of the inputs it has received. Typically, a neuron sends its
activation as a signal to several other neurons. It is important to note that a
neuron can send only one signal at a time, although that signal is broadcast to
several other neurons.
(Fausett, 1994: 4)
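The basic computation of a single neuron described above (a weighted sum passed through an activation function) can be sketched in Java as follows; the binary sigmoid used here anticipates Section 2.5.2.1:

    // A single artificial neuron: output = f(bias + sum of w[i] * x[i]),
    // with the binary sigmoid as an example activation function.
    static double neuronOutput(double[] inputs, double[] weights, double bias) {
        double net = bias;
        for (int i = 0; i < inputs.length; i++) {
            net += weights[i] * inputs[i];
        }
        return 1.0 / (1.0 + Math.exp(-net)); // binary sigmoid
    }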
2.5.1 Backpropagation Neural Network
As is the case with most neural networks, the aim is to train the net
to achieve a balance between the ability to respond correctly to the input
patterns that are used for training (memorization) and the ability to give
reasonable (good) responses to input that is similar, but not identical, to that
used in training (generalization).
The training of a network by backpropagation involves three stages: the
feedforward of the input training pattern, the calculation and backpropagation
of the associated error, and the adjustment of the weights. After training,
application of the net involves only the computations of the feedforward
phase. Even if training is slow, a trained net can produce its output very
rapidly. Numerous variations of backpropagation have been developed to
improve the speed of the training process. (Fausett, 1994: 290)
2.5.1.1 Backpropagation Algorithm
Step 0. Initialize weights (set to small random values).
Step 1. While stopping condition is false, do Steps 2-9.
Step 2. For each training pair, do Steps 3-8.

Feedforward:
Step 3. Each input unit (Xi, i = 1, ..., n) receives input signal xi and
broadcasts this signal to all units in the layer above (the hidden units).
Step 4. Each hidden unit (Zj, j = 1, ..., p) sums its weighted input signals,
  z_inj = v0j + Σi xi vij,
applies its activation function to compute its output signal,
  zj = f(z_inj),
and sends this signal to all units in the layer above (the output units).
Step 5. Each output unit (Yk, k = 1, ..., m) sums its weighted input signals,
  y_ink = w0k + Σj zj wjk,
and applies its activation function to compute its output signal,
  yk = f(y_ink).

Backpropagation of error:
Step 6. Each output unit (Yk, k = 1, ..., m) receives a target pattern
corresponding to the input training pattern and computes its error
information term,
  δk = (tk − yk) f′(y_ink),
calculates its weight correction term (used to update wjk later),
  Δwjk = α δk zj,
calculates its bias correction term (used to update w0k later),
  Δw0k = α δk,
and sends δk to units in the layer below.
Step 7. Each hidden unit (Zj, j = 1, ..., p) sums its delta inputs (from units
in the layer above),
  δ_inj = Σk δk wjk,
multiplies by the derivative of its activation function to calculate its error
information term,
  δj = δ_inj f′(z_inj),
calculates its weight correction term (used to update vij later),
  Δvij = α δj xi,
and calculates its bias correction term (used to update v0j later),
  Δv0j = α δj.

Update weights and biases:
Step 8. Each output unit (Yk, k = 1, ..., m) updates its bias and weights
(j = 0, ..., p):
  wjk(new) = wjk(old) + Δwjk.
Each hidden unit (Zj, j = 1, ..., p) updates its bias and weights
(i = 0, ..., n):
  vij(new) = vij(old) + Δvij.
Step 9. Test stopping condition.
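The following Java sketch is an illustrative implementation of Steps 0-9 for a single hidden layer, in Fausett's notation (v holds the input-to-hidden weights, w the hidden-to-output weights, alpha the learning rate); it is a minimal sketch, not code from the cited text:

    import java.util.Random;

    // One-hidden-layer backpropagation net. Row 0 of each weight
    // matrix holds the biases (v0j and w0k in the algorithm above).
    public class BackpropNet {
        final int n, p, m;        // numbers of input, hidden, output units
        final double[][] v;       // v[i][j]: input i to hidden j; v[0][j] is the bias
        final double[][] w;       // w[j][k]: hidden j to output k; w[0][k] is the bias
        final double alpha;       // learning rate
        final Random rng = new Random();

        BackpropNet(int n, int p, int m, double alpha) {
            this.n = n; this.p = p; this.m = m; this.alpha = alpha;
            v = new double[n + 1][p];
            w = new double[p + 1][m];
            // Step 0: small random weights in [-0.5, 0.5)
            for (double[] row : v) for (int j = 0; j < p; j++) row[j] = rng.nextDouble() - 0.5;
            for (double[] row : w) for (int k = 0; k < m; k++) row[k] = rng.nextDouble() - 0.5;
        }

        static double f(double x) { return 1.0 / (1.0 + Math.exp(-x)); } // binary sigmoid
        static double fPrimeOf(double fx) { return fx * (1.0 - fx); }    // f'(x) via f(x)

        // Feedforward phase (Steps 3-5); z receives the hidden activations.
        double[] feedforward(double[] x, double[] z) {
            for (int j = 0; j < p; j++) {
                double zin = v[0][j];
                for (int i = 0; i < n; i++) zin += x[i] * v[i + 1][j];
                z[j] = f(zin);
            }
            double[] y = new double[m];
            for (int k = 0; k < m; k++) {
                double yin = w[0][k];
                for (int j = 0; j < p; j++) yin += z[j] * w[j + 1][k];
                y[k] = f(yin);
            }
            return y;
        }

        // One training pass for a single pattern x with target t (Steps 3-8).
        void train(double[] x, double[] t) {
            double[] z = new double[p];
            double[] y = feedforward(x, z);
            double[] deltaK = new double[m];                 // Step 6: output error terms
            for (int k = 0; k < m; k++) deltaK[k] = (t[k] - y[k]) * fPrimeOf(y[k]);
            double[] deltaJ = new double[p];                 // Step 7: hidden error terms
            for (int j = 0; j < p; j++) {
                double din = 0.0;
                for (int k = 0; k < m; k++) din += deltaK[k] * w[j + 1][k];
                deltaJ[j] = din * fPrimeOf(z[j]);
            }
            for (int k = 0; k < m; k++) {                    // Step 8: update w and its biases
                w[0][k] += alpha * deltaK[k];
                for (int j = 0; j < p; j++) w[j + 1][k] += alpha * deltaK[k] * z[j];
            }
            for (int j = 0; j < p; j++) {                    // Step 8: update v and its biases
                v[0][j] += alpha * deltaJ[j];
                for (int i = 0; i < n; i++) v[i + 1][j] += alpha * deltaJ[j] * x[i];
            }
        }
    }

Note that the feedforward method doubles as the application algorithm of Section 2.5.1.2, since application performs only Steps 3-5.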
2.5.1.2 Application Algorithm
The backpropagation application algorithm is used to calculate
output signals. It consists of only the feedforward phase. The
application procedure is as follows:
Step 0. Initialize weights (from the training algorithm).
Step 1. For each input vector, do Steps 2-4.
Step 2. For i = 1, ..., n: set the activation of input unit xi.
Step 3. For j = 1, ..., p:
  z_inj = v0j + Σi xi vij,
  zj = f(z_inj).
Step 4. For k = 1, ..., m:
  y_ink = w0k + Σj zj wjk,
  yk = f(y_ink).
2.5.2 Common Activation Function
The basic operation of an artificial neuron involves summing its
weighted input signals and applying an output, or activation, function. For the
input units, this function is the identity function (see Figure 2.6). Typically,
the same activation function is used for all neurons in any particular layer of
a neural net, although this is not required. In most cases, a nonlinear
activation function is used. In order to achieve the advantages of multilayer
nets, compared with the limited capabilities of single-layer nets, nonlinear
functions are required, since the results of feeding a signal through two or
more layers of linear processing elements (i.e., elements with linear
activation functions) are no different from what can be obtained using a
single layer. (Fausett, 1994: 17)
Figure 2.6 Identity Function (Fausett, 1994: 17)
2.5.2.1 Binary Sigmoid Function
Sigmoid functions (S-shaped curves) are useful activation
functions. The logistic function and the hyperbolic tangent function
are the most common. They are especially advantageous for use in
neural nets trained by backpropagation, because the simple
relationship between the value of the function at a point and the
value of the derivative at that point reduces the computational burden
during training. The logistic function, a sigmoid function with range
from 0 to 1, is often used as the activation function for neural nets in
which the desired output values either are binary or are in the interval
between 0 and 1. To emphasize the range of the function, we will call
it the binary sigmoid; it is also called the logistic sigmoid. The
function is
  f(x) = 1 / (1 + exp(−x)),
with derivative
  f′(x) = f(x) [1 − f(x)].
(Fausett, 1994: 18)
2.5.3 Nguyen-Widrow Initialization
The choice of initial weights will influence whether the net reaches
a global (or only a local) minimum of the error and, if so, how quickly
it converges. The update of the weight between two units depends on
both the derivative of the upper unit's activation function and the
activation of the lower unit. For this reason, it is important to avoid
choices of initial weights that would make it likely that either
activations or derivatives of activations are zero. The values of the
initial weights must not be too large, or the initial input signals to each
hidden or output unit will be likely to fall in the region where the
derivative of the sigmoid function has a very small value (the so-called
saturation region). On the other hand, if the initial weights are too small,
the net input to a hidden or output unit will be close to zero, which also
causes extremely slow learning. (Fausett, 1994: 296)
Nguyen-Widrow initialization is a simple modification of the
common random weight initialization presented above that typically gives
much faster learning. The approach is based on a geometrical analysis of the
response of the hidden neurons to a single input; the analysis is
extended to the case of several inputs by using Fourier transforms.
Weights from the hidden units to the output units (and biases on the
output units) are initialized to random values between -0.5 and 0.5, as is
commonly the case.
The initialization of the weights from the input units to the hidden
units is designed to improve the ability of the hidden units to learn. This
is accomplished by distributing the initial weights and biases so that, for
each input pattern, it is likely that the net input to one of the hidden
units will be in the range in which that hidden neuron will learn most
readily. The definitions we use are as follows:
  n = number of input units,
  p = number of hidden units,
  β = scale factor = 0.7 p^(1/n).
The procedure consists of the following simple steps:
For each hidden unit (j = 1, ..., p):
  Initialize its weight vector (from the input units):
    vij(old) = random number between -0.5 and 0.5 (or between -γ and γ).
  Compute ||vj(old)|| = sqrt(v1j(old)^2 + v2j(old)^2 + ... + vnj(old)^2).
  Reinitialize the weights:
    vij = β vij(old) / ||vj(old)||.
  Set the bias:
    v0j = random number between -β and β.
(Fausett, 1994: 297)
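A minimal Java sketch of this procedure, assuming the same weight layout as the BackpropNet sketch above (row 0 of v holds the biases):

    import java.util.Random;

    // Nguyen-Widrow initialization of the input-to-hidden weights
    // v[i][j] for n input units and p hidden units.
    static void nguyenWidrowInit(double[][] v, int n, int p, Random rng) {
        double beta = 0.7 * Math.pow(p, 1.0 / n);            // scale factor
        for (int j = 0; j < p; j++) {
            double norm = 0.0;
            for (int i = 1; i <= n; i++) {                   // vij(old) in [-0.5, 0.5)
                v[i][j] = rng.nextDouble() - 0.5;
                norm += v[i][j] * v[i][j];
            }
            norm = Math.sqrt(norm);
            for (int i = 1; i <= n; i++) {                   // vij = beta * vij(old) / ||vj(old)||
                v[i][j] = beta * v[i][j] / norm;
            }
            v[0][j] = (rng.nextDouble() * 2.0 - 1.0) * beta; // bias in [-beta, beta)
        }
    }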
2.5.4 Momentum
In backpropagation with momentum, the weight change is in a
direction that is a combination of the current gradient and the previous
gradient. This is a modification of gradient descent whose advantages arise
chiefly when some training data are very different from the majority of the
data (and possibly even incorrect). It is desirable to use a small learning rate
to avoid a major disruption of the direction of learning when a very unusual
pair of training patterns is presented. However, it is also preferable to
maintain training at a fairly rapid pace as long as the training data are
relatively similar.
Convergence is sometimes faster if a momentum term is added to the
weight update formulas. In order to use momentum, weights (or weight
updates) from one or more previous training patterns must be saved. For
example, in the simplest form of backpropagation with momentum, the new
weights for training step t + 1 are based on the weights at training steps t
and t - 1. The weight update formulas for backpropagation with momentum are
  wjk(t+1) = wjk(t) + α δk zj + µ [wjk(t) - wjk(t-1)],
or
  Δwjk(t+1) = α δk zj + µ Δwjk(t),
and
  vij(t+1) = vij(t) + α δj xi + µ [vij(t) - vij(t-1)],
or
  Δvij(t+1) = α δj xi + µ Δvij(t),
where the momentum parameter µ is constrained to be in the range from 0 to
1, exclusive of the end points.
Momentum allows the net to make reasonably large weight
adjustments as long as the corrections are in the same general direction for
several patterns, while using a smaller learning rate to prevent a large
response to the error from any one training pattern. It also reduces the
likelihood that the net will find weights that are a local, but not global,
minimum. When using momentum, the net is proceeding not in the direction
of the gradient, but in the direction of a combination of the current gradient
and the previous direction of weight correction.
As in the case of delta-bar-delta updates, momentum forms an
exponentially weighted sum (with µ as the base and time as the exponent) of
the past and present weight changes. Limitations to the effectiveness of
momentum include the fact that the learning rate places an upper limit on the
amount by which a weight can be changed and the fact that momentum can
cause the weight to be changed in a direction that would increase the error.
(Fausett, 1994: 306)
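As a sketch, the output-layer update of the BackpropNet example could be extended with momentum as follows; the deltaW array of previous weight changes, like the method itself, is an assumed addition to that class:

    // Momentum update for the hidden-to-output weights: the previous
    // change deltaW[j][k] is blended into the new change with factor mu,
    // where 0 < mu < 1.
    void updateOutputWeightsWithMomentum(double[] z, double[] deltaK,
                                         double[][] deltaW, double mu) {
        for (int k = 0; k < m; k++) {
            for (int j = 0; j < p; j++) {
                double change = alpha * deltaK[k] * z[j] + mu * deltaW[j][k];
                w[j + 1][k] += change;
                deltaW[j][k] = change;   // saved for the next training step
            }
        }
    }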
2.5.5 Determining Number of Hidden Nodes
Usually some rule-of-thumb methods are used for determining the
number of neurons in the hidden layer:
• The number of hidden layer neurons is 2/3 (or 70% to 90%) of the size
of the input layer. If this is insufficient, the number of output layer
neurons can be added later on.
• The number of hidden layer neurons should be less than twice the
number of neurons in the input layer.
• The size of the hidden layer is between the input layer size and the
output layer size.
(Karsoliya, 2012: 715)
With 12 input neurons, for example, the first rule suggests roughly
8 to 11 hidden neurons.
2.6 Java Programming
2.6.1 A Brief History of Java
Java was developed at Sun Microsystems in 1991, by a team
comprising James Gosling, Patrick Naughton, Chris Warth, Ed Frank,
and Mike Sheridan. The language was initially called Oak; it was
later renamed Java. Java was launched on 23 May 1995. The Java
software was released as a development kit. The first two versions
were named JDK 1.0 and JDK 1.1. In 1998, while releasing the next
version, Sun Microsystems changed the nomenclature from Java
Development Kit (JDK) to Software Development Kit (SDK) and added
“2” to the name. The released version of Java was called Java 2 SDK 1.2.
(Bhave, 2009: 1)
2.6.2 Java Features
According to Bhave, Java has features such as the following:
• Simple. Compared to C++, Java is simpler for many reasons. The most
important is the absence of pointers. Many unnecessary features of C++,
like overloading of operators, are removed from Java.
• Secure. The Java language is secure because of the following
properties/components: no pointers, a bytecode verifier, a class loader,
and a security manager.
• Object oriented. Java is object oriented. Over the last many years,
every new language introduced has been object oriented.
• Robust. Java is a robust language. It does not crash the computer, even
when it encounters minor mistakes in a program, and it has the ability
to withstand threats. This is considered a great advantage for a
programming language.
• Multi-threaded. With the introduction of high-speed microprocessors,
users wanted to perform many tasks at a time. On a single-chip
processor this is possible only by the use of multi-threading, the ability
of a program to run multiple (more than one) pieces of program code
simultaneously. On a single-processor system this is achieved by time
slicing: every thread is assigned a small time slice to run, which creates
the feeling that all the tasks are running simultaneously (a small sketch
follows this list).
• Interpreted. Java is an interpreted language. When we write a program,
it is compiled into a class file, and the interpreter executes this class
file. The interpreters of 30 years ago interpreted statements in textual
form, which was a very slow process; Java interprets bytecode and is
hence considerably faster. Java thus gets all the advantages of
interpretation without suffering from its major disadvantages.
• Architecture neutral. The Java programming language depends neither
on any particular microprocessor family nor on any particular
architecture. Any standard computer, or even a microcomputer, can run
Java programs.
• Distributed. Java has many built-in facilities which make it easy to use
the language in distributed systems.
(Bhave, 2009: 2-3)
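As a small illustration of the multi-threading feature mentioned above (illustrative code, not from the cited text), two pieces of program code can be run concurrently like this:

    // Two threads whose code runs "simultaneously"; on a single
    // processor the scheduler time-slices between them.
    public class TwoTasks {
        public static void main(String[] args) {
            Thread t1 = new Thread(() -> System.out.println("task one"));
            Thread t2 = new Thread(() -> System.out.println("task two"));
            t1.start();
            t2.start();
        }
    }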
2.6.3 How Java Programs Run
While developing a program in non-interpreted languages (like
C/C++), the following steps occur. A program is written in a higher-level
language (HLL). It is called the source code, typically a .c or .cpp file.
Next, this program is compiled completely, resulting in an executable file,
which typically has an .exe extension. This file contains a machine language
program (sometimes called machine code). This file can be executed on a
machine (computer) with the help of an operating system.
Unlike C or C++, Java programs run differently. First, a
program is written in Java. It is called the source code; the file
has a .java extension. Next, this program is compiled by a Java
compiler, which produces bytecode; the file extension is .class.
This process is shown in Figure 2.7. (Bhave, 2009: 4)
Figure 2.7 Compilation in Java (Bhave, 2009: 4)
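As a concrete illustration, a hypothetical source file Hello.java is compiled to bytecode with the javac compiler ("javac Hello.java", producing Hello.class), and the class file is then executed by the interpreter ("java Hello"):

    // Hello.java -- compile with "javac Hello.java", run with "java Hello".
    public class Hello {
        public static void main(String[] args) {
            System.out.println("Hello from bytecode!");
        }
    }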
2.7 Pitch Class Profile (PCP)
PCP is used for chord recognition and key finding in musical audio data.
Each element in the vector represents the relative intensity of one of the 12
pitch classes, i.e., C, C#, D, D#, E, F, F#, G, G#, A, A#, and B. It is
calculated once for each basic time unit, which is selected to be the length
of one half beat. For example, for a time signature of 4/4, the quarter note
is one beat, so the duration of an eighth note is the basic time unit.
The pitch class number of a spectral peak is
  Pitch Class Number = mod(round(12 * log2(f / 440)), 12),
where round(.) rounds the operand to an integer, f is the frequency of the
peak, and 440 Hz is the frequency of the note A4. The energy of the peaks in
the magnitude spectrum is then added to the elements of the PCP feature vector
according to their pitch class numbers; that is, the energies of all the peaks
that have pitch class number i are added to the i-th element of the PCP vector.
Each element of a PCP vector represents the relative intensity of each pitch
class number. (Shiu, 2007: 28)
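A minimal Java sketch of this mapping, assuming the spectral peaks and their energies have already been extracted from the magnitude spectrum:

    // Accumulates peak energies into a 12-bin pitch class profile.
    // peakFreqs[i] is the frequency in Hz of the i-th spectral peak
    // and peakEnergies[i] its energy; 440 Hz is the A4 reference.
    static double[] pitchClassProfile(double[] peakFreqs, double[] peakEnergies) {
        double[] pcp = new double[12];
        for (int i = 0; i < peakFreqs.length; i++) {
            long rounded = Math.round(12.0 * Math.log(peakFreqs[i] / 440.0) / Math.log(2.0));
            int pc = (int) (((rounded % 12) + 12) % 12);   // wrap into 0..11
            pcp[pc] += peakEnergies[i];
        }
        return pcp;
    }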
2.8 WAV
The Microsoft .WAV file format is a technique for storing analog
audio data in a digital format. It is capable of storing waveform data in many
different formats and with an array of compression types.
A .WAV file is a digital recording of the sounds made by any instrument or
human voice. It basically cannot be modified. When a PC plays back a WAV
file, it converts the numbers in the file into audio signals for the PC's
speakers. A complete tune recorded in .WAV format is always very large.
A .WAV file is always true to the original instruments that produced
the music.
Strengths: WAV files are simple and widely used, especially on PCs. Many
applications have been developed to play WAV files, and it is the native
sound format for Windows. Later versions of Netscape Navigator (3+) and
Microsoft Internet Explorer (2+) support the WAV format.
Weaknesses: WAV is seen as a proprietary Windows format, although conversion
tools are available to play WAV files on other platforms. WAV files are not
highly compressed.
(Park, 2004: 1)
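For illustration, WAV data can be read with Java's standard javax.sound.sampled API; this sketch assumes 16-bit mono little-endian PCM, so a real implementation should inspect the AudioFormat first:

    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import java.io.File;

    // Reads a 16-bit mono little-endian PCM WAV file into samples
    // normalized to the range [-1, 1].
    static double[] readWavSamples(File wavFile) throws Exception {
        AudioInputStream in = AudioSystem.getAudioInputStream(wavFile);
        byte[] bytes = in.readAllBytes();
        in.close();
        double[] samples = new double[bytes.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int lo = bytes[2 * i] & 0xFF;      // low byte, unsigned
            int hi = bytes[2 * i + 1];         // high byte keeps the sign
            samples[i] = ((hi << 8) | lo) / 32768.0;
        }
        return samples;
    }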
2.9 Flowchart
A flowchart is an extremely useful tool in program development
activity in many respects. Firstly, any error, omission, or commission can be
detected more easily from a program flowchart than from the program itself,
because a program flowchart is a pictorial representation of the logic of a
program. Secondly, a program flowchart can be followed easily and quickly.
Thirdly, it serves as a good document, which may be of great help if the need
for program modification arises in the future.
The following are the standard symbols used in program flowcharts:
• Terminal. Used to show the beginning and end of a computer-related
process.
• Input/Output. Used to show any input/output operation.
• Computer Processing. Used to show any processing performed by a
computer system.
• Predefined Processing. Used to indicate any process not specifically
defined in the flowchart.
• Comment. Used to write any explanatory statement to clarify something.
• Flow Line. Used to connect the symbols.
• Document Input/Output. Used when input comes from a document and
output goes to a document.
• Decision. Used to show any point in the process where a decision must
be made to determine further action.
• On-page Connector. Used to connect parts of a flowchart continued on
the same page.
• Off-page Connector. Used to connect parts of a flowchart continued on
a different page.
(Chaudhuri, 2005: 3)
2.10 Nyquist’s Theorem
Nyquist's theorem states that the maximum frequency that can be represented
when digitizing an analogue signal is exactly half the sampling rate. Frequencies
above this limit will give rise to unwanted frequencies below the Nyquist
frequency of half the sampling rate. For example, audio sampled at 44.1 kHz can
represent frequencies only up to 22.05 kHz. What happens to signals at exactly
the Nyquist frequency depends on the phase. (Benson, 2007: 254)
2.11 Related Works
This research also builds on previous work on chord recognition.
A number of techniques have been developed, such as the Hidden Markov Model.
Björn Schuller, Florian Eyben, and Gerhard Rigoll proposed automatic
chord labelling in 2008, using a Hidden Markov Model. The inputs used
are musical pieces that are converted from MP3 to a monophonic, 44.1 kHz,
16-bit wave.
According to Schuller, automatic chord labeling becomes a challenge
when dealing with original audio recordings, in particular of modern popular
music. In this work we therefore suggest a data-driven approach with Hidden
Markov Models (HMM) as opposed to typical chord-template modeling. The feature
basis is formed by pitch-tuned chromatic feature information. (Schuller, 2008: 555)
Basically, in their research, the song being processed is partitioned into
frames. The frames form consecutive bars, which are mapped into pitch classes.
Per bar, a 12-bin chroma-based vector is computed. The chords that can be
recognized are mapped into 24 chords: major and minor chords.
Maksim Khadkevich and Maurizio Omologo also developed a hidden
Markov technique, but with a different approach, using Viterbi decoding
to produce the output chords. The process starts with the evaluation of a set
of different windowing methods for the Discrete Fourier Transform, investigated
in terms of their efficiency. Pitch class profile vectors, which represent
harmonic information, are extracted from the given audio signal. The resulting
chord sequence is obtained by running a Viterbi decoder on trained hidden
Markov models. (Khadkevich et al., 2009: 1)
Another piece of research on chord recognition uses a feed-forward neural
network. The research was carried out by M. Osmalskyj, J-J.
Embrechts, S. Piérard, and M. Van Droogenbroeck in 2012. It recognizes 10
chords and several instruments, such as piano and violin.
The method uses the known feature vector for automatic chord recognition
called the Pitch Class Profile (PCP). Although the PCP vector only provides
attributes corresponding to 12 semitone values, the authors show that it is
adequate for chord recognition. (Osmalsky et al., 2012: 39)