VOL. 1, NO. 9, December 2011
ISSN 2222-9833
ARPN Journal of Systems and Software
©2009-2011 AJSS Journal. All rights reserved
http://www.scientific‐journals.org
A Fuzzy Neural Network for Speech Recognition
A. Vijay Kumar
Assistant Professor
Department of CSE,
Hyderabad Institute of Technology and Management
Hyderabad, A.P., India
[email protected]

Aruna
Assistant Professor
Department of H & BS,
ACE College of Engineering
Ghatkesar, Hyderabad, A.P., India
[email protected]

M. Vijayapal Reddy
Assistant Professor (C), Head of Dept (BCA)
O.U. P.G. College
Secunderabad, A.P., India
[email protected]
ABSTRACT
Two problems arise when a conventional T-S fuzzy neural network is used directly in a speech recognition system. One is the rule
disaster problem: the number of rules grows exponentially with the number of input dimensions. The other is the failure of the
network's reasoning when the input dimension is too large. This paper presents an improved T-S fuzzy neural network algorithm.
The subtraction clustering algorithm is used to determine the number of rules and so avoid the rule disaster, and a compensation
factor added to the membership values lets the network reason correctly. The improved algorithm was applied in a speech
recognition system. The experimental results showed that its recognition results are better than those of a radial basis function
(RBF) neural network that uses the K-means clustering algorithm to select the centroids, and that it is much more robust.
Keywords: T-S fuzzy neural network; speech recognition; fuzzy rules.
1. INTRODUCTION
The main aim of this paper is to present an improved algorithm
for the T-S (Takagi-Sugeno) fuzzy neural network model that
can be applied in a speech recognition system. A fuzzy
neural network (FNN), which combines a neural network with a
fuzzy system, can not only mimic the logical thinking of the
human brain but also has an artificial neural network's ability
to process quantitative and qualitative knowledge
simultaneously [4]. The characteristic parameters of the speech
signal yield inaccurate and incomplete information in the
process of quantification and transfer. As a result, speech
recognition lacks semantic character. The concept of the
membership function in fuzzy theory can compensate for these
shortcomings to some degree and provide more comprehensive
information to the system, enhancing the robustness of speech
recognition.
2. BACKGROUND
Speech recognition (also known as automatic speech
recognition or computer speech recognition) converts spoken
words to machine-readable input, for example to key presses,
using the binary code for a string of character codes. The term
"voice recognition" is sometimes used to refer to speech
recognition where the recognition system is trained to a
particular speaker, as is the case for most desktop recognition
software; hence there is an aspect of speaker recognition,
which attempts to identify the person speaking, to better
recognize what is being said. Speech recognition is a broad
term meaning that the system can recognize almost anybody's
speech, such as a call centre system designed to recognize many
voices. Voice recognition is a system trained to a particular
user, where it recognizes their speech based on their unique
vocal sound. Speech recognition applications include voice
dialing (e.g., "Call home"), call routing (e.g., "I would like to
make a collect call"), domotic appliance control, content-based
spoken audio search (e.g., finding a podcast where
particular words were spoken), simple data entry (e.g., entering
a credit card number), preparation of structured documents
(e.g., a radiology report), speech-to-text processing (e.g., word
processors or emails), and, in aircraft cockpits, what is usually
termed Direct Voice Input. Speech recognition can be of two
types, depending on the grammar that the recognition is based
on (the grammar is, in other words, the list of possible
recognition outputs that can be generated): command and
control, and dictation.
In a command and control scenario a developer provides a
limited set of possible word combinations, and the speech
recognition engine matches the words spoken by the user to the
limited list. In command and control the accuracy of
recognition is very high. Where an application can be built
around command and control, this is preferable, as the higher
recognition accuracy makes the application respond better. In
dictation mode the recognition engine compares the input speech
to the whole list of dictionary words. For dictation mode to
achieve high recognition accuracy, it is important that the user
has previously trained the recognition engine by speaking into it.
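The matching step in command and control mode can be illustrated with a minimal sketch. It works on already transcribed text rather than on audio, which is a simplification made here for illustration only; the grammar entries, the function name `match_command` and the `cutoff` value are hypothetical and do not come from the system described in this paper.

```python
import difflib

# Hypothetical command grammar: the limited list of allowed outputs.
COMMANDS = ["call home", "open file", "play file", "stop"]

def match_command(recognized_text, grammar=COMMANDS, cutoff=0.6):
    """Return the grammar entry closest to the recognized text, or None.

    Anything that is not similar enough to an entry of the limited
    list is rejected instead of being passed on to the application.
    """
    candidates = difflib.get_close_matches(
        recognized_text.lower(), grammar, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None

# A slightly misrecognized utterance still maps onto a grammar entry.
best = match_command("call hone")
rejected = match_command("xyzzy")
```

Because the output is restricted to a short list, a small recognition error can still be resolved to the intended command, which is one way to read the high accuracy claimed for this mode.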
3. LITERATURE SURVEY
A. Neural Network
A fuzzy neural network, or neuro-fuzzy system, is a learning
machine that finds the parameters of a fuzzy system (i.e., fuzzy
sets and fuzzy rules) by exploiting approximation techniques
from neural networks.
Combining fuzzy systems with neural networks
Neural networks and fuzzy systems have some things in common.
Both can be used to solve a problem (e.g. pattern recognition,
regression or density estimation) when no mathematical model of
the given problem exists. Taken alone, each has certain
advantages and disadvantages, and the disadvantages almost
completely disappear when the two concepts are combined.
Neural networks can only come into play if the problem is
expressed by a sufficient number of observed examples. These
observations are used to train the black box. On the one hand,
no prior knowledge about the problem needs to be given. On the
other hand, it is not straightforward to extract comprehensible
rules from the neural network's structure. A fuzzy system, by
contrast, demands linguistic rules instead of learning examples
as prior knowledge. Furthermore, the input and output variables
have to be described linguistically. If the knowledge is
incomplete, wrong or contradictory, the fuzzy system must be
tuned. Since there is no formal approach to this, the tuning is
performed heuristically, which is usually very time consuming
and error-prone.
D. Performance of speech recognition systems
Verification Value:
The performance of speech recognition systems is usually
specified in terms of accuracy and speed. Accuracy is usually
rated with the word error rate (WER), whereas speed is measured
with the real time factor. Other measures of accuracy include
Single Word Error Rate (SWER) and Command Success Rate
(CSR). Most speech recognition users would tend to agree that
dictation machines can achieve very high performance in
controlled conditions. There is some confusion, however, over
the interchangeability of the terms "speech recognition" and
"dictation". Commercially available speaker-dependent
dictation systems usually require only a short period of training
(sometimes also called "enrollment") and may successfully
capture continuous speech with a large vocabulary at normal
pace with very high accuracy. Most commercial companies
claim that recognition software can achieve between 98% and
99% accuracy if operated under optimal conditions. "Optimal
conditions" usually assume that users:
• have speech characteristics which match the training data,
• can achieve proper speaker adaptation, and
• work in a clean noise environment (e.g. quiet office
or laboratory space).
This explains why some users, especially those whose speech
is heavily accented, might achieve recognition rates much
lower than expected. Speech recognition in video has become a
popular search technology used by several video search
companies. Limited vocabulary systems, requiring no training,
can recognize a small number of words (for instance, the ten
digits) as spoken by most speakers. Such systems are popular
for routing incoming phone calls to their destinations in large
organizations. Both acoustic modeling and language modeling
are important parts of modern statistically-based speech
recognition algorithms. Hidden Markov models (HMMs) are
widely used in many systems. Language modeling has many
other applications such as smart keyboard and document
classification. Performance of speech recognition systems is
typically described in terms of word error rate, E, defined as:
E = ((S + I + D) / N) * 100
(2.1)
where N is the total number of words in the test set, and S, I,
and D are the total number of substitutions, insertions, and
deletions, respectively [2].
4. IMPLEMENTATION
Determining the rule numbers of reasoning layer
The number of rules in the reasoning layer is set by the
subtraction clustering algorithm: the possible cluster centers
of the input data are extracted, and the average number of
cluster centers over all the training data is taken as the
number of rules of the network's inference layer.
The algorithm is as follows. Suppose (x1, x2, …, xn) are n data
points in M-dimensional space, and assume that each data point
is a candidate for a cluster center. The density index at
data point xi is then defined as
(6.1)
where γa is positive and defines the neighborhood radius of the
point. Data points outside the radius contribute little to the
density index of the point. First calculate the density index of
each data point. Then select the data point with the highest
density index as the first cluster center. Let xc1 be the selected
point and Dc1 its density index. Then the density index of each
data point xi can be updated as
(6.2)
where γb is positive and k is the cycle number. The density
index of the data points near the first cluster center xc1 is
therefore reduced significantly, so these points cannot become
the next cluster center. γb is a constant that defines the
neighborhood in which the density index is significantly
decreased. Usually γb is greater than γa; generally, we set
γb = 1.5 γa. After correcting the density index of each data
point, select the next cluster center xc2 and correct the
density indices of the data points again. The process repeats
until all cluster centers are generated. Once the influence
scope of the cluster centers is determined for each dimension
of the data, the number of cluster centers for each dimension
is obtained. The average of these numbers of cluster centers is
taken as the number of reasoning rules of the T-S fuzzy neural
network.
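The selection loop described above can be sketched in a few lines of Python. The sketch assumes the exponential form commonly used for this method (usually called subtractive clustering): a density index D_i = Σ_j exp(−‖x_i − x_j‖² / (γa/2)²), and an update that subtracts D_c · exp(−‖x_i − x_c‖² / (γb/2)²) for each newly chosen center x_c. The stopping rule (`stop_ratio`) and the function name are hypothetical, since the text only says that the process repeats until all cluster centers are generated.

```python
import math

def subtractive_clustering(points, gamma_a, gamma_b=None, stop_ratio=0.15):
    """Pick cluster centers by density index, highest density first.

    points: non-empty list of equal-length tuples.
    stop_ratio is an assumed stopping rule: selection ends once the
    best remaining density drops below this fraction of the first peak.
    """
    if gamma_b is None:
        gamma_b = 1.5 * gamma_a  # ratio suggested in the text

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    alpha = 4.0 / gamma_a ** 2  # 1 / (gamma_a / 2)^2
    beta = 4.0 / gamma_b ** 2   # 1 / (gamma_b / 2)^2

    # Density index of each point: nearby points contribute most.
    density = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points)
               for p in points]

    centers = []
    first_peak = max(density)
    while True:
        k = max(range(len(points)), key=density.__getitem__)
        peak = density[k]
        if peak < stop_ratio * first_peak:
            break
        centers.append(points[k])
        # Reduce the density of every point near the new center.
        density = [d - peak * math.exp(-beta * sq_dist(p, points[k]))
                   for d, p in zip(density, points)]
    return centers

# Two well separated groups of points (illustrative data only).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (4.9, 5.0), (5.0, 5.1)]
centers = subtractive_clustering(points, gamma_a=1.0)
rule_count = len(centers)  # would serve as the number of fuzzy rules
```

With this data the sketch finds one center per group, so the rule count follows the structure of the data instead of growing with the input dimension.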
Inference algorithm for reasoning layer
From equation (4) we can see that, because 0 < µij ≤ 1, α will
tend to its minimum or even to zero when the input data
dimension is too large, so that the fuzzy reasoning no longer
works correctly. To solve this problem, we can add a
compensation factor Nadj to the membership. Nadj is usually
determined by experiment and is related to the input dimension.
With this factor, equation (4) can be updated as
(6.3)
so that the T-S fuzzy neural network can still carry out fuzzy
reasoning correctly when the input data dimension is
large [1].
Parameter adjusting algorithm for improved T-S
fuzzy neural network:
Three categories of parameters in the network need adjusting.
The first is the weight coefficients wij between the third layer
and the fourth layer, which represent the defuzzification
parameters. The second and third are the center value cij and
the width σij of the Gaussian membership function. As the T-S
network is essentially a multilayer feed-forward network, we can
design the parameter adjustment algorithm in imitation of the
back propagation (BP) network, using the error back propagation
algorithm. Assume the error cost function is calculated as
(6.4)
where ti and yi represent the desired output and the actual
output, respectively.
(6.5)
(6.6)
(6.7)
(6.8)
(6.9)
where δ(4)i, δ(3)j and δ(2)ij are the first-order gradients of
the cost function in the fourth, the third and the second layer,
respectively, and Δwij, Δcij and Δσij are the adjustments of
wij, cij and σij, respectively.
Figure 1: Speech Recognition System Window
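The parameter adjustment described in this section can be sketched as one gradient-descent step on a very small network. This is an illustration under stated assumptions, not the paper's exact update rules: it assumes a squared-error cost E = ½ Σ (t − y)², Gaussian memberships exp(−((x − c)/σ)²), rule strengths formed as products of memberships, and an output that is the weighted sum of normalized rule strengths. The compensation factor Nadj is left out, and all function names and numbers are hypothetical.

```python
import math

def forward(x, c, sigma, w):
    """x: n inputs; c, sigma: m x n centers and widths; w: p x m weights."""
    # Rule strength: product of Gaussian memberships over the inputs.
    alpha = [math.prod(math.exp(-((xi - ci) / si) ** 2)
                       for xi, ci, si in zip(x, cj, sj))
             for cj, sj in zip(c, sigma)]
    total = sum(alpha)
    alpha_bar = [a / total for a in alpha]  # normalized rule strengths
    y = [sum(wk[j] * alpha_bar[j] for j in range(len(alpha))) for wk in w]
    return y, alpha_bar, alpha

def cost(y, t):
    return 0.5 * sum((tk - yk) ** 2 for tk, yk in zip(t, y))

def gradients(x, t, c, sigma, w):
    """Back-propagate the cost to the weights, centers and widths."""
    y, alpha_bar, alpha = forward(x, c, sigma, w)
    total = sum(alpha)
    rules = range(len(alpha))
    err = [yk - tk for yk, tk in zip(y, t)]  # dE/dy
    grad_w = [[ek * ab for ab in alpha_bar] for ek in err]
    # dE/d(alpha_j), passing through the normalization.
    d_alpha = [sum(ek * (wk[j] - yk) for ek, wk, yk in zip(err, w, y)) / total
               for j in rules]
    grad_c = [[d_alpha[j] * alpha[j] * 2 * (xi - ci) / si ** 2
               for xi, ci, si in zip(x, c[j], sigma[j])] for j in rules]
    grad_s = [[d_alpha[j] * alpha[j] * 2 * (xi - ci) ** 2 / si ** 3
               for xi, ci, si in zip(x, c[j], sigma[j])] for j in rules]
    return grad_w, grad_c, grad_s

def train_step(x, t, c, sigma, w, lr=0.1):
    """Return updated copies of (c, sigma, w) after one descent step."""
    grad_w, grad_c, grad_s = gradients(x, t, c, sigma, w)

    def step(params, grads):
        return [[p - lr * g for p, g in zip(prow, grow)]
                for prow, grow in zip(params, grads)]

    return step(c, grad_c), step(sigma, grad_s), step(w, grad_w)

# Illustrative values: two inputs, two rules, one output.
x, t = [0.2, 0.8], [1.0]
c = [[0.0, 1.0], [1.0, 0.0]]
sigma = [[1.0, 1.0], [1.0, 1.0]]
w = [[0.3, 0.7]]
cost_before = cost(forward(x, c, sigma, w)[0], t)
c_new, sigma_new, w_new = train_step(x, t, c, sigma, w)
cost_after = cost(forward(x, c_new, sigma_new, w_new)[0], t)
```

Returning updated copies instead of modifying the parameters in place keeps the step easy to check, for instance by comparing the analytic gradients with finite differences.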
This form shows the main screen of the speech recognition
system.
Figure 2: Spoken Words Files
This form shows the dialog which opens when the file preview
button is clicked.
Figure 3: Spoken Word File Path
This form shows that the text field contains the wave file
selected by the user.
Figure 4: Feature Extraction Values of Spoken Word
This form shows the result of clicking the read features button;
the result is displayed in the multi-line text box on the right.
Figure 5: Playing of the Sound File
This form shows the result of clicking the play file button;
the user can hear the sound from the speaker.
Figure 6: Training of the Two Spoken Words
This form shows the result of clicking the Train Neural network
button; the File trained box lists the files which have been
trained.
Figure 7: Training of the Five Spoken Words
This form shows results such as the input vector length,
output vector length and training sample count, as well as the
error rate of the five files trained.
Figure 8: Training of the Eight Spoken Words
This form shows results such as the input vector length,
output vector length and training sample count, as well as the
error rate of the eight files trained.
Figure 9: Training Done Window
This form shows results such as the input vector length,
output vector length and training sample count, as well as the
error rate after training all the files.
Figure 10: Average Error Rate Window
This form shows results such as the input vector length,
output vector length, training sample count and error rate, and
finally displays the average error rate.
Figure 11: Retrieved Input File Details
This form shows the retrieved details of the input file, such as
average bytes per second and other information related to the
input query file.
Figure 12: Recognized Text of First Spoken Word File
This form shows the result of recognizing the input file; the
result is displayed below the Recognized text control.
Figure 13: Recognized Text of Third Spoken Word File
This form shows the result of recognizing the input file; the
result is displayed below the Recognized text control.
Figure 14: Recognized Text of Twenty Fourth Spoken Word File
This form shows the result of recognizing the input file; the
result is displayed below the Recognized text control.
Figure 15: Recognized Text of Fifth Spoken Word File
5. CONCLUSION AND FUTURE ENHANCEMENT
This paper presented an improved algorithm for the T-S fuzzy
neural network. The distinguishing characteristic of a fuzzy
logic system is that it can represent logic directly, which
makes it suitable for direct or high-level knowledge
representation and gives it good logical performance. However,
a fuzzy logic system cannot automatically generate and adjust
its membership functions and rules. A fuzzy neural network
(FNN), which combines a neural network with a fuzzy system, can
not only mimic the logical thinking of the human brain but also
has an artificial neural network's ability to process
quantitative and qualitative knowledge simultaneously. The
characteristic parameters of the speech signal yield inaccurate
and incomplete information in the process of quantification and
transfer. As a result, speech recognition lacks semantic
character. The concept of the membership function in fuzzy
theory can compensate for these shortcomings to some degree and
provide more comprehensive information to the system, enhancing
the robustness of speech recognition.
REFERENCES
[1] Chia-Feng Juang and Chun-I Lee, "A Fuzzified Neural
Fuzzy Inference Network for Handling Both Linguistic
and Numerical Information Simultaneously",
Neurocomputing, doi:10.1016/j.neucom, 2007.
[2] Grabianowski, Ed, "How Speech Recognition Works", 10
November 2006.
[3] Hitoshi Iyatomi and Masafumi Hagiwara, "Adaptive Fuzzy
Inference Neural Network", Pattern Recognition,
doi:10.1016/j.patcog.2004.04.003, 2004.
[4] Yaonan Wang, "Intelligent Information Processing
Technology", Higher Education Press, Beijing, 2003.
[5] N.K. Kasabov, Q. Song, "DENFIS: dynamic evolving
neural-fuzzy inference system and its application for
time-series prediction", IEEE Trans. Fuzzy Syst. 14–154,
2002.
[6] Derleth, R.P., "Temporal and compressive properties of
the normal and impaired auditory system", Ph.D.
thesis, Universität Oldenburg, 1999.
[7] Gelin, P., Junqua, J.-C., "Techniques for robust speech
recognition in the car environment", in: Proc.
Eurospeech, Budapest, Hungary, Vol. 6, pp. 2483–2486,
1999.
[8] Francis, I.F., Anderson, T.R., "Binaural phoneme
recognition using the auditory image model and
cross-correlation", in: Proc. ICASSP, pp. 1231–1234,
1997.
[9] Dau, T., Kollmeier, B., Kohlrausch, A., "Modeling
auditory processing of amplitude modulation", I+II, J.
Acoust. Soc. Am. 102 (5), 2892–2919, 1997.
[10] Junqua, J.-C., Haton, J.-P., "Robustness in Automatic
Speech Recognition: Fundamentals and Applications",
Kluwer Academic Publishers, 1995.
AUTHORS
Mr. A. Vijaykumar graduated in Computer Science and
Engineering from Jawaharlal Nehru Technological University,
Hyderabad, India, and received his M.Tech in Computer Science
and Engineering from Acharya Nagarjuna University, Guntur,
A.P., India. He is presently working as an Assistant Professor
in the Department of C.S.E. at Hyderabad Institute of
Technology and Management (HITAM), R.R. Dist, A.P., India. He
has 5 years of experience. His research areas include automata
theory, compiler design, neural networks and networking.
[email protected]
Ms. Aruna graduated with a B.Sc (Computer Science) from
Osmania University, Hyderabad, India, and an M.Sc
(Mathematics) from Osmania University, Hyderabad, A.P., India.
She is presently working as an Assistant Professor in the
Department of H&BS at ACE College of Engineering, Ghatkesar,
R.R. Dist, A.P., India. She has 4 years of experience. Her
research areas include differential equations, linear algebra,
neural networks and networking.
[email protected]
Mr. M. Vijayapal Reddy, Asst. Prof. (C), received his M.Tech
(Computer Science and Engineering) from Acharya Nagarjuna
University, Guntur, A.P., India, his MCA from Kakatiya
University, Warangal, A.P., and his M.A. (Sociology) from
Osmania University. He has worked as Head of Department for
MCA and guided many students in projects, and is presently
working as Head of Department for BCA (foreign batch students)
at Osmania University P.G. College, Secunderabad, A.P., India.
He has 7 years of experience in the field of Computer Science.
His research areas include database management systems, neural
networks and networking.
[email protected]