VOL. 1, NO. 9, December 2011 ISSN 2222-9833
ARPN Journal of Systems and Software
©2009-2011 AJSS Journal. All rights reserved
http://www.scientific‐journals.org

A Fuzzy Neural Network for Speech Recognition

A. Vijay Kumar
Assistant Professor, Department of CSE, Hyderabad Institute of Technology and Management, Hyderabad, A.P, India
[email protected]

Aruna
Assistant Professor, Department of H & BS, ACE College of Engineering, Ghatkesar, Hyderabad, A.P, India
[email protected]

M. Vijayapal Reddy
Assistant Professor (C), Head of Dept (BCA), O.U. P.G. College, Secunderabad, A.P, India
[email protected]

ABSTRACT
Two problems arise when a conventional T-S fuzzy neural network is used directly in a speech recognition system. The first is the rule disaster problem: the number of rules increases exponentially with the number of input dimensions. The second is the failure of network reasoning when the input dimension is too large. This paper presents an improved T-S fuzzy neural network algorithm. The subtraction clustering algorithm is used to determine the number of rules and so avoid the rule disaster, and a compensated factor is added to the membership so that network reasoning works correctly. The improved algorithm was applied in a speech recognition system. The experimental results show that its recognition results are better than those of a radial basis function (RBF) neural network that uses the K-means clustering algorithm to select the centroids, and that it is considerably more robust.

Keywords: T-S fuzzy neural network; speech recognition; fuzzy rules.

1. INTRODUCTION
The main aim of this paper is to present an improved algorithm for the T-S (Takagi-Sugeno) fuzzy neural network model that can be applied in a speech recognition system.
A fuzzy neural network (FNN), which combines a neural network with a fuzzy system, can not only mimic the logical thinking of the human brain but also, like an artificial neural network, process quantitative and qualitative knowledge simultaneously [4]. The characteristic parameters of the speech signal become inaccurate and incomplete in the process of quantification and transfer, so speech recognition lacks semantic character. The concept of the membership function in fuzzy theory can compensate for these shortcomings to some degree and provide more comprehensive information to the system, which enhances the robustness of speech recognition.

2. BACKGROUND
Speech recognition, also known as automatic speech recognition or computer speech recognition, converts spoken words to machine-readable input, for example to key presses, using the binary code for a string of character codes. The term "voice recognition" is sometimes used to refer to speech recognition where the recognition system is trained to a particular speaker, as is the case for most desktop recognition software; hence there is an aspect of speaker recognition, which attempts to identify the person speaking, to better recognize what is being said. Speech recognition is the broader term: such a system can recognize almost anybody's speech, for example a call centre system designed to recognize many voices. Voice recognition refers to a system trained to a particular user, which recognizes their speech based on their unique vocal sound.
Speech recognition applications include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control and content-based spoken audio search (e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and, in aircraft cockpits, what is usually termed Direct Voice Input.

Speech recognition can be of two types, depending on the grammar that the recognition is based on (the grammar is, in other words, the list of possible recognition outputs that can be generated): command and control, and dictation. In a command and control scenario a developer provides a limited set of possible word combinations, and the speech recognition engine matches the words spoken by the user to this limited list. In command and control the accuracy of recognition is very high. It is generally better for applications to implement command and control, as the higher recognition accuracy makes the application respond better. In dictation mode the recognition engine compares the input speech to the whole list of dictionary words. For dictation mode to achieve high recognition accuracy, it is important that the user has previously trained the recognition engine by speaking into it.

3. LITERATURE SURVEY
A. Neural Network
A fuzzy neural network or neuro-fuzzy system is a learning machine that finds the parameters of a fuzzy system (i.e., fuzzy sets, fuzzy rules) by exploiting approximation techniques from neural networks.

Combining fuzzy systems with neural networks
Neural networks and fuzzy systems have some things in common. Both can be used for solving a problem (e.g. pattern recognition, regression or density estimation) when no mathematical model of the problem exists. Each on its own has certain advantages and disadvantages, and the disadvantages almost completely disappear when the two concepts are combined.

Neural networks can only come into play if the problem is expressed by a sufficient number of observed examples. These observations are used to train the black box. On the one hand, no prior knowledge about the problem needs to be given. On the other hand, it is not straightforward to extract comprehensible rules from the neural network's structure.

A fuzzy system, by contrast, demands linguistic rules instead of learning examples as prior knowledge. Furthermore, the input and output variables have to be described linguistically. If the knowledge is incomplete, wrong or contradictory, then the fuzzy system must be tuned. Since there is no formal approach for this, the tuning is performed heuristically, which is usually very time consuming and error-prone.

D. Performance of speech recognition systems
Limited vocabulary systems, requiring no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations. Both acoustic modeling and language modeling are important parts of modern statistically-based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modeling has many other applications, such as smart keyboards and document classification. Performance of speech recognition systems is typically described in terms of the word error rate, E, defined as:

E = ((S + I + D) / N) × 100    (2.1)

where N is the total number of words in the test set, and S, I, and D are the total numbers of substitutions, insertions, and deletions, respectively [2].
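As a small illustration of equation (2.1), the sketch below computes the word error rate from substitution, insertion and deletion counts. The function name `word_error_rate` and the example counts are illustrative assumptions, not values reported in this paper.

```python
def word_error_rate(substitutions: int, insertions: int, deletions: int, total_words: int) -> float:
    """Word error rate E in percent, following equation (2.1)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100 * (substitutions + insertions + deletions) / total_words


# Hypothetical counts for a 200-word test set (not taken from the paper).
example_wer = word_error_rate(substitutions=6, insertions=2, deletions=4, total_words=200)
print(f"WER = {example_wer:.1f}%")
```

Because insertions are counted in the numerator, E can exceed 100 when the recognizer outputs many words that were never spoken.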
Verification Value:
The performance of speech recognition systems is usually specified in terms of accuracy and speed. Accuracy is usually rated with the word error rate (WER), whereas speed is measured with the real time factor. Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR).

Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions. There is some confusion, however, over the interchangeability of the terms "speech recognition" and "dictation". Commercially available speaker-dependent dictation systems usually require only a short period of training (sometimes also called "enrollment") and may successfully capture continuous speech with a large vocabulary at a normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions. "Optimal conditions" usually assume that users have speech characteristics which match the training data, can achieve proper speaker adaptation, and work in a clean noise environment (e.g. quiet office or laboratory space). This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected. Speech recognition in video has become a popular search technology used by several video search companies.

4. IMPLEMENTATION
Determining the rule numbers of reasoning layer
The number of rules in the reasoning layer is set by the subtraction clustering algorithm: possible cluster centers are extracted from the input data, and the average of the center counts over all training data is taken as the number of rules in the network inference layer. The algorithm is as follows.

Suppose (x_1, x_2, …, x_n) are n data points in M-dimensional space, and assume that each data point is a candidate cluster center. The density index at data point x_i is then defined as

(6.1)

where γ_a is positive and defines the neighborhood radius of the point. Data points outside this radius contribute little to the density index of the point. First calculate the density index of each data point. Then select the data point with the highest density index as the first cluster center. Let x_c1^(1) be the selected point and D_c1^(1) its density index. The density index of each data point x_i is then updated as

(6.2)

where γ_b is positive and k is the cycle number. The density index of the data points near the first cluster center x_c1 is reduced significantly, so these points are very unlikely to be selected as the next cluster center. γ_b is a constant that defines the neighborhood in which the density index is reduced significantly. Usually γ_b is greater than γ_a; generally, we set γ_b = 1.5 γ_a. First correct the density index of each data point, then select the next cluster center x_c2 and modify the density indices again. The process repeats until all cluster centers are generated. Once the influence scope of the cluster centers for each dimension of the data is determined, the number of cluster centers per dimension is obtained. The average of these cluster-center counts is taken as the number of reasoning rules of the T-S fuzzy neural network.

(6.6)

(6.7)

Inference algorithm for reasoning layer

(6.8)

From equation (4) we can see that, because 0 < µ_ij ≤ 1, α will tend to a minimum or even to zero when the input data dimension is too large, so that fuzzy reasoning no longer works normally. To solve this problem, we can add a compensated factor N_adj to the membership. N_adj is usually determined by experiment and is related to the input dimension.
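To make the reasoning failure concrete, the sketch below multiplies Gaussian membership degrees across input dimensions. The product form of the firing strength and the Gaussian form exp(-(x - c)^2 / σ^2) are assumed here because equation (4) is not reproduced in this text. The sketch then applies one possible compensation, raising each membership to the power 1/N_adj. That particular form, the function names, and the numeric values are assumptions for illustration only; the paper's own compensated equation may be defined differently.

```python
import math


def gaussian_membership(x: float, center: float, width: float) -> float:
    """Gaussian membership degree of x (assumed form: exp(-(x - c)^2 / sigma^2))."""
    return math.exp(-((x - center) ** 2) / (width ** 2))


def firing_strength(memberships, n_adj: float = 1.0) -> float:
    """Product of membership degrees.

    n_adj > 1 applies the assumed compensation mu ** (1 / n_adj) to each factor;
    n_adj = 1 leaves the plain product unchanged.
    """
    strength = 1.0
    for mu in memberships:
        strength *= mu ** (1.0 / n_adj)
    return strength


# Hypothetical setup: every input sits 1.5 widths away from its rule center.
mu = gaussian_membership(1.5, center=0.0, width=1.0)

for dim in (4, 64, 512):
    plain = firing_strength([mu] * dim)
    compensated = firing_strength([mu] * dim, n_adj=dim)
    print(f"dim={dim:4d}  plain={plain:.3e}  compensated={compensated:.3e}")
```

With these made-up values the plain product shrinks rapidly as the dimension grows and underflows to zero in double precision at 512 dimensions, while the compensated value stays at the scale of a single membership degree. Choosing N_adj equal to the input dimension is only a convenience for the demonstration; the text states that N_adj is found by experiment.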
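The rule-number step described earlier in this section can be sketched as follows. Since the bodies of equations (6.1) and (6.2) are not reproduced in this text, the sketch assumes the commonly used subtractive clustering forms: the density of x_i is the sum over j of exp(-||x_i - x_j||^2 / (γ_a/2)^2), and the update subtracts D_c · exp(-||x_i - x_c||^2 / (γ_b/2)^2). The stopping rule (`stop_ratio`), the function name, and the test data are hypothetical additions; the text only says the process repeats until all cluster centers are generated.

```python
import math
import random


def subtractive_clustering(points, gamma_a, gamma_b=None, stop_ratio=0.15):
    """Return cluster centers chosen by density, highest first.

    gamma_b defaults to 1.5 * gamma_a, as suggested in the text.
    stop_ratio is an assumed stopping rule: stop once the best remaining
    density falls below stop_ratio times the first center's density.
    """
    if gamma_b is None:
        gamma_b = 1.5 * gamma_a

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Density index of every point: each point is a candidate cluster center.
    density = [
        sum(math.exp(-sq_dist(p, q) / (gamma_a / 2.0) ** 2) for q in points)
        for p in points
    ]

    centers = []
    first_density = None
    while len(centers) < len(points):
        best = max(range(len(points)), key=lambda i: density[i])
        best_density = density[best]
        if first_density is None:
            first_density = best_density
        elif best_density < stop_ratio * first_density:
            break
        center = points[best]
        centers.append(center)
        # Reduce the density index of points near the newly selected center.
        density = [
            d - best_density * math.exp(-sq_dist(p, center) / (gamma_b / 2.0) ** 2)
            for d, p in zip(density, points)
        ]
    return centers


# Hypothetical data: two well-separated groups of 2-D points.
rng = random.Random(0)
data = [(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(30)]
data += [(rng.gauss(3.0, 0.1), rng.gauss(3.0, 0.1)) for _ in range(30)]

centers = subtractive_clustering(data, gamma_a=1.0)
print(len(centers), "cluster centers found")
```

In the scheme described in the text, the center count obtained this way for each training item would then be averaged to give the number of rules in the reasoning layer, so the rule count follows the data instead of growing exponentially with the input dimension.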
With the compensated factor N_adj added to the membership, equation (4) is updated as

(6.9)

so the T-S fuzzy neural network can still perform fuzzy reasoning correctly when the input data dimension is large [1].

Parameter adjusting algorithm for improved T-S fuzzy neural network
Three categories of parameters in the network need adjusting. The first is the weight coefficients w_ij between the third and fourth layers, which are the defuzzification parameters. The second and third are the center values c_ij and the widths σ_ij of the Gaussian membership functions. As the T-S network is essentially a multilayer feed-forward network, the parameter adjustment algorithm can be designed in the manner of a back propagation (BP) network, using the error back propagation algorithm. Assume the error cost function is calculated as

(6.4)

where t_i and y_i represent the desired output and the actual output, respectively.

(6.5)

(6.3)

where δ_i^(4), δ_j^(3) and δ_ij^(2) are the first-order gradients of the cost function in the fourth, third and second layers, respectively, and Δw_ij, Δc_ij and Δσ_ij are the adjustments of w_ij, c_ij and σ_ij, respectively.

Figure 1: Speech Recognition System Window. This form shows the main screen of the speech recognition system.

Figure 2: Spoken Words Files. This form shows the dialog which opens when the file preview button is clicked.

Figure 3: Spoken Word File Path. This form shows that the text field contains the wave file selected by the user.

Figure 4: Feature Extraction Values of Spoken Word. This form shows the result of clicking the read features button; the result is displayed in the multi-line text box on the right.

Figure 5: Playing of the Sound File. This form shows the result of clicking the play file button; the user can hear the sound from the speaker.

Figure 6: Training of the Two Spoken Words. This form shows the result of clicking the Train Neural network button; the result can be seen in the File trained box, which displays the files that have been trained.

Figure 7: Training of the Five Spoken Words. This form shows results such as the input vector length, output vector length and training sample count, as well as the error rate of the five files trained.

Figure 8: Training of the Eight Spoken Words. This form shows results such as the input vector length, output vector length and training sample count, as well as the error rate of the eight files trained.

Figure 9: Training Done Window. This form shows results such as the input vector length, output vector length and training sample count, as well as the error rate after training all the files.

Figure 10: Average Error Rate Window. This form shows results such as the input vector length, output vector length and training sample count, as well as the error rate, and finally displays the average error rate.

Figure 11: Retrieved input file details. This form shows the retrieved input file details, such as average bytes per second and other information related to the input query file.

Figure 12: Recognized Text Of First Spoken Word File. This form shows the result of recognizing the input file; the result is displayed below the Recognized text control.

Figure 13: Recognized Text Of Third Spoken Word File. This form shows the result of recognizing the input file; the result is displayed below the Recognized text control.

Figure 14: Recognized Text Of Twenty Fourth Spoken Word File. This form shows the result of recognizing the input file; the result is displayed below the Recognized text control.
Figure 15: Recognized Text Of Fifth Spoken Word File.

5. CONCLUSION AND FUTURE ENHANCEMENT
This paper presented an improved algorithm for the T-S fuzzy neural network. The distinguishing characteristic of a fuzzy logic system is that it can express logic directly, which makes it suitable for direct or high-level knowledge representation and gives it good logical performance. However, a fuzzy logic system cannot automatically generate and adjust membership functions and rules. A fuzzy neural network (FNN), which combines a neural network with a fuzzy system, can not only mimic the logical thinking of the human brain but also, like an artificial neural network, process quantitative and qualitative knowledge simultaneously. The characteristic parameters of the speech signal become inaccurate and incomplete in the process of quantification and transfer, so speech recognition lacks semantic character. The concept of the membership function in fuzzy theory can compensate for these shortcomings to some degree and provide more comprehensive information to the system, which enhances the robustness of speech recognition.

REFERENCES
[1] Chia-Feng Juang and Chun-I Lee, "A Fuzzified Neural Fuzzy Inference Network for Handling Both Linguistic and Numerical Information Simultaneously", Neurocomputing, doi:10.1016/j.neucom, 2007.
[2] Grabianowski, Ed, "How Speech Recognition Works", 10 November 2006.
[3] Hitoshi Iyatomi and Masafumi Hagiwara, "Adaptive Fuzzy Inference Neural Network", Pattern Recognition, doi:10.1016/j.patcog.2004.04.003, 2004.
[4] Yaonan Wang, "Intelligent Information Processing Technology", Higher Education Press, Beijing, 2003.
[5] N.K. Kasabov and Q. Song, "DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction", IEEE Trans. Fuzzy Syst. 14–154, 2002.
[6] Derleth, R.P., "Temporal and compressive properties of the normal and impaired auditory system", Ph.D. thesis, Universität Oldenburg, 1999.
[7] Gelin, P., Junqua, J.-C., "Techniques for robust speech recognition in the car environment", in: Proc. Eurospeech, Budapest, Hungary, Vol. 6, pp. 2483-2486, 1999.
[8] Francis, I.F., Anderson, T.R., "Binaural phoneme recognition using the auditory image model and cross-correlation", in: Proc. ICASSP, pp. 1231-1234, 1997.
[9] Dau, T., Kollmeier, B., Kohlrausch, A., "Modeling auditory processing of amplitude modulation", I+II, J. Acoust. Soc. Am. 102 (5), 2892-2919, 1997.
[10] Junqua, J.-C., Haton, J.-P., "Robustness in Automatic Speech Recognition: Fundamentals and Applications", Kluwer Academic Publishers, 1995.

AUTHORS
Mr. A. Vijaykumar graduated in Computer Science and Engineering from Jawaharlal Nehru Technological University, Hyderabad, India, and received an M.Tech in Computer Science and Engineering from Acharaya Nagarjuna University, Guntur, A.P., India. He is presently working as Assistant Professor in the Department of C.S.E at Hyderabad Institute of Technology and Management (HITAM), R.R.Dist, A.P., India. He has 5 years of experience. His research areas include automata theory, compiler design, neural networks and networking. [email protected]

Ms. Aruna graduated with a B.Sc (Computer Science) from Osmania University, Hyderabad, India, and an M.Sc (Mathematics) from Osmania University, Hyderabad, A.P., India. She is presently working as Assistant Professor in the Department of H&BS at ACE College of Engineering, Ghatkesar, R.R.Dist, A.P., India. She has 4 years of experience. Her research areas include differential equations, linear algebra, neural networks and networking. [email protected]

Mr. M. Vijayapal Reddy, Asst Prof (C), holds an M.Tech (Computer Science and Engineering) from Acharaya Nagarjuna University, Guntur, A.P., India, an MCA from Kakatiya University, Warangal, A.P., and an M.A. (Sociology) from Osmania University. He worked as Head of Department for MCA and guided many students in projects, and is presently working as Head of Department for BCA (foreign batch students) at Osmania University P.G. College, Secunderabad, A.P., India. He has 7 years of experience in the field of Computer Science. His research areas include database management systems, neural networks and networking. [email protected]