Amin Mubark Alamin Ibrahim* et al. (IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH Volume No.2, Issue No.6, October – November 2014, 1543 – 1546.

AMIN MUBARK ALAMIN IBRAHIM
Assistant Professor, Faculty of Computer Science and Information Technology, Alneelian University, Khartoum, Sudan
Collaborator at the University of Shagra, KSA, Department of Mathematics, Faculty of Education, Shagra University, KSA

Abstract- This paper aims to design an algorithm for prefix stemming of English words. Stemming reduces the amount of computer memory that words occupy, especially because many words differ in their beginnings but share similar roots. The primary beneficiaries of this algorithm are information retrieval systems. The results show that the size of the words is reduced by 29%, a figure that depends on the number of words that have been stemmed.

Key words: stemming words, prefix stemming

I. INTRODUCTION
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevancy. An object is an entity that is represented by information in a database. User queries are matched against the database information. Depending on the application, the data objects may be, for example, text documents, images,[1] audio,[2] mind maps[3] or videos.
Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates or metadata. Most IR systems compute a numeric score on how well each object in the database matches the query, and rank the objects according to this value. The top-ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query.[4]

A prefix is an affix which is placed before the stem of a word.[5] Because stemming words is important for information retrieval systems, the researcher designed an algorithm for prefix stemming of English words.

II. THE CONCEPT OF STEMMING WORDS
The definition of a stem is the main stalk of a plant.[6]

III. GENERAL DESCRIPTION OF ALGORITHM
Below is the description of the prefix stemming algorithm for English words. A rule of the prefix stemming algorithm is stated as follows:

(condition) S1 -> S2

This means that if the word starts with the prefix (S1) and the stem after the prefix satisfies the given condition, the prefix is deleted from the word to give the stripped form (S2), which then replaces (S1). A condition is written as follows:

*S: means that the stem of the word may start with S.

The stem remaining after stripping is required to be two letters or more in length.

IV. STEPS OF ALGORITHM IN DETAILS
a- Is the word length less than or equal to two letters? If yes, go to step (e). Otherwise, go to the next step.
b- Does the word begin with a prefix? If not, go to step (e). If yes, test the following conditions:
   dis- -> disagree -> agree
   im- -> impassable -> passable
   in- -> inhale -> hale
   bi- -> bicycle -> cycle
   non- -> noninteractive -> interactive
   pre- -> preorder -> order
   mis- -> misbehave -> behave
   (un and *S != v) -> unable -> able
   re- -> reread -> read
   de- -> descend -> scend
   tri- -> triangle -> angle
   il- -> illegal -> legal
   ir- -> irregular -> regular
   super- -> superimpose -> impose
   sub- -> submarine -> marine
   (com and *S != m) -> compact -> pact
   (con and *S != n) -> contact -> tact
   (mid and *S != d) -> midterm -> term; middle -> middle (unchanged)
   anti- -> antivirus -> virus
   hyper- -> hyperlink -> link
c- Is the length of the word after stripping less than or equal to two letters? If yes, go to (d); otherwise go to (e).
d- Save the word without stripping and go to (f).
e- Save the word.
f- End.

V. PRACTICAL EXAMPLE FOR ALGORITHM
The following example is a table (Table 1) containing sample words that carry a prefix, together with the result of stemming them with the algorithm. The results show that the size of the words is reduced by 29% compared with the size before using the algorithm. This is also shown in Figure (1-1).

TABLE 1: WORDS AND SIZE OF WORDS BEFORE AND AFTER USING PREFIX STEMMING ALGORITHM

Words before prefix stemming  Bytes before  Words after prefix stemming  Bytes after
disabilities                  12            abilities                    9
disaccord                     9             accord                       6
disaffect                     9             affect                       9
disadvantage                  12            advantage                    9
disaffirm                     9             affirm                       6
inable                        6             able                         4
inactive                      8             active                       6
inaffable                     9             affable                      7
incase                        6             case                         4
inconstant                    10            constant                     8
impedance                     9             pedance                      7
impeccable                    10            peccable                     8
impeding                      8             peding                       6
imperfectly                   11            perfectly                    9
impossible                    10            possible                     8
nonabrasive                   11            abrasive                     8
nonacidic                     9             acidic                       6
nonaction                     9             action                       6
nonactor                      8             actor                        5
nonabsorbable                 13            absorbable                   10
preadamic                     9             adamic                       6
prebatch                      8             batch                        5
preboard                      7             board                        5
prebook                       7             book                         4
preborn                       7             born                         4
superable                     9             able                         4
superactivity                 13            activity                     8
superaddition                 13            addition                     8
superbank                     9             bank                         9
superpad                      8             pad                          3
total                         278                                        197

Figure (1-1): A comparison between the size of the data before and after the use of the algorithm

VI. RESULTS AND DISCUSSION
The researcher has designed a prefix stemming algorithm. The most important result of applying the algorithm is that it contributes effectively to reducing the space that words occupy in computer memory, especially because many words differ in their beginnings but share similar roots. The primary beneficiaries of this algorithm are information retrieval systems. The results show that the size of the words is reduced by 29%, a figure that depends on the number of words that have been stemmed. The effectiveness of the algorithm increases when it is combined with a suffix stemming algorithm, which further reduces space and speeds up indexing and search, especially when used within information retrieval systems to further improve search.

VII. REFERENCES
[1] Goodrum, Abby A. (2000). "Image Information Retrieval: An Overview of Current Research". Informing Science 3 (2).
[2] Foote, Jonathan (1999). "An overview of audio information retrieval". Multimedia Systems (Springer).
[3] Beel, Jöran; Gipp, Bela; Stiller, Jan-Olaf (2009). "Information Retrieval On Mind Maps - What Could It Be Good For?".
    Proceedings of the 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom'09). Washington, DC: IEEE.
[4] Frakes, William B. (1992). Information Retrieval Data Structures & Algorithms. Prentice-Hall, Inc. ISBN 0-13-463837-9.
[5] Vedrana Mihalicek, ed. (2011). Language Files, 11th Edition. Ohio State University. pp. 152–153.
[6] http://www.yourdictionary.com/stem

@ 2013 http://www.ijitr.com All rights Reserved. 2320–5547
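
The steps of Section IV can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the names PREFIX_RULES, MIN_LEN, strip_prefix and reduction_percent are hypothetical, words are lowercased before matching, and step (e), when reached from step (c), is read as saving the stripped word. The last function only recomputes the reduction from the totals reported in Table 1.

```python
# Prefix rules from Section IV, step (b).
# Each entry is (prefix, letter the stem must NOT start with); None = no condition.
PREFIX_RULES = [
    ("dis", None), ("im", None), ("in", None), ("bi", None), ("non", None),
    ("pre", None), ("mis", None), ("un", "v"), ("re", None), ("de", None),
    ("tri", None), ("il", None), ("ir", None), ("super", None), ("sub", None),
    ("com", "m"), ("con", "n"), ("mid", "d"), ("anti", None), ("hyper", None),
]

# Steps (a) and (c): words or stems of this length or less are left alone.
MIN_LEN = 2


def strip_prefix(word: str) -> str:
    """Return the word with one listed prefix removed, or the word unchanged."""
    lowered = word.lower()  # assumption: matching is case-insensitive

    # Step (a): very short words are saved as they are.
    if len(lowered) <= MIN_LEN:
        return lowered

    # Step (b): look for a listed prefix whose condition holds.
    for prefix, blocked_letter in PREFIX_RULES:
        if not lowered.startswith(prefix):
            continue
        stem = lowered[len(prefix):]
        if blocked_letter is not None and stem.startswith(blocked_letter):
            continue  # condition *S != letter fails, so the rule does not apply
        # Steps (c) and (d): keep the unstripped word if the stem is too short.
        if len(stem) <= MIN_LEN:
            return lowered
        return stem  # step (e): save the stripped word

    return lowered  # no rule applied, save the word as it is


def reduction_percent(bytes_before: int, bytes_after: int) -> float:
    """Percentage of bytes saved by stemming."""
    return 100.0 * (bytes_before - bytes_after) / bytes_before


if __name__ == "__main__":
    for sample in ["disagree", "midterm", "middle", "compact", "superpad"]:
        print(sample, "->", strip_prefix(sample))
    # Totals as reported in Table 1.
    print(round(reduction_percent(278, 197), 1))
```

In this sketch strip_prefix("midterm") gives "term", while "middle" is returned unchanged because the stem after mid- starts with d. With the totals reported in Table 1, reduction_percent(278, 197) comes to roughly 29.1, in line with the 29% stated in the text.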