Amin Mubark Alamin Ibrahim* et al.
(IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH
Volume No.2, Issue No.6,October – November 2014, 1543 – 1546.
AMIN MUBARK ALAMIN IBRAHIM
Assistant Professor, Faculty of Computer Science and Information Technology, Alneelian University, Khartoum, Sudan
Collaborator at the University of Shagra, KSA, Department of Mathematics, Faculty of Education, Shagra University, KSA
Abstract-This paper presents an algorithm for prefix stemming of English words. Stemming reduces the computer memory needed to store words, because many words differ in their beginnings but share similar roots. The primary beneficiaries of this algorithm are information retrieval systems. In the results, the size of the words was reduced by 29%, a figure that varies with the number of words that have been stemmed.
Key words: stemming words, prefix stemming
I. INTRODUCTION
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevancy. An object is an entity that is represented by information in a database. User queries are matched against the database information. Depending on the application, the data objects may be, for example, text documents, images [1], audio [2], mind maps [3] or videos. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates or metadata. Most IR systems compute a numeric score on how well each object in the database matches the query, and rank the objects according to this value. The top-ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query [4].
A prefix is an affix which is placed before the stem of a word [5]. Because stemming words is important for information retrieval systems, the researcher designed an algorithm for prefix stemming of English words.
II. THE CONCEPT OF STEMMING WORDS
A stem is defined as the main stalk of a plant [6]. By analogy, the stem of a word is the part that remains once affixes such as prefixes are removed.
III. GENERAL DESCRIPTION OF ALGORITHM
Below is a description of the prefix stemming algorithm for English words. Each rule of the algorithm is stated in the following form:
(condition) S1 -> S2
This means that if the word starts with the prefix (S1) and the stem that follows the prefix satisfies the condition, then (S1) is deleted from the word and (S2) takes its place. The condition is written as follows:
*S: means that the stem of the word may start with S. Accordingly, a condition written *s!=x requires that the stem does not start with the letter x.
The part of the word remaining after stripping must be two letters or more in length.
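As an illustration only, the rule form (condition) S1 -> S2 could be represented in Python roughly as follows. The names `PrefixRule`, `apply_rule` and `MIN_REMAINDER` are hypothetical and do not come from the paper, and the condition is modeled, by assumption, as a test on the part of the word that follows the prefix.

```python
from typing import Callable, NamedTuple, Optional


class PrefixRule(NamedTuple):
    """One rule of the form (condition) S1 -> S2 (hypothetical representation)."""
    s1: str                                            # prefix to look for
    s2: str = ""                                       # text that replaces S1 ("" means delete)
    condition: Optional[Callable[[str], bool]] = None  # test on the text after S1


# The description asks for two letters or more to remain after stripping.
MIN_REMAINDER = 2


def apply_rule(word: str, rule: PrefixRule) -> str:
    """Return the word with S1 replaced by S2, or the word unchanged."""
    if not word.startswith(rule.s1):
        return word
    rest = word[len(rule.s1):]
    # The condition, if any, must hold for the text that follows the prefix.
    if rule.condition is not None and not rule.condition(rest):
        return word
    candidate = rule.s2 + rest
    if len(candidate) < MIN_REMAINDER:
        return word
    return candidate


# Example: the rule (con and *s!=n), read here as "the stem must not start with n".
con_rule = PrefixRule("con", "", lambda rest: not rest.startswith("n"))
print(apply_rule("contact", con_rule))  # tact
print(apply_rule("connect", con_rule))  # connect (condition fails, word kept)
```

Keeping the condition as a separate callable means that unconditional rules such as dis- simply omit it.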
IV. STEPS OF ALGORITHM IN DETAILS
a- Is the length of the word less than or equal to two letters? If yes, go to step (e). Otherwise, go to the next step.
b- Does the word begin with a prefix? If not, go to step (e). If yes, test the following conditions, each shown with an example word and its result:
dis-> disagree->agree
im-> impassable->passable
in-> inhale->hale
@ 2013 http://www.ijitr.com All rights Reserved.
2320 –5547
bi-> bicycle->cycle
non-> noninteractive->interactive
pre-> preorder->order
mis-> misbehave->behave
(un and *s!=v)->unable->able
re-> reread->read
de-> descend->scend
tri-> triangle->angle
il-> illegal->legal
ir-> irregular->regular
super-> superimpose->impose
sub-> submarine->marine
(com and *s!=m )->compact->pact
(con and *s!=n) ->contact->tact
(mid and *s!=d)->midterm->term
(exception: middle->middle, left unchanged because the part after mid starts with d)
anti-> antivirus->virus
hyper-> hyperlink->link
c- Is the length of the word after stripping less than or equal to two letters? If yes, go to (d); otherwise go to (e).
d- Save the word without stripping and go to (f).
e- Save the word (stripped, if a prefix was removed).
f- End.
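The steps above can be sketched in Python as follows. This is a minimal sketch, not the author's implementation: the names `RULES` and `prefix_stem` are hypothetical, lowercasing the input is an added assumption, a condition such as *s!=m is read as "the part after the prefix must not start with m", and the length test follows step (c) as written (a remainder of two letters or fewer is not stripped).

```python
# Each entry: (prefix, letter the remainder must NOT start with, or None).
# The list follows the rules given in step (b); the reading of the
# conditions is an assumption.
RULES = [
    ("dis", None), ("im", None), ("in", None), ("bi", None), ("non", None),
    ("pre", None), ("mis", None), ("un", "v"), ("re", None), ("de", None),
    ("tri", None), ("il", None), ("ir", None), ("super", None), ("sub", None),
    ("com", "m"), ("con", "n"), ("mid", "d"), ("anti", None), ("hyper", None),
]


def prefix_stem(word: str) -> str:
    """Strip at most one listed prefix from the word, following steps (a) to (f)."""
    word = word.lower()  # assumption: matching is case-insensitive
    # Step (a): words of two letters or fewer are saved unchanged.
    if len(word) <= 2:
        return word
    # Step (b): look for a listed prefix whose condition holds.
    for prefix, forbidden in RULES:
        if not word.startswith(prefix):
            continue
        remainder = word[len(prefix):]
        if forbidden is not None and remainder.startswith(forbidden):
            continue  # condition fails, e.g. middle stays middle
        # Steps (c) and (d): keep the unstripped word if too little remains.
        if len(remainder) <= 2:
            return word
        # Step (e): save the stripped word.
        return remainder
    # No prefix matched: save the word as it is.
    return word


for sample in ["disagree", "compact", "middle", "reread", "superpad"]:
    print(sample, "->", prefix_stem(sample))
```

Because no listed prefix is itself the beginning of another listed prefix, the order of the entries does not change the result in this sketch. Only one prefix is removed per word, so superimpose becomes impose and is not stripped a second time.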
V. PRACTICAL EXAMPLE FOR ALGORITHM
The following example is a table (Table 1) containing sample words that carry a prefix, together with the result of stemming them with the algorithm. The results show that the size of the words was reduced by 29% compared with their size before using the algorithm. This is also shown in Figure (1-1).
TABLE 1: WORDS AND SIZE OF WORDS BEFORE AND AFTER USING THE PREFIX STEMMING ALGORITHM
Words before      Number of bytes    Words after       Number of bytes
prefix stemming   before algorithm   prefix stemming   after algorithm
disabilities      12                 abilities         9
disaccord         9                  accord            6
disaffect         9                  affect            9
disadvantage      12                 advantage         9
disaffirm         9                  affirm            6
inable            6                  able              4
inactive          8                  active            6
inaffable         9                  affable           7
incase            6                  case              4
inconstant        10                 constant          8
impedance         9                  pedance           7
impeccable        10                 peccable          8
impeding          8                  peding            6
imperfectly       11                 perfectly         9
impossible        10                 possible          8
nonabrasive       11                 abrasive          8
nonacidic         9                  acidic            6
nonaction         9                  action            6
nonactor          8                  actor             5
nonabsorbable     13                 absorbable        10
preadamic         9                  adamic            6
prebatch          8                  batch             5
preboard          7                  board             5
prebook           7                  book              4
preborn           7                  born              4
superable         9                  able              4
superactivity     13                 activity          8
superaddition     13                 addition          8
superbank         9                  bank              9
superpad          8                  pad               3
total             278                                  197
Figure (1-1): Comparison between the size of the data before and after the use of the algorithm.
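The 29% figure can be checked against the totals reported in Table 1. The short Python sketch below only redoes that arithmetic; the names `BYTES_BEFORE`, `BYTES_AFTER` and `reduction_percent` are hypothetical, and the two totals are taken as reported in the table.

```python
BYTES_BEFORE = 278  # total bytes before stemming, as reported in Table 1
BYTES_AFTER = 197   # total bytes after stemming, as reported in Table 1


def reduction_percent(before: int, after: int) -> float:
    """Relative size reduction, expressed as a percentage of the original size."""
    return 100.0 * (before - after) / before


print(f"{reduction_percent(BYTES_BEFORE, BYTES_AFTER):.1f}%")  # 29.1%
```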
VI. RESULTS AND DISCUSSION
The researcher has designed a prefix stemming algorithm. The most important result of applying the algorithm is that it contributes effectively to reducing the memory space occupied by words, because many words differ in their beginnings but share similar roots. The primary beneficiaries of this algorithm are information retrieval systems. The results show a 29% reduction in the size of the words, a figure that varies with the number of words that have been stemmed. The effectiveness of the algorithm increases when it is combined with a suffix stemming algorithm, which further reduces space and speeds up indexing and search, especially when used within information retrieval systems.
VII. REFERENCES
[1] Goodrum, Abby A. (2000). "Image Information Retrieval: An Overview of Current Research". Informing Science 3 (2).
[2] Foote, Jonathan (1999). "An overview of audio information retrieval". Multimedia Systems (Springer).
[3] Beel, Jöran; Gipp, Bela; Stiller, Jan-Olaf (2009). "Information Retrieval On Mind Maps - What Could It Be Good For?". Proceedings of the 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom'09). Washington, DC: IEEE.
[4] Frakes, William B. (1992). Information Retrieval Data Structures & Algorithms. Prentice-Hall, Inc. ISBN 0-13-463837-9.
[5] Vedrana Mihalicek, ed. (2011). Language Files, 11th Edition. Ohio State University. pp. 152–153.
[6] http://www.yourdictionary.com/stem