Download III. Algorithm and Data structure - Academic Science,International

Medicine Recommendation System
Anuraag Advani, Ankita Gandhi, Harita Jagad, Prof. Abhijit Patil
( 9769084977 )
(9769794127)
( 9820325888)
(8080320201)
Department of Computer Engineering
Dwarkadas J. Sanghvi College of Engineering, Vile Parle (West), Mumbai
Abstract: Autocomplete is one of the most pervasive and
most studied techniques in computer science. The
proposed system will be implemented for paediatricians to
improve both the speed and the accuracy of the predictions
made for a particular medicine, in order to assist them with
prescribing medicines even if they remember only the partial
or the generic name. The predicted results will adapt based
on the history of previously selected results. We plan to use
a modification of the trie data structure to accomplish this.
The search time depends on the length of the string being
searched and is independent of the size of the database. Thus,
we aim to provide an efficient algorithm that gives accurate
results and speeds up the process of medicine prescription.
Keywords: recommendation system, trie, data structures,
priority lists, autocomplete
I. INTRODUCTION
One common characteristic of drug names is that they are hard
to remember: they may contain jargon, and many are homophones
or are spelled in similar ways. It is also difficult for a
doctor to remember the names of a large number of medicines. A
standard search in a database is usually slow, especially if the
database contains a large number of entries. Such a search is
also static and gives the same result for a given query. Our
proposed system will help the doctor search for medication
quickly using either brand or generic names. In addition, given
the generic name of a medicine, a list of all brand names
belonging to that generic class will be generated.
To implement this system we propose the use of the trie, a
data structure that we modify to suit our application.
II. PREVIOUS WORK
The paper [1] discusses tries as a way to optimize search and
provide faster retrieval. It presents a method to reduce
retrieval time using P-fast tries and then discusses how the
space complexity of the system can be reduced. The approach is
based on finding the exact matching string or a string similar
to it.
In [2] a segment tree is discussed in which each leaf node is
given a priority, and suggestions are given based on the
priority of the leaf nodes. Each node contains the maximum
priority of its descendants. However, this approach is
applicable only to a selection of phrases.
The paper [3] also shows many methods to implement
autocomplete efficiently. It describes a trie-based approach in
which the edit distance is maintained between various nodes in
the trie, and based on this an error-tolerant implementation is
suggested. For a string to be completed, the algorithm
traverses the trie for each character of that string; the edit
distance to its parents and siblings is then calculated, and
all matches whose distance is below a threshold are given as
output. Not only is the retrieval time of this algorithm higher
than that of the proposed algorithm, but it also gives results
based only on the distance between two words, whereas our
proposed algorithm modifies its output depending on what the
user selects after typing a given substring.
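To make the idea of an edit-distance threshold concrete, the following is a minimal Python sketch of the standard Levenshtein distance together with a threshold filter. It is a generic illustration only and is not the trie-based algorithm of [3]; the function names and the default threshold of 1 are assumptions chosen for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost)) # substitute or match
        prev = curr
    return prev[-1]


def within_threshold(query, candidates, threshold=1):
    """Keep the candidates whose distance to the query is at most threshold."""
    q = query.lower()
    return [c for c in candidates if edit_distance(q, c.lower()) <= threshold]
```

Such a filter depends only on the spelling of the words, which is the limitation noted above: it does not take into account what the user actually selected in the past.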
The two main requirements for such a search strategy are:
1. Speed: The suggestions must appear faster than the user
   can type, so immediate suggestions are required, even at
   the cost of space. If the user can type faster than we
   provide suggestions, our suggestions will be futile.
2. Accuracy: Accuracy certainly plays an important part. A
   fast result is useless if it is inaccurate, as the user
   will not select it and will end up typing the entire word,
   which is what we want to avoid in the first place. To
   provide accurate results we must correctly predict what
   the user wishes to prescribe according to the partial name
   typed and the condition diagnosed.
III. ALGORITHM AND DATA STRUCTURE
For the data structure we use a trie, which we have modified
to suit our application. To provide fast lookup, our algorithm
takes O(m) time to search for a string of length m and provide
the needed suggestions. The term trie comes from the word
retrieval. A trie is a tree whose root is empty; as we go from
the root to a leaf, a word from the database is formed. Each
interior node represents a prefix of a word, such that the
prefix at a parent node is extended by one character at each
of its child nodes.
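The structure described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the names TrieNode, insert and prefixes_on_path are hypothetical, names are lower-cased for matching, and an explicit is_word flag (an assumption) marks where a stored word ends so that one name may be a prefix of another.

```python
class TrieNode:
    """One node of the trie; the root stands for the empty prefix."""

    def __init__(self):
        self.children = {}     # maps a character to the child node
        self.is_word = False   # assumption: flag marking the end of a stored word


def insert(root, word):
    """Add word to the trie, creating nodes along its path as needed."""
    node = root
    for ch in word.lower():
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def prefixes_on_path(root, word):
    """Return the prefix represented by each node on the path for word."""
    node, prefix, seen = root, "", []
    for ch in word.lower():
        node = node.children[ch]   # raises KeyError if word was never inserted
        prefix += ch
        seen.append(prefix)
    return seen


root = TrieNode()
for name in ["Crocin", "Cromolyn", "Symax"]:
    insert(root, name)
```

Here Crocin and Cromolyn share the nodes for c, cr and cro, which is why the cost of a lookup depends on the length of the typed string rather than on the number of stored names.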
1) Standard Implementation of Tries
A trie is generally used to check whether a word exists in the
database. This is done by navigating from the root towards the
leaves, following each character of the word in sequence.
Finally, we check whether the node reached is a leaf to
determine if the word is complete.
Figure 1
For example, if we want to search for Crocin in the above trie,
we start from the root, which is empty. The first letter is C,
so we navigate to node C, which is a child of the root. For the
next letter, r, we navigate to the appropriate child of node C.
This continues until node n is reached. Finally we check
whether n is a leaf node. If it is, we can say that the word
exists in the database. The algorithm is as follows:
1. Accept the string str that needs to be searched in the database
2. CurrNode = root
3. For each character char in str, do step 4
4. CurrNode = CurrNode->child(char); if no such child exists, go to step 6
5. If CurrNode is a leaf, terminate with success
6. Else output “String Not Found”
2) Tries for Autocomplete
The above method only checks whether a given string is present
in a database, and can be a good search method to check whether
a username or password is present in a database. For
autosuggest, however, we need to modify the standard way in
which a trie is used. We can easily travel from the root to the
node for the last typed character, as in the standard
implementation. After reaching that node, we append the
characters obtained from a depth-first search (DFS) of the rest
of the tree.
Figure 2
This approach is logical but static: as long as the tree is not
changed by insertions or deletions, which will be rare, a given
string will always give the same results, and the DFS will
always give preference to the leftmost branch. Also, the search
time is proportional to O(m+b(d-m)), where m is the length of
the string that is typed, d is the depth at which the word is
present and b is the branching factor.
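The exact-match search and the DFS-based completion described in the two subsections above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the function names are hypothetical, an is_word flag replaces the leaf test so that a name may be a prefix of another, names are lower-cased, "leftmost" is taken to mean alphabetical order of the children, and the limit of 6 results is an assumed default.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # character -> child node
        self.is_word = False   # True if a stored name ends here


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def find_node(root, prefix):
    """Follow prefix character by character; None if the path breaks."""
    node = root
    for ch in prefix.lower():
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def contains(root, word):
    """Standard search: O(m) steps for a word of length m."""
    node = find_node(root, word)
    return node is not None and node.is_word


def autocomplete(root, prefix, limit=6):
    """Walk to the prefix node, then collect names by depth-first search."""
    start = find_node(root, prefix)
    if start is None:
        return []
    results = []
    stack = [(start, prefix.lower())]
    while stack and len(results) < limit:
        node, text = stack.pop()
        if node.is_word:
            results.append(text)
        # Push in reverse order so the alphabetically first child is popped first.
        for ch in sorted(node.children, reverse=True):
            stack.append((node.children[ch], text + ch))
    return results
```

Because the traversal order is fixed, repeated calls with the same prefix return the same list in the same order, which is the static behaviour that the next subsection sets out to remove.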
3) Our Approach
In our approach we associate each node with a priority list.
The first 6 elements of the list are initialized from the DFS
of the tree. The 7th element is used as a buffer and is
initially empty. The 8th element is the last element searched
for when the given string was entered, irrespective of its
priority. Each element in the list is given a priority;
initially the earlier elements have high priority while the
later ones have low priority. Each time a search is executed,
two things are taken into account: first, all the characters
that were typed while searching for a medicine; second, the
medicine that was searched for, searchedMed. After the search
is completed, the priorities at each node that was traversed
are updated as follows:
1) If searchedMed is present in CurrNode->list, set index to its position and go to 4
2) index = findEmpty()
3) CurrNode->list[index] = searchedMed
4) CurrNode->list->priority(index) = CurrNode->list->priority(index) + 1
5) CurrNode = nextNode(); repeat from 1 until every traversed node is updated
where nextNode() returns the next of the nodes that were
traversed to reach the searched medicine, and findEmpty()
returns the index of the first empty element in the list, or 8
if none is empty. Thus, if any element is empty, searchedMed is
placed at that position in the list; otherwise it is placed at
the eighth position.
Figure 3
For example, consider that cro is typed. The algorithm
traverses to the o node as we have already seen. As the list is
initialized using the DFS of the trie, the first element is
Crocin and the second is Cromolyn. This output is the same as
that obtained in the previous section, but the time complexity
is only O(m), so even in this case the speed is increased.
Now suppose the user was searching for Sinarest but typed Sy
instead of Si, and suppose that this error is quite common.
When the user first types Sy, the webpage outputs the list
containing the drugs Synthroid, Symax and Syllact. This is the
same output that was observed when the trie was modified for
autocomplete. The user might then rectify the error and select
the correct drug. When the user selects Sinarest, the trie is
updated. As the node y, which is reached after traversing S-y,
contains 3 drug names in its priority list, Sinarest is added
at the fourth position in the list. Thus, the next time a user
types Sy, Sinarest is present in the suggestions and can easily
be chosen by the user.
Figure 4
Thus, using this algorithm the trie modifies its output
depending on the elements that are searched for by the user. If
the list is completely full, the 8th element in the list is
replaced by the searched medicine. Moreover, the output of this
method is in order of most frequently searched elements, making
it easier for doctors to choose the required drug.
4) Mapping the Generic Names to Brands
In each node of the trie we also store a list, generic, which
indicates whether each item in the list of medicine names is a
generic name or a brand name. If the value at a position is 1
then the item is a generic name; otherwise it is a brand name.
For each element in the list of a node, if the name is found to
be a generic name, the database is scanned. This is a
relational database that stores information about each drug,
such as its dosage, its side effects and the generic class of
the brand. The generic class of each brand is compared with the
name of this element. If a match is found, the brand name is
also given in the list of suggestions.
This process is followed to increase the utility of our system.
Thus, for a suggestion that is a generic name, a list of brand
names belonging to that generic class is provided. For example,
if the doctor types Acetaminophen, then Tylenol, Paracetamol,
Panadol and Mapap are also given as output. Thus the doctor can
easily select the required brand name by just typing the
generic name.
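The per-node priority list of subsection 3 and the generic-to-brand expansion of subsection 4 can be sketched together as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, names are lower-cased, the priority of a slot is a simple selection count, a slot whose name is replaced restarts at zero, and the BRAND_TO_GENERIC dictionary is a stand-in for the relational database (generic status is inferred from it instead of from a per-node flag list).

```python
LIST_SIZE = 8   # slots per node, as described in the text
DFS_SLOTS = 6   # slots pre-filled from the depth-first search

# Hypothetical stand-in for the relational database: brand -> generic class.
BRAND_TO_GENERIC = {"tylenol": "acetaminophen",
                    "panadol": "acetaminophen",
                    "mapap": "acetaminophen"}


class Node:
    def __init__(self):
        self.children = {}
        self.is_word = False
        self.names = [None] * LIST_SIZE   # priority list of medicine names
        self.counts = [0] * LIST_SIZE     # priority of each slot


def fill_lists(node, text=""):
    """Pre-fill every node's list from a DFS; returns all names below node."""
    found = [text] if node.is_word else []
    for ch in sorted(node.children):
        found += fill_lists(node.children[ch], text + ch)
    node.names[:DFS_SLOTS] = (found + [None] * DFS_SLOTS)[:DFS_SLOTS]
    return found


def build(words):
    root = Node()
    for word in words:
        node = root
        for ch in word.lower():
            node = node.children.setdefault(ch, Node())
        node.is_word = True
    fill_lists(root)
    return root


def walk(root, typed):
    """Nodes visited while typing (root excluded); stops if the path breaks."""
    path, node = [], root
    for ch in typed.lower():
        node = node.children.get(ch)
        if node is None:
            break
        path.append(node)
    return path


def record_selection(root, typed, selected):
    """Update the list of every traversed node after the user picks a name."""
    selected = selected.lower()
    for node in walk(root, typed):
        if selected in node.names:
            idx = node.names.index(selected)
        else:
            # First empty slot if there is one, otherwise the 8th slot.
            idx = node.names.index(None) if None in node.names else LIST_SIZE - 1
            node.names[idx] = selected
            node.counts[idx] = 0          # assumption: replaced slot restarts
        node.counts[idx] += 1


def suggest(root, typed):
    """O(m) lookup: read the stored list, highest priority first."""
    path = walk(root, typed)
    if not typed or len(path) < len(typed):
        return []
    node = path[-1]
    slots = [i for i in range(LIST_SIZE) if node.names[i] is not None]
    slots.sort(key=lambda i: -node.counts[i])   # stable: ties keep list order
    return [node.names[i] for i in slots]


def expand_generics(names):
    """After each generic name, add the brands that belong to its class."""
    out = []
    for name in names:
        brands = [b for b, g in BRAND_TO_GENERIC.items() if g == name]
        for item in [name] + brands:
            if item not in out:
                out.append(item)
    return out
```

With the names used in the examples above, calling record_selection(root, "sy", "Sinarest") stores Sinarest in the fourth slot of the y node, so a later suggest(root, "sy") includes it. Note that suggest never runs a DFS at query time; the traversal cost is paid once in fill_lists.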
IV. EXPECTED RESULTS
As stated above, our algorithm has a time complexity of O(m),
where m is the length of the string that has been typed by the
user. This is very fast and is independent of the size of the
database. A standard brute-force algorithm would have to scan
the entire database to find the strings that match the given
prefix, so its complexity is O(n), where n is the size of the
database. Normally m << n, so the proposed algorithm
outperforms the brute-force algorithm.
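The difference between the two costs can be illustrated with a small Python sketch that counts the work done by each method. It is an illustration only, not a benchmark: the function names are hypothetical, and the trie here is a plain nested dictionary chosen for brevity.

```python
def brute_force_prefix(names, prefix):
    """Linear scan: every entry is inspected, so the work grows with len(names)."""
    prefix = prefix.lower()
    matches, inspected = [], 0
    for name in names:
        inspected += 1
        if name.lower().startswith(prefix):
            matches.append(name)
    return matches, inspected


def build_dict_trie(names):
    """Nested dictionaries: each key is one character of a stored name."""
    root = {}
    for name in names:
        node = root
        for ch in name.lower():
            node = node.setdefault(ch, {})
    return root


def trie_walk(root, prefix):
    """Follow the prefix; the step count is at most len(prefix)."""
    node, steps = root, 0
    for ch in prefix.lower():
        if ch not in node:
            return None, steps
        node = node[ch]
        steps += 1
    return node, steps
```

Adding more names to the database increases the number of entries inspected by brute_force_prefix, while the number of steps taken by trie_walk for the same prefix stays the same.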
Any other algorithm would also have to scan at least the length
of the typed string to make its predictions. Many black-box
tools for autocomplete are available, such as Lucene, Solr,
Sphinx and Redis, but by adopting these we lose flexibility.
Thus the proposed approach is close to the best possible
implementation for this application. Although it has high
space complexity, overall it gives good results, as only
medicine names are stored in the trie. In addition, the
proposed method updates the data structure based on what is
searched more frequently and can also overcome spelling
errors. In the long run, once implemented, the system will
stabilize and become more efficient. Thus the intuitiveness
and speed of the system will be among the major factors in its
replacing the manual process of writing prescriptions.
V. CONCLUSION
Thus our proposed system will give appropriate suggestions to
doctors using the trie data structure along with priority lists
attached to each node. We have seen how the data structure can
improve the speed of search and also make the search more
flexible for a medical application. The data structure can
similarly be modified, or used as is, in other applications.
This system, with our user-friendly GUI, will be of great use
to doctors and will help them in many ways by providing
functionalities like brand name suggestions using generic
names, correction of spelling errors, and an option for quick
printing of prescriptions.
VI. REFERENCES
[1] Willard, Dan E. "New trie data structures which support
very fast search operations." Journal of Computer and System
Sciences 28.3 (1984): 379-394.
[2] Matani, Dhruv. "An O(k logn) algorithm for prefix based
ranked autocomplete."
[3] Chaudhuri, Surajit, and Raghav Kaushik (Microsoft
Research). "Extending autocompletion to tolerate errors."
Proceedings of the 2009 ACM SIGMOD International Conference on
Management of Data, 29 Jun. 2009: 707-718.