Medicine Recommendation System

Anuraag Advani, Ankita Gandhi, Harita Jagad, Prof. Abhijit Patil
(9769084977) (9769794127) (9820325888) (8080320201)
Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Vile Parle (West), Mumbai

Abstract: Autocomplete is one of the most pervasive and most studied problems in computer science. The proposed system will be implemented for paediatricians to improve both the speed and the accuracy of medicine name predictions, so that they can prescribe a medicine even if they remember only part of its name or only its generic name. The predicted results adapt to the history of previously selected results. We plan to use a modification of the trie data structure to accomplish this. The search time depends on the length of the string being searched and is independent of the size of the database. Thus, we aim to build an efficient algorithm that provides accurate results and speeds up the process of medicine prescription.

Keywords: recommendation system, trie, data structures, priority lists, autocomplete

I. INTRODUCTION

Drug names are commonly hard to remember, and many of them involve jargon or are homophones with similar spellings. It is also difficult for a doctor to remember the names of a large number of medicines. A standard database search is usually slow, especially if the database contains a large number of entries. Such a search is also static: it gives the same result every time for a given query. Our proposed system will help the doctor search for medication quickly using either brand names or generic names. In addition, given the generic name of a medicine, a record of all brand names belonging to that generic class will be generated. To implement this system we propose the use of tries, a data structure which we modify to suit our application.

II. PREVIOUS WORK

The paper [1] discusses tries as a way to optimize search and provide faster retrieval. It presents a method to reduce retrieval time using p-fast tries and then discusses how the space complexity of the structure can be reduced. The approach is based on finding the exact matching string or a string similar to it.

In [2] a segment tree is discussed in which each leaf node is given a priority, and suggestions are given based on the priority of the leaf nodes. Each node contains the maximum priority among its descendants. However, this approach is applicable only to a selection of phrases.

The paper [3] presents several methods to implement autocomplete efficiently. It also describes a trie-based approach in which the edit distance is maintained between nodes in the trie, and on this basis an error-tolerant implementation is suggested. To complete a string, the algorithm traverses the trie for each character of that string; the edit distance to the parents and siblings is then calculated, and all strings whose distance is below a threshold are given as output. The retrieval time of this algorithm is higher than that of the proposed algorithm. Moreover, it gives results based only on the distance between two words, whereas our proposed algorithm modifies its output depending on what the user selects after typing a given substring.

The two main requirements for such a search strategy are:

1. Speed: The suggestions should arrive faster than the user types, so immediate suggestions are required, even at the cost of space. If the user can type faster than we provide suggestions, the suggestions are futile.
2. Accuracy: Accuracy is equally important. A fast result is useless if it is inaccurate, because the user will not select it and will end up typing the entire word, which is what we wanted to avoid in the first place.
To provide accurate results we must correctly predict what the user wishes to prescribe from the partial name typed and the condition diagnosed.

III. ALGORITHM AND DATA STRUCTURE

For the data structure we use a trie which we have modified to suit our application. To provide a fast lookup, our algorithm takes O(m) time to search for a string of length m and provide the needed suggestions. The term trie is derived from the word "retrieval", as in information retrieval. A trie is a tree whose root is empty; as we go from the root to a leaf, a word from the database is formed. The interior nodes hold prefixes of the word, such that the parent node precedes the child node in the prefix.

1) Standard Implementation of Tries

A trie is generally used to check whether a word exists in the database. This is done by navigating from the root towards the leaves, using each character of the word in sequence. Finally we check whether the node reached is a leaf to determine if the word is complete.

Figure 1    Figure 2

For example, to search for Crocin in the above trie, we start from the root, which is initially empty. The first letter is C, so we navigate to node C, which is a child of the root. For the next letter, r, we navigate to the appropriate child of node C. This continues until node n is reached. Finally we check whether n is a leaf node. If it is, we can say that the word exists in the database. The algorithm is as follows:

1. Accept string str that needs to be searched in the database
2. CurrNode = root
3. For each character char in str do step 4
4. CurrNode = CurrNode->child(char)
5. If CurrNode is a leaf, terminate with success
6. Else output "String Not Found"

2) Tries for Autocomplete

The above method only checks whether a given string is present in a database; it is a good search method for checking, for example, whether a username or password is present. For autosuggest we need to modify the standard way in which a trie is used. We can still travel from the root along the typed characters, as in the standard implementation. After reaching the last typed character, we append the characters obtained from a depth-first search (DFS) of the rest of the tree.

This approach is logical but static: as long as the tree is not changed by insertions or deletions, which will be rare, a given string always gives the same results, and the DFS always gives preference to the leftmost branch. Also, the search time is proportional to O(m + b(d - m)), where m is the length of the string that is typed, d is the depth at which the word is present and b is the branching factor.

3) Our Approach

In our approach we associate each node with a priority list. The first 6 elements of the list are initialized from the DFS of the tree. The 7th element is used as a buffer and is initially empty. The 8th element is the element last searched for when the given string was entered, irrespective of its priority. Each element of the list is given a priority; the initial elements have high priority while the later ones have low priority. Each time a search is executed, two things are taken into account: first, all the characters that were typed while searching for a medicine; second, the medicine that was actually searched for, searchedMed.
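For reference, the standard lookup of subsection 1 and the DFS-based completion of subsection 2 can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the names TrieNode, Trie, contains and complete are hypothetical, names are lower-cased for matching, children are visited in alphabetical order to stand in for "leftmost branch first", and an end-of-word flag is assumed in place of the leaf test so that a name that is a prefix of another name can still be found.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # maps a character to the child TrieNode
        self.is_word = False   # assumption: end-of-word flag instead of a leaf test


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, text):
        """Follow text from the root; return the node reached, or None."""
        node = self.root
        for ch in text.lower():
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        """Standard lookup: O(m) for a word of length m."""
        node = self._walk(word)
        return node is not None and node.is_word

    def complete(self, prefix, limit=6):
        """Return up to `limit` stored names that start with prefix, in DFS order."""
        start = self._walk(prefix)
        results = []
        if start is None:
            return results

        def dfs(node, path):
            if len(results) >= limit:
                return
            if node.is_word:
                results.append(path)
            for ch in sorted(node.children):   # leftmost (alphabetical) branch first
                dfs(node.children[ch], path + ch)

        dfs(start, prefix.lower())
        return results
```

With the names Crocin, Cromolyn, Synthroid, Symax, Syllact and Sinarest inserted, complete("cro") returns crocin followed by cromolyn, and it returns the same list on every call. That is the static behaviour the priority lists are meant to overcome.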
After the search is completed, the priorities in each node that was traversed are updated as follows:

1) If searchedMed is present in CurrNode->list, set index to its position and go to step 4
2) index = findEmpty()
3) CurrNode->list[index] = searchedMed
4) CurrNode->list->priority(index) = CurrNode->list->priority(index) + 1
5) CurrNode = nextNode()

Here nextNode() steps through the nodes that were traversed to reach the searched medicine, and findEmpty() returns the index of the first empty element in the list, or 8 if no element is empty. Thus, if any element is empty, searchedMed is placed at that position in the list; otherwise it is placed at the eighth position.

Figure 3

For example, suppose cro is typed. The algorithm traverses to the o node as we have already seen. As the list is initialized using the DFS of the trie, the first element is Crocin and the second is Cromolyn. This output is the same as that obtained in the previous section, but the time complexity is only O(m), so even in this case the speed is increased.

Now suppose the user was searching for Sinarest and typed Sy instead of Si, and suppose that this error is quite common. When the user types Sy, the system initially outputs the list containing the drugs Synthroid, Symax and Syllact. This is also the output that was observed when the trie was modified for autocomplete. The user might rectify the error and select the correct drug. When the user selects Sinarest, the trie is updated. As the node y, which is reached after traversing S-y, contains 3 drug names in its priority list, Sinarest is added at the fourth position in the list. Thus, when a user next types Sy, Sinarest is present in the suggestions and can easily be chosen.

Figure 4

Thus, using this algorithm, the trie modifies its output depending on the elements that are searched for by the user. If the list is completely full, the 8th element in the list is replaced by the searched medicine. Moreover, the output of this method is ordered by how frequently elements are searched for, making it easier for doctors to choose the required drug.

4) Mapping the Generic Names to Brands

In each node of the trie we also store a list, generic, which indicates whether each item in the list of medicine names is a generic name or a brand name. If the value at a position is 1, the item is a generic name; otherwise it is a brand name. For each element in the list for a node, if the name is found to be a generic name, the database is scanned. This is a relational database which is maintained to store information about each drug, such as its dosage, its side effects and the generic class of the brand. The generic class of each brand is looked up and compared with the name of this element. If a match is found, the brand name is also added to the list of suggestions.

This process increases the utility of our system: for a suggestion that is a generic name, a list of brand names belonging to that generic class is provided. For example, if the doctor types Acetaminophen, then Tylenol, Paracetamol, Panadol and Mapap are also given as output. Thus the doctor can easily select the required brand name by typing just the generic name.

IV. EXPECTED RESULTS

As stated above, our algorithm has a time complexity of O(m), where m is the length of the string that has been typed by the user. This is very fast and is independent of the size of the database. A standard brute-force algorithm would have to scan the entire database to find the strings that match the given prefix, so its complexity is O(n), where n is the size of the database. Normally m << n, so the proposed algorithm outperforms the brute-force algorithm. Other algorithms would also have to scan at least the length of the typed string to make their predictions.
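To make the O(m) lookup and the adaptive priority lists of Section III concrete, the following Python sketch shows one possible reading of the update rule. It is an illustration under stated assumptions, not the authors' code: the names Node, PriorityTrie, suggest and record_selection are hypothetical, each slot's priority is modelled as a simple selection count, suggestions are returned by descending count, and children are visited alphabetically when the first 6 slots are filled from the DFS.

```python
LIST_SIZE = 8       # slots per node, as described in the text
INITIAL_FILL = 6    # slots pre-filled from the DFS order


class Node:
    def __init__(self):
        self.children = {}
        self.is_word = False
        self.names = [None] * LIST_SIZE   # suggested names held by this node
        self.counts = [0] * LIST_SIZE     # assumed priority: selections per slot


class PriorityTrie:
    def __init__(self, words):
        self.root = Node()
        for word in words:
            node = self.root
            for ch in word.lower():
                node = node.children.setdefault(ch, Node())
            node.is_word = True
        self._fill(self.root, "")

    def _fill(self, node, path):
        """Store the first INITIAL_FILL names below each node, in DFS order."""
        found = [path] if node.is_word else []
        for ch in sorted(node.children):
            found += self._fill(node.children[ch], path + ch)
        found = found[:INITIAL_FILL]
        for i, name in enumerate(found):
            node.names[i] = name
        return found

    def _path(self, typed):
        """Nodes reached after each typed character (stops at a dead end)."""
        node, nodes = self.root, []
        for ch in typed.lower():
            node = node.children.get(ch)
            if node is None:
                break
            nodes.append(node)
        return nodes

    def suggest(self, typed):
        """Walk m characters, then read the stored list; no subtree search."""
        nodes = self._path(typed)
        if not typed or len(nodes) < len(typed):
            return []
        node = nodes[-1]
        slots = [i for i, name in enumerate(node.names) if name is not None]
        slots.sort(key=lambda i: (-node.counts[i], i))   # most selected first
        return [node.names[i] for i in slots]

    def record_selection(self, typed, selected):
        """Update every node on the typed path after the user picks a name."""
        selected = selected.lower()
        for node in self._path(typed):
            if selected in node.names:
                idx = node.names.index(selected)
            else:
                # first empty slot, otherwise overwrite the 8th slot
                idx = node.names.index(None) if None in node.names else LIST_SIZE - 1
                node.names[idx] = selected
                node.counts[idx] = 0
            node.counts[idx] += 1
```

Under these assumptions, building the trie from Crocin, Cromolyn, Synthroid, Symax, Syllact and Sinarest and then calling record_selection("sy", "Sinarest") makes sinarest appear in suggest("sy"), mirroring the Sy example above. Because suggest only walks the typed characters and reads a fixed-size list, its cost does not grow with the number of stored names.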
Many black-box methods for autocomplete are also available, such as Lucene, Solr, Sphinx and Redis, but by adopting these we lose flexibility. We therefore consider the proposed approach close to the best possible implementation for this application. Although it has high space complexity, overall it gives good results, as only medicine names are stored in the trie. In addition, the proposed method updates the data structure based on what is searched for more frequently and can also overcome spelling errors. In the long run, after being deployed, the system will stabilize and become more efficient. Thus the intuitiveness and speed of the system will be among the major factors for it to replace the manual process of writing prescriptions.

V. CONCLUSION

Our proposed system will give appropriate suggestions to doctors using a trie data structure with a priority list attached to each node. We have seen how this data structure can improve the speed of search and also make the search more flexible for a medical application. The data structure can similarly be modified, or used as is, in other applications as well. This system, with our user-friendly GUI, will be of great use to doctors by providing functionalities like brand name suggestions from generic names, correction of spelling errors, and an option for quick printing of prescriptions.

VI. REFERENCES

[1] Willard, Dan E. "New trie data structures which support very fast search operations." Journal of Computer and System Sciences 28.3 (1984): 379-394.
[2] Dhruv Matani. "An O(k log n) algorithm for prefix based ranked autocomplete."
[3] Surajit Chaudhuri and Raghav Kaushik (Microsoft Research). "Extending autocompletion to tolerate errors." Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 29 Jun. 2009: 707-718.