Data Mining and Text Analytics
Quranic Arabic Corpus
By Saima Rahna & Anees Mohammad

Summary
● The Quranic Arabic Corpus enables further analysis of the Quran
● Uses linguistic resources for each word and verse in the Quran, e.g. morphology and syntax
● Automated algorithms were applied to the Quran

Introduction
● Islam was born in Arabia (1400 years ago)
● The key sacred texts are in Arabic
● Only a minority of Muslims can speak and understand Arabic
● A larger percentage of Muslims know English as a second or even first language
● Web resources and book resources use English in parallel with Arabic

Data Mining
● Uses tools and techniques to extract data
● Different aspects of a single topic in the Quran can reappear in many chapters
● Therefore frequent patterns can be used to construct a subject index in which all verses on a single topic can be found easily

Text Analytics
● Also referred to as information extraction
● The Quranic corpus is an advantage to those who don't understand Arabic
● Can give English readers a better insight into the source
● The translation is at a detailed text-analytics level

Resources & Techniques

Statistical techniques
● Implementing statistical techniques such as keyword extraction
● Can explore semiotic relationships between sound and meaning in the Quran
● Recognise recurring patterns with a high level of accuracy

Linguistic resource
● Arabic grammar and syntax used for each word in the Quran
● A comment-based system is used online for visitors to discuss and correct the data

Algorithms
● The Quranic Arabic Corpus used Java to implement its algorithms
● Search feature (searching concepts and keywords in the Holy Quran)
● Finding multi-word repetitions
● Mining frequent patterns to a graph

Algorithm for indexing the Quran
When a word is encountered for the first time, it is added to the index; if it already exists there, a new location is added to its list.
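The indexing step just described, which the slide's pseudocode spells out next, could look roughly like this in Python. This is only a minimal sketch, not the corpus's actual Java implementation: the function name build_index, the (verse_id, text) input shape, whitespace tokenisation, and the toy verses are all assumptions made for illustration.

```python
def build_index(verses):
    """Map each word to the list of (verse_id, position) locations where it occurs.

    `verses` is an iterable of (verse_id, text) pairs (a hypothetical input shape).
    Words are split on whitespace, which is a simplification.
    """
    index = {}
    for verse_id, text in verses:
        for position, word in enumerate(text.split(), start=1):
            location = (verse_id, position)
            if word not in index:
                # First time this word is seen: add it with its location.
                index[word] = [location]
            else:
                # Word already indexed: append the new location to its list.
                index[word].append(location)
    return index


# Toy placeholder data, not real corpus text.
toy_verses = [("v1", "alpha beta alpha"), ("v2", "beta gamma")]
index = build_index(toy_verses)
print(index["beta"])
```

Looking up a word then returns every location where it occurs, which is what a search feature over the index would rely on.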
For each verse V
    parse word list -> list(W)
    For each word W
        If INDEX contains W is false
            add W and W.location to INDEX
        Else
            fetch W in INDEX
            add new location to W

Filtering algorithm
● The Quranic 'quote filtering' algorithm
● The Quran uses Arabic diacritics (symbols)
● The filtering algorithm has 3 filtering stages, applied once the input text has been prepared

Algorithm: Sub-path Mining
● Used to generate frequent patterns within the Quran corpus
● The process starts by scanning the transaction database and calculating the count for each vertex in the graph

Conclusion
● Algorithms used
● Resources and techniques used for implementation of the Quranic Arabic Corpus
● How data mining is applied
● How text analytics has also been applied

Thank you :-)
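
As a supplementary illustration of the sub-path mining slide, the first pass (scanning the transaction database and counting each vertex) might be sketched as follows. The slides do not say how transactions are represented, so this assumes each transaction is a list of vertex labels. The min_support threshold and the function names are likewise hypothetical, added only to show how the counts would typically be used afterwards.

```python
from collections import Counter


def count_vertices(transactions):
    """Count, for each vertex, how many transactions contain it."""
    counts = Counter()
    for path in transactions:
        # set() ensures a vertex is counted once per transaction.
        counts.update(set(path))
    return counts


def frequent_vertices(transactions, min_support):
    """Keep only vertices seen in at least `min_support` transactions (assumed threshold)."""
    counts = count_vertices(transactions)
    return {vertex: n for vertex, n in counts.items() if n >= min_support}


# Toy transaction database with placeholder vertex labels.
transactions = [["A", "B", "C"], ["A", "C"], ["B", "C", "D"]]
print(frequent_vertices(transactions, min_support=2))
```

Later passes of a frequent-pattern miner would extend the surviving vertices into longer sub-paths. That part is not described in the slides and is left out here.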