Goal recap, Implementation, Experimental Results, Conclusion, Questions & Answers

Our goal is to implement a framework that predicts network traffic by mining mainstream news articles.
Method
› Latent Dirichlet Allocation (LDA) identifies and classifies popular topics in articles
An ISP can query and pre-cache highly popular videos to reduce overall traffic and delay.

Implemented a Python program to parse the news articles and collect the title and content.
The original LDA implementation processed random Wikipedia articles; we modified it to take in and process news articles.
Wrote a script to extract and store YouTube statistical data such as view counts, number of subscribers, YouTube IDs, date of upload, user profile data, etc.
Wrote a program to sort topics by popularity; we pick the most popular topics and compare them with news websites.
› Popular news websites (such as CNN, BBC) generate popularity charts over time from click-view data
Implemented the ZOOM operation
› Wrote a program to distribute the articles by source/category
› Query words using frequent pattern mining and LDA results to check relevancy and accuracy of popular topics

[Figure: video relevance to the topic (y-axis, 0 to 1) vs. number of feeds (x-axis); series: LDA+FP, OSLDA]
[Figure: accuracy of selecting the video with the most traffic (y-axis, 0 to 1) vs. number of feeds (x-axis); series: LDA+FP, OSLDA]

Online LDA alone accurately chooses the most popular topic around 57% of the time using 1k articles. With 100k articles it is around 91% accurate.
The blue line shows the accuracy using both Online LDA and frequent pattern mining. With 1k articles the accuracy is around 92%. With 100k articles the accuracy is close to 100%.
With Online LDA alone, there is only around a 60% chance that the selected video is relevant to the actual topic when using 10k articles. With 100k articles the probability rises to about 87%.
With frequent pattern mining and Online LDA combined, there is around a 94% chance that the selected video is relevant when using 10k articles. With 100k articles the probability is close to 100%.

From these results we conclude that Online LDA combined with frequent pattern mining can predict popular topics from mainstream media and identify relevant videos on video portals with high accuracy.

Thank you
Q&A
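
A minimal sketch of the topic-ranking and query-word step from the implementation, not the project's actual code. It assumes each article already carries a dominant topic id (for example, taken from an LDA run), and it mines frequent word pairs with a simple Apriori-style pass. The function names `rank_topics`, `frequent_word_sets`, and `build_query` are hypothetical, and no stop-word filtering is applied.

```python
from collections import Counter
from itertools import combinations


def rank_topics(doc_topics):
    """Sort topic ids by how many articles they dominate, most popular first."""
    counts = Counter(doc_topics)
    return [topic for topic, _ in counts.most_common()]


def frequent_word_sets(docs, min_support, max_size=2):
    """Return word sets that appear in at least `min_support` documents."""
    # Each article becomes a "transaction": the set of distinct words it contains.
    transactions = [set(doc.lower().split()) for doc in docs]

    # Pass 1: count single words and keep the frequent ones.
    word_counts = Counter(w for t in transactions for w in t)
    kept_words = {w for w, c in word_counts.items() if c >= min_support}
    frequent = {frozenset([w]): word_counts[w] for w in kept_words}

    # Later passes: only combine words that were frequent on their own,
    # since a set cannot be frequent if one of its words is not.
    for size in range(2, max_size + 1):
        counts = Counter()
        for t in transactions:
            for combo in combinations(sorted(t & kept_words), size):
                counts[frozenset(combo)] += 1
        frequent.update({s: c for s, c in counts.items() if c >= min_support})
    return frequent


def build_query(docs, doc_topics, min_support=2):
    """Pick the most popular topic and its best-supported word pair as query words."""
    top_topic = rank_topics(doc_topics)[0]
    topic_docs = [d for d, t in zip(docs, doc_topics) if t == top_topic]
    patterns = frequent_word_sets(topic_docs, min_support)
    pairs = [(s, c) for s, c in patterns.items() if len(s) == 2]
    if not pairs:
        return top_topic, []
    # Highest support wins; ties are broken alphabetically for determinism.
    best = max(pairs, key=lambda item: (item[1], sorted(item[0])))
    return top_topic, sorted(best[0])


if __name__ == "__main__":
    # Toy articles with assumed topic labels (illustrative data only).
    docs = [
        "election vote results",
        "election vote debate",
        "football match",
        "election vote poll",
    ]
    topic, words = build_query(docs, [0, 0, 1, 0])
    print(topic, words)
```

The returned word pair could then be used as the search string sent to a video portal; that lookup is left out here because it depends on the portal's API.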