References for Html Purifier Module

Citation numbers in the article, with the corresponding entries in my EndNote library:
1, [Vieira, et al.,2006]
2, [Chakrabarti, et al.,2008]
3, [Fetterly, et al.,2004]
4, [Pardalos and Xue,1999]
5, [Chibane and Doan,2007]
6, [Manku, et al.,2007]
7, [Fetterly, et al.,2005]
8, [Lin and Ho,2002]
9, [Gupta, et al.,2003]
10, [Coughlan, et al.,2000]
11, [Yi, et al.,2003]
12, [Cai, et al.,2003]
13, [Ma, et al.,2003]
14, [Wang, et al.,2008]
15, [Kushmerick,1999]
16, [Chakrabarti, et al.,2007]
17, [Bing, et al.,2008]
18, [Kovacevic, et al.,2002]
19, [Carvalho, et al.,2006]
20, [Chen, et al.,2006]
21, [Bar-Yossef and Rajagopalan,2002]
22, [Gibson, et al.,2005]
23, [Gupta, et al.,2006]
24, [Liu and Meng,2006]
25, [Kao, et al.,2005]
26, [Yi and Liu,2003]
27, [Chakrabarti,2008]. Have you tried the software? Would it be better to replace the software link with the paper "Analyzing fine-grained hypertext features for enhanced crawling and topic distillation" (page 34 of the Data Engineering Bulletin)?
28, [Debnath, et al.,2005]
29, [Davison,2000]
30, [Song, et al.,2004]
31, [Restrepo and Bovik,1994]
32, [Best and Chakravarti,1990]
33, [Best and Tan,1993]
34, [Cai, et al.,2003]
35, same as reference 34
36,
37, [Bar-Yossef and Rajagopalan,2002]

[Bar-Yossef and Rajagopalan,2002] Z. Bar-Yossef and S. Rajagopalan, "Template detection via data mining and its applications," in Proceedings of the 11th international conference on World Wide Web. Honolulu, Hawaii, USA: ACM, 2002.
[Best and Chakravarti,1990] M. J. Best and N. Chakravarti, "Active set algorithms for isotonic regression: a unifying framework," Math. Program., vol. 47, pp. 425-439, 1990.
[Best and Tan,1993] M. J. Best and R. Y. Tan, "An O(n^3 log n) strong polynomial algorithm for an isotonic regression knapsack problem," Journal of Optimization Theory and Applications, vol. 79, pp. 463-478, 1993.
[Bing, et al.,2008] L. Bing, Y. Wang, Y. Zhang, and H. Wang, "Primary Content Extraction with Mountain Model," in Proceedings of the IEEE CIT 2008, Sydney, Australia, 2008.
[Cai, et al.,2003] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma, "VIPS: a vision based page segmentation algorithm," Microsoft Technical Report, 2003.
[Cai, et al.,2003] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma, "Extracting Content Structure for Web Pages Based on Visual Representation," presented at Web Technologies and Applications: 5th Asia-Pacific Web Conference, Xian, China, 2003.
[Carvalho, et al.,2006] A. L. d. C. Carvalho, P.-A. Chirita, E. S. d. Moura, P. Calado, and W. Nejdl, "Site level noise removal for search engines," in Proceedings of the 15th international conference on World Wide Web. Edinburgh, Scotland: ACM, 2006.
[Chakrabarti, et al.,2007] D. Chakrabarti, R. Kumar, and K. Punera, "Page-level template detection via isotonic smoothing," in Proceedings of the 16th international conference on World Wide Web. Banff, Alberta, Canada: ACM, 2007.
[Chakrabarti, et al.,2008] D. Chakrabarti, R. Kumar, and K. Punera, "A graph-theoretic approach to webpage segmentation," in Proceedings of the 17th international conference on World Wide Web. Beijing, China: ACM, 2008.
[Chakrabarti,2008] HyParSuite. http://www.cse.iitb.ac.in/~soumen/download/
[Chen, et al.,2006] L. Chen, S. Ye, and X. Li, "Template detection for large scale search engines," in Proceedings of the 2006 ACM symposium on Applied computing. Dijon, France: ACM, 2006.
[Chibane and Doan,2007] I. Chibane and B.-L. Doan, "A web page topic segmentation algorithm based on visual criteria and content layout," in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval. Amsterdam, The Netherlands: ACM, 2007.
[Coughlan, et al.,2000] J. Coughlan, A. Yuille, C. English, and D. Snow, "Efficient deformable template detection and localization without user initialization," Comput. Vis. Image Underst., vol. 78, pp. 303-319, 2000.
[Davison,2000] B. D. Davison, "Recognizing Nepotistic Links on the Web," presented at the AAAI-2000 Workshop on Artificial Intelligence for Web Search, Austin, TX, 2000.
[Debnath, et al.,2005] S. Debnath, P. Mitra, N. Pal, and C. L. Giles, "Automatic Identification of Informative Sections of Web Pages," IEEE Trans. on Knowl. and Data Eng., vol. 17, pp. 1233-1246, 2005.
[Fetterly, et al.,2004] D. Fetterly, M. Manasse, M. Najork, and J. L. Wiener, "A large-scale study of the evolution of web pages," Softw. Pract. Exper., vol. 34, pp. 213-237, 2004.
[Fetterly, et al.,2005] D. Fetterly, M. Manasse, and M. Najork, "Detecting phrase-level duplication on the world wide web," in Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. Salvador, Brazil: ACM, 2005.
[Gibson, et al.,2005] D. Gibson, K. Punera, and A. Tomkins, "The volume and evolution of web page templates," in Special interest tracks and posters of the 14th international conference on World Wide Web. Chiba, Japan: ACM, 2005.
[Gupta, et al.,2003] S. Gupta, G. Kaiser, D. Neistadt, and P. Grimm, "DOM-based content extraction of HTML documents," in Proceedings of the 12th international conference on World Wide Web. Budapest, Hungary: ACM, 2003.
[Gupta, et al.,2006] S. Gupta, H. Becker, G. Kaiser, and S. Stolfo, "Verifying genre-based clustering approach to content extraction," in Proceedings of the 15th international conference on World Wide Web. Edinburgh, Scotland: ACM, 2006.
[Kao, et al.,2005] H.-Y. Kao, J.-M. Ho, and M.-S. Chen, "WISDOM: Web Intrapage Informative Structure Mining Based on Document Object Model," IEEE Trans. on Knowl. and Data Eng., vol. 17, pp. 614-627, 2005.
[Kovacevic, et al.,2002] M. Kovacevic, M. Diligenti, M. Gori, and V. M. Milutinovic, "Recognition of Common Areas in a Web Page Using a Visualization Approach," in Proceedings of the 10th International Conference on Artificial Intelligence: Methodology, Systems, and Applications. Springer-Verlag, 2002.
[Kushmerick,1999] N. Kushmerick, "Learning to remove Internet advertisements," in Proceedings of the third annual conference on Autonomous Agents. Seattle, Washington, United States: ACM, 1999.
[Lin and Ho,2002] S.-H. Lin and J.-M. Ho, "Discovering informative content blocks from Web documents," in Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. Edmonton, Alberta, Canada: ACM, 2002.
[Liu and Meng,2006] W. Liu and X. Meng, "Vision-based Web Data Records Extraction," presented at the Ninth International Workshop on the Web and Databases (WebDB 2006), Chicago, 2006.
[Ma, et al.,2003] L. Ma, N. Goharian, A. Chowdhury, and M. Chung, "Extracting unstructured data from template generated web documents," in Proceedings of the twelfth international conference on Information and knowledge management. New Orleans, LA, USA: ACM, 2003.
[Manku, et al.,2007] G. S. Manku, A. Jain, and A. D. Sarma, "Detecting near-duplicates for web crawling," in Proceedings of the 16th international conference on World Wide Web. Banff, Alberta, Canada: ACM, 2007.
[Pardalos and Xue,1999] P. M. Pardalos and G. Xue, "Algorithms for a Class of Isotonic Regression Problems," Algorithmica, vol. 23, pp. 211-222, 1999.
[Restrepo and Bovik,1994] A. Restrepo and A. C. Bovik, "Locally monotonic regression," IEEE Transactions on Signal Processing, vol. 41, pp. 2796-2780, 1994.
[Song, et al.,2004] R. Song, H. Liu, J.-R. Wen, and W.-Y. Ma, "Learning block importance models for web pages," in Proceedings of the 13th international conference on World Wide Web. New York, NY, USA: ACM, 2004.
[Vieira, et al.,2006] K. Vieira, A. S. d. Silva, N. Pinto, E. S. d. Moura, J. M. B. Cavalcanti, and J. Freire, "A fast and robust method for web page template detection and removal," in Proceedings of the 15th ACM international conference on Information and knowledge management. Arlington, Virginia, USA: ACM, 2006.
[Wang, et al.,2008] Y. Wang, B. Fang, X. Cheng, L. Guo, and H. Xu, "Incremental web page template detection," in Proceedings of the 17th international conference on World Wide Web (Poster). Beijing, China: ACM, 2008.
[Yi, et al.,2003] L. Yi, B. Liu, and X. Li, "Eliminating noisy information in Web pages for data mining," in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. Washington, D.C.: ACM, 2003.
[Yi and Liu,2003] L. Yi and B. Liu, "Web Page Cleaning for Web Mining through Feature Weighting," in Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), Acapulco, Mexico, August 2003.