Download Refs4 - Search Engine and Web Mining Group

References for Html Purifier Module
Citation number in the article, corresponding entry in my EndNote library
1, [Vieira, et al.,2006]
2, [Chakrabarti, et al.,2008]
3, [Fetterly, et al.,2004]
4, [Pardalos and Xue,1999]
5, [Chibane and Doan,2007]
6, [Manku, et al.,2007]
7, [Fetterly, et al.,2005]
8, [Lin and Ho,2002]
9, [Gupta, et al.,2003]
10, [Coughlan, et al.,2000]
11, [Yi, et al.,2003]
12, [Cai, et al.,2003]
13, [Ma, et al.,2003]
14, [Wang, et al.,2008]
15, [Kushmerick,1999]
16, [Chakrabarti, et al.,2007]
17, [Bing, et al.,2008]
18, [Kovacevic, et al.,2002]
19, [Carvalho, et al.,2006]
20, [Chen, et al.,2006]
21, [Bar-Yossef and Rajagopalan,2002]
22, [Gibson, et al.,2005]
23, [Gupta, et al.,2006]
24, [Liu and Meng,2006]
25, [Kao, et al.,2005]
26, [Yi and Liu,August, 2003.]
27, [Chakrabarti,2008]. Have you tried the software?
Would it be better to replace the software link with the paper "Analyzing fine-grained hypertext
features for enhanced crawling and topic distillation", on page 34 of the Data Engineering Bulletin?
28, [Debnath, et al.,2005]
29, [Davison,2000]
30, [Song, et al.,2004]
31, [Restrepo and Bovik,1994]
32, [Best and Chakravarti,1990]
33, [Best and Tan,1993]
34, [Cai, et al.,2003]
35, same as reference 34
36,
37,
[Bar-Yossef and Rajagopalan,2002] Z. Bar-Yossef and S. Rajagopalan, "Template detection via data
mining and its applications," in Proceedings of the 11th international conference on World
Wide Web. Honolulu, Hawaii, USA: ACM, 2002.
[Best and Chakravarti,1990] M. J. Best and N. Chakravarti, "Active set algorithms for isotonic
regression: a unifying framework," Math. Program., vol. 47, pp. 425-439, 1990.
[Best and Tan,1993] M. J. Best and R. Y. Tan, "An O(n^3 log n) strongly polynomial algorithm for an
isotonic regression knapsack problem," Journal of Optimization Theory and Applications, vol. 79, pp.
463-478, 1993.
[Bing, et al.,2008] L. Bing, Y. Wang, Y. Zhang, and H. Wang, "Primary Content Extraction with
Mountain Model," in Proceedings of the IEEE CIT2008, Sydney, Australia, 2008.
[Cai, et al.,2003] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma, "VIPS: a vision based page segmentation
algorithm," Microsoft Technical Report, 2003.
[Cai, et al.,2003] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma, "Extracting Content Structure for Web
Pages Based on Visual Representation," presented at Web Technologies and Applications: 5th
Asia-Pacific Web Conference, Xian, China, 2003.
[Carvalho, et al.,2006] A. L. d. C. Carvalho, P.-A. Chirita, E. S. d. Moura, P. Calado, and W. Nejdl,
"Site level noise removal for search engines," in Proceedings of the 15th international
conference on World Wide Web. Edinburgh, Scotland: ACM, 2006.
[Chakrabarti, et al.,2007] D. Chakrabarti, R. Kumar, and K. Punera, "Page-level template detection via
isotonic smoothing," in Proceedings of the 16th international conference on World Wide Web.
Banff, Alberta, Canada: ACM, 2007.
[Chakrabarti, et al.,2008] D. Chakrabarti, R. Kumar, and K. Punera, "A graph-theoretic approach to
webpage segmentation," in Proceedings of the 17th international conference on World Wide
Web. Beijing, China: ACM, 2008.
[Chakrabarti,2008] HyParSuite. http://www.cse.iitb.ac.in/~soumen/download/.
[Chen, et al.,2006] L. Chen, S. Ye, and X. Li, "Template detection for large scale search engines," in
Proceedings of the 2006 ACM symposium on Applied computing. Dijon, France: ACM, 2006.
[Chibane and Doan,2007] I. Chibane and B.-L. Doan, "A web page topic segmentation algorithm based
on visual criteria and content layout," in Proceedings of the 30th annual international ACM
SIGIR conference on Research and development in information retrieval. Amsterdam, The
Netherlands: ACM, 2007.
[Coughlan, et al.,2000] J. Coughlan, A. Yuille, C. English, and D. Snow, "Efficient deformable
template detection and localization without user initialization," Comput. Vis. Image Underst.,
vol. 78, pp. 303-319, 2000.
[Davison,2000] B. D. Davison, "Recognizing Nepotistic Links on the Web," presented at the
AAAI-2000 Workshop on Artificial Intelligence for Web Search, Austin, TX, 2000.
[Debnath, et al.,2005] S. Debnath, P. Mitra, N. Pal, and C. L. Giles, "Automatic Identification of
Informative Sections of Web Pages," IEEE Trans. on Knowl. and Data Eng., vol. 17, pp.
1233-1246, 2005.
[Fetterly, et al.,2004] D. Fetterly, M. Manasse, M. Najork, and J. L. Wiener, "A large-scale study of
the evolution of web pages," Softw. Pract. Exper., vol. 34, pp. 213-237, 2004.
[Fetterly, et al.,2005] D. Fetterly, M. Manasse, and M. Najork, "Detecting phrase-level duplication
on the world wide web," in Proceedings of the 28th annual international ACM SIGIR
conference on Research and development in information retrieval. Salvador, Brazil: ACM,
2005.
[Gibson, et al.,2005] D. Gibson, K. Punera, and A. Tomkins, "The volume and evolution of web page
templates," in Special interest tracks and posters of the 14th international conference on
World Wide Web. Chiba, Japan: ACM, 2005.
[Gupta, et al.,2003] S. Gupta, G. Kaiser, D. Neistadt, and P. Grimm, "DOM-based content extraction
of HTML documents," in Proceedings of the 12th international conference on World Wide
Web. Budapest, Hungary: ACM, 2003.
[Gupta, et al.,2006] S. Gupta, H. Becker, G. Kaiser, and S. Stolfo, "Verifying genre-based clustering
approach to content extraction," in Proceedings of the 15th international conference on World
Wide Web. Edinburgh, Scotland: ACM, 2006.
[Kao, et al.,2005] H.-Y. Kao, J.-M. Ho, and M.-S. Chen, "WISDOM: Web Intrapage Informative
Structure Mining Based on Document Object Model," IEEE Trans. on Knowl. and Data Eng.,
vol. 17, pp. 614-627, 2005.
[Kovacevic, et al.,2002] M. Kovacevic, M. Diligenti, M. Gori, and V. M. Milutinovic, "Recognition
of Common Areas in a Web Page Using a Visualization Approach," in Proceedings of the 10th
International Conference on Artificial Intelligence: Methodology, Systems, and Applications:
Springer-Verlag, 2002.
[Kushmerick,1999] N. Kushmerick, "Learning to remove Internet advertisements," in Proceedings of
the third annual conference on Autonomous Agents. Seattle, Washington, United States: ACM,
1999.
[Lin and Ho,2002] S.-H. Lin and J.-M. Ho, "Discovering informative content blocks from Web
documents," in Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining. Edmonton, Alberta, Canada: ACM, 2002.
[Liu and Meng,2006] W. Liu and X. Meng, "Vision-based Web Data Records Extraction,"
presented at Ninth International Workshop on the Web and Databases (WebDB 2006),
Chicago, 2006.
[Ma, et al.,2003] L. Ma, N. Goharian, A. Chowdhury, and M. Chung, "Extracting unstructured data
from template generated web documents," in Proceedings of the twelfth international
conference on Information and knowledge management. New Orleans, LA, USA: ACM, 2003.
[Manku, et al.,2007] G. S. Manku, A. Jain, and A. D. Sarma, "Detecting near-duplicates for web
crawling," in Proceedings of the 16th international conference on World Wide Web. Banff,
Alberta, Canada: ACM, 2007.
[Pardalos and Xue,1999] P. M. Pardalos and G. Xue, "Algorithms for a Class of Isotonic Regression
Problems," Algorithmica, vol. 23, pp. 211-222, 1999.
[Restrepo and Bovik,1994] A. Restrepo and A. C. Bovik, "Locally monotonic regression," IEEE
Transactions on Signal Processing, vol. 41, pp. 2796-2810, 1994.
[Song, et al.,2004] R. Song, H. Liu, J.-R. Wen, and W.-Y. Ma, "Learning block importance models for
web pages," in Proceedings of the 13th international conference on World Wide Web. New
York, NY, USA: ACM, 2004.
[Vieira, et al.,2006] K. Vieira, A. S. d. Silva, N. Pinto, E. S. d. Moura, J. M. B. Cavalcanti, and J.
Freire, "A fast and robust method for web page template detection and removal," in
Proceedings of the 15th ACM international conference on Information and knowledge
management. Arlington, Virginia, USA: ACM, 2006.
[Wang, et al.,2008] Y. Wang, B. Fang, X. Cheng, L. Guo, and H. Xu, "Incremental web page template
detection," in Proceedings of the 17th international conference on World Wide Web (Poster).
Beijing, China: ACM, 2008.
[Yi, et al.,2003] L. Yi, B. Liu, and X. Li, "Eliminating noisy information in Web pages for data
mining," in Proceedings of the ninth ACM SIGKDD international conference on Knowledge
discovery and data mining. Washington, D.C.: ACM, 2003.
[Yi and Liu,August, 2003.] L. Yi and B. Liu, "Web Page Cleaning for Web Mining through Feature
Weighting," in Proceedings of the Eighteenth International Joint Conference on
Artificial Intelligence (IJCAI-03), Acapulco, Mexico, August 2003.