NUS at DUC 2007:
Using Evolutionary Models of Text
Ziheng Lin, Tat-Seng Chua, Min-Yen Kan,
Wee Sun Lee, Long Qiu and Shiren Ye
Department of Computer Science
National University of Singapore, Singapore
Summarization
• Traditionally, weighted heuristics to select sentences
• With the advent of machine learning, heuristic weights
can be tuned
• In the last few years, graphical representations of text have
shed new light on the summarization problem
• TextRank and LexRank allow us to naturally incorporate
context as a continuum
• How can we enhance this representational model?
DUC 2007 Workshop
Prestige as sentence selection
• One motivation of using graphical methods was to model the problem
as finding prestige of nodes in a social network
• PageRank used random walk to smooth the effect of non-local context
• Led to TextRank and LexRank
• Contrast with previous
graphical approaches (Salton et al. 1994)
• Did we leave anything out of our
representation for summarization?
Yes, the notion of an evolving network
Social networks change!
Natural evolving networks (Dorogovtsev and Mendes, 2001)
– Citation networks: New papers can cite old ones, but the old
network is static
– The Web: new pages are added, with links from existing pages connecting
them to the web graph; old pages may also update their links
Evolutionary models for summarization
Writers and readers often follow conventional rhetorical
styles - articles are not written or read in an arbitrary way
Consider the evolution of texts using a very simplistic
model
– Writers write from the first sentence onwards in a text
– Readers read from the first sentence onwards of a text
• A simple model: sentences get added incrementally to the
graph
Timestamped Graph Construction
Approach
– These assumptions suggest iteratively adding sentences to the graph in
chronological order.
– At each iteration, consider which edges to add to the graph.
– For a single document, this is straightforward: add the 1st sentence,
followed by the 2nd, and so forth, until the last sentence is added
– For multi-document input: treat it as multiple instances of single
documents evolving in parallel; i.e., add the 1st sentences of all
documents, followed by all 2nd sentences, and so forth
• This doesn't yet model chronological ordering between articles; fixed later
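The procedure above can be sketched in a few lines. This is an illustrative toy implementation (bag-of-words cosine similarity, one edge per new vertex by default; all names are my own), not the NUS system code.

```python
# Toy timestamped graph (TSG) construction: at timestep j, the j-th
# sentence of every document enters the graph, and each new node gets
# e edges to its most similar already-present node(s).
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_tsg(documents, e=1):
    """documents: list of docs, each a list of tokenized sentences.
    Returns (nodes, edges); a node is a (doc_index, sent_index) pair."""
    nodes, edges = [], []
    max_len = max(len(d) for d in documents)
    for j in range(max_len):                      # one timestep per row
        fresh = [(i, j) for i, d in enumerate(documents) if j < len(d)]
        for u in fresh:
            # connect u to the e most similar nodes already in the graph
            cands = sorted(nodes, key=lambda v: cosine(
                documents[u[0]][u[1]], documents[v[0]][v[1]]), reverse=True)
            for v in cands[:e]:
                edges.append((u, v))
        nodes.extend(fresh)
    return nodes, edges
```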
Timestamped Graph Construction
Model:
• Documents as columns
– di = document i
• Sentences as rows
– sj = the jth sentence of a document
DUC 2007 Workshop
7
NUS at DUC 2007: Using Evolutionary Models of Text
Timestamped Graph Construction
• A multi-document example
[Figure: grid graph with columns doc1–doc3 and rows sent1–sent3]
An example TSG: DUC 2007 D0703A-A
Timestamped Graph Construction
Formalization of TSGs:
– The example is just one instance of TSG
– Let’s generalize and formalize the TSG algorithm
– A timestamped graph algorithm tsg(M) is a 9-tuple
(d, e, u, f, σ, t, i, s, τ) that specifies a resulting algorithm that
takes as input the set of texts M and outputs a graph G
• Salient parameters for TSGs:
– e: # edges to add per vertex per timestep
– u: unweighted or weighted edges
– σ: vertex selection function σ(u, G)
– s: skew degree
• For a description of the other parameters, see our TextGraphs-2 paper
Timestamped Graph Construction
• Vertex selection function σ(u, G)
– One strategy: among those nodes not yet connected to u in
G, choose the one that has the highest similarity with u
– Similarity functions: Jaccard, cosine, concept links
(Ye et al., 2005)
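A minimal sketch of that selection strategy, using Jaccard similarity; the function names and data layout are illustrative assumptions, not the system's API.

```python
# σ(u, G): among nodes not yet connected to u, pick the one most
# similar to u (here with Jaccard similarity over token sets).
def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def sigma(u, graph_nodes, neighbors, sentences, sim=jaccard):
    """u: node id; neighbors: dict node -> set of connected nodes;
    sentences: dict node -> token list. Returns the best candidate,
    or None if every node is already connected to u."""
    cands = [v for v in graph_nodes
             if v != u and v not in neighbors.get(u, set())]
    if not cands:
        return None
    return max(cands, key=lambda v: sim(sentences[u], sentences[v]))
```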
Timestamped Graph Construction
• Skew degree s
– Models how nodes in multi-document graphs are added
– Skew degree = how many iterations to wait before adding the 1st
sentence of the next document
Motivation
– Up to now, the TSG models assume that the authors start writing
the documents at the same time
– In reality, some documents are authored later than others, giving
updates or reporting changes
– Infer information from timestamps of articles or from date
extraction on articles themselves.
Skew Degree Examples
time(d1) < time(d2) < time(d3) < time(d4)
[Figure: three grid graphs over documents d1–d4, skewed by 1, skewed by 2,
and freely skewed]
• Freely skewed = only add a new document when it would be linked by some
node using vertex function σ
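For a fixed skew degree s, the timestep at which each sentence enters the graph has a simple closed form; the helper below is my own rendering of that rule. The freely-skewed case has no closed form, since a document joins only once σ links something to it.

```python
# With skew degree s, document j (0-indexed, chronological order)
# starts at timestep j*s, so its k-th sentence enters at j*s + k.
def entry_timestep(doc_index, sent_index, skew):
    return doc_index * skew + sent_index
```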
Timestamped Graph Construction
• Representations
– We can model a number of different algorithms using this 9-tuple
formalism: (d, e, u, f, σ, t, i, s, τ)
– The given toy example:
(f, 1, 0, 1, max-cosine-based, sentence, 1, 0, null)
– LexRank graphs:
(u, N, 1, 1, cosine-based, sentence, Lmax, 0, null)
N = total number of sentences in the cluster; Lmax = the max document length
i.e., all sentences are added into the graph in one timestep, each connected
to all others, with cosine scores assigned as edge weights
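The two configurations above can be written down directly. The field names mirror the slide's symbols; the structure itself is my own rendering, not code from the system.

```python
# The 9-tuple parameterization as a named structure, with the toy
# example and the LexRank-style configuration as instances.
from collections import namedtuple

TSGParams = namedtuple("TSGParams", "d e u f sigma t i s tau")

# Toy example: forward direction, 1 unweighted edge per vertex per step.
toy = TSGParams(d="f", e=1, u=0, f=1, sigma="max-cosine-based",
                t="sentence", i=1, s=0, tau=None)

# LexRank as a degenerate TSG: all N sentences in one timestep
# (i = Lmax), fully connected, weighted (u = 1) edges.
lexrank = TSGParams(d="u", e="N", u=1, f=1, sigma="cosine-based",
                    t="sentence", i="Lmax", s=0, tau=None)
```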
Summarization using TSGs
System Overview
• Sentence splitting
– Detect and mark sentence boundaries
– Annotate each sentence with the doc ID and the sentence number
– E.g., XIE19980304.0061: 4 March 1998 from Xinhua News;
XIE19980304.0061-14: the 14th sentence of this document
• Graph construction
– Construct the TSG in this phase
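A small parser for sentence IDs of the form shown (e.g. XIE19980304.0061-14); the exact ID grammar is my assumption, inferred from that one example.

```python
# Parse "SOURCE YYYYMMDD . serial - sentence" style IDs, e.g.
# "XIE19980304.0061-14" = 14th sentence of Xinhua doc XIE19980304.0061.
import re

def parse_sentence_id(sid):
    m = re.match(r"([A-Z]+)(\d{4})(\d{2})(\d{2})\.(\d+)-(\d+)$", sid)
    if not m:
        raise ValueError(f"unrecognized sentence id: {sid}")
    source, year, month, day, serial, sent = m.groups()
    return {"source": source,
            "date": f"{year}-{month}-{day}",
            "doc_id": f"{source}{year}{month}{day}.{serial}",
            "sentence": int(sent)}
```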
System Overview
• Sentence Ranking
– Apply topic-sensitive random
walk on the graph to redistribute the
weights of the nodes
• Sentence extraction
– Extract the top-ranked sentences
– Two different modified MMR rerankers are used, depending on
whether it is the main or the update task
Differences for main and update task processing
Main task:
1. Construct a TSG for the input cluster
2. Run topic-sensitive PageRank on the TSG
3. Apply the first modified version of MMR to extract sentences
Update task:
• Cluster A:
– Construct a TSG for cluster A
– Run topic-sensitive PageRank on the TSG
– Apply the second modified version of MMR
to extract sentences
• Cluster B:
– Construct a TSG for clusters A and B
– Run topic-sensitive PageRank on the TSG;
only retain sentences from B
– Apply the second modified version of MMR
to extract sentences
• Cluster C:
– Construct a TSG for clusters A, B and C
– Run topic-sensitive PageRank on the TSG;
only retain sentences from C
– Apply the second modified version of MMR
to extract sentences
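The update-task recipe above, written as a loop: each cluster's TSG is built over all clusters seen so far, ranked, filtered to the current cluster, and reranked. The stage functions are passed in as placeholders for the steps described elsewhere in the talk.

```python
# Cumulative update-task pipeline over clusters A, B, C, ...
def update_task(clusters, topic, build_tsg, pagerank, mmr_update):
    """clusters: list of clusters, each a list of sentence ids in order.
    Returns one summary (list of sentence ids) per cluster."""
    summaries, seen = [], []
    for cluster in clusters:
        seen_sents = [s for c in seen for s in c] + list(cluster)
        graph = build_tsg(seen_sents)             # TSG over A..current
        scores = pagerank(graph, topic)           # topic-sensitive ranking
        current = {s: scores[s] for s in cluster} # retain current cluster only
        summaries.append(
            mmr_update(current, previous=[s for c in seen for s in c]))
        seen.append(cluster)
    return summaries
```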
Sentence Ranking
• Once a timestamped graph is built, we want to compute a prestige
score for each node
• PageRank: use an iterative method that allows the weights of the
nodes to redistribute until stability is reached
• Similarities as edges → weighted edges; query → topic-sensitive
[Equation: node weight = topic-sensitive (Q) portion + standard
random-walk term]
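A minimal power-iteration sketch matching that decomposition: a damped topic-relevance term plus a standard random walk over weighted edges. The damping split d = 0.15 and the exact normalization are my assumptions, not the system's published values.

```python
# Topic-sensitive PageRank over a weighted sentence graph.
def topic_pagerank(nodes, weights, topic_rel, d=0.15, iters=50):
    """weights[(u, v)]: weight of edge u -> v; topic_rel[u]: similarity
    of sentence u to the query/topic Q."""
    rel_total = sum(topic_rel[u] for u in nodes) or 1.0
    out_total = {u: sum(weights.get((u, v), 0.0) for v in nodes)
                 for u in nodes}
    p = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        p = {u: d * topic_rel[u] / rel_total                     # topic portion
                + (1 - d) * sum(weights.get((v, u), 0.0) / out_total[v] * p[v]
                                for v in nodes if out_total[v])  # random walk
             for u in nodes}
    return p
```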
Sentence Extraction – Main task
• Original MMR: integrates a penalty based on the maximal similarity
between the candidate and any one already-selected document
• Ye et al. (2005) introduced a modified MMR: integrates a penalty based
on the total similarity of the candidate sentence to all selected sentences
[Equation: reranked score combines the PageRank score of s with a penalty
over all previously-selected-sentence similarities]
• Score(s) = PageRank score of s; S = selected sentences
• This is used in the main task
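A greedy sketch of the modified MMR: candidates are scored by their PageRank value minus a penalty on their total similarity to everything already selected. The λ-weighted combination shown is an assumed form, not the exact published formula (the talk also tunes a parameter δ, omitted here).

```python
# Greedy modified-MMR extraction with a total-similarity penalty.
def mmr_extract(scores, sim, k, lam=0.8):
    """scores: {sentence: PageRank score}; sim(a, b): similarity;
    k: number of sentences to extract."""
    selected = []
    pool = set(scores)
    while pool and len(selected) < k:
        def gain(s):
            # total (not maximal) similarity to all selected sentences
            penalty = sum(sim(s, t) for t in selected)
            return lam * scores[s] - (1 - lam) * penalty
        best = max(pool, key=gain)
        selected.append(best)
        pool.remove(best)
    return selected
```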
Sentence Extraction – Update task
• Update task assumes readers have already read previous cluster(s)
– implies we should not select sentences that are redundant with
previous cluster(s)
• Propose a modified MMR for the update task:
– consider the total similarity of the candidate sentence with all
selected sentences and sentences in previously-read cluster(s)
[Equation: the penalty additionally counts similarity overlap with
previous cluster(s)]
• P contains some top-ranked sentences in previous cluster(s)
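The update-task variant in the same greedy style: the penalty additionally counts similarity to the top-ranked sentences P from previously-read clusters, discouraging redundancy with what the reader has already seen. The γ weighting is again an assumed form (the talk uses parameters λ, δ, γ).

```python
# Update-task MMR: penalize similarity to selected sentences AND to
# top-ranked sentences P from previously-read clusters.
def mmr_update_extract(scores, sim, k, prev, lam=0.8, gamma=0.5):
    selected, pool = [], set(scores)
    while pool and len(selected) < k:
        def gain(s):
            pen_sel = sum(sim(s, t) for t in selected)  # within-summary
            pen_prev = sum(sim(s, t) for t in prev)     # previous clusters P
            return lam * scores[s] - (1 - lam) * (pen_sel + gamma * pen_prev)
        best = max(pool, key=gain)
        selected.append(best)
        pool.remove(best)
    return selected
```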
Evaluation and Analysis
Macroscopic Evaluation
• Main task parameterization
– Graph construction: (u, 1, 1, 1, concept-link-based, sentence, 1, 0, null)
– Sentence extraction: λ = 0.8 and δ = 6, tuned on the DUC 2005 and
2006 datasets
– 12th for ROUGE-2 and 10th for ROUGE-SU4 among 32 systems
Macroscopic Evaluation
• Update task parameterization
– Graph construction: (u, 1, 1, 1, concept-link-based, sentence, 1, 0, null)
– Sentence extraction: λ = 0.8, δ = 3 and γ = 6, based on our experience
– 3rd in ROUGE-2, 4th in ROUGE-SU4 and 6th in average pyramid scores
among 24 systems
What do we think?
• Better performance in update task
• TSG is better tailored to deal with update summaries
• The second modified version of MMR works well at filtering out
redundant information already presented in previously-read cluster(s)
Conclusion
• Proposed a timestamped graph model for text understanding
and summarization
– Adds sentences one at a time
• Parameterized model with nine variables
– Several important variables to the iterative TSG formalism
explained
• MMR reranking modified to fit the update task
Future Work
• Freely skewed model
• Empirical and theoretical properties of TSGs
(e.g., in-degree distribution)
Backup Slides
References
• Günes Erkan and Dragomir R. Radev. 2004. LexRank: Graph-based centrality
as salience in text summarization. Journal of Artificial Intelligence
Research, (22).
• Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into texts.
In Proceedings of EMNLP 2004.
• S.N. Dorogovtsev and J.F.F. Mendes. 2001. Evolution of networks.
Submitted to Advances in Physics on 6th March 2001.
• Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale
hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7).
• Jon M. Kleinberg. 1999. Authoritative sources in a hyperlinked
environment. In Proceedings of the ACM-SIAM Symposium on Discrete
Algorithms, 1999.
• Shiren Ye, Long Qiu, Tat-Seng Chua, and Min-Yen Kan. 2005. NUS at DUC
2005: Understanding documents via concept links. In Proceedings of DUC 2005.