An evolutionary approach for improving
the quality of automatic summaries
Constantin Orasan
Research Group in Computational Linguistics
School of Humanities, Languages and Social Sciences
University of Wolverhampton
[email protected]
Proceedings of the ACL 2003 Workshop on Multilingual
Summarization and Question Answering
Introduction

There are two main approaches for producing automatic summaries:

Extract and rearrange
Understand and generate

Given that "understanding" a text is usually domain-specific, extraction methods are preferred when robustness is needed.
Here we present a novel approach that improves the quality of summaries by enhancing their local cohesion.
Continuity Principle


We use the continuity principle, defined in Centering Theory (Grosz et al., 1995), to improve summary quality.
This principle requires that two consecutive utterances have at least one entity in common.

In general, utterances are clauses or sentences; here we treat sentences as utterances.
We try to produce summaries which do not violate the continuity principle.

This yields sequences of sentences that refer to the same entity and are therefore more coherent.
Corpus Investigation

We consider that two utterances have an entity in common if the same noun-phrase head appears in both utterances (a small sketch follows).

We use the FDG tagger to determine the heads of noun phrases.
We investigated 146 human-produced abstracts from the Journal of Artificial Intelligence Research and found that almost 75% satisfy the principle.
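
A minimal Python sketch of this check, assuming the NP heads of each sentence have already been extracted (the paper uses the FDG tagger; any parser that yields NP heads would do, and the example head sets below are hypothetical):

def satisfies_continuity(heads_a: set, heads_b: set) -> bool:
    """Two utterances share an entity if the same NP head occurs in both."""
    return bool(heads_a & heads_b)

def continuity_rate(sentence_heads: list) -> float:
    """Fraction of consecutive sentence pairs satisfying the principle."""
    pairs = list(zip(sentence_heads, sentence_heads[1:]))
    if not pairs:
        return 1.0
    return sum(satisfies_continuity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical head sets for three consecutive sentences.
heads = [{"summary", "quality"}, {"summary", "principle"}, {"corpus"}]
print(continuity_rate(heads))  # 0.5: the second pair breaks continuity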
Use CP in Summarization and Text Generation



To produce a summary that violates the continuity principle as little as possible, we score each sentence using both content and context information.
Karamanis and Manurung (2002) used the CP in text generation; summarization is harder, however, because it must first identify the important information in the document.
Another difference is that we do not intend to change the order of the extracted sentences, because preliminary experiments with reordering did not lead to any promising results.
Content-based scoring

The existing heuristics are:

Keyword method: words are weighted with TF-IDF; the score of a sentence is the sum of the scores of its words (see the sketch below).
Indicator phrase method: meta-discourse markers such as "in this paper, we present" or "in conclusion".
Location method: sentences in the first and last 13 paragraphs have their scores boosted.
Title and headers method: sentences containing words from the title and headers have their scores boosted.
Special formatting rules: sentences that contain equations are excluded.

The score of a sentence is a weighted function of these parameters, with the weights established through experiments.
One of the most important heuristics proved to be the indicator phrase method.
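
As an illustration of the keyword method, a minimal Python sketch, assuming a tokenised document collection; the other heuristics and the experimentally tuned weighting are omitted:

import math
from collections import Counter

def tfidf_weights(documents):
    """Per-document TF-IDF weight for every word (documents are token lists)."""
    n = len(documents)
    df = Counter(word for doc in documents for word in set(doc))
    return [{w: tf * math.log(n / df[w]) for w, tf in Counter(doc).items()}
            for doc in documents]

def keyword_score(sentence, weights):
    """Content score of a sentence: the sum of its words' TF-IDF weights."""
    return sum(weights.get(word, 0.0) for word in sentence)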
Context-based scoring



Depending on the context in which a sentence appears in a summary, its score can be boosted or penalized.
If the continuity principle is satisfied with either the sentence that precedes it or the one that follows it, the score is boosted; otherwise it is penalized.
After experimentation, we decided to boost a sentence's score by the TF-IDF scores of the shared NP heads, and to penalize it by the highest TF-IDF score in the document, as sketched below.
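
A minimal sketch of this adjustment, reusing the NP-head sets and TF-IDF weights from the sketches above; the boost/penalty scheme follows the slide:

def context_score(heads, neighbour_heads, weights, max_tfidf):
    """Boost by the TF-IDF weights of NP heads shared with a neighbouring
    summary sentence; penalise by the document's highest TF-IDF weight
    when no head is shared (the continuity principle is violated)."""
    shared = set()
    for other in neighbour_heads:  # preceding and/or following sentence
        shared |= heads & other
    if shared:
        return sum(weights.get(h, 0.0) for h in shared)
    return -max_tfidf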
The Greedy Algorithm

Extract the highest-scored sentence from those not yet extracted.

Scores are computed in the way described above.
Since the original order of the sentences is maintained, the algorithm in Figure 1 is performed.
The Greedy Algorithm




At the score-computation stage, a sentence's score is computed as if it were included in the extract.
The sentence with the highest score is extracted; this is repeated until the required length is reached.
The first extracted sentence is always the one with the highest content-based score.
It is possible to extract S1 and S2, and in a later iteration to extract a sentence S3 between S1 and S2 that violates the continuity principle with S2 (see the sketch after the figure).
[Figure 1: worked example of the sentence positions selected at successive iterations of the greedy algorithm]
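
A minimal Python sketch of the greedy loop; score(i, selected) is a hypothetical function combining the content and context scores of sentence i as if it were added to the current extract:

def greedy_extract(n_sentences, target_len, score):
    """Repeatedly pick the best-scoring remaining sentence."""
    selected = []
    while len(selected) < target_len:
        remaining = [i for i in range(n_sentences) if i not in selected]
        best = max(remaining, key=lambda i: score(i, selected))
        selected.append(best)
    return sorted(selected)  # extracted sentences keep document order

Because later picks can land between earlier ones, the returned sequence can still violate the continuity principle, which motivates the evolutionary search below.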
The Evolutionary Algorithm



In the above method, the inclusion of a sentence depends on the sentences already present in the summary.
Genetic algorithms, a specific type of evolutionary algorithm, encode the problem as a series of genes, called a chromosome.
Our genes take integer values representing the positions of sentences in the document.
The Evolutionary Algorithm



Genetic algorithms use a fitness function to assess how good a chromosome is; in our case the function is the sum of the scores of the sentences.
Genetic algorithms use genetic operations to evolve a population of chromosomes; in our case we use weighted roulette-wheel selection to select chromosomes (both are sketched below).
Once several chromosomes are selected, they are evolved using crossover and mutation.
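
A minimal sketch of the fitness function and weighted roulette-wheel selection, assuming sentence_scores maps a chromosome to the combined content/context scores of its sentences:

import random

def fitness(chromosome, sentence_scores):
    """Fitness of a chromosome: the sum of its sentences' scores."""
    return sum(sentence_scores(chromosome))

def roulette_select(population, fitnesses, k):
    """Select k chromosomes with probability proportional to fitness
    (weights must be non-negative; shift the scores if necessary)."""
    return random.choices(population, weights=fitnesses, k=k)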
The Evolutionary Algorithm

We use the single-point crossover operator and two mutation operators (all three are sketched below).

The first replaces the value of a gene with a randomly generated integer value (this tries to include random sentences in the summary).
The second replaces the value of a gene with the value of the preceding gene incremented by one (this introduces consecutive sentences into the summary).

We start with a population of randomly generated chromosomes, which then evolves using these operators, each applied with a certain probability.
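
A minimal sketch of the three operators; the application probabilities are left to the caller, and the clamp in the second mutation is an assumption to keep positions in range:

import random

def single_point_crossover(a, b):
    """Swap the tails of two chromosomes at a random cut point."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate_random(chromosome, n_sentences):
    """Replace one gene with a random position (tries a random sentence)."""
    c = list(chromosome)
    c[random.randrange(len(c))] = random.randrange(n_sentences)
    return c

def mutate_consecutive(chromosome, n_sentences):
    """Set one gene to the preceding gene + 1, clamped to the document
    length (introduces consecutive sentences into the summary)."""
    c = list(chromosome)
    i = random.randrange(1, len(c))
    c[i] = min(c[i - 1] + 1, n_sentences - 1)
    return c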
The Evolutionary Algorithm


The best chromosome (the one with highest fitness score)
during all generations is the solution to the problem.
In our case we iterated a population of 500 chromosomes for
100 generations.
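
Putting the pieces together, a minimal sketch of the overall loop (500 chromosomes, 100 generations, as on the slide); it reuses the operators from the previous sketch, takes a single-argument fitness function, and the operator probabilities are illustrative assumptions, not the paper's settings:

import random

def evolve(n_sentences, summary_len, fitness, pop_size=500, generations=100,
           p_cross=0.7, p_mut=0.2):
    population = [[random.randrange(n_sentences) for _ in range(summary_len)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Roulette-wheel selection needs non-negative weights.
        weights = [max(fitness(c), 0.0) + 1e-9 for c in population]
        parents = random.choices(population, weights=weights, k=pop_size)
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            if random.random() < p_cross:
                a, b = single_point_crossover(a, b)
            if random.random() < p_mut:
                a = mutate_random(a, n_sentences)
            if random.random() < p_mut:
                b = mutate_consecutive(b, n_sentences)
            nxt.extend([a, b])
        population = nxt
        best = max(population + [best], key=fitness)  # keep the overall best
    return sorted(set(best))  # summary sentences in document order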
Evaluation and Discussion


We evaluated on 10 scientific papers from the Journal of Artificial Intelligence Research, totalling about 90,000 words. Because from each text we produced eight different summaries, all of which had to be assessed by humans, the evaluation was very time-consuming.
The quality of a summary can be measured in terms of coherence, cohesion and informativeness.

Cohesion is indicated by the number of dangling anaphoric expressions.
Coherence is indicated by the number of ruptures in the discourse.
For informativeness we compute the similarity between the summary and the document (a sketch follows).
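
For the informativeness measure, a minimal sketch of a cosine similarity between term-frequency vectors of summary and document, in the spirit of Donaway et al. (2000); the exact weighting used in the paper may differ:

import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) \
         * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0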
Evaluation and Discussion


In the evaluation, TFIDF extracts the sentences with the highest TF-IDF scores, Basic refers to the content-based scoring alone, and Greedy and Evolutionary are the two algorithms which additionally use the continuity principle.
Noticing only a slight improvement in the 3% summaries, we decided to increase the length to 5% (values shown in brackets).

We consider that a discourse rupture occurs when a sentence seems completely isolated from the rest of the text.

Ruptures usually happen due to the presence of isolated discourse markers such as firstly, however, on the other hand, …
For 3% summaries, context information has little influence because the indicator phrases have a greater influence on coherence than the continuity principle.
For longer summaries, the evolutionary algorithm is better than the basic method in all cases, but the greedy algorithm is not.
We believe that the improvement is due to the discourse information used by the methods.



Even though anaphora is not directly addressed here, a subsidiary effect of improving local cohesion should be a decrease in the number of dangling references.
As in the case of discourse ruptures, the greedy algorithm does not perform significantly better than the basic method.
The most frequent dangling references were due to references to tables, figures, definitions and theorems (e.g. As we showed in Table 3 …).



We use a content-based evaluation metric (Donaway et al., 2000) which computes the similarity between the summary and the document.
The evolutionary algorithm does not lead to a major loss of information, and for several texts this method obtains the highest score.
In contrast, the greedy method seems to exclude useful information, performing worse than the basic method and the baseline for several texts.
Conclusion and Future Work



We presented two algorithms combining content and context information. Experiments show that the evolutionary method performs better in terms of coherence and cohesion, and does not degrade the information content.
One could argue that a 5% summary is too long, but such summaries can be shortened by using aggregation rules in which two sentences referring to the same entity are merged into one.
We intend to extend the experiments, testing the combination of centering theory's principles and carrying out the evaluation on other types of texts.