FIGURE 1
The Rostock Historical English Newspaper Corpus from 1700 to today
[Table layout lost in extraction. Column headings: year; corpus-line 1 ('down-market'); corpus-line 2 ('mid-market').
Sampling periods: 1700, 1730, 1760, 1790/1800, 1830, 1860, 1890/1900, 1930, 1960, 1990/2000.
Titles legible in the table: Post-Man; Post-Boy; The Evening Post; Penny London Post; London Journal; The London Evening Post; Bell's Weekly Messenger; Bell's Life in London; The World; The Standard; Lloyd's Weekly Newspaper; Daily Telegraph; The Daily Mirror; Daily Sketch; Daily Mail; Express; The Sun; The Evening Standard; Morning Post. Several further titles are truncated.]
P.S.: classification of newspapers is still preliminary
Introducing the Rostock
Historical Newspaper Corpus:
From 1700 to Today
Kristina Schneider
INSTITUT FÜR ANGLISTIK/AMERIKANISTIK,
FB SPRACH- UND LITERATURWISSENSCHAFTEN,
UNIVERSITÄT ROSTOCK, AUGUST-BEBEL-STRASSE 28,
D-18051 ROSTOCK
In order to investigate the historical development of English newspapers and news
writing in general and to explore the roots of popular English journalism in
particular, a historical newspaper corpus is currently being assembled at Rostock
University. The aim of this abstract is to present briefly the main characteristics of
the corpus, to explain the rationale behind the selection of newspapers and text
samples and to introduce some features chosen for the analysis of the changing
style of news writing.
The Rostock corpus is a collection of English newspapers from 1700 to the
present in 30-year intervals. An average span of 30 years was chosen because it can
be taken to roughly represent one generation, and newspaper language - as well as
language in general - is not likely to change much faster than from one generation
to another. The corpus is made up of three distinct corpus-lines, namely two
popular lines (down- and mid-market papers) and one quality line (up-market
papers). Each corpus-line and period is represented by a 20,000 word sample (de
Haan 1992) taken from two newspapers, yielding a total corpus size of 600,000
words.
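The stated total can be checked with a few lines of arithmetic. The sketch below assumes the ten sampling periods listed in Figure 1 and the three corpus-lines named above; the variable names are illustrative only and not part of the corpus design.

```python
# Sampling periods as listed in Figure 1 (roughly 30-year intervals).
periods = ["1700", "1730", "1760", "1790/1800", "1830",
           "1860", "1890/1900", "1930", "1960", "1990/2000"]

# The three corpus-lines described in the text.
corpus_lines = ["down-market", "mid-market", "up-market"]

# One 20,000-word sample per corpus-line and period.
words_per_sample = 20_000

total_words = len(periods) * len(corpus_lines) * words_per_sample
print(total_words)  # 600000
```

Ten periods times three corpus-lines gives 30 samples, which at 20,000 words each matches the 600,000 words quoted.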
The designations 'down-, mid- and up-market' should be approached with some
caution and have therefore been put in inverted commas, because a relatively
precise distinction between down-market papers (line 1), mid-market papers (line
2), and up-market papers (line 3) can be drawn only for the 20th century (e.g.
Jucker 1992). For the 18th and 19th centuries, comparable criteria have been used
to fit the older papers into these categories: the distinction between official and
non-official papers and between cut-price and full-price papers (e.g. Harris 1938),
the popularity of Sunday papers (e.g. Berridge 1978), circulation figures (e.g.
Sutherland 1935) and a content analysis (percentage of hard and soft news).
As far as the content is concerned, only prototypical news reports from British
newspapers - i.e. reports on foreign and home news written by newspaper staff -
have been included in this corpus. There are several reasons for not analysing the
whole content of these papers. Firstly, news reports have always been part of
newspapers and are thus ideal for diachronic research. Secondly, concentrating
on a single text type is advisable to ensure the comparability of different
samples. Finally, news reports have been chosen for this corpus
because they are usually made up of complete sentences, a necessary prerequisite
for the syntactic studies which will be part of this research.
Features chosen for the subsequent corpus analysis include overall sentence length,
sentence complexity, noun phrase complexity, passive constructions and personal
pronouns, lexical diversity and word length as well as headline development.
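Three of these features (sentence length, word length and lexical diversity) can be computed directly from raw text. The following is a minimal sketch only: the abstract does not say how the features are operationalised, so the naive sentence splitter, the tokenisation rule and the use of the type-token ratio as a measure of lexical diversity are all assumptions, and the function names are hypothetical.

```python
import re

def sentences(text):
    # Naive split after ., ! or ? followed by whitespace (assumption:
    # abbreviations such as "Mr." are not handled).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def words(text):
    # Lower-cased alphabetic tokens, allowing internal apostrophes or hyphens.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def sample_features(text):
    sents = sentences(text)
    toks = words(text)
    if not sents or not toks:
        raise ValueError("sample contains no words")
    return {
        # Average number of words per sentence.
        "mean_sentence_length": len(toks) / len(sents),
        # Average number of characters per word.
        "mean_word_length": sum(len(t) for t in toks) / len(toks),
        # Distinct word forms divided by running words.
        "type_token_ratio": len(set(toks)) / len(toks),
    }

example = "The ship arrived. The crew left the ship at once."
features = sample_features(example)
print(features)
```

The type-token ratio depends on sample length, so it is only comparable across samples of equal size, which the fixed 20,000-word samples provide.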
For one of the most striking features, namely the change in average
sentence length, initial results show not only an overall trend
towards shorter sentences from 1700 to the present, but also that this tendency has
always been more pronounced in the more popular papers. Thus, a
distinction between popular and quality papers before the 19th century seems to be
justified.
It has to be emphasized, however, that both the compilation and analysis of this
corpus are still ongoing and further features will be selected for analysis.
Selected Bibliography
Berridge, V. (1978). Popular Sunday papers and mid-Victorian society. In G. Boyce et al. (eds),
Newspaper history from the seventeenth century to the present day. Beverly Hills, California.
de Haan, P. (1992). The optimum corpus sample size? In G. Leitner (ed.), New directions in English
language corpora: methodology, results, software developments. Mouton de Gruyter, Berlin/New York, pp.
3-19.
Harris, M. (1938). London newspapers in the age of Walpole. London.
Jucker, A.H. (1992). Social stylistics - syntactic variation in British newspapers. Mouton de Gruyter,
Berlin.
Sutherland, J.R. (1935). The circulation of newspapers and literary periodicals, 1700-30. Library,
4(15): 110-124.