Introducing the Rostock Historical Newspaper Corpus: From 1700 to Today

Kristina Schneider
INSTITUT FÜR ANGLISTIK/AMERIKANISTIK, FB SPRACH- UND LITERATURWISSENSCHAFTEN, UNIVERSITÄT ROSTOCK, AUGUST-BEBEL-STRASSE 28, D-18051 ROSTOCK

In order to investigate the historical development of English newspapers and news writing in general and to explore the roots of popular English journalism in particular, a historical newspaper corpus is currently being assembled at Rostock University. The aim of this abstract is to present briefly the main characteristics of the corpus, to explain the rationale behind the selection of newspapers and text samples and to introduce some features chosen for the analysis of the changing style of news writing.

The Rostock corpus is a collection of English newspapers from 1700 to the present in 30-year intervals. An average span of 30 years was chosen because it can be taken to roughly represent one generation, and newspaper language - as well as language in general - is not likely to change much faster than from one generation to another.

FIGURE 1 The Rostock Historical English Newspaper Corpus from 1700 to today
[Table of newspapers by year and corpus-line. Legible column headers: year, corpus-line 1 ('down-market'), corpus-line 2 ('mid-market'). Periods: 1700, 1730, 1760, 1790/1800, 1830, 1860, 1890/1900, 1930, 1960, 1990/2000. Titles named include Post-Man, Post-Boy, The Evening Post, Penny London Post, London Journal, The London Evening Post, Bell's Weekly Messenger, Bell's Life in London, Lloyd's Weekly Newspaper, The Standard, Morning Post, Daily Telegraph, The Daily Mirror, Daily Sketch, Daily Mail, The Sun and The Evening Standard. P.S.: classification of newspapers is still preliminary.]
The corpus is made up of three distinct corpus-lines, namely two popular lines (down- and mid-market papers) and one quality line (up-market papers). Each corpus-line and period is represented by a 20,000-word sample (de Haan 1992) taken from two newspapers, yielding a total corpus size of 600,000 words (3 corpus-lines x 10 periods x 20,000 words). The designations 'down-, mid- and up-market' should be approached with some caution and have therefore been put in inverted commas, because a relatively precise distinction between down-market papers (line 1), mid-market papers (line 2) and up-market papers (line 3) can only be drawn for the 20th century (e.g. Jucker 1992). For the 18th and 19th centuries, comparable criteria have been used to fit the older papers into these categories: the distinction between official and non-official papers and between cut-price and full-price papers (e.g. Harris 1938), the popularity of Sunday papers (e.g. Berridge 1978), circulation figures (e.g. Sutherland 1935) and a content analysis (percentage of hard and soft news).

As far as the content is concerned, only prototypical news reports from British newspapers - i.e. reports on foreign and home news written by newspaper staff - have been included in this corpus. There are several reasons for not analysing the whole content of these papers. Firstly, news reports have always been part of newspapers and are thus ideal for diachronic research. Secondly, concentrating on a single text type is advisable to ensure the comparability of different samples. And last but not least, news reports have been chosen for this corpus because they are usually made up of complete sentences, a necessary prerequisite for the syntactic studies which will be part of this research.

Features chosen for the subsequent corpus analysis include overall sentence length, sentence complexity, noun phrase complexity, passive constructions and personal pronouns, lexical diversity and word length, as well as headline development.
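Three of the features listed above (sentence length, lexical diversity and word length) lend themselves to simple counts. The abstract does not state how these features are operationalised, so the following Python sketch is only an illustration under stated assumptions: sentences are split naively at ., ! or ? followed by whitespace, words are runs of letters, and lexical diversity is taken to be the type-token ratio. The function names are hypothetical and are not part of the Rostock project.

```python
import re


def split_sentences(text):
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    # This is naive and will also split after abbreviations such as "Mr.".
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    # Assumption: a word is a run of letters, optionally joined by
    # internal apostrophes or hyphens (e.g. "bell's", "post-boy").
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def style_features(text):
    # Returns three simple measures for one text sample.
    sentences = split_sentences(text)
    words = tokenize(text)
    if not sentences or not words:
        raise ValueError("sample contains no sentences or no words")
    return {
        # average number of words per sentence
        "mean_sentence_length": len(words) / len(sentences),
        # distinct word forms divided by running words
        "type_token_ratio": len(set(words)) / len(words),
        # average number of characters per word (hyphens and apostrophes included)
        "mean_word_length": sum(len(w) for w in words) / len(words),
    }


# Invented two-sentence sample, not taken from the corpus.
sample = "The ship arrived at Dover. The ship sailed again on Monday."
features = style_features(sample)
print(features)
```

Because the type-token ratio falls as a text gets longer, it is only comparable across samples of equal length, which the fixed 20,000-word samples would provide.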
For one of the most striking features, the change in average sentence length, first results show not only an overall trend towards shorter sentences from 1700 to the present, but also that this tendency has always been more pronounced in the more popular papers. Thus, a distinction between popular and quality papers before the 19th century seems to be justified. It has to be emphasized, however, that both the compilation and the analysis of this corpus are still ongoing and that further features will be selected for analysis.

Selected Bibliography

Berridge, V. (1978). Popular Sunday papers and mid-Victorian society. In G. Boyce et al. (eds), Newspaper history from the seventeenth century to the present day. Beverly Hills, California.

de Haan, P. (1992). The optimum corpus sample size? In G. Leitner (ed.), New directions in English language corpora: methodology, results, software developments. Mouton de Gruyter, Berlin/New York, pp. 3-19.

Harris, M. (1938). London newspapers in the age of Walpole. London.

Jucker, A.H. (1992). Social stylistics - syntactic variation in British newspapers. Mouton de Gruyter, Berlin.

Sutherland, J.R. (1935). The circulation of newspapers and literary periodicals, 1700-30. Library, 4(15): 110-124.