Answering Medieval Authorship Questions using e-Science

Michael Meredith & Peter Ainsworth, The University of Sheffield, UK

Overview

The provenance and authorship of medieval manuscripts have long been a common research question across multiple disciplines of the arts and humanities. Key questions include: where and by whom were these manuscripts created, and what does the codicological evidence (scribal hands, catchwords, page layouts, artistic styles in the miniatures and marginal decoration) suggest about book production in this period? This paper focuses on 15th-century manuscripts (Froissart’s Chronicles) that were made during the Hundred Years’ War. It shows how e-Science, through image recognition and data mining techniques, allows us to aid the humanist in identifying the characteristic stylistic, orthographic and iconographic ‘signatures’ of particular scribes and artists. The computer-science algorithms employed in this research include image edge detection, polygonal model fitting and geometric comparisons. Additionally, we discuss how these technologies are applicable to other datasets, assisting scholars to dig into image data to answer authorship-related questions.

Methodology

The research described in this paper is being undertaken as part of the larger Digging into Image Data collaboration, which brings together an international, multi-disciplinary team of researchers from the University of Sheffield, UK, Michigan State University, USA, and the University of Illinois at Urbana-Champaign, USA. The collaboration has been assembled to investigate the topic of authorship across a diverse set of image data: 15th-century Froissart manuscripts, 17th- and 18th-century digitised maps, and 19th- and 20th-century digitised quilts.

One of the primary questions we are attempting to answer is whether adaptive image analytics can attribute authorship and, if so, how accurate and computationally scalable they are when applied to diverse collections of image data. The first step in answering this question is for computer scientists, working closely with humanists, to gain an understanding of the tell-tale signatures that scribes leave. This first step is not necessarily one-sided, as software algorithms can transform images into different domains and spectrums that might help the scholar to describe characteristic features more precisely. Armed with this understanding, we undertake research into image analytics to extract these discriminating features and construct a statistical digital signature of the author. The results produced by the computer scientists are examined by the domain experts and analytically compared using samples of data with known provenance. Future work within the overall project will take the outputs from these stages and, using high-performance computing, search across a large collection of images and datasets to cluster together images that appear to be of similar authorship. The humanists and art historians will comment on these findings.

Scholarly Interpretation of Medieval Manuscripts

The manuscripts considered within the project consist of text written in a reasonably well-defined medieval cursive hand. In order to construct the particular digital signature of a scribe we focus on a number of areas from which we can extrapolate meaningful patterns. It is important to note that we do not expect any individual characteristic area to be sufficient for identifying a particular scribe; it is only when several are considered together that we can draw meaning from the results. Some of these areas do, however, raise further research questions that, together, the scholar and computer scientist should be able to answer.

Use of Abbreviations

Figure 1 illustrates how abbreviations can be used to indicate differences between scribes by juxtaposing folios from two different manuscripts, each written by a different scribe copying similar texts [1]. For example, the scribe on the right consistently abbreviates ‘et’, except when it immediately follows a punctuation mark. However, before we can use abbreviations as a potential identifier, further questions need to be answered:

1) Were both scribes given the same physical space to write the text (are the lengths of text comparable)?
2) Do scribes use abbreviations consistently across their work?
3) Is there a pattern to the use of abbreviations that the scribe has perhaps developed?

The e-Science research outlined later in this paper demonstrates how we can answer these questions, further feeding into the larger and encompassing question of authorship.

[1] Although we have aligned the text within the two manuscripts according to their accepted equivalences (i.e. corresponding sections), and indeed their texts correlate very closely, we acknowledge that manuscript stemmas show that it is unlikely the two scribes responsible for authoring these folios used exactly the same source.

Flow of Text

Figure 2 demonstrates how some scribes are better at filling the available space; the scribe on the right-hand side of the image justifies the text across the column more evenly. There are also examples where the scribe slowly curves the text the further he/she gets down the page, i.e. the rows are no longer perfectly horizontal, even once the shear introduced by digitisation is removed (the scribes wrote the text on a flat surface). In order to validate these observations for inclusion within a digital signature, we must first determine whether a scribe is consistently neat or whether other factors influence this, such as experience and turn-around times.
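
One way to quantify how far a row of text departs from the horizontal is to fit a straight line to points sampled along its baseline and to read off the slope. The sketch below is purely illustrative and is not the method used in the project: the function names `baseline_drift` and `page_drift_profile` are hypothetical, and it assumes the baseline points (pixel coordinates, with any digitisation shear already removed) have been extracted by some earlier step.

```python
import numpy as np


def baseline_drift(xs, ys):
    """Least-squares fit of y = slope * x + intercept to baseline points.

    xs, ys: pixel coordinates sampled along one row of text (assumed input).
    Returns (slope, intercept, rms_residual); a slope near zero means the
    row is close to horizontal.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    slope, intercept = np.polyfit(xs, ys, deg=1)
    residuals = ys - (slope * xs + intercept)
    rms = float(np.sqrt(np.mean(residuals ** 2)))
    return float(slope), float(intercept), rms


def page_drift_profile(rows):
    """Return the fitted slope of every row on a page, top to bottom.

    rows: iterable of (xs, ys) pairs, one pair per row of text.
    """
    return [baseline_drift(xs, ys)[0] for xs, ys in rows]
```

Comparing the slope profile of several folios attributed to the same scribe would be one way to check whether the drift is a consistent habit or varies with other factors.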
The algorithms used across this research project will allow us to quickly harvest and analyse a large volume of data on which to base further conjecture.

Individual Letters and Words

Within a body of text, scholars can quickly point to forms that appear to be characteristic of a particular scribe. Figure 3 illustrates how two different scribes finish the tail of their ‘g’s in different directions. Other potential tell-tale indicators that distinguish scribes include ‘y’s, ‘est’s and ‘estre’s. The use of abbreviations, flow of text and letter forms represent only a subset of the differences that we are investigating. Other areas include the ductus of the text and pen flourishes as potential contributing factors to a scribe’s signature.

Computer-aided Research Application

The potential contributing indicators of a scribe’s digital signature can be addressed using algorithms from the geometric pattern matching domain. After computing an edge-detection map (based on the Sobel convolution) from the source image, we fit a polygonal model around the text for geometric comparisons. Two different ways of fitting a suitably refined polygonal model to this data are evaluated: Least Squares and statistical EM (Expectation-Maximisation). The extracted geometry provides the basis for further data analysis techniques such as Principal Component Analysis (to help answer questions regarding the overall “impression” of the text) and geometry-based image retrieval techniques (i.e. shape comparisons), where significant contour similarities help to mitigate a degree of noise in the edge map and polygonal model fitting. We further couple the geometric analysis approach with work arising from the Online Froissart research project [2].
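
The edge-detection step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the project's implementation: the function names `filter3x3` and `sobel_edge_map`, the edge-padding choice and the fixed threshold are all hypothetical, and the input is assumed to be a greyscale image held as a 2-D NumPy array.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical intensity change.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter3x3(image, kernel):
    """Slide a 3x3 kernel over a 2-D image; the output has the same shape.

    The kernel is applied without flipping (cross-correlation). For the
    Sobel kernels this only changes the sign of the response, so the
    gradient magnitude computed below is unaffected.
    """
    padded = np.pad(image.astype(float), 1, mode="edge")
    height, width = image.shape
    out = np.zeros((height, width), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + height, dx:dx + width]
    return out


def sobel_edge_map(image, threshold):
    """Return a boolean map that is True where the gradient is strong.

    threshold is an assumed, hand-chosen cut-off on the gradient magnitude.
    """
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold
```

The coordinates of the True pixels (for instance via `np.argwhere`) would then be the input to a model-fitting step such as the polygonal fitting described above; a suitable threshold depends on the contrast of the digitised folio.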
As part of the Online Froissart project, many of the manuscript images in our dataset have been analysed and synchronised to a non-diplomatic transcription, which allows both the image and text to be zoomed, panned and manipulated together. The synchronisation of text with image makes it possible to perform a textual search on patterns of characters (such as 'est') and find every occurrence between any two points of a manuscript, with line accuracy on the image itself. We can also estimate how far along the line such occurrences are before we need to apply any further image analysis and identification. By combining these technologies we automatically count, crop and highlight specific sections from the large corpus of image data we have in order to draw further comparisons and conclusions. In regions that we know were written by a specific scribe, the data we harvest is used to determine the validity of the suggested indicators. Where the indicators prove consistent, future work will fit probability models to them for larger-scale data mining. The overall approach also lends itself well to identifying general patterns of words, abbreviations, flourishes, etc., within parts of our dataset for which we have no supporting transcription synchronised to the image or, in the case of abbreviations, no indicators within the transcription as to where they occur.

Future Work

The research outlined above forms only one part of the Digging into Image Data research project. Our partners are similarly working on complementary algorithms to identify authorship across the map and quilt collections. The algorithms that are researched at each site will be cross-pollinated and applied to all the datasets (geometric shape recognition to identify recurring patterns on a quilt, for example), which will help us extract more robust salient characteristics in relation to determining authorship. This will be assisted by the inclusion of machine-learning techniques.
The collective algorithms will then be applied across a large corpus of data using high-performance computing, and the results analysed and reported by our research team.

[2] See http://www.hrionline.ac.uk/onlinefroissart

Illustrations and Figures

Folio labels shown in the illustrations: MS 1 f 1v; MS 864 f 1v; MS 864 f 97v; MS 865 f 90v

Figure 1: Use of abbreviations between two different manuscripts/scribal hands
Annotations:
- Start of equivalent sections
- Corresponding uses of ‘et’ (abbreviated in the right-hand image)
- The right-hand scribe sometimes writes ‘et’ in full rather than abbreviating it. Is there a pattern to when he/she uses ‘et’ compared with the abbreviated form? ‘et’ appears to be written in full when it follows a “comma” or occurs very early in a sentence.
- Further examples of abbreviated forms by the scribe in the right-hand image (many more examples are visible on the whole folio)

Figure 2: Flow of text

Figure 3: Example of different letter ‘g’s between scribal hands
Panels: Scribe ‘B’s letter G; Scribe ‘C’s letter G