PROJECT REPORT 5
Annual Report 2017
Nikolai Grube, Christian Prager, Katja Diederichs, Sven Gronemeyer, Antje Grothe, Céline Tamignaux, Elisabeth Wagner, Maximilian Brodhun, and Franziska Diehr
Digital Sign Catalog
For the classification and systematization of Maya hieroglyphs, we developed a digital Sign Catalogue. As an inventory of all signs, it is an indispensable tool for identifying the glyphs used in a specific text. Identifying and classifying the signs is challenging because they appear in several graphic variants and can have multiple sign functions, for example as a logograph or as a syllabic sign. Furthermore, the ongoing academic discussion over the decipherment of the approximately 1000 signs gives rise to various hypotheses for the linguistic readings of individual signs.
The demands of analyzing this complex writing system and of integrating the continually changing state of research into the Sign Catalogue necessitate a flexible data model. It must be able to accommodate potential changes, and the documented information must be both reproducible and verifiable. We chose an ontology-based modelling approach built on CIDOC CRM and GOLD. The data model was implemented in RDF in order to represent the semantic relations between the entities. The use of graph technology enables semantic queries of the data.
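The idea of storing signs, graphs, and sign functions as a graph and querying it by pattern can be illustrated with a minimal sketch. It uses plain Python tuples as subject-predicate-object triples rather than an RDF library, and all identifiers (`sign:0001`, `ex:hasGraph`, `ex:hasFunction`, and so on) are hypothetical placeholders, not the project's actual CIDOC CRM or GOLD vocabulary.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical example data: two signs, their graphic variants (graphs),
# and their sign functions. None of these names come from the project.
TRIPLES: List[Triple] = [
    ("sign:0001", "rdf:type", "ex:Sign"),
    ("sign:0001", "ex:hasGraph", "graph:0001a"),
    ("sign:0001", "ex:hasGraph", "graph:0001b"),
    ("sign:0001", "ex:hasFunction", "ex:Logograph"),
    ("sign:0001", "ex:hasFunction", "ex:SyllabicSign"),
    ("sign:0002", "rdf:type", "ex:Sign"),
    ("sign:0002", "ex:hasGraph", "graph:0002a"),
    ("sign:0002", "ex:hasFunction", "ex:SyllabicSign"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def signs_with_function(function: str) -> List[str]:
    """All signs that can be used with the given sign function."""
    return [subject for subject, _, _ in match(p="ex:hasFunction", o=function)]


def graphs_of(sign: str) -> List[str]:
    """All graphic variants recorded for a sign."""
    return [obj for _, _, obj in match(s=sign, p="ex:hasGraph")]
```

Because a sign can carry several functions and several graphs at once, each new fact is one additional triple; nothing in the existing data has to be restructured when the state of research changes.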
The assessment of proposed readings is a particular challenge for the decipherment and analysis of Maya writing. Multiple readings for a number of signs are attested throughout the research literature. We want not only to document these readings but also to evaluate them qualitatively. Drawing on the academic literature on the Maya script, we developed sets of criteria oriented toward the linguistic context of use (e.g., correct part of speech, plausible text-image relationship), among other parameters (compare Kelley 1976). For each set, the criteria are linked with each other using propositional logic, so that an appropriate confidence level can be derived from the combination of criteria that are met.
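How criteria combined by propositional logic can yield a confidence level may be sketched as follows. The first two criteria are taken from the examples named above; the third criterion, the level names, and the specific rules are hypothetical assumptions for illustration and do not reproduce the project's actual criteria sets.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Criteria:
    correct_part_of_speech: bool = False
    plausible_text_image_relation: bool = False
    attested_in_multiple_contexts: bool = False  # hypothetical criterion


def confidence_level(c: Criteria) -> str:
    """Map a combination of criteria to a level (hypothetical rules)."""
    pos = c.correct_part_of_speech
    img = c.plausible_text_image_relation
    ctx = c.attested_in_multiple_contexts
    if pos and img and ctx:          # conjunction of all criteria
        return "high"
    if pos and (img or ctx):         # required criterion plus one supporting
        return "medium"
    if pos or img or ctx:            # at least one criterion holds
        return "low"
    return "unevaluated"


# Recording a further criterion later produces a new evaluation
# without altering the earlier one.
initial = Criteria(correct_part_of_speech=True)
updated = replace(initial, plausible_text_image_relation=True)
```

In this sketch, `confidence_level(initial)` yields `"low"` and `confidence_level(updated)` yields `"medium"`, which mirrors how adding a criterion can raise the level of a proposed reading.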
The goal is to record the extant texts and the objects on which they are recorded in a machine-readable corpus and to compile a dictionary based on this corpus that represents the entire vocabulary and uses of the script. The digital Sign Catalogue aids in compiling the text corpus, which will be coded in TEI/XML. However, the text does not consist of phonemically transliterated values. Rather, every sign will be encoded with a reference to the URI of the corresponding entity in the Sign Catalogue. Once it has been compiled, the corpus will remain a stable, unchanging dataset. All discoveries concerning hieroglyphic readings and the grammar of the language will be recorded outside the corpus in the Sign Catalogue and the linguistic annotation tool ALMAH.
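Encoding signs by reference rather than by reading can be sketched with Python's standard XML library. The TEI `g` element with a `ref` attribute and the `seg` element exist in TEI; their use here, the `type` value `glyph-block`, and the URIs are assumptions for illustration and are not taken from the project's schema.

```python
import xml.etree.ElementTree as ET
from typing import List

TEI_NS = "http://www.tei-c.org/ns/1.0"


def encode_block(sign_uris: List[str]) -> str:
    """Serialize one block of signs as TEI, one <g ref="..."/> per sign."""
    ET.register_namespace("", TEI_NS)
    # "glyph-block" is a hypothetical type value.
    block = ET.Element(f"{{{TEI_NS}}}seg", {"type": "glyph-block"})
    for uri in sign_uris:
        # Only the URI is stored; no phonemic value enters the corpus.
        ET.SubElement(block, f"{{{TEI_NS}}}g", {"ref": uri})
    return ET.tostring(block, encoding="unicode")


# Placeholder URIs, not real Sign Catalogue entries.
xml_text = encode_block(["textgrid:example1", "textgrid:example2"])
```

Since the corpus holds only references, a revised reading changes the catalogue entry that the URI points to, and the encoded text itself stays untouched.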
During linguistic analysis, the confidence level determined for each proposed reading aids in preselecting the readings to be investigated. Testing a reading in the corpus can yield new or additional criteria, which raise or lower its confidence level. These criteria can easily be entered into the Sign Catalogue retroactively, and the confidence level is adjusted accordingly. In this way, we address in particular the need to reproducibly document decipherments that can only be achieved through linguistic analysis and investigation of the text corpus.
The goal of the graph-based model of the Sign Catalogue, together with the text corpus and linguistic analysis, is to obtain secure proposals for readings of Maya signs and to also be able to optimally present new proposed readings for signs that have yet to be deciphered (compare Diehr et al. 2017).
The inventory of signs is currently being compiled with the digital Sign Catalogue and is expected to be completed in mid-2018. In addition to entering data and creating a concordance with other glyph inventories, we are also working on vectorized drawings of the graphs and their variants. Signs that were not included in Thompson's glyph inventory will be re-classified and entered into the sign and graph database. We anticipate that there will be approximately 1000 distinct signs and approximately 3000 graphs. The catalog will then be published on our project portal, and the RDF data will be made accessible using a SPARQL endpoint. Moreover, the data will be published in the TextGrid Repository, where they can be retrieved using OAI-PMH. Documentation of the digital Sign Catalogue can be accessed at http://idiom-projekt.de/catalogue.
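Once the data are available via OAI-PMH, a harvesting request could be built roughly as in the following sketch. The verb and parameter names (`verb`, `ListRecords`, `metadataPrefix`, `set`, `oai_dc`) are defined by the OAI-PMH protocol itself; the base URL and the set name are placeholders, since this report does not state the repository's endpoint address.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://example.org/oai"  # placeholder, not the real endpoint


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional selective harvesting
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL)
```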
At present, we are working on a comprehensive paper with the working title "Standards for Maya Epigraphic Analysis I: Principles of Maya Graphemics," which will be published on the project website over the course of the year 2018. It will contain a detailed discussion of creating the graph variants and the confidence levels and criteria, as well as additional research results that have only been presented here preliminarily. Thus, this paper will be of particular interest to the academic community engaged in the study of Maya hieroglyphic writing.
TEI Parser
Transcribing the texts is a complex task that involves many steps. Within the TEI file, every sign is referenced by a TextGrid URI rather than by its concrete phonetic transliteration or its entry number in the Sign Catalogue. Consequently, the transcriber must look up the TextGrid URI of each sign in order to reference it in the TEI document. To reduce the effort this requires, we have begun to develop a TEI parser. Given the numerical transliteration as input, the parser will generate the corresponding TEI structure according to the developed schema.
The parser was designed as a plug-in for the TextGrid Lab. As a result, it always has write permission to the project's internal storage. Using the parser substantially increases productivity when generating transcriptions. However, certain text phenomena must still be encoded manually. One example is the markup of damage on a text-bearing object. The parser takes in the transcription information 12st. [*5009st:*128st:679st] and generates the following code from it:
The parser is not an editor; it can only be used to create files. Opening existing files and editing them retroactively is not possible. In future work, the parser could be developed into an editor by expanding its functions so that it can edit files as well as create them.
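The lookup step that the parser automates can be sketched as follows, using the transcription string quoted above as input. This is an illustrative sketch, not the project's plug-in and not its actual output: the token pattern is inferred from that single example, the operators and the asterisk are carried along without being interpreted, and the lookup table and its URIs are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# A sign is an optional asterisk, digits, and a lowercase suffix ("12st");
# everything else recognized is treated as an uninterpreted operator.
TOKEN = re.compile(r"(?P<sign>\*?\d+[a-z]+)|(?P<op>[.:\[\]])")

# Hypothetical mapping from sign numbers to TextGrid URIs.
URI_BY_SIGN: Dict[str, str] = {
    "12st": "textgrid:example-12",
    "5009st": "textgrid:example-5009",
    "128st": "textgrid:example-128",
    "679st": "textgrid:example-679",
}


def tokenize(transliteration: str) -> List[Tuple[str, str]]:
    """Split a numerical transliteration into (kind, value) tokens."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(transliteration):
        if transliteration[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(transliteration, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def resolve(tokens: List[Tuple[str, str]]) -> List[dict]:
    """Replace each sign number with its URI; keep the asterisk as a flag."""
    resolved = []
    for kind, value in tokens:
        if kind == "sign":
            number = value.lstrip("*")
            resolved.append({
                "sign": number,
                "marked": value.startswith("*"),
                "ref": URI_BY_SIGN[number],  # KeyError if the sign is unknown
            })
    return resolved


tokens = tokenize("12st. [*5009st:*128st:679st]")
signs = resolve(tokens)
```

A real implementation would additionally map the operators to nested TEI elements according to the project schema, which this sketch leaves out.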
Developing the TEI Schema
The development of the TEI schema comprises multiple work packages (WP) that deal with individual aspects of text markup, some of which interlock. We first tried to create the TEI/XML structures that make it possible to render a text machine-readable in its intended reading order using the Sign Catalogue. While these essential aspects were being tested on the material and the schema definition went through correction loops, we were able to address other work packages in parallel, for example the so-called TEI header with the file metadata, the encoding of damaged text passages, or creative aspects of text composition.
WP 01: Transcription and Semantic Text Structure
In the first work package (WP 01), we addressed the semantic structure of the texts. The goal was to render the text in TEI/XML in its logical sequence, or its reading order. A parallel issue is topographic text arrangement, which describes where the text is located on the text-bearing object (see WP 02 below). In WP 01, we answered the following questions and discussed the following goals: How can the hieroglyphs be coded in XML? How is the reading order arranged in a text field, and how are graphic variants arranged in a hieroglyphic block? The deliverables of WP 01 were initial text examples (ranging from simple to complex) created to account for all text structural phenomena in Classic Maya inscriptions. The encoding scheme for the structure resulting from WP 01 should be applicable to all texts, but it requires special extensions for the codices.
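As an illustration of what "reading order in a text field" means computationally, the following sketch linearizes a grid of glyph blocks using the double-column convention commonly described for Classic Maya inscriptions (paired columns read left to right, top to bottom). That this convention applies is an assumption for the example; actual inscriptions deviate from it, and the sketch does not represent the project's encoding scheme.

```python
from string import ascii_uppercase
from typing import List


def double_column_order(n_columns: int, n_rows: int) -> List[str]:
    """Return block coordinates (A1, B1, A2, ...) in double-column order."""
    if n_columns > len(ascii_uppercase):
        raise ValueError("sketch supports at most 26 columns")
    order: List[str] = []
    for left in range(0, n_columns, 2):
        # A column pair; the last "pair" is a single column if the
        # number of columns is odd.
        pair = range(left, min(left + 2, n_columns))
        for row in range(1, n_rows + 1):
            for col in pair:
                order.append(f"{ascii_uppercase[col]}{row}")
    return order
```

For a field of four columns and two rows, this yields A1, B1, A2, B2, C1, D1, C2, D2. A logical encoding stores the blocks in this sequence, independently of where they sit on the object.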
WP 02: Topographic Text Arrangement
In the second work package (WP 02), we addressed the topographic arrangement of texts. Here, "topographic" refers to the position of the text on its carrier. The semantic text structure exists in parallel; as noted, this concerns how the text is read and of which logical sequences it is composed.
In WP 02, we considered the following issue: where are text fields and images located on the text-bearing object? The result of WP 02 was a representation of example texts in the TEI/XML structure that accounts for all text arrangement phenomena in Classic Maya inscriptions. The topographic text arrangement resulting from WP 02 can, in principle, be applied to all texts.
WP 03: Philological and Text-Critical Markup
The third work package concerned editorial engagement with illegible, vague, or reconstructed text passages. Editorially, this concerns a text-critical approach that can be considered later when conducting epigraphic and linguistic analysis. This package thus specifies the markup from WP 01.
WP 03 entailed engaging with the following question: how can text passages be qualified that have been made unclear by ambiguous original spellings or later restorations, or that have been damaged or destroyed by physical, chemical, or biological reactions? The goal of WP 03 was to mark up example text passages such that they produce clear editorial guidelines for dealing with vague materials, from which an apparatus criticus is created by...