input/xml/vorträge-056.xml

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="vorträge-056">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Technical and social Infrastructures for the Humanities: The Example of the Dagaare-English-Cantonese Dictionary</title>
        <author>
          <name>
            <surname>Bodomo</surname>
            <forename>Adams</forename>
          </name>
          <affiliation>Universität Wien, Institut für Afrikawissenschaften; AT</affiliation>
          <email>adams.bodomo@univie.ac.at</email>
        </author>
        <author>
          <name>
            <surname>Wandl-Vogt</surname>
            <forename>Eveline</forename>
          </name>
          <affiliation>Österreichische Akademie der Wissenschaften, Austrian Centre for Digital Humanities; AT</affiliation>
          <email>eveline.wandl-vogt@oeaw.ac.at</email>
        </author>
        <author>
          <name>
            <surname>Mörth</surname>
            <forename>Karlheinz</forename>
          </name>
          <affiliation>Österreichische Akademie der Wissenschaften, Austrian Centre for Digital Humanities; AT</affiliation>
          <email>karlheinz.moerth@oeaw.ac.at</email>
        </author>
      </titleStmt>
      <editionStmt>
        <edition>
          <date>2015-10-17T16:04:32.009000000</date>
        </edition>
      </editionStmt>
      <publicationStmt>
        <publisher>Elisabeth Burr, Universität Leipzig</publisher>
        <address>
          <addrLine>Beethovenstr. 15</addrLine>
          <addrLine>04107 Leipzig</addrLine>
          <addrLine>Deutschland</addrLine>
          <addrLine>Elisabeth Burr</addrLine>
        </address>
      </publicationStmt>
      <sourceDesc>
        <p>Converted from an OASIS Open Document</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <appInfo>
        <application ident="DHCONVALIDATOR" version="1.17">
          <label>DHConvalidator</label>
        </application>
      </appInfo>
    </encodingDesc>
    <profileDesc>
      <textClass>
        <keywords scheme="ConfTool" n="category">
          <term>Vortrag</term>
        </keywords>
        <keywords scheme="ConfTool" n="subcategory">
          <term></term>
        </keywords>
        <keywords scheme="ConfTool" n="keywords">
          <term>Multilingualität; Multikulturalität; Diversity</term>
        </keywords>
        <keywords scheme="ConfTool" n="topics">
          <term>Teilen</term>
          <term>Umwandlung</term>
          <term>Strukturanalyse</term>
          <term>Annotieren</term>
          <term>Bereinigung</term>
          <term>Veröffentlichung</term>
          <term>Kollaboration</term>
          <term>Daten</term>
          <term>Standards</term>
        </keywords>
      </textClass>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="div1" rend="DH-Heading">
        <head>Introduction</head>
        <p>This paper introduces into the transformation process of the Dagaare – Cantonese
          – English dictionary into an open, online research infrastructure in the
          framework of European research infrastructures and – in doing so – open those
          for Non-European researchers, research data as well as topics.</p>
          <p>The trilingual dictionary is designed for use in lexicographical and linguistic
            field methods training. It serves as a database to illustrate many linguistic
            principles and phenomena in phonology, morphology, syntax and semantics. First
            and foremost it is intended as a reference source for Chinese and English
            speaking students. <lb/>Dagaare is a language spoken in Ghana and Burkina Faso
            by about two million people. It belongs to the Gur branch of the Niger-Congo
            family. In spite of the fact that Dagaare is genetically unrelated to Chinese,
            there are some interesting typological features under which the two languages
            can be compared. To illustrate, both Dagaare and Chinese are tone languages,
            unfortunately lacking audio files in the printed dictionary version. But while
            Chinese has a complex system of four to nine tonemes, Dagaare - like most West
            African languages - has a two-tone system. The first part of the dictionary
            includes information of the orthography and sound system of Dagaare followed by
            an explanation of the verbal and nominal morphology of this language. Part two
            is the proper dictionary which comprises more than thousand head words and a
            total of 3,000 to 4,000 words. Subsequent to the lexicon are represented sample
            field work projects that are intended to aid both the field trainer and trainee.
            They cover the areas of phonology, morphosyntax, lexical semantics ans
            sociolinguistics. </p>
            <p>The valuable lexicographical data mentioned before were meant to be made
              sustainable available on the internet. To this end, they had to be transferred
              into an existing infrastructure, the research infrastructure for lexicography
              available at the Austrian Centre of Digital Humanities (ACDH) of the Austrian
              Academy of Sciences. The Academy has a long-standing tradition in eLexicography
              to which several departments contributed over more than hundreds of years. Most
              recently, the ACDH hosts a research group on eLexicography, the lexicography
              laboratory (1.1.2015-), to support, coordinate and methodically explore
              experimental scholarship in the fields of lexicography. <lb/>The emerging
              infrastructure is made up of several components: (1) an editor, (2) a formalised
              encoding framework, (3) a depositing back end and (4) a publishing system, all
              of which have been integrated into one system. An important keyword in this
              endeavour has been modularisation, the system not being one single piece of
              software but a number of complementary components that interlock neatly through
              clearly defined interfaces. </p>
            </div>
            <div type="div1" rend="DH-Heading">
              <head>Workflow</head>
              <p>The work of integrating the lexicographical data into the infrastructure was
                performed by the eLexciography working group of the ACDH. The workflow is a five
                step procedure: <lb/>(1) analysing and discussing the research focus and data
                structure <lb/>(2) converting the data from a simple table into a
                standards-based XML format (<ref target="http://www.tei-c.org/release/doc/tei-p5-doc/en/Guidelines.pdf">TEI P5</ref>)<lb/>(3) importing it into
                the database <lb/>(4) manual post processing <lb/>(5) and publication on the
                internet. <lb/>At the end of the process the data will be available in a
                persistent manner. </p>
              </div>
              <div type="div1" rend="DH-Heading">
                <head>Editor</head>
                <p>The <ref target="http://www.oeaw.ac.at/acdh/vle">Viennese Lexicographical Editor</ref> (VLE) is a fairly new piece
                of software that first came into existence as a by-product of an entirely
                different development activity: the creation of an interactive online learning
                system for university students. Thus, it was first used in a collaborative
                glossary editing project carried out as part of university language courses at
                the University of Vienna. As the tool proved to be flexible and adaptable
                enough, it was also used and further developed in a number of other projects
                collecting lexical data. </p>
                <p>The interface is built around an XML editor that allows to process standard-based
                  lexicographic and terminological data. Basically any XML-based formats such as
                  LMF, TBX, RDF or TEI can be handled. The program provides a number of useful
                  functions to automate editing procedures. It can check the structural integrity
                  (well-formedness) of input on the fly. Technologically, it draws not only on the
                  XML core specification but also on several cognate technologies. XSLT and XPath
                  play an important role both for visualising and modifying existing datasets.
                  Lexicographers can insert elements on the basis of predefined XML schemas. Most
                  of the functions can be applied both to single and multiple lemmas. One of the
                  most recent improvements is a versioning system and an improved working mode
                  that allows lexicographers to work on the XML data without actually seeing the
                  tags. Furthermore, the editor also has a configurable interface enabling
                  lexicographers to access external corpora and to integrate example sentences
                  from them into dictionary entries. The communication between the dictionary
                  client and the server has been implemented as a RESTful web service.</p>
                  <p>The tool forms part of Austria’s contribution to the pan-European <ref target="http://www.clarin.eu/">CLARIN</ref> Research Infrastructure
                  Consortium and is freely available from the <ref target="http://www.oeaw.ac.at/acdh/vle">ACDH Website</ref>.</p>
                </div>
                <div type="div1" rend="DH-Heading">
                  <head>Formalised encoding framework</head>
                  <p>While the list of formats used in the lexicographic community is unfortunately
                    very long, there exists a de- facto standard which has been used widely in many
                    digital humanities projects, in numerous lexicographic projects and most of the
                    ACDH’s lexicographic endeavours: the Guidelines of the Text Encoding Initiative
                    (TEI). The application of digital (de-facto) standards in building digital
                    language resources is of particular concern when we think about interoperability
                    and re-usability of resources. The buzz-word of open life-cycles for research
                    data will remain meaningless unless researchers succeed in achieving a certain
                    degree of harmonisation in structuring their data and
                    meta data. The ACDH has been working on specialised schemata based on the
                    TEI (P5) dictionary module for quite some time. In all these efforts, they
                    have also aimed at a high degree of interoperability with the ISO standard
                    <ref target="http://www.lexicalmarkupframework.org/">LMF</ref> (Lexical
                    Markup Framework). In order to realise mechanisms for cross-dictionary
                    access, they have also been working with semantic technologies such as RDF
                    and SKOS.
                  </p>
                  <p> The basic reduced TEI schemata have been documented in form of guidelines which
                    give detailed accounts of how dictionaries in the ACDH collection were encoded.
                    These guidelines document and discuss the schema and furnish a number of
                    examples taken from actual dictionaries. The target group for this guide are
                    both the lexicographers working on ACDH projects as well as others who might
                    want to work along similar lines. These particular Guidelines were themselves
                    produced making use of the TEI framework. </p>
                  </div>
                  <div type="div1" rend="DH-Heading">
                    <head>Depositing infrastructure</head>
                    <p>The dictionary editor is a web-based application that allows lexicographers to work in groups. The data is stored on a server of CLARIN Centre Vienna. Being part of an official infrastructure, long-term availability will be vouchsafed. In addition, the owners of the lexicographical data can draw copies of their dictionaries at any time of the compilation process. </p>
                  </div>
                  <div type="div1" rend="DH-Heading">
                    <head>Publishing framework</head>
                    <p>The publishing infrastructure builds on
                      <emph>corpus_shell,</emph> a service-oriented architecture and a distributed and heterogeneous virtual landscape. The core functionality of this modular framework is to expose well-defined interfaces based on acknowledged standards. The principle idea behind the architecture is to decouple the modules serving data from the user-interface components. One of the nice features of the system is that you can build new interfaces via XSLT styles almost on the fly.
                    </p>
                  </div>
                  <div type="div1" rend="DH-Heading">
                    <head>Status and outlook</head>
                    <p>The conversion and import of the data has already been undertaken. Dagaare – English – Cantonese Dictionary is already available online. However, the working group is still improving the web-interface, a stable URL will be assigned by the end of 2015.</p>
                    <p>The Dagaare – English – Cantonese Dictionary is about to be improved from a content as well as collaboration / social infrastructure point of view:</p>
                    <p>(1) Audio files are to be added to support – mainly – the representation of the tone languages Dagaare and Cantonese. Doing so, we enlarge the network of people participating into the project for both,</p>
                    <p>(a) free and open Wikimedia audio tools as well as </p>
                    <p>(b) high performance audio tools e.g. supported by Forschungszentrum
                      Telekommunikation Wien (FTW <ref target="http://www.ftw.at/"
                      >http://www.ftw.at/</ref>) or Phonogrammarchiv at the AAS
                      <ref target="http://www.phonogrammarchiv.at/">http://www.phonogrammarchiv.at/</ref>,</p>
                      <p>(c) speakers with Cantonese mother tongue. (Bodomo Adams himselves represents Dagaare mother tongue).</p>
                      <p>(2) The dictionary will be fully embedded into the lexicographical research infrastructure of ACDH as well as the research of the Institut für Afrikawissenschaften at the University of Vienna. This implies both,</p>
                      <p>(a) experimental development of the dictionary content applying methods of other
                        disciplines e.g. Natural Language Processing and Semantic Technologies for
                        interlinking with other dictionaries, semi-automatic translation into other
                        languages starting with German, connecting with cultural content etc., e.g.
                        interlinking with cultural resources like songs; (b) embedding it into a
                        research framework for African Diaspora studies. <lb/>In doing so, the
                        representatives of both institutes open towards collaboration of global
                        communities of several disciplines that are until now not in touch. </p>
                      </div>
                    </body>
                    <back>
                      <div type="bibliogr">
                        <listBibl>
                          <head>Bibliographie</head>
                          <bibl>
                            <hi rend="bold">Bodomo, Adams</hi> (2004): <hi rend="italic">Dagaare –
                            Cantonese – English Dictionary for Lexicographical Field Research
                            Training</hi> (= Afrikawissenschaftliche Lehrbücher 14). Köln: Köppe. </bibl>
                            <bibl>
                              <hi rend="bold">Bodomo, Adams / Mora, Manolete</hi> (2007): “Documenting
                              Spoken and Sung Texts of the Dagaaba of West Afrika”, in: <hi rend="italic"
                              >Empirical Musicology Review</hi> 2, 3: 81-102. </bibl>
                              <bibl>
                                <hi rend="bold">Budin, Gerhard / Majewski, Stefan / Moerth, Karlheinz</hi>
                                (2012): “Creating Lexical Resources in TEI P5”, in: <hi rend="italic"
                                >Journal of the Text Encoding Initiative</hi> 3 <ref
                                target="https://jtei.revues.org/522">https://jtei.revues.org/522</ref>
                                [letzter Zugriff 08. Februar 2016]. </bibl>
                                <bibl>
                                  <hi rend="bold">Budin, Gerhard / Moerth, Karlheinz</hi> (2011): “Hooking up
                                  to the corpus: the Viennese Lexicographic Editor’s corpus interface”,in:
                                  Kosem, Iztok / Kosem, Karmen (eds.): <hi rend="italic">Electronic
                                  lexicography in the 21st century</hi>. New applications for new users.
                                  Proceedings of eLex 2011 conference. Bled, Slovenia: Trojina, Institute for
                                  Applied Slovene Studies 52-59.</bibl>
                                  <bibl><hi rend="bold">Budin, Gerhard / Moerth, Karlheinz / Durco, Matej</hi>
                                  (2013): “European Lexicography Infrastructure Components”, in: Kosem, Iztok
                                  / Kallas, Jelena / Gantar, Polona / Krek, Simon / Langemets, Margit /
                                  Tuulik, Maria (eds.): <hi rend="italic">Electronic lexicography in the 21st
                                  century: thinking outside the paper</hi>. Proceedings of the eLex 2013
                                  conference, 17-19 October 2013. Tallin, Estonia: Trojina, Institute for
                                  Applied Slovene Studies / Eesti Keele Instituut 76-92. </bibl>
                                  <bibl>
                                    <hi rend="bold">Declerck, Thierry / Lendvai, Pirsoka / Moerth,
                                      Karlheinz</hi> (2013): “Collaborative Tools: From Wiktionary to LMF, for
                                      Synchronic and Diachronic Language Data”, in: Francopoulo, Gil (ed.): <hi
                                      rend="italic">LMF</hi>. Lexical Markup Framework. London / Hoboken: John
                                      Wiley &amp; Sons 175-186. </bibl>
                                      <bibl>
                                        <hi rend="bold">Declerck, Thierry / Moerth, Karlheinz / Wandl-Vogt,
                                          Eveline</hi> (2014): “A SKOS-based Schema for TEI encoded Dictionaries
                                          at ICLTT”, in: <hi rend="italic">LREC 2014, Ninth International Conference
                                          on Language Resources and Evaluation</hi>. Reykjavik, Iceland: European
                                          Language Resources Association 414-417.</bibl>
                                        </listBibl>
                                      </div>
                                    </back>
                                  </text>
                                </TEI>