<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="vorträge-056">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Technical and social Infrastructures for the Humanities: The Example of the Dagaare-English-Cantonese Dictionary</title>
<author>
<name>
<surname>Bodomo</surname>
<forename>Adams</forename>
</name>
<affiliation>Universität Wien, Institut für Afrikawissenschaften; AT</affiliation>
<email>[email protected]</email>
</author>
<author>
<name>
<surname>Wandl-Vogt</surname>
<forename>Eveline</forename>
</name>
<affiliation>Österreichische Akademie der Wissenschaften, Austrian Centre for Digital Humanities; AT</affiliation>
<email>[email protected]</email>
</author>
<author>
<name>
<surname>Mörth</surname>
<forename>Karlheinz</forename>
</name>
<affiliation>Österreichische Akademie der Wissenschaften, Austrian Centre for Digital Humanities; AT</affiliation>
<email>[email protected]</email>
</author>
</titleStmt>
<editionStmt>
<edition>
<date>2015-10-17T16:04:32.009000000</date>
</edition>
</editionStmt>
<publicationStmt>
<publisher>Elisabeth Burr, Universität Leipzig</publisher>
<address>
<addrLine>Beethovenstr. 15</addrLine>
<addrLine>04107 Leipzig</addrLine>
<addrLine>Deutschland</addrLine>
<addrLine>Elisabeth Burr</addrLine>
</address>
</publicationStmt>
<sourceDesc>
<p>Converted from an OASIS Open Document</p>
</sourceDesc>
</fileDesc>
<encodingDesc>
<appInfo>
<application ident="DHCONVALIDATOR" version="1.17">
<label>DHConvalidator</label>
</application>
</appInfo>
</encodingDesc>
<profileDesc>
<textClass>
<keywords scheme="ConfTool" n="category">
<term>Vortrag</term>
</keywords>
<keywords scheme="ConfTool" n="subcategory">
<term></term>
</keywords>
<keywords scheme="ConfTool" n="keywords">
<term>Multilingualität; Multikulturalität; Diversity</term>
</keywords>
<keywords scheme="ConfTool" n="topics">
<term>Teilen</term>
<term>Umwandlung</term>
<term>Strukturanalyse</term>
<term>Annotieren</term>
<term>Bereinigung</term>
<term>Veröffentlichung</term>
<term>Kollaboration</term>
<term>Daten</term>
<term>Standards</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<text>
<body>
<div type="div1" rend="DH-Heading">
<head>Introduction</head>
<p>This paper describes the transformation of the Dagaare – Cantonese – English
dictionary into an open, online research resource within the framework of
European research infrastructures. In doing so, it opens these infrastructures
to non-European researchers, research data and topics.</p>
<p>The trilingual dictionary is designed for use in lexicographical and linguistic
field methods training. It serves as a database to illustrate many linguistic
principles and phenomena in phonology, morphology, syntax and semantics. First
and foremost, it is intended as a reference source for Chinese- and
English-speaking students. <lb/>Dagaare is a language spoken in Ghana and Burkina Faso
by about two million people. It belongs to the Gur branch of the Niger-Congo
family. Although Dagaare is genetically unrelated to Chinese, the two languages
share some interesting typological features that allow them to be compared. For
example, both Dagaare and Chinese are tone languages, although the printed
dictionary unfortunately includes no audio files to illustrate this. But while
Chinese has a complex system of four to nine tonemes, Dagaare, like most West
African languages, has a two-tone system. The first part of the dictionary
provides information on the orthography and sound system of Dagaare, followed by
an explanation of the verbal and nominal morphology of the language. Part two
is the dictionary proper, which comprises more than a thousand headwords and a
total of 3,000 to 4,000 words. The lexicon is followed by sample fieldwork
projects intended to aid both the field trainer and the trainee.
They cover the areas of phonology, morphosyntax, lexical semantics and
sociolinguistics. </p>
<p>The lexicographical data described above were to be made sustainably
available on the internet. To this end, they had to be transferred
into an existing infrastructure, the research infrastructure for lexicography
available at the Austrian Centre for Digital Humanities (ACDH) of the Austrian
Academy of Sciences. The Academy has a long-standing tradition in lexicography,
to which several departments have contributed over more than a hundred years.
Since 1 January 2015, the ACDH has hosted a research group on eLexicography, the
lexicography laboratory, which supports, coordinates and methodically explores
experimental scholarship in the field of lexicography. <lb/>The emerging
infrastructure is made up of several components: (1) an editor, (2) a formalised
encoding framework, (3) a depositing back end and (4) a publishing system, all
of which have been integrated into one system. A guiding principle in this
endeavour has been modularisation: the system is not a single piece of
software but a number of complementary components that interlock through
clearly defined interfaces. </p>
</div>
<div type="div1" rend="DH-Heading">
<head>Workflow</head>
<p>The work of integrating the lexicographical data into the infrastructure was
performed by the eLexicography working group of the ACDH. The workflow is a
five-step procedure: <lb/>(1) analysing and discussing the research focus and data
structure <lb/>(2) converting the data from a simple table into a
standards-based XML format (<ref target="http://www.tei-c.org/release/doc/tei-p5-doc/en/Guidelines.pdf">TEI P5</ref>)<lb/>(3) importing the data into
the database <lb/>(4) manual post-processing <lb/>(5) publication on the
internet. <lb/>At the end of the process, the data will be persistently
available. </p>
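<p>As an illustration of step (2), the following is a minimal sketch of what a single
table row might look like after conversion. It assumes that the source table has one
column each for the Dagaare headword, the part of speech, the English gloss and the
Cantonese gloss. The upper-case values are placeholders, not data from the dictionary,
and the element choices follow the general TEI P5 dictionary module rather than the
specific ACDH schema, which may differ:</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<!-- Hypothetical entry: one table row mapped to one TEI entry -->
<entry>
<form type="lemma">
<!-- "dga" is the ISO 639-3 code for Southern Dagaare -->
<orth xml:lang="dga">DAGAARE-HEADWORD</orth>
</form>
<gramGrp>
<pos>PART-OF-SPEECH</pos>
</gramGrp>
<sense>
<cit type="translation" xml:lang="en">
<quote>ENGLISH-GLOSS</quote>
</cit>
<!-- "yue" is the ISO 639-3 code for Cantonese -->
<cit type="translation" xml:lang="yue">
<quote>CANTONESE-GLOSS</quote>
</cit>
</sense>
</entry>
</egXML>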
</div>
<div type="div1" rend="DH-Heading">
<head>Editor</head>
<p>The <ref target="http://www.oeaw.ac.at/acdh/vle">Viennese Lexicographical Editor</ref> (VLE) is a fairly new piece
of software that first came into existence as a by-product of an entirely
different development activity: the creation of an interactive online learning
system for university students. Thus, it was first used in a collaborative
glossary editing project carried out as part of university language courses at
the University of Vienna. As the tool proved to be flexible and adaptable
enough, it was also used and further developed in a number of other projects
collecting lexical data. </p>
<p>The interface is built around an XML editor that allows users to process
standards-based lexicographic and terminological data. In principle, any
XML-based format, such as LMF, TBX, RDF or TEI, can be handled. The program
provides a number of useful functions to automate editing procedures. It can
check the structural integrity (well-formedness) of input on the fly.
Technologically, it draws not only on the XML core specification but also on
several cognate technologies. XSLT and XPath play an important role both for
visualising and for modifying existing datasets. Lexicographers can insert
elements on the basis of predefined XML schemas. Most of the functions can be
applied both to single and to multiple lemmas. Recent improvements include a
versioning system and a working mode that allows lexicographers to work on the
XML data without actually seeing the tags. Furthermore, the editor has a
configurable interface that enables lexicographers to access external corpora
and to integrate example sentences from them into dictionary entries. The
communication between the dictionary client and the server has been implemented
as a RESTful web service.</p>
<p>The tool forms part of Austria’s contribution to the pan-European <ref target="http://www.clarin.eu/">CLARIN</ref> Research Infrastructure
Consortium and is freely available from the <ref target="http://www.oeaw.ac.at/acdh/vle">ACDH Website</ref>.</p>
</div>
<div type="div1" rend="DH-Heading">
<head>Formalised encoding framework</head>
<p>While the list of formats used in the lexicographic community is unfortunately
very long, there is a de facto standard that has been used widely in many
digital humanities projects, in numerous lexicographic projects and in most of the
ACDH’s lexicographic endeavours: the Guidelines of the Text Encoding Initiative
(TEI). The application of digital (de facto) standards in building digital
language resources is of particular concern for the interoperability
and re-usability of resources. The buzzword of open life cycles for research
data will remain meaningless unless researchers succeed in achieving a certain
degree of harmonisation in structuring their data and
metadata. The ACDH has been working on specialised schemata based on the
TEI (P5) dictionary module for quite some time. In all these efforts, it
has also aimed at a high degree of interoperability with the ISO standard
<ref target="http://www.lexicalmarkupframework.org/">LMF</ref> (Lexical
Markup Framework). In order to realise mechanisms for cross-dictionary
access, it has also been working with semantic technologies such as RDF
and SKOS.
</p>
<p> The basic reduced TEI schemata have been documented in the form of guidelines,
which give detailed accounts of how dictionaries in the ACDH collection were encoded.
These guidelines document and discuss the schema and furnish a number of
examples taken from actual dictionaries. The guide is aimed both at
lexicographers working on ACDH projects and at others who might
want to work along similar lines. The guidelines were themselves
produced using the TEI framework. </p>
</div>
<div type="div1" rend="DH-Heading">
<head>Depositing infrastructure</head>
<p>The dictionary editor is a web-based application that allows lexicographers to work in groups. The data is stored on a server of the CLARIN Centre Vienna. Because this server is part of an official infrastructure, long-term availability is ensured. In addition, the owners of the lexicographical data can retrieve copies of their dictionaries at any stage of the compilation process. </p>
</div>
<div type="div1" rend="DH-Heading">
<head>Publishing framework</head>
<p>The publishing infrastructure builds on
<emph>corpus_shell</emph>, a service-oriented architecture designed for a distributed and heterogeneous virtual landscape. The core functionality of this modular framework is to expose well-defined interfaces based on acknowledged standards. The principal idea behind the architecture is to decouple the modules serving data from the user-interface components. A convenient feature of the system is that new interfaces can be built almost on the fly via XSLT stylesheets.
</p>
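<p>The following is a minimal sketch of how such an XSLT stylesheet might render TEI
dictionary entries as an HTML list. It is not taken from corpus_shell. The element
paths (form/orth and sense/cit/quote) assume a generic TEI P5 dictionary encoding and
would need to be adjusted to the schema actually used:</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:tei="http://www.tei-c.org/ns/1.0"
xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="tei">
<!-- Wrap all dictionary entries of the document in one HTML list -->
<xsl:template match="/">
<ul>
<xsl:apply-templates select="//tei:entry"/>
</ul>
</xsl:template>
<!-- One list item per entry: headword in bold, then its translations -->
<xsl:template match="tei:entry">
<li>
<b><xsl:value-of select="tei:form/tei:orth"/></b>
<xsl:text>: </xsl:text>
<xsl:for-each select="tei:sense/tei:cit[@type='translation']">
<xsl:value-of select="tei:quote"/>
<!-- Separate translations with a semicolon, except after the last one -->
<xsl:if test="position() != last()">
<xsl:text>; </xsl:text>
</xsl:if>
</xsl:for-each>
</li>
</xsl:template>
</xsl:stylesheet>
</egXML>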
</div>
<div type="div1" rend="DH-Heading">
<head>Status and outlook</head>
<p>The conversion and import of the data have already been carried out, and the Dagaare – English – Cantonese Dictionary is already available online. However, the working group is still improving the web interface; a stable URL will be assigned by the end of 2015.</p>
<p>The Dagaare – English – Cantonese Dictionary is about to be improved both in content and in terms of collaboration and social infrastructure:</p>
<p>(1) Audio files are to be added, mainly to support the representation of the tone languages Dagaare and Cantonese. To this end, we are extending the network of contributors to the project to include:</p>
<p>(a) free and open Wikimedia audio tools,</p>
<p>(b) high-performance audio tools, e.g. those supported by Forschungszentrum
Telekommunikation Wien (FTW, <ref target="http://www.ftw.at/"
>http://www.ftw.at/</ref>) or the Phonogrammarchiv at the AAS
(<ref target="http://www.phonogrammarchiv.at/">http://www.phonogrammarchiv.at/</ref>), and</p>
<p>(c) native speakers of Cantonese. (Adams Bodomo himself is a native speaker of Dagaare.)</p>
<p>(2) The dictionary will be fully embedded into the lexicographical research infrastructure of the ACDH as well as the research of the Institut für Afrikawissenschaften at the University of Vienna. This implies both:</p>
<p>(a) experimental development of the dictionary content by applying methods from other
disciplines, e.g. Natural Language Processing and Semantic Technologies, for
interlinking with other dictionaries, semi-automatic translation into other
languages (starting with German) and connecting with cultural content such as
songs;</p>
<p>(b) embedding the dictionary in a research framework for African Diaspora studies.</p>
<p>In doing so, the representatives of both institutes open up to collaboration
among global communities from several disciplines that have not been in touch
until now. </p>
</div>
</body>
<back>
<div type="bibliogr">
<listBibl>
<head>Bibliography</head>
<bibl>
<hi rend="bold">Bodomo, Adams</hi> (2004): <hi rend="italic">Dagaare –
Cantonese – English Dictionary for Lexicographical Field Research
Training</hi> (= Afrikawissenschaftliche Lehrbücher 14). Köln: Köppe. </bibl>
<bibl>
<hi rend="bold">Bodomo, Adams / Mora, Manolete</hi> (2007): “Documenting
Spoken and Sung Texts of the Dagaaba of West Africa”, in: <hi rend="italic"
>Empirical Musicology Review</hi> 2, 3: 81-102. </bibl>
<bibl>
<hi rend="bold">Budin, Gerhard / Majewski, Stefan / Moerth, Karlheinz</hi>
(2012): “Creating Lexical Resources in TEI P5”, in: <hi rend="italic"
>Journal of the Text Encoding Initiative</hi> 3 <ref
target="https://jtei.revues.org/522">https://jtei.revues.org/522</ref>
[last accessed 8 February 2016]. </bibl>
<bibl>
<hi rend="bold">Budin, Gerhard / Moerth, Karlheinz</hi> (2011): “Hooking up
to the corpus: the Viennese Lexicographic Editor’s corpus interface”, in:
Kosem, Iztok / Kosem, Karmen (eds.): <hi rend="italic">Electronic
lexicography in the 21st century</hi>. New applications for new users.
Proceedings of eLex 2011 conference. Bled, Slovenia: Trojina, Institute for
Applied Slovene Studies 52-59.</bibl>
<bibl><hi rend="bold">Budin, Gerhard / Moerth, Karlheinz / Durco, Matej</hi>
(2013): “European Lexicography Infrastructure Components”, in: Kosem, Iztok
/ Kallas, Jelena / Gantar, Polona / Krek, Simon / Langemets, Margit /
Tuulik, Maria (eds.): <hi rend="italic">Electronic lexicography in the 21st
century: thinking outside the paper</hi>. Proceedings of the eLex 2013
conference, 17-19 October 2013. Tallinn, Estonia: Trojina, Institute for
Applied Slovene Studies / Eesti Keele Instituut 76-92. </bibl>
<bibl>
<hi rend="bold">Declerck, Thierry / Lendvai, Piroska / Moerth,
Karlheinz</hi> (2013): “Collaborative Tools: From Wiktionary to LMF, for
Synchronic and Diachronic Language Data”, in: Francopoulo, Gil (ed.): <hi
rend="italic">LMF</hi>. Lexical Markup Framework. London / Hoboken: John
Wiley &amp; Sons 175-186. </bibl>
<bibl>
<hi rend="bold">Declerck, Thierry / Moerth, Karlheinz / Wandl-Vogt,
Eveline</hi> (2014): “A SKOS-based Schema for TEI encoded Dictionaries
at ICLTT”, in: <hi rend="italic">LREC 2014, Ninth International Conference
on Language Resources and Evaluation</hi>. Reykjavik, Iceland: European
Language Resources Association 414-417.</bibl>
</listBibl>
</div>
</back>
</text>
</TEI>