Acta Periodica Duellatorum, Scholarly section
DOI 10.1515/apd-2015-0008
Translation memory and computer assisted
translation tool for medieval texts
Attila Törcsvári, Ars Ensis
Abstract – Translation memories (TMs), as part of Computer Assisted Translation (CAT) tools, help translators reuse portions of previously translated text. Fencing books are good candidates for using TMs due to the high number of repeated terms. Medieval texts suffer from a number of drawbacks that make even a “simple” rewording into the modern version of the same language hard. The difficulties analyzed are the lack of systematic spelling, unusual word order and typos in the original. A hypothesis is made and verified that even simple modernization increases legibility and is feasible, and that it is worthwhile to apply translation memories because of the numerous and sometimes extremely long repeated terms. Therefore, methods and algorithms are presented (1) for the automated transcription of medieval texts (when a limited training set is available), and (2) for the collection of repeated patterns. The efficiency of the algorithms is analyzed in terms of recall and precision.
Keywords – natural language processing, translation memories, computer
assisted translation
I. INTRODUCTION
Reconstructing the meaning of medieval texts, especially codices, treatises and Hausbuchs, which are often available only as manuscripts, is always a challenging task in a number of respects. The transliteration of handwriting, the peculiarities of the local dialect, not yet standardized spelling, simple typos and colloquial style (syntactical difficulties, so to say) are all obvious obstacles that precede, in order and relevance, the final aim: the interpretation of the content itself. In the present case this means understanding, physically testing, and using in training and in practice the actions and techniques described in Fechtbuchs. Apart from media-rich content such as training videos, this interpretation is first manifested in written form: either as a translation into the modern version of the language in which the text was originally written (called modernization in this paper), or as a translation into another language.
However, producing this “written form”, even the modernization, is not free from an interpretative attitude on the part of the experts of the field, precisely because the transcriber wishes to provide an understandable text for the benefit of readers, who are not expected to make all the effort of resolving certain issues in the original. An unquestionably important merit of this interpretative attitude during transcription is that it is, in effect, a kind of translation: the replacement of obsolete terms with their contemporary counterparts.
Another challenge during both modernization and translation is to achieve a certain consistency, so that the same terms and expressions of the original are represented in the same way in the transcribed or translated text, at least wherever the context allows.
The effort to analyze and propose possible solutions for supporting modernization and translation was not made without practical reasons; we have kept in mind the primary goal of translating Johannes Lecküchner’s “künst vnd zedel ym messer” [i] (Lecküchner [1482]), "The Art of Messer Fencing", Cgm 582, into Hungarian. This fencing manual was completed in 1482 as a beautifully illustrated manuscript, based on an earlier manuscript by the same author. The text was transcribed and published by Carsten and Julia Lorbeer, Johann Heim, Robert Brunner and Alexander Kiermayer at http://www.pragmatische-schriftlichkeit.de/cgm582.html [ii] (Lorbeer et al [2006]), and is used with the permission of the authors (footnote 1).
In this paper I present a method for automating the modernization of medieval German texts to contemporary spelling and vocabulary, and techniques for reducing translation work and achieving consistent translations by using translation memories.
II. ANALYSIS OF THE CORPUS, THE CURRENT TRANSLATION PRACTICE AND STATISTICS
A set of well-known Fechtbuchs was analyzed to assess the feasibility and possible benefits of a computer assisted approach to transcription and translation.
1. Analysis of the effect of modernization
Our primary target was the translation of the original Early New High German [ENHG] text. The transcription was made by a team of researchers, as mentioned above. They used “a computer aided approach to find transcription errors by counting and finding all variations of all words in the text”. The scientific version of the transcription published in [ii] (Lorbeer et al [2006]) contains all the notations and clarifications made on the original text, with the highest respect not only for the original, but also for the pronunciation, usual spelling and abbreviations of that time.
A considerable part of the text was translated into modern German and published by Falko Fritz (footnote 2):

             Pages               Paragraphs
Original     432                 874
Modernized   79 complete pages   259
Ratio        18%                 30%
1 “after talking to all co-authors we give you permission to analyze our transcription of cgm582, quote some parts in your scientific article and to translate it to Hungarian.” (Carsten Lorbeer)
2 http://www.hammaborg.de/de/transkriptionen/leckuechner_cgm582/index.php, as downloaded in September 2012
Though the modern German translation covers a fair share of about 30% (the most important parts), relevant techniques were still found to be detailed in the non-modernized part. In order to estimate the difficulty of reading non-modernized text, a simple test was carried out with a native German speaker trained in proofreading and checking documents under various conditions.
Sections of similar size were selected from the translated and the non-translated part (with notations removed, but additions provided by the translators kept). To measure the effect of getting used to the spelling and learning the vocabulary of the text, a training page was given to the reader in both cases. As a third test, a piece of text was manually modernized. The time required for simple reading was measured.
                      Training page    Test page
Translated            1'20"            1'23"
Original              2'03" (~150%)    1'46" (~125%)
Manually modernized   1'44" (~125%)    1'20" ( )
From the tests we concluded that, as expected, interpreting the spelling and vocabulary of the original 15th-century text causes measurable difficulty (+25%) for the reader, even after training.
A more interesting result was that the manually modernized text was read at about the same speed as the translated text, at least after the training.
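The relative reading times can be recomputed from the measured times in the table above. The following Python sketch is only an illustration: the helper names are hypothetical, and comparing each time with the translated text in the same column is an assumption about how the percentages were derived.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a stopwatch reading (minutes, seconds) into seconds."""
    return minutes * 60 + seconds


def relative_time(measured: int, baseline: int) -> float:
    """Return the measured time as a percentage of the baseline time."""
    return 100.0 * measured / baseline


# Measured times from the table above: (training page, test page).
times = {
    "translated": (to_seconds(1, 20), to_seconds(1, 23)),
    "original": (to_seconds(2, 3), to_seconds(1, 46)),
    "manually modernized": (to_seconds(1, 44), to_seconds(1, 20)),
}

# Assumption: the translated text is the baseline within each column.
baseline_training, baseline_test = times["translated"]
for label, (training, test) in times.items():
    print(f"{label}: "
          f"training {relative_time(training, baseline_training):.0f}%, "
          f"test {relative_time(test, baseline_test):.0f}%")
```

Under this assumption the unrounded ratios come out slightly above the rounded figures given in the table (for example about 154% rather than ~150% for the original training page).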
2. Translation
2.1 Modernization and translation issues detected in well-known texts are given
below as examples
1. Peter von Danzig, Longsword (footnote 3), 20v

Original:   … mit dem rechten fuess…
            …vnd spring mit dem rechten fuess hinder seinen lincken füeß…
Modernized: …mit dem rechten Fuß…
            …Spring mit deinem rechten hinter seinen linken Fuß…
5
1
In the above case, the original text seems to contain a superfluous word (the repeated “fuess”), but it may be considered (footnote 4) more accurate than the modernized version.
2. Joachim Meyer, Longsword, „Gründtliche Beschreibung…”, ed. 1570 (footnote 5)

VIrv, Ochs (Ox)
3 http://www.hammaborg.de/en/transkriptionen/peter_von_danzig/02_langes_schwert.php, as downloaded in September 2012
4 Personal interpretation of the author of this article, marked with italics in this article.
5 http://wiktenauer.com/wiki/Joachim_Me%C3%BFer/Longsword, as downloaded in September 2012
Original:    …Zum Lincken Ochsen schick dich disem zugegen / nemlich trit mit dem Rechten Fuß vor…
Translation: …For the Left Ox reverse this, namely stand with your Right Foot forward…

XVIv, Vom versetzen ein nützliche vermanung (Of Displacing, a useful concept)
Original:    Schick dich in die Zornhut / wirt denn auff dich von Oben her gehauwen / so trit mit dem Rechten fuß
Translation: Place yourself into the Wrathful Guard, if you are then struck from above, then step with the right foot forward…

XXrv, Zirckel (Circle)
Original:    …wischt er als dann mit den Armen undersich dem Schwerdt nach / so trit mit dem Rechten fuß wol beseits auff sein Rechte seiten…
Translation: …the sword thus clips him with your arms under yourself, then step with the right foot to take on his right side…
It is clear that the translator used different translations of the word “trit” (step) for a good reason: in the first case the translator chose a static concept (stand), since the passage describes a stance, while in the second a motion verb (step) was used, expressing the movement of the foot. Therefore, it is obviously not a translation mistake to use “stand” instead of “step”. However, even for a stance, one must indeed make a step to reach the proper position from the previously described Right Ox. Taking into consideration the training concept, which seems to have been the original intention of the esteemed Author according to the Introduction (footnote 6), it may be a more appropriate translation to take a “step” rather than “standing” with the right foot forward.
These cases are not translation mistakes at all, but can be considered immediate interpretations made during translation, either “undertranslations” or “overtranslations”.
Judging each such case, if it is discovered at all, takes the reader some time. Naturally, this time bears no comparison to the time the translators save us by providing already digested content. However, it may be more faithful to the original to provide a translation that is consistent, or inconsistent, to the same extent as the original, or at least to mark such cases in the translated version.
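Cases like these can be surfaced mechanically once it is recorded how a source term was rendered in each segment. The following Python sketch is a minimal illustration and not the method of this paper: the function name and the data layout (one pair of source term and chosen rendering per occurrence) are assumptions, and the sample values are taken from the Meyer passages quoted above.

```python
from collections import defaultdict


def find_inconsistent_renderings(choices):
    """Group target renderings by source term; keep terms with more than one."""
    renderings = defaultdict(set)
    for source_term, target_rendering in choices:
        # Compare case-insensitively so "Trit" and "trit" count as one term.
        renderings[source_term.lower()].add(target_rendering.lower())
    return {term: sorted(targets)
            for term, targets in renderings.items() if len(targets) > 1}


# (source term, rendering chosen by the translator), one pair per occurrence.
choices = [
    ("trit", "stand"),  # VIrv
    ("trit", "step"),   # XVIv
    ("trit", "step"),   # XXrv
    ("Ochs", "Ox"),
]

print(find_inconsistent_renderings(choices))
# prints: {'trit': ['stand', 'step']}
```

A flagged term is not necessarily an error. As argued above, the choice may be a deliberate interpretation, so such a list is only a prompt for review or for marking the passage.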
3. Statistics
A simple statistic was compiled to estimate the possible reuse ratio of terms in the translated part (up to page XL) of Meyer’s Fechtbuch. The terms were selected after removing a minimal set of German stopwords (only the definite and indefinite articles). The file was about 55 kB in size, and about 350 distinct terms were found that occur more than once. The most frequent one, “gegen seiner Lincken”, occurs 19 times and,
6 “… from your clarity attain and exude the proper judgement in Stance and Strikes so that Youth will not have to learn this art unguided because of your unspoken word…” / “…wie sie soll auß den erklerten häuen und Legern ins werck gericht werden / auff das nit allein die Jugend so sich auff solche kunst zubegeben willens / durch solche inen unbekandte wort…” (translated by Mike Rasmusson)
interestingly (footnote 7), “gegen seiner Rechten” occurs only 10 times. Translating each repeated term only once would save about 8 kB, which is about 14% of the original. The corpus also contains some extremely long repeated n-grams, composed of 12 consecutive words.
This matched my expectations, given the narrow scope of the Fechtbuchs and the disciplined wording of the author.
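A count of this kind can be sketched as follows. The Python example is an illustration under stated assumptions, not the procedure actually used for the statistics above: the tokenization, the short article list standing in for the removed stopwords, and the n-gram range are hypothetical choices, and the sample string is constructed from phrases quoted in this paper rather than taken from the corpus.

```python
import re
from collections import Counter

# Assumption: a small set of German definite and indefinite articles.
ARTICLES = {"der", "die", "das", "den", "dem", "des",
            "ein", "eine", "einen", "einem", "einer", "eines"}


def tokenize(text):
    """Lowercase the text, split it into words and drop the articles."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in ARTICLES]


def repeated_ngrams(text, min_n=2, max_n=12):
    """Count every n-gram (min_n..max_n words) that occurs more than once."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c > 1}


# Constructed sample, not a quotation from the corpus.
sample = ("trit mit dem Rechten Fuß gegen seiner Lincken / "
          "trit mit dem Rechten Fuß gegen seiner Rechten")
for gram, count in sorted(repeated_ngrams(sample).items()):
    print(count, gram)
```

Shorter n-grams contained in a longer repeated n-gram are reported as well, so a real collection step would probably keep only the longest match at each position.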
4. Conclusion
From the statistics on modernization one can deduce that providing a modernized version decreases reading time. The subsequent chapter shows that automating modernization is feasible.
The (minor) translation inconsistencies shown above can be addressed, and considerable translation work, or at least typing work, can be saved, by using translation memories.
III. AUTOMATING MODERNIZATION
The German original, written in Early New High German, looks somewhat unfamiliar to contemporary readers, and also to computers. On average, about 50% of the words (19k of 38k) are reported as spelling errors (footnote 8).
The baseline translation from the original into English using Google Translate resulted, as expected, in a poor translation: about 32% of the words (279 of 860, using the chapter about Zornhau) were not found. For comparison, in the manually modernized version only about 3% of the words could not be translated into English by Google Translate.
None of this is surprising, given both the basic and the less obvious spelling issues.
At first sight, many of the problems can be resolved by simple word-by-word or, at most, pattern-based replacement:
vechten  →  fechten
yn       →  ihn
vnd      →  und
seytten  →  seiten
As seen above, a simple word-to-word rewriting procedure will produce a text that is easier to understand, provided that a suitable dictionary can be obtained or built from scratch.
Producing the test samples by hand took nearly 5 minutes per page. Automation therefore seemed necessary.
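A word-to-word rewriting of this kind can be sketched in a few lines. The Python example below is only an illustration under stated assumptions and does not claim to reproduce the author's algorithm: the dictionary holds just the four pairs listed above, the function name is hypothetical, and words missing from the dictionary are left unchanged.

```python
import re

# Assumption: a tiny dictionary built from the four pairs listed above.
MODERN_SPELLING = {
    "vechten": "fechten",
    "yn": "ihn",
    "vnd": "und",
    "seytten": "seiten",
}


def modernize(text, dictionary=MODERN_SPELLING):
    """Replace each known word by its modern spelling, keep the rest as is."""
    def replace(match):
        word = match.group(0)
        modern = dictionary.get(word.lower())
        if modern is None:
            return word  # unknown word: leave untouched
        # Preserve an initial capital of the original word.
        return modern.capitalize() if word[0].isupper() else modern
    return re.sub(r"\w+", replace, text)


print(modernize("vnd spring mit dem rechten fuess"))
# prints: und spring mit dem rechten fuess
```

A plain lookup handles only spellings it has already seen. Unseen variants would need the pattern-based replacement mentioned above.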
7 Besides being a mere statistical fact, the latter finding is very important for a fencer, since it points out a main characteristic of Meyer’s school of fence. It may be worth a detailed study to compare the various Fechtbuchs from this aspect.
8 Using the Microsoft Word German (Germany) spell checker.