VIP: Voice-Text Integrated System for Interpreters

Gloria  Corpas Pastor

VIP: Voice-Text Integrated System for Interpreters

Gloria Corpas Pastor

2017, Proceedings of the 39th Conference Translating and the Computer

visibility

…

description

130 pages

Abstract

This paper introduces VIP, an R&D project that explores the impact and feasibility of using Human Language Technology (HLT) and Natural Language Processing (NLP) for interpreting training, practice and research. This project aims at filling the gap in and addressing the pressing need for technology in general for interpreters, which is reported to be scarce. Although most interpreters are unaware of interpreting technologies or are reluctant to use them, there are some tools and resources already available, mainly computer-assisted interpreting (CAI) tools. VIP is working on the development of technology and cutting-edge research with the potential to revolutionise the interpreting industry by lowering costs for interpreter training, fostering an online community which shares, generates and cultivates interpreting resources; and providing an efficient interpreter workbench tool (computer assisted interpreting software). Available at: http://www.tradulex.com/varia/TC39-london2017.pdf

Key takeaways
AI

VIP aims to enhance interpreter training using Human Language Technology (HLT) and Natural Language Processing (NLP).
The Microsoft ASR system outperformed Google ASR by achieving Word Error Rates (WER) of 30-56%.
Miro Translate is a cloud-based tool designed to streamline the subtitling process for educational videos.
The study identifies the need for an integrated platform to improve interpreter productivity and efficiency.
Terminology management tools must evolve to better serve interpreters' needs in real-time contexts.

Figures (44)

Users login to the Miro Translate platform, upload a given video and select the source language. The audio is sent to the Microsoft Translator Speech API to perform automatic speech recognition and generate automatic captions. The output is displayed on the Miro Translate platform, where a set of functionalities are available so users can edit imperfect transcriptions. The figure below shows a simplified diagram of the main components of Miro Translate. ASR system

Figure 2: Screenshot of the Miro Translate interface. Considering these general criteria, Miro Translate will incorporate the following features into its interface as shown in Figure 2:

Table 2: Selection of sentences for BTEC-50 - Examples I The selection has been conducted as follows. We have selected the sentences according to the first evaluation results. When the SMT output and the NMT output received contrasted scores, that is to say 1 vs. 3, the source sentence and the SMT and NMT outputs were added to the BTEC-50. Also some of the source sentences for which the system outputs received less contrasted scores were added to the score in order to see which average or bad scores were better accepted by students according to the systems. Finally some sentences for which the SMT and NMT outputs received both a good score, i.e. 3, were added. An overview of the selection done for creating BTEC-50 appears in Table 2.

Table 4: linguistic error analysis Having completed the first manual human evaluation, we proceeded to linguistic error analysis, using the error type from the Vilar's (2006) typology. Table 4 below shows the error types, and sub-error types encountered for each system, as well as the number of times that they occur.

We have evaluated the two systems (Bérard et al., 2016) using BLEU, TER and Meteor 1.4 metrics which results are shown in Table 5 hereafter. Table 5: BLEU/TER/Meteor 1.4 mono-reference scores

From the scores given by each participant for each sentence, we have computed a mean pe sentence as well as the related standard deviation. Then we have calculated the mean of all th scores per sentences, per participants. Results are shown in table 6. Table 6: Participants’ evaluation from 1 very bad to 4 very good Participants have equally judged both system with a mean of 2.166 for the SMT system and one of 2,310 for the NMT system. This means that the participants have evaluated both systems as bad, which is equivalent to a score of 2. This confirms the assumption as well as the results of the perception assessment presented in section 3.2 that students have a negative or low perception of MT systems. Nevertheless, when focusing only on the most contrasted translations, the NMT system is evaluated as almost good, thus increasing its mean, while there is a slight decrease for the SMT mean. On the whole, the experiment returns negative perceptions, regardless of the type of MT, statistical or neuronal, even if the NMT system slightly outpaces the SMT system.

Table 7: Student evaluation from | very bad to 4 very good

Appendix A: Experiment for Participant in the Master's degree in Multilingual Specialized Translation

The tool has been designed with an open interface between the CAI tool and the ASR system of choice, provided the ASR system meets the features described in the previous section. The specially designed open structure allows users to choose the ASR engine with the best quality output for the source language, domain, and operative system without having to change o1 adapt the CAI interface. Since the tool is based mostly on language-independent algorithms, for example to deal with morphological variants (database query) and to identify numbers and acronyms, the prototype supports the integration of ASR for any input language.

comprises 421 entries and has the size of a typical glossary compiled by interpreters. For this experiment, the terminological units under investigation are defined as the one- and multi-word terms that are present both in the speeches and in the glossary. The system will be tested on this set of terms (119) as well as on the numerals (11) contained in the texts. The latency of the ASR system was not object of testing. Table 1 reports the metrics of the texts. Ideally, the system should reach a high recall. This would mean that it is able to recognize all terminological units of the transcribed speech, independent of the presence of orthographical differences. It should also have high precision, i.e. present a low number of undesired or erroneous results. This ensures that interpreters are not prompted with superfluous results which may cause distraction.

Table 2: Correctly transcribed terms and numerals Table 2 summarizes the ASR performance on the set of stimuli defined above. This result is obtained after importing the list of English specialized terms contained in the glossary. With a word error rate (WER) of 5.04% on the terminology list, the ASR system performs well in recognizing the terminological units. It is worth mentioning that importing the list of specialized words from the glossary contributed to decrease the WER from the initial value of 10.92%. The transcription of numerals was completed without errors.

Table 1. Ranking of tablet features. Bea aaa So a, Table 3 offers an overview of features interpreters seek in styluses. Tablet interpreters ap- preciate styluses that are comfortable, durable, and pair with tablets and applications. The sty- lus should write quickly and quietly, glide smoothly on the tablet, and feel natural. Recharge- able styluses that charge via USB or similar are preferred over styluses with replaceable bat- teries; fine-tipped nibs are preferred over softer, rubbery nibs. Features such as appearance and heft are slightly less important, while buttons offering additional functionalities, a built-in pen clip, and the ability to lodge the stylus inside the tablet are seen as even less important. For the third time, cost was among the least important features.

Table 2. Ranking of note-taking application features. 4.2 User evaluation of note-taking applications for consecutive interpreting

Table 4: User evaluation of note-taking applications for tablet interpreting

Figure 1: Meme posted on a translators’ group on Facebook, mocking the use of Google Translate. Mistranslations can be hilarious. In fact, social media have become ideal outlets to share pictures of clumsy food menus and mislabelled street signs, as well as screenshots of translation errors produced by machine translation (MT) engines such as Google Translate. People share them, laugh at them, and criticise them openly. Professional translators also participate in the debates about such translations. On a regular basis, translators on LinkedIn, Facebook and Twitter engage in discussions about mistranslations and how they show that MT is not comparable to human translation (see Figure 1). Translators use these spaces to voice their frustration with MT and the implications it has on their profession.

(c) Translators engage in a discussion about whether or not MT can be acceptable in specific circumstances. Some of them argue MT can be useful for small business without the resources to pay for a_ professional translation, while others stress the fact that accepting MT as a valid option means lowering the standards of the profession. MT can be useful for small business without

4.3 Automatic Sentiment Analysis To annotate our entire collection of tweets (see Section 4.1), we leveraged Baziotis et al.’s (2017) automatic sentiment classifier.'' Their system scored first in the SemEval 2017 shared task on classifying the overall sentiment of a tweet (task 4, subtask A), i.e., deciding whether it expresses positive, neutral or negative sentiment (see Rosenthal et al., 2017). It uses a deep LSTM network (Hochreiter and Schmidhuber, 1997) with attention (Rocktiaschel et al., 2015) and is completely data-driven: rather than relying on linguistic information and hand-crafted rules, the system learns to classify from large collections of manually annotated tweets. We trained the svstem on the original SemEval data. meaning it is not specifically seared to

Figure 1: Cognitive resource footprint for simultaneous interpreting (Seeber 2007:1385) Seeber applies his Cognitive Load Model to shadowing, sight translation, and simultaneous interpreting, providing a “cognitive resource footprint’ of the three activities (Seeber, 2007:1383).

Figure 2: Conflict matrix for simultaneous interpreting (Seeber 2007:1385)

Figure 3: Cognitive resource footprint for SI performed with a CAI tool or an electronic glossary In simultaneous interpreting, the listening and comprehension task recruits auditory-verbal resources and cognitive-verbal resources at the perceptual and cognitive stage. The production and monitoring task also mobilises auditory-verbal and cognitive-verbal resources at the perceptual and cognitive stage but requires additional vocal-verbal resources in the response stage. What happens when the interpreter looks up terminology in the booth?

The same can be done for terminology management solutions, which we expect to vary as for the degree of visual-spatial resources at the perception stage and manual-spatial resources at the response stage that the interpreter must recruit to successfully use the tool in question. We hypothesise that computer-assisted interpreting tools, like InterpretBank’, which we adopt

Developing a flexible mode! 1n which to record this knowledge as it accrues is a key objective for this research. Early experiment suggests that a model based upon surface Fig. 3 forms encountered in the text will provide the best framework for recording analyses in preference to attempting to populate predetermined categories of items. Each lexeme encountered by the parse window is stored as part of a developing Language Model (LM). Where parses have been attempted these are stored together with the form. As more forms and parses are added common patterns emerge. A morphological pattern may find support from a number of parses and may in turn generate candidate stem lemmata. If close cognates are identified which confirm the relationship implied by the morphological analysis then the model’s confidence in that analysis rises and it may begin to use these parses to drive further analyses as the parse window moves on through the text. It is expected that this aggregation of knowledge within the model will enable local parses to be extended across the model as a whole as patterns emerge which are found to be endemic within the text. It must, however, be recognised that errors, inevitably, creep in as a consequence of the limited processing context. Such errors are a particular concern for a scenario which seeks to exploit limited analyses. If we are to exploit these analyses it is important that we have confidence in them. Amount of Data

Figure 1. Overview of the SCATE user interface. (A) Sentence to translate, (B) edit field, (C) hybrid MT, (D) translation alternatives and (E) TM matches. At all times, the preceding (F) and subsequent sentences (G) remain visible. Vertical bars (I) can visualise active filters (H) such as difficulty, responsibilities and progress.

Figure 2. System of ‘sliders’ for the selection of MT, TM-MT integration or TM. To produce hybrid TM-MT suggestions, the MT system is constrained to use certain word sequences (or pre-translations) extracted from the TM matches. Initially, a four-stage alignment procedure is followed for each triplet of input sentence, TM source sentence and TM target sentence (see Figure 3). Step 1 identifies the overlapping spans between the input sentence and TM source sentence. Step 2 aligns the TM source sentence with the TM targe sentence at the word level using the automatic word alignment and lexical probabilities derived by GIZA++ (Och and Ney, 2003) and Moses. Step 3 identifies consistently aligned spans® for the TM source and target sentence, using the grow-diag-final heuristic (Koehn, 2009), and consistently aligned sub-spans of these spans are identified based on the same criteria. Finally, step 4 couples the consistently aligned spans between TM source and target to overlapping spans in the input sentence. see

ne FS The extracted spans are subsequently filtered (based on criteria such as minimum span length, occurrence of at least one content word, and percentage of aligned words), weighted (taking into account span length, span frequency across TM matches, and fuzzy match score of the strongest match in which the aligned span occurs) and ranked. The best ranked non- overlapping spans are added to the input sentence using XML markup, and these augmented input sentences are sent to Moses.

Translating and the Computer 39 16-17 November 2017 One Birdcage Walk, London Proceedings 2017. Editions Tradulex, Geneva c AsLing, The International Association for Advancement in Language Tehcnology This document is downloadable from www.tradulex.com and www.asling.org ii Acknowledgements AsLing wishes to thank and acknowledge the support of the sponsors of TC39: Gold Sponsor Silver Sponsor Silver Sponsor Exhibitor Media Sponsor iii Preface For the past 39 years the international conference, Translating and the Computer (TC), has been a unique forum for academics, users, developers and vendors of translation technology tools. It is a distinctive event where translators, interpreters, researchers and business people, from translation companies, international organisations, universities and research labs, as well as freelance professionals, come together to exchange ideas and learn about the latest developments in translation technologies. Over the last two decades various translation tools such as Translation Memory programs presented at previous TC conferences, have revolutionised the work of translators. Regrettably, the same cannot be said for the work of interpreters who have yet to benefit from suitable language technology tools and resources which could assist them in their work. Given this situation, at this year’s 39th TC conference we have decided to put an emphasis on the new and emerging language technologies, tools and resources which can support the work of interpreters. The panel ‘New Frontiers in Interpreting Technology’ features leading experts and practitioners in interpreting and the TC39 programme offers several talks presenting tools for interpreters. We firmly believe that the presentations and discussions on this topic will encourage the development of innovative tools which will revolutionise the work of interpreters in the near future, as has already been the case for translators. This year’s conference also features stimulating talks on Translation Technology topics central to TC conferences including but not limited to machine translation, post-editing, CAT tools and terminology. We are confident that you will find that all the presentations and posters, panels and workshops, will provide interesting user perspectives and opportunities, and will lead to inspiring discussions. We trust that these e-proceedings with in-depth coverage of many of the conference contributions, accepted after a competitive reviewing process, will be an important reference resource and stimulus for future work. We are delighted to present our keynote speakers Alex Waibel, a key pioneer of neural translation tools and their use in simultaneous interpretation, and Roberto Navigli, the father of BabelNet, the largest, continuously-updated multilingual encyclopaedic dictionary, doing pioneering work in areas such as multilingual interpretation, mapping of terms and multilingual concept and entity extraction. The work of both is seminal in the development of language processing tools and resources relevant to translation and interpreting technologies. We would like to thank all those who submitted proposals to the conference, all presenters and all the authors who produced full versions of accepted papers for the proceedings. Special thanks go also to all the delegates who have come from so many countries to attend this conference and thus provide a living acknowledgement of this special event. We are also grateful to the members of the Programme Committee who carefully reviewed all the submissions: Anne Aboh-Dauvergne, Juanjo Arevalillo, Wilker Aziz, Sheila Castilho, David Chambers, Eleanor Cornelius, Gloria Corpas Pastor, David Filip, Sarah Griffin-Mason, Camelia Ignat, Joss Moorkens, Bruno Pouliquen, Antonio Toral, Paola Valli, Nelson Verástegui and David Verhofstadt. Many thanks to our publication chair Ivelina Nikolova for producing these e-proceedings. A big thank-you also goes to Joanna Drugan who, together with the conference chairs, played a leading role in the organisation as well as our Technical Advisor Jean-Marie Vande Walle. Last but not least, a big thanks to our sponsors. Conference Chairs João Esteves-Ferreira, Juliet Margaret Macan, Ruslan Mitkov, Olaf-Michael Stefanov London, November 2017 iv Editors of the Proceedings: João Esteves-Ferreira, Tradulex - International Association for Quality Translation Juliet Macan, independent translation technology consultant Ruslan Mitkov, University of Wolverhampton Olaf-Michael Stefanov, JIAMCATT, United Nations (ret.) Programme Committee: Anne Aboh-Dauvergne, United Nations Juan José Arevalillo, Hermes Traducciones Sheila Castilho, Dublin City University Wilker Aziz, University of Amsterdam David Chambers, AsLing Honorary Member Eleanor Cornelius, FIT COuncil Member and University of Johannesburg Gloria Corpas Pastor, University of Malaga David Filip, CNGL / ADAPT Sarah Griffin-Mason, Institute of Translation and Interpreting Camelia Ignat, Joint Research Centre of the European Commission Joss Moorkens, Dublin City University Bruno Pouliquen, World Intellectual Property Organization Antonio Toral, University of Groningen Paola Valli, University of Trieste Nelson Verástegui, International Telecommunications Union (ret.) David Verhofstadt, International Atomic Energy Agency (IAEA) Conference Organising Committee: Coordinator: João Esteves-Ferreira Session Chairs: Joanna Drugan, Juliet Macan, Ruslan Mitkov and Olaf-Michael Stefanov Treasurer: Jean-Marie Vande Walle Publications Chair: Ivelina Nikolova Education Room Coordiantor: Silke Lührmann Hospitality Officer: Helen O’Horan Social Media Officers: María Recort Ruiz and Nelson Verástegui v Table of Contents Towards a Hybrid Intralinguistic Subtitling Tool: Miro Translate Laura Cacheiro Quintas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 VIP: Voice-Text Integrated System for Interpreters Gloria Corpas Pastor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Evaluation of NMT and SMT Systems: A Study on Uses and Perceptions Emmanuelle Esperanca-Rodier, Caroline Rossi, Alexandre Berard and Laurent Besacier . . . . . . . 11 Speech Recognition in the Interpreter Workstation Claudio Fantinuoli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 Building a Custom Machine Translation Engine as part of a Postgraduate University Course: a Case Study Michael Farrell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 A Comparative User Evaluation of Tablets and Tools for Consecutive Interpreters Joshua Goldsmith . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 MT and Post-Editing from a Translator’s Perspective Dimitra Kalantzi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 Using Online and/or Mobile Virtual Communication Tools in Interpreter and Translator Training: Pedagogical Advantages and Drawbacks Koen Kerremans and Helene Stengers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 When Google Translate is better than Some Human Colleagues, those People are no longer Colleagues Samuel Läubli and David Orrego-Carmona . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 On the Need for New Tools for “Translating Writers” in Industry Claire Lemaire and Christian Boitet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 Designing a Multimethod Study on the Use of CAI Tools during Simultaneous Interpreting Bianca Prandi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Learning from Sparse Data - Meeting the Needs Big Data Can’t Reach Jon D Riding and Neil Boulton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Terminology Management Tools for Conference Interpreters – Current Tools and How They Address the Specific Needs of Interpreters Anja Rütten . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 The SCATE Prototype: A Smart Computer-Aided Translation Environment Vincent Vandeghinste, Sven Coppers, Jan Van den Bergh, Tom Vanallemeersch, Bram Bulté, Ayla Rigouts Terryn, Els Lefever, Iulianna van der Lek-Ciudin, Karin Coninx and Frieda Steurs . . . . 104 The Localisation Industry Word Count Standard: GMX-V. Slaying the Word Count Dragon Andrzej Zydroń . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 vii Towards a Hybrid Intralinguistic Subtitling Tool: Miro Translate Laura Cacheiro Quintas Université de Perpignan Via Domitia 52, Avenue Paul Alduy – 66100 Perpignan [email protected] Abstract Making audiovisual educational material accessible for non-native speakers and people who are deaf or hard-of-hearing is an ongoing challenge and the state-of-the-art in this field shows that no current software provides a fully automatic, high-quality solution. This article presents Miro Translate, a hybrid intralinguistic subtitling tool developed in response to this challenge by the Miro Programme at the University of Perpignan Via Domitia. This cloud-based solution integrates the automatic speech recognition (ASR) technology provided by the Microsoft Translator Speech API to generate automatic captions. It also incorporates a set of editing functionalities to provide a readable and legible target output that complies with subtitling conventions. In conclusion, the aim of Miro Translate is to produce a cost- efficient solution that meets the increasing demand for high quality captions in instructional video. 1 Introduction The global eLearning industry is continuously growing and evolving. In 2015, it was estimated to be worth USD 165 billion and it is expected to grow by 5 % between 2016 and 2023 (Docebo, 2014). Similarly, the 2010 Sloan Survey of Online Learning in the United States revealed that online enrolment rose by almost one million students in only one year. Both the academic and non-academic sectors have adopted the use of engaging and interactive tools to create and deliver educational content in a fast, efficient and economical way. Video lectures are an appealing way to communicate complex messages in an attractive manner because they can integrate written text, images and speech through visual and auditory channels (Díaz Cintas, 2014). Earlier research projects studied the impact of educational videos on learning effectiveness (Zhang et al. 2006), showing that interactive videos are a valuable means to improve learners’ effectiveness, engagement and satisfaction in multimedia e- Learning environments. However, making audiovisual educational material accessible for non- native speakers and the deaf or hard-of-hearing is an ongoing challenge. The potential offered by technology to distribute the same audiovisual document with various subtitling tracks in different languages has been identify by eLearning suppliers, both in the private and the public sector (Díaz Cintas, 2004). Compared to other audiovisual translation modalities such as dubbing or voice-over, intra and interlinguistic subtitling is a relatively cheap and rapid solution to distribute instructional videos. This article presents Miro Translate research project, a hybrid intralinguistic subtitling tool developed by the Miro Programme at the University of Perpignan Via Domitia as a cost- effective solution for the subtitling of instructional videos. First, a general overview of the Miro Programme and Miro Translate will be provided. Second, the choice of the Microsoft ASR system will be explained. Finally, a set of editing functionalities will be presented to meet technical and linguistic subtitling standards. 2 Background and related work Many academics have studied subtitling practices from a descriptive and prescriptive approach. The work of Ivarsson and Carrol (1998), Karamitroglou (1998) and Diaz Cintas (2003) tries to establish certain conventions to provide a readable and legible output. Some countries offer 1 Proceedings of the 39th Conference Translating and the Computer, pages 1–6, c London, UK, November 16-17, 2017. 2017 AsLing guidance to help promote subtitling for the deaf and hard of hearing. For example, the BBC Guidelines (Ford, 2009), the French Conseil Supérieur de l’Audiovisuel (2011) or the norms established by the Spanish association AENOR (2012). The first subtitling programmes appeared on the market in the mid-70s. Since then, major advances have been made by commercial organisations and open source projects. Several research projects such as MUSA, eTITLE or SUMMAT studied the integration of CAT tools in subtitling. Similarly, the MLLP research group developed an online platform with an advanced post-editing interface for the transcription and translation of educational videos. Since November 2009, Google offers automatic captioning and subtitling for user-generated videos in YouTube. Using Google’s ASR technology, video owners have the possibility to edit the auto-generated-captions and upload the new version. Major progress continues to be made, particularly at a technical level, with subtitling programmes continually being updated. Despite the evolution in this field, however, no current software provides a fully automatic, high-quality solution. Furthermore, new trends related to cloud subtitling are emerging such as fansubs or crowdsubtitling impacting subtitling practices and traditional workflows. 3 The Miro Translate research project 3.1 The Miro Programme Miro Translate is part of an ongoing eLearning research project conducted by the MIRO.EU- PM Programme1, which is part of the French National Research Institute (ANR) Initiatives for Excellence in Innovative Training (IDEFI). This project runs over seven years, from 2012 to 2019, and consists in the digital transformation of higher education through experimentation. The Miro Programme delivers multilingual online training on the cultural tourism industry via digital platforms such as Moodle, FUN or Miriada X. The current training offer includes an online Master’s degree and MOOCs available in French, Spanish, Catalan and English. Furthermore, a multilingual continuing education programme is currently being developed, with content and tuition adapted to the needs of cultural tourism socio-economic stakeholders. The Miro Programme has a media repository totalling 40 hours run time (four HD resolutions possible between 360p and 1080p) available in mp4 format. Video lectures are generally recorded in a professional studio using modern equipment to ensure maximum quality. Videos are normally 3 to 10 minutes long and include one or two speakers (a lecturer or a professional) that present a given subject sometimes using a slideshow or images to illustrate the main ideas. Intra and interlinguistic subtitling is available on all course content for the deaf and hard of hearing and the languages covered for both transcription and translation are Spanish, French, English and Catalan. Subtitling software was initially used to provide captions and subtitles. However, due to the increasing number of hours of audiovisual content and of source languages, innovative computer-aided captioning strategies were considered. 3.2 The Miro Translate platform Miro Translate is a cloud-based solution that produces machine-generated captions using the Microsoft Translator Speech API, part of the Microsoft Cognitive Services API2. This cloud- based system performs ASR using a deep neural network (DNN) system and TrueText technology to improve speech readability. Miro Translate incorporates a series of functionalities to comply with certain subtitling conventions. 1 https://www.programmemiro.fr/?lang=en 2 https://www.microsoft.com/en-us/translator/speech.aspx 2 Users login to the Miro Translate platform, upload a given video and select the source language. The audio is sent to the Microsoft Translator Speech API to perform automatic speech recognition and generate automatic captions. The output is displayed on the Miro Translate platform, where a set of functionalities are available so users can edit imperfect transcriptions. The figure below shows a simplified diagram of the main components of Miro Translate. COGNITIVE SERVICES SPEECH TRANSLATOR CENT OS 7 VM User profile Node.js project files MySQL Text transcription from wav stream Media tools MYSQL FFMPEG WEB APP DATABASE Input files Output files MP4 Video files WEBVTT SRT TXT file file file Figure 1: Main components of the Miro Translate platform ASR system An experiment was conducted to compare the automatic generated captions provided by the Microsoft ASR system incorporated in the Microsoft Translator Speech API and the Google ASR system. The performance of both systems was assessed using Word Error Rate (WER), a common metric used to measure the difference between automatic and manual transcriptions. A set of four instructional videos from the Miro Programme were used in this experiment. The source language of the selected corpus is French and speakers presented particular speech styles and different accents, including non-native. The acoustic conditions were good with no background noise and the vocabulary included topics related to information technology and cultural tourism. The manual reference transcription of each document was compared to the Google and Microsoft automatic transcription using the WER++ programme developed by Nico Martínez- Santos3 from the MLLP Research Group at the Universitat Politèctica de València. Google ASR system achieved WERs of about 45-70 %, whereas the Microsoft ASR system obtained WERs levels of about 30-56 %. When comparing the WER results for each video, a difference of approximatively 15 percentage points was obtained, with the Microsoft system providing the lowest WERs. Two conclusions were drawn from this experiment. On one hand, the Microsoft ASR system achieved lower WERs than the Google system. On the other, Word Error Rates provided by both systems are above the accepted threshold of 25 % (Munteanu et at. 2008), making human intervention a requirement to enhance the quality of the imperfect ASR-produced transcripts. The Miro Translate platform includes a set of functionalities to facilitate the task of subtitle editing. 3 https://github.com/nsmartinez/WERpp 3 Editing functionalities Auto-generated captions offer a solution to speed up the text input task. The output of current ASR systems needs to be improved to comply with subtitling conventions. As stated previously, these conventions differ within countries and academics. Nevertheless, there is a general consensus around universal criteria and parameters. Their identification helps to define editing functionalities better and offers various options to comply with a given convention. Miro Translate research team analysed the conventions and guidelines mentioned in Section 2 and selected some general subtitling criteria that the system should implement in the form of settings or tools. These have been classified in three sections: • Technical parameters regarding subtitles and space / layout, including number of lines, number of characters per line, position on the screen, typeface, font and background colour, identification of speakers, sound effects and music. • Technical consideration regarding subtitles and duration, including maximum and minimum duration of both single and two-line subtitles, gap between consecutive subtitles, shot changes, synchronisation between speech and subtitle. • Linguistic considerations. Subtitles should be a syntactic unit that form a sense block or grammatical unit and must follow grammar, spelling and punctuation rules, with the exception of situations where incorrect forms have a specific purpose like argot or foreign speakers. Considering these general criteria, Miro Translate will incorporate the following features into its interface as shown in Figure 2: Figure 2: Screenshot of the Miro Translate interface. • A video preview window displays the synchronised subtitles with play / pause options. • Below this window, a timeline with an audio wave displays volume changes and vertical lines indicate shot changes. The spotting process remains a challenge, this tool will help users define the entrance and exit times of subtitles in an easier and more accurate manner. • General settings will help users predefine technical parameters such as values related to the number of characters per second, characters per line, maximum and minimum duration limits or pauses values between consecutive subtitles. Other general settings will include spelling and grammar check, split / joint subtitles, delete / add subtitle, search and replace or undo / redo options. • Each block will include certain features, as shown in Figure 3. o Specific settings for adjustment of technical parameters, error rates will be displayed to assist users. 4 o Edit area so users can manage layout parameters such as line breaks, font style, size and colour, or position on the screen. Figure 3: Screenshot of a subtitle block in Miro Translate • Export option, supported files formats include srt. vtt. and PDF. 4 Conclusion and further research Manually transcribing the fast-growing number of instructional videos produced by the Miro Programme using subtitling programs is expensive and time-consuming. This situation calls for creative solutions that find synergies between technological advances and professional subtitlers. In respond to this challenge, the Miro Programme decided to develop a subtitling tool adapted to eLearning environments. Miro Translate is a cloud-based solution that produces machine-generated captions using the Microsoft Translator Speech API. An experiment was conducted to compare the WER levels achieved by the Google and the Microsoft ASR system. Results showed that the Microsoft ARS system achieved lower WERs by about 15 percentage points. However, WER values were above the accepted threshold of 25 %, making human intervention necessary. The analysis of subtitling conventions and guidelines helped define general technical and linguistic subtitling parameters in order to identify the editing functionalities of the system. This work is part of an ongoing study and Miro Translate is currently under development. Further efficiency and usability tests will be performed with the intention of improving system performance. The final stage of this research project is to incorporate an interlinguistic subtitling option through the customisation of the neural machine translation system proposed by Microsoft Translator Text API. 5 References AENOR. 2012. Norma UNE 153010. Subtitulado para personas sordas y personas con discapacidad auditiva. AENOR, Madrid. http://implantecoclear.org/documentos/accesibilidad/UNE_153010_2012.pdf [last accessed September 02, 2017]. Allen, I. Elaine, Seaman Jeff. 2010. Class differences: Online education in the United States, 2010. Babson Survey Research Group. Conseil Supérieur de l’Audiovisuel. 2011. Charte relative à la qualité du sous-titrage à destination des personnes sourdes ou malentendantes. http://www.csa.fr/en/Espace-juridique/Chartes/Charte-relative-a-la-qualite-du- sous-titrage-a-destination-des-personnes-sourdes-ou-malentendantes-Decembre-2011 [last accessed September 02, 2017]. Del Pozo, Arantza. 2014. SUMAT Final Report. http://www.fp7-sumat-project.eu/uploads/2014/07/D1-5_Final- Report-June-2014.pdf [last accessed September 02, 2017]. Díaz Cintas, Jorge. 2014. Technological Strides in Subtitling. In Chan, S-W, editor in chief, Routledge Encyclopedia of Translation Technology. Routledge, London, pages 632-643. Díaz Cintas, Jorge. 2003. Teoría y práctica de la subtitulación: inglés-español. Barcelona: Ariel. Docebo. 2014. E-Learning Market Trends and Forecast 2017-2021. https://www.docebo.com/resource/elearning-market-trends-and-forecast-2017-2021/ [last accessed September 02, 2017]. Ford Williams, Gareth. 2009. Inline Subtitling Editorial Guidelines V1.1. http://www.bbc.co.uk/guidelines/futuremedia/accessibility/subtitling_guides/online_sub_editorial_guidelines _vs1_1.pdf [last accessed September 02, 2017]. Ivarsson, Jan, and Carroll, Mary. 1998. Code of good subtitling practice. In Language Today, April: 1–3. https://www.esist.org/wp-content/uploads/2016/06/Code-of-Good-Subtitling-Practice.PDF.pdf [last accessed September 02, 2017]. Karamitroglou, Fotios. 1998. A Proposed Set of Subtitling Standards in Europe. In Translation Journal, 2:2. http://translationjournal.net/journal/04stndrd.htm [last accessed September 02, 2017]. Melero, Maite, Oliver, Antoni, Badia, Toni. 2006. Automatic Multilingual Subtitling in the e-TITLE Project. http://www.mt-archive.info/Aslib-2006-Melero.pdf [last accessed September 02, 2017]. Munteanu, Cosmin, Baecker, Ron, and Penn, Gerald. 2008. Collaborative editing for improved usefulness and usability of transcript-enhanced webcasts. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, Italy, pages 373–382. MUSA IST Project. http://sifnos.ilsp.gr/musa/ [last accessed September 02, 2017]. Patiniotaki, Emmanouela. 2013. Assistive Technology and Audiovisual Translation: A key combination for Access Services in Online Education. In A Global Village 11: 52–55. http://aglobalvillage.org/journal/issue11/e-democracy/emmanuela-patiniotaki/ [last accessed September 02, 2017]. Valor Miró, Juan Daniel, Silvestre-Cerdà, Joan Albert, Civera, Jorge, Turró, Carlos, and Juan, Alfons. 2015. Efficiency and usability study of innovative computer-aided transcription strategies for video lecture repositories. In Elsevier’s Speech Communication journal, 74: 65-75. Zhang, Dongsong, Zhou, Lina, Briggs, Robert O., and Nunamaker Jr., Jay F. 2006. Instructional video in e- learning: Assessing the impact of interactive video on learning effectiveness. Information and Management, 43: 15–27. 6 VIP: Voice-Text Integrated System for Interpreters Gloria Corpas Pastor University of Wolverhampton University of Malaga [email protected] Abstract This paper introduces VIP, an R&D project that explores the impact and feasibility of using Human Language Technology (HLT) and Natural Language Processing (NLP) for interpreting training, practice and research. This project aims at filling the gap in and addressing the pressing need for technology in general for interpreters, which is reported to be scarce. Although most interpreters are unaware of interpreting technologies or are reluctant to use them, there are some tools and resources already available, mainly computer-assisted interpreting (CAI) tools. VIP is working on the development of technology and cutting-edge research with the potential to revolutionise the interpreting industry by lowering costs for interpreter training, fostering an online community which shares, generates and cultivates interpreting resources; and providing an efficient interpreter workbench tool (computer- assisted interpreting software). 1 Introduction Interpreting is an extremely strenuous task, since much effort is devoted in terms of decoding, memorising and encoding a message. Interpreters should, as translators and other language professionals do, benefit from the development of technology and, thereby, enjoy considerable improvement of their working conditions. However, currently their work relies by and large on traditional or manual methods, and the technological advances in interpreting have been extremely slow. By way of illustration, in the comprehensive Routledge Encyclopedia of Interpreting Studies, edited by Franz Pöchhacker (2015), technology is almost absent. A similar situation can be found when it comes to research. Fortunately, there is a growing interest in developing tools addressed at interpreters as end users, although the number of these technology tools is still very low and they are not intended to cover all interpreters’ needs. The VIP project intends to develop an interpreting workbench tool which will have the same effect that language technologies for translators have had in the translation industry in recent decades. 2 Tools for interpreters Until recently, interpreters have rarely benefited from language technologies and tools to make their work more efficient (Costa, Corpas Pastor and Durán Muñoz, 2014). Although most interpreters are unaware of interpreting technologies or are reluctant to use them (Corpas Pastor and Fern, 2016), there are some tools and resources already available (Sandrelli, 2015, Fantinuoli, 2018/in press). Several attempts to meet interpreters’ needs have been made in different interpreting contexts and modes by developing different types of language tools, mainly computer-assisted interpreting (CAI) tools. These tools include (i) terminology management CAI tools, i.e. specialised computer software that is used to compile, store, manage and search within glossaries, these are created previously by the user and are used to prepare terminology for an interpretation service, independently of the interpretation mode; (ii) note-taking CAI tools, which support note taking by consecutive interpreters; (iii) speech- 7 Proceedings of the 39th Conference Translating and the Computer, pages 7–10, c London, UK, November 16-17, 2017. 2017 AsLing to-text converters, which automatically transcribe speech into text; (iv) Computer-Assisted Interpreter Training (CAIT) tools; and (iv) other assisted applications. The state-of-the art tools for terminology management have been investigated and their advantages and disadvantages analysed (cf. Costa, Corpas Pastor and Durán Muñoz, 2018/in press). Many of the existing tools are easy to use and have a user-friendly interface, however they can only be used on a certain platform (e.g. Mac OS - Intragloss, Windows - InterpretBank and LookUp, iPad - The Interpreter's Wizard). Most of the tools cannot process documents, but only glossaries (InterpretBank, interplex UE, LookUp, the Interpreter’s Wizard) and do not support integration of meta-information and the generation of glossaries or terminology management needs to be done manually, except for the EU-Bridge Interpreter Support Tool, which includes a term extraction and a named-entity recognition module. They accept a wide range of languages, although most of them permit only bilingual glossaries. Some, like InterpretBank or Intragloss, are well-documented, but this is usually not the case. Import options are included in tools such as InterpretBank, Intragloss, Interplex UE, or LookUp, but they are limited to Word/Excel formats or formats produced by the same tool (interplex UE). Finally, most of them only assist during the preparation phase and it is possible to print/export the generated glossaries for use during the interpretation. The second group concerning note-taking applications is directly addressed at consecutive interpreters and their needs during the interpretation services (Orlando, 2010). Even now consecutive interpreters use pen and paper to takes notes, but they are increasingly turning to mobile devices to take notes or to support their note-taking. As in the previous group, most of them are platform-dependent (e.g. iPad – Inkeness, Android – LectureNotes and PenSupremacy, Android and iOs tablets – My Bic Notes). Two main types can be distinguished in this group: a) those whose main functionalities are to take notes electronically and make sketches and share them by email (e.g. Inkeness, LectureNotes, PenSupremacy, My Bic Notes), and b) those which are capable of recording spoken words and synchronising them with notes that users manually write on special paper (e.g. Sky Wifi Smartpen, Echo Smartpen, Livescribe, Equil JOT). The recording of the notes can be uploaded over Bluetooth, Wireless or USB, and reproduced. Speech-to-text converters are another tool that can be of great use for interpreters. Instead of taking notes, these tools transcribe the speeches into text automatically. In this group we can encounter basic tools, such as Voice Dictation [quanticapps.com] for iOS or Voice Pro [forum.voicepro.it] for Android, which are examples of easy-to-use voice recognition applications, and very specialised systems. The third and fourth groups of CAI tools are very limited nowadays, and most of them are based on voice recording (Audacity, Adobe Audition, AudioNote, Notability, QuickVoice) and on a collection of exercises and complete speeches for interpreting practice, such as InterpretaWeb [www.interpretaweb.es] and Linkinterpreting [linkterpreting.uvigo.es]. Nevertheless, Black Box (Sandrelli, 2005), and its updated version VIE (Virtual Interpreting Environment), are good examples (Sandrelli and Hawkins, 2006). Black Box is a CAIT (computer-assisted interpreting training) tool designed to help trainee interpreters, and professionals, work with materials of different sources (texts, audio, video, exercises) and store their results for later review. Users decide what they want to do: either interpret some audio or video clips or do some interpreting exercises, such as shadowing, cloze exercises or sight translation. It also allows users to edit and break down video and audio recordings to create different exercises and adapt authentic conference materials to different levels of expertise. The updated version, VIE intends to develop “a fully-immersive virtual conference centre, along the lines of simulators available in the computer games industry.” (Sandrelli and Hawkins, 2006). According to the authors, it can be used on-line (live sessions) and off-line (recorded teaching materials). 8 In the last group we can find other assisted applications that contribute to the interpreters’ work including text-to-speech converters, such as Speak it!, Web Reader HD, Voice Dream Reader, Voxdox and Talk – Text to Voice, Verbose Text to Speech, Text 2 Speech, eSpeak and TextSpeech Pro, which allow users to listen to words, texts, e-mail, etc. in several languages and formats, and practice and check pronunciation. Other systems that are useful for interpreters are unit converters, which convert units (such as temperature, distance, currency, acceleration, finance, speed, weight/mass, amongst other topics) from one system to another. Illustrative examples of these tools are ConvertUnits [www.convertunits.com] and OnlineConversion [www.onlineconversion.com]. Other computer-assisted applications that can be considered a type of CAI tool are corpora and corpus management tools. By using a compiled corpus as information source, the interpreter can access the phraseological and lexical information used in the documents, as well as the meaning and context of new terminology (for further information, see the edited volumes by Straniero and Falbo, 2012; and Corpas Pastor and Seghiri, 2016). Many of the existing tools offer high accuracy and precision, but they have only been trained for very specific user cases and a very limited number of languages: only 2-3 combinations (e.g. Asura, Sync/Trans, Vermobil, EUTrans, IBM Mastor). Most recent systems are capable of processing a higher number of language combinations (e.g. VoiceTra 4U, Jibbigo, Google Translate App, SpeechTrans), but their performance and accuracy is much lower and they can only process short sentences, in some cases only monodirectionally. Some of these tools had a military use (Phrasealator P2, BOLT) but others are intended for general dialogues and are mainly addressed at travel-related conversations, such as VoiceTra 4U. 3 Methodology and objectives VIP’s ultimate goal is to develop an interpreting workbench tool. There is a multitude of possible interpreting scenarios, and therefore, any technological tools developed for interpreters should account for this. Interpreting service requires quick response, and development of language technology for interpreters has not been able to address the efficiency of translation. While most of the current tools assist interpreters during the preparation phase, particularly managing terminology and creating glossaries, the tools are not used during the interpretation service. To the best of our knowledge, a system that integrates a suite of tools to assist interpretation has not yet been developed. The VIP project aims at filling this existing gap and providing an integrated platform to assist interpreters both during the preparation phase and during the service. To that end, VIP will not only focus on consecutive and bi-lateral interpreting, particularly teleinterpreting, which are extended types of interpreting, but it will also include some functionalities in order to support simultaneous interpreting. VIP goals do not intend to replace human interpreters, since they are an essential part of multilingual communication, especially in clarifying ambiguities, avoiding inaccurate pronunciation and enhancing the results to guarantee successful understanding. Nevertheless, this project intends to automate their tasks as much as possible, improve their working conditions, and speed up their response. Both professional interpreters and trainees have been targeted. The initial hypothesis of VIP is that it is possible to improve the existing tools and develop next generation technologies to address the needs of interpreters. Three main objectives are pursued: (1) Identify the real needs of interpreters by questionnaires and explore how and to what extent their work can be automated; (2) Develop the first cross-platform integrated system to enhance the productivity of the work of interpreters, based on cutting-edge technology and innovative computer-assisted solutions; (3) Develop a virtual learning 9 environment for interpreter trainees and professional interpreters, based on actual needs and HLT/NLP novel technologies. The VIP platform will be composed of 3 modules. Module 1 is intended to be used during the preparation phase (documentation). It will contain several components (terminology extractor, named-entity recognition, corpus compilation, corpus management tool, cross- lingual survey summarization tool, glossaries, dictionary management tool). Module 2 is designed to be used during the interpretation phase. The outcome of this second module will be a self-contained prototype with several components (automatic note-taking, machine translation, glossary query). Module 3 is envisaged as a training tool (prototype 2) that makes use of materials developed in Modules 1-2. It will also contain vocabulary exercises, memory quizzes and other self-training resources. In order to increase the impact of the proposed prototypes (and submodules) on interpreters’ workflow and efficiency, extensive feasibility studies will be carried out. In addition, performance and speed assessment will be performed (intrinsic and extrinsic evaluation). User-centred evaluations, involving professional interpreters, will serve to measure the impact of the proposed tools on interpreters during interpreting assignments and to train new interpreters, as well as the combined performance of both prototypes. Despite interpreters’ reluctance to use language technologies in their profession, it is clear that CAI tools represent an important advance in the field of interpretation, thus, in the multilingual communication context. VIP is just a pioneer new development along those lines. Acknowledgements The research presented in this study has been carried out in the framework of research projects VIP (317471-FP7-PEOPLE-2012-ITN) and (partially) INTERPRETA 2.0 (PIE2017- 015). References Corpas Pastor, G. and Fern, L. 2016. "A survey of interpreters’ needs and practices related to language technology". Technical paper [FFI2012-38881-MINECO/TI-DT-2016-1]. University of Malaga. Corpas Pastor, G. and Seghiri, M. (eds.) 2016. Corpus-based Approaches to Translation and Interpreting: From Theory to Applications. Frankfurt: Peter Lang. Costa, H.; Corpas Pastor, G. and Durán Muñoz, I. 2014. "Technology-assisted Interpreting". vol. 25, nº 3, MultiLingual 143. April/May. 27-32. Costa, H.; Corpas Pastor, G. and Durán Muñoz, I. (2018/in press). Assessing Terminology Management Systems for Interpreters". In G. Corpas Pastor and I. Durán Muñoz (eds.) Trends in e-tools and resources for translators and interpreters. Leiden: Brill. 57-84. Fantinuoli, C. 2018/in press. “Computer-assisted interpreting: Challenges and Future Perspectives”. In G. Corpas Pastor and I. Durán Muñoz (eds.) Trends in e-tools and resources for translators and interpreters. Leiden: Brill. 153-174. Orlando, M. 2010. “Digital pen technology and consecutive interpreting: another dimension in note-taking training and assessment”. The Interpreters' Newsletter 15. 71-86. Pöchhacker, F. (ed.). 2015. Routledge Encyclopedia of Interpreting Studies. London: Routledge. Sandrelli, A. 2005. “Designing CAIT (Computer-Assisted Interpreter Training) Tools: Black Box.” MuTra 2005 –Challenges of Multidimensional Translation: Conference Proceedings. Saarbrücken, 2-6 May 2005. Sandrelli, A. and Hawkins, J. 2006. “From Black Box to the Virtual Interpreting Environment (VIE): another step in the development of Computer Assisted Interpreter Training.” The Future of Conference Interpreting: Training, Technology and Research. University of Westminster, 30 June -1 July 2006. London. Sandrelli, A. 2015. “Becoming an interpreter: the role of computer technology”. MonTI. Monografías de Traducción e Interpretación 2. 111-138. Straniero, S. and Falbo, C. (eds.) 2012. Breaking Ground in Corpus-based Interpreting Studies. Frankfurt: Peter Lang. 10 Evaluation of NMT and SMT Systems: A Study on Uses and Perceptions Emmanuelle Esperança-Rodier Caroline Rossi Univ. Grenoble Alpes, CNRS, Univ. Grenoble Alpes, ILCEA4, Grenoble INP*, LIG, 38000 Grenoble, 38000 Grenoble, France France Caroline.Rossi@univ- Emmanuelle.Esperanca- grenoble-alpes.fr Rodier@univ-grenoble- alpes.fr Laurent Besacier Alexandre Bérard † Univ. Grenoble Alpes, CNRS, Univ Lille, CNRS, Centrale Lille, Grenoble INP*, LIG, 38000 Grenoble, UMR 9189 CRIStAL – Lille France France [email protected] Laurent.Besacier@univ- lille1.fr grenoble-alpes.fr Abstract Statistical and neural approaches have permitted fast improvement in the quality of machine translation, but we are yet to discover how those technologies can best “serve translators and end users of translations” (Kenny, 2017). To address human issues in machine translation, we propose an interdisciplinary approach linking Translation Studies, Natural Language Processing and Philosophy of Cognition. Our collaborative project is a first step in connecting sound knowledge of Machine Translation (MT) systems to a reflection on their implications for the translator. It focuses on the most recent Statistical MT (SMT) and Neural MT (NMT) systems, and their impact on the translator's activity. BTEC-corpus machine translations, from in-house SMT and NMT systems, are subjected to a comparative quantitative analysis, based on BLEU, TER (Translation Edit Rate) and Meteor. Then, we qualitatively analyse translation errors from linguistic criteria (Vilar, 2006) using LIG tools, to determine for each MT system, which syntactic patterns imply translation errors and which error type is mainly made. We then assess translators’ interactions with the main error types in a short evaluation task, completed by participants in the Master's degree in Multilingual Specialized Translation of Grenoble Alps University. 1 Introduction In a context where statistical and neural approaches have allowed an extremely rapid improvement in the quality of machine translation (MT), we propose an interdisciplinary approach linking Translation Studies, Natural Language Processing (NLP) and Philosophy of Cognitive science, which has three objectives: • Identify the uses and perceptions of Statistical/Neuronal MT (SMT/NMT) systems in professional translators and trainee translators; * Institute of Engineering Univ. Grenoble Alpes † also at LIG, Univ. Grenoble Alpes 11 Proceedings of the 39th Conference Translating and the Computer, pages 11–24, London, UK, November 16-17, 2017. 2017c AsLing • Compare these uses and perceptions with the architecture, functioning and effective potentialities of the systems; • Put these comparisons into perspective with the conceptions of human action and the conceptions of cognition underlying SMT/NMT. Access to the site is guaranteed because of the involvement of one of the project's members in the Master's degree in Multilingual Specialized Translation at Grenoble Alps University (UGA). Current research on MT is for the most part carried out in a single disciplinary field, that of Natural Language Processing (NLP). However, some aspects are also covered in Translation Studies, in particular the cognitive ergonomics of post-editing, (inter alia, O’Brien, 2012, Martikainen and Kübler, 2016). Research that articulates good knowledge of the functioning of MT systems and a reflection on their implications for the translator is still very rare. The efforts of P. Koehn (2013 and 2016) or A. Way (2010), to facilitate understanding of current developments and encourage interactions between linguists and computer scientists are remarkable in this respect but they remain exceptional, just like the book by Ehrensberger- Dow et al. (2015), which brings together ten multidisciplinary contributions to advance our understanding of translation processes. Furthermore, to our knowledge, no attempt has been made to anchor these interactions between Translation Studies and NLP in a broader epistemological reflection on the conceptions of language, cognition and action underlying the empirical turn of MT. Our collaborative project is a first step in filling these gaps. We are interested in the most recent MT systems, based on statistical and then neural models, and the impact of these systems on the translator's activity. The project combines three disciplines. The role of Translation Studies is to identify the uses and perceptions of SMT/NMT in professional translators and trainee translators (i.e. students of the Master's degree in Multilingual Specialized Translation at UGA). The role of Natural Language Processing is to provide thorough knowledge of the internal functioning of the MT systems which will be compared with the representations and uses of the translators. The object of this comparison is to know whether the translators have a vision of the systems that is faithful to their internal functioning and to examine the relation between this vision and their capacity to exploit the potentials of the systems and to know their limits. The role of Philosophy of Cognitive science is to include these questions in broader conceptions. First, we seek to put into perspective the representations of translators and the functioning of systems with the conceptions of human cognition underlying SMT/NMT. Secondly, it will be necessary to articulate the question of uses with a more general conception of human action and its relation to mental states. This broadening of perspective to a more general reflection on human cognition and action is all the more necessary as the deep learning algorithms implemented in recent MT systems (Bahdanau et al., 2014, Cho et al, 2014, Jean et al., 2014) have emerged as a promising conceptual tool for modelling some aspects of linguistic cognition (Dupoux, 2016; Becerra- Bonache & Jimenez Lopez 2016). The present paper is a case study which puts into perspective the differences between SMT and NMT, combining NLP metrics and error coding with surveys to document perceptions. We have used in-house SMT and NMT systems, to translate documents from the Basic Travel Expression Corpus (BTEC) from French to English. The SMT and NMT used as well as our corpus are described in the second section of this article. The data collected will be the subject of a comparative quantitative analysis, based on BLEU, Meteor and its empowered version from the LIG (Servan & al, 2016), and TER (Translation Error Rate), and the most often corrected errors will then be analysed more deeply, using LIG tools to perform the analysis of translation errors according to linguistic 12 criteria such as those proposed by Vilar (2006) to determine a set of implemented strategies. The results of those comparisons and analyses are given in the first part of the article’s third section. The second part of the third section is dedicated to our analysis of perceptions of MT in trainee translators (Master’s students). We distinguish two stages in the analysis of perceptions. The first deals with students’ overall perceptions of MT, based on questionnaires that were answered before and after a 12-hour MT class, as well as on focus group data. Second, we seek to assess students’ perception of the differences between an SMT and an NMT system. Metrics are used to convey an objective evaluation of the systems before we discuss students’ assessment, in the last subpart. 2 Tools and Corpus To achieve this study, we needed to perform a detailed comparison of an SMT to an NMT system. While SMT systems are yet quite well known, NMT models are not so obvious to seize, even if we can find a lot of available tools to construct one's own. This is why we are going to describe in greater detail the NMT system we have developed. Our NMT model is an attention-based encoder-decoder neural network (Sutskever et al., 2014; Bahdanau et al., 2015). LIG implementation, described in Bérard et al. (2016) is based on the seq2seq model implemented by TensorFlow (Abadi et al., 2015). It reuses some of its components, while adding a number of features, like a bidirectional encoder (Bahdanau et al., 2015), a beam-search decoder, a convolutional attention model and a hierarchical encoder (Chorowski et al., 2015). The NMT model uses a decoder with 2 layers of 256 LSTM units, with word embeddings of size 256. Encoder is a 2-layer bidirectional LSTMs, with 256 units. We use a standard attention model. For training, we use the Adam algorithm with an initial learning rate of 0.001 (Kingma and Ba, 2014), and a mini-batch size of 64. We apply dropout (with a rate of 0.5) during training on the connections between LSTM layers in the encoder and decoder (Zaremba et al., 2014). Turning now to the SMT baseline we use, it is a phrase-based model using Moses Toolkit (Koehn et al., 2007), trained on BTEC train, that represents a 201k words for French, and a 189k words for English), without any monolingual data added, and tuned on BTEC dev of 12.2k words for French, and 11,5k for English. As a corpus, we have chosen to work on the BTEC (Basic Travel Expression Corpus) which, as described in the BTEC Task of the IWSLT 2010 evaluation campaign1, "[…] is a multilingual speech corpus containing tourism-related sentences similar to those that are usually found in phrasebooks for tourists going abroad". We thought that as the BTEC contains short sentences (10 words/sentence on average), it would be easier and quicker for our students to work on it. We have worked on the translations of BTEC Test 1, which represents 3,9k words for French and 3,6k for English, from our SMT system and our NMT one. We have first proceeded to a trivial empirical evaluation of the output quality of both MT systems based on fluency and adequacy. From the source text, we have given a score to the corresponding output translation, i.e. 1 when the translation was bad (not fluent and/or not adequate), 2 when the translation was average which means that it was adequate and/or fluent, and finally 3, when the translation was good (fluent and adequate). Table 1 below shows an extract of this first manual human evaluation. 1 http://iwslt2010.fbk.eu/ 13 Source Text SMT output Evaluation NMT output Evaluation au secours ! help something like 1 help . 2 pouvez-vous nettoyer ma can you clean my room ? 2 could you clean 3 chamber ? … my room ? … … Table 1: First Evaluation Once this first evaluation was done, among the BTEC Test 1 corpus, we have selected a total of 50 source sentences along with their corresponding translation using the SMT system and their corresponding translation issued from the NMT system, thus building the so-called BTEC-50 to be evaluated by students in the Master's degree in Multilingual Specialized Translation of Grenoble Alps University. The selection has been conducted as follows. We have selected the sentences according to the first evaluation results. When the SMT output and the NMT output received contrasted scores, that is to say 1 vs. 3, the source sentence and the SMT and NMT outputs were added to the BTEC-50. Also some of the source sentences for which the system outputs received less contrasted scores were added to the score in order to see which average or bad scores were better accepted by students according to the systems. Finally some sentences for which the SMT and NMT outputs received both a good score, i.e. 3, were added. An overview of the selection done for creating BTEC-50 appears in Table 2. Source Text SMT output Eval. NMT output Eval. BTEC-50 au secours ! help something like 1 help. 2 Yes pouvez-vous nettoyer ma can you clean my 2 could you clean 3 Yes chambre ? room? my room? … … … c'est trop brillant it is too brilliant 1 it is too flashy 3 Yes pouvez-vous me conseiller une can you recommend a 3 can you 1 Yes bonne boite de nuit ? good night club? recommend a good night for me? … … … I am nauseus I am nauseus j'ai la nausée … 3 … 3 No … where is the service where is the lost où se trouve le service des charge and found? 2 and found? 2 No objets trouvés ? … … lie down over here and please lie down … déboutonnez your shirt 2 here and your shirt 2 Yes allongez-vous ici et déboutonner votre chemise. … … … Table 2: Selection of sentences for BTEC-50 - Examples Having completed the BTEC-50, we created an Excel sheet, (reproduced in Appendix A), to be given to the students in the Master's degree in Multilingual Specialized Translation in order for them to rank the translated output from 1 very bad to 4 very good. We have decided not to show from which MT system the output were coming, so that the participants could approach the evaluation without prejudice. Nevertheless, we did present the two systems side 14 by side for comparison, so that preferences may appear: the outputs found in the column labelled "EN translation 1" from the Excel sheet come from the SMT system defined previously, whereas the outputs found in the column labelled "EN translation 2" come from the above-mentioned NMT system. 3 Experiment 3.1 Linguistic error analysis As we said previously, we have performed a first evaluation of the overall quality of SMT and NMT systems. In table 3, we show the results obtained for BTEC-50. NMT SMT 1 2 3 1 2 3 (bad) (average) (good) (bad) (average) (good) 16 17 17 14 19 17 Table 3: First evaluation If we look at table 3, we cannot find a real distinction between the results leading us to conclude that both systems give equivalent results. Looking at scores more in depth, and focusing on the common results, we found out that the NMT and SMT systems obtained 6 times a bad score (1) on the same source sentences, while they got 7 times an average score (2) on the same source sentences and 6 times a good score (3). For any scores given, 1, 2 or 3, when the NMT output obtains the same score as the SMT output, it can be because they provide two outputs that have the same mistakes, see example 1. Example 1 French source: de rien SMT translation: * 'anything' NMT translation: * 'anything' It also can be that the two outputs provide the same correct translation, as in example 2. Example 2 French source: je vais prendre la même chose, s'il vous plaît. SMT translation: ' i'll have the same, please . ' NMT translation: ' i'll have the same, please. ' Or, the two outputs can be two distinct correct translations, as in example 3. Example 3 French source: est-ce que je dois réserver ? SMT translation: 'shall I book? ' NMT translation: 'do I have to make a reservation?' But it can also be two outputs that are different and not corresponding to the source, as shown in example 4 below. 15 Example 4 French source: avez-vous de la sauce de salade au bleu ? SMT translation: * 'do you have sauce of salad in blue?' NMT translation: * 'do you have any chicken salad?' Having completed the first manual human evaluation, we proceeded to linguistic error analysis, using the error type from the Vilar's (2006) typology. Table 4 below shows the error types, and sub-error types encountered for each system, as well as the number of times that they occur. NMT SMT Missing Words/Content Words 13 7 Word Order/Word Level/Local Range 0 3 Word Order/Word Level/Long Range 0 1 Incorrect Words/Sense/Wrong Lexical Choice 17 13 Incorrect Words/Incorrect Forms 6 15 Incorrect Words/Extra Words 11 0 Incorrect Words/Style 1 2 Incorrect Words/Idiom 4 5 Unknown words/Unknown Stem 0 12 Punctuation 1 0 Table 4: linguistic error analysis Again, we cannot find a huge discrepancy between the results of the linguistic error analysis, concluding again that both MT systems are equivalent. Nevertheless we could spot four error types, out of the ten errors encountered, for which there is a significant difference. A first error type for which we find a difference is the Missing Words with sub-type Content words. This error type is used to label the non translation of a word that appears in the source sentence. The sub-type indicates that the missing word is a word without which the translation cannot be understood. That is to say that the translation of the meaning of a content word, as opposed to filler word, from the source sentence, does not appear in the target sentence. The Content Words error sub-type happens 13 times for the NMT system and only occurs 7 times for the SMT system. Example 5 shows one of those occurrences. Example 5 French source: c' est le contrat d' achat de mes chèques de voyage. SMT translation: *' it's the purchase agreement of my checks. ' NMT translation: *' it's the seniority wage system.' The analysis of this error sub-type can be dealt at the same time as the Unknown Word error type, and especially the sub-type Unknown Stem. This sub-type is used to tag when a source occurrence is not translated and is put as it stands in the translation. The NMT system occurrences of such an error never happen while for the SMT system it occurs 12 times. It can be easily explained by the fact that SMT systems are more likely to reproduce as a translation a word from the source sentence when the system does not recognize the stem as shown in Example 6. At the same time, the core functioning of NMT systems entails a bias among NMT system toward hallucinated translations as there is no linguistic link between “jeux 16 videos” and its translation provided by the NMT system "in fashion". Such things cannot happen with SMT system as it only considers the source. Example 6 French source: les adolescents japonais aiment les jeux vidéos . SMT translation: *' the adolescents japanese love electronic vidéos . ' NMT translation: *' japanese teenagers are interested in fashion .' A second sub-type Incorrect Forms that falls into the Incorrect Words error type. This error type is used to tag mistranslations. The NMT system provided 6 occurrences of this type of error while the SMT system gave 15 occurrences of this type of error. One of the errors, as shown in example 7, is due to the tense use when asking questions. This gap could be explained by the core functioning of the NMT system which is better at lexical diversity. Example 7 French source: pouvez-vous nettoyer ma chambre ? SMT translation: *' can you clean my room? ' NMT translation: ' could you clean my room?' The last error sub-type is also part of the Incorrect Word error type, labelled Extra Words. This error sub-type is used when a word appears in the translation while it does not exist This time, it is the NMT system occurrences of this error that are more numerous, eleven errors, than the ones from the SMT system which do not ever happen! This also can be explained by the core functioning of NMT systems, which use a beam search to enlarge the space of translations in which the system can find more appropriate solutions. Sometimes when the best solution cannot be find, the NMT system goes on and produces a wrong translation of a word from the source or a kind of stuttering of the last word translated, as shown respectively in examples 8 and 9 below. Example 8 French source: avez-vous un menu ? SMT translation: ' do you have a menu? ' NMT translation: *' do you have a fixed menu?' Example 9 French source: je voudrais manger de la vraie nourriture indienne SMT translation: *' I'd like to have true food indienne' NMT translation: *' I'd like to eat some food food' 3.2 Assessing students' perceptions During the course of the Master's degree in Multilingual Specialized Translation, the students were trained on SDL Trados Studio, but they did not integrate MT to their computer-aided translation environment. It is known that students with less experience in working with MT systems are the ones who have the most sceptical perceptions of such systems (see e.g. Koskinen and Ruokonen, 2017: 18). Students from this Master's degree had little experience in working with MT systems. Fourteen out of nineteen had already used an MT system, but when it came to using MT in a professional environment, only one had had this experience in the course of an internship. The task-based assessment consisted of two timed tasks. The first one consisted in manually correcting MT outputs from two different systems (Google’s NMT versus MT@EC, 17 the MOSES-based SMT engine provided by the European Commission). As for the second task, the whole group had to alternate tasks of translation and post-editing: this was done using a simple word processor and tracking changes. After each task, the students were asked to give their feelings, and what they wrote was collected as a small corpus for perception analysis (Rossi, submitted). It was clear from the corpus that although students figured out that the MT system helped them and speeded them up; this realisation did not significantly impact their primary perceptions. In order to better assess these perceptions, a series of two 20-question surveys were used to get a contrastive assessment of students' perceptions before and after the course. From those surveys, negative perceptions and fears of MT appeared to have been slightly reinforced by the course, and a positive correlation was evidenced suggesting that fear accounted for lower self- efficacy scores (Rossi, ibid). We concluded that the students’ fears needed to be addressed in order to make sure they received proper training with MT and were well- equipped to deal with contemporary translation environments. However, the perception of loss of control and authorship voiced by students is likely to increase with the current improvement of MT systems. If indeed NMT brings about unprecedented change in the quality of MT outputs, it remains to be seen how students will react. In order to gain insight on the impact of such differences, we started by measuring the differences in our NMT versus SMT corpora, using three distinct evaluation metrics, before asking students to produce broad, comparative judgments on the quality of the translated sentences. 3.3 Evaluation Metrics We have evaluated the two systems (Bérard et al., 2016) using BLEU, TER and Meteor 1.4 metrics which results are shown in Table 5 hereafter. Corpus NMT SMT BLEU TER Meteor 1.4 BLEU TER Meteor 1.4 Dev 51.56 30.75 40.58 54.35 28.66 43.40 Test1 47.07 33.16 39.73 49.44 32.20 42.07 Table 5: BLEU/TER/Meteor 1.4 mono-reference scores Results concerning Test1 corpus show that the NMT system gives similar results to the SMT system as regards to the three metrics, which is quite promising as we now know that NMT systems need time to get better. It also confirms the results from the Linguistic error analysis on a smaller set, i.e. BTEC-50. 3.4 Evaluation results from the participants in Master's degree in Multilingual Specialized Translation At the end of our study, as we have mentioned earlier, the participants from the Master's degree in Multilingual Specialized Translation were sent an Excel file in which they had to evaluate from 1 (very bad) to 4 (very good) the SMT output and the NMT output for the BTEC-50, without knowing which output was provided by which system, thus making sure we were not introducing a bias. However, the students did know they were dealing with MT outputs, and this might have had an impact on their choice of scores. A first set of 16 answers gave us the following results. 18 From the scores given by each participant for each sentence, we have computed a mean per sentence as well as the related standard deviation. Then we have calculated the mean of all the scores per sentences, per participants. Results are shown in table 6. NMT SMT Mean Standard Mean Standard deviation deviation BTEC-50 2.310 0.1633 2.166 0.1625 Only most contrasted 2.893 0.7656 1.836 0.1904 Table 6: Participants’ evaluation from 1 very bad to 4 very good Participants have equally judged both system with a mean of 2.166 for the SMT system and one of 2,310 for the NMT system. This means that the participants have evaluated both systems as bad, which is equivalent to a score of 2. This confirms the assumption as well as the results of the perception assessment presented in section 3.2 that students have a negative or low perception of MT systems. Nevertheless, when focusing only on the most contrasted translations, the NMT system is evaluated as almost good, thus increasing its mean, while there is a slight decrease for the SMT mean. On the whole, the experiment returns negative perceptions, regardless of the type of MT, statistical or neuronal, even if the NMT system slightly outpaces the SMT system. Furthermore, if we look at the standard deviation obtained for each MT system, we can notice than there is almost the same agreement between participants for the SMT outputs (standard deviation of 0.1625) as for the NMT outputs (standard deviation of 0.1633). Once again, the different evaluation and assessment tend to prove that the NMT and SMT systems are equivalent. If we now put together the above evaluation results with the linguistic error analysis described in section 3.1, we obtain table 7 below, in which we concatenated the main error types found during the linguistic evaluation along with the evaluation by participants. Once again, as regards the evaluation from the students, the NMT and SMT systems we have worked on seem to be equivalent. We can notice only for two examples, i.e. example 5 and example 8 a difference going from very bad (1) to bad (2) and from bad to good. From example 5, we can deduce that the hallucinated translation issued from the NMT system was less appreciated by the participants than the missing translation in the SMT output, which still makes sense. Looking at example 8, it is the other way round; it seems that the students were more indulgent with the NMT system than with the SMT system. 4 Conclusion and Perspectives Even if the study has to be performed on a larger dataset, we can already see that all the experiments have proved that our two systems were equivalent, with only a slight advantage for the NMT system. Nevertheless, we have to take into account the fact that our in-house NMT system at the time of the experiment was at its very beginning and that we now should try with its improved version to see if the promising results we have found here are confirmed or even more conclusive. Finally, it is worth noting that the human evaluation seems to correlate with the metrics obtained. Performing similar tests on richer data would enable us to see whether there is more to this result than a mere coincidence. 19 NMT SMT Score Standard Score Standard SOURCE Translation Translation mean deviation mean deviation Example 1 - De rien * 'anything' 1 0 * 'anything' 1 0 Example 2 - je vais 'I'll have the 3.647 0.5398 'I'll have the 3.75 0.4062 prendre la même chose, same, please. ' same, please. ' s'il vous plaît. Example 3 : ' do I have to 3.533 0.4977 : ' shall I book? 3.133 0.577 make a ' French source: est-ce reservation?' que je dois réserver ? Example 4 * 'do you have 1.47 0.5536 * 'do you have 1.375 0.4687 any chicken sauce of salad French source: avez- salad?' in blue?' vous de la sauce de salade au bleu ? Example 5 *'it's the seniority 1.3529 0.4962 *'it's the 2.5625 0.4922 wage system.' purchase French source: c'est le agreement of contrat d'achat de mes my checks. ' chèques de voyage. Example 6 *'japanese 1.187 0.3046 *'the 1.6875 0.5156 teenagers are adolescents French source: les interested in japanese love adolescents japonais fashion.' electronic aiment les jeux vidéos . vidéos. ' Example 7 'could you clean 3.5294 0.4982 *' can you 2.93 0.4680 my room ?' clean my French source: pouvez- room? ' vous nettoyer ma chambre ? Example 8 *' do you have a 2.5882 0.6228 'do you have a 3.375 0.5468 fixed menu?' menu? ' French source: avez- vous un menu ? Example 9 *' I'd like to eat 1.4117 0.5328 *' I'd like to 1.5 0.5625 some food food' have true food French source: je indienne' voudrais manger de la vraie nourriture indienne Table 7: Student evaluation from 1 very bad to 4 very good Acknowledgements We would like to thank the "Pôle Grenoble Cognition" for the grant we have received to conduct our interdisciplinary project, and the students who agreed to participate in this research. References Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, Ł., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. (2015). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. 20 Becerra-Bonache, L. & Jiménez López, M.D (2016). Could Machine Learning Shed Light on Natural Language Complexity? Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity, pages 1–11, Osaka, Japan, December 11-17. Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR 2015, pages 3104–3112, San Diego, California, USA. Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473. Retrieved from http://arxiv.org/abs/1409.0473 Bérard, A., Pietquin, O., Besacier, L., Servan, C. Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation. NIPS Workshop on end-to-end learning for speech and audio processing, Dec 2016, Barcelona, Spain. 2016. 〈hal-01408086〉 Cho, K., Merrienboer, B. van, Gülçehre, Ç., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. CoRR, abs/1406.1078. Retrieved from http://arxiv.org/abs/1406.1078 Chorowski, J. K., Bahdanau, D., Serdyuk, D., Cho, K., and Bengio, Y. (2015). Attention-Based Models for Speech Recognition. In Advances in Neural Information Processing Systems (NIPS 2015), pages 577–585, Montréal, Canada. Dupoux, E. (2016). Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner. CoRR, abs/1607.08723. Retrieved from http://arxiv.org/abs/1607.08723 Ehrensberger-Dow, M., Göpferich, S., & O’Brien, S. (2015). Interdisciplinarity in Translation and Interpreting Process Research. John Benjamins Publishing Company. Jean, S., Cho, K., Memisevic, R., & Bengio, Y. (2014). On Using Very Large Target Vocabulary for Neural Machine Translation. CoRR, abs/1412.2007. Retrieved from http://arxiv.org/abs/1412.2007 Kingma, D. and Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980. Koskinen, Kaisa et Minna Ruokonen, Love letters or hate mail? Translators’ technology acceptance in the light of their emotional narratives. In D. Kenny (Ed.), Human issues in translation technology, Londres et New York, Routledge, 2017, p. 8–24. Knowles, R., & Koehn, P. (2016). Neural interactive translation prediction. AMTA 2016, Vol., 107. Philipp Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin,A., and Herbst, E., 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 177– 180, Prague, Czech Republic, June. Association for Computational Linguistics. Koehn, P. (2017). Introduction to Neural Machine Translation, webinar du 24 janvier 2017, Webinar series by Omniscien Technologies. Koehn, P. et al. (2013). Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation (CASMACAT), Final Public Report. http://www.casmacat.eu/uploads/Deliverables/final-public-report.pdf [consulté le 26 janvier 2017]. Kublička, F., Toral, A., Sanchez-Cartagena, V. (2017) Fine-grained human evaluation of neural versus phrase- based machine translation. The Prague Bulletin of Mathematical Linguistics. Available from: https://www.researchgate.net/publication/317304955_Fine- Grained_Human_Evaluation_of_Neural_Versus_Phrase-Based_Machine_Translation Martikainen, H., & Kübler, N. (2016). Ergonomie cognitive de la post-édition de traduction automatique : enjeux pour la qualité des traductions. ILCEA. Revue de l’Institut des langues et cultures d’Europe, Amérique, Afrique, Asie et Australie, (27). Consulté à l’adresse https://ilcea.revues.org/3863 O’Brien, S. (2012) Translation as human–computer interaction. Translation Spaces, Vol. 1(1), pages 101-122. Rossi, C. (submitted) ‘Introducing statistical machine translation in translator training: from uses and perceptions to course design, and back again’ Tradumatica. Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems (NIPS 2014), pages 3104–3112, Montréal, Canada. Vilar, D. Xu, J., D’Haro L. F., et al., 2006. Error analysis of statistical machine translation output. In : Proceedings of LREC. 2006. p. 697-702. Zaremba, W., Sutskever, I., and Vinyals, O. (2014). Recurrent neural network regularization. arXiv preprint arXiv:1409.2329. 21 Appendix A: Experiment for Participant in the Master's degree in Multilingual Specialized Translation # FR EN translation 1 Your EN translation 2 Your Any comments evaluation evaluation you may wish 1-very bad 1-very bad to add (vous 2-bad 2-bad pouvez les 3-good 3-good écrire en 4-very good 4-very good français) 1 au secours ! help something like help . 2 pouvez-vous nettoyer ma chambre can you clean my room ? could you clean my room ? ? 3 c' est le contrat d' achat de mes it 's the purchase agreement of my it 's the seniority wage system . chèques de voyage. checks . 4 pouvez-vous conduire plus can you take more slowly , please ? can you speak more slowly , please ? lentement , s' il vous plaît ? 5 je voudrais parler à monsieur i 'd like to sir smith . i 'd like to speak to jane . smith. 6 avez-vous un menu ? do you have a menu ? do you have a fixed menu ? 7 les adolescents japonais aiment les the adolescents japanese love japanese teenagers are interested in jeux vidéos . electronic vidéos . fashion . 8 combien de temps devrons-nous how long should we wait ? how long should we wait ? attendre ? 9 aurons-nous du temps libre will we spare time for the trip ? do we have time time during the trip pendant le voyage ? ? 10 une petite voiture à deux portes , s' a compact car at two doors , please . a small car car , please . il vous plaît . 11 qui est en train de parler , s' il vous who is talking , please ? who 's speaking with you , please ? plaît ? 12 c' est trop brillant . it 's too brillant . this is too flashy . 13 pouvez-vous me recommander une can you recommend a good night can you recommend a good night for bonne boîte de nuit ? club ? me ? 14 je voudrais manger de la vraie i 'd like to have true food indienne . i 'd like to eat some food food . nourriture indienne . 22 15 pouvez-vous garder mes sacs ici can you keep my bags here until five can you hold my bags here until five jusqu' à cinq heure ? hour ? time ? 16 avez-vous des petites voitures ? do you have any small cars ? do you have any cars cars ? 17 pardonnez-moi , puis-je passer ? excuse me , can i make ? excuse me , may i get through ? 18 quels sont les liens que vous what are the liens you entretenez what are the short-term effects with entretenez avec la personne à qui with the person you leave tour ? you visit in the visit ? vous rendez visite ? 19 veuillez démarrer le compteur , s' il please start the meter , please . please start the meter . vous plaît . 20 c' est trop petit pour moi . it 's too small for me . it 's too little for me . 21 je vais prendre la même chose , s' i 'll have the same , please . i 'll have the same , please . il vous plaît . 22 ça prend à peu près deux minutes it takes about two minutes by train . it takes about about two minutes by en train . train . 23 voulez-vous que je vous aide ? would you help me ? would you like me to help you ? 24 la population totale du japon est d' the total population of japan is about the total population of japan is about environ cent trente millions d' one hundred thirty million . one hundred thirty million . habitants . 25 pourriez-vous arriver plus tôt ? will arrive earlier ? could you make it soon ? 26 est-ce que les toilettes sont libres ? are the toilets available ? are the toilets occupied ? 27 merci pour votre aide . c' est pour thank you for your help . this is for thank you for your help . it 's for you vous . you . . 28 vous l' avez bien en main . you have it right in . you have appendicitis . 29 ça alors , un appareil photo gee , a camera japanese much is it then , a japanese camera is less japonais coûte moins cher ici qu' cheaper here what to japan . expensive in japan . au japon . 30 est-ce que je dois réserver ? shall i book ? do i have to make a reservation ? 31 et quel est l' objet de votre visite ? and what 's the item of your visit ? what is the purpose of your visit ? 32 allongez-vous ici et déboutonnez lie down over here and déboutonnez please lie down here and your shirt . votre chemise . your shirt . 33 ce sont des silencieux . these are any silencieux . these are food coupons . 34 de rien . anything . anything . 23 35 voulez-vous un billet ouvert ou would you like a ticket open or do do you want to have a ticket or white avez-vous des dates fixes ? you have any dates fixes ? ? 36 c' est pour mon ami . il a trente ans it 's for my friend . he has thirty . it 's for my friend . he 's thirty years . old . 37 j' aimerais aller à l' église . i 'd like to go to the church . i 'd like to go to the party . 38 le vol qf vingt et un pour tokyo , s' the flight qf twenty-first for tokyo , flight number two one to tokyo , il vous plaît . please . please . 39 avez-vous de la sauce de salade au do you have sauce of salad in blue ? do you have any chicken salad ? bleu ? 40 je fais un régime . i 'm go on a diet . i 'm a diet . 41 le vendeur responsable est absent . the vendeur manager 's not in . can the man is out right now . could you pouvez-vous patienter un instant ? you wait a moment ? wait for a moment ? 42 voudriez-vous tenter l' oden ? would you like tenter the oden ? would you like to check out ? 43 je m' intéresse à l' histoire des i 'm interested in history the united i 'm interested in history . etats-unis . states . 44 combien coûte un aller simple pour how much is a one-way to new york how much is a one-way ticket to new york ? ? new york ? 45 merci infiniment pour tout . thank you so much for everything . thank you very much for everything . 46 il est remarquable . it 's remarquable . he 's unconscious . 47 pourrais-je avoir un café ou un thé may i have coffee or tea ? may i have some coffee or tea ? ? 48 le tour de taille doit être raccourci the tower of size must be short cut the waist needs taking in by three de trois centimètres by three centimeters . centimeters . 49 je viens juste de mettre de l' argent i just put money in this vending i just just put money in this machine dans ce distributeur mais rien n' en machine but nothing in is out . , but nothing came out . est sorti . 50 je m' appelle ueda . j' ai fait une i am apelle ueda . i made a my name is ueda . i made a réservation . reservation . reservation . 24 Speech Recognition in the Interpreter Workstation Claudio Fantinuoli Johannes Gutenberg Universität Mainz/Germersheim [email protected] Abstract In recent years, computer-assisted interpreting (CAI) programs have been used by professional interpreters to prepare for assignments, to organize terminological data, and to share event-related information with colleagues. One of the key features of such tools is the ability to support users in accessing terminology during simultaneous interpretation. The main drawback is that the database is queried manually, adding an additional cognitive effort to the interpreting process. This disadvantage could be addressed by automating the querying system through the use of Automatic Speech Recognition (ASR), as recent advances in Artificial Intelligence have considerably increased the quality of this technology. In order to be successfully integrated in an interpreter workstation, however, both ASR and CAI tools must fulfil a series of specific requirements. For example, ASR must be truly speaker-independent, have a short reaction time, and be accurate in the recognition of specialized vocabulary. On the other hand, CAI tools face some challenges regarding current implementations, and need to support the handling of morphological variants and to offer new ways to present the extracted data. In this paper we define and analyse a framework for ASR-CAI integration, present a prototype and discuss prospective developments. 1 Introduction In recent years, computer-assisted interpreting (CAI) programs have been used by professional interpreters to prepare for assignments, to organize terminological data, and to share event-related information among colleagues (Corpas Pastor and May Fern, 2016; Fantinuoli, 2016, 2017a). One of the main features of such tools is the ability to support users in accessing multilingual terminology during simultaneous interpretation (SI). With state-of-the-art CAI tools, interpreters need to manually input a term, or part of one, in order to query the database and retrieve useful information. This manual lookup mechanism is considered the primary drawback of this approach, as it appears time-consuming and distracting to search for terminological data while interpreters are performing an activity that requires concentration and rapid information processing. Although initial empirical studies on the use of CAI tools seem to support the idea that interpreters in the booth may have the time and the cognitive ability to manually look up specialized terms (Prandi, 2015; Biagini, 2016), an automated querying system would undoubtedly represent a step forward in reducing the additional cognitive effort needed to perform this human-machine interaction. With this in mind, it is reasonable to assume that a CAI tool equipped with an automatic lookup system may have the potential to improve the interpreters’ performance during the simultaneous interpretation of specialized texts. Automatic speech recognition (ASR) has been proposed as a form of technology to automate the querying system of CAI tools (Hansen-Schirra, 2012; Fantinuoli, 2016) . In the past, the difficulty of building ASR systems accurate enough to be useful outside of a carefully controlled environment hindered its deployment in the interpreting setting. However, recent advances in Artificial Intelligence, especially since the dissemination of deep learning and neural networks, have considerably increased the quality of ASR (Yu and Deng, 2015). With systems that achieve a 5.5 percent word error rate1 , the deployment of ASR in the context of 1 https://www.ibm.com/blogs/watson/2017/03/reaching-new-records-in-speech-recognition [last access 28 25 Proceedings of the 39th Conference Translating and the Computer, pages 25–34, London, UK, November 16-17, 2017. 2017c AsLing interpretation appears conceivable nowadays. Some scholars regard ASR as a technology “with considerable potential for changing the way interpreting is practiced” (Pöchhacker, 2016, p. 188). For example, it has the potential to dramatically change the way consecutive interpreting is usually performed (through note-taking with pen and paper) and may outcome alternative technology-based methods recently proposed, such as the digital pen (Orlando, 2014). With ASR, the consecutive interpreter may use the transcription of the spoken word to sight-translate the speech segment, with obvious advantages in terms of precision and completeness. In the context of simultaneous interpreting, ASR can be used not only to query the interpreter’s glossary, as pointed out above, but also to implement innovative features that aim at facilitating the processing of typical “problem triggers” in interpretation, such as numbers, acronyms and proper names. In order to be successfully integrated in an interpreter workstation, however, both ASR and CAI tools must fulfill a series of specific requirements. For example, ASR must be truly speaker-independent, have a short reaction time, and be accurate in the recognition of specialized vocabulary. On the other hand, CAI tools need to overcome some challenges of current implementations. For instance, they must be able to handle morphological variants and offer ergonomic ways to present extracted information. This paper is organized as follows: Section 2 describes computer-assisted interpreting tools and the unique features and limitations of their use in the booth as a terminology lookup system; Section 3 gives an overview of the potential shortcomings of ASR systems that may arise from their integration into an interpreter workstation and summarizes the requirements that ASR systems and CAI tools need to meet for successful integration; Sections 4 and 5 present a prototype of ASR-CAI integration and the results of an empirical test on the ability of the tool to identify relevant information from three English specialized speeches; finally, Section 6 summarises the topics introduced in this paper and presents some future perspectives. 2 Computer-assisted interpreting tools Technology is growing as an important aspect of the interpreting profession. There is general consensus that some of the recent advances in information and communication technology have facilitated some aspects of the profession (Tripepi Winteringham, 2010; Fantinuoli, 2016, 2017a). Suffice it to say how easy it is today to find domain-related texts on a large variety of subjects and languages and to consult the plethora of terminological resources available on the Web. Advance preparation is considered one of the most important activities to ensure quality, especially in the interpretation of highly specialized domains (Kalina, 2005; Gile, 2009), and the use of correct and precise terminology can facilitate communication and increase the perceived professionalism of interpreters (Xu, 2015). Hence, it is not surprising that the introduction of technological advances is favoured by the interpreting community (Fantinuoli, 2017b) considering the evident improvement of preparation and assignment management. Among the different kinds of technology used by interpreters, computer-assisted interpreting tools have emerged as the most distinctive development in recent years. CAI tools are computer programs designed to support interpreters during different phases of an assignment, from the preparation stage to accessing information in the booth. In the last decade, various CAI tools have been designed and used by practitioners with the goal of rationalizing and optimizing some steps of the interpreting workflow2 . CAI tools generally focus on the lexical and terminological aspect of the profession. They aim at supporting the user in acquiring and managing linguistic information, creating multilingual glossaries, and accessing them during the preparation stage. September 2017] 2 For a classification of CAI tools, see Fantinuoli (2017b); for a tentative evaluation of available terminology solutions for interpreters, see Will (2015). 26 This is true particularly when interpreters learn and memorize event-related terminology or when they follow up on the completed terminology work. CAI tools have also been proposed as a means to access target language equivalents (specialized terminology) in the booth whenever interpreters are not able to retrieve them from their long-term memory, and alternative strategies, such as the use of paraphrasing, hypernyms, etc. are not possible or, if used, would lead to a loss of quality or compromise the complete and accurate rendition of the original. While working in the booth, however, the idea of being supported by a computer program has been perceived by practitioners with mixed feelings. Some seem to be enthusiastic and appreciate the possibility of accessing subject-related translations in real time, while others are reluctant and consider it unnatural (cf. Tripepi Winteringham, 2010; Berber-Irabien, 2010; Corpas Pastor and May Fern, 2016). Although first empirical experiments suggest an improvement of terminological rendition in highly specialized conferences if a CAI tool is used (cf. Biagini, 2016; Prandi, 2015), there are objective constraints in interpretation that make the use of such tools in the booth less straightforward than during preparation or follow-up work. In the case of simultaneous interpretation, such constrains are primarily related to the time pressure and the cognitive load involved in this activity. Since interpreters often work on the edge of saturation (Gile, 2009), as many concurring activities are taking place at the same time, including listening, comprehension, translation, text production and monitoring, the use of a tool for terminology search adds further cognitive load to an already precarious balance. For this reason, the interpreter controlling even a carefully designed lookup solution (i.e. inputting a term, searching for the most adequate result, etc.) may experience a cognitive overload with following deterioration of the quality of interpretation. There is no doubt that the limitations of state-of-the-art term search mechanisms adopted by CAI tools can benefit from recent advances in artificial intelligence. One of the most promising developments has been indicated in the integration of automatic speech recognition. Automating the lookup mechanism by means of ASR can not only reduce the additional cognitive effort needed to perform human-machine interaction for terminology lookup, but the integration of ASR can also allow the implementation of other innovative features, such as automatic transcription of numbers, abbreviations, acronyms, and proper names. Since these linguistic forms are generally considered to be potential problems for interpreters because of heavy processing costs on cognitive resources3 – with severe errors and disfluencies as a consequence (Gile, 2009, cf.) – being prompted with a transcription of this information may alleviate the work load during simultaneous interpretation. In light of the preceding considerations, it is reasonable to suggest that the integration of ASR and algorithms to identify specialized terms as well as numbers, proper names and abbreviations in a transcribed speech would contribute to further increase the usability of CAI tools, leading to an improvement in the terminological rendition and in the overall performance of interpreters during the simultaneous interpretation of specialized texts. A CAI tool with ASR integration could act like an electronic boothmate, providing useful information to the colleague whenever necessary. Since the cooperation between boothmates (writing down numbers, terms, etc.) is generally seen as positive among interpreters (Setton and Dawrant, 2016) and – when silent and discrete – not considered a source of distraction, this development may lead to an increase in the acceptance of CAI tools in the booth. 3 According to the “effort model”, names and numbers tend to increase the effort of the interpreter and may lead to cognitive saturation. 27 3 Speech Recognition and CAI integration Speech recognition or automatic speech recognition (ASR) is the process of converting human speech signals to a sequence of words by means of a computer program (Jurafsky and Martin, 2009). ASR has been around for more than three decades and has been used in many areas, such as human-machine interface or for dictation purposes, but only recently has there been a renewed interest for this technology. There are several reasons for this. On the one hand, new computational approaches, especially Neural Networks and Deep Learning, have significantly improved the quality of ASR systems. On the other, the commercial interest for ASR nowadays is on its verge, with global players such as Microsoft, Amazon and Apple investing significant funding and research in improving their commercial products Cortana, Alexa and Siri, just to name a few. Such improvements are expected to continue in the years to come. Yet, ASR is far from perfect. Language is a complex system and language comprehension consists of more than simply listening and decoding sounds. Humans use acoustic signals together with background information, such as information about the speaker, world knowledge, subject knowledge, as well as grammatical structures, redundancies in speech, etc. to predict and complete what has been said. All of these features are difficult to model in a computer program. As a consequence, the problems that ASR systems have been pressed to solve are many. In connection with the integration in an interpreter’s workstation, the following issues for ASR can be identified: • Use of spoken language - Speakers may use a variety of styles (e.g. careful vs. casual speech). In formal contexts, such as conference venues, political meetings, speakers use spontaneous language, read aloud prepared texts, or use a mixture of both. The correct transcription of casual speech represents a big challenge for ASR. Especially in spontaneous speech, humans make performance errors while speaking, i.e. disfluences such as hesitations, repetitions, changes of subject in the middle of an utterance, mispronunciations, etc. The presence of such elements of spoken language poses a serious problem for ASR and generally leads to poor system performance. • Speaker variability - Speakers have different voices due to their unique physical features and personality. Characteristics like rendering, speaking style, and speaker gender influence the speech signal and consequently require great adaptation capabilities by the ASR. Regional and social dialects are problematic for speaker-independent ASR systems. They represent an important aspect in the interpreting setting both for widely spoken and less spoken languages considering the variability of pronunciation is vast. Furthermore, in the context of English as lingua franca ASR should be able to cope with both native and foreign accents as well as mispronunciations. • Ambiguity - Natural language has an inherent ambiguity, i.e. it is not easy to decide which of a set of words is actually intended. Typical examples are homophones, such as “cite” vs. “sight” vs. “site” or word boundary ambiguity, such as “nitrate” vs. “night-rate”. • Continuous speech - One of the main problems of ASR is the recognition of word boundaries. Besides the problem of word boundary ambiguity, speech has no natural pauses between words, as pauses mainly appear on a syntactic level. This may compromise the quality of a database querying mechanism, as this relies on the correct identification of word units. • Background noise - A speech is typically uttered in an environment with the presence of other sounds, such as a video projector humming or other human speakers in the 28 background. This is unwanted information in the speech signal and needs to be identified and filtered out. In the context of simultaneous interpretation, the restrictive standards for the audio signal in the booth4 offer the best setting for good quality transcription. In other settings, however, such as face-to-face meetings, noise is expected to pose a problem for the quality of the ASR output. • Speed of speech Speeches can be uttered at different paces, from slow to very high. This represents a problem both for human interpreters, as they need sufficient time to correctly process the information, and for ASR. One reason is that speakers may articulate words poorly when speaking fast. • Body language - Human speakers do not only communicate with speech, but also with non verbal signals, such as posture, hand gestures, and facial expressions. This information is completely absent with standard ASR system and could only be taken into consideration by more complex, multimodal systems. However, for the integration of ASR in CAI tools, this shortcoming does not seem to play an important role, as the ultimate goal is to trigger a database search for terminology units, and not to semantically “complete” the oral message uttered by the speaker. There are different applications for speech recognition depending on the constraints that need to be addressed, i.e. the type of utterances that can be recognized. ASR solutions are typically divided into systems that recognize isolated words, where single words are preceded and followed by a pause (e.g. to command digital devices in Human-Machine interface), and systems that recognize continuous speech, where utterances are pronounced naturally and the tool has to recognize word boundaries. These two basic classes can be further divided, on the basis of vocabulary size, spontaneity of speech, etc. Integration of CAI with ASR is a special case of human-computer interaction and automatic transcription of speech. The whole talk needs to be transcribed for the CAI tool to select pertinent chunks of text to start the database query algorithm and to identify entities, such as numerals and proper names. To be used with a CAI tool, an ASR system needs to satisfy the following criteria at minimum: • be speaker-independent • be able to manage continuous speech • support large-vocabulary recognition • support vocabulary customisation for the recognition of specialized terms • have high performance accuracy, i.e. a low word error rate (WER) • be high speed, i.e have a low real-time factor (RTF)5 ASR systems can be both stand-alone applications installed on the interpreter’s computer, such as Dragon Naturally Speaking6 or cloud services, such as the Bing Speech API7 . For privacy reasons, it seems more advisable to prefer stand-alone ASR systems for integration into CAI tools, as they do not require the user to send (confidential) audio signals to an external service provider. 4 See for example the norm ISO 20109, Simultaneous interpreting — Equipment — Requirements. 5 RFT is the metric that measures the speed of an automatic speech recognition system. 6 https://www.nuance.com/dragon.html 7 https://azure.microsoft.com/en-us/services/cognitive-services/speech/ 29 As for CAI tools, in order to successfully support the integration of a ASR system, the tool needs to satisfy the following requirements: • high precision, precision being the fraction of relevant instances among the retrieved instances • high recall, recall being the fraction of relevant instances that have been retrieved over the total amount of relevant instances present in the speech • if a priority has to be set, precision has priority over recall, in order to avoid producing results that are not useful and may distract the interpreter • deal with morphological variations between transcription and database entries without increasing the number of results • have a simple and distraction-free graphical user interface to present the results In the next sections the implemented prototype will be briefly presented together with the results of an experimental test designed to test the quality of the CAI implementation. 4 Prototype The prototype described in this study was designed and implemented within the framework of InterpretBank8 , a CAI tool developed to create assignment-related glossaries accessible in a booth-friendly way (Fantinuoli, 2016). The tool reads the transcription provided by an ASR system and automatically provides the interpreter with the following set of information: • entries from the terminology database • numerals The tool has been designed with an open interface between the CAI tool and the ASR system of choice, provided the ASR system meets the features described in the previous section. The specially designed open structure allows users to choose the ASR engine with the best quality output for the source language, domain, and operative system without having to change or adapt the CAI interface. Since the tool is based mostly on language-independent algorithms, for example to deal with morphological variants (database query) and to identify numbers and acronyms, the prototype supports the integration of ASR for any input language. The acoustic input signal required by the system is the same that interpreters receive in their headset. Since most standard booth consoles have more than one audio output for headphones9 , one of Speaker these can be connected to the audio input of the computer equipped with the ASR-CAI tool. If a second audio output is not available, ASR: a headphone splitter can be used to provide an audio signal both to transcription the interpreter’s headphones and to the computer audio card. The working procedure can be divided into two main phases: the CAI: information tool first reads the provided transcription and pre-processes the text. extraction It then queries the terminological database and identifies the entities from the text flow, visualizing the results in an interpreter-tailored Interpreter graphical user interface. The algorithms are triggered any time a 8 www.interpretbank.com 9 Like the Sennheiser or the BOSCH interpreter console Figure 1: Workflow 30 new piece of text is automatically provided by the ASR. For this reason, the tool needs the transcription to be provided by the ASR system in chunks of text. The chunks are the final version of the portion of text that has been transcribed (in contrast to a “live” temporary version that will be changed by the ASR system during the further elaboration of the acoustic signal). Whenever a chunk of text is provided by the ASR system, the text is normalized. A series of rules have been written to take into account the different ways SR systems may enrich the transcribed text, for example by adding capitalization and punctuation, or converting numerals. The provided normalized text is tokenized. The tokenization aims at distinguishing sections of a string of characters and producing a list of words (tokens) contained in the transcribed text. The tokens are used to identify numerals and as a query string to match entries in the terminological database. The approach used is based on n-gram matching. From the tokenized input, n-grams are created and matched against one-word and multi-word units previously saved in the database. The algorithm needs to match also n-grams that may appear in different forms between the glossary and the transcription (for example plurals: fuel vs. fuels). In order to do so, the prototype implements a fuzzy match approach which should produce good results with most European languages. This approach is based on string metrics for measuring the difference between two sequences. The tool uses the Levenshtein distance (the distance between two words being the minimum number of single-character edits, such as insertions, deletions or substitutions, required to exchange one word with another) and computes a percentage of variation. N-grams are matched if they differ less than a given percentage. In order to reduce the number of potential results retrieved by the tool (which is a prerequisite for its usability), a series of heuristics are applied that aim at identifying the most probable term given the various results. There are limitations with this approach, for example in its use with agglutinative languages. In the future, to extend the querying function to such languages, other language-dependent matching approaches to term recognition should be analysed, such as stemming or inflection analysis (Porter, 2001). In order to take into account the specific constraints of interpreting, not only does the tool need to achieve a high precision and recall, but it also needs to minimize the visual impact of the extracted data. For this reason, the interface is kept as clean as possible. Information is divided into three sections, one for the transcription, one for terminology, and one for numerals. The visualisation appears in chronological order. Among other things, the user is able to set background color, font size and color and to influence some extra parameters, such as the possibility to suppress the repetition of the same terms, etc. Terminology and entity data are visualized essentially in real time. The time span between the moment an utterance has been said and the data visualization depends on the speed of the ASR system and its latency. 5 Evaluation The overall quality of a CAI system with an integrated ASR engine depends on two factors: the quality of the transcription provided by the ASR system (low word error rate) and the ability of the CAI tool to retrieve and identify useful information. For the purpose of this paper, the integrated system10 has been empirically evaluated by measuring the precision and recall scores for the identification of terminology and numerals. The test has been conducted using three speeches in English which are rich in terminological units. The speeches are the same used by Prandi (2017) in a pilot study designed to empirically test the use of CAI tools during simultaneous interpretation. All three texts are on the subject of renewable energy. The bilingual glossary used to test the terminology retrieval quality 10 CAI: InterpretBank 4; ASR: Dragon Naturally Speaking 13. 31 comprises 421 entries and has the size of a typical glossary compiled by interpreters. For this experiment, the terminological units under investigation are defined as the one- and multi-word terms that are present both in the speeches and in the glossary. The system will be tested on this set of terms (119) as well as on the numerals (11) contained in the texts. The latency of the ASR system was not object of testing. Table 1 reports the metrics of the texts. Tokens Terms Numerals Text 1 1533 39 7 Text 2 1513 40 2 Text 3 1512 40 2 Total 4558 119 11 Table 1: Text metrics Ideally, the system should reach a high recall. This would mean that it is able to recognize all terminological units of the transcribed speech, independent of the presence of orthographical differences. It should also have high precision, i.e. present a low number of undesired or erroneous results. This ensures that interpreters are not prompted with superfluous results which may cause distraction. Table 2 summarizes the ASR performance on the set of stimuli defined above. This result is obtained after importing the list of English specialized terms contained in the glossary. With a word error rate (WER) of 5.04% on the terminology list, the ASR system performs well in recognizing the terminological units. It is worth mentioning that importing the list of specialized words from the glossary contributed to decrease the WER from the initial value of 10.92%. The transcription of numerals was completed without errors. Terms Numerals Text 1 38 (of 39) 7 (of 7) Text 2 39 (of 40) 2 (of 2) Text 3 36 (of 40) 2 (of 2) Total 113 (of 119) 11 (of 11) Table 2: Correctly transcribed terms and numerals Table 3 summarizes the results of the terminology retrieving algorithms on the transcription delivered by the ASR engine. The system was able to retrieve and visualize 112 terminological units out of the 119 contained in the texts, which corresponds to 94.11%, while the number of terms erroneously retrieved was 3. With an F1 score11 of 0.97, the overall quality of the identified terminology seems to be satisfying. Among the missing terms, there are complex plural forms (nucleus vs. nuclei) and quasi-synonyms (“coal-fired plants” and “coal-fired power plants”). Among the erroneously retrieved terms there are phrases such as save energy that was matched against the terminological unit wave energy. It is worth noting that the fuzzy searching algorithm implemented in the CAI tool was able to “correct” terms wrongly transcribed by the ASR system, such as malting which was transcribed as moulding, and was able to identify and visualize the correct term. 11 F1 score considers the precision p and the recall r of the test to compute the score, being p the number of correct positive results divided by the number of all positive results and r the number of correct positive results divided by the number of positive results that should have been returned. An F1 score reaches its best value at 1 and worst at 0. 32 Visualized Recognized Missing Errors Text 1 38 37 (of 39) 2 (of 39) 1 Text 2 40 39 (of 40) 1 (of 40) 1 Text 3 37 36 (of 40) 4 (of 40) 1 Total 115 112 (of 119) 7 (of 119) 3 Table 3: Performance of CAI terminology retrieval The identification of numerals does not represent a problem for the ASR system and the CAI retrieving algorithm. The system reaches an F1 score of 1, meaning no number has been left out and no wrong numbers have been retrieved and presented to the user. 6 Conclusions In this paper, we have proposed the integration of automatic speech recognition in computer-assisted interpreting tools as a means to improve their lookup mechanism. A prototype of ASR-CAI integration has been presented and its output tested in terms of precision and recall of terminology retrieval and numbers identification. Although available ASR engines are still not perfect and fail under certain circumstances (non native accents, unknown words, etc.), they already reach high precision values in standard conditions, even within specialized domains. The ASR-CAI integration tested in our experimental setting reaches an F1 value of 0.97 for terminology and 1 for numerals. This value is quite promising and seems to suggest that the use of this technology is – at least in “standard” interpreting settings – already possible. In the future, with the expected increase of ASR quality, the proposed technology may be good enough to be also used in more difficult settings, with mispronunciations, background noise, etc. The proposed technology has the potential to change the way interpreting will be performed in the future. However, further investigation would be necessary to evaluate its impact on the interpreting process and product. For example, it has to reveal whether the interpreter may experience a visual (and cognitive) overload when working with ASR-CAI tools or if their use may lead to the expected quality increase in the interpretation of specialized texts. References Berber-Irabien, Diana-Cristina. 2010. Information and Communication Technologies in Conference Interpreting. Lambert Academic Publishing. Biagini, Giulio. 2016. Printed glossary and electronic glossary in simultaneous interpretation: a comparative study. Master’s thesis, Università degli studi di Trieste. Corpas Pastor, Gloria and Lily May Fern. 2016. A survey of interpreters’ needs and their practices related to language technology. Technical report, Universidad de Málaga. Fantinuoli, Claudio. 2016. InterpretBank. Redefining computer-assisted interpreting tools. In Proceedings of the Translating and the Computer 38 Conference. Editions Tradulex, London, pages 42–52. Fantinuoli, Claudio. 2017a. Computer-assisted interpreting: challenges and future perspectives. In Isabel Durán Muñoz and Gloria Corpas Pastor, editors, Trends in e-tools and resources for translators and interpreters, Brill, Leiden. Fantinuoli, Claudio. 2017b. Computer-assisted preparation in conference interpreting. Translation & Interpreting 9(2). Gile, Daniel. 2009. Basic Concepts and Models for Interpreter and Translator Training: Revised edition. John Benjamins Publishing Company, Amsterdam, 2nd edition. Hansen-Schirra, Silvia. 2012. Nutzbarkeit von Sprachtechnologien für die Translation. trans-kom 5(2):211–226. 33 Jurafsky, Dan and James H. Martin. 2009. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall series in artificial intelligence. Pearson Prentice Hall, Upper Saddle River, N.J, 2nd edition. Kalina, Sylvia. 2005. Quality Assurance for Interpreting Processes. Meta: Journal des traducteurs 50(2):768. Orlando, Marc. 2014. A study on the amenability of digital pen technology in a hybrid mode of interpreting: Consec-simul with notes. Translation & Interpreting 6(2). Prandi, Bianca. 2015. The use of CAI tools in interpreters’ training: a pilot study. In Proceedings of the 37 conference Translating and the Computer. Editions Tradulex, London, pages 48–57. Prandi, Bianca. 2017. Designing a multimethod study on the use of cai tools during simultaneous interpreting. In Translating and the Computer 39. London. Pöchhacker, Franz. 2016. Introducing Interpreting Studies. Routledge, 2nd edition. Setton, Robin and Andrew Dawrant. 2016. Conference interpreting: a complete course. Number volume 120 in Benjamins translation library (BTL). John Benjamins Publishing Company, Amsterdam & Philadelphia. Tripepi Winteringham, Sarah. 2010. The usefulness of ICTs in interpreting practice. The Interpreters’ Newsletter 15:87–99. Will, Martin. 2015. Zur Eignung simultanfähiger Terminologiesysteme für das Konferenzdolmetschen. trans-kom 8(1):179–201. Xu, Ran. 2015. Terminology Preparation for Simultaneous Interpreters. PhD thesis, University of Leeds. Yu, Dong and Li Deng. 2015. Automatic speech recognition: a deep learning approach. Springer, London. 34 Building a Custom Machine Translation Engine as part of a Postgraduate University Course: a Case Study Michael Farrell IULM University Milan, Italy Abstract In 2015, I was asked to design a postgraduate course on machine translation (MT) and post-editing. Following a preliminary theoretical part, the module concentrated on the building and practical use of custom machine translation (CMT) engines. This was a particularly ambitious proposition since it was not certain that students with undergraduate degrees in languages, translation and interpreting, without particular knowledge of computer science or computational linguistics, would succeed in assembling the necessary corpora and building a CMT engine. This paper looks at how the task was successfully achieved using KantanMT to build the CMT engines and Wordfast Anywhere to convert and align the training data. The course was clearly a success since all students were able to train a working CMT engine and assess its output. The majority agreed their raw CMT engine output was better than Google Translate’s for the kinds of text it was trained for, and better than the raw output (pre-translation) from a translation memory tool. There was some initial scepticism among the students regarding the effective usefulness of MT, but the mood clearly changed at the end of the course with virtually all students agreeing that post-edited MT has a legitimate role to play. 1 Introduction After teaching an undergraduate course on Computer Tools for Translators and Interpreters for six years at the International University of Languages and Media (IULM), Milan, Italy, I was asked to design a postgraduate course module specifically aimed at teaching the use of machine translation and post-editing as part of a Master’s Degree in Specialist Translation and Conference Interpreting1. The course module began with a brief summary of the history of machine translation from its early stages, full of optimism, to the slow-down in the 1960s (ALPAC report), and on to today's more realistic and pragmatic application. It then went on to a simplified discussion of the theoretical aspects of rule-based and statistical machine translation systems, and a brief outline of neural machine translation. It also laid out the concept and goals of post-editing, illustrated the benefits of pre-editing and controlled language authoring, and explained some machine translation quality assessment techniques. In addition there were practical exercises on pre-editing, controlled language authoring and post-editing. Once this preliminary part was out of the way, after the first semester, the course moved on to the practical use of custom machine translation (CMT) engines. This was a particularly ambitious and challenging proposition since it was not at all certain that a group of students with undergraduate degrees in languages, translation and interpreting, without particular knowledge of computer science or computational linguistics, would succeed in putting together the necessary corpora and building a CMT engine. Another aim was to keep the cost to the university and to the students as low as possible. 1 Machine Translation and Post Editing, Course Module Syllabus, International University of Languages and Media (IULM), Milan, Italy: http://bit.ly/2wxitJZ 35 Proceedings of the 39th Conference Translating and the Computer, pages 35–39, London, UK, November 16-17, 2017. 2017c AsLing 2 Methods After comparing various commercial programs and platforms for the building of custom machine translation engines (notably including Slate Desktop 2 and Lilt 3 ), I opted for KantanMT4. The deciding factors were: • KantanMT is cloud-based, and can therefore be used by students at home; • KantanMT provides Library data, in case the bilingual corpora produced by the students do not reach the critical mass required to get meaningful output from the engine built: • KantanMT’s generous Academic Partner Programme. The Academic Partner Programme provides access and use of the platform free of charge for students and lecturers for the duration of the course module, and one-to-one online training for lecturers to help create lesson plans. Besides allowing students to build custom machine translation engines, the platform also gives them a feel for the automatically generated evaluation metrics (Bilingual Evaluation Understudy [BLEU], F-Measure and Translation Edit Rate [TER]). In order to create the corpora needed to train our Italian to English CMT engines, we contacted several companies, all of which freely publish user manuals for their products on the Internet in several languages. We asked permission to use their data for teaching purposes, and two firms replied: Philips5 and Smeg (Smalterie Metallurgiche Emiliane Guastalla)6. Both companies market products in fairly limited domains, thus making their manuals ideal for building bilingual corpora to train CMT engines. One of our aims was precisely to restrict the domain sufficiently to reduce post-editing requirements to a bare minimum. At the time of the course KantanMT could only be used to build statistical machine translation (SMT) engines. The neural machine translation version was not available to Academic Partners. To build an SMT engine, you ideally need a monolingual corpus (language model) and a bilingual corpus (translation model). However we were only able to put together bilingual corpora since we did not know for certain which the original source language was. Using translated material to build the language model would probably lead to a defective model since it is very often possible to identify the source language in medium- length translations 7 . Moreover, there was virtually always an English language version of every manual in Italian, so it made more sense to use all the material available to maximize the amount of bilingual training data. In any case, a slight stink of translation, due to the lack of a language model, is not particularly important for the type of material we were training our CMT engines to translate (user manuals for household appliances), so the absence of this model was unlikely to be a big issue. Unfortunately the manuals we downloaded from the Internet were in PDF format, and unaligned. To convert and align the files, I prescribed the use of the on-line translation environment tool Wordfast Anywhere8. This tool was chosen for three main reasons: • it is cloud-based, so students can use it from home; • it is free to use; 2 Slate Desktop, https://slate.rocks 3 Lilt, https://lilt.com 4 KantanMT, https://www.kantanmt.com 5 Philips, https://www.philips.com 6 Smeg, http://www.smeg.com 7 Hans van Halteren, 2008. Source Language Markers in EUROPARL Translations. 8 Wordfast Anywhere, https://www.freetm.com 36 • in tests carried out before the start of the course, I was impressed by the high quality both of the PDF conversion feature and Wordfast Autoaligner (the alignment function). Wordfast Anywhere converts PDF files to Microsoft Word doc format, and the Autoaligner only worked if one of the two languages being aligned was English. This restriction, which was not an issue in our case, has since been lifted. The 42 students in the class were first divided into two groups (Smeg and Philips) to download as many manuals as they could from the Internet. They then worked together in pairs within their groups to convert the files and carry out the alignment. One student in each pair dealt with the source language files (Italian) and the other with the target language ones (English). It was decided to work this way because Wordfast Anywhere assumes the PDF file is in the source language of the active memory. Wordfast Anywhere creates an empty memory file when a project is set up since it expects to be used as a translation memory tool, and not simply as a PDF converter. Obviously the language settings can be reversed, but it is less time consuming to leave things as they are and convert PDFs written in one language only. Each member of the pair then gave half their files to the other and began the alignment process. Wordfast Autoaligner produces three types of aligned file: Translation Memory eXchange (TMX), plain text (TXT) and Microsoft Excel (XLS). All students chose to use TMX format. The students pooled all the data they aligned with the other group members, although they did not necessarily all use the same data to create their corpora. After the alignment was complete, the students formed smaller groups to build CMT engines with KantanMT. Several students chose to work alone. To assess their engines, besides considering the automatic metrics generated by KantanMT (BLEU, F-Measure and TER), the students carried out a series of comparisons. They took a manual, for which there was an existing translation which had not been used as training data for the CMT engine, and used it as input in three different tools: • Their KantanMT CMT engine. • Google Translate9. • A classic translation environment tool set up using the CMT engine training data corpus as a translation memory and only using the translation memory system features of the tool. The raw output from each was then compared with the official existing version published by Philips or Smeg on their websites. Everyone chose to use SDL Trados Studio10 as translation environment tool, except one student who used OmegaT (freeware)11. To put all the aligned files together into one TMX memory file for the translation memory system, the students used Heartsome TMX Editor (freeware)12. Moreover the students compared the time required to produce an unaided human translation of part of the same manual with how long it took to post-edit the raw output from their CMT engine. In order not to remain influenced by one task when performing the other, the student who did the unaided human translation was always different from the student who post-edited the raw output. They also assessed the degree of similarity of these two versions to the official translation. 9 Google Translate, https://translate.google.com 10 SDL Trados Studio, http://www.sdl.com 11 OmegaT, http://omegat.org 12 Heartsome TMX Editor, https://github.com/heartsome/tmxeditor8 37 3 Results All the students were able to build at least one working CMT engine (a total of 26 engines). The BLEU scores for the students’ engines reported by KantanMT ranged from 32% to 79% (mean: 64%). F-measure went from 52% to 85% (mean: 75%) and TER from 14% to 66% (mean: 34%). In most cases these are truly remarkable results also considering that no one had to use KantanMT’s Library data. The majority of students agreed, on the basis of their human quality assessments, that their raw KantanMT CMT engine output was better than Google Translate’s raw output for the kinds of text it was trained for (35/36 = 97%) and better than the raw output (pre-translation) obtained using the TM features of a translation environment tool (22/32 = 69%). In reality, in some cases, there was not much difference in quality between the raw translation produced by the translation environment tool and the raw CMT engine output, but several students observed that it would be quicker in practice to post- edit the CMT engine output since there is an editable proposal for every segment; translation memory systems leave the segment blank when no useful match is found. Unfortunately none of the students actually ran tests to verify this. A couple of students made the interesting observation that, for a few segments, their KantanMT CMT engine had produced a translation which was better than the official version stating that it sounded better. Almost everyone reported that it took less time to post-edit their raw CMT engine output than it did to produce an unaided human translation (27/28 = 96%). Only one person said the post-editing had taken slightly longer (1/28 = 4%). More than one student preferred their post- edited versions, defining the style as more manual-like. This of course could be due to the fact that students are not professional translators specialized in translating manuals. Another important goal was to keep the cost to the university and to the students as low as possible. This was successfully achieved, by exploiting the Kantan Academic Partner Programme, freeware tools (Wordfast Anywhere, Google Translate, Heartsome TMX Editor and OmegaT), and existing software licences (SDL Trados Studio). 4 Discussion Although I clearly laid out the aims, chose the tools, and suggested possible evaluation methods, I gave the students complete freedom to organize themselves, choose the files to include in their corpora, and establish their own human assessment criteria; some worked in teams, some in pairs and many alone, which explains why 42 students produced only 26 CMT engines. In addition, some of the students only reported part of the data according to what they found most interesting. All this unfortunately means that it is absolutely impossible to analyse their human evaluation data to produce aggregate scores. I have no intention of remedying this in future editions of the course module, since that would mean imposing rigid scoring models and the choice of material for the corpora. Given the degree of cynicism some of the students showed towards MT at the outset, such impositions risk giving grounds to accusations of result rigging. Seven students also managed to find time to experiment with Lilt (fourteen-day free trial), and five of them (5/7 = 71%) were very enthusiastic about it. Lilt also allows users to build CMT engines, and has the look and feel of a highly simplified on-line translation environment tool. I did not choose Lilt as primary tool for the course mainly because it does not generate any standard evaluation metrics (BLEU, TER, etc.). Since Lilt's MT system is adaptive and interactive, the output changes while the translator works in the application. For this reason, existing static evaluation metrics are not suited to it. In future editions of the course module, I will encourage more students to try Lilt out. 38 5 Conclusion The proposition was quite evidently a success since all the students were able to build at least one working CMT engine, try it out, and assess its output. At the beginning of the course, there was a certain amount of scepticism among the students regarding the effective usefulness of machine translation and post-editing. Although I did not aim to evangelize, there was a clear mood change in the end with all students except one stating – some perhaps still a little begrudgingly – that post-edited machine translation has a legitimate role to play in the translation industry (41/42 = 98%). The dissenter wrote: “Of all the systems used during the course, I remain of the opinion that the best translation is manual, albeit more laborious and slower, because it requires less [pre-editing and post-editing] than any other translation system.” Acknowledgements All trademarks and trade names are the property of their respective owners. References Hans van Halteren, 2008. Source Language Markers in EUROPARL Translations. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 937–944. 39 A Comparative User Evaluation of Tablets and Tools for Consecutive Interpreters Joshua Goldsmith University of Geneva [email protected] Abstract Since the release of the first modern tablets, practicing interpreters have begun to consider how tablets could be used to support their interpreting practice. The first phase of a recent mixed methods study assessed the pros and cons of different tablets, note-taking applications and styluses, finding that professional interpreters were effectively using tablets for consecutive interpreting in a wide range of settings (Goldsmith & Holley 2015). This paper presents the second phase of this pilot study, building on previous conclusions and a survey of practicing interpreters to derive an instrument for carrying out a comparative user evaluation of these tablet interpreting tools. In light of survey results, user preferences for tablet, application and stylus features were ranked. Results from the comparative user evaluation were also utilized to compare and contrast note-taking applications currently used by tablet interpreters. The conclusions of the user evaluation and comparison of note-taking applications are expected to serve as a useful guide to allow interpreters to pick the tablets, applications and styluses which best meet their needs for consecutive interpreting. Keywords: tablet interpreting, consecutive interpreting, tablet, note-taking, comparative user evaluation 1 Introduction As tablets have become more prevalent, pioneering interpreters have begun to use them in their daily work, even asking if they might constitute “the ideal boothmate” (Hof 2012). Prac- ticing interpreters have examined the pros and cons of using tablets for interpreting and dis- cussed resources and applications that could be useful in professional settings (Drechsel 2013a, 2013b, 2017; Drechsel & Behl 2016; Goldsmith & Drechsel 2015a, 2015b, 2016; Scott 2012). Interpreters have also described their experiences testing and using tablets for note-taking (Behl 2013a, 2013b, 2015; Rosado 2013); a few of them have provided more con- crete recommendations for various applications, styluses and tablets (Goldsmith & Drechsel 2016; Rosado 2013). The first mention in the literature of using tablets for note-taking appears to date to 2014. In an article on “technology-assisted interpreting,” Costa, Corpas Pastor & Durán Muñes report- ed on glossary-building and knowledge management tools, the use of voice recorders for in- terpreter training, and note-taking applications. Although the article did not present empirical data, it noted that “more and more interpreters are turning to mobile devices to take notes” (Costa, Corpas Pastor & Durán Muñoz 2014b: 31) and suggested a few applications that the authors believed might prove useful for note-taking. Since then, several empirical studies on tablet interpreting have considered how tablets are used by practicing interpreters for simultaneous interpreting and preparation (Paone 2016) and consecutive interpreting (Goldsmith & Holley 2015). In their study, Goldsmith & Holley found that the functionalities offered by using a tablet for consecutive interpreting might out- strip the functionalities offered by pen and paper. Furthermore, respondents reported that tab- let interpreting equals or surpasses pen and paper interpreting in many contexts and settings, and that tablet interpreters appreciate the additional features offered by using a tablet for in- terpreting. Although tablet interpreting is generally well received in most settings, some con- cerns exist, and a series of factors occasionally lead practitioners to select pen and paper over 40 Proceedings of the 39th Conference Translating and the Computer, pages 40–50, London, UK, November 16-17, 2017. 2017c AsLing tablets. In the conclusions of their study, Goldsmith & Holley presented lists of the features that the pioneering tablet interpreters they had interviewed found relevant in their practice. An article by Goldsmith (forthcoming) summarizes these results, while a chapter by Drechsel & Goldsmith (forthcoming) considers issues such as cognitive load in tablet interpreting and the pros and cons of using tablets for conference preparation and simultaneous interpreting, argu- ing that tablets should be introduced into interpreter training programs. A recent experimental study (Oceguera López, 2017) analyzed the effect of training on the acquisition of tablet interpreting skills. Over the course of four 40-minute sessions, eight un- dergraduate interpreting students were trained to use tablets for consecutive interpreting. Dur- ing each session, participants took consecutive interpreting notes on a tablet and recorded their renditions of these speeches. Think aloud protocols revealed challenges such as the need to familiarize oneself with the note-taking software and the experience of rendering a speech from digital notes. A questionnaire identified similar benefits to those presented in Goldsmith & Holley (2015), although participants had mixed feelings about whether tablets outstripped pen and paper for note-taking, possibly due to their limited tablet interpreting experience. In the most novel part of the study, four participants transcribed their recordings and identified omissions, errors and incorrect use of vocabulary. The results indicated that tablet interpreting performance improved with training. Related research has also investigated “simultaneous consecutive interpreting,” which en- tails recording a speech that would normally be rendered in consecutive mode, playing it back on headphones and rendering it in simultaneous mode; playback can be slowed down if nec- essary, e.g. for particularly difficult passages. Scholars have found that this approach resulted in better interpreting performance, which was seen in “more fluid delivery, closer source- target correspondence” (Hamidi & Pöchhacker 2007:14), greater accuracy, fewer “disfluen- cies” (hesitation phenomena), greater interpreter confidence, and a more complete rendition (Orlando 2014). Other studies have found that digital pens could be used for training budding interpreters: playing back recordings of the note-taking process helped promote metacogni- tion, allowing students to identify gaps in their technique and design tailored strategies to ad- dress them (Orlando 2015a, 2015b; see also Orlando 2016). Recent technical developments also allow interpreters to use a tablet and stylus for simultaneous consecutive interpreting (El- Metwally 2017). 2 Methodology Goldsmith & Holley’s (2015) pilot study represented the first stage in a multiphase mixed methods research project aiming to (1) map the field of those who use tablets for consecutive interpreting and (2) develop an instrument to evaluate the various tools and technology available in this field. Through six in-depth interviews with professional interpreters working in a wide variety of settings, they carried out the exploratory sequential design phase of this project, collecting and analyzing qualitative and quantitative data with a view to later developing an instrument (Creswell & Plano Clark, 2011; Creswell, Plano Clark, Gutmann, & Hanson 2003). After deriving a set of inductive codes and analyzing the in-depth interviews using NVivoTM, Goldsmith & Holley (2015) presented a set of features to consider when assessing new and existing tablets, applications and styluses to determine their potential effectiveness. Based on the conclusions of the first stage of this project, the study presented in this article set out to conduct a user evaluation by answering two questions: • Which features of tablets, note-taking applications, and styluses are most important for tablet interpreters working in the consecutive mode? • Which tools on the market offer the greatest number of these features? 41 Although user evaluations have yet to be conducted on tablet interpreting, several studies have assessed various terminology management programs for interpreters. For example, based on a literature review and a description of eight terminology management tools on the market at the time, Costa, Corpas Pastor & Duran Muñoz (2014a) aimed to establish a system for evaluating key features to determine the extent to which terminology tools met interpreters’ needs. They awarded up to 10 points for five “fundamental” features and up to five points for 10 “secondary” features. For each feature, they established a system for awarding points; they then evaluated the tools they had selected based on the criteria they had identified and deter- mined which tools best met the perceived needs of interpreters. Will (2015:187) analyzed a more limited set of four “generally available and utilized” terminology management tools based on three key criteria – view, data processing and operation and use – awarding 0 to 5 points for each of these criteria using the following point system: “not implemented or recog- nizable” (0), “insufficient” (1), “sufficient” (2), “satisfactory” (3), “good” (4), or “very good” (5). These approaches to conducting a user evaluation present several methodological short- comings. For example, the researchers selected features to assess based on their perceptions of which features were most important. In the case of Costa, Corpas Pastor & Duran Muñoz (2014a), the authors also decided that certain features were more relevant than others, award- ing twice as many points to these features. Furthermore, the researchers selected the tools that they decided to evaluate based on their perception of which tools were most relevant. Finally, they used point scales to determine the extent to which a tool offered a given feature, yet nei- ther approach seems to be based on scientific criteria. Costa, Corpas Pastor & Duran Muñoz (2014a) used variable criteria for awarding points – some features were awarded points on an “all or nothing” basis depending on whether or not a feature was present, while others had variable point values (e.g. 0, 3, 7, or 10 vs. 4, 7, or 10 vs. 5 or 10) that were assigned for seemingly unclear reasons. In the case of Will (2015), the difference between, i.e. “sufficient” and “satisfactory” seems to be unclear and subjective. This study adopted a different approach to identifying which features to evaluate, determin- ing the relevance of features, and awarding points based on whether these features were avail- able in a given tool. The list of features was derived from the series of interviews with practi- tioners reported in Goldsmith & Holley (2015). Subsequently, practicing tablet interpreters were asked to rank the importance of each of these features by means of a questionnaire; re- sponses were averaged to derive a weighting coefficient for each feature, allowing features to be ranked based on their importance. The questionnaire distributed to practicing tablet inter- preters was used to select the tools that were evaluated for this study – this was considered to be a reliable indicator of the leading tools on the market. Finally, all features were assessed on a yes/no basis depending on whether or not a given application offered a given feature; a final score for each tool was calculated by multiplying the weighting coefficient by all available features and averaging the total values. Data was collected using a questionnaire. Since research has shown that reliability and va- lidity can be maximized by offering between four and seven options on rating scales (Lozano, García-Cueto & Muñiz 2008), that participants prefer a larger number of options (Muñiz, Cueto & Lozano, 2005), and that 6-response categories yield more consistent effects than 5- response categories (Moors 2007), six options were offered. Respondents rated each feature by answering the question “On a scale of 0 to 5, how important are each of these features for you?”, where 0 represented “not important” and 5 represented “very important.” A numerical scale with interval data was also expected to avoid some of the problems inherent in Likert scales, where the distance between ordinal responses like “always,” “often,” and “sometimes” is not always equal (Sullivan & Artino, 2013). 42 All participants completed the standard University of Geneva – Faculty of Translation and Interpreting informed consent form. All responses were anonymous and confidential, and were collected using an online survey tool. The questionnaire also gathered data on the tab- lets, operating systems, applications, and styluses used by respondents as well as statistical information. The survey was circulated over social media and email, including via the “Interpreter Tech- nology group” on Facebook, which has over 500 members. Using a variant on snowball sam- pling, participants were encouraged to forward the survey to any other tablet interpreters they knew. 3 Population Eleven (11) respondents completed the survey. In the additional information category, one respondent indicated that s/he worked as a full-time translator rather than as an interpreter. Consequently, this response was excluded from results. The ten respondents included in the population ranged in age from 27 to 57 (x̅ = 42). Re- spondents’ professional domiciles were located in North America (25%) and Europe (75%).1 Eight of the participants (80%) were a member of at least one professional translation and/or interpreting association. All but one participant (90%) worked with at least two active lan- guages; most had several additional passive languages (x̅ = 2.1). Respondents worked in a wide range of interpreting contexts, including conference interpreting (70%), diplomatic in- terpreting (50%), community interpreting (40%), legal / court interpreting (40%), medical in- terpreting (40%), business interpreting (30%), and media interpreting (20%).2 Respondents had between 3 and 32 years of professional experience (x̅ = 13.7) and between 2 and 7 years of tablet interpreting experience in the consecutive mode (x̅ = 4.6). Respondents worked fre- quently in consecutive mode (x̅ = 9 days / month), and had used tablets for over 1300 total assignments (x̅ = 165.6). 4 Results Although every effort was made to promote the survey and reach potential tablet interpreters, only 11 individuals responded; one participant was not an interpreter, and this set of responses was discarded. Given the small number of respondents, the results are not expected to be statistically significant. In light of this, results should be construed as indicative of current trends; further research will be needed to determine the size of the entire population of tablet interpreters and assess whether the results presented herein can be generalized to the population as a whole. Surprisingly, 90% of respondents used an iPad; only 1 participant (10%) used the Microsoft Surface. Five respondents (50%) used an iPad Pro, although sizes varied – one respondent used the 9.7” iPad Pro (10%), two used the 10.5” iPad Pro (20%), one used the 12.9” iPad Pro (10%), and one respondent did not indicate iPad size. Non-iPad Pro users utilized several dif- ferent types of iPads, including the iPad Air (10%), iPad mini (10%), and iPad 2 (10%). Re- sults therefore indicate that tablet interpreters used tablets offering a variety of form factors, from the 7.9” iPad mini to the 12.3” Surface Pro or 12.9” iPad Pro. Six of the ten respondents (60%) used first-party styluses – either the Apple Pencil (50%) or Surface Pen (10%). Respondents – especially those with older iPad models – also used a variety of third-party styluses, including active styluses (53 Pencil and Apex) and passive sty- 1 Several respondents left the questions about professional domicile, average days of consecutive assignments per month, and total number of consecutive assignments blank. These responses have been excluded from the statistics presented herein. 2 These categories were derived from self-reported data in Goldsmith & Holley (2015). 43 luses (Wacom Bamboo and Maglus). Interestingly, in response to the question about sty- lus(es) used, one participant wrote “none,” perhaps indicating that a finger was used for note- taking. Nearly every respondent who indicated their operating system used the most up-to-date op- erating system available. This is particularly relevant for iPad users utilizing iOS 11, which was released just three weeks before the survey was conducted, potentially indicating that tab- let interpreters tend to be early adopters of technology. In terms of note-taking applications, the Surface Pro user utilized Nebo, while iPad users worked with Notability (60%), Noteshelf (30%), Penultimate (30%), Bamboo Paper (20%), iOS Notes (20%), AudioNote (10%), GoodNotes (10%), and Whink (10%). Several respond- ents indicated that they use a variety of note-taking applications. Additional applications used for support while taking notes included document annotation applications such as Readdle Documents (30%) and Adobe Reader (10%); dictionary applica- tions such as Linguee (30%), WordReference (10%) and various unnamed dictionary applica- tions; glossary applications such as Interplex (10%), BoothMate for Interpreters’ Help (10%), an unnamed “glossary application,” Proz.com glossaries accessed through a web browser (10%), and iBooks for viewing one’s own glossaries (10%); word processing and office suites such as Mobisystems (10%); and other applications, like a web browser (20%) or Productivity (10%).3 In short, it appears that document annotation, dictionary, and glossary applications are most frequently used alongside note-taking applications for multi-tasking purposes. 4.1 Ranking of features Based on answers from respondents, the arithmetic mean was calculated for each feature. These results were then ranked from highest to lowest to determine the most and least relevant features for each of the three categories: tablets, applications and styluses. Given the limited number of respondents, other more advanced statistical tests were not applied, as it was not expected that they would yield statistically significant results. As such, the results below should be taken as preliminary, reflecting the nature of this pilot study. Table 1 presents a ranking of features that interpreters seek in tablets. Unsurprisingly, in- terpreters seek tablets that run smoothly and quickly, are portable, reliable, durable, and un- likely to crash, and offer good battery life. As they use their tablets for consecutive interpret- ing, it comes as no surprise that low latency, a smooth, quick writing experience, and a clear, easily visible screen are also important. When it comes to internet access, interpreters prize wireless access over 3G/4G functions, perhaps because they tend to work in locations where Wi-Fi is available or tether their tablets to their smartphones. Although participants used tab- lets of various sizes, they nevertheless found that the size of their tablet was important. Of slightly less importance were filing and organizing capabilities, multitasking and split-screen functionalities, appearance, boot time, and built-in wrist protection. Interestingly, the availa- bility of a USB port, ability to disable multitasking gestures, and cost were seen as being among the least important features. Table 2 presents a ranking of features that interpreters seek in note-taking applications. When it comes to note-taking applications, tablet interpreters appreciate reliable, stable appli- cations that are easy to use. The writing experience is key – applications should allow for fast, smooth writing, offer palm rejection and good handwriting recognition, and result in notes that are clear and easy to read. Respondents seemed to prefer vertical scrolling over horizontal page turns; in both cases, being able to move from one section to another within a set of notes was crucial. Changing between ink colors or stroke thickness, backing up notes to the cloud, organizing and filing notebooks, or zooming in was somewhat less important. Custom paper, 3 One respondent listed an application named “Interpret,” which to the author’s knowledge, does not exist. 44 Tablet features Rating Runs smoothly and quickly 5 Portable 4.9 Reliable 4.9 Battery life 4.7 Screen is clear and easily visible 4.7 Stable build / not likely to crash 4.7 Sufficient memory 4.7 Writing speed 4.7 Durability 4.6 Weight 4.5 Internet access (Wi-Fi) 4.4 Reference materials/documents easily accessible 4.4 Size 4.4 Writes smoothly 4.4 Filing and organizing capabilities 4.1 Limited number of cables 4 Split screen functionality / multitasking 4 Comfort 3.9 Professional appearance 3.9 Boot time 3.8 Built-in wrist protection 3.7 Quick learning curve 3.7 Internet access (3G / 4G) 3.7 USB port available 3.1 Ability to disable multitasking gestures 3 Cost 2.8 Table 1. Ranking of tablet features. converting handwriting to text, cut and paste, bookmarks, and embedding files into notes were among the least relevant features. Yet again, cost came in last in the ranking. Table 3 offers an overview of features interpreters seek in styluses. Tablet interpreters ap- preciate styluses that are comfortable, durable, and pair with tablets and applications. The sty- lus should write quickly and quietly, glide smoothly on the tablet, and feel natural. Recharge- able styluses that charge via USB or similar are preferred over styluses with replaceable bat- teries; fine-tipped nibs are preferred over softer, rubbery nibs. Features such as appearance and heft are slightly less important, while buttons offering additional functionalities, a built-in pen clip, and the ability to lodge the stylus inside the tablet are seen as even less important. For the third time, cost was among the least important features. Overall, interpreters working in the consecutive mode seek tablets, note-taking applications and styluses that are reliable, durable, and comfortable to use, offering a smooth writing expe- rience and resulting in clear, easy-to-read notes. Other features – such as the ability to organ- ize and file notes – and additional options – such as a variety of ink colors and thicknesses and professional appearance – are slightly less important. USB ports, buttons with added functions, a pen clip, the ability to lodge the stylus inside the tablet, and features such as cut and paste, bookmarks, and embedding seem to be among the least useful features. Finally, cost was consistently rated among the least important features, perhaps indicating that tablet interpreters are willing to invest more in equipment that allows then to do professional work. 45 Note-taking application features Rating Clear and easy to read 4.9 Pairs with stylus 4.8 Reliability 4.7 Comfortable to use 4.6 Smooth writing 4.6 Stable build / limited crashing 4.5 Writing speed 4.5 Palm rejection / wrist protection 4.3 Erasing 4.2 Speed of page turns 4.1 Vertical scrolling 4 Quality of handwriting recognition 3.9 Split screen functionality 3.9 Connectivity with other applications 3.7 Quickly change color or thickness of ink 3.7 Experience mirrors writing on paper 3.6 Visualize multiple pages simultaneously 3.6 In-app access to dictionaries / reference materials / internet 3.4 Variable stroke thickness 3.4 Cloud backup available 3.3 Filing and organization of “notebooks” 3.3 Variety of ink colors 3.3 Zoom 3.3 Horizontal page turns 3.2 Custom paper available 2.8 Converts handwriting to text 2.7 Cut and paste 2.6 Ability to add bookmarks for in-app navigation 2.5 Embed other files into “notebooks” 2.5 Cost 2.4 Table 2. Ranking of note-taking application features. 4.2 User evaluation of note-taking applications for consecutive interpreting As all but one respondent was an iPad user, the user evaluation was limited to note-taking applications utilized for consecutive interpreting on the iPad. All applications mentioned in the survey were assessed to determine which note-taking features they offered. Testing was conducted on a 2016 iPad Pro 9.5” (Model number MLMV2LL/A) running iOS 11 and using an Apple Pencil (Model number MK0C2AM/A). Table 4 presents a user evaluation of the eight note-taking applications that respondents re- ported utilizing for consecutive interpreting.4 Four applications – Noteshelf, GoodNotes, No- tability and Penultimate all scored similarly, offering approximately 85% of the most com- monly appreciated features. Whink, iOS Notes, and Bamboo Paper offered fewer features, while Audio Note clearly lagged behind its competitors. However, only Audio Note and No- tability offer recording that is synched with notes – a feature which is necessary for simulta- neous consecutive interpreting, but which did not emerged during the interviewers conducted during the first round of this study in 2015 (see Goldsmith & Holley, 2015). 4 The following versions of each application were tested: AudioNote 2 (2.0.1), Bamboo Paper – Notebook (2.1.5), GoodNotes 4 (4.12.6), iOS Notes (iOS 11.0.2), Notability (7.0.2), Noteshelf 2 (1.3), Penultimate (6.2.2), Whink (5.2). 46 Stylus features Rating Comfortable to hold 4.9 Durability 4.9 Integration / pairing with tablet 4.9 Writing feel (natural) 4.9 Writing volume (silent) 4.8 Compatible with all apps 4.7 Pairs with apps 4.7 Charges via USB (or similar) 4.6 Fine-tipped nib 4.6 Glides well 4.6 Natural hand position 4.5 Writing speed 4.5 Size similar to a pen 4.4 Replacement nibs available 4.3 Professional look 3.8 Heft 3.6 Replaceable batteries 3.3 Button with added functions 3.2 Softer/rubbery nib 3 “Spring” on screen 3 Cost 2.9 Built-in clip (pen clip) 2.7 Lodges inside tablet 2.4 Table 3: Ranking of stylus features Of course, a yes/no scale calls for clarity of definitions, and may mask minor differences between applications. For the sake of this evaluation, for example, the “variable stroke thick- ness” category was defined as a minimum of five stroke thicknesses; several applications with only three stroke thicknesses were not awarded points for this category, and other applications offered 10, 16, or even unlimited customizable thicknesses. Similar variety was detected in the “variety of ink colors” category, the number of active styluses an application paired with, number of other applications an application could connect to, number of levels for filing and organizing notebooks, split screen functionalities, and number of different types of files that could be embedded into notebooks. This level of detail was lost by adopting a yes/no filter for evaluation applications. Nevertheless, this type of assessment paved the way for conducting a scientifically-motivated user evaluation which bore in mind the preferences of practicing tab- let interpreters. 5 Conclusions This study represents the first comparative user evaluation of tools used by tablet interpreters working in the consecutive mode. Given the limited size of the population, the results of the pilot study are not statistically significant, and therefore should not be generalized to the larger population. Future research would be needed to determine the size of the population; various filters – including member- ship in a professional association or interpreting and tablet interpreting experience – could also help to clearly define the population and determine how many members of the larger population of interpreters are also using a tablet for note-taking in consecutive mode. 47 Audio Bamboo Good- iOS Not- Note- Pen- NB: Shaded boxes indicate availability of feature Coefficient Note Paper Notes Notes ability shelf ultimate Whink Clear and easy to read 0.98 x x x x x x x x Pairs with stylus 0.96 x x x x x x x x Reliability 0.94 x x x x x x x x Comfortable to use 0.92 x x x x x x x x Smooth writing 0.92 x x x x x x x x Stable build / limited crashing 0.9 x x x x x x x x Writing speed 0.9 x x x x x x x x Palm rejection / wrist protection 0.86 x x x x x x x x Erasing 0.84 x x x x x x x x Speed of page turns 0.82 x x x Vertical scrolling 0.8 x x x x x Quality of handwriting recognition 0.78 x x x x x x Split screen functionality 0.78 x x x x x x Connectivity with other applications 0.74 x x x x x x x x Quickly change color or thickness of ink 0.74 x x x x x x Experience mirrors writing on paper 0.72 x x x x x x x x Visualize multiple pages simultaneously 0.72 x x x x x x In-app access to dictionaries, etc. 0.68 Variable stroke thickness 0.68 x x x x x Cloud backup available 0.66 x x x x x x x x Filing and organization of “notebooks” 0.66 x x x x Variety of ink colors 0.66 x x x x x x x Zoom 0.66 x x x x x Horizontal page turns 0.64 x x x x Custom paper available 0.56 x x x Converts handwriting to text 0.54 x x x Cut and paste 0.52 x x x x x Ability to add bookmarks 0.5 x x x x Embed other files into “notebooks” 0.5 x x x x x x x Free / Free, in-app $6.99/yr. purchases Cost 0.48 (pro) available $7.99 Free $9.99 $4.99 Free $2.99 AVERAGE 57.83% 72.38% 85.63% 75.16% 84.99% 86.84% 84.80% 78.96% Table 4: User evaluation of note-taking applications for tablet interpreting 48 Despite these limitations, the study presents the first ranking of features that are important for tablet interpreters working in the consecutive mode. Overall results indicate that interpret- ers seek tablets, note-taking applications and styluses that are reliable, durable, and comforta- ble to use, offering a smooth writing experience and resulting in clear, easy-to-read notes. Other features are somewhat less important; cost was consistently among the least important features, indicating that interpreters may be willing to invest in the tools they need to do pro- fessional work. Results also indicate that tablet interpreters working in the consecutive mode most fre- quently use the iPad Pro and utilize tablets offering a variety of form factors; first party sty- luses – especially the Apple Pencil – are their styluses of choice. Tablet interpreters utilize a variety of applications for note-taking and to support their consecutive interpreting practice, although Notability was far and away the most popular note-taking application used by re- spondents in this study. Document annotation, dictionary, and glossary applications were most frequently used alongside note-taking applications for multi-tasking purposes. Four note-taking applications – GoodNotes, Notability, Noteshelf, and Penultimate – all scored similarly, offering the greatest number of features appreciated by tablet interpreters working in the consecutive mode. This pilot study presents a novel methodology for conducting a user evaluation of interpret- ing technology. It entails conducting broad, interview-based research to survey the field and determine relevant features, running a survey to test these features among practitioners, deriv- ing ranking and weighting from their answers, and evaluating the tools they report using. Despite the limitations inherent in any pilot study, the conclusions of the user evaluation and comparison of note-taking applications are expected to serve as a useful guide to allow interpreters to pick the tablets, applications and styluses which best meet their needs for con- secutive interpreting. It is expected that study results could give rise to a guide for interpreters interested in learning how to use these tools and shape future training courses on tablet inter- preting. References Behl, Holly. 2013a. The paperless interpreter experiment: Part I. http://www.paperlessinterpreter.com/paperless- interpreter-part-i/ Behl, Holly. 2013b. The paperless interpreter experiment: Part II. http://www.paperlessinterpreter.com/paperless-interpreter-part-ii/ Behl, Holly. 2015. The paperless interpreter experiment Part III: Microsoft Surface Pro 4. http://www.paperlessinterpreter.com/the-paperless-interpreter-experiment-part-iii-microsoft-surface-pro-4/ Creswell, John W., & Plano Clark, Vicki L. 2011. Designing and conducting mixed methods research (2nd ed.). Thousand Oaks, California: SAGE Publications. Creswell, John W., Plano Clark, Vicki L., Gutmann, Michelle L., & Hanson, William E. 2003. Advanced mixed methods research designs. In Abbas Tashakkori & Charles Teddlie (Eds.), Handbook of mixed methods in social and behavioral research (pp. 209-240). Thousand Oaks, CA: SAGE Publications. Costa, Hernani, Corpas Pastor, Gloria & Durán Muñoz, Isabel. 2014. “A comparative user evaluation of terminology management tools for interpreters.” In 25th International Conference on Computational Linguistics (COLING’ 14), 4th International Workshop on Computational Terminology (CompuTerm'14), pp. 68–76, Dublin, Ireland. Costa, Hernani, Corpas Pastor, Gloria & Durán Muñoz, Isabel. 2014. “Technology-assisted interpreting.” Multilingual 143, 27-32. Drechsel, Alexander. 2013a. The tablet interpreter. http://vkdblog.files.wordpress.com/2013/03/tabletinterpreter- public.pdf Drechsel, Alexander. 2013b, November 20. iPad interpreter. http://www.youtube.com/watch?v=qk3RNDGpe0Y&list=PLklixbOFpKxodoeh8lua0Zh9BkeI4GwLo&index =4. 49 Drechsel, Alexander. 2017. The tablet interpreter. (2017 edition). https://static1.squarespace.com/static/52d4015ce4b0eab6f2d76b6f/t/594b8b7a414fb54310f5957d/149812313 2497/The+Tablet+Interpreter+Manual.pdf Drechsel, Alexander & Behl, Holly. 2016. Kiss paper goodbye: Tablet technology for consecutive and simultaneous interpreting. Paper presented at the ATA 57th Annual Conference, San Francisco, California. Drechsel, Alexander & Goldsmith, Joshua. Forthcoming. Tablet Interpreting: The use of mobile devices in interpreting. In CIUTI-Forum 2016: Equitable Education through intercultural communication: Role and responsibility for non-state actors (eds. Forstner, Martin & Lee-Jahnke, Hannelore). Frankfurt am Main: Peter Lang. El-Metwally, Maha. 2017. Consec-Simo as a tool for Consecutive Interpreting. Webinar presented online through eCPD webinars. Goldsmith, Joshua. Forthcoming 2018. Tablet interpreting: Consecutive interpreting 2.0 for Public Service Interpreters. Translation and Interpreting Studies 13(3). Goldsmith, Joshua & Drechsel, Alexander. 2015a. The tablet interpreter. Talk presented at the 2015 CIUTI Conference, Geneva, Switzerland. Goldsmith, Joshua & Drechsel, Alexander. 2015b. Is there an app for that? Getting the most out of tablets in community interpreting. Workshop presented at the 2015 Critical Link Conference, Edinburgh. Goldsmith, Joshua & Drechsel, Alexander. 2016. Tablet interpreting: Tips, tools and applications to make the most of your tablet while interpreting. Webinar presented at the Proz 2016 Virtual Conference for International Translation Day. Goldsmith, Joshua & Holley, Josephine. 2015. Consecutive Interpreting 2.0: The Tablet Interpreting Experience.” (Unpublished MA thesis.) University of Geneva. Hamidi, Miriam & Pöchhacker, Franz. 2007. “Simultaneous consecutive interpreting: A new technique put to the test.” Meta: Journal des traducteurs (52.2), 276-289. Hof, Michelle. 2012. iPad: The ideal boothmate. http://aiic.net/p/6354. Lozano, Luis, García-Cueto, Eduardo & Muñiz, José. 2008. “Effect of the number of response categories on the reliability and validity of rating scales.” Methodology 2008:4, 73-79. Moors, Guy. 2007. “Exploring the effect of a middle response category on response style in attitude management.” Quality & Quantity 42:6, 779-794. Muñiz, José, García-Cueto, Eduardo, & Lozano, Luis. 2005. “Item format and the psychometric properties of the Eysenck Personality Questionnaire.” Personality and Individual Differences, 38, 61–69. Oceguera López, Patricia. 2017. El uso de aplicaciones para tablets en la toma de notas del intérprete. (Unpublished BA thesis.) Universidad Autónoma de Baja California, Mexico. Orlando, Marc. 2010. “Digital Pen Technology and Consecutive Interpreting: Another Dimension in Note- Taking Training and Assessment.” The Interpreters’ Newsletter 15, 71-86. Orlando, Marc. 2014. “A study on the amenability of digital pen technology in a hybrid mode of interpreting: Consec-simul with notes.” International Journal of Translation and Interpreting Research 6(2), 39-54. http://www.trans-int.org/index.php/transint. Orlando, Marc. 2015. “Implementing digital pen technology in the consecutive interpreting classroom.” In Andres, Dorte & Behr, Martina (eds.). To Know How to Suggest ... Approaches to Teaching Conference Interpreting, 171-200. Berlin: Frank & Timme. Orlando, Marc. 2015. “Digital pen technology and interpreting training, practice and research: Status and trends.” In S. Erlich and J. Napier (Eds.), Interpreter education in the digital age: Innovation, access and change, 125–152. Washington, DC: Gallaudet University Press. Orlando, Marc. 2016. Training 21st century translators and interpreters: At the crossroads of practice, research and pedagogy. Berlin: Frank & Timme. Paone, Matteo Domenico. 2016. Mobile Geräte beim Simultandolmetschen mit besonderem Bezug auf Tablets (Unpublished MA thesis.) University of Vienna, Austria. Rosado, Tony. 2013. Note-taking with iPad: Making our life easier. http://rpstranslations.wordpress.com/2013/05/28/note-taking-with-ipad-making-our-life-easier-2/ Scott, Juliette. 2012. One interpreter’s road kit. http://www.catherinetranslates.com/interpreter-road-kit Sullivan, Gail, & Artino, Anthony Jr. 2013. “Analyzing and interpreting data from Likert-type scales.” Journal of Graduate Medical Education 5(4): 541-2. Will, Martin. 2015. “Zur Eignung simultanfähiger Terminologiesysteme für das Konferenzdolmetschen.” trans- kom 8(1), 179-201. 50 MT and Post-Editing from a Translator’s Perspective Dimitra Kalantzi Translation Pozitron Ltd 4 Blackburn Road, Accrington, BB5 1HD, UK [email protected] Abstract There is no doubt that MT is nowadays one of the major trends in the translation industry. Indeed, more and more translation agencies offer MT and post-editing services to their clients, and professional translators are more and more likely to be offered post-editing tasks in their everyday work. In this context, and drawing from my own experience with MT as a translator, post-editor and MT evaluator, this paper discusses some common myths around MT and post-editing, suggests some additional services that freelance translators can offer in relation to MT, and also puts forward some reservations and ideas regarding MT evaluation within the translation industry. A plea is also made to universities and academics involved in the teaching of MT courses and modules to also cater to the needs of practicing translators looking to expand their knowledge and skills as part of their Continuing Professional Development (CPD). 1 Introduction There is no doubt that Machine Translation (MT) is nowadays one of the major trends in the translation and localisation industry. Everyone is talking and debating about it in social media, blogs and at conferences and almost everyone, including end clients, government bodies, translation agencies, technologists and even freelance translators, is trying their hand at it. Drawing from my own experience with MT as a translator, post-editor and MT evaluator, this paper discusses some common myths around MT and post-editing, suggests some additional services that freelance translators can offer in relation to MT, and also puts forward some reservations and ideas regarding MT evaluation within the translation industry. A plea is also made to universities and academics involved in the teaching of MT courses and modules to also cater to the needs of practicing translators looking to expand their knowledge and skills as part of their Continuing Professional Development (CPD). 2 Some Myths around MT and Post-editing Although the use of MT and post-editing in the translation industry is a hotly debated topic, some common myths still prevail. Some of them are of a more theoretical nature and others more practical. Here, we will discuss the following myths:  The assumed hostility of translators towards technology in general and MT in particular. That is, we often read and hear that translators dislike technology and particularly MT and post-editing (for instance, see O’Brien and Moorkens’ discussion of the reasons translators dislike post-editing, 2014, as well as Kelly’s article on why so many translators hate translation technology, 2014). Such statements can only be seen as gross generalisations; they might hold true for some translators but cannot be taken to be representative of the majority of translators today (for a discussion of this, see Stafilia, 2016). More importantly, what such statements fail to take into account is the main reason behind many translators’ skepticism towards MT; and that is the fact that because of the way they are currently practiced by some in the translation industry, MT and post-editing are often viewed as tools mainly targeted at lowering translation rates.  The assumption that the only role translators can have in relation to MT is that of the post-editor. Nonetheless, willing translators can also be of immense help and offer their 51 Proceedings of the 39th Conference Translating and the Computer, pages 51–54, London, UK, November 16-17, 2017. 2017c AsLing services in other areas related to MT, such as MT evaluation (including the design of evaluation tests), and the maintenance and clean-up of translation memories used in the training of MT engines. And, of course, translators can also build and train their own engines.  The myth of light post-editing. That is not meant to deny the existence of light post- editing, but to underscore the fact that light post-editing seems to be a rather rare scenario in the translation industry. Apart from that, the whole idea of light post-editing remains rather elusive, at least for many translators, as for every such task the translator and the end-client (and often the translation agency in-between) need to clearly specify what constitutes an error to be post-edited and what is outside the scope of post-editing.  The often taken-for-granted suitability of software translation for MT post-editing. IT and consumer electronics are often among the verticals for which custom MT systems are built. However, many user interface (UI) strings consist of a limited number of words, in some cases even 1 word, and are notoriously difficult to translate even for professional translators. That is particularly true when it comes to target languages that are morphologically richer than the language of the source text (a case in point being the pair of English into Greek). For instance, “Off”, an extremely common UI string, can be translated in Greek in at least 6 different ways depending on the gender and number of the noun that corresponds to the feature that is off, while in some cases the most appropriate translation might be a simple “No”.  The myth of increased productivity without qualification. As translators, we often hear that the expected or standard productivity rate for post-editing is 5,000 words per day, going up to as much as 7,000 or even 8,000 words per day, as opposed to 2,000 – 3,000 words for standard translation (see Memsource blog, 2015; KantanMT blog, 2014; DePalma, 2013). Such metrics, however, need to be taken with a grain for salt and the conditions under which such rates can be achieved should be clearly specified.  The myth that MT post-editing always entails a discounted rate. This is, however, by no means to be taken for granted, especially in the case of end-clients employing their own MT systems. Very often, in such cases, there is no discount for post-editing and translators are paid their usual translation rates. 3 Some Problems with MT evaluation in the Translation Industry Drawing from my own experience with MT evaluation in the translation industry as an evaluator involved in quality output tests (but also timed productivity tests) for clients translating from English into Greek in the verticals of IT and consumer electronics, I’d like to point out some problems regarding, among other things, the number and choice of evaluators involved, the choice of texts, the form of the test and the actual metrics used. 3.1 Number and choice of evaluators In all evaluations I have taken part in, only two evaluators are involved. Adding a third evaluator, at the very minimum, would seem to further ensure the validity and accuracy of the results, especially in cases where the scores of the two evaluators vary greatly. In addition, it would also be highly beneficial both for the evaluation and the implementation phase of MT and post-editing to always involve translators who are already specialised in the particular field and familiar with the specific client for whom the evaluation is undertaken. 52 3.2 Form of the test and choice of texts In most of the quality output tests I have carried out, a set of 50 random sentences is provided to evaluate. The source text appears on one side and next to it two (2) MT outputs, each from a different engine/training of an engine. This setup has the following effects:  Having to evaluate random sentences one by one rather than a running text means that in many cases it is impossible to rate the quality of the MT output. This is due to the lack of valuable context which if available would allow, for instance, to decipher what a pronoun such as “it” in the source might refer to (and hence if it has been translated correctly), as well as to judge the quality of the output in terms of cohesion (cohesive devices used in English and in Greek may differ greatly).  Due to the way the sentences are presented, the evaluator is bound to see the source text first and then the MT output. However, this actually might not be desirable, as reading the source text might interfere with the reading and understanding of the machine- translated text. That is, given the benefit of added information provided by the original sentence, the evaluator might be under the impression that the MT output is better and makes more sense than it actually does, and thus give it a higher score. Providing the MT output first and only then revealing the source might be a better strategy in that respect.  On a different level, the choice of texts is another important factor particularly relevant for the implementation phase (provided the test is actually a pass). That is, if the suitability of MT and post-editing for a client is evaluated, for instance, on the basis of a specific type of texts (e.g. manuals), this should not entail that all texts and all types of text by the same client can and should be machine-translated and post-edited. However, this is unfortunately not always the case. 3.3 Metrics used For the quality output tests I have carried out, all sentences are evaluated on a scale of 1 to 5 (1 being the lowest and 5 the highest score; half scores are also allowed). A set of instructions is also provided with a description of what each score stands for in terms of quality (these are the same in all tests). The test is a pass if the average score between both evaluators is 2.8 and above. There are two main problems with this type of evaluation and the particular metric used (i.e. quality). First of all, quality is a rather difficult thing to quantify and in this case, this is further aggravated by the fact that the actual descriptions provided for each score are far from clear, making it extremely difficult to distinguish one score from the other (this is particularly true for scores 2 and 3, and to a lesser extent 3 and 4). In many cases, it is quite hard for the evaluator to decide which score to assign to a particular MT output; this not only leads to frustration and the belief that the evaluation is not being done properly, but can also negatively affect the validity of the actual test. In fact, I have often felt that if I were given the same set of sentences a few days later, my own results would be far from consistent in at least a few cases. Secondly, it can be argued that the scores themselves have been sort of twisted to actually favour a pass. That is, although the pass score is 2.8 (instead of a mere 2.6), the way the scale has been set up means that a score of 3 is awarded to translations of rather average quality (according to the description). It seems that the scale should be redesigned and the descriptions rewritten in order to make sure that average translations are awarded an average score (i.e. around 2.5) so that results are not skewed. 53 4 MT and Post-editing as Part of Translator’s CPD MT and post-editing are here to stay and as such it only seems natural that they form part of translators’ CPD activities. Formal offerings, however, are rather scarce in that respect. Apart from attending conferences and relevant webinars and obtaining the post-editing certifications by TAUS and SDL, practicing translators who would to like to learn more about MT and post- editing or who would even like to learn how to train their own engines are left to their own devices. Admittedly, in many countries, there are now both undergraduate and post-graduate translation courses where MT and post-editing are part of the curriculum. However, doing an entire degree might be more suitable for aspiring translators or translators with little experience, while it might not be an attractive option for practicing translators who have already completed their studies and/or have many years of experience. As such, this paper makes a plea to universities and academics involved in the teaching of MT courses and modules to also cater to the needs of practicing translators not interested in registering in undergraduate and post-graduate courses. To start with, MT and post-editing modules already taught could become available to practising translators on a per module basis, ideally also allowing for distance learning attendance. In the longer term, separate modules, seminars, short courses and even summer schools could be offered focusing on the theoretical and practical aspects of MT and post-editing, including any relevant programming and IT skills needed to train one’s own engine both with SMT and NMT. Such offerings would no doubt be much welcome and particularly useful as they would be attuned to practicing translators’ needs. 5 Conclusion The overall aim of this paper was to discuss some aspects of MT and post-editing from the point of view of a translator/post-editor who has also taken part in various MT evaluations in the verticals of IT and consumer electronics. After referring to some common myths about MT and post-editing, this paper turned to the issue of MT evaluation in the translation industry, identifying some problems and possible solutions on the basis of my own experience as an MT evaluator. Finally, a plea was made to universities and teachers to take into account the needs of practicing translators who want to expand their knowledge and skills without doing an entire degree, by opening up MT modules currently taught in undergraduate and postgraduate courses on a per-module basis. References DePalma, Don. 2013. Post-editing in practice. In tcworld e-magazine. http://www.tcworld.info/e- magazine/translation-and-localization/article/post-editing-in-practice/ [last accessed October 16, 2017]. KantanMT blog. 2014. Post-Editing Machine Translation. https://kantanmtblog.com/2014/08/20/post-editing- machine-translation/ [last accessed October 16, 2017]. Kelly, Nataly. 2014. Why So Many Translators Hate Translation Technology. Article on HuffPost. https://www.huffingtonpost.com/nataly-kelly/why-so-many-translators-h_b_5506533.html [last accessed October 16, 2017]. Memsource blog. 2015. Post-editing in Memsource. https://www.memsource.com/blog/2015/08/18/post-editing- in-memsource/ [last accessed October 16, 2017]. O’Brien, Sharon and Joss Moorkens. 2014. Towards Intelligent Post-Editing Interfaces. In Proceedings of FIT XXth World Congress 2014, 4-6 Aug 2014, Berlin, Germany, pages 131-137. Stafilia, Dimitra. 2016. Translating Europe Forum 2016: A lot of talk about technology. Post on the PEEMPIP blog. http://blog.peempip.gr/tef-2016-technology/ [last accessed October 16, 2017]. 54 Using Online and/or Mobile Virtual Communication Tools in Interpreter and Translator Training: Pedagogical Advantages and Drawbacks Koen Kerremans Helene Stengers Vrije Universiteit Brussel Vrije Universiteit Brussel Pleinlaan 2, 1050 Brussels (Belgium) Pleinlaan 2, 1050 Brussels (Belgium) [email protected] [email protected] Abstract In this article we will present a research project in which we aim to compare the use of online and/or mobile virtual communication tools in two master programmes of interpreting and translation. Since this project was only recently launched, we will focus on the general objectives of the project, the planned activities and the expected impact. 1 Introduction In this article we will present a research project in which we aim to compare the use of online and/or mobile virtual communication tools in the master programmes of interpreting and translation at Vrije Universiteit Brussel (VUB). Since this project was only recently launched, we will only be able to discuss the general objectives of the project and the planned activities and reflect on its expected impact. The master programmes in translation and interpreting at VUB are based on a situated learning approach, which is generally understood as a didactic method in which students learn the profession and acquire professional skills through hands on experience by exposing them to simulated or real work environments, situations and tasks (González-Davies and Enríquez- Raído 2016). In recent years, this learning-by-doing approach (or authentic experiential learning) has gained quite some attention in translator and interpreter education (Class and Moser-Mercer 2013; Braun and Slater 2014; Kiraly 2016). In the following section, we present a general motivation for introducing student interpreters and translators to several virtual communication tools. Next, we present the general objectives of the project and the planned activities. Finally, we briefly reflect on the expected outcome(s) of the project. 2 Rationale In creating authentic learning contexts for student translators and interpreters, technology has become an important factor to take into consideration, given the unmistakable impact that it has on professional translation and interpreting practices. A review conducted by Braun (2015), for instance, shows that the use of Remote Interpreting Technologies (RITs) – i.e. telephone or video interpreting technologies – is gaining ground in different professional interpreting settings (e.g. immigration proceedings, multilingual conferences or healthcare contexts) and that, consequently, interpreters are more often working in virtual spaces. The same holds for translators who, owing to the ongoing technologisation and globalisation, are no longer confined to a physical office space but can work wherever and with whomever they see fit in a shared virtual environment (Olvera-Lobo et al. 2009). The popularity of virtual spaces and virtual communication technology in contexts of interpreting and translation is expected to grow as a result of different factors such as improvements in internet access, the decreasing cost of mobile devices (tablets and 55 Proceedings of the 39th Conference Translating and the Computer, pages 55–58, London, UK, November 16-17, 2017. 2017c AsLing smartphones) or the growing number of virtual communication tools. Training programmes should therefore be geared to this new reality by presenting students in translation and interpreting with opportunities to become familiar with some of the available virtual technological solutions. 3 Project aims After a review of previous studies dealing with the use of virtual technologies in translator and interpreter training (see e.g. Sandrelli and Jerez 2007; Olvera-Lobo et al. 2009; Ritsos et al. 2013; Kajzer-Wietzrzny and Tymczynska 2014), several virtual communication tools will be tested and evaluated both from the trainers’ and the trainees’ perspectives in order to assess the pedagogical advantages (and possible drawbacks) as well as their impact on professional skill development, i.e. their contribution to the professionalisation of translation and interpreting programmes. As a result of new technologies available for web conferencing and/or video interpreting, as well as platforms for the management of collaborative translation projects, translators and interpreters are increasingly operating in virtual spaces which allow them to be in contact with their clients. In the present study, we will assess how virtual communication platforms and tools can be integrated in three practical courses of the master programmes interpreting and translation, more specifically 1) the interpreting workshop and the 2) interpreting internship, two compulsary course units in the master programme in interpreting, and 3) the translation workshop, which is a mandatory course unit in the master programme in translation. 4 Planned activities In the context of an interpreting workshop, i.e. a practical exercises course, students will be encouraged to use the virtual communication tools to make extra out-of-class exercises as part of their portofolio. This will allow us to gauge to which extent interpreting students are inclined to collaboratively make use of these technologies for extra practice. The implementation phase will take place during the first term as part of the Spanish interpreting workshop, which is followed by four students. After screening potential tools in a preparatory study, a selection of online communication platforms will be tested and compared with specific interpreting mobile applications during the practical inpreting exercises. The functions and characteristics of these applications make it possible to set up practical exercises in video interpreting and boothless simultaneous interpreting. Certain online communication platforms have the potential to present trainees with pedagogical authentic learning activities by means of collective video or phone conversations in which one speaker can be defined as a “presenter” (the trainer who reads a speech in the foreign language) who can listen to any of the “participants” in the group (the students who interpret the speech into Dutch). Exercises of this type are easy to integrate in the interpreting workshop. In this study, we also assess the usability of the virtual communication technologies mentioned above for interpreting internships, which require students to perform interpreting assignments about a panoply of possible subjects in different sectors (mainly non-profit) of the professional field at the demand of organisers of a conference or event who want to facilitate communication in other languages. Contrary to the pedagogical situation of an interpreting workshop, in which the trainer reads or plays the recording of an oral text while listening to the student interpreters, an interpreting internship offers an authentic (learning) situation with a speaker and a listener who are independent of each other and who do not understand each other's 56 language. The number of listeners is also typically higher than in a pedagogical context. The interpreting assignments are also performed in places which do not always dispose of interpreting booths. Therefore, it is important to investigate how online tools can be used optimally in order to allow boothless simultaneous interpreting or remote interpreting in this context. The usability of the online tools for internship assignments will be tested, compared and evaluated in the context of a master course entitled ‘Multilingual Education’, in which international speakers discuss the multilingual education in their own countries. Student interpreters will be invited to interpret the lectures of the guest speakers into Dutch, the master students' mother tongue. The third and final objective of this project is to compare and evaluate the use of virtual communication technologies to support virtual translation workshops. These are basically teaching sessions in which the translation teacher(s) and students do not share the same physical space (e.g. a computer lab or traditional classroom on the university campus) but interact with one another via a virtual space. Several tools will be tested during virtual feedback sessions on translation assignments, submitted by the students. These sessions will take place in the context of the workshop on technical-scientific translation (English-Dutch), a one-semester course which is taught to a group of approximately 20 students. Apart from supporting live interactions and demonstrations, tools will need to provide all interactants (both teacher(s) and students) the possibility to share their own computer screen with all participants during the workshop. The use of virtual communication tools during the three types of sessions outlined above – interpreting workshops, interpreting internships and translation workshops – will be discussed and evaluated by the translation/interpreting teachers and students during focus group sessions that will be scheduled immediately after each planned translation or interpreting session. During these focus group sessions, topics will be discussed pertaining to the pedagogical advantages and possible drawbacks of the tools used in the study. 5 Conclusion and expected impact Using online virtual technologies in the master translation and interpreting programmes is not only relevant for the professionalisation of the master curricula – it better prepares students for the professional field and will also create new opportunities for collaboration between students and translation or interpreting professionals. Such technologies also offer interesting pedagogical opportunities. Authentic learning activities for both student translators and interpreters are a welcome addition to the ‘traditional’ workshops, since it allows student to carry out assignments autonomously and in a flexible manner instead of in a translation/interpreting classroom on the university campus. Student interpreters will have the possibility to use the online tools to practice among themselves without the presence of a teacher. It stands to reason that this additional practice will have a positive impact on students' interpreting skills. The virtual communication platforms may also open up new possibilities for feedback and student monitoring. Acknowledgements This project is funded by Vrije Universiteit Brussel’s Education Innovation Project funding – OnderwijsVernieuwingsProject (OVP). 57 References Braun, Sabine. 2015. ‘Remote Interpreting’. In The Routledge Handbook of Interpreting, edited by Holly Mikkelson and Renee Jourdenais, 352–67. London/New York: Routledge Taylor and Francis Group. Braun, Sabine, and Catherine Slater. 2014. ‘Populating a 3D Virtual Learning Environment for Interpreting Students with Bilingual Dialogues to Support Situated Learning in an Institutional Context’. The Interpreter and Translator Trainer 8 (3): 469–85. doi:10.1080/1750399X.2014.971484. Class, Barbara, and Barbara Moser-Mercer. 2013. ‘Training Conference Interpreter Trainers with Technology - a Virtual Reality’. In Quality in Interpreting: Widening the Scope, Volume 1, edited by O. Garcia Becerra, E.M. Pradas Macias, and R. Barranco-Droege, 293–313. Granada: Editorial Comares S.L. González-Davies, Maria, and Vanessa Enríquez-Raído. 2016. ‘Situated Learning in Translator and Interpreter Training: Bridging Research and Good Practice’. The Interpreter and Translator Trainer 10 (1): 1–11. doi:10.1080/1750399X.2016.1154339. Kajzer-Wietzrzny, Marta, and Maria Tymczynska. 2014. ‘Integrating Technology into Interpreter Training Courses: A Blended Learning Approach’. Edited by Maria Piotrowska and Sergiy Tyupa. inTRAlinea. http://www.intralinea.org/specials/article/2101. Kiraly, Donald C., ed. 2016. Towards Authentic Experiential Learning in Translator Education. Göttingen: V&R unipress/Mainz University Press. Olvera-Lobo, María Dolores, Bryan Robinson, José A. Senso, Ricardo Muñoz-Martín, Eva Muñoz-Raya, Miguel Murillo–Melero, Enrique Quero-Gervilla, María Rosa Castro-Prieto, and Tomás Conde-Ruano. 2009. ‘Teleworking and Collaborative Work Environments in Translation Training’. Babel 55 (2): 165–80. doi:10.1075/babel.55.2.05olv. Ritsos, P.D., R. Gittins, Sabine Braun, Catherine Slater, and J.C. Roberts. 2013. ‘Training Interpreters Using Virtual Worlds’. LNCS Transactions on Computational Science XVIII (7848): 21–40. doi:10.1007/978- 3-642-38803-3_2. Sandrelli, Annalisa, and Jesús de Manuel Jerez. 2007. ‘The Impact of Information and Communication Technology on Interpreter Training’. The Interpreter and Translator Trainer 1 (2): 269–303. doi:10.1080/1750399X.2007.10798761. 58 When Google Translate is better than Some Human Colleagues, those People are no longer Colleagues Samuel Läubli1 and David Orrego-Carmona2,3 1 Institute of Computational Linguistics, University of Zurich, Switzerland 2 School of Languages and Social Sciences, Aston University, United Kingdom 3 Department of Linguistics and Language Practice, University of the Free State, South Africa Abstract We analyse posts on social media (Facebook, LinkedIn, and Twitter) as a means to understand how translators feel about machine translation (MT). A quantitative analysis of more than 13,000 tweets shows that negative perceptions outweigh positive ones by a ratio of 3:1 overall, and 5:1 in tweets relating MT to human translation. Our study indicates a disconnect between translation and research communities, and we outline three suggestions to bridge this gap: (i) identifying and reporting patterns rather than isolated errors, (ii) participating in evaluation campaigns, and (iii) engaging in cross-disciplinary discourse. Rather than pointing out each other’s deficiencies, we call for computer scientists, translation scholars, and professional translators to advance translation technology by acting in concert. 1 Introduction Mistranslations can be hilarious. In fact, social media have become ideal outlets to share pictures of clumsy food menus and mislabelled street signs, as well as screenshots of translation errors produced by machine translation (MT) engines such as Google Translate. People share them, laugh at them, and criticise them openly. Professional translators also participate in the debates about such translations. On a regular basis, translators on LinkedIn, Facebook and Twitter engage in discussions about mistranslations and how they show that MT is not comparable to human translation (see Figure 1). Translators use these spaces to voice their frustration with MT and the implications it has on their profession. Social media groups dedicated to translation and translators count their members in the thousands. Communities of practice have emerged thanks to these spaces; however, researchers have barely looked at them in order to better understand translators and their opinions. Considering the lack of attention to translators’ activities on social media and curious about how these could be used to understand the translators’ perceptions of MT, we decided to conduct a study into how translators’ interactions on social media could help improving translation technology. Perceptions of MT among translators have been explored using questionnaires and interviews. We conjectured that eliciting their opinions from online Figure 1: Meme posted on a translators’ group on Facebook, mocking the use of Google Translate. 59 Proceedings of the 39th Conference Translating and the Computer, pages 59–69, London, UK, November 16-17, 2017. 2017c AsLing interactions would provide us with data to understand their attitudes towards MT, and propose ways in which their efforts and knowledge could support the improvement of the technology. We used qualitative and quantitative methods to analyse how translators’ feel about MT. We first present the results of our initial qualitative exploration of groups on Facebook and LinkedIn. The posts on these platforms gave us the impression that sentiment towards MT in translation groups is predominantly negative. Aiming at quantifying this initial impression and providing empirical grounding, we then employed automatic sentiment analysis on a larger data set. We classified a collection of 13,150 tweets about MT using human annotation on a subsample of 150 tweets, and automatic annotation for the entire collection.1 Both our qualitative and quantitative analyses show that negative perceptions in social media outnumber positives. To the best of our knowledge, these results provide the first empirical view of how MT is portrayed on social media. Based on our findings, we make a call for improving collaboration among professional translators and researchers, and propose possible avenues to move towards that goal. 2 Background Most of the literature on the perception of MT among translators, some of which we review in this section, relies on data obtained through formal questionnaires and interviews. This paper is motivated by our impression that translators might be more open and direct when expressing opinions on social media, as well as the fact that there is a lot more data than could be collected through direct interrogation. Already in 1993, Meijer found that the MT was seen as a threat among translators, and negative opinions seem to persist (see Guerberof Arenas, 2013; Gaspari et al., 2015; Cadwell et al., 2017). Despite significant technological advancements in recent years, translators are ‘still strongly resistant to adopting MT as an aid, and have a considerable number of concerns about the impact it might have on their long-term work practices and skills’ (Cadwell et al., 2017). As a response to these concerns, in the last two years, the International Federation of Translators (FIT) has published three position papers on MT, crowdsourcing, and the future for professional translators. In their paper on MT, they state that ‘MT is unlikely to completely replace human translators in the foreseeable future. Leaving aside the area where MT is a feasible option, there will continue to be plenty of work for them. Professional translators, who have the appropriate skills and qualifications, will still be needed for demanding, high-quality products’ (Heard, 2017). Given that the Federation represents the interests of professional translators, their paper can be seen as an indicator of the relevance to understand how translators feel about MT. In spite of the seemingly significant importance for the community,2 the use of social media among professional translators has been barely studied. Desjardins (2016) addresses the aspects of professionals using social media but primarily as a strategy to increase their visibility, not as a way of interacting among themselves. The research that is available in the field of translation and social media has mainly explored the work of non-professionals and their translations (e. g., Dombek, 2014; O’Hagan, 2017; Jiménez-Crespo, 2017). Although not on social media, the online presence of translators and their attitudes in other outlets have previously been 1 All data and annotations are released under the CC BY-SA 4.0 license, available at https://github. com/laeubli/MTweet. 2 Social media groups dedicated to translation and translators count their members in the thousands, and since 2013, proz.com has been running the Community Choice Awards to recognise, among others, translation and interpreting professionals and companies who are active and influential on the internet. 60 explored: McDonough Dolmaya (2011) analysed translators’ blog entries to understand the attitudes and practices of professional translators, while Flanagan (2016) used blogs to study their opinions towards crowdsourcing. Along the same lines, researchers have also asked professional translators about their attitude towards MT (Meijer, 1993), their opinion about post-editing of MT output (Guerberof Arenas, 2013), and their reasons to use or not use MT (Cadwell et al., 2017). The research on the translators’ opinions about MT is still limited and, to the best of our knowledge, no study has analysed interactions on social media as a way of understanding the translators’ attitude towards MT. Interacting on social media requires less time and effort than maintaining a website or writing a blog post so we assume a larger number of translators would be involved in different types of exchanges in social media platforms. 3 Qualitative Analysis Our aim to fill this gap started with a preliminary analysis of translators’ posts and comments on Facebook and LinkedIn. We hand-picked 137 examples related directly or indirectly to MT by browsing through public and invitation-only3 groups: Professional Translators and Interpreters (ProZ.com),4 Translation Quality,5 The League of Extraordinary Translators,6 Things Translators Never Say,7 and Translation and Interpreting Group.8 It is important to point out here that this part of the study does not claim to be comprehensive; it serves to illustrate the situation that unfolds in these groups rather than provide generalisable results. In relation to the assumption that MT can be a threat to professional translators, one of the recurrent topics in these groups is quality. Translators engage in discussions about the mistranslations produced by MT engines as a way of reinforcing the need for human translators. Photos and screenshots of translation errors are systematically posted to the groups. Translators criticise them and comment on the shortcomings of MT (see Figure 2b). Some use the examples to respond with sarcasm to the possibility that translators might be replaced by machines in the near future: ‘Oh yes, I’m very worried about being replaced by a machine when it can’t tell the difference between an extraordinary announcement and a declaration of emergency...’. In their discussion, Google Translate is normally the main culprit, probably because of its accessibility and the considerable number of languages in which it operates. There are direct references to Google Translate in 66 of the 137 posts we collected for this qualitative analysis. Translators also question the improvements announced by the companies that develop MT. In response to an article comparing the quality of neural MT to phrase-based MT and human translation, one translator indicates her doubts about the results commenting ‘I wonder how the quality was measured if neural came so close to human.’ In this case, MT as such is not the issue that is put on the spot but the concept of quality that is used to assess MT output. Also, as pointed out by other researchers (see Doherty and Kenny, 2014; Cadwell et al., 2017), translators feel they are not considered part of the development of MT: ‘Yes, AI people who know nothing about our job, we totally agree that you will figure out how to replace us with machines in the next ten years. Sure you will.’ In some cases, translators even use MT as an indicator of poor quality when judging other translators and their translations. Figure 2d shows a comment by a translator who uses Google Translate’s output quality as a point of comparison to argue that the translation she 3 Access is usually granted within minutes. 4 https://www.linkedin.com/groups/138763 5 https://www.linkedin.com/groups/3877235 6 https://www.facebook.com/groups/extraordinarytranslators 7 https://www.facebook.com/groups/thingstranslatorsneversay 8 https://www.facebook.com/groups/Interpreting.and.Translation 61 (a) Translation from German into English made by Google Translate used as an example of how confusing MT outputs can be. In the comments, the translators discuss Google Translate’s poor attempt at rendering the different meanings of the terms in German. In the output in English, all the German terms are translated as ‘economics’, resulting in a meaningless repetition of the same term. (b) A translator posted a link to a list of examples of mistranslations generated by Google Translate App’s function that can (c) Translators engage in a discussion about translate text in images. The translator whether or not MT can be acceptable in sarcastically comments on the fact that specific circumstances. Some of them argue the poor quality of the translations makes MT can be useful for small business without it unlikely that human translators will be the resources to pay for a professional replaced by computers in the near future. translation, while others stress the fact that accepting MT as a valid option means lowering the standards of the profession. (d) A translator complaints about the quality of the translation she is revising. As a way of signalling the poor quality of the translation made by her colleague, she claims it would have been better to proofread a machine-translated text. Figure 2: Examples of translator interactions in Facebook groups. 62 is proofreading is of low quality. Translators also recognise that some of the mistakes present in the translations that are posted in the group are such poor examples of translations that ‘not even Google translate [sic] is that bad.’ However, not all the posts and comments on social media discredit MT straight away. Figure 2a presents an image that was shared in one of the groups showing Google Translate’s translation of a German text into English. Interestingly, the translators who commented on this post were genuinely curious about the veracity of the output. Some of them took the time to retype the text into Google Translate and check whether the translations into English or their own target languages made any sense. Comments in the groups often also point at the use of MT as an aid for translating as an indicator of poor quality or a poorly skilled professional. One of the commentators states that ‘Machine translation, like Google Translate, can give you a false sense of competence’, suggesting that non professionals could get the impression they can translate thanks to the support of MT. Another translator comments on the fact that the fear of MT is, in a way, an indicator of the competence of the translators. She says that ‘[m]achines will only replace those translators who translate like machines.’ These opinions do not represent isolated cases. In another thread when discussing the issues that MT could bring to the profession, a translator states that ‘When Google Translate is better than some human colleagues, those people are no longer colleagues.’ Using Google Translate or the risk of being replaced by a machine seem then to be related to a translator’s lack of professionalism or skills. One of the highlights of MT is affordability: automating the process of translating makes it possible for people to access translations, even when they do not have the resources to pay for them. The discussion depicted in Figure 2c serves as an example of this argument among translators. Some of the translators recognise there are situations in which having access to an automatic translation is better than having no translation at all, while others would not consider it possible to accept a translation that only allows users to ‘get the idea’. For some of the translators, it seems, accepting MT as a valid option would constitute lowering the standards of the profession. Discussions in the groups commonly go back to the assumption that human translators approach translation as a creative task, while MT only looks at translation as the word-for-word replacement of a string of text. Not all the discussions centre on the negative aspects of MT. Some translators point out that MT, and Google Translate in particular, are good for certain language combinations or specific fields, and can actually support the work of skilled professionals. A translator summarises these two points when he states that ‘Translation and interpreting are very demanding professions where talented human linguists will continue to make the difference in value and quality. Nevertheless, it is hard to deny the benefits of applied language technology – CAT for translators and VRI for interpreters to name but a few – to support linguists and language service providers in their joint mission to meet customer requirements in a very rapidly changing market of demanding end users and organizations who pay the bill for these language services.’ 4 Quantitative Analysis The initial exploration of how MT is discussed on social media reinforced our impression that perceptions are predominantly negative among professional translators. We conducted a larger study in order to ground this impression empirically. In this stage, we focused on Twitter data as large numbers of posts are difficult to obtain from Facebook and LinkedIn (see Section 4.1). Our goal was to quantify the extent of positive, neutral, and negative tweets on MT, for which we employed independent human judges (Section 4.2) and an automatic sentiment classifier 63 Annotator B 63 n = 150 positive neutral negative 47 positive 12 11 6 Annotator A 28 neutral 11 28 2 12 negative 8 9 63 positive neutral negative tie (a) Assignments per class (b) Confusion matrix Figure 3: Human Sentiment Analysis (Section 4.3). 4.1 Data Collection We collected tweets from twitter.com using a purpose-built, open source web crawler.9 We only kept tweets which (i) contain the terms ‘machine translation’ and/or ‘machine translated’, (ii) are written in English, according to Twitter’s language identification, and (iii) were created between 1 January 2015 and 31 July 2017. This method is not exhaustive in that authors may refer to machine translation by use of synonyms or without mentioning it explicitly. However, the data is representative of what a user would find searching for the terms mentioned in (i) above through Twitter’s search interface. Our filtered collection contains 13,150 tweets. 4.2 Human Sentiment Analysis We sampled 150 tweets from this collection for human sentiment analysis. The selection was random, except that we required each tweet to contain at least one of the following terms: ‘human’, ‘professional’, ‘translator’. As discussed in Section 5.2, we used this heuristic to focus on discussions comparing MT and human translation in this part of the study. The sampled tweets formed the basis for an annotation job on a web-based crowdsourcing platform.10 Annotators were asked to read each tweet, click all links found in the text for additional context, and then determine if the tweet is positive, neutral, or negative with regards to machine translation. Tweets were presented in random order. We included ten control items as a means to filter out random contributions: each annotator saw ten tweets twice, and we expected them to be consistent with their own judgements. Human annotators were recruited through convenience sampling, the restriction being that they have never been involved with translation, translation studies, or computational linguistics. Five annotators completed the entire job, from which we excluded three due to low inter-annotator agreement: they failed to reproduce their own judgements on three or more out of ten control items. Human annotation results are summarised in Figure 3. Inter-annotator agreement is 68.7 % (Cohen’s κ = 0.495). The two remaining annotators independently assigned the same label to 103 out of 150 tweets: 12 positive, 28 neutral, and 63 negative (Figure 3a). Two different labels were assigned to 47 tweets, with most disagreement between positive and neutral (Figure 3b). 9 https://github.com/jonbakerfish/TweetScraper 10 https://www.crowdflower.com 64 Automatic 7,461 n = 103 positive neutral negative 4,293 positive 2 4 6 Human neutral 3 21 4 1,396 negative 1 17 45 positive neutral negative (a) Assignments per class (b) Confusion matrix Figure 4: Automatic Sentiment Analysis 4.3 Automatic Sentiment Analysis To annotate our entire collection of tweets (see Section 4.1), we leveraged Baziotis et al.’s (2017) automatic sentiment classifier.11 Their system scored first in the SemEval 2017 shared task on classifying the overall sentiment of a tweet (task 4, subtask A), i. e., deciding whether it expresses positive, neutral or negative sentiment (see Rosenthal et al., 2017). It uses a deep LSTM network (Hochreiter and Schmidhuber, 1997) with attention (Rocktäschel et al., 2015) and is completely data-driven: rather than relying on linguistic information and hand-crafted rules, the system learns to classify from large collections of manually annotated tweets. We trained the system on the original SemEval data, meaning it is not specifically geared to tweets on machine translation. Using the 103 tweets that both of our annotators labelled with the same class as a reference (see Section 4.2), it classifies 68 tweets correctly. This corresponds to an overall classification accuracy of 66.0 %. In terms of average recall per class – the primary evaluation metric used in SemEval – its performance in our domain (66.0 %) is similar to the performance achieved in the shared task with varied topics (68.1 %; see Rosenthal et al., 2017). However, precision (33.3 %) and recall (16.7 %) are low for positive tweets. The system performs better with neutral (precision: 50.0 %, recall: 75.0 %) and negative tweets (precision: 81.8 %, recall: 71.4 %), with a tendency to classify negative tweets as neutral (Figure 4b). Overall, the classifier labels 1,396 tweets as positive (10.6 %), 7,461 as neutral (56.7 %), and 4,293 as negative (32.6 %). Note that in contrast to the subset used for human annotation (see Section 4.2), this includes tweets not comprising the terms ‘human’, ‘professional’, or ‘translator’. 5 Findings and Discussion Our study provides evidence that MT is often portrayed negatively among translators on social media outlets. The suspicions about a negative attitude towards MT that stemmed from our qualitative analysis of Facebook and LinkedIn posts (Section 3) were supported by the results of the sentiment analysis carried out on Twitter data (Section 4). 5.1 Recurrent Topics Our exploration of Facebook and LinkedIn data (see Section 3) sheds light on recurrent MT-related topics in social media. Firstly, we observed frequent reiteration of how professional 11 source code available at https://github.com/cbaziotis/datastories-semeval2017-task4 65 translators are and will still be needed as MT improves, as shown by the example is provided in Figure 2b. Secondly, translators doubt if MT improves at all, for example, by calling into question the methodology and/or veracity of evaluation campaigns. Referring to a study on productivity with a web-based translation workbench, a Facebook user says ‘if only your productivity estimate was correct! If I actually could do 2k words/hour while watching esports [sic], I’d actually take on all those bottom feeders and still make good bank!’ Thirdly, many posts merely criticise MT for bad quality. Translators spend considerable time and effort on discussing MT errors, but we were surprised to find little variance in the discussions and errors reported. In one instance, a translator even made up a meaningless sentence in Japanese and mocks Google Translate for producing meaningless output, conceivably because of its sexual connotation (see Figure 5). 5.2 Sentiment Towards MT In analysing 150 tweets relating MT to human translation, two independent judges found negative tweets to be most common, outnumbering positive and neutral ones by a ratio of 5:1 and 2:1, respectively. However, sentiment classification in tweets is not trivial: human judgements do not overlap in a third of all cases, resulting in 47 ties. As shown in Table 1c, there are even tweets classified as positive by one annotator and negative by the other. Even if other studies on human sentiment analysis in tweets report similar inter-annotator agreement (e. g., Cieliebak et al., 2017), a negotiation phase following the independent procedure of annotation could have resolved some of the disagreement. Moreover, sampling tweets based on the presence of keywords – ‘human’, ‘professional’, or ‘translator’ – is somewhat arbitrary. Still, we found this heuristic useful to get a sense of how Twitter users contrast MT with human translation (see the examples in Table 1). Without this restriction, neutral tweets on MT are most common in our collection. News and product announcements, such as ‘Facebook posts its fast and accurate ConvNet models for machine translation on GitHub’, often fall into this category. But even so, there are three times more negative than positive tweets in the 13,150 examples we collected, hinting at the predominance of negative perceptions about MT in general. The caveat here is that sentiment was determined by means of an automatic classifier. The classifier did not have access to contents such as websites and images linked in tweets, which human annotators were explicitly asked to consider when making their judgement. It also was not geared to tweets on MT specifically; while the system we leveraged would have allowed for topic-based classification (see Baziotis et al., 2017), we lacked appropriate amounts of training data. Despite these limitations, the system reproduced human judgements with an accuracy of 66.0 % overall. This corresponds to state-of-the-art results (see Rosenthal et al., 2017), and is similar to the degree of disagreement between human annotators (see above). This is good enough to get a sense of the class distribution in our data, even if the classifier does make mistakes (e. g., Table 1b). A clear advantage is speed: the 13,150 tweets are labelled in seconds. Eliciting human annotations would have taken a lot longer and would have been expensive at this scale. 5.3 A Case for Collaboration Even after its emergence as a profession in the last century, translation still struggles with recognition and undervaluation (see Tyulenev, 2015; Flanagan, 2016). Apart from the general situation of the profession, translators also feel, and indeed in many cases are, left out in the processes towards the development of translation technologies (see Doherty and Kenny, 66 Text Automatic Human HumanA HumanB (a) Six reasons why machine translation can negative negative negative negative never replace good human translation: https://t.co/JzLYbXO6yJ #xl8 #t9n (b) When you solely rely on machine neutral negative negative negative translation... via @inspirobot #wedoitthehumanway #htt https://t.co/UpfnVd4k8W (c) High-quality machine translation is negative tie positive negative threatening to make translators ‘the coffee-bean pickers of the future’ https://t.co/n8fGvIHBao (d) Difference between professional translation neutral neutral neutral neutral and machine translation by @ChrisDurbanFR #xl8 #ITIConf17 https://t.co/gFhgRrLtJq (e) Pretty incredible. For a few languages, positive positive positive positive machine translation is near equal to human translation. https://t.co/GsCejE0cUW Table 1: Example tweets. Sentiment was assessed by two human annotators as well as an automatic sentiment classifier. 2014; Cadwell et al., 2017). They resist technology because they feel they need to protect their profession and resort to the defence of quality as the main argument for their cause. This behaviour is neither a new strategy, nor something restrictive of professional translators. As Pym (2011) puts it: ‘Resistance to technological change is usually a defense of old accrued power, dressed in the guise of quality.’ However, it is unlikely that the technological development will stop. Together with its quality, the use of MT has increased significantly in the last decades. Research has also provided evidence of increased productivity through post-editing of MT, and companies are moving more and more towards a context in which this practice is the norm rather than the exception (e. g., Green et al., 2013; Koponen, 2016). While it can be assumed that translation will continue being an activity with human involvement, it will (continue to) involve various degrees of automation as translation technologies evolve. We believe that translators should be actively engaged in these developments, and that their actions on social media could help inform and support research on translation technology. In the following section, we propose a set of recommendations aimed at fostering collaboration and promoting common goals among researchers and professional translators. 6 Recommendations As hilarious as MT errors can be, laughing about them does neither improve translators’ lives nor the technology. The study we present in this paper fills a gap in the exploration and quantification of translators’ perceptions as it brings social media into the picture. Our findings imply that translators and researchers have different understandings of the functioning and purposes of MT, but at the same time show that translators are aware of the types of issues that are problematic for it. Considering our findings, we believe that professional translators could and should have more influence on future developments in MT and translation technology in general, and propose three initial recommendations to bridge this gap: 67 Figure 5: Translation of a meaningless text in Japanese into English by Google Translate. The translator posts the screenshot because Google Translate’s output has a sexual connotation; he uses this argument as proof that machines will never replace human translators. (i) Identify and report patterns rather than isolated errors. State-of-the-art MT systems are not based on rules, so complaining about specific words that are mistranslated does not help much. Reporting error patterns, in contrast, may help gearing systems to new objectives, such as ensuring noun phrase agreement or controlling for negation particles (see also Sennrich, 2017). Professional translators have the knowledge and expertise to identify these patterns and, given the appropriate tools, pattern reporting could easily be integrated into their regular workflows. (ii) Participate in evaluation campaigns. Our study has shown that criticising findings of MT quality or productivity as being unrealistic is a recurrent theme on social media. The MT research community puts a lot of effort into evaluation campaigns. At the annual Workshop on Machine Translation (WMT), for example, research teams from all over the world compete for the best translation system in several language pairs. Human judgements are the primary evaluation metric, but rather than from translators, they stem from ‘researchers who contributed evaluations proportional to the number of [MT systems] they entered’ into the competition (Bojar et al., 2017) – although blindly, the MT researchers evaluate their own systems. We believe that the involvement of professional translators in scientific evaluation campaigns would not only improve the quality and credibility of their outcome, but also improve the translators’ understanding of – and impact on – the methodologies used to evaluate MT systems. (iii) Engage in cross-disciplinary discourse. We need to talk to each other. The issues presented above show that professional translators and researchers hold different positions, not due to a lack of information or skills, but rather poor communication. Translators make good points about the shortcomings of MT, but they comment on MT issues primarily among themselves. The outcomes of academic research, on the other hand, are poorly communicated to translators and seemingly give them the impression that MT has little to offer to improve their conditions. For translators to have an active role in the development of the technologies they (have to) use, it is necessary for both sides, professional translators and researchers, to meet halfway and cooperate. Common rather than separate spaces on social media may be a starting point. 68 References Baziotis, Christos, Nikos Pelekis, and Christos Doulkeridis. 2017. Datastories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis. In Proceedings of the 11th International Workshop on Semantic Evaluation. Vancouver, Canada, pages 747–754. Bojar, Ondřej, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shujian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, and Marco Turchi. 2017. Findings of the 2017 conference on machine translation (WMT17). In Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers. Copenhagen, Denmark, pages 169–214. Cadwell, Patrick, Sharon O’Brien, and Carlos S. C. Teixeira. 2017. Resistance and accommodation: factors for the (non-) adoption of machine translation among professional translators. To appear in Perspectives, available online at http://www.tandfonline.com/doi/full/10.1080/0907676X.2017.1337210. Cieliebak, Mark, Jan Milan Deriu, Dominic Egger, and Fatih Uzdilli. 2017. A Twitter corpus and benchmark resources for German sentiment analysis. In Proceedings of the 5th International Workshop on Natural Language Processing for Social Media. Valencia, Spain, pages 45–51. Desjardins, Renée. 2016. Translation and Social Media: In Theory, in Training and in Professional Practice. Palgrave Macmillan. Doherty, Stephen and Dorothy Kenny. 2014. The design and evaluation of a statistical machine translation syllabus for translation students. The Interpreter and Translator Trainer 8(2):295–315. Dombek, Magdalena. 2014. A study into the motivations of internet users contributing to translation crowdsourcing: the case of Polish Facebook user-translators. Ph.D. thesis, Dublin City University. Flanagan, Marian. 2016. Cause for concern? Attitudes towards translation crowdsourcing in professional translators’ blogs. The Journal of Specialised Translation 25:149–173. Gaspari, Federico, Hala Almaghout, and Stephen Doherty. 2015. A survey of machine translation competences: insights for translation technology educators and practitioners. Perspectives 23(3):333–358. Green, Spence, Jeffrey Heer, and Christopher D. Manning. 2013. The efficacy of human post-editing for language translation. In Proceedings of the 2013 Conference on Human Factors in Computing Systems (CHI). Paris, France. Guerberof Arenas, Ana. 2013. What do professional translators think about post-editing? The Journal of Specialised Translation 19:75–95. Heard, Reiner. 2017. FIT position paper on machine translation. Babel 63(1):130–135. Hochreiter, Sepp and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780. Jiménez-Crespo, Miguel A. 2017. Crowdsourcing and online collaborative translations: Expanding the limits of Translation Studies, volume 131 of Benjamins Translation Library. John Benjamins. Koponen, Maarit. 2016. Is machine translation post-editing worth the effort? A survey of research into post-editing and effort. The Journal of Specialised Translation 25:131–148. McDonough Dolmaya, Julie. 2011. A window into the profession: What translation blogs have to offer translation studies. The Translator 17(1):77–104. Meijer, Siety. 1993. Attitudes towards machine translation. Language International 5(6):11–13. O’Hagan, Minako. 2017. Deconstructing translation crowdsourcing with the case of a Facebook initiative. In Dorothy Kenny, editor, Human Issues in Translation Technology, Routledge, pages 25–44. Pym, Anthony. 2011. What technology does to translating. Translation & Interpreting 3(1):1. Rocktäschel, Tim, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, and Phil Blunsom. 2015. Reasoning about entailment with neural attention. arXiv preprint arXiv:1509.06664. Rosenthal, Sara, Noura Farra, and Preslav Nakov. 2017. SemEval-2017 task 4: sentiment analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation. Vancouver, Canada, pages 502–518. Sennrich, Rico. 2017. How grammatical is character-level neural machine translation? Assessing MT quality with contrastive translation pairs. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL). Valencia, Spain, pages 376–382. Tyulenev, Sergey. 2015. Towards theorising translation as an occupation. Asia Pacific Translation and Intercultural Studies 2(1):15–29. 69 On the Need of New Tools for "Translating Writers" in Industry Claire Lemaire Christian Boitet Laboratoire d’Informatique de Laboratoire d’Informatique de Grenoble Grenoble [email protected] [email protected] Abstract Working in the context of French and German companies, we discovered the emergence of a new situation of bilingual writing, where French or German technical writers writing in their (source) language (SL) are asked to produce a parallel version of their document in English (the TL), often for delocalization purposes. These technical translating writers cannot benefit from available tools such as MT+PE (post-editing environment) or TM-based translator aids to produce good enough translations. But, not only in IT, badly translated requirements and specifications lead to the development of totally inadequate products. We propose a scenario using existing free tools such as MT, sub-sentential aligners, company-specific bilingual terminology, SL and TL correctors, and integrating iterations of (re)writing their SL text, MT-translating it, correcting it somewhat, and translating it back from TL to SL. We then outline a more futuristic approach, relying on a multiple SL analyzer, an interactive disambiguator, the production of a “self-explaining” document (SED), and the subsequent automatic generation of a high-quality TL document in SED format. In short, the aim would be to build a true bilingual writing tool for technical translating writers. 1 Introduction Working in the context of some French and one German company, the first author discovered the emergence of a new situation of bilingual writing, where French or German technical writers writing in French or German, their native tongue (the source language, or SL) are asked to produce a parallel version of their documents in English (the target language, or TL), often for delocalization purposes (Lemaire, 2016). These technical translating writers know some English (say, to a B1 level) and have often access to the specific bilingual terminology used in their technical context. As their management ignores the requirements for producing good translations, they get no support, and they end up using free tools like Google Translate or Bing (Lemaire, 2017). As no revision (and even no proof-reading) is done on the results, the quality is very bad, with sometimes disastrous consequences. In fact, if their company does not want to pay professional translators to do a decent job, it seems there is nothing they can do to solve this problem. • They can’t buy a cheap license for a “good enough” MT system and let the results be post-edited by the technical translating writers: (1) specialized MT systems may be very good, but are somewhat expensive, as they must be built from good translation memories (TMs) and specific bilingual term banks (TBs), and (2) in any case, post- editing into English to get high quality translations can only be performed by native speakers of English knowing the domain well. • They also can’t train their technical translating writers to use TM-based like SDL- Trados, although some recent ones are free (OmegaT1, SmartCAT2, Poedit3, MateCat4 1 http://sourceforge.net/projects/omegat 2 https://www.smartcat.ai/ 3 https://poedit.net/ 4 https://www.matecat.com/ 70 Proceedings of the 39th Conference Translating and the Computer, pages 70–75, London, UK, November 16-17, 2017. 2017c AsLing and many others5), because they are all tailored to professional translators translating into their native tongue. We think that, despite that apparent impossibility, it should be possible to help technical translating writers produce translations of reasonable quality using only existing free tools such as MT, sub-sentential aligners, company-specific bilingual terminology, and SL and TL correctors. What would change is the scenario of the production of both the SL and TL documents. Instead of SL document writing MT TL (LQ = low quality) document, we would introduce a loop of the form, at the (n+1)-th iteration: SL_doc1n+1 (re)writing MT TL_doc1n+1 checking TL_doc2n+1 MT SL_doc2n+1 . Note that this approach would not be usable in a classical translation context, because translators must start from the SL text as it is. They can correct typos in passing, but they are not allowed and even less asked to modify it. But our technical translating writers are the authors and can therefore write and rewrite until the translation seems them (aided by the correctors, the aligner and the reverse MT) to be grammatically and terminologically correct. This first approach will be detailed in section 3. The second approach we propose is much more futuristic, although it builds on ideas that have been successfully prototyped in the past, in particular at IBM-Japan (JETS system). It relies on the (demonstrated) possibility to build an interactive SL disambiguator coupled with an “all-path” parser and a bilingual dictionary aligning SL and TL lexemes and word senses. In line with the “semantic Web”, it also introduces the idea to add to a SL document annotations contained in a companion document and comprising everything that is needed to show the ambiguities (relative to the SL TL pair or the SL TL1/TL2…/TLn pairs), how they have been solved. It has been shown (back in 1994!) 6 than a SL SED text can be translated totally automatically in a corresponding TL SED text. In short, the aim of that more futuristic approach is to build a true bilingual writing tool for technical translating writers. 2 More on the business situation Many companies need to produce enormous quantities of documents in many languages with a very high quality. Also, the terminology of software products is specific to the company. When Bull sold IBM AS-4000 workstations under AIX 7 in OEM, it translated the AIX documentation in its own “Bull-AIX-French” and did not use the “IBM-AIX-French” existing translations. Almost all companies outsource translations to translation agencies or to freelancers. A big problem is that the cost is high (counting everything, about 0.15€/word for en fr or fr en, often more for more distant pairs, for smaller markets). Another is that the number of target languages has increased and is increasing. Commercial companies like Microsoft, IBM or Adobe translate their products (external documentation, on-line help, interface elements like button and window labels, menu items, system messages) in 40 to 60 languages8. For that, they use professional translators and propose or require them to use specific tools and resources, like the TM™ tool, a MT system, and, for each translation job, a kit containing a specific bilingual terminology, and a document-dependent TM (extracted from the enormous main TM, and more practical and useful on a PC). The annual size of translated 5 http://termcoord.eu/2016/06/139-free-tools-suggested-by-professional-translators/ 6 by Boitet & Blanchon at MT25YON, at Cranfield, in 1994. 7 IBM proprietary version of Unix. 8 https://console.bluemix.net/docs/services/language-translator/index.html#supported-languages lists 62 languages. Mozilla, a non-profit open source collaborative project, localizes its tools to at least 116 languages (see https://addons.mozilla.org/fr/firefox/language-tools/). Office 2016 has 39 language packs (see https://www.itechtics.com/download-free-office-2016-language-packs-languages/). 71 documents is often over 20M words per year (10 years of EuroParl!). That is the same for service companies like SAP. It is then understandable that these companies try to diminish the cost of translation for the “grey” (internal) part of what they have to translate. That is why, for example, SAP stopped doing the translation of requirements and specifications documents in a professional way and asked their writers to become “technical translating writers”. In one case we learned about (not at SAP), raw MT translations of functional specifications were sent to a development team based in India. The French client complained that the product did not meet its specifications, and menaced to sue the company, who then had to send an experienced engineer-developer on site for 3 months to develop a correct product. After all, bad translation can end up costing much more than professional translation! One should remark here that this change of practice may well have been caused by the profusion of loud claims made by MT developers concerning the increase in quality of MT systems, to the point that some are claiming that NMT (neural MT) systems are now as good or even better than professional translators. That is in general utterly false, and can be true only in the case of MT systems (following whatever paradigm, expert or empirical) specialized to a small enough sublanguage, such as the METEO system for weather bulletins9 or the ALTFLASH system for Nikkei flash reports. Returning to the situation in companies wanting to turn their technical writers into technical translating writers, which would be the (sole) users of an environment meant to help them: • The technical writers know the terminology very well, in both their language and English. • There may be some native speakers of English in the company, but probably none or very few doing the same job of technical writing — and then, they might or might not have to produce a version in the “local” language (in our cases, French or German). • The texts concerned are IT requirements and functional specifications, that is, exclusively technical translations. • The writers are not “recognized” for producing a parallel version in English: they get no special financial incentive, no feedback from anybody in the company, and usually no feedback either from the delocalized development team, whose members, not native speakers of English, are often not competent enough in English to be sure that the purported “translation” is erroneous or even outright meaningless, so that they try to guess a meaning that could “make sense” in the context — and of course often fail. 3 Approach 1: integrate existing free tools in a new scenario using “rewriting” In this section, we would like to give some details on our first proposed approach, and show that it should indeed be possible to help technical translating writers produce translations of reasonable quality using only existing free tools such as MT, sub-sentential aligners, company-specific bilingual terminology, and SL and TL correctors. The scenario, introduced in 1 above, is to induce the technical translating writer to (1) correct what s/he can in the MT results, namely the terminology, and (2) rewrite her/his SL text so that the MT-translated version improves. In this scenario, the technical translating writer produces both the SL document and the corresponding TL document in an iterative way. The (n+1)-th iteration would be of the form: SL_doc1n+1 (re)writing MT TL_doc1n+1 checking TL_doc2n+1 MT SL_doc2n+1 . The first step, writing (if n=0) or rewriting (if n>0) contains the use of some classical tools, such as a spell-checker and a grammar checker. For the second step, MT, we would use whatever MT system the technical translating writer already uses. An improvement here could 9 And not for the whole domain of weather forecasting, that also contains situations and warnings. 72 be easily introduced. It would be to use several (2 or 3) free MT systems and to select the result having the best score according to some quality estimator (QE). Research on QE has already produced convincing results. Then, the MT result, TL_doc1n+1, would be checked in 2 ways. (1) A language checker would signal spelling and grammar errors. The user is supposed to have at least a B1 level in English, which is normally enough to understand the error, if any, and to accept or not the proposed correction. (2) An aligner such as Giza++ or Anymalign (Lardilleux, 2010) combined with the bilingual term bank would show the correspondences between segments, and in particular between source and target terms, colouring them (for example) in green if they are in the term bank, and in red if they are not. The resulting TL document, TL_doc2n+1, would then be “back-translated” into the SL, producing SL_doc2n+1. On that basis, the writer could perceive whether the translation still has problems or not. If yes, the writer would enter the next iteration. S/he would modify SL_doc1n+1 to produce SL_doc1n+2, the input to the (n+2)-th iteration. This new kind of help might be implemented as a web service, residing in a server of the company. It should probably allow the user to perform the iterations sentence by sentence, or paragraph by paragraph, or on the whole text. It would be interested to see which strategy would be preferred by the technical translating writers, and which would give the best results. 4 Approach 2: towards a true bilingual writing tool for technical translating writers Our second solution could be developed in 2-3 years in a particular context, then generalized. It builds on ideas successfully prototyped in the past, in particular at IBM-Japan (JETS system), and then by our LIDIA project (1990-1995), on which H. Blanchon did his PhD, and which was an essential part of the Eureka EuroLang project (Boitet & Blanchon, 1994). In short, the idea is to build an interactive SL disambiguator coupled with an “all-path” parser and a bilingual dictionary aligning SL and TL lexemes and word senses. Typically, after the writer has written a paragraph, s/he clicks on it to say that its segments (usually sentences) can be processed. The segments are sent to a web service that returns, for each segment, a factorizing “mmc-structure” containing all linguistically and especially lexically possible representations. In the example below, that structure is a tree containing 2 subtrees, one for each representation for the sentence: "Which author cites this speaker?". Figure 1 : Interactive functional (subject/object) disambiguation, Blanchon (1992) Another module then identifies all ambiguities and generates a “question tree”, according to a certain strategy, for example ordering them by type, or by cruciality. Both the mmc- 73 structure 10 and the question tree are then returned to the writing environment. A button appears next to the (SL) segment to signal to the writer that the system is ready to ask questions. It is very important here that the writer is not obliged to answer immediately, and can continue whatever task s/he is engaged in. A human should never be slave of a machine! If and when s/he feels like it, the user clicks on the dialogue button and answers the questions. Contrary to the LIDIA prototype and to all interactive translation systems we know of, this new system should allow the user to leave the disambiguation dialogue at any point, leaving to the following modules the task of handling the remaining ambiguities automatically, either by selecting for each a solution with the best score, or by producing a factorized output, like, in en-fr, “plante/usine/espion” (plant/factory/mole) for “plant”. At that point, the system will add to the SL segment annotations contained in a companion structure and comprising everything that is needed to show the ambiguities, and how they have been solved, that is, the mmc-structure, the question tree, and the answer to each question down the “disambiguating branch”. The resulting umc-structure would then be sent to a web service performing the remaining steps of translation, using a transfer or abstract pivot architecture. That is not important for our user. What is important is that, because the representation is unambiguous, a classical generation process, once debugged and tuned properly, will produce very HQ results. Nevertheless, ambiguities will almost certainly appear in translations. To eliminate them, the idea is to parse the TL text with an all-path parser built as the inverse of the generator, giving rise to a mmc-structure that contains the umc-structure produced as an intermediate step during generation. It will then be possible to run the interactive disambiguator of the TL automatically: a program will replace the human, and, for each question, select the answer that itself selects the subset of the current set of structures that contains the goal (the starting umc-structure). Hence, the TL segment will be representable in a SED format. That approach would certainly enable technical translating writers to produce very precise, grammatical, and semantically exact English versions. Nevertheless, we should keep the possibility of correction in the TL, for the terminological part, and leave open the possibility of rewriting the SL text, at least for an interesting reason: it sometimes happens that some information from the context is not explicit in the SL text, but should imperatively be explicit in the TL text. That situation is very common when translating from Japanese into English or French or German, but it also happens between near languages, such as English and French: "he was" "il avait 1 an" / "il faisait 1 m" "he was 1 year old" / "he was 1 meter tall". In such cases, the solution would probably be to rephrase the SL text in a more explicit way. 5 Conclusion & perspective We are embarking on an internal project to implement and evaluate our first approach, and are looking for a company that would like to experiment it with us. Concerning the second approach, it is a longer-term project, which we have begun to work on with CS (Communications and Systems) in the framework of a project preparation. Here, the domain would be the writing of system requirements, representing them after disambiguation as UNL11 graphs, then as UML graphs, and further as logical expressions in the specific domain ontology (Sérasset & Boitet, 2000). Starting from any of the last 3 forms, one would be able to generate the requirements in several languages and forms, in particular SED forms and controlled language forms. 10 mmc: multiple, multilevel and concrete; umc: unique, multilevel, concrete; uma: unique, multilevel, abstract. 11 UNL (Universal Networking Language) is a language of « anglo-semantic » hypergraphs able to represent any utterance in any natural language. Arcs bear semantic relations and nodes bear interlingual lexemes (UWs) and semantic features. See http://undl.org. 74 Acknowledgments Our first thanks go to the ANRT (Agence Nationale de la Recherche Technologique) and to the L&M (Lingua et Machina) company, that have supported the first author for 3 years. We are also very grateful to the firms that have allowed us to look into their translation practice: SAP, Vicat, and EDF while we worked with L&M. References Blanchon, Hervé. 2004. Comment définir, mesurer et améliorer la qualité, l'utilisabilité et l'utilité des systèmes de TAO de l'écrit et de l'oral. Une bataille contre le bruit, l'ambiguïté, et le manque de contexte. Mémoire de HDR, 380 p., Université Joseph Fourier, Grenoble. Blanchon, Hervé. 1992. A Solution to the Problem of Interactive Disambiguation. COLING 1992, 23–28 juillet. Nantes, France. 1233–1238. Blanchon, Hervé. 1994. LIDIA-1 : une première maquette vers la TA Interactive "pour tous". Thèse de doctorat, Université Joseph Fourier – Grenoble I, Grenoble, 1994. Blanchon, Hervé, and Boitet, Christian. 2007. Pour l'évaluation externe des systèmes de TA par des méthodes fondées sur la tâche. TAL, 2007: 33–65. Boitet, Christian. 1995. Factors for success (and failure) in Machine Translation–some lessons of the first 50 years of R&D. Proc. MTS-V (Fifth Machine Translation Summit), 11–13 juillet, Luxembourg. Boitet, Christian. 1976. Un essai de réponse à quelques questions théoriques et pratiques liées à la traduction automatique : définition d’un système prototype. Modélisation et simulation. Thèse de Doctorat d'État, Université Scientifique et Médicale de Grenoble, 250 p. Boitet, Christian, and Blanchon, Hervé. 1994a. Promesses et problèmes de la "TAO pour tous". Après LIDIA-1, une première maquette. Langages. Le traducteur et l'ordinateur, sous la direction de Jean-René Ladmiral, 1994a: 20–47. Boitet, Christian, and Blanchon, Hervé. 1994b. Multilingual Dialogue-Based MT for monolingual authors: the LIDIA project and a first mockup. Machine Translation 2, 99–132. Chan, Andy Lung Jan. 2010. Perceived benefits of translator certification to stakeholders in the translation profession: A survey of vendor managers. Across Languages and Cultures 11, n° 1, 93–113. Dam, Helle, and Zethsen, Karen. 2010. Translator status: Helpers and opponents in the ongoing battle of an emerging profession. Target 22, n° 2, 194–211. Huynh, Cong Phap, Valérie Bellynck, Christian Boitet, et Hong Thai Nguyen. 2010. The iMAG concept: multilingual access gateway to an elected Web sites with incremental quality increase through collaborative post-edition of MT pretranslations. TALN-2010, Montréal. Huynh, Cong-Phap. 2010. Des suites de test pour la TA à un système d'exploitation de corpus alignés de documents et métadocuments multilingues, multiannotés et multimédia. Thèse de doctorat, Université Joseph Fourier. Lardilleux, Adrien. 2010. Contribution des basses fréquences à l’alignement sous-phrastique multilingue : une approche diﬀérentielle. Thèse de doctorat, Université de Caen - Human-Computer Interaction. Lemaire, Claire. 2016. Linguistic methodology to help German and French non-translator users to write bilingual specifications. 38. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft. Universität Konstanz. Lemaire, Claire. 2017. Traductologie et traduction outillée : du traducteur spécialisé professionnel à l'expert métier en entreprise. Thèse de doctorat, Université Grenoble Alpes. Nguyen, Hong-Thai. 2009. Des systèmes de TA homogènes aux systèmes de TAO hétérogènes. Interface homme-machine. Thèse de doctorat, Université Joseph-Fourier – Grenoble I. Sérasset, Gilles & Boitet, Christian. 2000. On UNL as the future "html of the linguistic content" & the reuse of existing NLP components in UNL-related applications with the example of a UNL-French deconverter, Proc. COLING 2000, Saarbrücken, Germany, 7 p. Slocum, Jonathan. 1985. A survey of machine translation: its history, current status, and future prospects. Computational linguistics 11.1, 1–17. Vasconcellos, Muriel. 1995. Advanced software applications in Japan. Elsevier. Vasconcellos, Muriel. 1993. The Present State of Machine Translation Usage Technology; Or: How Do I Use Thee? Let Me Count the Ways. MT Summit IV, July 20–22. Kobe, Japan. 47–62. Zhang, Ying. 2016. Modèles et outils pour des bases lexicales ”métier” multilingues et contributives de grande taille, utilisables tant en traduction automatique et automatisée que pour des services dictionnairiques variés. Thèse de doctorat, Université Grenoble Alpes. 75 Designing a Multimethod Study on the Use of CAI Tools during Simultaneous Interpreting Bianca Prandi Johannes Gutenberg Universität Mainz/Germersheim [email protected] Abstract Even though studies on computer-assisted interpreting still represent a very small percentage in the body of research, the topic is starting to gain attention in the interpreting community. So far, only a handful of studies have focused on the use of CAI tools in the interpreting booth (Gacek, 2015; Biagini, 2015; Prandi, 2015a, 2015b). While they did shed some light on the usability and the reception of CAI tools as well as on the terminological quality of simultaneous interpreting performed with the support of such tools, these studies were only product-oriented. We still lack process-oriented, empirical research on computer-aided interpreting. A pilot study currently underway at the University of Mainz/Germersheim (Prandi, 2016, 2017) aims at bridging this gap by combining process- and product-oriented methods. After discussing the theoretical models adopted to date in CAI research, this paper will suggest how an adaptation of Seeber’s (2011) Cognitive Load Model can be better suited then Gile’s (1988, 1997, 1999) Effort Model to operationalize hypotheses on the use of CAI tools in the booth. The paper will then introduce the experimental design adopted in the study with a focus on the features of the texts used and on the rationale behind their creation. 1 Introduction Computer-assisted interpreting has not yet received the same amount of attention as computer-assisted translation, although the interest for new technological solutions that aim at supporting interpreters in their workflow seems to be increasing among practitioners and trainees alike. Apart from a few Master’s theses, whose scope is however limited (De Merulis, 2013; Biagini, 2015; Prandi, 2015a), only a few publications have addressed the topic, examining various aspects of the interpretation workflow (Xu, 2015; Fantinuoli, 2017a; Will, 2015), and even fewer of them have addressed their use in the interpreting booth (Gacek, 2015; Biagini, 2015; Prandi, 2015a, 2015b). Certainly, the scarcity of studies on computer-assisted interpreting does not help engaging practitioners in the discourse, nor does it dissipate the scepticism around the subject – a scepticism centred on the ability of CAI tools to provide interpreters with the support they need in retrieving terminology in the booth, without being too cumbersome to use and without taking up precious cognitive resources needed for the interpreting task itself. Apart from these initial studies, no extensive research can be identified on this subject. A doctoral research project at the University of Mainz/Germersheim aims at bridging this gap, by combining pupillometry, eye-tracking measures, target text analysis, and key-logging data to analyse how the use of CAI tools affects the interpreting process. The study thus brings together process- and product-oriented methods to describe the variations in local cognitive load during simultaneous interpreting (SI) performed with the support of a CAI tool. More specifically, it compares CAI tools with more traditional methods for the management of and the access to terminology, namely Word and Excel tables. The most recent inquiries in the terminology management habits of conference interpreters have shown that the technological support has made its way into the interpreting booth (Bilgen, 2009; Berber Irabien, 2010; Corpas Pastor and May Fern, 2016), and there’s reason to believe this trend will continue. Berber Irabien also found that terminology databases are the preferred method for accessing 76 Proceedings of the 39th Conference Translating and the Computer, pages 76–88, London, UK, November 16-17, 2017. 2017c AsLing terminology during simultaneous interpreting. Corpas Pastor and May Fern have also come to a similar conclusion, identifying bilingual dictionaries and personal glossaries as the preferred tool used by interpreters to search for terminology while interpreting. Our paper first presents the theoretical models most used to describe the cognitive processes involved in simultaneous interpreting (Section 2). In Section 3 we discuss how Seeber’s Cognitive Load Model can be integrated and expanded to represent simultaneous interpreting performed with electronic glossaries and to operationalize our hypotheses on cognitive load during SI with the support of CAI tools and electronic glossaries. The paper then briefly presents the methodology adopted so far in CAI research and the first findings (Section 4). Finally, section 5 outlines the structure of the pilot study aimed at testing the feasibility of the experimental design and describes the texts prepared for the experiment in detail. The methodology used represents the very first attempt at investigating computer- assisted simultaneous interpreting in a laboratory setting with a multimethod approach. 2 Cognitive models of simultaneous interpreting Simultaneous interpreting is one of the most complex activities the human brain can perform. It involves various tasks to be carried out almost simultaneously, with a high degree of potential interference. Listening to the words pronounced by the speaker, analysing the message he or she wants to convey, storing information in short-term memory, retrieving the linguistic and extra-linguistic information needed to deliver that message in a different language, uttering the message and monitoring one’s own delivery, while already listening and analysing the next portion of the text to be interpreted – all this demands a high amount of cognitive resources. There have been various attempts to provide a theoretical model for such a highly complex cognitive activity.1 For the purpose of this paper and given the scope of our research project, which seeks to verify how local cognitive load during SI is affected by the use of an electronic glossary2, we will briefly compare the two models which aim at describing how cognitive resources are allocated during simultaneous interpreting, namely the widely applied Effort Model (Gile, 1988, 1997, 1999) and the more recent Cognitive Load Model for Simultaneous Interpreting (Seeber, 2007, 2011; Seeber and Kerzel, 2011). 2.1 Gile’s Effort Model Gile’s model draws on a key concept of cognitive psychology (Shannon and Weaver, 1949): that non-automatic operations are managed by an inherently limited system. This happens in simultaneous interpreting, where the non-automatic processes of comprehension, memorization, and production require cognitive resources that are limited. Gile describes these processes as “efforts”, and distinguishes between the listening and analysis effort, the short-term memorization effort and the production effort. Interpreting results from an interaction of the three efforts and requires an additional coordination effort. In order for simultaneous interpreting to be successful, “the sum of capacities needed for the three efforts, plus coordination, must not exceed the total available capacity” and “none of the three efforts must use more than the specific capacity available to it” (Setton, 2003). The equation, however, does not remain constant throughout the interpreting task but varies according to the degree of difficulty of the portion of the text interpreted. Since, according to Gile, interpreters work close to saturation level most of the time, errors and omissions can be explained as a result of system saturation, or cognitive overload, which 1 For an extensive overview of the models of the interpreting process, see Setton (2003, 2013, 2016). 2 Be it a Word or Excel table, or a CAI tool. 77 occurs when “problem triggers” (Gile, 1999:157) require increased resources.3 Experts are more capable than novices of preventing this, but the risk is present most of the time. While Gile’s model can be praised for its simplicity (and is therefore widely used in interpreting didactics), it does not account for multi-tasking and time-sharing. After all, Gile’s model is based on Kahneman’s single resource theory (1973), which postulates a single pool of resources that can be shifted from one task to another. If this were the case, interpreters would not be able to process visual information while interpreting. More often than not, however, interpreters now deal with a multimodal presentation of information, both aural and visual, in the form of PowerPoint presentations and the like, which does not find an explanation in the effort model. This shortcoming has led interpreting research to look for alternative explanations of how cognitive resources are allocated during simultaneous interpreting. Seeber’s Cognitive Load Model represents a valuable step in this direction. 2.2 Seeber’s Cognitive Load Model in simultaneous interpreting Seeber’s Cognitive Load Model aims at describing the interaction between the cognitive sub- components of simultaneous interpreting. The model draws on Wickens’s (1984, 2002) Multiple Resource Theory and Cognitive Load Model. Wickens’s model is based on two main assumptions: that the interaction of two tasks requires more processing capacity than any individual task and that tasks which share the same processing resources interfere with each other more strongly. In his model, tasks do not draw resources from a single pool, but from discrete cognitive structures. What competes in cognitive tasks is, therefore, the resources that make those structures work, not the structures themselves. Thus two tasks that share the same resources are harder (or impossible) to perform simultaneously than two tasks that do not share resources. In a given task, processing can occur in two modalities, either through a visual or an auditory channel and is coded either spatially or verbally4. Processing occurs at three stages, namely perception, cognition and response. Perception and cognition share the same pool of resources. Wickens also “concedes the existence of a residual pool of general resources which, albeit not reflected in his model, is available to and demanded by all tasks, modalities, codes, and stages as required” (Seeber, 2007:1382). Seeber applies Wickens’s three-dimensional model to simultaneous interpreting by turning it into a 2D-model. Seeber’s adaptation of Wickens’s model can be described as a bird’s eye view of Wickens’s pyramidal model. The two-dimensional version has the advantage of showing all sides of the pyramid and of including the “general capacity” - not graphically represented in the original model – in the centre, at the “top” of the pyramid (see Figure 1). The model is integrated by a “conflict matrix”, also present in Wickens, which quantifies the degree of interference between the individual sub-tasks.5 The total interference score is calculated as the sum of the demand vectors for the three dimensions of each task and the conflict coefficients of the sub-tasks that share the same resources. Demand vectors are an indication of how much a certain task is dependent on a certain resource. While Wickens suggests that these values can vary between 0 (no dependence) and 3 (extreme dependence), Seeber indicates a demand vector of 1 for all resources involved in simultaneous interpreting. 3 See Gile’s “tightrope hypothesis” (1999). 4 In a subsequent update of his model, Wickens (2002) introduced a fourth dimension, namely a distinction in the visual channel between focal and ambient vision. 5 See Seeber (2007, 2011) and Seeber and Kerzel (2011) for a detailed description and discussion of the Cognitive Load Model applied to Simultaneous Interpreting. 78 An important difference between Gile’s EM and Seeber’s CLM lies in the fact that for Gile the interpreter works close to saturation limit most of the time, while Seeber’s CLM accounts for local variations in cognitive load at a microscopic level: the output in SI is the result of strategies aimed at managing the limits inherent to the task and at saving elaboration capacity. According to Seeber, even though interpreters might reach maximum cognitive load locally, most of the time they work well below that limit. Only Seeber’s CLM for SI is able to “account for the conflict potential posed by an overlap [of tasks] and the interference they cause” (Seeber, 2011:189). It describes local cognitive load “as a function of both input and output features” and in relation to the amount of parallel processing and the amount of time for which elements must be stored, providing a more detailed analysis of local cognitive load than Gile’s EM. Seeber’s model “illustrates how the overall cognitive demands are affected by the different combinations of sub-tasks” and “includes a first attempt at quantifying cognitive load, relying principally on Wickens’s demand vectors and conflict coefficients” (ibid.). For the above-mentioned reasons, Seeber’s CLM for SI is the model best suited to operationalize our hypotheses on simultaneous interpreting performed with the support of CAI tools or other terminology management solutions, albeit with some integrations. 3 Hypotheses on cognitive load during SI with CAI tools Seeber applies his Cognitive Load Model to shadowing, sight translation, and simultaneous interpreting, providing a “cognitive resource footprint” of the three activities (Seeber, 2007:1383). Figure 1: Cognitive resource footprint for simultaneous interpreting (Seeber 2007:1385) Figure 2: Conflict matrix for simultaneous interpreting (Seeber 2007:1385) 79 As illustrated by his model (Figures 1 and 2), simultaneous interpreting is the activity where the two tasks being performed simultaneously (listening and comprehension, on the one hand, production and monitoring on the other) share the highest amount of cognitive resources and therefore have the highest total interference score of 9.6 In simultaneous interpreting, the listening and comprehension task recruits auditory-verbal resources and cognitive-verbal resources at the perceptual and cognitive stage. The production and monitoring task also mobilises auditory-verbal and cognitive-verbal resources at the perceptual and cognitive stage but requires additional vocal-verbal resources in the response stage. What happens when the interpreter looks up terminology in the booth? Figure 3: Cognitive resource footprint for SI performed with a CAI tool or an electronic glossary The task of looking up terminology on a laptop in a CAI tool or in an electronic glossary requires the interpreter to draw on additional resources. In order to find the translation equivalent needed, the interpreter first has to type the term, or part of one, on a keyboard and, in some cases, press the enter button to start the query. While performing this activity, the interpreter recruits manual-spatial resources as a response to what he or she has heard. The interpreter then has to locate the term on the screen (recruiting visual-spatial resources) and to read the term, which mobilises visual-verbal resources at the perceptual and cognitive stage. Thus, simultaneous interpreting performed with the support of an electronic glossary has something in common with sight translation, which also recruits visual-verbal resources at the perception and cognition stage, but also involves a manual-spatial response and visual-spatial resources. This makes simultaneous interpreting performed with a CAI tool or an electronic glossary more “difficult” than simultaneous interpreting performed without, since more 6 For an explanation of how this score is calculated, see Seeber (2007:1384-1385). 80 resources are mobilised and, for some sub-tasks, shared, as illustrated in Figure 3. The total interference score is, therefore, higher for SI with a glossary (Figures 4 and 5) than for “traditional” SI. Total interference score = demand vectors + conflict coefficients = (1+1+1+1+1+1+1+1) + (0.4+0.5+0.6+0.7+0.8+0.7+0.7+0.8+0.2+0.4+0.4+0.6) Figure 4: Conflict matrix for SI with a CAI tool Given these theoretical assumptions, we expect cognitive load to be higher when the interpreter performs the search task, and to go back to a lower level after the search is completed. It can be hypothesised that querying the glossary might even lead to cognitive overload, resulting in processing issues, with loss of information, as well as difficulties in the production task, where fluency, coherence, and cohesion might be affected. One could, however, also argue that when the retrieval process is successful and fast enough, the production effort might even be lower than in simultaneous interpreting without any terminology database, as the interpreter would not need to retrieve the right term in his or her memory. Data analysis should help determine whether these conjectures are backed empirically. 81 Finally, we must not forget that the cognitive resource footprint of SI with a glossary cannot be applied to the whole interpreting task, but only to the moments when a search is performed, because the interpreter is not constantly looking up terms. When no term is looked up, Seeber’s CLM for SI applies. In his model, Seeber is describing an “ideal” case of simultaneous interpreting, where all resources are recruited with the same degree of intensity. If we want to reflect the level of “difficulty” of simultaneous interpreting tasks performed in different conditions (e.g. for speakers with a non-native accent), we can assign higher demand vectors to the resources recruited. The same can be done for terminology management solutions, which we expect to vary as for the degree of visual-spatial resources at the perception stage and manual-spatial resources at the response stage that the interpreter must recruit to successfully use the tool in question. We hypothesise that computer-assisted interpreting tools, like InterpretBank7, which we adopt Total interference score = demand vectors + conflict coefficients = (1+1+1+1+1+1+1+1+1+1) + (0.4+0.5+0.6+0.7+0.8+0.7+0.7+0.8+0.2+0.4+0.4+0.6) Figure 5: Conflict matrix for SI with an Excel glossary 7 http://www.interpretbank.com. For a detailed description of InterpretBank, see Fantinuoli (2009, 2012 and 2016). 82 in our study, might require a lower level of both visual-spatial and manual-spatial resources than traditional electronic glossaries. To put it more simply, we expect interpreters to have to perform fewer manual-spatial and visual-spatial sub-tasks when working with a CAI tool than when using a Word or Excel table (only typing in and visually locating the term needed vs. positioning the cursor in the search field, typing and pressing the enter button, scrolling up and/or down or pressing the “forward” button to locate the term in question, deleting the term before starting a new search). While still being higher than for “simple” simultaneous interpreting, the total interference score for SI performed while searching for terms in the glossary is expected to be lower for InterpretBank (= 14.8) than for traditional electronic glossaries, such as Excel tables (= 16.8).8 Moreover, the integration of speech recognition technology in a CAI tool such as InterpretBank (Fantinuoli, 2017b) would lower the amount of additional cognitive load even further, as no manual-spatial response would be needed. 4 Experimental designs adopted in CAI research to date and first findings The element of innovation in our study lies in the investigation of the process of simultaneous interpreting with CAI tools in addition to the product of such activity. To date, only a few studies were conducted on computer-assisted interpreting and only three put CAI tools to the test in the booth (Gacek, 2015; Biagini, 2015; Prandi, 2015a, 2015b). All three used InterpretBank as their CAI tool of choice. This is, after all, the only CAI tool with a feature developed specifically to facilitate terminology search in the booth.9 Gacek (2015) and Biagini (2015) investigated the use of InterpretBank in the booth with the aim of collecting, respectively, qualitative data and product-oriented data on its usability during simultaneous interpreting in comparison to paper glossaries. Both authors used questionnaires for data collection, while Biagini also carried out statistical analysis on the transcriptions of the test subjects’ interpretations. These studies confirmed that querying the glossary with InterpretBank is more effective than with a paper glossary and leads to higher terminological precision and fewer omissions. They also show that CAI tools can be integrated into the curriculum of trainee interpreters. Prandi (2015a, 2015b) focused on the perception of the CAI tool InterpretBank by trainee interpreters. She collected qualitative data with a questionnaire and transcribed the deliveries of the trainee interpreters involved in the study, to then analyse the quality of the terminology used. Both Prandi and Biagini analysed the LOG files automatically generated by the tool to verify how many and which terms had been looked up by the subjects. Prandi also used video recordings and analysed the notes taken by the students in the booth to collect information on their behaviour while interpreting with the support of InterpretBank. Data from Prandi’s study confirms that CAI tools can be integrated into interpreters’ training and also speaks in favour of the usability of InterpretBank. A common feature of the studies conducted on CAI tools is that they focus on the product of simultaneous interpreting in order to establish whether the use of such tools improves the terminological quality of the rendition. On its own, however, this method does not tell us if such improvement is only due to having access to the terminology or whether it is also linked to local variations in cognitive load. In addition, no study has compared a CAI tool to traditional electronic glossaries, but only to paper glossaries. 8 Microsoft Word has the option to show all results of a query in a column on the left of the document. The visual-spatial resources needed to locate the term needed are therefore expected to be lower than for Excel (demand vector = 1), leading to a total interference score of 15.8. 9 For a description and a comparison of CAI tools, see Costa et al. (2014a, 2014b) and Rütten (2015). 83 No process-oriented study on the impact of CAI tools on the cognitive processes involved in simultaneous interpreting can be identified in the body of research at present. The above- mentioned studies do suggest that using a CAI tool during simultaneous interpreting adds a certain amount of cognitive load to the interpreting task, which is supported by Seeber’s theoretical model, but do not go as far as to investigate this hypothesis empirically. Process- oriented research on CAI tools is therefore needed to verify whether this assumption is true. 5 A proposal for an experimental design: a multimethod approach In order to test our hypotheses (see Section 3) and to gain as complete a picture as possible of simultaneous interpreting performed with the support of CAI tools in comparison to traditional electronic glossaries, we drew on the methods adopted in the studies discussed above to develop an experimental design that combines process- and product-oriented research methods. This way, we hope to get the best of both approaches. Three are the main objectives of our study: 1. To establish whether the use of CAI tools leads to better terminological quality in the interpreter’s rendition than when a traditional electronic glossary in the form of a Word or an Excel table is used during simultaneous interpreting. 2. To determine whether a terminology search conducted while interpreting simultaneously leads to a smaller variation in cognitive load when performed in a CAI tool then in a Word or an Excel table and whether significant differences in local cognitive load can be identified for Word vs. Excel glossaries. 3. To acquire data on the usability of the three terminology management solutions adopted in the study. The following sections will introduce the structure of the pilot study. Since the experimental design is still under testing, for the scope of this paper we will focus on the description of the materials used for data collection. 5.1 Structure of the research project The pilot study involved a sample of 6 trainee interpreters attending the 4th semester of the Master’s degree course in Conference Interpreting at the University of Mainz/Germersheim. The sample was made up of 3 Italian natives and 3 German natives. All test subjects had English in their language combination, so English was chosen as the language they interpreted from. We chose to include two language combinations to verify whether this had an impact on the variations in cognitive load experienced during simultaneous interpreting and on the strategies used. A preliminary meeting was organised to present the project to the students and to introduce them to the CAI tool InterpretBank. The search functions in InterpretBank, as well as in Word and Excel, were the focus of the presentation. The test subjects then attended a total of 5 meetings, during which they practiced simultaneous interpreting with InterpretBank, Word glossaries and Excel glossaries from English into their mother tongue. The speeches used in the training phase were selected or prepared ad hoc drawing from a previous study (Prandi, 2015a, 2015b). We also prepared the glossaries, so all students worked with the same resources. The subjects of the speeches used in the training phase10 were different from the topic of the speeches used during data collection (renewable energy). This was done in order to promote the use of the glossary in the experimental setting. 10 Medicine and biology. 84 During the last meeting, the students were asked to take a short test to verify their level of proficiency in using the three tools for terminology search in the booth. All students passed the test, so technical ability should not be a decisive factor to consider during data analysis. Data collection was conducted in the TRA&CO Centre, the neurolinguistic laboratory of the University of Mainz/Germersheim dedicated to translation studies.11 A short briefing aimed at describing the structure of the experiment and at providing the test subjects with the necessary instructions preceded the experiment itself. The students were asked to imagine that they were interpreting at a conference on renewable energies and that they had been given a glossary to use if they needed to. We made clear that there was no obligation to use the glossary, but that they should consider it for what it is – an element of support. This was done to ensure a natural approach in the use of the glossary so that their behaviour in the booth would be more representative of a real interpreting situation. We decided not to provide the students with the glossary prior to data collection to make sure there was a sufficient amount of terms they might need to look up. 5.2 Features of the texts used in the pilot study Investigating the interpreting process in a laboratory setting poses a few issues. First of all, in order to analyse variations in local cognitive load, stimuli should be presented as precisely as possible. Asking test subjects to interpret single words, however, would not have been realistic, as the time constraint typical of simultaneous interpreting would have been lacking. The simultaneity of cognitive tasks should be maintained in the experimental setting since it makes limits emerge which cannot be identified in the single cognitive processes involved in simultaneous interpreting. Nevertheless, working at speech level makes it difficult to correlate stimuli and responses. Drawing on Seeber and Kerzel’s method (2011), we therefore created speeches with a fixed internal structure that allows us to focus on the sentence level without sacrificing ecological validity completely, as test subjects still have to interpret whole speeches and not single words or single sentences. Each text contains 36 terms, each one embedded in a “target” sentence. Every target sentence is preceded and followed by a sentence which does not contain technical terms, but which is used to provide context. The sentence following the target sentence can be analysed to verify whether searching for a term produces a trickle-down effect, leading to omissions and other issues. This structure is repeated throughout the text in each speech. Of the 36 terms, 18 are placed at the end of the sentence, while the other 18 are neither at the very beginning nor at the very end, more or less in the middle of the sentence. This was done to verify whether anticipation occurs for the terms at the end of the sentence and which impact it has on search behaviour and cognitive load. It can be hypothesised that when anticipation is possible, a query is performed in advance to prepare for the translation of the technical term. If the anticipation is wrong, however, interpreters probably need to perform a second query soon after the first, which might potentially lead to cognitive overload. It would also be interesting to verify whether the test subjects adopt specific strategies for the single tools. In every text, there are 12 unigrams (simple terms), 12 bigrams and 12 trigrams (complex terms). We suspect the complexity of the terms might also influence search behaviour and cognitive load, which is why we included this variable as well. Half of the terms in each text should require a search in the glossary. This is based on the fact that they are highly technical and do not belong to the 10,000 most common English 11 For further information on the TRA&Co Center, visit http://traco.uni-mainz.de. 85 words.12 The other terms are deemed to be common knowledge for second-year interpreting students. Whether this a priori classification is actually reflected in the queries conducted by the students in the glossary will be verified through the pilot study. Specifically, of the 18 terms placed at the end of the sentence, 3 unigrams should require a search in the glossary and 3 should not. The same is true for bigrams and trigrams. The same ratio is also present in the 18 terms placed in the middle of the sentences. Here is an example sentence from speech 1: There are many different types of this clean fuel that can be used in transport. I’m not going to name them all, but one example is rapeseed methyl ester. I know the name may sound intimidating, but it’s actually just fuel. The target sentence contains a trigram placed at the end, which should require a search in the glossary. Each test subject was asked to interpret three short speeches, of a similar length and lexical density. Speech 1 was 12:40 minutes and 1533 words long, speech 2 12:17 for 1513, speech 3 12:20 for 1512. The average speed was 122.26 words per minute (wpm). We chose this speed, as we wanted the speeches to be fast enough to make terminology search challenging for the test subjects, but not too difficult. The speeches were read by a native speaker. Each test subject interpreted with the support of InterpretBank, of an Excel glossary and of a Word glossary. The glossary contains 421 terms, i.e. all technical terms present in the speeches, plus additional terms pertinent to the topic. The glossary was created with InterpretBank and then exported as an Excel file, which was then converted into a Word table. The glossary is tabular and presents a column with the terms in the source language and one with the equivalents in the target language (Italian or German). The test subjects only visualised the columns of their language combination, so the glossary was bilingual. We randomised the order of the speeches to be interpreted and of the tools to be used as a support, to verify whether the speeches created for the study could be considered comparable or whether adjustments should be made before the final experiment. The transcription of the test subjects’ renditions will be checked for the percentage of terms translated as per glossary, which will indicate whether the use of a CAI tool helps improve terminological precision. This will also allow us to verify whether looking up a term in the glossary is the preferred strategy when an unknown term is encountered, or whether other strategies, such as paraphrasing, generalisation, the use of hypernyms and hyponyms or omission, are preferred, and whether this varies according to the kind of tool used. We will then check the outcome of the strategies used, to determine whether they are functionally acceptable or whether they lead to misunderstandings, drops in register, semantic confusion etc. Moreover, we will check whether, in spite of an adequate translation of the terms, other issues in terms of cohesion and coherence arise. In order to better evaluate the strategies adopted by the interpreters and the usefulness of CAI tools, the stimuli will be assigned to sub-categories prior to the analysis of the deliveries: technical terms that are common knowledge (e.g. “renewables”), terms that can be easily translated from the English because they are similar in the language pair (e.g. “liquid biofuel” and “biocombustibile liquido”), terms which can be paraphrased and terms which cannot be paraphrased (e.g. “rapeseed methyl ester”), etc. The transcriptions will also be analysed for the number and length of pauses, which affect the fluency of the interpretation. In addition, we believe investigating when pauses occur will be quite telling of whether a terminology search can occur during simultaneous interpreting, or whether performing the two tasks simultaneously is not possible or at least not the 12 The frequency was checked in the 2015 news corpus, the 2012 web corpus (UK) and the 2016 Wikipedia corpus for the English language (Projekt Deutscher Wortschatz, http://wortschatz.uni-leipzig.de). 86 preferred method. Finally, the analysis of key-logging data and of the video recordings, and their triangulation with data from the transcription analysis will be useful not only to facilitate the analysis of pauses but also to help establish a picture of the test subjects’ search behaviour and to better interpret the strategies adopted. The final phase of data analysis will be focused on the triangulation of the product-related measures with data from pupillometry and eye tracking. This multimethod approach is still under development, but it has the potential to help gain new insight into how CAI tools affect the interpreting process, and the product of such process. 6 Conclusions The paper presented a multimethod pilot study currently underway at the University of Mainz/Germersheim as part of a PhD project aimed at investigating the effects of CAI tools on the cognitive processes of SI and on the terminological quality of the interpreters’ renditions. In particular, it described how Seeber’s Cognitive Load Model of simultaneous interpreting can be applied to CAI research adopting process- and product-oriented methods, and it suggested how it can be further expanded in the light of new developments in the field. As for the methodology used in the study, we described the features of the texts used for data collection, highlighting how their structure helps focus the analysis at sentence level without sacrificing ecological validity, while still adopting the systematic approach needed in a laboratory setting. Even though the experimental design is still under testing, we hope our methodology will help start a conversation not only on the object of our research but also on the methods to adopt to address research questions in this recent and complex field of interpreting studies. References Berber Irabien, Diana. 2010. Information and Communication Technologies in Conference Interpreting. Lambert Academic Publishing. Biagini, Giulio. 2015. Glossario cartaceo e glossario elettronico durante l'interpretazione simultanea: uno studio comparativo (MA thesis). Università di Trieste. Bilgen, Baris. 2009. Investigating Terminology Management for Conference Interpreters (MA thesis). University of Ottawa. Corpas Pastor, Gloria and Lily May Fern. 2016. A survey of interpreters’ needs and their practices related to language technology. Technical report, Universidad de Málaga. Costa, Hernani, Gloria Corpas Pastor, and Isabel Durán Muñoz. 2014a. A comparative user evaluation of terminology management tools for interpreters. In Proceedings of the 4th International Workshop on Computational Terminology, pages 68-76. Costa, Hernani, Gloria Corpas Pastor, and Isabel Durán Muñoz. 2014b. Technology-assisted interpreting. MultiLingual, 143, 25(3):27-32. De Merulis, Gianpiero. 2013. L’uso di InterpretBank per la preparazione di una conferenza sul trattamento delle acque reflue: glossario terminologico e contributo sperimentale (MA thesis). Università di Bologna. Fantinuoli, Claudio. 2009. InterpretBank: Ein Tool zum Wissensmanagement für Simultandolmetscher. In Wolfram Baur, Sylvia Kalina, Felix Mayer and Jutta Witzel (Eds.). Übersetzen in die Zukunft: Herausforderungen der Globalisierung für Dolmetscher und Übersetzer: Tagungsband der Internationalen Fachkonferenz des Bundesverbandes der Dolmetscher und Übersetzer e.V., BDÜ, Berlin, pages 411-417. Fantinuoli, Claudio. 2012. InterpretBank. Design and implementation of a terminology and knowledge management software for conference interpreters. Epubli/Johannes Gutenberg-Universität Mainz, Berlin. Fantinuoli, Claudio. 2016. InterpretBank. Redefining computer-assisted interpreting tools. In Proceedings of the Translating and the Computer 38 Conference in London. Editions Tradulex, Geneva, pages 42-52. Fantinuoli, Claudio. 2017a. Computer-assisted preparation in conference interpreting. In Translation & Interpreting, 9(2). Fantinuoli, Claudio. 2017b. Speech recognition in the interpreter workstation. In Proceedings of the Translating and the Computer 39 Conference in London, London. Gacek, Michael. 2015. Softwarelösungen für DolmetscherInnen (MA thesis). University of Vienna. 87 Gile, Daniel. 1988. Le partage de l'attention et le ‘modèle d'effort’ en interprétation simultanée. The Interpreters' Newsletter, 1:4-22. Gile, Daniel. 1997. Conference interpreting as a cognitive management problem. In Joseph H. Danks, Gregory M. Shreve, Stephen B. Fountain and Michael McBeath (Eds.). Cognitive Processes in Translation and Interpretation. Sage, London, pages 196-214. Gile, Daniel. 1999. Testing the Effort Models’ tightrope hypothesis in simultaneous interpreting - A contribution”. Hermes - Journal of Linguistics 23. Kahneman, Daniel. 1973. Attention and Effort. Prentice-Hall, Englewood Cliffs, NJ. Prandi, Bianca. 2015a. L'uso di InterpretBank nella didattica dell'interpretazione: uno studio esplorativo (MA thesis). Università di Bologna. Prandi, Bianca. 2015b. The use of CAI tools in interpreters’ training: a pilot study. In Proceedings of the Translating and the Computer 37 Conference, London. Prandi, Bianca. 2016. Analysis of the impact of CAI tools on simultaneous interpreting with a focus on cognitive processes and terminology consistency. Poster presented at TRA&CO Symposium, Johannes Gutenberg University Mainz/Germersheim. Prandi, Bianca. 2017. Investigating cognitive load in simultaneous interpreting with the support of terminology management tools. Poster presented at 5th Polish Eye Tracking Conference, Lublin. Rütten, Anja. 28 April 2015. Summary table of terminology tools for interpreters / Übersichtstabelle Terminologietools für Dolmetscher / cuadro sinóptico de programas de gestión de terminología para intérpretes. http://blog.sprachmanagement.net/?p=706 [last accessed September 24, 2017] Seeber, Kilian G. 2007. Thinking outside the cube: Modelling language processing tasks in a multiple resource paradigm. Interspeech 2007, Antwerp, pages 1382-1385. Seeber, Kilian G. 2011. Cognitive load in simultaneous interpreting: Existing theories - new models. Interpreting 13 (2):176-204. Seeber, Kilian G. and Dirk Kerzel. 2011. Cognitive load in simultaneous interpreting: Model meets data. Special issue of the International Journal of Bilingualism 16 (2):228-242. Setton, Robin. 2003. Models of the interpreting process. In Angela Collados Aís and José Antonio Sabio Panilla (Eds.). Avances en la investigación sobre la interpretación (Advances in research on interpreting), Editorial Comares, Granada, pages 29-91. Setton, Robin. 2013. Models of Interpreting. In Carol A. Chapelle (Ed.). The Encyclopaedia of Applied Linguistics. Wiley-Blackwell, Oxford, UK. Setton, Robin. 2016. Models. In Franz Pöchhacker (Ed.). Encyclopaedia of Interpreting Studies. Routledge, London, pages 263-268. Shannon, Claude E., and Warren Weaver. 1949. The Mathematical Theory of Communication. University of Illinois Press, Urbana. Wickens, Christopher D. 1984. Processing resources in attention. In Raja Parasuraman and David R. Davies (Eds.). Varieties of attention. Academic Press, New York, pages 63-102. Wickens, Christopher D. 2002. Multiple resources and performance prediction. Theoretical issues in ergonomics science, 3(2), pages 159-177. Will, Martin. 2015. Zur Eignung simultanfähiger Terminologiesysteme für das Konferenzdolmetschen. Trans- Kom, 8(1):179–201. Xu, Ran. 2015. Terminology Preparation for Simultaneous Interpreters (PhD thesis). University of Leeds. 88 Learning from Sparse Data - Meeting the Needs Big Data Can't Reach Jon D Riding Neil J Boulton United Bible Societies United Bible Societies Stonehill Green, Westlea, Stonehill Green, Westlea, Swindon SN5 7TJ Swindon SN5 7TJ [email protected] [email protected] Abstract The vast majority of mainstream MT systems have coalesced around two core technologies, Phrase- Based Statistical Machine Translation (PBSMT) and increasingly Neural Machine Translation (NMT). Both of these technologies have in common the need for very large training data sets. Such data is not available for low resource languages and this is where much of Bible translation takes place. This paper describes a new approach to harnessing the power of machines as Machine Assisted Translation (MAT) engines, supporting the translator in their work from the very start of a project at which point it is likely there is little or no bilingual corpus available. This requires systems with the ability to learn from very small amounts of data and gradually aggregate this knowledge until it is sufficient to support more traditional MT processes. A model for how this might be achieved is presented and the results of early experiments discussed. 1 Introduction Mainstream MT is largely focussed on synthesis. Systems are designed to translate, at least to first draft, before the human translator’s skills are invoked, typically in some form of post- editing. Historically MT systems might be categorised as belonging to one of two types: those which are fundamentally rule-based (RBMT) and those which are heuristic machines of one sort or another. This latter group including various forms of SMT, word or phrase based [Koehn et al, 2003], and more recently NMT systems. All share the characteristic of learning to translate from large example data sets. Of the two the SMT/NMT approach is probably most generally favoured as witnessed by the many implementations of systems based upon generic SMT engines such as Moses and THOT [Ortiz-Martinez & Casuberta, 2014], and the various NMT platforms developed by Google [Wu, 2016] et al.. RBMT continues to contribute not least in the context of hybrid approaches which seek to use the strengths of both RBMT and SMT/NMT approaches [Eisele et al, 2008 & Sanchez-Cartagena et al, 2016] but also in scenarios which are closely controlled and the supporting knowledge bases can be closely tailored to that context. State of the art SMT has coalesced around phrase-based systems. Both PBSMT/NMT systems have in common a voracious appetite for example data [Shterimov et al, 2017:4] and NMT in particular needs high quality training data to give best results [Nagle, 2017]. Training data sets are commonly measured in millions of documents and whilst NMT is perhaps slightly less hungry than PBSMT in this respect the reality is that a vast data set is needed to train the system. This is analogous to the vast number of exemplars absorbed by a human child as it begins to learn its mother tongue. The principal difference is is that rather than a broad set of exemplars being presented at a single moment in time as is typical for initial training data for PBSMT/NMT a similarly vast set of exemplars is absorbed diachronically over a period of some years and within the wider context of learning that we 89 Proceedings of the 39th Conference Translating and the Computer, pages 89–97, London, UK, November 16-17, 2017. 2017c AsLing recognise as cognitive development in children [Tomasello, 2008]. This need for enormous sets of training data is of little consequence in the context of mainstream commercial languages where bi-lingual datasets already exist or can be derived from the web. Sub-setting training data for genre improves performance further within that context and the outcome is an excellent set of tools. For so called minority languages where such datasets do not exist and for texts containing many disparate genres and styles the approach is less strong. 2 Translating the Bible Bible translation is a peculiar problem space. The source text is written in more than one (ancient) language over a period of perhaps 1,600 years with the most recent portions almost 2,000 years old. Not only are we at a considerable distance diachronically from the authors of the text, the target language for a translation of the Bible may be culturally and linguistically distant from the original. The text includes many different genres ranging from narrative to complex constructs designed to emphasise particular concepts or aspects within the text. 1 [App A]. It is very unlikely that much if anything in the nature of training data exists (the translators may well have to being by defining an alphabet). This is not a great scenario for mainstream MT systems and overlaying all these issues is the theological landscape the translation must inhabit both in terms of the particular people, place and time for whom it is prepared and the global context of church and faith. 2 The crucible within which meaning is forged sits at the nexus where the narratives of the text engage with the narrative of the translators and the people they represent. Meaning is instantiated in encounter and it is hard to see how that encounter can be modelled by MT. All of these issues make Bible translators wary of MT as a solution to their task. 3 MT in Bible translation Many outcomes from MT research during the last twenty years or so in the form of Machine Assisted Translation (MAT) systems have bee embraced by Bible translators and these systems have served Bible translation well. The MAT systems developed for Bible translators focus on analysis rather than synthesis. This objective analysis is then used the inform the work of the translator. Translators have for many years enjoyed the benefit of word-based SMT to analyse the use of key terms in the text, automatic morphological analysis has contributed to spelling checkers and complex pattern recognition systems monitor renderings of items such as proper-names. Crucially, these systems are all entirely language independent, able to operate with any of the 7,000 or so extant world languages without the need for lexica or tables but looking to discern patterns of form, use and meaning within and between texts. But most of these systems suffer the same limitations as our state of the art PBSMT/NMT systems. They require a lot of training data. The outcome is that they are unable to contribute until a substantial part of the text has been translated, in the case of a New Testament translation perhaps the bulk of the text. 4 Reimagining MAT for Bible translators The limitations of current machine learning lead to particular problems for Bible translators. The lack of MAT support early in a project leads to many inconsistencies in the text, these in turn contribute to poorer results from MAT systems when they do come online later in a project. To address these limitations we have begun to imagine a new approach to working 1 An example of the complexities which can arise and which are often overlooked by those accustomed to encountering the Bible only in translation can be found at Appendix A. 2 For a thorough exploration of these issues see [Wendland, 2008]. 90 with MAT systems in the context of Bible translation. Our objective is to bring forward the moment when MAT systems can contribute to a translation, if possible to the very start of a project. We imagine a framework for MAT processing which falls into three main sections: 4.1 Discovery – parseBots The process of analysing a text must begin with the discovery of entities within the text stream. These may be words, phonemes, morphemes, short phrases or other items but fundamentally they are all patterns which are present in the text. Natural language is inherently structured. The business of learning or reading is, therefore, at its heart an exercise in pattern recognition [Hawkins, 2005 & Kurzweil, 2014]. There are an host of contexts in a text which form patterns. Amongst the key contexts for MAT systems are word formation, clause formation and semantics. Recognising the patterns which arise from these contexts is critical to being able to analyse the text. We have begun to model a set of processes each of which is targeted towards recognising patterns as they form within a developing text. We call these processes ‘parseBots’. These bots can be imagined as peering over the translator’s shoulder as he works on a passage of text identifying items in the text stream as the work progresses. Some consider aspects of word-formation such as phoneme stream or morphology, others are concerned with semantics and note close cognates, glosses etc… and others are seeking pattern and order at clause level. The bots operate on a small pericope of text which we call a ‘parse window’ [Fig. 1]. This may be as little as a handful of sentences and is unlikely to be as many as 100 sentences. Early experiments indicate that useful analyses can be made from as few as two or three sentences. As identifications are made the patterns found are tentatively tagged for meaning and/or function and added to a collection of parses awaiting confirmation. The parse window follows the translator as he continues to work on the text adding new analyses to the list as new forms enter the window and come under scrutiny. It should be noted that a form that fails to parse on first sight may subsequently be identified as subsequent text enters the parse window and enables an identification to be made. All parses are stored, whether confirmed or not, in the expectation that confirmation may be found in further analyses elsewhere in the text stream. Individual analyses are thus aggregated into a more coherent representation of entities within the developing text. Setting the extent for the parse window is one of the key research goals for this work. The optimum size is likely to be language and possibly genre dependent and needs to adjust dynamically as it moves through the text, setting the extent such that it exposes the maximum number of Fig. 1 parses to the Bots. There is often an assumption that larger extents of text expose more analyses. This leads us to imagine a discovery rate for entities in a text which roughly follows the curve at Fig. 2: 91 By this model there is clearly more to be learnt with larger datasets. This is certainly the case but experiment demonstrates that much smaller extents of text can also offer a proportionally rich harvest. A short passage of text with a particular narrative or conceptual focus will expose some entities disproportionately strongly in comparison to their global distribution in the text or language as whole. This characteristic is exploited in a similar fashion to Latent Semantic Analysis [Schone & Jurafsky, 2000]. The consequence of this is that parses which might be overwhelmed in larger extents of text become clearly visible in Fig. 2 shorter pericopes. If this is so we might redraw our discovery graph more like figure 3: to reflect the reality that extent, genre and focus may all contribute to exposing patterns, and so parses, within the text. The precise shape of the curve is likely to be language and context dependent. It is also possible that successful parses from short extent analyses may lessen the depth of the dip as the amount of data increases and bring to the left the moment when analysis of of a larger dataset begins to pay dividends. 4.2 Validation – Building a Language Model Developing a flexible model in which to record this knowledge as it accrues is a key objective for this research. Early experiment suggests that a model based upon surface Fig. 3 forms encountered in the text will provide the best framework for recording analyses in preference to attempting to populate predetermined categories of items. Each lexeme encountered by the parse window is stored as part of a developing Language Model (LM). Where parses have been attempted these are stored together with the form. As more forms and parses are added common patterns emerge. A morphological pattern may find support from a number of parses and may in turn generate candidate stem lemmata. If close cognates are identified which confirm the relationship implied by the morphological analysis then the model’s confidence in that analysis rises and it may begin to use these parses to drive further analyses as the parse window moves on through the text. It is expected that this aggregation of knowledge within the model will enable local parses to be extended across the model as a whole as patterns emerge which are found to be endemic within the text. It must, however, be recognised that errors, inevitably, creep in as a consequence of the limited processing context. Such errors are a particular concern for a scenario which seeks to exploit limited analyses. If we are to exploit these analyses it is important that we have confidence in them. 92 4.3 Verification – Confirming the analyses Given that we cannot rely on having any form of dictionary or grammar for the target language there is only one place we can turn for verification of our parses, the translator. Bible translators are not typically linguists or professional translators as that term is generally understood. They are usually mother-tongue speakers of the target language and, since they represent not just their language community but also the churches within that community, it is likely they will have some measure of theological training. Devising an accessible way to present analyses to translators for assessment is the third important area of research for this project. Initial thinking is that a list of parses awaiting verification will be maintained. As parses acquire a measure of confidence in the LM by aggregation of individual results the proposed identification will be offered to the translator for confirmation or otherwise in the form of a binary question to which the translator can reply only yes or no. For example, a request to confirm that mundus and mundo refer to the same thing might allow a morphology bot to conclude the possibility of a stem mund- with associated morphology -o, -us. The subsequent appearance of mundum adds -um to the morphology and the stem mund- can be passed to a glossing bot for confirmation across the wider text. Translators might choose when they wish to take questions although there may be some merit in maintaining a list of pericope related questions which are presented as the translator finishes a particular passage and whilst the work is fresh in his mind. This represents a departure from the way such confirmation is currently sought. At present, translators cover these kind of checks in sessions lasting hours or even days during which much larger portions of text are reviewed. This is both tedious and time consuming. It is hoped that dealing with such questions piece meal as the work progresses will limit the length of large scale checking sessions and encourage translators to reflect continually on their work as they confirm (or otherwise) the analyses generated by the Bots and the LM. Over time the LM which is the outcome of this process grows into a database which describes the language encountered in the text and from which resources such as morphology and syntax tables and a bi-lingual dictionary between the source and target text can be compiled. This is exactly the data needed to bring our existing systems such as key terms analysis, morphologically based spelling checks and inter-linear back translation on line at a much earlier stage of the translation. 5 Towards a viable prototype Our existing systems can provide many of the processes which will power the various bots. If we were to imagine a typical parseBot set as including capabilities in morphological analysis (concatenative and non-concatenative), close cognate recognition, single term glossing, proper-name recognition and some element of part of speech tagging many of these capabilities already exist within the MAT function library that powers the UBS ParaTExt glossing technologies.3 Re-engineering these systems in the context of sparse data analysis such that parseBots can take advantage of their processing is key to the success of the project. Constructing a viable Language Model will form a major part of the research needed to realise this proposal. Language Models are more often encountered in RBMT contexts and are typically driven by the expectations of formal linguistics. Language is, sadly, a messy 3 For details of these systems see previous work by the MAT team, much of which has been presented to previous ASLIB/ASLING TC conferences: [Riding (2007), Riding (2008), Rees and Riding (2009), Riding and van Steenbergen (2011), Riding (2012), Riding and Boulton (2016)]. 93 business and experience teaches us that attempting to fit linguistic data drawn from a wide set of languages into a single model based upon abstract classifications is not easy. We propose instead to base our LM on the surface forms encountered in the text together with the parses generated by our processing and the relations implied by those parses. Much of this may well prove very similar to traditional linguistic categories but our objective will be to model the linguistic reality we encounter in the text, rather than to fit the data into predetermined linguistic classes. In addition to establishing a workable data model, the LM will also provide the data for ‘global’ analyses which attempt to confirm local parse results from the wider data set. The third area of work facing the team is the need to develop an interaction module to forward confirmation requests to the user and manage their responses. Whilst interactive MT systems are becoming more common these are more commonly used to suggest how a phrase might be completed [Alabau, 2014] rather than to glean information about the text or language. Whether such interactions are best handled ‘little and often’ or less frequently but in a more structured manner will be another key focus of research as the system is developed. Acknowledgements We are indebted to United Bible Societies for their support for this work. Thanks are also due to Oxford Brookes University for continued access to their computing and library services. References Alabau V, Buck C, Carl M, Casacuberta F, Garcia-Martinez M, Germann U, Gonzalez-Rubio J, Hill R, Leiva L, Mesa-Lao B, Ortiz D, Saint-Amand H, Sanchis G and Tsoukala C (2014), “CASMACAT: A Computer-assisted Translation Workbench”, In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics., 4, 2014. , pp. 25-28. Eisele A, Federmann C, Saint-Armand H, Jellinghaus M, Herrmann T and Chen Y (2008), “Using Moses to Integrate Multiple Rule-Based Machine Translation Engines into a Hybrid System”, In Proceedings of the Third Workshop on Statistical Machine Translation., 6, 2008. , pp. 179-182. Hawkins J and Blakeslee S (2005), “On Intelligence” New York, Owl Books. Koehn P, Och F J and Marcu D (2003), “Statistical Phrase-Based Translation”, In Proceedings of HLT-NAACL 2003, Main Papers., 5, 2003. , pp. 48-54. Kurzweil R (2012), “How to create a mind”, Viking Penguin. Nagle P (2017), “Get the Best from Neural MT with Quality Data”. KantanMT, URL: https://kantanmtblog.com/ 2017/08/04/get-the-best-from-neural-mt-with-quality-data/ Retrieved 14:32 1-10-2017. Ortiz-Martinez D and Casacuberta F (2014), “The New THOT Toolkit for Fully-Automatic and Interactive Statistical Machine Translation”, In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics., 4, 2014. , pp. 45-48. Rees N and Riding J (2009), “Automatic Concordance Creation for Texts in Any Language”, In Proceedings of Translation and the Computer 31. IMI/ASLIB. Riding J (2007), “A relational method for the automatic analysis of highly-inflectional agglutinative morphologies”. Thesis at: Oxford Brookes University (MPhil). Riding J (2008), “Statistical Glossing, Language Independent Analysis in Bible Translation”, In Translating and the Computer 30. ASLIB/IMI. Riding J and van Steenbergen G (2011), “Glossing Technology in Paratext 7”, The Bible Translator. Vol. 62(2), pp. 92-102. Sanchez-Cartagena VM, Perez-Ortiz JA and Sanchez-Marinez F (2016), “Integrating Rules and Dictionaries from Shallow-Transfer Machine Translation into Phrase-Based Statistical Machine Translation”, Journal for Artificial Intelligence Research, 1, 2016. (55) Schone P and Jurafsky D (2000), “Knowledge-Free Induction of Morphology Using Latent Semantic Analysis”, 94 In Proceedings of CoNLL-2000 and LLL-2000. Lisbon , pp. 67-72. Shterimov D, Nagle P, Casanellas L, Superbo R and O'Dowd T (2017), “Empirical evaluation of NMT and PBSMT quality for large-scale translation production”. Thesis at: KantanMT. Staley, J. (1986) “The Structure of John's Prologue: Its Implications for the Gospel's Narrative Structure”, The Catholic Biblical Quarterly, 241-264 Tomasello M (2003), “Constructing a Language” Cambridge Mass., Harvard University Press. Wendland E (2008), “Contextual Frames of Reference in Translation” Manchester & Kinderhook, St Jerome Publishing. Wu Y, Schuster M, Chen Z, Le QV and Norouzi M (2016), “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”. Thesis at: Google., 10, 2016. 95 Appendix A – Beyond MT Here is the opening two verses of St John’s Gospel in Greek but transliterated into English characters: en archē ēn ho logos, kai ho logos ēn pros ton theon, kai theos ēn ho logos, houtos ēn en archē pros ton theon. And here is a glossary of the words in those verses: en in ho logos the word ton theon God archē the beginning kai and ho theos God ēn was pros with houtos this (word) There are three poetic structures used in this short piece of text [Dobson, 2005:6-7 & Staley, 1986] (which in translation is typically presented as a single prose paragraph): 1. Terrace parallelism uses the repetition of key concepts at the end and beginning of adjacent phrases. We can see this happening more clearly in this text if we we highlight them like this: en archē ēn ho logos, kai ho logos ēn pros ton theon, kai theos ēn ho logos, houtos ēn en archē pros ton theon. It is rarely possible to represent this in translation. 2. Threefold repetition adds emphasis or importance to a concept in scripture. Note logos and theos above. 3. The writer has not only used terraced parallelism and threefold repetition to highlight ‘Word’ and ‘God’, he has also woven a chiastic pattern around the parallelismus: A. en archē [in the beginning] B. ēn [was] C. ho logos, [the word] D. kai ho logos [and the word] E. ēn [was] F. pros ton theon, [with God] kai theos [and God] E' ēn [was] D' ho logos, [the word] C' houtos [this (word)] B' ēn [was] A' en archē pros ton theon. [in the beginning with God] The central point of the chiasm ‘F’ is the point of emphasis and the closing elements echo the opening elements in reverse order. Such complexities account for much of the reluctance of Bible translators to embrace typical MT systems. 96 Appendix B – Initial Experiment To demonstrate the possibilities of working with very small amounts of data an initial experiment was prepared which used a set of 5 parseBots to analyse the same pericope of John’s Gospel in Latin from which the example at Appendix A was drawn. The bots had access to the base text (Greek) which was lemmatised. Beyond this, no information was given other than the text. The text analysed was: 1 In principio erat Verbum et Verbum erat apud Deum et Deus erat Verbum 2hoc erat in principio apud Deum 3omnia per ipsum facta sunt et sine ipso factum est nihil quod factum est 4 in ipso vita erat et vita erat lux hominum 5et lux in tenebris lucet et tenebrae eam non conprehenderunt 6fuit homo missus a Deo cui nomen erat Iohannes 7hic venit in testimonium ut testimonium perhiberet de lumine ut omnes crederent per illum 8non erat ille lux sed ut testimonium perhiberet de lumine 9erat lux vera quae inluminat omnem hominem venientem in mundum 10in mundo erat et mundus per ipsum factus est et mundus eum non cognovit. The bot set included: • Close cognate finder • Morphology analyser • Lemmatiser • Proper-name finder • Glossing engine The analysis began with a single verse and was then repeated, adding a verse at each iteration and the following hypotheses were queued for verification after each iteration: 1. Cognate: deus, deum? 2. 3. Cognate: ipso, ipsum? Results marked in green are analyses Cognate: facta, factum? confirmed by more than one bot process. Morph: _um? These are forwarded to the user for 4. Stem: de_? verification via the interaction module. Stem: ips_? Stem: fact_? Of the remainder, all but the non/nomen 5. Cognate: tenebrae, tenebris? cognate are valid and we can expect that to 6. Name: Iohannes? be dismissed by subsequent processing. Cognate: non, nomen? Gloss: de* = θε* Particularly pleasing is the hic/hoc cognate 7. Cognate: omnes, omnia? which illustrates the power of non- Stem: e_t? concatentive morphology analysis 8. Cognate: hic, hoc? Cognate: ille, illum? The e_t stem is also of interest. At first sight 9. Cognate: hominem, hominum? this is nonsensical but the data that support it Stem: homin_ are in fact est/erat; cognate forms of the 10. Cognate: facta, factum, factus? Latin verb to be. Gloss: fact* = ποι* Cognate: mundo, mundum, mundus? This is a rich harvest from so small a data Stem: mund_? set. Gloss: mund* = κοσμ* 97 Terminology Management Tools for Conference Interpreters – Current Tools and How They Address the Specific Needs of Interpreters Anja Rütten Sprachmanagement.net Paradiesstr. 3, 41849 Wassenberg, Germany [email protected] Abstract Ever since the 1990s, terminology management systems have offered sophisticated data structures and management functions to translators, terminologists, and interpreters. Nevertheless, tools have also been developed by or in close cooperation with active conference interpreters to meet the needs of interpreters. This workshop is intended to provide an overview of the tools currently on the market. It will include live demonstrations or screenshots to showcase the individual characteristics of each tool, illustrating the functions that make them attractive to interpreters, such as sorting, filtering, ease of use, mobile access and online collaboration. Information about pricing models and supported operating systems will also be provided. 1 Introduction Ever since the 1990s, terminology management systems have offered sophisticated data structures and management functions to translators, terminologists, and conference interpreters. Tools have also been developed by or in close cooperation with active conference interpreters to meet the needs of interpreters. Initially, these tools were generally inspired by one or very few users and developed by a single developer or a very small team. This workshop aims to present an overview of terminological tools for interpreters, highlighting the pros and cons of each of them. Although some of the tools presented are no longer being developed and no support is offered, they will nevertheless be presented, since the current versions run smoothly. Due to time restrictions, some tools will be shown “live” while others will be presented via screenshots, thereby giving participants a full picture of the terminological resources currently available for interpreters. As the aim of this workshop is for participants to determine which tool(s) best suit their needs, only the most relevant aspects of terminology management for conference interpreters will be addressed. 2 Software available The following programs will be presented in the workshop: ▪ Interplex by Peter Sand, Eric Hartner (Geneva, Switzerland) ▪ Lookup by Christoph Stoll (Heidelberg, Germany) ▪ Terminus by Nils Wintringham (Zürich, Switzerland) ▪ Glossarmanager by Glossarmanager GbR/Frank Brempe (Bonn, Germany) ▪ InterpretBank by Claudio Fantinuoli (Germersheim, Germany) ▪ Interpreters’ Help by Benoît Werner/Yann Plancqueel (Berlin/Paris) ▪ Intragloss by Dan Kenig and Daniel Pohoryles (Tel Aviv/Paris) ▪ Glossary Assistant by Reg Martin (Switzerland) 98 Proceedings of the 39th Conference Translating and the Computer, pages 98–103, London, UK, November 16-17, 2017. 2017c AsLing A complete overview of these tools, including information on pricing models and supported operating systems, is available at www.termtools.dolmetscher-wissen-alles.de. The author strives to keep this page up to date. An excerpt of this overview is available in the annex. Generic solutions like Microsoft Excel and Access, Google Sheets or Airtable may also be of interest, as they are widely available and offer many of the sorting, searching and filtering functions most interpreters seek. However, they lack interpreting-specific functions like intuitive, incremental accent-insensitive search, integrated search of online sources, and workflow support during preparation – which can include creating links between preparatory documents or reference corpora and the terminology database. 2.1 Sorting, Filtering and Searching One of the most important functions – which has been offered since the early days of interpreter- specific terminology management tools – is easy sorting and filtering by subject, conference or customer as well as intuitive searching while interpreting simultaneously at the same time. Pioneering booth-friendly terminology management programs include Interplex and LookUp (which were released around the turn of the century) and Terminus. Interplex has no categorisation system. As such, it is based on the idea that interpreters organise their terminology into glossaries rather than databases, which parallels the tradition of pen-and-paper glossaries. Glossary names (i.e. different .doc or .xls files) represented an early, simple categorisation system. Colleagues who had accumulated many such documents became aware of the fact that many subjects were interrelated and that different subjects could arise during the same conference. As such, they needed – and still need – a way to search all of their valuable glossaries at once. Interplex provides a solution by importing all glossaries, i.e. doc, txt or xls files, and offering a very intuitive multi-glossary search – ignoring accents and all kinds of special characters. LookUp is what comes closest to a combination of terminological considerations and the practical requirements of a conference interpreter. It goes beyond simple glossary structures and offers semantic relations including synonyms, antonyms, hypernyms and hyponyms, apart from categories like conference and customer allowing interpreters to focus on the terminology they need for a given conference setting. The incremental search function is highly intuitive, although not accent-insensitive. Terminus was first released in 1997. Like the previous programs, it categorises glossaries and “groups” (customer, subject group, etc.) using descriptors instead of data fields, a simple and efficient method many people are now familiar with thanks to their Gmail accounts. It also includes comment and context fields. The search function is mouse-free and results can be filtered, although accents and special characters are not ignored. Glossarmanager (2008) is similar: terms are organised into glossaries and chapters; data fields include the two languages, synonym, antonym, picture and comment fields. The search module can be operated using only the keyboard, and results can be filtered. Accent/special characters and case sensitivity can be activated. Glossarmanager also provides (customisable) links to online resources for further searching and includes a vocabulary training module, where the correct answers must be typed in (instead of only displaying the correct answers and leave it to the users to decide whether they would have known it or not, like InterpretBank does). Therefore, in addition to providing valuable support in the booth, Glossarmanager also offers functions that will help you prepare for an assignment. 2.2 Workflow support InterpretBank is another pioneering tool that has been around since the beginning of the century. However, it has seen the greatest number of developments over the years. Like Lookup, 99 InterpretBank offers search and detail views, taking into account various phases of an interpreter’s workflow, including an editing, conference and vocabulary training mode. Apart from classifying terms into glossaries and subglossaries, it also includes additional fields for each term and entry. An integrated search function allows users to search across online dictionaries, definitions and machine translations, supporting the glossary creation phase. Like Glossarmanager, InterpretBank is one of the few tools designed for interpreters that includes a vocabulary training mode. It also offers a sophisticated booth search function that can be tailored to user needs. An incremental search function can ignore accents and special characters as well as correct typing errors. In addition to providing strong support during the preparatory phase, InterpretBank also offers a helpful log file including all of the changes and queries made during a conference. This is especially useful after an interpreting assignment has ended. Intragloss focuses on the preparation phase, allowing users to transfer terms from background documents (speeches or reference texts) directly into a glossary and to check new texts against existing glossaries, displaying target language equivalents from the glossaries in the text. It also features a display function where users can view and scroll the original text and translation in parallel. Furthermore, Intragloss offers a customisable, built-in tool allowing users to search across a wide range of dictionaries and databases, including Linguee, IATE, Wikipedia, and many more monolingual and bilingual resources. Glossaries can also be organised by assignment and subjects. 2.3 Sharing and online collaboration Databases can be exported from and imported into the majority of the aforementioned software programs – with varying degrees of convenience. However, when it comes to sharing a link with someone to provide access to an online document which can be edited by everyone involved, cloud-based team glossaries in Google Sheets are becoming more and more popular; indeed, they offer an unprecedented degree of load-sharing and collaboration features that make Google Sheets a killer application for conference interpreters. Although Google Sheets lack the intuitive search functions and other workflow-supporting features that other tools offer, this may be outweighed by the fact that the software is freely available, relatively easy to use and helps to considerably decrease the workload when preparing a large number of highly technical presentations, not to mention the group dynamics and common knowledge base it creates. However, if you are concerned about keeping your customers’ data confidentiality yet still want to share cloud-based glossaries, the one interpreter-specific solution available is InterpretersHelp. The browser-based tool features the BoothMate companion application for offline work on tablets and desktops and offers an online space for creating and sharing glossaries. It also boasts convenient search and categorisation functions as well as a platform to connect with other interpreters and manage jobs. 2.4 Mobile use Mobile devices offer great possibilities of accessing your data on the go – especially for consecutive interpreting. GlossaryManager is designed for this very scenario, offering touchscreen-optimised handling for quickly sorting and searching your glossary. Although mobile apps are available for InterpretBank and GlossaryManager, files need to be transferred between a computer and your mobile device manually. InterpretersHelp is the only cloud- and browser-based solution, and can be consulted from both the iPad app and web browsers for comfortable mobile access. 100 2.5 Generic terminology management systems Given that many of the functions interpreters need are also available in generic terminology management systems, the question arises why the latter are used infrequently by conference interpreters. This may be because interpreters feel they lack control over their own data, do not have their relevant terminology visually present, lack data portability (and the ability to share resources with colleagues), or have to familiarise themselves with a new program whose benefits are unclear. Even conference interpreters who also work as translators often have two different systems in place. A non-representative survey (Wagener 2012) of 102 professional conference interpreters – mostly freelancers on the German market – found that only 15% use terminology databases, while 26% use interpreter-specific tools. Therefore, despite the fact that numerous tools have been designed for interpreters, it appears that no single tool has become the (unofficial) “industry standard” for conference interpreters. According to the survey, most participants often (51%) or sometimes (29%) use paper for their terminology work; only 20% work in a completely paperless fashion. Many respondents often or sometimes use Microsoft Word (87%) or Excel (60%). The most distinctive feature of interpreter-specific tools is simplicity and intuitiveness. What with the possibilities of add-ons and APIs at hand, there should be a merit in taking interpreters on board of “proper” terminology management databases. Indeed, the ability to save time when preparing for a conference might also provide useful insight for other user groups. Although translators and terminologists work under different time pressures than interpreters and can allocate greater cognitive capacities to operate filters and search functions, reducing the working time and attention required to operate terminology and translation memory systems could also boost translators’ productivity and profitability. Apart from time pressure, the greatest difference lies in the fact that translators know exactly what they are supposed to translate, while proper preparation is the key to good information management for interpreters. This often involves gaining familiarity with unknown terms, speakers and subjects. While interpreters must “guess” which knowledge will be relevant, this is less relevant for translators. In this respect, the approach adopted by interpreters resembles the approach adopted by terminologists, who must capture conceptual relations and semantic fields in order to grasp the subject matter at hand. In recent years, the most notable changes in software for translators and interpreters alike has been the shift to internet-based tools, including cloud-based collaboration as well as web dictionaries and machine translation that are integrated into translation memory systems (ToolBox 13 2017:217f, 241ff, 297:ff). According to Wagener, as early as 2012, 14 % of respondents used Google Docs. At the Interpreters for Interpreters Workshop in Bonn (15 September 2017), a show of hands among a comparable audience (about 100 participants comprising primarily AIIC or VKD members from Germany) revealed that nearly all participants had used Google Docs for preparation at least once. Interestingly, the same audience argued that the greatest advantage of online collaboration was saving time. 3 Outlook Online collaboration supposes a true paradigm shift in knowledge and information management for interpreters. Ten years ago it had to be considered an individual task, at least during the decisive phase of preparation (Rütten 2007:126f). The findings and best practice regarding information and knowledge management in organisations, like it has long been discussed in the field of economics and information sciences, were of very limited use for the work of an individual interpreter. With cloud-based platforms available for free, this paradigm has begun to change, and will continue to do so in the coming years. Apart from sharing the workload – which is necessary due to the increasingly technical nature of conferences – collaboration also 101 suits our natural need to communicate, which in turns enables us to process information at a deeper level. This means that for the first time, the team of freelance interpreters can be considered an “organisation, thus insight could be gained from knowledge management related findings in other disciplines. In the future, conference interpreters might be more effectively and profitably served if the “big” terminology management systems were adapted to cover an interpreter’s complete workflow – as Flashterm has already endeavoured to do. The technology required for the different modules already exists, yet for the system to be widely accepted, it will need to be portable and easy to use, offering user-friendly input, sorting, and filtering as well as online collaboration features and high visibility of the most relevant terms. A field study on the paper notes interpreters use in the booth may shed further light on these requirements, as well as on the knowledge elements that are critical to performance in conference interpreting. It will be published by the author in 2018. Further reading The reader is referred to the following publications for additional reading: Drechsel, Alexander. 2015. App profile: Interpreters' Help (Blog Post). https://www.adrechsel.de/dolmetschblog/interpretershelp [last accessed October 1, 2017] Drechsel, Alenxander. 2016. Dan Kenig and Intragloss (Podcast Interview). https://www.adrechsel.de/langfm/dan-kenig-intragloss [last accessed October 1, 2017] Drechsel, Alenxander. 2016. Interpreters' Help helps interpreters with terminology (Podcast Interview). https://www.adrechsel.de/langfm/interpretershelp [last accessed October 1, 2017] Goldsmith, Josh. 2017. "The Interpreter’s Toolkit: Intragloss - a useful glossary-building tool for interpreters". aiic.net February 28, 2017. http://aiic.net/p/7886. [last accessed October 1, 2017] References AIIC. 2017. Workshop “Dolmetscher für Dolmetscher. https://aiic.de/event/8-dolmetscher-fuer-dolmetscher- workshop/ [last accessed September 20, 2017]. Rütten, Anja. 2015. “Summary table of terminology tools for interpreters”. www.termtools.dolmetscher-wissen- alles.de [last accessed September 20, 2017]. Sand, Peter. 2003. “Manage your Terminology with Interplex”. https://aiic.net/page/1321/manage-your- terminology-with-interplex/lang/1. In: The AIIC Webzine [last accessed September 20, 2017]. Wagener, Leonie. 2012. “Vorbereitende Terminologiearbeit im Konferenzdolmetschen unter besonderer Berücksichtigung der Zusammenarbeit im Dolmetschteam“ [Preparatory Terminlogy Work in Conference Interpreting with special Emphasis on Cooperation in Teams of Interpreters]. Master thesis at the University of Applied Sciences Cologne, Faculty of Information and Communication Sciences, Institute for Translation and Multilingual Communication. Zetzsche, Jost. 2017. “The Translator’s Tool Box for Translators - A Computer Primer”. Version 13, March 2017. International Writers’ Group, LLC. 102 Annex Program Developer Operating system Price Flashterm.eu Eisenrieth Windows, Mac, iOS and solo edition with special Dokumentations browser-based interpreter module GmbH available in 2018 for 499 €, demo on request Glossarmanager.de Glossarmanager Windows free GbR/Frank Brempe, Bonn Glossary Assistant Reg Martin, tablets running Android free (http://www.swiss32.com) Switzerland 4.1 or later, phones running Android 4.2 or later, Windows based PC Interplex Peter Sand, Eric Windows, Mac, iOS full license $75, (http://fourwillows.com/ Hartner, Geneva Interplex HD for iPad interplex.html) $19.99 $, Interplex lite for iPhone (viewer) InterpretBank.com Claudio Fantinuoli, Windows and iOS; Full licence 119 € plus Germersheim access to glossaries from VAT, 59 € student Android/iOS mobile version, 29 € upgrade devices www.interpretershelp.com Benoît Werner and all (browser-based), free public glossaries + Yann Plancqueel, additionally Boothmate one private glossary; pro Berlin, Paris for Windows, Mac OS X version 20.00 €/month; and ipad free education plans Intragloss.com Dan Kenig and Daniel Mac; Windows-version regular price: Pohoryles, Tel expected $49/month, $99/3 Aviv/Paris months, $269/year Lookup Christoph Stoll, Windows free (http://www.lookup- Heidelberg web.de/) Terminus Nils Wintringham, Windows (for W8, don’t CHF 148 plus VAT, (http://www.wintringham.c Zürich use the default 50% off educational h/cgi/ installation folder) license, free demo with ayawp.pl?T=terminus) limited number of entries 103 The SCATE Prototype: A Smart Computer-Aided Translation Environment Vincent Vandeghinste Sven Coppers KU Leuven Hasselt University - tUL - imec [email protected] [email protected] Jan Van den Bergh Tom Vanallemeersch Hasselt University - tUL - imec KU Leuven [email protected] [email protected] Bram Bulté Ayla Rigouts Terryn Ghent University KU Leuven [email protected] [email protected] Els Lefever Iulianna van der Lek-Ciudin Ghent University KU Leuven [email protected] [email protected] Karin Coninx Frieda Steurs Hasselt University - tUL - imec KU Leuven [email protected] [email protected] Abstract We present the SCATE prototype: A Smart Computer-Aided Translation Environment, developed in the SCATE research project. Its user interface displays translation suggestions coming from different resources, in an intelligible and interactive way. It contains carefully designed representations that show relevant context to clarify why certain suggestions are given. In addition, several relationships between the source and the suggestions are made explicit so the user understands how a suggestion can be used in order to select the most appropriate one. Well-designed interaction techniques are included that improve the efficiency of the user interface. The suggestions are generated through different web services, such as fuzzy matching based on a translation memory (TM), machine translation (MT) and terminology extraction. MT and TM are combined using a pre-translation mechanism. A lookup mechanism highlights terms in the source segment that are available with their translation equivalents in the bilingual glossary. This paper presents the interface and the underlying web services, and discusses preliminary evaluations of the interface and the pre-translation mechanism. 1 Introduction We present a demonstration prototype of a computer-aided translation system that was built in the SCATE project (Smart Computer-Aided Translation Environment) (Vandeghinste et al. 2014). This project, which is currently in its final phase, investigates several aspects related to translation technology, such as the design of translators’ user interfaces, the combination of machine translation (MT) and translation memory (TM), syntactic fuzzy matching, bilingual term extraction using parallel and comparable corpora, and confidence estimation of MT. The project is motivated by the fact that translators tend to have a limited trust in MT output, and translation environments provide a limited integration of resources. The SCATE prototype consists of a carefully designed user interface that displays translation suggestions and terminology in an intelligible and interactive way. Translation suggestions are generated through a web service which integrates a TM system's fuzzy matching with MT. Terminology support is provided and terminology is automatically 104 Proceedings of the 39th Conference Translating and the Computer, pages 104–113, c London, UK, November 16-17, 2017. 2017 AsLing extracted from parallel corpora. Advanced autocompletion functionality allows to efficiently use the translation suggestions. While the SCATE prototype demo uses a medical corpus in a specific language pair (English-Dutch), the SCATE technology is sufficiently generic to be applicable to other domains and language pairs. This paper is structured as follows. Section 2 describes the state-of-the-art in computer- aided translation environments. Section 3 describes the SCATE prototype. Section 4 provides details on a preliminary evaluation of the interface and of the combination of TM and MT. Section 5 discusses conclusions and future work. 2 State-of-the-art in computer-assisted translation Computer-assisted translation (CAT) tools have been commercialised since the late 1990s, triggering new business models and greatly influencing the translation and localisation processes, and the way translators work. Users can perform basic project management tasks, create and maintain TM and terminology databases, query MT engines and online databases directly from the translation editor, automatically extract terms from reference materials, align parallel corpora, and use automatic quality control checks on the target document to detect various types of errors. Moreover, cloud-based systems have made collaboration much easier as an entire team can work on the same text simultaneously in real time, leaving comments, sharing and updating resources instantly. Despite the wide range of functionalities and possibilities, CAT tools are not used to their true potential either because of usability issues or because the integration of various technologies (TM, MT, term bases) is not yet optimal (Ehrensberger-Dow and O’Brien, 2015; Zaretskaya, 2015; Krüger, 2016; Moorkens and O’Brien, 2016). Moreover, translators have not fully adopted MT as an aid because they do not trust the quality of the commercial MT engines (Van den Bergh et al., 2015; Cadwell et al., 2017). We briefly review current commercial translation environments according to two criteria: usability and extent of integration of different resources (terminology, TM and MT). 2.1 Usability The user interface of CAT tools typically provides access to resources such as translation memories (TM), machine translation (MT) and terminology databases (TB). Tools differ in the way resources are made available, more specifically in terms of the visual proximity of suggestions, information provided on the origin of (parts of) a translation suggestion, and options to facilitate the reuse of sub-segments from TM or MT. With regard to the visual proximity of suggestions, these are ideally displayed on a single screen, together with the surrounding context of the segment being translated, as translators like to have all the information at their fingertips (Lagoudaki, 2009). Different approaches have been adopted: some tools offer a limited amount of suggestions close to the active working area, while others offer each of these resources in dedicated subwindows. For instance, MateCat1 shows relevant resources in a tabbed interface immediately below the active working area, while Lilt2 shows one suggestion from either TM or MT in the same field. The second approach, as exemplified by SDL Trados Studio 3 and WordFast,4 allows to select the type of resource to be displayed, thus limiting the variety of information that is 1 https://www.matecat.com/ 2 https://lilt.com/ 3 http://www.sdl.com/software-and-services/translation-software/sdl-trados-studio/ 4 http://www.wordfast.com/ 105 close to the active working area. Moorkens and O’Brien (2016) confirms that there are proponents for both approaches. In SCATE, we follow the first approach as it eliminates visual focus shifts (see Section 3). Most CAT tools offer limited information on the origin of (parts of) a translation suggestion. The focus is mainly on highlighting the differences between the text to translate and matches from the TM (including match percentages). For MT, most of the time no justification is provided; typically, MT is used as a black box. Teixeira (2014:171) shows that metadata can help translators make well-informed decisions. He concludes that metadata “helps translators adapt their translation strategies more easily according to the suggestion type”. Moorkens and O’Brien (2016) indicates that translators like information about the provenance of the MT suggestions and estimation of their quality. In the context of post- editing, Viera and Specia (2011) argues that translators value on-the-fly highlighting of word alignment in order to keep the connection between source and target text. In other words, it appears useful to explicitly link parts of a source sentence with parts of the translation suggestion. As discussed in Section 3, the SCATE prototype is strongly focused on providing visual aids that explain the origin of translation suggestions and their link with the source text. As for reuse of sub-segments from TM or MT, we point to recent user research, including surveys and field studies (Van den Bergh et al., 2015; Moorkens and O’Brien, 2016), that investigates the interaction between machines and translators. One conclusion is that translators value improved TM-MT integration methods (e.g. copy/paste, drag-and-drop within editor). Reuse of sub-segments is also possible through interactive translation prediction (ITP) (Koehn and Haddow, 2009; Sanchis-Trilles et al., 2014; Torregrosa et al., 2017). This is a form of human-computer interaction in which users are presented, as they type, with translation suggestions from all available resources. Suggestions are displayed either in a drop-down list or directly under the target segment. Commercial translation software developers have implemented this technology in different ways and use different terms to refer to it: predictive typing, AutoSuggest, Autocomplete, Autowrite. Research has shown that translators prefer ITP to classical post-editing because it minimizes the number of keystrokes and thus increases productivity (Koehn and Haddow, 2009; Sanchis-Trilles et al., 2014; Zaretskaya, 2015). 2.2 Integration Translation environments typically include functionalities for terminology management and support for MT. Terminology management mostly consists of basic features to retrieve, save, search, import/export, and maintain terms and term bases. MT integration takes place either via plugins or by combining MT with various other linguistic resources (TM, TB). While some translation environments include a tool that can be used to extract potential terms from TM (automatic term extractor), research conducted within SCATE shows that term extraction has not yet become a standard practice in the translator’s workflow (Steurs et al., 2016; Van den Bergh et al., 2015), leading translators to mainly rely on their TMs and concordance features. Whereas a hybrid approach (combining both linguistic and statistical methods) may be the best method for preparing terminology collections in commercial environments (Warburton, 2015), most translation environments are still limited to monolingual statistical term extraction that often produces either too much “noise” (too many general lexicon words) or “silence” (real terms that are ignored). Moreover, there is still a lack of integration of terminology in translation editors. In order to tackle the problem of integration, the Lilt tool,5 to give an example, combines the glossary with a concordance feature and updates both resources while the translator works. In the SCATE prototype, we 5 https://lilt.com/kb/translators/lexicon 106 approach both of the above issues by incorporating a bilingual term extractor and by smoothly integrating bilingual terminology in the translation editor. Translation environments usually integrate MT and TM in a rather trivial way. They either offer the translation of a fuzzy match (given some threshold) or an MT suggestion. A growing body of research has explored different ways of combining information coming from TMs and MT. An MT system can be constrained to the use of relevant parts of a fuzzy match (Zhechev and van Genabith, 2010), for example by adding XML markup to MT input (Koehn and Senellart, 2010). Other methods have focused on augmenting the translation table of a phrase-based MT system with aligned sub-segments from a retrieved TM match (Biçici and Dymetman, 2008). Alternatively, information from the fuzzy matches can also be integrated in the MT system itself (Li et al., 2017). In the SCATE prototype, we opt for an approach which is based on XML markup and relatively straightforward to implement (see Section 3). The next section describes the SCATE prototype in more detail, focusing on the main user interface and the integration of the different technologies. 3 The SCATE prototype The SCATE prototype is a web-based translation environment, built through a user-centered development approach. Figure 1 presents an overview of the user interface. We focus on usability and on interaction techniques to integrate various translation technologies. The contributions within the SCATE user interface explain how different translation suggestions are generated, why they might be useful to the translator and which relationships exist between them. The aim is to support the translator’s decision-making process during the selection of a translation suggestion. For the demo we use the English-Dutch part of the medical corpus EMEA (Tiedemann, 2007), which contains about 300,000 sentence pairs. This resource was used to train the MT, as a TM, and as resource for bilingual term extraction. 3.1 Usability The SCATE prototype includes four different translation aids: (1) matches from the TM, (2) hybrid MT, (3) alternatives for the selected term and (4) an autocomplete feature to predict the rest of a word or word group. We developed a web service that accepts the sentence to translate and provides fuzzy matches (Figure 1.E) and MT output (Figure 1.C). To support better decision-making about the use of translation suggestions, the user interface focuses on intelligibility. This focus is key to clarify the behaviour of the complex algorithms behind the translation suggestions. The algorithms that find matches (Figure 1.E) in the TM are made intelligible to the translator with icons that depict the used matching metric, with scores representing the level of similarity, and by highlighting parts in the matches that are potentially useful. Existing CAT tools, such as MateCat, at most highlight differences instead of similarities. In the SCATE prototype, partial matches that are often translated by the same group of words are used as pre-translations by the MT engine (Figure 1.C). To make this clear to translators, pre-translations are shown in bold in both the matches and the MT suggestion. On the left side of the matches, potential term translation options, aggregated from TB, TM and MT are listed (Figure 1.D), each with a metric informing the translator about its usefulness. For MT and TM, an absolute value is shown to represent how often an option occurs in the TM matches. For the TB, we show relative frequency. For ITP, all options are considered and can be manually added to the translation. To further enhance intelligibility, occurrences of these options in the matches are automatically highlighted. 107 Figure 1. Overview of the SCATE user interface. (A) Sentence to translate, (B) edit field, (C) hybrid MT, (D) translation alternatives and (E) TM matches. At all times, the preceding (F) and subsequent sentences (G) remain visible. Vertical bars (I) can visualise active filters (H) such as difficulty, responsibilities and progress. In addition to multiple preceding (Figure 1.F) and subsequent sentences (Figure 1.G), all translation aids remain visible at all times, eliminating the visual focus shifts typically required in other CAT tools (Ehrensberger-Dow et al., 2014; Lagoudaki, 2009). Furthermore, a simultaneous exploration of up to four different kinds of relationships between various sorts of suggestions is supported when typing in the editing field (Figure 1.B) or when hovering the mouse cursor over a word in any sentence. (1) Words in the sentence that belong to the same word group are highlighted in the same colour. (2) Translations of the hovered word are highlighted within the TM and MT. Words in the source language appear in yellow, whereas words in the target language appear in blue. (3) Synonyms and alternative translations of the word appear within the matches and MT suggestion in the same colour as the word itself. An overview of all synonyms is shown in the alternatives list (Figure 1.D). This overview works in the inverse direction as well: by hovering over an alternative in the list, (4) occurrences of the alternative are highlighted in the matches from the TM. When the first occurrence of the alternative is not within the viewport (the part of a scrollable window currently visible), the panel with matches will automatically scroll. 3.2 Integration The SCATE prototype combines web services connecting to a TM and a phrase-based statistical MT system, Moses (Koehn et al., 2007). TM matches are retrieved using three metrics: Levenshtein distance (Levenshtein, 1966), METEOR (Lavie and Agarwal, 2007) and shared partial subtree matching, a measure based on syntactic similarity (Vanallemeersch and Vandeghinste, 2015). For each sentence, the N best matches from the TM (according to the fuzzy match score) are stored in a reduced TM subset, together with information on the match score, rank, used fuzzy metric, and part-of-speech (POS) sequence of both the source and target sentence. Two ‘sliders’ can be set by the user: the TM slider and the MT slider. Matches with a score higher than the TM slider are directly used as final translation, and sentences which have no fuzzy match at all or no fuzzy match that scores higher than the MT-slider are sent straight to the Moses SMT system (as illustrated in Figure 2). 108 Figure 2. System of ‘sliders’ for the selection of MT, TM-MT integration or TM. To produce hybrid TM-MT suggestions, the MT system is constrained to use certain word sequences (or pre-translations) extracted from the TM matches. Initially, a four-stage alignment procedure is followed for each triplet of input sentence, TM source sentence and TM target sentence (see Figure 3). Step 1 identifies the overlapping spans between the input sentence and TM source sentence. Step 2 aligns the TM source sentence with the TM target sentence at the word level using the automatic word alignment and lexical probabilities derived by GIZA++ (Och and Ney, 2003) and Moses. Step 3 identifies consistently aligned spans6 for the TM source and target sentence, using the grow-diag-final heuristic (Koehn, 2009), and consistently aligned sub-spans of these spans are identified based on the same criteria. Finally, step 4 couples the consistently aligned spans between TM source and target to overlapping spans in the input sentence. The extracted spans are subsequently filtered (based on criteria such as minimum span length, occurrence of at least one content word, and percentage of aligned words), weighted (taking into account span length, span frequency across TM matches, and fuzzy match score of the strongest match in which the aligned span occurs) and ranked. The best ranked non- overlapping spans are added to the input sentence using XML markup, and these augmented input sentences are sent to Moses. Figure 3. Illustration of alignment procedure. Step 1: identification of overlapping spans in input and TM source. Step 2: word-level alignment between TM source and target. Step 3: finding consistently aligned spans. Step 4: coupling aligned spans in TM target to input. As for the integration of terminology, we generated a bilingual TB offline using TExSIS (Macken et al., 2013), a hybrid terminology extraction tool that uses POS patterns to obtain a preliminary list of candidate terms, which is subsequently filtered statistically. The list of 6 Pairs of spans in the source and target language in which words are not aligned with words outside the spans. 109 alternatives for the selected term (Figure 2.D) aggregates suggestions from MT and TM on the basis of word alignment, and from the bilingual TB. When generating the bilingual TB, we restrict our search to nouns, noun phrases and adjectives, and ensure long multiword terms (MWTs), such as cholangiocarcinoma of the extrahepatic bile ducts, are not omitted. As this linguistic strategy overgenerates since it extracts every occurrence of all valid linguistic patterns (i.e. all nouns, all adjectives, all noun+noun etc.), we apply statistical detection of terms, using the two main principles of termhood and unithood. Termhood indicates the specificity of lexical entries to a certain domain, and is calculated by comparing the relative frequencies of the candidate term in the domain-specific corpus with a general language corpus. Unithood only applies to MWTs and is based on the idea that MWTs are more than the sum of the meaning of the different parts and that the different parts are strongly connected. Therefore, unithood is computed by comparing the frequencies of cooccurrence of the parts with the expected cooccurrence based on the relative frequencies of each of the parts. While both principles are effective, some terms may go undetected, for instance when they are too rare or new, meaning that the term frequency is too low to get significant results for any statistical measure. Due to the Zipfian distribution of language (the long tail distribution of rare words), non-terms may be extracted as well, such as idiomatic phrases (e.g. significant part) or (multi)words that are not domain- specific (e.g. guitar players was extracted from the medical corpus). Based on the monolingual lists of term candidates and the sentence-aligned input corpus, source and target term candidates are linked to each other, in order to generate the bilingual TB. To this end, word alignment is performed on the corpus using the GIZA++ word alignment toolkit. Based on these word alignments, each candidate term in the source language is linked to a candidate term in the target language. These results are filtered by comparing the frequency of the source language candidate term with the number of times it is linked to the target language term according to the word alignment. If this results in a value of less than 20%, the target language term is discarded as a translation suggestion. Discarded suggestions include partial translations (e.g. medication - diureticummedicatie) and wrong spellings. Correct suggestions may be discarded because the source and target term have a different POS tag (e.g. x-ray - radiologie, ‘radiology’). On the other hand, incorrect suggestions may be retained: for instance, one term may be the hyponym or hypernym of the other (e.g. patient - kind, ‘child’), or terms may be only loosely related (e.g. treatment - medicatie, ‘medication’). 4 Evaluation At various stages of the SCATE project, we involved professional translators and translation experts in the design of techniques and interfaces, and in preliminary evaluations. We carried out a formative study in which 8 participants used two versions of the SCATE prototype to translate a text. Both versions provided the same translation suggestions, but in the first version, these suggestions were presented without the intelligibility features described in Section 3, to measure their impact on the user experience. The results show that making more contextual information available has a positive impact. The general usability increased slightly from 71.6 to 76.6 (above average), as measured by the SUS scoring method (Brooke, 1996). Judging by the overall comments, participants highly appreciated the intelligible version. More specifically, professional translators value the fact that match scores are indicated, that words in the TM which match the sentence to translate are highlighted, and that relationships between suggestions are made explicit through visual marks. These features help them to better understand why a translation suggestion might be useful or not, while not being perceived as distracting. Contrary to our expectations, displaying more meta-information is 110 not always desired by our participants. We point out that the quality of the suggestions is always more important than making them more understandable. As for in vitro testing of the integration of TM and MT, we carried out preliminary tests on three TMs (EMEA, DGT7 and a TM provided to us by a software development company8) to evaluate the quality of the hybrid TM-MT suggestions. For each of the datasets, three automated evaluation metrics (BLEU, METEOR and TER) indicated a significant increase in translation quality compared to ‘pure’ MT output. Additionally, qualitative spot checks by translators revealed that in a majority of cases the hybrid suggestions proved to be better than the pure MT output in terms of grammaticality and/or fluency, or provided interesting translation alternatives. 5 Conclusions and future work We presented an innovative prototype CAT system that was built in the SCATE project. The prototype combines different types of translation suggestions into a carefully designed user interface and makes the suggestions available through ITP. The visualisations remain compact and are presented close to the sentence to be translated. We apply bilingual (instead of monolingual) term extraction, combine statistics with linguistic patterns during extraction, and access MT as a glass box: internal information from the phrase-based MT system is used to produce hybrid MT output and to visualise links between the sentence to translate, the MT output and fuzzy matches. Preliminary evaluation of the prototype shows that providing more metadata in an intelligible and interactive manner is not perceived as distracting and helps translators to decide on the best translation suggestions. In vitro evaluation of the hybrid MT output has shown that it produces more useful translation suggestions than pure MT. In addition to the increased quality of the MT output, the highlighting of pre-translations taken directly from the TM has the potential of increasing translators’ confidence in MT output. This, however, needs to be further studied. Current and future work includes the integration of a quality estimation metric for MT (Tezcan et al., 2017), options to configure the translation workflow, as well as support for terminology extraction from comparable corpora (Bowker, 2003; Delpech, 2014). With regard to TM-MT integration, we intend to include functionality for automated fuzzy match repair (Ortega et al., 2016) and perform in-depth tests of syntactic fuzzy matching. More specific evaluations which focus on the impact of intelligibility on the user experience and performance are ongoing. Since the techniques developed in SCATE are generic, we plan to perform tests with other language pairs and domains. Finally, we intend to perform a comparative evaluation of the SCATE prototype with another state-of-the-art tool. Acknowledgements The SCATE project is funded by the Flemish agency for Innovation and Technology (IWT), under project number 130041. References Biçici, Ergun and Marc Dymetman. 2008. Dynamic translation memory: using statistical machine translation to improve translation memory fuzzy matches. CICLing, Haifa, Israel, pages 454–465. 7Subset of 1.7 million sentences (Steinberger et al., 2013). 8150 000 sentences. Provided through a confidentiality agreement. 111 Bowker, Lynne. 2002. Working Together: A Collaborative Approach to DIY Corpora. In The First International Workshop on Language Resources for Translation Work and Research, pages 29–36. Brooke, John. 1996. SUS: A quick and dirty usability scale. In P. W. Jordan et al. (Eds.), Usability Evaluation in Industry, 189 (194). Taylor and Francis, London, pages 4–7. Cadwell, Patrick, Sharon O’Brien, and Carlos S. C. Teixeira. 2017. Resistance and accommodation: factors for the (non-) adoption of machine translation among professional translators. In Perspectives Studies in Translatology, pages 1–21. Delpech, Estelle. 2014. Leveraging Comparable Corpora for Computer-assisted Translation. In Comparable Corpora and Computer-Assisted Translation. John Wiley & Sons, Inc., Hoboken, NJ, USA, pages 1–39. Ehrensberger-Dow, Maureen and Gary Massey. 2014. Cognitive ergonomic issues in professional translation. In: Schwieter, John W. & Aline Ferreira (eds), The Development of Translation Competence: Theories and Methodologies from Psycholinguistics and Cognitive Science. Newcastle upon Tyne: Cambridge Scholars Publishing, pages 58–86. Ehrensberger-Dow, Maureen and Sharon O’Brien. 2015. Ergonomics of the translation workplace: Potential for cognitive friction. In Special Issue of Translation Spaces, 4 (1), pages 98–118. Koehn, Philipp. 2009. Statistical machine translation. Cambridge: Cambridge University Press. Koehn, Philipp and Barry Haddow. 2009. Interactive Assistance to Human Translators Using Statistical Machine Translation Methods. In Machine Translation Summit XII , pages 73–80. Koehn, Philipp, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th ACL, demonstration session, pages 177–180. Koehn, Philipp and Jean Senellart. 2010. Convergence of translation memory and statistical machine translation. In Proceedings of the 2nd Joint EM+/CNGL Workshop “Bringing MT to the User: Research on Integrating MT in the Translation Industry”(JEC '10), pages 21–31. Krüger, Ralph. 2016. Contextualising Computer-Assisted Translation Tools and Modelling Their Usability. In Trans-Kom - Journal of Translation and Technical Communication Research, 9(1), pages 114–148. Lagoudaki, Elina 2009. Translation editing environments. In MT Summit XII: Workshop Beyond Translation Memories. Lavie, Alon and Abhaya Agarwal. 2007. METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of WMT, pages 228–231. Levenshtein, Vladimir. 1966. Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. In Soviet Physics-Doklady, 10, pages 707–710. Li, Liangyou, Carla P. Escartín, Andy Way, and Qun Liu. 2017. Combining translation memories and statistical machine translation using sparse features. In Machine Translation, 30(3), pages 183–202. Macken, Lieve, Els Lefever, and Véronique Hoste. 2013. TExSIS: Bilingual Terminology Extraction from Parallel Corpora Using Chunk-based Alignment. In Terminology, 19 (1), John Benjamins Publishing Company, Amsterdam, Netherlands, pages 1–30. Moorkens, Joss and Sharon O’Brien. 2016. Assessing user interface needs of post-editors of machine translation. In Human Issues in Translation Technology: The IATIS Yearbook, pages 109–130. 112 Och, Franz J. and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. In Computational Linguistics, 29(1), pages 19–51. Ortega, John E., Felipe Sánchez-Martınez, and Mikel L. Forcada. 2016. Fuzzy-match repair using black-box machine translation systems: what can be expected? In Proceedings of AMTA, Vol. 1, pages 27–39. Sanchis-Trilles, Germán, Vicent Alabau, Christian Buck, Michael Carl, Francisco Casacuberta, Mercedes García- Martínez, Ulrich Germann, Jesús González-Rubio, Robin L. Hill, Philipp Koehn, Luis A. Leiva, Bartolomé Mesa-Lao, Daniel Ortiz-Martínez, Herve Saint-Amand, Chara Tsoukala, Enrique Vidal. 2014. Interactive translation prediction versus conventional post-editing in practice: a study with the CasMaCat workbench. In Machine Translation, 28(3-4), pages 217–235. Steinberger, Ralf, Andreas Eisele, Szymon Klocek, Spyridon Pilos, and Patrick Schlüter. 2013. DGT-TM: A freely available translation memory in 22 languages. In Proceedings of LREC, pages 454–459. Steurs, Frieda, Iulianna van der Lek-Ciudin, and Tom Vanallemeersch. 2016. How translators work in real-life: SCATE observations. In Translating for Europe Forum. Brussels, Belgium, 27-28 October 2016. Teixeira, Carlos S. C. 2014. Perceived vs. measured performance in the post-editing of suggestions from machine translation and translation memories. In Third Workshop on Post-Editing Technology and Practice, pages 45–59. Tezcan, Arda, Véronique Hoste, Bart Desmet, and Lieve Macken. 2015. UGENT-LT3 SCATE System for Machine Translation Quality Estimation. In Proceedings of WMT, pages 353–360. Tiedemann, Jörg. 2009. News from OPUS-A collection of multilingual parallel corpora with tools and interfaces. In RANLP, pages 237–248. Torregrosa, Daniel, Juan Antonio Pérez-Ortiz, and Mikel L. Forcada. 2017. Comparative Human and Automatic Evaluation of Glass-Box and Black-Box Approaches to Interactive Translation Prediction. The Prague Bulletin of Mathematical Linguistics, 108(1), pages 97–108. Vanallemeersch, Tom and Vincent Vandeghinste. 2015. Assessing linguistically aware fuzzy matching in translation memories. In Proceedings of EAMT, Antalya, Turkey, pages 153–160. Vandeghinste, Vincent, Tom Vanallemeersch, Frank Van Eynde, Lieve Macken, Els Lefever, Véronique Hoste, Marie-Francine Moens, Joris Pelemans, Patrick Wambacq, Mieke Haesen, Karin Coninx, and Ken De Wachter. 2014. Smart Computer Aided Translation Environment. In Proceedings of EAMT, Dubrovnik, Croatia, page 135. Van den Bergh, Jan, Eva Geurts, Donald Degraen, Mieke Haesen, Iulianna van der Lek-Ciudin, and Karin Coninx. 2015. Recommendations for Translation Environments to Improve Translators' workflows. In Translating and the Computer 37, Asling, London, pages 106–119. Vieira, Lucas Nunes and Lucia Specia. 2011. A review of translation tools from a post-editing perspective. In 3rd joint EM+/CNGL Workshop bringing MT to the user: Research meets translators (JEC), pages 33–42. Warburton, Kara. 2015. Managing terminology in commercial environments. In Handbook of Terminology, Volume 1, John Benjamins Publishing Company, pages 359–391. Zaretskaya, Anna. 2015. The use of machine translation among professional translators. In Proceedings of the EXPERT Scientific and Technological Workshop, pages 1–12.. Zhechev, Ventsislav and Josef van Genabith. 2010. Seeding statistical machine translation with translation memory output through tree-based structural alignment. In Proceedings of SSST, pages 43–51. 113 The Localisation Industry Word Count Standard: GMX-V Slaying the Word Count Dragon Andrzej Zydroń IT expert on Localisation United Kingdom [email protected] Abstract Word and character counts are the basis of virtually all metrics relating to costs in the L10N Industry. An enduring problem with these metrics has been the lack of consistency between various computer assisted tools (CAT) and translation management systems (TMS). Notwithstanding these inconsistencies there are also issues with common word counts generated by word processing systems such as Microsoft Word. Not only do different CAT and TMS systems generate differing word and character counts, but there is also a complete lack of transparency as to how these counts are arrived at: specifications aren't published and systems can produce quite widely different metrics. To add clarity, consistency and transparency to the issue of word and character counts the Global Information Management Metrics Volume (GMX-V) standard was created. Starting with version 1.0 and then as version 2.0 GMX-V addresses the problem of counting words and characters in a localisation task, and how to exchange such data electronically. This workshop goes through the details of how to identify and count words and characters using a standard canonical form, including documents in Chinese, Japanese and Thai, as well as how to exchange such data between systems. 1 In the beginning… One of the most enduring features of the localization industry has been the inconsistency of word counts not only between rival products, but also sometimes between different versions of the same product. Trying to establish a measure for the size of a given localization task is not unlike trying to fight a many headed dragon. The havoc that the lack of a uniform system of measurement can cause was recently exemplified in 1999 when the Mars Climate Orbiter Spacecraft was lost because one NASA team used Imperial units while another used metric units for a key spacecraft operation. The total cost of this error was $125 million. Trying to cope with a lack of a common definition for estimating the size of a localization task can also be equally catastrophic! This lack of a unified count is similar to the situation for general measurements before the advent of the French Revolution. A French foot ('pied du roi' - 12·79 inches) was different from an English foot (12 inches) as was the Welsh foot (9 inches). Certainly the French appendage was the larger. The basis of the current Imperial linear measures in England were unified by Edward I in 1308 who ordained (in a highly scientific manner for the 14th century) that an inch was to be three grains of barley, dry and round, taken from the middle of the ear and that twelve inches were to make a foot. I have often suspected that many of the metrics produced by current CAT tools use similar formulas based on their output. It took the French Revolution to provide a (mostly) logical approach to establishing general units of measure based on a decimal scale (although somehow the 10 day week did not catch on). 2 Microsoft Word Counts Why not just use Microsoft Word as the basis for word and character counts? This in itself is deeply flawed: 114 Proceedings of the 39th Conference Translating and the Computer, pages 114–118, c London, UK, November 16-17, 2017. 2017 AsLing 1. Microsoft does not, to the best of my knowledge, publish the basis of its word counts, so there is no way of independently verifying them. 2. Even within Word, the counts that are produced do not reflect the actual workload for the translation of a document: Word includes automatically generated text such as table of contents, indexes etc. in the word counts. It does not include header and footer text. Word also counts automatically generated numeric list items as words. 3. The basis of Microsoft Word counts have, in the past, changed between version: a lack of consistency and continuity, even between the latest versions of Word. 4. How do you conduct a word count for non-Word documents, say a complex XML, HTML, or FrameMaker document? 5. How do you count hyphenated words? 6. How do you count ‘aujourd'hui’, or ‘quelque'un m'a dit’ in French? 7. How do you count the following XML fragment in Word: <g id=”g1”>exa<x id=”x1”/>mple</g> 3 Standards to the rescue GMX/V (Global Information Management Metrics - Volume) is an ETSI LIS (European Telecommunications Standards Institute Localization Industry Standards) standard. Originally developed within LISA OSCAR it has been incorporated along with the other LISA OSCAR standards within ETSI LIS, where it has been developed further. GMX/V version 2.0 was published in 2012 and includes factors for converting Chinese, Japanese, Korean and Thai character counts to word counts. GMX was always intended to be a group of standards relating to providing key standard metrics, such as ‘P’ for percentage fuzzy match, ‘C’ for complexity and ‘Q’ for required quality. Using GMX/V/P/Q/C you can totally quantify and automate the quoting for a localization task. GMX/V addresses two very important issues: 1. How do you unambiguously and verifiably count words and characters for a given localization task? 2. How do exchange word and character counts in a uniform and rigorous form between systems? Interestingly, for a document containing only text and without any header, footer, table of contents etc. document GMX/V produces word counts that are not dissimilar to Microsoft Word, but it does so in a documented and verifiable form. To summarize, GMX/V provides: 1. A clear and unambiguous way of counting as well as categorizing word and character counts for all languages and scripts. 2. An XML vocabulary for exchanging localization metrics data between computer systems. 4 Words and Characters GMX/V mandates both word and character counts. Character counts convey the most precise definition of a translation task, whereas word counts are the most commonly used metric in the translation industry. GMX/V encompasses both measurements, thus affording the translation suppliers and customers with a choice as to which measurement most adequately reflects the translation task in question. 115 5 Canonical Form One of the main problems with calculating word and character counts is the plethora of differing proprietary file formats, which can contain a mix of form and content data. Trying to establish a standard that addresses all of these formats is impossible – the word count dragon has too many heads to attempt to cut them all off with one swipe. As soon as one head is cut off, a new one will appear somewhere else. A better approach is to force the dragon to enter a narrow passage where the heads are all forced together. Enter the XLIFF knight in shining on a charger called Unicode. XLIFF is the OASIS standard for XML Localization Interchange File Format and is designed as a way of exchanging translatable data in an XML format. GMX/V relies on the XLIFF representation as the canonical form for the basis of word and character counts. GMX/V mandates that all characters are counted in their Unicode representation and that all multiple space characters are reduced to a single character. In addition word boundaries are defined with reference to Unicode Technical report 29 – Word Boundaries. This provides an unambiguous definition of what constitutes a word. By using XLIFF as the canonical form for counting the source language text GMX/V establishes a common and well-defined format for word and character counts. GMX/V uses the XLIFF ‘source’ element for the canonical form. Example: <source>An example of the canonical form of a text unit.</source> Within XLIFF, inline codes are interpreted as inline XML elements. The inline elements are not included in the word and character counts, but form a separate inline element count of their own. The frequency of inline elements can have an impact on the translation workload, so a separate count is useful when sizing up a job. For the canonical form, only ‘g’ (inline elements with content) and ‘x’ (inline elements with no content) inline elements are used. Example: <source>In this <g id=”g1”>example</g> the in-line codes do not feature in the word and character counts.</source> <source>In this <g id=”g1”>exa<x id=”x1”/>mple</g> the in-line codes do not feature in the word and character counts.</source> Stand alone punctuation characters also feature as an additional category in both word and character counts. They are included in the main count, but can be deducted from both by mutual consent if they do not increase the translation workload. GMX/V addresses all of the issues of how to count words and characters in the XLIFF canonical format. GMX/V proposes a sentence level of granularity for counting purposes within XLIFF. The sentence is the common accepted atomic unit for translation. GMX/V does not preclude producing metrics directly from non-XLIFF format files as long as the format for counting is based on the XLIFF canonical form for each text unit being counted. This can be done dynamically on the fly. In these instances an audit file will be necessary for verification purposes. The main goal of GMX/V is to provide a detailed count for words and characters based on the characteristics of individual sentences. The aim is to provide sufficient detail to enable an accurate definition of the scale of the translation task. The customer and supplier can then decide which of the statistics to use or not when costing the translation task for a given file. 116 6 Asia-Pacific Scripts Version 2.0 of GMX/V added word count support for Japanese, Chinese, Korean and Thai. Word counts for these languages are based on factors applied to character counts. These factors are well established in the localization industry and have been used over many years. You divide the character counts by the following factor for each script to obtain the word counts: 1. Chinese (all forms): 2.8 2. Japanese: 3.0 3. Korean: 3.3 4. Thai: 6.0 For instance if a Chinese document contains 13,456 characters (using the GMX/V specification) the word count will be 13456/2.8 = 4,806 words. 7 Quantitative and Qualitative Measurements GMX/V counts fall into two categories – how many, and what type. The primary count will always be unqualified – how many characters and words are there in the file. This is the minimal conformance level proposed for GMX/V. A typical translatable document will contain a variety of text elements. Some of these elements will contain non-translatable text, some will have been matched from translation memory, some will have been fuzzy matched by the customer. It is therefore important to be able to categorize the word and character counts according to type in order to provide a figure in words and characters for the localization task. 8 Count Categories GMX/V recommends the following count categories: 1. Total Count – the overall count. 2. Exact Matched Count – this is an accumulation of the word and character count for text units that have been matched unambiguously with a prior translation and require no translator input. 3. Leveraged Matched Count - this is an accumulation of the word and character count for text units that have been matched against a leveraged translation memory database. 4. Fuzzy Matched Count - this is an accumulation of the word and character count for text units that have been fuzzy matched against a leveraged translation memory database. 5. Alphanumeric Only Text Unit Count – this is an accumulation of the word and character count for text units that have been identified as containing only alphanumeric words. 6. Numeric Only Text Unit Count – this is an accumulation of the word and character count for text units that have been identified as containing only numeric words. 7. Punctuation Only Text Unit Count – this is an accumulation of the word and character count for text units that have been identified as containing only punctuation. 117 8. Stand Alone Punctuation Count – this is an accumulation of the stand-alone punctuation word and character counts from the individual text units that make up the document. 9. Measurement Only Count – this is an accumulation of the word and character count from measurement only text units. 10. Other Non-Translatable Word Count – other non-translatable word and character counts. Similar counts exist for characters. 9 Summary GMX/V is based on well-defined standards: 1. XLIFF 2. Unicode ISO 10646 3. Unicode TR29 GMX/V provides unambiguous and verifiable counts for words and characters, standalone punctuation and inline code and references for all languages and scripts. It also provides additional qualitative counts for the text element categories detailed above. All of this detail allows a precise and unambiguous definition of the localization task for a given electronic file. This rich detail allows suppliers and customers to be able to precisely measure the task in hand. This must surely be a good thing for the localization industry as a whole. In addition, GMX/V provides a way of electronically exchanging counts between different systems. Full details of the GMX/V 2.0 are available: https://xtm-intl.com/manuals/gmx-v/GMX-V-2.0.html 118 Author Index Berard, Alexandre, 11 Besacier, Laurent, 11 Boitet, Christian, 70 Boulton, Neil, 89 Cacheiro Quintas, Laura, 1 Coppers, Sven, 104 Corpas Pastor, Gloria, 7 Esperanca-Rodier, Emmanuelle, 11 Fantinuoli, Claudio, 25 Farrell, Michael, 35 Goldsmith, Joshua, 40 Kalantzi, Dimitra, 51 Kerremans, Koen, 55 Läubli, Samuel, 59 Lefever, Els, 104 Lemaire, Claire, 70 Luyten, Kris, 104 Orrego-Carmona, David, 59 Prandi, Bianca, 76 Riding, Jon, 89 Rigouts Terryn, Ayla, 104 Rossi, Caroline, 11 Rütten, Anja, 98 Stengers, Helene, 55 Steurs, Frieda, 104 Van den Bergh, Jan, 104 van der Lek-Ciudin, Iulianna, 104 Vanallemeersch, Tom, 104 Vandeghinste, Vincent, 104 Zydroń, Andrzej, 114 119 Sponsors Gold Sponsor Sponsors Silver Sponsors Please highlight these dates in your diary : Translating and the Computer TC 40 15-16 November 2018 will organise: London (UK) For information on next year's 40th Translating and the Computer conference, TC40, please check http://www.asling.org for how and when to submit proposals for talks, workshops and posters, and check out other useful information as it becomes available. Next year’s conference TC 40

References (262)

AENOR. 2012. Norma UNE 153010. Subtitulado para personas sordas y personas con discapacidad auditiva. AENOR, Madrid. http://implantecoclear.org/documentos/accesibilidad/UNE_153010_2012.pdf [last accessed September 02, 2017].
Allen, I. Elaine, Seaman Jeff. 2010. Class differences: Online education in the United States, 2010. Babson Survey Research Group.
Conseil Supérieur de l'Audiovisuel. 2011. Charte relative à la qualité du sous-titrage à destination des personnes sourdes ou malentendantes. http://www.csa.fr/en/Espace-juridique/Chartes/Charte-relative-a-la-qualite-du- sous-titrage-a-destination-des-personnes-sourdes-ou-malentendantes-Decembre-2011 [last accessed September 02, 2017].
Del Pozo, Arantza. 2014. SUMAT Final Report. http://www.fp7-sumat-project.eu/uploads/2014/07/D1-5_Final- Report-June-2014.pdf [last accessed September 02, 2017].
Díaz Cintas, Jorge. 2014. Technological Strides in Subtitling. In Chan, S-W, editor in chief, Routledge Encyclopedia of Translation Technology. Routledge, London, pages 632-643.
Díaz Cintas, Jorge. 2003. Teoría y práctica de la subtitulación: inglés-español. Barcelona: Ariel. Docebo. 2014. E-Learning Market Trends and Forecast 2017-2021. https://www.docebo.com/resource/elearning-market-trends-and-forecast-2017-2021/ [last accessed September 02, 2017].
Ford Williams, Gareth. 2009. Inline Subtitling Editorial Guidelines V1.1. http://www.bbc.co.uk/guidelines/futuremedia/accessibility/subtitling_guides/online_sub_editorial_guidelines _vs1_1.pdf [last accessed September 02, 2017].
Ivarsson, Jan, and Carroll, Mary. 1998. Code of good subtitling practice. In Language Today, April: 1-3. https://www.esist.org/wp-content/uploads/2016/06/Code-of-Good-Subtitling-Practice.PDF.pdf [last accessed September 02, 2017].
Karamitroglou, Fotios. 1998. A Proposed Set of Subtitling Standards in Europe. In Translation Journal, 2:2. http://translationjournal.net/journal/04stndrd.htm [last accessed September 02, 2017].
Melero, Maite, Oliver, Antoni, Badia, Toni. 2006. Automatic Multilingual Subtitling in the e-TITLE Project. http://www.mt-archive.info/Aslib-2006-Melero.pdf [last accessed September 02, 2017].
Munteanu, Cosmin, Baecker, Ron, and Penn, Gerald. 2008. Collaborative editing for improved usefulness and usability of transcript-enhanced webcasts. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, Italy, pages 373-382.
MUSA IST Project. http://sifnos.ilsp.gr/musa/ [last accessed September 02, 2017].
Patiniotaki, Emmanouela. 2013. Assistive Technology and Audiovisual Translation: A key combination for Access Services in Online Education. In A Global Village 11: 52-55. http://aglobalvillage.org/journal/issue11/e-democracy/emmanuela-patiniotaki/ [last accessed September 02, 2017].
Valor Miró, Juan Daniel, Silvestre-Cerdà, Joan Albert, Civera, Jorge, Turró, Carlos, and Juan, Alfons. 2015. Efficiency and usability study of innovative computer-aided transcription strategies for video lecture repositories. In Elsevier's Speech Communication journal, 74: 65-75.
Zhang, Dongsong, Zhou, Lina, Briggs, Robert O., and Nunamaker Jr., Jay F. 2006. Instructional video in e- learning: Assessing the impact of interactive video on learning effectiveness. Information and Management, 43: 15-27.
Corpas Pastor, G. and Fern, L. 2016. "A survey of interpreters' needs and practices related to language technology". Technical paper [FFI2012-38881-MINECO/TI-DT-2016-1]. University of Malaga.
Corpas Pastor, G. and Seghiri, M. (eds.) 2016. Corpus-based Approaches to Translation and Interpreting: From Theory to Applications. Frankfurt: Peter Lang.
Costa, H.; Corpas Pastor, G. and Durán Muñoz, I. 2014. "Technology-assisted Interpreting". vol. 25, nº 3, MultiLingual 143. April/May. 27-32.
Costa, H.; Corpas Pastor, G. and Durán Muñoz, I. (2018/in press). Assessing Terminology Management Systems for Interpreters". In G. Corpas Pastor and I. Durán Muñoz (eds.) Trends in e-tools and resources for translators and interpreters. Leiden: Brill. 57-84.
Fantinuoli, C. 2018/in press. "Computer-assisted interpreting: Challenges and Future Perspectives". In G. Corpas Pastor and I. Durán Muñoz (eds.) Trends in e-tools and resources for translators and interpreters. Leiden: Brill. 153-174.
Orlando, M. 2010. "Digital pen technology and consecutive interpreting: another dimension in note-taking training and assessment". The Interpreters' Newsletter 15. 71-86.
Pöchhacker, F. (ed.). 2015. Routledge Encyclopedia of Interpreting Studies. London: Routledge.
Sandrelli, A. 2005. "Designing CAIT (Computer-Assisted Interpreter Training) Tools: Black Box." MuTra 2005 -Challenges of Multidimensional Translation: Conference Proceedings. Saarbrücken, 2-6 May 2005.
Sandrelli, A. and Hawkins, J. 2006. "From Black Box to the Virtual Interpreting Environment (VIE): another step in the development of Computer Assisted Interpreter Training." The Future of Conference Interpreting: Training, Technology and Research. University of Westminster, 30 June -1 July 2006. London.
Sandrelli, A. 2015. "Becoming an interpreter: the role of computer technology". MonTI. Monografías de Traducción e Interpretación 2. 111-138.
Straniero, S. and Falbo, C. (eds.) 2012. Breaking Ground in Corpus-based Interpreting Studies. Frankfurt: Peter Lang. References
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, Ł., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. (2015). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org.
Becerra-Bonache, L. & Jiménez López, M.D (2016). Could Machine Learning Shed Light on Natural Language Complexity? Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity, pages 1-11, Osaka, Japan, December 11-17.
Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR 2015, pages 3104-3112, San Diego, California, USA.
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473. Retrieved from http://arxiv.org/abs/1409.0473
Bérard, A., Pietquin, O., Besacier, L., Servan, C. Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation. NIPS Workshop on end-to-end learning for speech and audio processing, Dec 2016, Barcelona, Spain. 2016. 〈hal-01408086〉
Cho, K., Merrienboer, B. van, Gülçehre, Ç., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. CoRR, abs/1406.1078. Retrieved from http://arxiv.org/abs/1406.1078
Chorowski, J. K., Bahdanau, D., Serdyuk, D., Cho, K., and Bengio, Y. (2015). Attention-Based Models for Speech Recognition. In Advances in Neural Information Processing Systems (NIPS 2015), pages 577-585, Montréal, Canada.
Dupoux, E. (2016). Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner. CoRR, abs/1607.08723. Retrieved from http://arxiv.org/abs/1607.08723
Ehrensberger-Dow, M., Göpferich, S., & O'Brien, S. (2015). Interdisciplinarity in Translation and Interpreting Process Research. John Benjamins Publishing Company.
Jean, S., Cho, K., Memisevic, R., & Bengio, Y. (2014). On Using Very Large Target Vocabulary for Neural Machine Translation. CoRR, abs/1412.2007. Retrieved from http://arxiv.org/abs/1412.2007
Kingma, D. and Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
Koskinen, Kaisa et Minna Ruokonen, Love letters or hate mail? Translators' technology acceptance in the light of their emotional narratives. In D. Kenny (Ed.), Human issues in translation technology, Londres et New York, Routledge, 2017, p. 8-24.
Knowles, R., & Koehn, P. (2016). Neural interactive translation prediction. AMTA 2016, Vol., 107. Philipp Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin,A., and Herbst, E., 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 177- 180, Prague, Czech Republic, June. Association for Computational Linguistics.
Koehn, P. (2017). Introduction to Neural Machine Translation, webinar du 24 janvier 2017, Webinar series by Omniscien Technologies.
Koehn, P. et al. (2013). Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation (CASMACAT), Final Public Report. http://www.casmacat.eu/uploads/Deliverables/final-public-report.pdf [consulté le 26 janvier 2017].
Kublička, F., Toral, A., Sanchez-Cartagena, V. (2017) Fine-grained human evaluation of neural versus phrase- based machine translation. The Prague Bulletin of Mathematical Linguistics. Available from: https://www.researchgate.net/publication/317304955_Fine- Grained_Human_Evaluation_of_Neural_Versus_Phrase-Based_Machine_Translation Martikainen, H., & Kübler, N. (2016). Ergonomie cognitive de la post-édition de traduction automatique : enjeux pour la qualité des traductions. ILCEA. Revue de l'Institut des langues et cultures d'Europe, Amérique, Afrique, Asie et Australie, (27). Consulté à l'adresse https://ilcea.revues.org/3863
O'Brien, S. (2012) Translation as human-computer interaction. Translation Spaces, Vol. 1(1), pages 101-122.
Rossi, C. (submitted) 'Introducing statistical machine translation in translator training: from uses and perceptions to course design, and back again' Tradumatica.
Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems (NIPS 2014), pages 3104-3112, Montre ál, Canada.
Vilar, D. Xu, J., D'Haro L. F., et al., 2006. Error analysis of statistical machine translation output. In : Proceedings of LREC. 2006. p. 697-702.
Zaremba, W., Sutskever, I., and Vinyals, O. (2014). Recurrent neural network regularization. arXiv preprint arXiv:1409.2329.
Berber-Irabien, Diana-Cristina. 2010. Information and Communication Technologies in Conference Interpreting. Lambert Academic Publishing.
Biagini, Giulio. 2016. Printed glossary and electronic glossary in simultaneous interpretation: a comparative study. Master's thesis, Università degli studi di Trieste.
Corpas Pastor, Gloria and Lily May Fern. 2016. A survey of interpreters' needs and their practices related to language technology. Technical report, Universidad de Málaga.
Fantinuoli, Claudio. 2016. InterpretBank. Redefining computer-assisted interpreting tools. In Proceedings of the Translating and the Computer 38 Conference. Editions Tradulex, London, pages 42-52.
Fantinuoli, Claudio. 2017a. Computer-assisted interpreting: challenges and future perspectives. In Isabel Durán Muñoz and Gloria Corpas Pastor, editors, Trends in e-tools and resources for translators and interpreters, Brill, Leiden.
Fantinuoli, Claudio. 2017b. Computer-assisted preparation in conference interpreting. Translation & Interpreting 9(2).
Gile, Daniel. 2009. Basic Concepts and Models for Interpreter and Translator Training: Revised edition. John Benjamins Publishing Company, Amsterdam, 2nd edition.
Hansen-Schirra, Silvia. 2012. Nutzbarkeit von Sprachtechnologien für die Translation. trans-kom 5(2):211-226.
Jurafsky, Dan and James H. Martin. 2009. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall series in artificial intelligence. Pearson Prentice Hall, Upper Saddle River, N.J, 2nd edition.
Kalina, Sylvia. 2005. Quality Assurance for Interpreting Processes. Meta: Journal des traducteurs 50(2):768.
Orlando, Marc. 2014. A study on the amenability of digital pen technology in a hybrid mode of interpreting: Consec-simul with notes. Translation & Interpreting 6(2).
Prandi, Bianca. 2015. The use of CAI tools in interpreters' training: a pilot study. In Proceedings of the 37 conference Translating and the Computer. Editions Tradulex, London, pages 48-57.
Prandi, Bianca. 2017. Designing a multimethod study on the use of cai tools during simultaneous interpreting. In Translating and the Computer 39. London.
Pöchhacker, Franz. 2016. Introducing Interpreting Studies. Routledge, 2nd edition.
Setton, Robin and Andrew Dawrant. 2016. Conference interpreting: a complete course. Number volume 120 in Benjamins translation library (BTL). John Benjamins Publishing Company, Amsterdam & Philadelphia.
Tripepi Winteringham, Sarah. 2010. The usefulness of ICTs in interpreting practice. The Interpreters' Newsletter 15:87-99.
Will, Martin. 2015. Zur Eignung simultanfähiger Terminologiesysteme für das Konferenzdolmetschen. trans-kom 8(1):179-201.
Xu, Ran. 2015. Terminology Preparation for Simultaneous Interpreters. PhD thesis, University of Leeds.
Yu, Dong and Li Deng. 2015. Automatic speech recognition: a deep learning approach. Springer, London. References Hans van Halteren, 2008. Source Language Markers in EUROPARL Translations. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 937-944.
Behl, Holly. 2013a. The paperless interpreter experiment: Part I. http://www.paperlessinterpreter.com/paperless- interpreter-part-i/
Behl, Holly. 2013b. The paperless interpreter experiment: Part II. http://www.paperlessinterpreter.com/paperless-interpreter-part-ii/
Behl, Holly. 2015. The paperless interpreter experiment Part III: Microsoft Surface Pro 4. http://www.paperlessinterpreter.com/the-paperless-interpreter-experiment-part-iii-microsoft-surface-pro-4/
Creswell, John W., & Plano Clark, Vicki L. 2011. Designing and conducting mixed methods research (2nd ed.). Thousand Oaks, California: SAGE Publications.
Creswell, John W., Plano Clark, Vicki L., Gutmann, Michelle L., & Hanson, William E. 2003. Advanced mixed methods research designs. In Abbas Tashakkori & Charles Teddlie (Eds.), Handbook of mixed methods in social and behavioral research (pp. 209-240). Thousand Oaks, CA: SAGE Publications.
Costa, Hernani, Corpas Pastor, Gloria & Durán Muñoz, Isabel. 2014. "A comparative user evaluation of terminology management tools for interpreters." In 25th International Conference on Computational Linguistics (COLING' 14), 4th International Workshop on Computational Terminology (CompuTerm'14), pp. 68-76, Dublin, Ireland.
Costa, Hernani, Corpas Pastor, Gloria & Durán Muñoz, Isabel. 2014. "Technology-assisted interpreting." Multilingual 143, 27-32.
Drechsel, Alexander. 2013a. The tablet interpreter. http://vkdblog.files.wordpress.com/2013/03/tabletinterpreter- public.pdf
Drechsel, Alexander. 2013b, November 20. iPad interpreter. http://www.youtube.com/watch?v=qk3RNDGpe0Y&list=PLklixbOFpKxodoeh8lua0Zh9BkeI4GwLo&index
Drechsel, Alexander. 2017. The tablet interpreter. (2017 edition). https://static1.squarespace.com/static/52d4015ce4b0eab6f2d76b6f/t/594b8b7a414fb54310f5957d/149812313 2497/The+Tablet+Interpreter+Manual.pdf
Drechsel, Alexander & Behl, Holly. 2016. Kiss paper goodbye: Tablet technology for consecutive and simultaneous interpreting. Paper presented at the ATA 57 th Annual Conference, San Francisco, California.
Drechsel, Alexander & Goldsmith, Joshua. Forthcoming. Tablet Interpreting: The use of mobile devices in interpreting. In CIUTI-Forum 2016: Equitable Education through intercultural communication: Role and responsibility for non-state actors (eds. Forstner, Martin & Lee-Jahnke, Hannelore). Frankfurt am Main: Peter Lang.
El-Metwally, Maha. 2017. Consec-Simo as a tool for Consecutive Interpreting. Webinar presented online through eCPD webinars.
Goldsmith, Joshua. Forthcoming 2018. Tablet interpreting: Consecutive interpreting 2.0 for Public Service Interpreters. Translation and Interpreting Studies 13(3).
Goldsmith, Joshua & Drechsel, Alexander. 2015a. The tablet interpreter. Talk presented at the 2015 CIUTI Conference, Geneva, Switzerland.
Goldsmith, Joshua & Drechsel, Alexander. 2015b. Is there an app for that? Getting the most out of tablets in community interpreting. Workshop presented at the 2015 Critical Link Conference, Edinburgh.
Goldsmith, Joshua & Drechsel, Alexander. 2016. Tablet interpreting: Tips, tools and applications to make the most of your tablet while interpreting. Webinar presented at the Proz 2016 Virtual Conference for International Translation Day.
Goldsmith, Joshua & Holley, Josephine. 2015. Consecutive Interpreting 2.0: The Tablet Interpreting Experience." (Unpublished MA thesis.) University of Geneva.
Hamidi, Miriam & Pöchhacker, Franz. 2007. "Simultaneous consecutive interpreting: A new technique put to the test." Meta: Journal des traducteurs (52.2), 276-289.
Hof, Michelle. 2012. iPad: The ideal boothmate. http://aiic.net/p/6354.
Lozano, Luis, García-Cueto, Eduardo & Muñiz, José. 2008. "Effect of the number of response categories on the reliability and validity of rating scales." Methodology 2008:4, 73-79.
Moors, Guy. 2007. "Exploring the effect of a middle response category on response style in attitude management." Quality & Quantity 42:6, 779-794.
Muñiz, José, García-Cueto, Eduardo, & Lozano, Luis. 2005. "Item format and the psychometric properties of the Eysenck Personality Questionnaire." Personality and Individual Differences, 38, 61-69.
Oceguera López, Patricia. 2017. El uso de aplicaciones para tablets en la toma de notas del intérprete. (Unpublished BA thesis.) Universidad Autónoma de Baja California, Mexico.
Orlando, Marc. 2010. "Digital Pen Technology and Consecutive Interpreting: Another Dimension in Note- Taking Training and Assessment." The Interpreters' Newsletter 15, 71-86.
Orlando, Marc. 2014. "A study on the amenability of digital pen technology in a hybrid mode of interpreting: Consec-simul with notes." International Journal of Translation and Interpreting Research 6(2), 39-54. http://www.trans-int.org/index.php/transint.
Orlando, Marc. 2015. "Implementing digital pen technology in the consecutive interpreting classroom." In Andres, Dorte & Behr, Martina (eds.). To Know How to Suggest ... Approaches to Teaching Conference Interpreting, 171-200. Berlin: Frank & Timme.
Orlando, Marc. 2015. "Digital pen technology and interpreting training, practice and research: Status and trends." In S. Erlich and J. Napier (Eds.), Interpreter education in the digital age: Innovation, access and change, 125-152. Washington, DC: Gallaudet University Press.
Orlando, Marc. 2016. Training 21 st century translators and interpreters: At the crossroads of practice, research and pedagogy. Berlin: Frank & Timme.
Paone, Matteo Domenico. 2016. Mobile Geräte beim Simultandolmetschen mit besonderem Bezug auf Tablets (Unpublished MA thesis.) University of Vienna, Austria.
Rosado, Tony. 2013. Note-taking with iPad: Making our life easier. http://rpstranslations.wordpress.com/2013/05/28/note-taking-with-ipad-making-our-life-easier-2/
Scott, Juliette. 2012. One interpreter's road kit. http://www.catherinetranslates.com/interpreter-road-kit Sullivan, Gail, & Artino, Anthony Jr. 2013. "Analyzing and interpreting data from Likert-type scales." Journal of Graduate Medical Education 5(4): 541-2.
Will, Martin. 2015. "Zur Eignung simultanfähiger Terminologiesysteme für das Konferenzdolmetschen." trans- kom 8(1), 179-201.
DePalma, Don. 2013. Post-editing in practice. In tcworld e-magazine. http://www.tcworld.info/e- magazine/translation-and-localization/article/post-editing-in-practice/ [last accessed October 16, 2017].
KantanMT blog. 2014. Post-Editing Machine Translation. https://kantanmtblog.com/2014/08/20/post-editing- machine-translation/ [last accessed October 16, 2017].
Kelly, Nataly. 2014. Why So Many Translators Hate Translation Technology. Article on HuffPost. https://www.huffingtonpost.com/nataly-kelly/why-so-many-translators-h_b_5506533.html [last accessed October 16, 2017].
Memsource blog. 2015. Post-editing in Memsource. https://www.memsource.com/blog/2015/08/18/post-editing- in-memsource/ [last accessed October 16, 2017].
O'Brien, Sharon and Joss Moorkens. 2014. Towards Intelligent Post-Editing Interfaces. In Proceedings of FIT XXth World Congress 2014, 4-6 Aug 2014, Berlin, Germany, pages 131-137.
Stafilia, Dimitra. 2016. Translating Europe Forum 2016: A lot of talk about technology. Post on the PEEMPIP blog. http://blog.peempip.gr/tef-2016-technology/ [last accessed October 16, 2017].
References Braun, Sabine. 2015. 'Remote Interpreting'. In The Routledge Handbook of Interpreting, edited by Holly Mikkelson and Renee Jourdenais, 352-67. London/New York: Routledge Taylor and Francis Group.
Braun, Sabine, and Catherine Slater. 2014. 'Populating a 3D Virtual Learning Environment for Interpreting Students with Bilingual Dialogues to Support Situated Learning in an Institutional Context'. The Interpreter and Translator Trainer 8 (3): 469-85. doi:10.1080/1750399X.2014.971484.
Class, Barbara, and Barbara Moser-Mercer. 2013. 'Training Conference Interpreter Trainers with Technology -a Virtual Reality'. In Quality in Interpreting: Widening the Scope, Volume 1, edited by O. Garcia Becerra, E.M. Pradas Macias, and R. Barranco-Droege, 293-313. Granada: Editorial Comares S.L. González-Davies, Maria, and Vanessa Enríquez-Raído. 2016. 'Situated Learning in Translator and Interpreter Training: Bridging Research and Good Practice'. The Interpreter and Translator Trainer 10 (1): 1-11. doi:10.1080/1750399X.2016.1154339.
Kajzer-Wietzrzny, Marta, and Maria Tymczynska. 2014. 'Integrating Technology into Interpreter Training Courses: A Blended Learning Approach'. Edited by Maria Piotrowska and Sergiy Tyupa. inTRAlinea. http://www.intralinea.org/specials/article/2101.
Kiraly, Donald C., ed. 2016. Towards Authentic Experiential Learning in Translator Education. Göttingen: V&R unipress/Mainz University Press.
Olvera-Lobo, María Dolores, Bryan Robinson, José A. Senso, Ricardo Muñoz-Martín, Eva Muñoz-Raya, Miguel Murillo-Melero, Enrique Quero-Gervilla, María Rosa Castro-Prieto, and Tomás Conde-Ruano. 2009. 'Teleworking and Collaborative Work Environments in Translation Training'. Babel 55 (2): 165-80. doi:10.1075/babel.55.2.05olv.
Ritsos, P.D., R. Gittins, Sabine Braun, Catherine Slater, and J.C. Roberts. 2013. 'Training Interpreters Using Virtual Worlds'. LNCS Transactions on Computational Science XVIII (7848): 21-40. doi:10.1007/978- 3-642-38803-3_2.
Sandrelli, Annalisa, and Jesús de Manuel Jerez. 2007. 'The Impact of Information and Communication Technology on Interpreter Training'. The Interpreter and Translator Trainer 1 (2): 269-303. doi:10.1080/1750399X.2007.10798761.
Baziotis, Christos, Nikos Pelekis, and Christos Doulkeridis. 2017. Datastories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis. In Proceedings of the 11th International Workshop on Semantic Evaluation. Vancouver, Canada, pages 747-754.
Bojar, Ondřej, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shujian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, and Marco Turchi. 2017. Findings of the 2017 conference on machine translation (WMT17). In Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers. Copenhagen, Denmark, pages 169-214.
Cadwell, Patrick, Sharon O'Brien, and Carlos S. C. Teixeira. 2017. Resistance and accommodation: factors for the (non-) adoption of machine translation among professional translators. To appear in Perspectives, available online at http://www.tandfonline.com/doi/full/10.1080/0907676X.2017.1337210.
Cieliebak, Mark, Jan Milan Deriu, Dominic Egger, and Fatih Uzdilli. 2017. A Twitter corpus and benchmark resources for German sentiment analysis. In Proceedings of the 5th International Workshop on Natural Language Processing for Social Media. Valencia, Spain, pages 45-51.
Desjardins, Renée. 2016. Translation and Social Media: In Theory, in Training and in Professional Practice. Palgrave Macmillan.
Doherty, Stephen and Dorothy Kenny. 2014. The design and evaluation of a statistical machine translation syllabus for translation students. The Interpreter and Translator Trainer 8(2):295-315.
Dombek, Magdalena. 2014. A study into the motivations of internet users contributing to translation crowdsourcing: the case of Polish Facebook user-translators. Ph.D. thesis, Dublin City University.
Flanagan, Marian. 2016. Cause for concern? Attitudes towards translation crowdsourcing in professional translators' blogs. The Journal of Specialised Translation 25:149-173.
Gaspari, Federico, Hala Almaghout, and Stephen Doherty. 2015. A survey of machine translation competences: insights for translation technology educators and practitioners. Perspectives 23(3):333-358.
Green, Spence, Jeffrey Heer, and Christopher D. Manning. 2013. The efficacy of human post-editing for language translation. In Proceedings of the 2013 Conference on Human Factors in Computing Systems (CHI). Paris, France.
Guerberof Arenas, Ana. 2013. What do professional translators think about post-editing? The Journal of Specialised Translation 19:75-95.
Heard, Reiner. 2017. FIT position paper on machine translation. Babel 63(1):130-135.
Hochreiter, Sepp and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735-1780.
Jiménez-Crespo, Miguel A. 2017. Crowdsourcing and online collaborative translations: Expanding the limits of Translation Studies, volume 131 of Benjamins Translation Library. John Benjamins.
Koponen, Maarit. 2016. Is machine translation post-editing worth the effort? A survey of research into post-editing and effort. The Journal of Specialised Translation 25:131-148.
McDonough Dolmaya, Julie. 2011. A window into the profession: What translation blogs have to offer translation studies. The Translator 17(1):77-104.
Meijer, Siety. 1993. Attitudes towards machine translation. Language International 5(6):11-13.
O'Hagan, Minako. 2017. Deconstructing translation crowdsourcing with the case of a Facebook initiative. In Dorothy Kenny, editor, Human Issues in Translation Technology, Routledge, pages 25-44.
Pym, Anthony. 2011. What technology does to translating. Translation & Interpreting 3(1):1.
Rocktäschel, Tim, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, and Phil Blunsom. 2015. Reasoning about entailment with neural attention. arXiv preprint arXiv:1509.06664.
Rosenthal, Sara, Noura Farra, and Preslav Nakov. 2017. SemEval-2017 task 4: sentiment analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation. Vancouver, Canada, pages 502-518.
Sennrich, Rico. 2017. How grammatical is character-level neural machine translation? Assessing MT quality with contrastive translation pairs. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL). Valencia, Spain, pages 376-382.
Tyulenev, Sergey. 2015. Towards theorising translation as an occupation. Asia Pacific Translation and Intercultural Studies 2(1):15-29.
Blanchon, Hervé. 2004. Comment définir, mesurer et améliorer la qualité, l'utilisabilité et l'utilité des systèmes de TAO de l'écrit et de l'oral. Une bataille contre le bruit, l'ambiguïté, et le manque de contexte. Mémoire de HDR, 380 p., Université Joseph Fourier, Grenoble.
Blanchon, Hervé. 1992. A Solution to the Problem of Interactive Disambiguation. COLING 1992, 23-28 juillet. Nantes, France. 1233-1238.
Blanchon, Hervé. 1994. LIDIA-1 : une première maquette vers la TA Interactive "pour tous". Thèse de doctorat, Université Joseph Fourier -Grenoble I, Grenoble, 1994.
Blanchon, Hervé, and Boitet, Christian. 2007. Pour l'évaluation externe des systèmes de TA par des méthodes fondées sur la tâche. TAL, 2007: 33-65.
Boitet, Christian. 1995. Factors for success (and failure) in Machine Translation-some lessons of the first 50 years of R&D. Proc. MTS-V (Fifth Machine Translation Summit), 11-13 juillet, Luxembourg.
Boitet, Christian. 1976. Un essai de réponse à quelques questions théoriques et pratiques liées à la traduction automatique : définition d'un système prototype. Modélisation et simulation. Thèse de Doctorat d'État, Université Scientifique et Médicale de Grenoble, 250 p.
Boitet, Christian, and Blanchon, Hervé. 1994a. Promesses et problèmes de la "TAO pour tous". Après LIDIA-1, une première maquette. Langages. Le traducteur et l'ordinateur, sous la direction de Jean-René Ladmiral, 1994a: 20-47.
Boitet, Christian, and Blanchon, Hervé. 1994b. Multilingual Dialogue-Based MT for monolingual authors: the LIDIA project and a first mockup. Machine Translation 2, 99-132.
Chan, Andy Lung Jan. 2010. Perceived benefits of translator certification to stakeholders in the translation profession: A survey of vendor managers. Across Languages and Cultures 11, n° 1, 93-113.
Dam, Helle, and Zethsen, Karen. 2010. Translator status: Helpers and opponents in the ongoing battle of an emerging profession. Target 22, n° 2, 194-211.
Huynh, Cong Phap, Valérie Bellynck, Christian Boitet, et Hong Thai Nguyen. 2010. The iMAG concept: multilingual access gateway to an elected Web sites with incremental quality increase through collaborative post-edition of MT pretranslations. TALN-2010, Montréal.
Huynh, Cong-Phap. , multiannotés et multimédia. Thèse de doctorat, Université Joseph Fourier. Lardilleux, Adrien. 2010. Contribution des basses fréquences à l'alignement sous-phrastique multilingue : une approche différentielle. Thèse de doctorat, Université de Caen -Human-Computer Interaction.
Lemaire, Claire. 2016. Linguistic methodology to help German and French non-translator users to write bilingual specifications. 38. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft. Universität Konstanz.
Lemaire, Claire. 2017. Traductologie et traduction outillée : du traducteur spécialisé professionnel à l'expert métier en entreprise. Thèse de doctorat, Université Grenoble Alpes.
Nguyen, Hong-Thai. 2009. Des systèmes de TA homogènes aux systèmes de TAO hétérogènes. Interface homme-machine. Thèse de doctorat, Université Joseph-Fourier -Grenoble I.
Sérasset, Gilles & Boitet, Christian. 2000. On UNL as the future "html of the linguistic content" & the reuse of existing NLP components in UNL-related applications with the example of a UNL-French deconverter, Proc. COLING 2000, Saarbrücken, Germany, 7 p.
Slocum, Jonathan. 1985. A survey of machine translation: its history, current status, and future prospects. Computational linguistics 11.1, 1-17.
Vasconcellos, Muriel. 1995. Advanced software applications in Japan. Elsevier.
Vasconcellos, Muriel. 1993. The Present State of Machine Translation Usage Technology; Or: How Do I Use Thee? Let Me Count the Ways. MT Summit IV, July 20-22. Kobe, Japan. 47-62.
Zhang, Ying. 2016. Modèles et outils pour des bases lexicales "métier" multilingues et contributives de grande taille, utilisables tant en traduction automatique et automatisée que pour des services dictionnairiques variés. Thèse de doctorat, Université Grenoble Alpes.
Berber Irabien, Diana. 2010. Information and Communication Technologies in Conference Interpreting. Lambert Academic Publishing.
Biagini, Giulio. 2015. Glossario cartaceo e glossario elettronico durante l'interpretazione simultanea: uno studio comparativo (MA thesis). Università di Trieste.
Bilgen, Baris. 2009. Investigating Terminology Management for Conference Interpreters (MA thesis). University of Ottawa.
Corpas Pastor, Gloria and Lily May Fern. 2016. A survey of interpreters' needs and their practices related to language technology. Technical report, Universidad de Málaga.
Costa, Hernani, Gloria Corpas Pastor, and Isabel Durán Muñoz. 2014a. A comparative user evaluation of terminology management tools for interpreters. In Proceedings of the 4th International Workshop on Computational Terminology, pages 68-76.
Costa, Hernani, Gloria Corpas Pastor, and Isabel Durán Muñoz. 2014b. Technology-assisted interpreting. MultiLingual, 143, 25(3):27-32.
De Merulis, Gianpiero. 2013. L'uso di InterpretBank per la preparazione di una conferenza sul trattamento delle acque reflue: glossario terminologico e contributo sperimentale (MA thesis). Università di Bologna.
Fantinuoli, Claudio. 2009. InterpretBank: Ein Tool zum Wissensmanagement für Simultandolmetscher. In Wolfram Baur, Sylvia Kalina, Felix Mayer and Jutta Witzel (Eds.). Übersetzen in die Zukunft: Herausforderungen der Globalisierung für Dolmetscher und Übersetzer: Tagungsband der Internationalen Fachkonferenz des Bundesverbandes der Dolmetscher und Übersetzer e.V., BDÜ, Berlin, pages 411-417.
Fantinuoli, Claudio. 2012. InterpretBank. Design and implementation of a terminology and knowledge management software for conference interpreters. Epubli/Johannes Gutenberg-Universität Mainz, Berlin.
Fantinuoli, Claudio. 2016. InterpretBank. Redefining computer-assisted interpreting tools. In Proceedings of the Translating and the Computer 38 Conference in London. Editions Tradulex, Geneva, pages 42-52.
Fantinuoli, Claudio. 2017a. Computer-assisted preparation in conference interpreting. In Translation & Interpreting, 9(2).
Fantinuoli, Claudio. 2017b. Speech recognition in the interpreter workstation. In Proceedings of the Translating and the Computer 39 Conference in London, London.
Gacek, Michael. 2015. Softwarelösungen für DolmetscherInnen (MA thesis). University of Vienna.
Gile, Daniel. 1988. Le partage de l'attention et le 'modèle d'effort' en interprétation simultanée. The Interpreters' Newsletter, 1:4-22.
Gile, Daniel. 1997. Conference interpreting as a cognitive management problem. In Joseph H. Danks, Gregory M. Shreve, Stephen B. Fountain and Michael McBeath (Eds.). Cognitive Processes in Translation and Interpretation. Sage, London, pages 196-214.
Gile, Daniel. 1999. Testing the Effort Models' tightrope hypothesis in simultaneous interpreting -A contribution". Hermes -Journal of Linguistics 23.
Kahneman, Daniel. 1973. Attention and Effort. Prentice-Hall, Englewood Cliffs, NJ.
Prandi, Bianca. 2015a. L'uso di InterpretBank nella didattica dell'interpretazione: uno studio esplorativo (MA thesis). Università di Bologna.
Prandi, Bianca. 2015b. The use of CAI tools in interpreters' training: a pilot study. In Proceedings of the Translating and the Computer 37 Conference, London.
Prandi, Bianca. 2016. Analysis of the impact of CAI tools on simultaneous interpreting with a focus on cognitive processes and terminology consistency. Poster presented at TRA&CO Symposium, Johannes Gutenberg University Mainz/Germersheim.
Prandi, Bianca. 2017. Investigating cognitive load in simultaneous interpreting with the support of terminology management tools. Poster presented at 5 th Polish Eye Tracking Conference, Lublin.
Rütten, Anja. 28 April 2015. Summary table of terminology tools for interpreters / Übersichtstabelle Terminologietools für Dolmetscher / cuadro sinóptico de programas de gestión de terminología para intérpretes. http://blog.sprachmanagement.net/?p=706 [last accessed September 24, 2017]
Seeber, Kilian G. 2007. Thinking outside the cube: Modelling language processing tasks in a multiple resource paradigm. Interspeech 2007, Antwerp, pages 1382-1385.
Seeber, Kilian G. 2011. Cognitive load in simultaneous interpreting: Existing theories -new models. Interpreting 13 (2):176-204.
Seeber, Kilian G. and Dirk Kerzel. 2011. Cognitive load in simultaneous interpreting: Model meets data. Special issue of the International Journal of Bilingualism 16 (2):228-242.
Setton, Robin. 2003. Models of the interpreting process. In Angela Collados Aís and José Antonio Sabio Panilla (Eds.). Avances en la investigación sobre la interpretación (Advances in research on interpreting), Editorial Comares, Granada, pages 29-91.
Setton, Robin. 2013. Models of Interpreting. In Carol A. Chapelle (Ed.). The Encyclopaedia of Applied Linguistics. Wiley-Blackwell, Oxford, UK.
Setton, Robin. 2016. Models. In Franz Pöchhacker (Ed.). Encyclopaedia of Interpreting Studies. Routledge, London, pages 263-268.
Shannon, Claude E., and Warren Weaver. 1949. The Mathematical Theory of Communication. University of Illinois Press, Urbana.
Wickens, Christopher D. 1984. Processing resources in attention. In Raja Parasuraman and David R. Davies (Eds.). Varieties of attention. Academic Press, New York, pages 63-102.
Wickens, Christopher D. 2002. Multiple resources and performance prediction. Theoretical issues in ergonomics science, 3(2), pages 159-177.
Will, Martin. 2015. Zur Eignung simultanfähiger Terminologiesysteme für das Konferenzdolmetschen. Trans- Kom, 8(1):179-201.
Xu, Ran. 2015. Terminology Preparation for Simultaneous Interpreters (PhD thesis). University of Leeds. References
Alabau V, Buck C, Carl M, Casacuberta F, Garcia-Martinez M, Germann U, Gonzalez-Rubio J, Hill R, Leiva L, Mesa-Lao B, Ortiz D, Saint-Amand H, Sanchis G and Tsoukala C (2014), "CASMACAT: A Computer-assisted Translation Workbench", In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics., 4, 2014. , pp. 25-28.
Eisele A, Federmann C, Saint-Armand H, Jellinghaus M, Herrmann T and Chen Y (2008), "Using Moses to Integrate Multiple Rule-Based Machine Translation Engines into a Hybrid System", In Proceedings of the Third Workshop on Statistical Machine Translation., 6, 2008. , pp. 179-182.
Hawkins J and Blakeslee S (2005), "On Intelligence" New York, Owl Books.
Koehn P, Och F J and Marcu D (2003), "Statistical Phrase-Based Translation", In Proceedings of HLT-NAACL 2003, Main Papers., 5, 2003. , pp. 48-54.
Kurzweil R (2012), "How to create a mind", Viking Penguin.
Nagle P (2017), "Get the Best from Neural MT with Quality Data". KantanMT, URL: https://kantanmtblog.com/ 2017/08/04/get-the-best-from-neural-mt-with-quality-data/ Retrieved 14:32 1-10-2017.
Ortiz-Martinez D and Casacuberta F (2014), "The New THOT Toolkit for Fully-Automatic and Interactive Statistical Machine Translation", In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics., 4, 2014. , pp. 45-48.
Rees N and Riding J (2009), "Automatic Concordance Creation for Texts in Any Language", In Proceedings of Translation and the Computer 31. IMI/ASLIB.
Riding J (2007), "A relational method for the automatic analysis of highly-inflectional agglutinative morphologies". Thesis at: Oxford Brookes University (MPhil).
Riding J (2008), "Statistical Glossing, Language Independent Analysis in Bible Translation", In Translating and the Computer 30. ASLIB/IMI.
Riding J and van Steenbergen G (2011), "Glossing Technology in Paratext 7", The Bible Translator. Vol. 62(2), pp. 92-102.
Sanchez-Cartagena VM, Perez-Ortiz JA and Sanchez-Marinez F (2016), "Integrating Rules and Dictionaries from Shallow-Transfer Machine Translation into Phrase-Based Statistical Machine Translation", Journal for Artificial Intelligence Research, 1, 2016. (55)
Schone P and Jurafsky D (2000), "Knowledge-Free Induction of Morphology Using Latent Semantic Analysis", In Proceedings of CoNLL-2000 and LLL-2000. Lisbon , pp. 67-72.
Shterimov D, Nagle P, Casanellas L, Superbo R and O'Dowd T (2017), "Empirical evaluation of NMT and PBSMT quality for large-scale translation production". Thesis at: KantanMT.
Staley, J. (1986) "The Structure of John's Prologue: Its Implications for the Gospel's Narrative Structure", The Catholic Biblical Quarterly, 241-264
Tomasello M (2003), "Constructing a Language" Cambridge Mass., Harvard University Press.
Wendland E (2008), "Contextual Frames of Reference in Translation" Manchester & Kinderhook, St Jerome Publishing.
Wu Y, Schuster M, Chen Z, Le QV and Norouzi M (2016), "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation". Thesis at: Google., 10, 2016.
Cognate: deus, deum? 2.
Cognate: ipso, ipsum? Cognate: facta, factum? Morph: _um?
Stem: de_? Stem: ips_? Stem: fact_?
Cognate: tenebrae, tenebris?
Name: Iohannes? Cognate: non, nomen? Gloss: de* = θε*
Cognate: omnes, omnia? Stem: e_t?
Cognate: hic, hoc? Cognate: ille, illum?
Cognate: hominem, hominum? Stem: homin_
Cognate: facta, factum, factus? Gloss: fact* = ποι* Cognate: mundo, mundum, mundus? Stem: mund_? Gloss: mund* = κοσμ* References AIIC. 2017. Workshop "Dolmetscher für Dolmetscher. https://aiic.de/event/8-dolmetscher-fuer-dolmetscher- workshop/ [last accessed September 20, 2017].
Rütten, Anja. 2015. "Summary table of terminology tools for interpreters". www.termtools.dolmetscher-wissen- alles.de [last accessed September 20, 2017].
Sand, Peter. 2003. "Manage your Terminology with Interplex". https://aiic.net/page/1321/manage-your- terminology-with-interplex/lang/1. In: The AIIC Webzine [last accessed September 20, 2017].
Wagener, Leonie. 2012. "Vorbereitende Terminologiearbeit im Konferenzdolmetschen unter besonderer Berücksichtigung der Zusammenarbeit im Dolmetschteam" [Preparatory Terminlogy Work in Conference Interpreting with special Emphasis on Cooperation in Teams of Interpreters]. Master thesis at the University of Applied Sciences Cologne, Faculty of Information and Communication Sciences, Institute for Translation and Multilingual Communication.
Zetzsche, Jost. 2017. "The Translator's Tool Box for Translators -A Computer Primer". Version 13, March 2017. International Writers' Group, LLC.
Biçici, Ergun and Marc Dymetman. 2008. Dynamic translation memory: using statistical machine translation to improve translation memory fuzzy matches. CICLing, Haifa, Israel, pages 454-465.
7Subset of 1.7 million sentences (Steinberger et al., 2013).
000 sentences. Provided through a confidentiality agreement.
Bowker, Lynne. 2002. Working Together: A Collaborative Approach to DIY Corpora. In The First International Workshop on Language Resources for Translation Work and Research, pages 29-36.
Brooke, John. 1996. SUS: A quick and dirty usability scale. In P. W. Jordan et al. (Eds.), Usability Evaluation in Industry, 189 (194). Taylor and Francis, London, pages 4-7.
Cadwell, Patrick, Sharon O'Brien, and Carlos S. C. Teixeira. 2017. Resistance and accommodation: factors for the (non-) adoption of machine translation among professional translators. In Perspectives Studies in Translatology, pages 1-21.
Delpech, Estelle. 2014. Leveraging Comparable Corpora for Computer-assisted Translation. In Comparable Corpora and Computer-Assisted Translation. John Wiley & Sons, Inc., Hoboken, NJ, USA, pages 1-39.
Ehrensberger-Dow, Maureen and Gary Massey. 2014. Cognitive ergonomic issues in professional translation. In: Schwieter, John W. & Aline Ferreira (eds), The Development of Translation Competence: Theories and Methodologies from Psycholinguistics and Cognitive Science. Newcastle upon Tyne: Cambridge Scholars Publishing, pages 58-86.
Ehrensberger-Dow, Maureen and Sharon O'Brien. 2015. Ergonomics of the translation workplace: Potential for cognitive friction. In Special Issue of Translation Spaces, 4 (1), pages 98-118.
Koehn, Philipp. 2009. Statistical machine translation. Cambridge: Cambridge University Press.
Koehn, Philipp and Barry Haddow. 2009. Interactive Assistance to Human Translators Using Statistical Machine Translation Methods. In Machine Translation Summit XII , pages 73-80.
Koehn, Philipp, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th ACL, demonstration session, pages 177-180.
Koehn, Philipp and Jean Senellart. 2010. Convergence of translation memory and statistical machine translation. In Proceedings of the 2nd Joint EM+/CNGL Workshop "Bringing MT to the User: Research on Integrating MT in the Translation Industry"(JEC '10), pages 21-31.
Krüger, Ralph. 2016. Contextualising Computer-Assisted Translation Tools and Modelling Their Usability. In Trans-Kom -Journal of Translation and Technical Communication Research, 9(1), pages 114-148.
Lagoudaki, Elina 2009. Translation editing environments. In MT Summit XII: Workshop Beyond Translation Memories.
Lavie, Alon and Abhaya Agarwal. 2007. METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of WMT, pages 228-231.
Levenshtein, Vladimir. 1966. Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. In Soviet Physics-Doklady, 10, pages 707-710.
Li, Liangyou, Carla P. Escartín, Andy Way, and Qun Liu. 2017. Combining translation memories and statistical machine translation using sparse features. In Machine Translation, 30(3), pages 183-202.
Macken, Lieve, Els Lefever, and Véronique Hoste. 2013. TExSIS: Bilingual Terminology Extraction from Parallel Corpora Using Chunk-based Alignment. In Terminology, 19 (1), John Benjamins Publishing Company, Amsterdam, Netherlands, pages 1-30.
Moorkens, Joss and Sharon O'Brien. 2016. Assessing user interface needs of post-editors of machine translation. In Human Issues in Translation Technology: The IATIS Yearbook, pages 109-130.
Och, Franz J. and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. In Computational Linguistics, 29(1), pages 19-51.
Ortega, John E., Felipe Sánchez-Martınez, and Mikel L. Forcada. 2016. Fuzzy-match repair using black-box machine translation systems: what can be expected? In Proceedings of AMTA, Vol. 1, pages 27-39.
Sanchis-Trilles, Germán, Vicent Alabau, Christian Buck, Michael Carl, Francisco Casacuberta, Mercedes García- Martínez, Ulrich Germann, Jesús González-Rubio, Robin L. Hill, Philipp Koehn, Luis A. Leiva, Bartolomé Mesa-Lao, Daniel Ortiz-Martínez, Herve Saint-Amand, Chara Tsoukala, Enrique Vidal. 2014. Interactive translation prediction versus conventional post-editing in practice: a study with the CasMaCat workbench. In Machine Translation, 28(3-4), pages 217-235.
Steinberger, Ralf, Andreas Eisele, Szymon Klocek, Spyridon Pilos, and Patrick Schlüter. 2013. DGT-TM: A freely available translation memory in 22 languages. In Proceedings of LREC, pages 454-459.
Steurs, Frieda, Iulianna van der Lek-Ciudin, and Tom Vanallemeersch. 2016. How translators work in real-life: SCATE observations. In Translating for Europe Forum. Brussels, Belgium, 27-28 October 2016.
Teixeira, Carlos S. C. 2014. Perceived vs. measured performance in the post-editing of suggestions from machine translation and translation memories. In Third Workshop on Post-Editing Technology and Practice, pages 45-59.
Tezcan, Arda, Véronique Hoste, Bart Desmet, and Lieve Macken. 2015. UGENT-LT3 SCATE System for Machine Translation Quality Estimation. In Proceedings of WMT, pages 353-360.
Tiedemann, Jörg. 2009. News from OPUS-A collection of multilingual parallel corpora with tools and interfaces. In RANLP, pages 237-248.
Torregrosa, Daniel, Juan Antonio Pérez-Ortiz, and Mikel L. Forcada. 2017. Comparative Human and Automatic Evaluation of Glass-Box and Black-Box Approaches to Interactive Translation Prediction. The Prague Bulletin of Mathematical Linguistics, 108(1), pages 97-108.
Vanallemeersch, Tom and Vincent Vandeghinste. 2015. Assessing linguistically aware fuzzy matching in translation memories. In Proceedings of EAMT, Antalya, Turkey, pages 153-160.
Vandeghinste, Vincent, Tom Vanallemeersch, Frank Van Eynde, Lieve Macken, Els Lefever, Véronique Hoste, Marie-Francine Moens, Joris Pelemans, Patrick Wambacq, Mieke Haesen, Karin Coninx, and Ken De Wachter. 2014. Smart Computer Aided Translation Environment. In Proceedings of EAMT, Dubrovnik, Croatia, page 135.
Van den Bergh, Jan, Eva Geurts, Donald Degraen, Mieke Haesen, Iulianna van der Lek-Ciudin, and Karin Coninx. 2015. Recommendations for Translation Environments to Improve Translators' workflows. In Translating and the Computer 37, Asling, London, pages 106-119.
Vieira, Lucas Nunes and Lucia Specia. 2011. A review of translation tools from a post-editing perspective. In 3rd joint EM+/CNGL Workshop bringing MT to the user: Research meets translators (JEC), pages 33-42.
Warburton, Kara. 2015. Managing terminology in commercial environments. In Handbook of Terminology, Volume 1, John Benjamins Publishing Company, pages 359-391.
Zaretskaya, Anna. 2015. The use of machine translation among professional translators. In Proceedings of the EXPERT Scientific and Technological Workshop, pages 1-12..
Zhechev, Ventsislav and Josef van Genabith. 2010. Seeding statistical machine translation with translation memory output through tree-based structural alignment. In Proceedings of SSST, pages 43-51.
Stand Alone Punctuation Count -this is an accumulation of the stand-alone punctuation word and character counts from the individual text units that make up the document.
Measurement Only Count -this is an accumulation of the word and character count from measurement only text units.
Other Non-Translatable Word Count -other non-translatable word and character counts. Similar counts exist for characters.
Summary GMX/V is based on well-defined standards: 1. XLIFF 2. Unicode ISO 10646
Unicode TR29
GMX/V provides unambiguous and verifiable counts for words and characters, standalone punctuation and inline code and references for all languages and scripts. It also provides additional qualitative counts for the text element categories detailed above. All of this detail allows a precise and unambiguous definition of the localization task for a given electronic file. This rich detail allows suppliers and customers to be able to precisely measure the task in hand. This must surely be a good thing for the localization industry as a whole. In addition, GMX/V provides a way of electronically exchanging counts between different systems. Full details of the GMX/V 2.0 are available: https://xtm-intl.com/manuals/gmx-v/GMX-V-2.0.html

FAQs

What were the cognitive load findings when using CAI tools in interpreting?add

The study revealed that cognitive load peaks during terminology searches with CAI tools, potentially leading to errors if overload occurs.

How does the performance of MT systems vary across different languages?add

MT systems generally perform poorly for minority languages due to inadequate training data, affecting translation quality significantly.

What impact does electronic glossary usage have on interpreters' productivity?add

Using electronic glossaries improved terminological precision and reduced omissions, enhancing the overall productivity of interpreters.

What discrepancies exist in word count metrics across translation tools?add

Different CAT and TMS systems yield inconsistent word and character counts, complicating cost assessments in the localization industry.

How effective are current MT systems in Bible translation?add

Current MT systems lack sufficient training data for Bible translation, leading to quality issues and mistranslations that undermine confidence.

About the author

Gloria Corpas Pastor

Universidad de Málaga, Faculty Member

Papers

209

Followers

652

View all papers from Gloria Corpas Pastorarrow_forward

VIP: Voice-Text Integrated System for Interpreters

Sign up for access to the world's latest research

Abstract

Related papers

Key takeawaysAI

References (262)

FAQs

Related papers

Related topics

Key takeaways
AI