Academia.edu

Statistical Machine Translation

1,670 papers
11,273 followers
About this topic
Statistical Machine Translation (SMT) is a computational approach to translating text from one language to another using statistical models. It relies on algorithms that analyze bilingual text corpora to learn the probabilities of word and phrase translations, enabling the generation of translations based on observed patterns in the data.
The epic The Knight in the Panther's Skin by Shota Rustaveli is a well-known literary work inside and outside of Georgia, which has been translated into more than 50 languages. It comes as no surprise that the epic offers many topics for... more
English phrasal verbs pose complex semantic interpretations and cause significant problems in their translation into the target language. The paper selects some frequent and highly polysemous English phrasal verbs listed with 10 or more... more
Grammars are core elements of many NLP applications. In this paper, we present a system that automatically extracts lexicalized grammars from annotated corpora. The data produced by this system have been used in several tasks, such as... more
In this paper we report on the recent advancements and current status of the XTAG Project, housed at the University of Pennsylvania. We discuss the current coverage of the system, as evaluated on the TSNLP English sentences, hierarchical... more
Many studies in Statistical Machine Translation (SMT) for morphologically rich source languages show that morphological segmentation and orthographic normalization improve translation quality in... more
Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data... more
In this paper, we study the effect of different word-level preprocessing decisions for Arabic on SMT quality. Our results show that given large amounts of training data, splitting off only proclitics performs best. However, for small... more
We describe an approach to automatic source-language syntactic preprocessing in the context of Arabic-English phrase-based machine translation. Source-language labeled dependencies that are word-aligned with target-language words in a... more
Machine translation (MT) has recently been gaining great relevance in intercultural communication and in multilingual knowledge transfer thanks to advances in artificial intelligence and the processing of... more
Novel innovations in large language models (LLMs) have demonstrated their ability to generate and analyze literary texts. As a result of intricate semantic layers, metaphors, polyphony, and nonlinear narrative structures present in... more
Growing telecommunication needs have spurred a great deal of research, especially in the fields of communication facilitation and machine translation. When people interact with others who have different languages and cultures, they need to... more
Many Natural Language Processing (NLP) applications involve Named Entity Recognition (NER) as an important task that improves their overall performance. In this paper, deep learning techniques are used to... more
India's linguistic diversity has been widely documented. Yet existing debates on language loss do not examine conceptual repertoires encoded in Indian languages. Drawing on postcolonial analysis, philosophical accounts of concept loss, and... more
Automatic Term Recognition (ATR) is an important method for the summarization and analysis of large corpora, and normally requires a significant amount of linguistic input, in particular the use of part-of-speech taggers. For an... more
Despite being the seventh most widely spoken language in the world, Bengali has received much less attention in machine translation literature due to being low in resources. Most publicly available parallel corpora for Bengali are not... more
Samāsa, or compounds, are a regular feature of Indian languages. They are also found in other languages such as German, Italian, French, Russian, and Spanish. A compound word is constructed from two or more words to form a single word. The... more
This study investigates present-tense verbal constructions in Classical Arabic with special focus on the discrepancy between syntactic optimality and communicative optimality. Specifically, this study challenges the traditional view that... more
Machine translation (MT) has undergone a major transformation over the past decades, evolving from rule-based and statistical models into neural machine translation (NMT), which relies on deep learning architectures trained on large... more
Japan Patent Information Organization (JAPIO) participates in the scientific paper subtask (ASPEC-EJ/CJ) and the patent subtask (JPC-EJ/CJ/KJ) with phrase-based SMT systems trained on its own patent corpora. Using larger corpora than... more
The system architecture, experimental settings, and evaluation results of the EHR group in the en-ja, zh-ja, JPCzh-ja, and JPCko-ja tasks are described. Our system concept is a combination of a rule-based method and a statistical method. System... more
The present work describes a multilingual corpus of online content in the educational domain, i.e. Massive Open Online Course material, ranging from course forum text to subtitles of online video lectures, that has been developed via... more
Chimera is a machine translation system that combines the TectoMT deep-linguistic core with the phrase-based MT system Moses. For the English-Czech pair, it also uses the Depfix postcorrection system. All the components run on Unix/Linux platform... more
This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and... more
In this poster we present QTLeap (qtleap.eu), an ongoing project whose goal is to research and deliver an articulated methodology for machine translation that explores deep language engineering approaches, which handle the... more
Language users in multilingual environments who are trying to make sense of the linguistic challenges they face may well regard the advent of online machine translation (MT) applications as a welcome intervention. Such applications have... more
Statistical Machine Translation (SMT) systems often make mistakes in translating a multi-word term (MWT). Building a bilingual MWT lexicon is one of the important steps to improve translation results at the sentence level. This thesis... more
RUOKONEN, M.; MARICEL, B.; KEMPPANEN, H.; RUDVIN, M.; SILVA-REIS, DENNYS; TAKEDA, K. Translators - A Century of Tensions and Transformations. In: Gambier, Yves; Wakabayashi, Judy (Org.). A Cultural History of Translation: Volume... more
Artificial Intelligence (AI) has emerged as a transformative force in research translation within higher education, shifting the paradigm from basic automation to intelligent systems capable of semantic understanding and contextual... more