Statistical Machine Translation Research Papers

Parallel Corpora and Implementation Possibilities in Multilingual Education (Georgian-Abkhazian Parallel Corpus of "The Knight in the Panther’s Skin")

by Leila Avidzba

2026, International Journal of MULTILINGUAL EDUCATION

The epic The Knight in the Panther's Skin by Shota Rustaveli, a literary work well known inside and outside Georgia, has been translated into more than 50 languages. It comes as no surprise that the epic offers many topics for...

English-to-Hindi Translation Divergence Study of English Phrasal Verbs

by Pursotam Kumar and

2026, Edelweiss Applied Science and Technology, 8(4), 2257–2266

English phrasal verbs pose complex semantic interpretations and cause significant problems in their translation into the target language. The paper selects some frequent and highly polysemous English phrasal verbs listed with 10 or more...

Computer-Assisted Translation Risks and Threats in Legal Texts

by Mindreci Georgiana

2026, Strategii Manageriale

A uniform method of grammar extraction and its applications

by Fei Xia

2026, Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics

Grammars are core elements of many NLP applications. In this paper, we present a system that automatically extracts lexicalized grammars from annotated corpora. The data produced by this system have been used in several tasks, such as...

Maintaining the Forest and Burning out the Underbrush in XTAG

by Fei Xia

2026

In this paper we report on the recent advancements and current status of the XTAG Project, housed at the University of Pennsylvania. We discuss the current coverage of the system, as evaluated on the TSNLP English sentences, hierarchical...

Orthographic and Morphological Processing for English-Arabic Statistical Machine Translation

by Nizar Habash

2026

Many studies in Statistical Machine Translation (SMT) for morphologically rich source languages show that morphological segmentation and orthographic normalization improve translation quality in...

Combination of Arabic preprocessing schemes for statistical machine translation

by Nizar Habash

2026

Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data...

Arabic preprocessing schemes for statistical machine translation

by Nizar Habash

2026

In this paper, we study the effect of different word-level preprocessing decisions for Arabic on SMT quality. Our results show that given large amounts of training data, splitting off only proclitics performs best. However, for small...

Combination of Arabic preprocessing schemes for statistical machine translation

by Nizar Habash

2026, Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL - ACL '06

Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data...

Orthographic and morphological processing for English–Arabic statistical machine translation

by Nizar Habash

2026, Machine Translation

Many studies in Statistical Machine Translation (SMT) for morphologically rich source languages show that morphological segmentation and orthographic normalization improve translation quality in...

Syntactic preprocessing for statistical machine translation

by Nizar Habash

2026

We describe an approach to automatic source-language syntactic preprocessing in the context of Arabic-English phrase-based machine translation. Source-language labeled dependencies that are word-aligned with target-language words in a...

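Machine Translation Quality in Specialized Contexts: Comparative Evaluation and Limits of Neural Machine Translation (Calidad de la traducción automática en contextos especializados: evaluación comparativa y límites de la traducción automática neuronal)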

by Blanca Hernández Pardo

2026, Libro de actas del Congreso CUICIID 2025

Machine translation (MT) has recently become highly relevant in intercultural communication and in multilingual knowledge transfer thanks to advances in artificial intelligence and the processing of...

Applications, Performance, and Research Gaps of Large Language Models in Literary Studies: A Scoping Review

by NEDA MOZAFFARI and

2026, Human Behavior and Emerging Technologies

Novel innovations in large language models (LLMs) have demonstrated their ability to generate and analyze literary texts. As a result of intricate semantic layers, metaphors, polyphony, and nonlinear narrative structures present in literary works, their analysis by LLMs demands a deep cognitive and semantic understanding. Hence, it is essential to investigate the present abilities of LLMs to understand complex literary narratives, assess their performance, and spot their shortcomings to deliver a consistent view regarding the upcoming development of technologies in computational literary studies. This study, through a systematic scoping review of data from 48 peer-reviewed articles from five major scientific databases, assesses the current state of research on LLM performance in the interpretation and production of literary texts. We analyzed the selected studies according to two principal axes: (1) the fields of literary applications of LLMs and their performance evaluation within each domain and (2) current theoretical and technical challenges and limitations. The findings revealed that LLMs have been applied to eight major literary tasks, including comprehensive literary analysis and interpretation, understanding and extracting character relationships and traits, stylistic analysis and authorship attribution, interpretation of metaphors and rhetorical features, evaluation and generation of literary content by LLMs, quotation attribution to fictional characters, literary text summarization, and literary translation. Key challenges and limitations were also identified, including data bias and dependency, human intervention and evaluation, constraints related to text and narrative length, and limitations in deep understanding and reasoning. In addition, we formulated six recommendations for future studies on developing and implementing LLMs in literary studies. 
This review provides a comprehensive roadmap for future researchers to identify current strengths and weaknesses, address existing gaps, and leverage the strengths in practical applications, such as literary translation.

EXTENDING A MODEL FOR ONTOLOGY-BASED ARABIC-ENGLISH MACHINE TRANSLATION (NAN)

by Neama Abdulaziz Dahan

2026, International Journal of Artificial Intelligence and Applications (IJAIA)

Accelerating telecommunication needs have prompted a great deal of research, especially in the fields of communication facilitation and machine translation. When people communicate with others who have different languages and cultures, they need to...

A Novel Approach for Named Entity Recognition on Hindi Language Using Residual BiLSTM Network

by Rita Shelke

2026, International Journal on Natural Language Computing

Named Entity Recognition (NER) is an important task in many Natural Language Processing (NLP) applications, where it improves overall performance. In this paper, deep learning techniques are used to...

Predication and Cultural Knowledge: An Approach to Reconstructing Indian Text Corpora

by A. P. Ashwin Kumar and

2026, INTERNATIONAL JOURNAL OF TRANSLATION

India's linguistic diversity has been widely documented. Yet existing debates on language loss do not examine conceptual repertoires encoded in Indian languages. Drawing on postcolonial analysis, philosophical accounts of concept loss, and...

Adapting Term Recognition to an Under-Resourced Language: the Case of Irish

by Adrian Doyle

2026, Proceedings of the Celtic Language Technology Workshop

Automatic Term Recognition (ATR) is an important method for the summarization and analysis of large corpora, and normally requires a significant amount of linguistic input, in particular the use of part-of-speech taggers. For an...

Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation

by Md. Tahmid Hasan

2026

Despite being the seventh most widely spoken language in the world, Bengali has received much less attention in machine translation literature due to being low in resources. Most publicly available parallel corpora for Bengali are not...

Samasa-karta: An online tool for producing compound words using IndoWordNet

by Irawati Kulkarni

2026

Samāsa-Kartā: An Online Tool for Producing Compound

by Irawati Kulkarni

2026

Samāsa, or compounds, are a regular feature of Indian languages. They are also found in other languages such as German, Italian, French, Russian, and Spanish. A compound word is constructed from two or more words to form a single word. The...

Syntactic Optimality vs. Communicative Optimality: The Syntax of Present-Tense Constructions in Classical Arabic

by Zeyad Al-Daher

2026, Forum for Linguistic Studies

This study investigates present-tense verbal constructions in Classical Arabic with special focus on the discrepancy between syntactic optimality and communicative optimality. Specifically, this study challenges the traditional view that...

MT Islam

by Mohammat MT

2026, MOhammet

Machine translation (MT) has undergone a major transformation over the past decades, evolving from rule-based and statistical models into neural machine translation (NMT), which relies on deep learning architectures trained on large...

Translation Using JAPIO Patent Corpora: JAPIO at WAT2016

by Terumasa Ehara

2026, International Conference on Computational Linguistics

The Japan Patent Information Organization (JAPIO) participates in the scientific paper subtask (ASPEC-EJ/CJ) and the patent subtask (JPC-EJ/CJ/KJ) with phrase-based SMT systems trained on its own patent corpora. Using larger corpora than...

System Combination of RBMT plus SPE and Preordering plus SMT

by Terumasa Ehara

2026

The system architecture, experimental settings, and evaluation results of the EHR group in the en-ja, zh-ja, JPCzh-ja, and JPCko-ja tasks are described. Our system concept is a combination of a rule-based method and a statistical method. System...

Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content

by Valia Kordoni

2026

The present work describes a multilingual corpus of online content in the educational domain, i.e. Massive Open Online Course material, ranging from course forum text to subtitles of online video lectures, that has been developed via...

TectoMT - a deep linguistic core of the combined Chimera MT system

by Jan Hajič

2026

Chimera is a machine translation system that combines the TectoMT deep-linguistic core with the phrase-based MT system Moses. For the English-Czech pair it also uses the Depfix post-correction system. All the components run on the Unix/Linux platform...

QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages

by Jan Hajič

2026

This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and...

QTLeap: A European scientific research project on machine translation by deep language engineering approaches

by Jan Hajič

2026

In this poster we present QTLeap (qtleap.eu), an ongoing project whose goal is to research and deliver an articulated methodology for machine translation that explores deep language engineering approaches, which handle the...

Translation technology explored: Has a three-year maturation period done Google Translate any good?

by Alta van Rensburg

2026, Stellenbosch Papers in Linguistics Plus

Language users in multilingual environments who are trying to make sense of the linguistic challenges they face may well regard the advent of online machine translation (MT) applications as a welcome intervention. Such applications have...

Automatic Construction of a Bilingual Compound-Word Dictionary Using Comparable Corpora

by 俊立梁

2026

Statistical Machine Translation (SMT) systems often make mistakes when translating a multi-word term (MWT). Building a bilingual MWT lexicon is one of the important steps to improve the translation result at the sentence level. This thesis...

Translators - A Century of Tensions and Transformations

by Dennys Silva-Reis

2026

RUOKONEN, M.; MARICEL, B.; KEMPPANEN, H.; RUDVIN, M.; SILVA-REIS, DENNYS; TAKEDA, K. Translators - A Century of Tensions and Transformations. In: Gambier, Yves; Wakabayashi, Judy (Org.). A Cultural History of Translation: Volume...

Artificial Intelligence in Research Translation in Higher Education: Applied Potentials for University Students

by Ahmed Shaker Alalaq

2026, Journal of Digital Learning and Distance Education

Artificial Intelligence (AI) has emerged as a transformative force in research translation within higher education, shifting the paradigm from basic automation to intelligent systems capable of semantic understanding and contextual...
