Development of English-Punjabi Parallel Corpus for Idioms and Phrases using Automatic Text Alignment Technique

Authors: Jitin Chhabra, Dharam Veer Sharma

Machine Translation has gained widespread attention in Natural Language Processing because of the recent growth in human-computer interaction. It supports communication across languages and cultures through rule-based and statistical methods of linguistic computation. Existing Machine Translation systems still struggle to understand and process fixed multi-word expressions such as idioms and phrases. To improve the translation of idioms and phrases, we propose developing a parallel corpus using an automatic alignment technique. We applied the alignment technique to idioms and phrases in English and Punjabi. Adjectives among the tokenized words of each expression serve as context identifiers, and a bilingual English-Punjabi dictionary is used for the translation step. We ran the alignment experiment and obtained the mappings for the parallel corpus. The resulting mappings were compared with the original equivalents, and the incorrect mappings were filtered out. The results also highlight problems of context identification and word ambiguity, which give rise to incorrect mappings.
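The pipeline summarized above (tokenize, look up words in a bilingual dictionary, use adjectives as context identifiers, then filter mappings against known equivalents) can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring rule, the function names, the hand-written adjective list standing in for a part-of-speech tagger, and the romanized word-for-word toy entries are all assumptions made for illustration.

```python
# Toy bilingual dictionary: English word -> set of romanized Punjabi words.
# Every entry is an illustrative assumption, not the authors' dictionary.
DICTIONARY = {
    "big": {"vadda"},
    "heart": {"dil"},
    "black": {"kala"},
    "day": {"din"},
    "white": {"chitta"},
    "lie": {"jhooth"},
}

# Hypothetical adjective list standing in for a part-of-speech tagger.
ADJECTIVES = {"big", "black", "white"}


def tokenize(expression):
    """Lowercase an expression and split it on whitespace."""
    return expression.lower().split()


def score(english, punjabi):
    """Fraction of English tokens with a dictionary translation in the
    Punjabi candidate; 0.0 if any adjective fails to match."""
    target = set(tokenize(punjabi))
    tokens = tokenize(english)
    matched = 0
    for word in tokens:
        hit = bool(DICTIONARY.get(word, set()) & target)
        if word in ADJECTIVES and not hit:
            return 0.0  # the adjective acts as the context identifier
        matched += hit
    return matched / len(tokens) if tokens else 0.0


def align(english_list, punjabi_list):
    """Map each English expression to its best-scoring Punjabi candidate."""
    mappings = {}
    for english in english_list:
        best = max(punjabi_list, key=lambda p: score(english, p), default=None)
        if best is not None and score(english, best) > 0:
            mappings[english] = best
    return mappings


def filter_mappings(mappings, reference):
    """Keep only the mappings that agree with the reference equivalents."""
    return {e: p for e, p in mappings.items() if reference.get(e) == p}


# Toy word-for-word pairs, not claimed to be idiomatic equivalents.
english = ["big heart", "black day", "white lie"]
punjabi = ["kala din", "chitta jhooth", "vadda dil"]
reference = {
    "big heart": "vadda dil",
    "black day": "kala din",
    "white lie": "chitta jhooth",
}

mappings = align(english, punjabi)
verified = filter_mappings(mappings, reference)
```

A word-level overlap score like this one fails in exactly the situations the abstract mentions: an ambiguous word with several dictionary translations, or an expression without an adjective, can score equally against several candidates and yield a wrong mapping that the final filtering step has to remove.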

Authors and Affiliations

Jitin Chhabra
Department of Computer Science, Punjabi University, Patiala, India
Dharam Veer Sharma
Associate Professor, Department of Computer Science, Punjabi University, Patiala, India

Keywords

Machine Translation, Parallel Corpus, Idioms and Phrases, Word level matching, Context Identification, Automatic Text Alignment.

Publication Details

Published in : Volume 2 | Issue 4 | July-August 2017
Date of Publication : 2017-08-31
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 673-677
Manuscript Number : CSEIT1724175
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Jitin Chhabra, Dharam Veer Sharma, "Development of English-Punjabi Parallel Corpus for Idioms and Phrases using Automatic Text Alignment Technique", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 4, pp.673-677, July-August-2017.