Manuscript Number : CSEIT1724175
Development of English-Punjabi Parallel Corpus for Idioms and Phrases using Automatic Text Alignment Technique
Authors (2): Jitin Chhabra, Dharam Veer Sharma
Machine Translation has gained widespread attention in the area of Natural Language Processing due to the increase in human-computer interaction in the recent past. Machine Translation assists communication across languages of different cultures using rule-based and statistical methods of linguistic computation. The Machine Translation systems built so far face hurdles in understanding and processing multi-word fixed expressions such as idioms and phrases. To improve the efficiency of translating idioms and phrases, we propose the development of a parallel corpus using an automatic alignment technique. We implemented the alignment technique on idioms and phrases in the English and Punjabi scripts. We used the adjectives among the tokenized words of each expression as context identifiers, and a bilingual English-Punjabi dictionary for the translation process. We performed the alignment experiment and obtained the mappings for the development of the parallel corpus. The resulting set of mappings was matched against the original equivalents, and the wrong mappings were filtered out. The results also point to the problems of context identification and word ambiguity, which give rise to the wrong mappings.
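The alignment idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary entries, the adjective list (a stand-in for a part-of-speech tagger), the example expressions, and the function names `tokenize`, `align`, and `filter_mappings` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy resources; none of these entries come from the paper.
EN_PA_DICT = {
    "black": {"ਕਾਲਾ", "ਕਾਲੀ"},
    "big": {"ਵੱਡਾ", "ਵੱਡੀ"},
}
ADJECTIVES = {"black", "big"}  # assumed stand-in for a POS tagger


def tokenize(text):
    """Split an expression into lowercase whitespace-separated tokens."""
    return text.lower().split()


def align(english_exprs, punjabi_exprs):
    """Map English expressions to Punjabi ones through shared adjectives."""
    mappings = []
    for en in english_exprs:
        # Adjectives act as the context identifiers of the expression.
        adjectives = [t for t in tokenize(en) if t in ADJECTIVES]
        # Collect every dictionary translation of those adjectives.
        translations = set()
        for adj in adjectives:
            translations |= EN_PA_DICT.get(adj, set())
        # A Punjabi expression is a candidate if it contains a translation.
        for pa in punjabi_exprs:
            if translations & set(tokenize(pa)):
                mappings.append((en, pa))
    return mappings


def filter_mappings(mappings, gold_pairs):
    """Keep only the mappings that agree with the known equivalents."""
    return [m for m in mappings if m in gold_pairs]


english = ["black sheep", "a big heart"]
punjabi = ["ਵੱਡਾ ਦਿਲ", "ਕਾਲੀ ਭੇਡ"]
candidates = align(english, punjabi)
kept = filter_mappings(candidates, {("black sheep", "ਕਾਲੀ ਭੇਡ")})
```

Matching on a single adjective is deliberately naive: when an adjective has several senses or appears in many expressions, it produces the kind of wrong mappings that the abstract attributes to context identification and word ambiguity.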
Keywords: Machine Translation, Parallel Corpus, Idioms and Phrases, Word level matching, Context Identification, Automatic Text Alignment.
Publication Details
Published in : Volume 2 | Issue 4 | July-August 2017
Jitin Chhabra
Department of Computer Science, Punjabi University, Patiala, India
Dharam Veer Sharma
Associate Professor, Department of Computer Science, Punjabi University, Patiala, India
Date of Publication : 2017-08-31
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 673-677
Publisher : Technoscience Academy