English to Punjabi Statistical Based Machine Translation System

Singh, Gagandeep

Files

1125.pdf (1.34 MB)

Date

2010-08-13T11:43:11Z

Authors

Singh, Gagandeep

Supervisors

Bhatia, Parteek

Abstract

Machine Translation (MT) is the use of computers to translate automatically from one language to another. The differences between languages, and especially the inherent ambiguity of language, make MT a very difficult problem. Traditional approaches to MT have relied on humans supplying linguistic knowledge in the form of rules that transform text in one language into another. Given the vastness of language, this is a highly knowledge-intensive task. Statistical MT is a different approach that acquires knowledge automatically from large amounts of training data. This knowledge, typically in the form of probabilities of various language features, guides the translation process. This thesis provides an overview of the use of Statistical Machine Translation to translate text from English to Punjabi. The translation system was developed using the CMU-Statistical Language Modeling Toolkit, GIZA++, and the ISI ReWrite Decoder. The CMU-Statistical Language Modeling Toolkit is a set of Unix software tools that facilitate language modeling work in Statistical Machine Translation. GIZA++ is an open source tool for developing Translation Models for Statistical Machine Translation systems; it works with mkcls, a tool that generates word classes, so both tools were used to develop the Translation Model. The ISI ReWrite Decoder was used for decoding. The developed system was tested with a corpus of around 6000 parallel English and Punjabi sentences. The system worked for simple sentences and can be enhanced in the future.
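
As a rough illustration of how a language model, a translation model, and a decoder fit together, the following minimal sketch scores candidate outputs by combining the two models' probabilities and keeps the best one. It is not the thesis's implementation: the tables, the romanized Punjabi tokens, every probability value, and the function names are hypothetical, and brute-force enumeration stands in for a real decoder's search.

```python
import itertools
import math

# Hypothetical toy translation table P(english_word | punjabi_word).
# A real system would estimate such a table from a parallel corpus.
TRANSLATION = {
    ("ram", "raam"): 0.9,
    ("eats", "khaanda"): 0.8,
    ("fruit", "phal"): 0.9,
}

# Hypothetical toy bigram language model over romanized Punjabi tokens.
BIGRAM = {
    ("<s>", "raam"): 0.5,
    ("raam", "phal"): 0.4,
    ("phal", "khaanda"): 0.6,
    ("khaanda", "</s>"): 0.7,
}

FLOOR = 1e-6  # assumed probability for unseen events


def lm_logprob(tokens):
    """Log-probability of a target-language token sequence under the bigram model."""
    padded = ["<s>", *tokens, "</s>"]
    return sum(math.log(BIGRAM.get(pair, FLOOR)) for pair in zip(padded, padded[1:]))


def tm_logprob(english, punjabi):
    """Log-probability of the word-for-word pairing under the translation table."""
    return sum(math.log(TRANSLATION.get((e, p), FLOOR)) for e, p in zip(english, punjabi))


def decode(english):
    """Return the candidate word order with the highest combined score, or None."""
    # Candidate target words for each source word.
    options = [[p for (e, p) in TRANSLATION if e == word] for word in english]
    best, best_score = None, -math.inf
    for choice in itertools.product(*options):
        tm = tm_logprob(english, choice)
        # Try every word order and let the language model pick the most fluent one.
        for order in itertools.permutations(choice):
            score = tm + lm_logprob(order)
            if score > best_score:
                best, best_score = list(order), score
    return best


print(decode(["ram", "eats", "fruit"]))
```

In this toy setting the language model prefers the verb-final order, which is how the combined score can reorder words even though the translation table only pairs individual words.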

Description

M.E. (Software Engineering)

Keywords

Statistical Machine Translation, NLP

URI

http://hdl.handle.net/10266/1125

Collections

Masters Theses@CSED
