English to Punjabi Statistical Based Machine Translation System
| dc.contributor.author | Singh, Gagandeep | |
| dc.contributor.supervisor | Bhatia, Parteek | |
| dc.date.accessioned | 2010-08-13T11:43:11Z | |
| dc.date.available | 2010-08-13T11:43:11Z | |
| dc.date.issued | 2010-08-13T11:43:11Z | |
| dc.description | M.E. (Software Engineering) | en |
| dc.description.abstract | Machine Translation (MT) is the use of computers to translate automatically from one language to another. The differences between languages, and especially the inherent ambiguity of language, make MT a very difficult problem. Traditional approaches to MT have relied on humans supplying linguistic knowledge in the form of rules that transform text in one language into another. Given the vastness of language, this is a highly knowledge-intensive task. Statistical MT is a different approach that automatically acquires knowledge from large amounts of training data. This knowledge, typically in the form of probabilities of various language features, is used to guide the translation process. This thesis provides an overview of the use of Statistical Machine Translation to translate text from English to Punjabi. The translation system was developed with the CMU-Statistical Language Modeling Toolkit, GIZA++, and the ISI ReWrite Decoder. The CMU-Statistical Language Modeling Toolkit is a set of Unix software tools that facilitate language modeling work in Statistical Machine Translation. GIZA++ is an open source tool for developing Translation Models for Statistical Machine Translation systems. It works together with mkcls, a tool that generates word classes, so both tools were used to develop the Translation Model. The ISI ReWrite Decoder was used for decoding. The developed system was tested with a corpus of around 6000 parallel English and Punjabi sentences. It worked for simple sentences and can be enhanced in the future. | en |
| dc.description.sponsorship | CSED | en |
| dc.format.extent | 1572515 bytes | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.uri | http://hdl.handle.net/10266/1125 | |
| dc.language.iso | en | en |
| dc.subject | Statistical Machine Translation, NLP | en |
| dc.title | English to Punjabi Statistical Based Machine Translation System | en |
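
The abstract describes three components: a language model, a translation model, and a decoder. The sketch below illustrates, under stated assumptions, how such components are commonly combined in a noisy-channel setup: a candidate Punjabi sentence is scored by a language model probability plus a translation model probability of the English source given that candidate. It is not the thesis's implementation and does not call the CMU toolkit, GIZA++, or the ISI ReWrite Decoder. All function names, the romanized Punjabi tokens, and the probability values are hypothetical and chosen only for illustration; the translation score is a simplified IBM Model 1 style average with no NULL word.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Return a function computing add-one-smoothed bigram log P(sentence)."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])            # context counts
        bigrams.update(zip(tokens, tokens[1:]))
    v = len(vocab)

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
            for a, b in zip(tokens, tokens[1:])
        )

    return log_prob


def log_translation_prob(source, target, t_table, floor=1e-6):
    """Simplified Model 1 style log P(source | target), up to a constant.

    Each source word is explained by averaging its lexical probability
    over all target words; unseen pairs get a small floor value.
    """
    tgt = target.split()
    total = 0.0
    for s in source.split():
        avg = sum(t_table.get((s, t), floor) for t in tgt) / len(tgt)
        total += math.log(avg)
    return total


def best_translation(source, candidates, lm_log_prob, t_table):
    """Pick the candidate maximizing log P(target) + log P(source | target)."""
    return max(
        candidates,
        key=lambda c: lm_log_prob(c) + log_translation_prob(source, c, t_table),
    )


# Toy, made-up data (romanized Punjabi); not taken from the thesis corpus.
punjabi_corpus = ["munda kitab parhda hai", "kuri kitab parhdi hai"]
t_table = {  # hypothetical P(english_word | punjabi_word)
    ("boy", "munda"): 0.9,
    ("book", "kitab"): 0.9,
    ("reads", "parhda"): 0.8,
    ("reads", "hai"): 0.1,
}
lm = train_bigram_lm(punjabi_corpus)
candidates = ["munda kitab parhda hai", "kitab munda hai parhda"]
best = best_translation("boy reads book", candidates, lm, t_table)
print(best)
```

Both toy candidates contain the same words, so the translation score cannot separate them; the language model term is what prefers the word order seen in the target-language corpus. A real decoder searches over a far larger space of candidates rather than ranking a fixed list.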
