English to Punjabi Statistical Based Machine Translation System

Singh, Gagandeep

dc.contributor.author	Singh, Gagandeep
dc.contributor.supervisor	Bhatia, Parteek
dc.date.accessioned	2010-08-13T11:43:11Z
dc.date.available	2010-08-13T11:43:11Z
dc.date.issued	2010-08-13T11:43:11Z
dc.description	M.E. (Software Engineering)	en
dc.description.abstract	Machine Translation (MT) is the use of computers to translate automatically from one language to another. The differences between languages, and especially the inherent ambiguity of language, make MT a very difficult problem. Traditional approaches to MT have relied on humans supplying linguistic knowledge in the form of rules that transform text in one language into another. Given the vastness of language, this is a highly knowledge-intensive task. Statistical MT is a different approach that automatically acquires knowledge from large amounts of training data. This knowledge, typically in the form of probabilities of various language features, is used to guide the translation process. This thesis provides an overview of the use of Statistical Machine Translation to translate text from English to Punjabi. The translation system was developed with the CMU-Statistical Language Modeling Toolkit, GIZA++, and the ISI ReWrite Decoder. The CMU-Statistical Language Modeling Toolkit is a set of Unix software tools that facilitate language modeling work in Statistical Machine Translation. GIZA++, an open source tool for developing Translation Models for Statistical Machine Translation systems, was used to develop the Translation Model. GIZA++ works with mkcls, a tool that generates word classes, so both tools were used together to develop the Translation Model. The ISI ReWrite Decoder was used for decoding. The developed system was tested with a corpus of around 6000 parallel English and Punjabi sentences. The system worked for simple sentences and can be enhanced in the future.	en
dc.description.sponsorship	CSED	en
dc.format.extent	1572515 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10266/1125
dc.language.iso	en	en
dc.subject	Statistical Machine Translation, NLP	en
dc.title	English to Punjabi Statistical Based Machine Translation System	en
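
The abstract's three components (a language model, a translation model, and a decoder) match the usual noisy-channel view of statistical MT, in which the decoder prefers the target sentence that maximises P(target) × P(source | target). The following is a minimal sketch of that scoring idea only. The toy probability tables, the placeholder transliterated tokens, and all function names are hypothetical; none of them come from the thesis, the CMU toolkit, GIZA++, or the ISI ReWrite Decoder, and the "decoder" here simply ranks a fixed candidate list rather than searching.

```python
import math

FLOOR = 1e-6  # probability given to unseen events (an assumption of this sketch)

# Hypothetical bigram language model over placeholder target-side tokens.
LM_BIGRAMS = {
    ("<s>", "main"): 0.5, ("main", "ghar"): 0.4,
    ("ghar", "janda"): 0.5, ("janda", "</s>"): 0.6,
    ("<s>", "ghar"): 0.1, ("ghar", "main"): 0.05, ("main", "janda"): 0.1,
}

# Hypothetical word translation table: P(source_word | target_word).
TM = {("i", "main"): 0.8, ("home", "ghar"): 0.7, ("go", "janda"): 0.6}


def lm_logprob(target, bigrams):
    """Log P(target) under a bigram model with sentence boundary markers."""
    words = ["<s>"] + target + ["</s>"]
    return sum(math.log(bigrams.get(pair, FLOOR)) for pair in zip(words, words[1:]))


def tm_logprob(source, target, table):
    """Log P(source | target) in the style of IBM Model 1 (no NULL word):
    each source word is explained by an average over all target words."""
    total = 0.0
    for s in source:
        p = sum(table.get((s, t), 0.0) for t in target) / len(target)
        total += math.log(max(p, FLOOR))
    return total


def decode(source, candidates, bigrams, table):
    """Pick the candidate with the highest combined log score."""
    return max(
        candidates,
        key=lambda t: lm_logprob(t, bigrams) + tm_logprob(source, t, table),
    )


source = ["i", "go", "home"]
candidates = [["ghar", "main", "janda"], ["main", "ghar", "janda"]]
best = decode(source, candidates, LM_BIGRAMS, TM)
print(" ".join(best))
```

Both candidates contain the same words, so the translation model scores them equally and the language model alone decides the word order. That separation of adequacy (translation model) from fluency (language model) is the point of the sketch.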

Files

Original bundle

Name: 1125.pdf
Size: 1.34 MB
Format: Adobe Portable Document Format

Collections

Masters Theses@CSED