English to Punjabi Statistical Based Machine Translation System

dc.contributor.author: Singh, Gagandeep
dc.contributor.supervisor: Bhatia, Parteek
dc.date.accessioned: 2010-08-13T11:43:11Z
dc.date.available: 2010-08-13T11:43:11Z
dc.date.issued: 2010-08-13T11:43:11Z
dc.description: M.E. (Software Engineering) [en]
dc.description.abstract: Machine Translation (MT) refers to the use of computers to translate automatically from one language to another. The differences between languages, and especially the inherent ambiguity of language, make MT a very difficult problem. Traditional approaches to MT have relied on humans supplying linguistic knowledge in the form of rules to transform text in one language into another. Given the vastness of language, this is a highly knowledge-intensive task. Statistical MT is a different approach that automatically acquires knowledge from large amounts of training data. This knowledge, which is typically in the form of probabilities of various language features, is used to guide the translation process. This thesis provides an overview of the use of Statistical Machine Translation to translate text from English to Punjabi. To develop the translation system, the CMU-Statistical Language Modeling Toolkit, GIZA++, and the ISI ReWrite Decoder were used. The CMU-Statistical Language Modeling Toolkit is a set of Unix software tools that facilitate language modeling work in Statistical Machine Translation. The Translation Model was developed with GIZA++, an open source tool for building Translation Models for Statistical Machine Translation systems. GIZA++ works with mkcls, a tool that generates word classes, so both tools were used to develop the Translation Model. The ISI ReWrite Decoder was used for decoding. The developed system was tested with a corpus of around 6000 parallel English and Punjabi sentences. It worked for simple sentences and can be enhanced in the future. [en]
dc.description.sponsorship: CSED [en]
dc.format.extent: 1572515 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10266/1125
dc.language.iso: en [en]
dc.subject: Statistical Machine Translation, NLP [en]
dc.title: English to Punjabi Statistical Based Machine Translation System [en]
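
The abstract describes Statistical MT as acquiring probabilities from parallel training data, with GIZA++ building the Translation Model. As a rough illustration of that idea only, the sketch below estimates word-translation probabilities with expectation-maximisation in the style of IBM Model 1, the simplest of the alignment models that GIZA++ trains. It is not the thesis's code: the function names, the omission of a NULL word, and the three-sentence romanized toy corpus (`TOY_PAIRS`) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus (romanized Punjabi), for illustration only;
# it is not taken from the thesis's 6000-sentence corpus.
TOY_PAIRS = [
    (["red", "house"], ["laal", "ghar"]),
    (["red", "book"], ["laal", "kitaab"]),
    (["a", "book"], ["ikk", "kitaab"]),
]


def train_ibm_model1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with EM, IBM Model 1 style.

    Simplification (assumption): no NULL source word is modelled.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform table over every co-occurring word pair.
    t = {(tw, sw): uniform for src, tgt in pairs for tw in tgt for sw in src}

    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) link counts
        total = defaultdict(float)  # expected counts per source word
        # E-step: share each target word among the source words of its sentence.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalise the expected counts into probabilities.
        t = {(tw, sw): c / total[sw] for (tw, sw), c in count.items()}
    return t


def best_translation(t, source_word):
    """Return the target word with the highest t(target | source_word)."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)
```

Calling `train_ibm_model1(TOY_PAIRS)` and then `best_translation(t, "book")` picks the most probable Punjabi word for an English word. A full system along the lines the abstract describes would combine such a translation table with a language model inside a decoder.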

Files

Original bundle

Name: 1125.pdf
Size: 1.34 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.79 KB
Format: Item-specific license agreed upon to submission