English to Punjabi Statistical Based Machine Translation System
Abstract
Machine Translation (MT) refers to the use of computers for the task of translating
automatically from one language to another. The differences between languages and
especially the inherent ambiguity of language make MT a very difficult problem.
Traditional approaches to MT have relied on humans supplying linguistic knowledge
in the form of rules to transform text in one language to another. Given the vastness
of language, this is a highly knowledge-intensive task. Statistical MT is a different
approach that automatically acquires knowledge from large amounts of training data.
This knowledge, which is typically in the form of probabilities of various language
features, is used to guide the translation process.
This thesis provides an overview of the use of Statistical Machine Translation to
translate text from English to Punjabi. The translation system was developed using
the CMU-Statistical Language Modeling Toolkit, GIZA++, and the ISI ReWrite Decoder.
The CMU-Statistical Language Modeling Toolkit is a set of Unix software tools that
facilitate language modeling work in the field of Statistical Machine Translation.
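
As a rough illustration of what a statistical language model computes, the sketch
below trains a bigram model on a two-sentence toy Punjabi corpus and scores word
sequences. The corpus, the function names, and the add-one smoothing are assumptions
made for this example; they are not the toolkit's interface or its estimation method.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        # Sentence boundary markers let the model learn likely first/last words.
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence (a simplification)."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

# Hypothetical toy corpus: "this is a book", "this is a house".
toy_corpus = ["ਇਹ ਕਿਤਾਬ ਹੈ", "ਇਹ ਘਰ ਹੈ"]
unigrams, bigrams = train_bigram_lm(toy_corpus)

fluent = log_prob("ਇਹ ਘਰ ਹੈ", unigrams, bigrams)
scrambled = log_prob("ਹੈ ਘਰ ਇਹ", unigrams, bigrams)
print(fluent > scrambled)
```

The model assigns a higher score to the word order it has seen in training, which is
the property a translation system relies on to prefer fluent output.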
The Translation Model was developed with GIZA++, an open source tool for training
Translation Models for Statistical Machine Translation systems. GIZA++ works together
with mkcls, a tool that generates word classes, so both tools were used to develop
the Translation Model.
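
GIZA++ trains the IBM word alignment models. The simplest of these, IBM Model 1, can
be sketched in a few lines as an expectation-maximization loop over a parallel corpus.
The sketch below is a minimal illustration, not GIZA++ itself: the three sentence
pairs are a hypothetical toy corpus, the function name is invented for this example,
the NULL source word is omitted, and the direction (Punjabi word given English word)
is chosen only for readability.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) by EM over tokenized sentence pairs."""
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform distribution over target words for each source word.
    t = {(tw, sw): 1.0 / len(tgt_vocab) for sw in src_vocab for tw in tgt_vocab}
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (target, source) links
        total = defaultdict(float)   # expected count of each source word
        for src, tgt in pairs:
            for tw in tgt:
                # E-step: share one unit of credit for tw among the source words.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts into probabilities.
        for (tw, sw) in t:
            t[(tw, sw)] = count[(tw, sw)] / total[sw]
    return t

# Hypothetical toy parallel corpus (English source, Punjabi target).
toy_pairs = [
    ("this book".split(), "ਇਹ ਕਿਤਾਬ".split()),
    ("this house".split(), "ਇਹ ਘਰ".split()),
    ("a book".split(), "ਇੱਕ ਕਿਤਾਬ".split()),
]
t = train_ibm_model1(toy_pairs)
print(t[("ਕਿਤਾਬ", "book")] > t[("ਇਹ", "book")])
```

Even without any dictionary, co-occurrence across sentence pairs is enough for the
loop to concentrate probability on the matching word pairs.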
In addition, the ISI ReWrite Decoder was used for decoding.
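
In a statistical system, decoding means searching for the target sentence that scores
best under the language model and the translation model combined. The sketch below
shows only that scoring rule over a fixed candidate list; it does not reproduce the
search strategy of the ISI ReWrite Decoder. The candidates, the log-probability
values, and the function names are all made up for illustration.

```python
import math

def decode(source, candidates, lm_score, tm_score):
    """Return the candidate with the highest combined log score."""
    return max(candidates, key=lambda cand: lm_score(cand) + tm_score(source, cand))

# Hypothetical scores. A word-based translation model that ignores word order
# rates both candidates equally, so the language model decides between them.
toy_lm = {"ਇਹ ਘਰ ਹੈ": math.log(0.02), "ਹੈ ਘਰ ਇਹ": math.log(0.0001)}
toy_tm = {"ਇਹ ਘਰ ਹੈ": math.log(0.1), "ਹੈ ਘਰ ਇਹ": math.log(0.1)}

def lm_score(candidate):
    return toy_lm[candidate]

def tm_score(source, candidate):
    # The source sentence is unused here because the toy table is keyed by candidate.
    return toy_tm[candidate]

best = decode("this is a house", list(toy_lm), lm_score, tm_score)
print(best)
```

Working in log space turns the product of the two probabilities into a sum, which
avoids numerical underflow on longer sentences.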
The developed system was tested with a corpus of around 6000 parallel English and
Punjabi sentences. The system worked for simple sentences and can be enhanced in
the future.
Description
M.E. (Software Engineering)
