Sanskrit Language Enconversion to Universal Networking Language (UNL)
Abstract
Machine Translation (MT) has been a prime research area over the last few decades. Researchers
from domains such as statistics, linguistics, mathematics, artificial intelligence and
philosophy have worked on various problems related to MT. Several
methodologies have been used to develop MT systems for different languages.
Developing an MT system based on Universal Networking Language (UNL) is one such effort in
the MT field. UNL was launched in 1996 by the United Nations University
(UNU) at the Institute of Advanced Studies, Tokyo, Japan. The key components of UNL for natural
language processing are the EnConverter and the DeConverter. The first component
converts a Natural Language (NL) sentence into equivalent UNL expressions, and the second
component performs the reverse operation, i.e. generates NL from UNL expressions. The
focus of the research work carried out in this thesis is the development of an EnConverter
system for the Sanskrit language.
The thesis begins with an introduction that covers the importance of
machine translation in today’s multilingual world, the structure of the Sanskrit language, the UNL system,
the need for MT, the problems faced during MT development and a comparison of UNL with other
systems. It also presents a comprehensive survey of MT approaches, existing
MT systems, linguistic tools, data repositories and MT platforms. Among the available
research on machine translation, it is found that little work has been done on
MT development for the Sanskrit language. The existing work does not
apply neural networks to the design of a stemmer, tagger, parser or
translator for a Sanskrit MT system. Given the research gaps identified
in the survey, there is a need for a new Sanskrit MT system that can
translate into multiple languages simultaneously with less effort. In this work, a new MT
system, “SANSUNL”, is proposed for the Sanskrit language; it translates Sanskrit into UNL
expressions. The proposed “SANSUNL” MT system consists of seven layers, each with
its own function: pre-processing, Part-of-Speech (POS) tagging,
parsing, node-list creation, case marker identification, unmatched word handling and UNL
expression generation. It uses a neural network for POS tagging
and the CYK parsing algorithm for language parsing. A new Sanskrit grammar and a new
algorithm are proposed for parsing Sanskrit text and generating its parse tree. The system
also uses a new stemmer to find the base form of words for shallow parsing of the input
text. Five data-sets have been used to test and evaluate the proposed system. The system is
evaluated using both traditional measures, namely fluency and adequacy scores, and
an automatic evaluation measure, the Bilingual Evaluation Understudy (BLEU) score. The
result analysis shows that the proposed system performs well and translates
Sanskrit text to UNL expressions effectively. A new Sanskrit to English MT system, which
uses layers of the “SANSUNL” system, is also proposed. It is tested on a data-set of
500 Sanskrit sentences and evaluated using fluency, adequacy and BLEU scores. Finally, the thesis
concludes with perspectives on future work.
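
For readers unfamiliar with the CYK parsing algorithm mentioned above, the following is a minimal sketch of a CYK recognizer in Python. It is an illustration only and not the parser developed in the thesis: the grammar, the transliterated word list and the function name `cyk_recognize` are hypothetical, and the thesis's own Sanskrit grammar and modified parsing algorithm are not reproduced here. The sketch assumes a grammar in Chomsky Normal Form, which the standard CYK algorithm requires.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky Normal Form (NOT the thesis grammar).
# Binary rules map a pair of right-hand-side symbols to the set of
# left-hand-side symbols that can produce that pair.
BINARY_RULES = {
    ("NP", "VP"): {"S"},   # S  -> NP VP
    ("NP", "V"): {"VP"},   # VP -> NP V  (object before verb)
}

# Hypothetical lexicon: transliterated tokens mapped to their categories.
LEXICAL_RULES = {
    "ramah": {"NP"},
    "phalam": {"NP"},
    "khadati": {"V"},
}


def cyk_recognize(tokens, start="S"):
    """Return True if the token list can be derived from the start symbol."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds every nonterminal that derives tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Spans of length 1 are filled from the lexicon.
    for i, token in enumerate(tokens):
        table[i][i] = set(LEXICAL_RULES.get(token, ()))
    # Longer spans are built bottom-up from every possible split point.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]


if __name__ == "__main__":
    print(cyk_recognize(["ramah", "phalam", "khadati"]))
    print(cyk_recognize(["khadati", "ramah"]))
```

The table is filled by increasing span length, so every sub-span is complete before a larger span consults it. A full parser would additionally store back-pointers in each cell to recover the parse tree.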
