Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/6173
Title: Sanskrit Language Enconversion to Universal Networking Language (UNL)
Authors: Sitender
Supervisor: Bawa, Seema
Keywords: Machine Translation; Sanskrit Enconverter; Sanskrit Stemmer; Sanskrit POS Tagger; SMT
Issue Date: 26-Oct-2021
Abstract: Machine Translation (MT) has been a prime research area over the last few decades. Researchers from domains such as statistics, linguistics, mathematics, artificial intelligence and philosophy have worked on various problems related to MT, and several methodologies have been used to develop MT systems for different languages. Developing an MT system based on the Universal Networking Language (UNL) is one such effort. UNL was first launched in 1996 by the United Nations University (UNU) at the Institute of Advanced Studies, Tokyo, Japan. The key components of UNL for natural language processing are the EnConverter and the DeConverter: the EnConverter converts a Natural Language (NL) sentence into equivalent UNL expressions, and the DeConverter performs the reverse operation, i.e. it generates NL text from UNL expressions. The research carried out in this thesis focuses on the development of an EnConverter system for the Sanskrit language. The thesis begins with an introduction covering the importance of machine translation in today’s multilingual world, the structure of the Sanskrit language, the UNL system, the need for MT, the problems faced during MT development and a comparison of UNL with other systems. It also presents a comprehensive survey of MT approaches, existing MT systems, linguistic tools, data repositories and MT platforms. The survey finds that little work has been done on MT development for Sanskrit, and that the existing work does not apply neural networks to the design of the stemmer, tagger, parser and translator of a Sanskrit MT system. In view of these research gaps, there is a need for a new Sanskrit MT system that can translate into multiple languages simultaneously with less effort, since the UNL expressions it produces can be passed to a DeConverter for any target language.
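To make the notion of a UNL expression concrete, the following Python sketch models the target representation that an EnConverter produces: Universal Words (UWs) linked by labelled binary relations such as `agt` (agent) and `obj` (object), serialised between `{unl}` and `{/unl}` tags. It is a minimal illustration under stated assumptions, not the SANSUNL implementation: the sentence रामः फलं खादति ("Rama eats a fruit") is annotated by hand rather than enconverted, and the class names `UniversalWord` and `Relation`, the function `to_unl` and the UW restrictions shown are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniversalWord:
    headword: str
    restriction: str = ""              # e.g. "icl>do"; illustrative, not from a real UW dictionary
    attributes: tuple[str, ...] = ()   # e.g. ("@entry", "@present")

    def __str__(self) -> str:
        text = self.headword
        if self.restriction:
            text += f"({self.restriction})"
        for attr in self.attributes:
            text += f".{attr}"
        return text


@dataclass(frozen=True)
class Relation:
    label: str                 # UNL relation label such as "agt" or "obj"
    head: UniversalWord
    dependent: UniversalWord

    def __str__(self) -> str:
        return f"{self.label}({self.head}, {self.dependent})"


def to_unl(relations: list[Relation]) -> str:
    """Serialise binary relations into a {unl} ... {/unl} block, one relation per line."""
    body = "\n".join(str(r) for r in relations)
    return f"{{unl}}\n{body}\n{{/unl}}"


# Hand-annotated (hypothetical) analysis of "रामः फलं खादति" (Rama eats a fruit)
eat = UniversalWord("eat", "icl>do", ("@entry", "@present"))
rama = UniversalWord("Rama", "iof>person")
fruit = UniversalWord("fruit", "icl>food")
print(to_unl([Relation("agt", eat, rama), Relation("obj", eat, fruit)]))
```

Running it prints the two relations `agt(eat(icl>do).@entry.@present, Rama(iof>person))` and `obj(eat(icl>do).@entry.@present, fruit(icl>food))` inside the `{unl}` block. An EnConverter's job is to derive such relations automatically from the tagged and parsed input sentence.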
This work proposes a new MT system, “SANSUNL”, which translates Sanskrit text into UNL expressions. The proposed system consists of seven layers, each with a distinct function: pre-processing, Part-of-Speech (POS) tagging, parsing, node-list creation, case marker identification, unmatched word handling and UNL expression generation. It uses a neural network for POS tagging and the CYK (Cocke-Younger-Kasami) algorithm for parsing. A new Sanskrit grammar and a new algorithm are proposed for parsing Sanskrit text and generating its parse tree. The system also uses a new stemmer that finds the base forms of words in order to perform shallow parsing of the input text. Five data-sets were used to test and evaluate the proposed system, which is evaluated both with traditional measures, namely fluency and adequacy scores, and with the automatic Bilingual Evaluation Understudy (BLEU) score. The result analysis shows that the proposed system performs well and effectively translates Sanskrit text into UNL expressions. A new Sanskrit-to-English MT system that reuses the layers of “SANSUNL” is also proposed; it is tested on a data-set of 500 Sanskrit sentences and evaluated using fluency, adequacy and BLEU scores. Finally, the thesis concludes with directions for future work.
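The CYK algorithm named in the abstract fills a triangular table bottom-up, recording for every span of the input which non-terminals can derive it; it requires the grammar to be in Chomsky Normal Form (CNF). The Python sketch below assumes a hypothetical three-word toy grammar over romanized (ASCII) tokens. It is not the Sanskrit grammar or the parsing algorithm proposed in the thesis, and it ignores case endings entirely. The names `LEXICAL`, `BINARY`, `cyk_parse` and `_build` are illustrative.

```python
# Hypothetical toy grammar in Chomsky Normal Form; NOT the thesis grammar.
LEXICAL = {                 # terminal -> set of pre-terminals
    "ramah": {"NP"},
    "phalam": {"NP"},
    "khadati": {"V"},
}
BINARY = [                  # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("VP", "NP", "V"),
]


def cyk_parse(tokens, start="S"):
    """Return one parse tree as nested tuples, or None if the sentence is rejected."""
    n = len(tokens)
    # table[i][j] maps each non-terminal deriving tokens[i..j] to a back-pointer
    table = [[{} for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        for nt in LEXICAL.get(tok, ()):
            table[i][i][nt] = tok              # lexical back-pointer is the word itself
    for length in range(2, n + 1):             # span length
        for i in range(n - length + 1):        # span start
            j = i + length - 1                 # span end (inclusive)
            for k in range(i, j):              # split point: [i..k] + [k+1..j]
                for parent, left, right in BINARY:
                    if (left in table[i][k] and right in table[k + 1][j]
                            and parent not in table[i][j]):
                        table[i][j][parent] = (k, left, right)
    if n == 0 or start not in table[0][n - 1]:
        return None
    return _build(table, 0, n - 1, start)


def _build(table, i, j, nt):
    """Follow back-pointers to rebuild the tree rooted at nt over tokens[i..j]."""
    back = table[i][j][nt]
    if i == j:
        return (nt, back)                      # leaf: (pre-terminal, word)
    k, left, right = back
    return (nt, _build(table, i, k, left), _build(table, k + 1, j, right))


print(cyk_parse("ramah phalam khadati".split()))
```

Running it prints `('S', ('NP', 'ramah'), ('VP', ('NP', 'phalam'), ('V', 'khadati')))`, while an ungrammatical sequence such as `["khadati", "ramah"]` yields `None`. Because every span is tried at every split point against every binary rule, the running time is O(n³·|G|) for n tokens and |G| rules, which is why a compact CNF grammar matters in practice.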
Appears in Collections: Doctoral Theses@CSED

Files in This Item:
File: Sitender-PhD Thesis-950903033-CSED.pdf
Size: 9.86 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.