Sanskrit Language Enconversion to Universal Networking Language (UNL)

Sitender

Files

Sitender-PhD Thesis-950903033-CSED.pdf (9.62 MB)

Date

2021-10-26

Authors

Sitender

Supervisors

Bawa, Seema

Abstract

Machine Translation (MT) has been a prominent research area for the last few decades. Researchers from domains such as statistics, linguistics, mathematics, artificial intelligence and philosophy have worked on various problems related to MT, and several methodologies have been used to develop MT systems for different languages. Developing an MT system based on the Universal Networking Language (UNL) is one such effort. UNL was first launched in 1996 by the United Nations University (UNU) at the Institute of Advanced Studies, Tokyo, Japan. The key components of UNL for natural language processing are the EnConverter and the DeConverter. The EnConverter converts a Natural Language (NL) sentence into equivalent UNL expressions, and the DeConverter performs the reverse operation, i.e. it generates NL from UNL expressions. The research carried out in this thesis focuses on the development of an EnConverter system for the Sanskrit language. The thesis starts with an introduction covering the importance of machine translation in today’s multilingual world, the structure of the Sanskrit language, the UNL system, the need for MT, the problems faced during MT development and a comparison of UNL with other systems. It also presents a comprehensive survey of MT approaches, existing MT systems, linguistic tools, data repositories and MT platforms. The survey finds that little work has been done on MT for the Sanskrit language, and that the existing work does not apply neural networks to the design of a stemmer, tagger, parser or translator for a Sanskrit MT system. Given these research gaps, there is a need for a new Sanskrit MT system that can translate into multiple languages simultaneously with less effort.
In this work, a new MT system, “SANSUNL”, is proposed, which translates Sanskrit into UNL expressions. The proposed system consists of seven layers, each with its own function: pre-processing, Part-of-Speech (POS) tagging, parsing, node-list creation, case marker identification, unmatched word handling and UNL expression generation. It uses a neural network for POS tagging and the CYK parsing algorithm for parsing. A new Sanskrit grammar and a new algorithm are proposed for parsing Sanskrit text and generating its parse tree. The system also uses a new stemmer to find the base form of words for shallow parsing of the input text. Five datasets were used to test and evaluate the proposed system. The system is evaluated using both traditional measures, namely fluency and adequacy scores, and an automatic measure, the Bilingual Evaluation Understudy (BLEU) score. The result analysis indicates that the proposed system performs well and translates Sanskrit text into UNL expressions effectively. A new Sanskrit-to-English MT system that reuses layers of the “SANSUNL” system is also proposed. It is tested on a dataset of 500 Sanskrit sentences and evaluated using fluency, adequacy and BLEU scores. The thesis concludes with perspectives on future work.
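The abstract names the CYK algorithm for the parsing layer. As a rough illustration only, the sketch below shows a generic CYK recognizer over a grammar in Chomsky normal form. The grammar, the ASCII-transliterated tokens and the function name cyk_recognize are hypothetical placeholders chosen for this example; they are not the Sanskrit grammar or the parsing algorithm proposed in the thesis.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (an assumption for
# illustration, not the grammar from the thesis).
# Binary rules map a pair of child nonterminals to possible parents.
BINARY_RULES = {
    ("NP", "VP"): {"S"},   # S  -> NP VP
    ("NP", "V"): {"VP"},   # VP -> NP V  (object before verb)
}
# Lexical rules map a token to the nonterminals that can produce it.
LEXICAL_RULES = {
    "ramah": {"NP"},
    "phalam": {"NP"},
    "khadati": {"V"},
}


def cyk_recognize(tokens, start="S"):
    """Return True if the toy grammar derives the token sequence."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i] = set(LEXICAL_RULES.get(tok, ()))
    # Fill the table for spans of increasing length.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between left and right parts
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]
```

For instance, cyk_recognize(["ramah", "phalam", "khadati"]) checks whether the three tokens form a sentence under this toy grammar. A full parser would additionally store back-pointers in each table cell so that the parse tree can be recovered.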

Keywords

Machine Translation, Sanskrit Enconverter, Sanskrit Stemmer, Sanskrit POS Tagger, SMT

URI

http://hdl.handle.net/10266/6173

Collections

Doctoral Theses@CSED
