Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/6105
Title: Cloud-based Sanskrit to Hindi Machine Translation System
Authors: Singh, Muskaan
Supervisor: Chana, Inderveer; Kumar, Ravinder
Keywords: Machine Translation;Natural Language Processing;Sanskrit;Hindi;morphology;translation system
Issue Date: 31-May-2021
Abstract: Machine Translation (MT), one of several applications of Natural Language Processing (NLP), enables automatic translation of sentences or documents from one language to another. It aims to reduce the language barriers between people from different linguistic backgrounds. Language barriers affect many aspects of human life, and effective use of MT can mitigate them while minimising human involvement. Although machine-generated output may differ from human translation, it is easily understandable, and an MT system demonstrates its effectiveness by producing grammatically and semantically fluent output. The work presented in this thesis studies in detail the existing modelling techniques of MT. This study provides a deep understanding of the issues and aspects of the field, identifies gaps in the research area, and helps avoid duplication. It also provides developers with the resources required for these modelling techniques, such as corpora, domains, toolkits, models, features and evaluation measures. Sanskrit-Hindi translation systems have existed for many years, but they lack extensibility, generalizability and adaptability; the system developed in this research work overcomes these limitations. In this work, we propose and present a hybrid machine translation system (MTS) for translating Sanskrit to Hindi. The technique uses linguistic features from a rule-based feed to train a neural machine translation system. The work is novel and applicable to any low-resource language with rich morphology. It is a generic system covering various domains with minimal human intervention. The performance of the system is analysed using automatic and linguistic measures. The results show that the proposed system, with a BLEU score of 61.02%, outperforms earlier work for this language pair.
The proposed MTS is further deployed on the cloud to offer translation as a cloud service and improve the quality of service (QoS). It is developed on TensorFlow and deployed on a cluster of virtual machines in Amazon Web Services (EC2). The significance of this work lies in demonstrating the management of recurrent changes in corpus, domain, algorithm and rules. The accuracy, speed and response time of the MT system are encouraging. The proposed hybrid model is faster and more efficient than the existing rule-based systems. When no rule matches, the rule-based model returns no output, whereas the proposed model always returns its best solution. The existing model is quite complex for long sentences and is sometimes practically infeasible, but the proposed model remains efficient in such cases. The system on the cloud is evaluated for different QoS parameters such as response time, server load, CPU utilization and throughput. The experimental results indicate that, with the availability of elastic computing resources in the cloud environment, the job completion time, irrespective of job size, can be assured to be within a fixed time limit with high accuracy. The work presented in this thesis has been validated with a case study presented at the end. It outlines a taxonomy of error analysis developed on the basis of different linguistic levels, i.e., orthography, morphology, lexicon, syntax, semantics and pragmatics. Accordingly, previous taxonomies were expanded to accommodate the errors that occur in morphologically rich Indo-European languages. The MTS employed for the case study is developed as a service using linguistic analysis along with deep learning to aid the teaching and learning process. Direct access to Sanskrit text requires good grammatical knowledge, manual access to a dictionary, and knowledge of syntax and semantics, which makes it a tough and time-consuming process.
This interactive interface will assist school as well as university students enrolled in distance education by promoting self-learning. The main aim of the proposed system is to make scriptures and philosophical texts available in the Sanskrit language, such as the Gita, Ramayana and Upanishads, accessible to the common user. The work also provides future research directions and aids the human error analysis process.
URI: http://hdl.handle.net/10266/6105
Appears in Collections: Doctoral Theses@CSED

Files in This Item:
File: Final Thesis - Muskaan Singh - with Signature.pdf
Description: PhD Thesis
Size: 8.39 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.