UNL Punjabi Deconverter
Abstract
The World Wide Web represents a formidable tool for communication and information
access. With simple equipment, it is possible to access innumerable documents about a
huge variety of topics, from any place around the world. However, despite the abundance
of information, language is very often a barrier. Because most web pages today
are written in a few widely used languages such as English, French, and Chinese, it
is difficult for a person with insufficient knowledge of these languages to access
and use this tool of communication and information. This has prompted the need to
devise means of automatically converting information from one natural language to
another, a task called Machine Translation. This process requires syntactic and
semantic analysis of both the source and target languages. Interlingua-based machine
translation has received considerable attention because of its economy of translation
effort and the additional attraction that the interlingua provides a knowledge
representation scheme.
In this thesis, we present a deconverter for the Punjabi language, built around a
language-independent algorithm, that takes a UNL (Universal Networking Language)
expression as input. For the conversion we use an interlingua that follows the UNL
specifications proposed by UNU/IAS, Tokyo. UNL is a language used
to represent a semantic graph equivalent of a concept (contained in a text document). The
system takes a set of UNL expressions as input and, with the help of a language-independent
algorithm and language-dependent data, generates the corresponding Punjabi sentence. The
process of deconversion involves syntax planning, case marker generation, and a
morphology phase. The syntax planning phase generates the proper sequence
of words for the target sentence. It first reads the input UNL file and converts
it into a semantic-net-like structure known as a nodenet. The nodenet is a directed
acyclic graph (DAG) that represents the sentence. We use
lexicon files to map the UWs (Universal Words) to target-language words. Once the
nodenet has been generated, the problem of syntax plan generation reduces to the
problem of DAG traversal: a proper traversal of the nodenet yields the syntax plan
of the target sentence. This syntax plan is then processed by the case-marking
phase, which applies the proper case marker for each relation. The output of the
case-marking phase is in turn processed by the morphology phase, which produces the
final form of the target sentence.
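
The pipeline described above (nodenet construction, syntax planning by graph traversal, case marking, morphology) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The simplified input format (bare `relation(head, dependent)` triples without UNL constraint lists or attributes), the romanized lexicon entries, the relation priorities, the case-marker table, and all function names are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical toy data: the thesis's real lexicon and rule files are not
# given in the abstract. Entries are romanized and purely illustrative.
LEXICON = {"boy": "munda", "garden": "bagh", "apple": "seb", "eat": "khanda hai"}
CASE_MARKERS = {"plc": "vich"}            # assumed postposition for the place relation
PRIORITY = {"agt": 0, "plc": 1, "obj": 2}  # assumed left-to-right order of dependents

# Simplified relation syntax: rel(head, dependent), bare headwords only.
REL = re.compile(r"(\w+)\(\s*(\w+)\s*,\s*(\w+)\s*\)")


def build_nodenet(relations):
    """Build a DAG as {head: [(relation, dependent), ...]} and find its entry node."""
    children = defaultdict(list)
    targets = set()
    for line in relations:
        match = REL.fullmatch(line.strip())
        if match is None:
            raise ValueError(f"unrecognized relation: {line!r}")
        rel, head, dep = match.groups()
        children[head].append((rel, dep))
        targets.add(dep)
    # The entry node is the only head that is never a dependent.
    roots = [node for node in children if node not in targets]
    if len(roots) != 1:
        raise ValueError("expected exactly one entry node")
    return children, roots[0]


def syntax_plan(children, node, rel=None, seen=None):
    """Traverse the DAG, emitting dependents before their head (head-final order)."""
    seen = set() if seen is None else seen
    if node in seen:  # a shared node in the DAG is emitted only once
        return []
    seen.add(node)
    plan = []
    ordered = sorted(children.get(node, []), key=lambda rc: PRIORITY.get(rc[0], 99))
    for child_rel, child in ordered:
        plan += syntax_plan(children, child, child_rel, seen)
    plan.append((node, rel))
    return plan


def inflect(word, rel):
    """Placeholder morphology phase: the toy lexicon stores already inflected forms."""
    return word


def deconvert(relations):
    """Run the three phases: syntax planning, case marking, morphology."""
    children, entry = build_nodenet(relations)
    words = []
    for uw, rel in syntax_plan(children, entry):
        word = inflect(LEXICON.get(uw, uw), rel)
        marker = CASE_MARKERS.get(rel, "")
        words.append(f"{word} {marker}".strip())
    return " ".join(words)


print(deconvert(["agt(eat, boy)", "obj(eat, apple)", "plc(eat, garden)"]))
```

In this sketch the word order falls out of the traversal alone: dependents are visited in priority order and the head is emitted last, so changing the target word order only requires changing the priority table, while the traversal code stays language independent.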
