Rule Based Semi-Supervised Morphological Analyzer for Extending the Range of Existing System
Abstract
The Internet today must deal with the complexity of multilinguality. People speak
different languages, and the number of natural languages, along with their
dialects, is estimated to be close to 4000. Among the top 100 languages in the world,
Hindi occupies the fifth position, with close to 200 million speakers. The
information needs of this large section of humanity place a unique
demand on the web, calling for knowledge processing of Hindi documents.
A morphological analyzer is an essential and basic tool for building any language
processing application for a natural language. There are two main approaches to
learning morphology: supervised and unsupervised.
The existing morph analyzer, freely downloadable at http://www.iiit.net/ltrc/morph/,
has a coverage of around 50%. This thesis focuses on how the existing morph
analyzer can be strengthened by merging it with a semi-supervised approach to
learning morphology. In working towards morphological analysis for Hindi,
we used the algorithm presented by Utpal Sharma, Jugal Kalita
and Rajib Das in their paper ‘Unsupervised learning of Morphology for Building
Lexicon for a Highly Inflectional language’ in our system and merged it with the
existing morph analyzer in order to strengthen it.
We then tested our system on some new text files and discuss the effect
of the implemented algorithm on the coverage of the existing morph analyzer.
The resulting system has around 20% more coverage than the existing system.
Coverage can be improved further by running the system on new text files
and applying the algorithm iteratively.
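
The merging and coverage measurement described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the existing analyzer can be treated as a function that returns an analysis or nothing, and it replaces the learned component with a simple stem-plus-suffix lookup. All function names, and the romanized toy stems and suffixes, are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Set

# An analyzer maps a word to an analysis string, or None if it cannot analyze it.
Analyzer = Callable[[str], Optional[str]]


def make_lexicon_analyzer(lexicon: Dict[str, str]) -> Analyzer:
    """Hypothetical stand-in for the existing analyzer: a plain lookup."""
    def analyze(word: str) -> Optional[str]:
        return lexicon.get(word)
    return analyze


def make_suffix_analyzer(stems: Set[str], suffixes: Set[str]) -> Analyzer:
    """Hypothetical stand-in for the learned component: split a word into
    a known stem plus a known suffix, preferring the longest stem."""
    def analyze(word: str) -> Optional[str]:
        for cut in range(len(word), 0, -1):
            stem, suffix = word[:cut], word[cut:]
            if stem in stems and (suffix == "" or suffix in suffixes):
                return f"{stem}+{suffix}" if suffix else stem
        return None
    return analyze


def combine(primary: Analyzer, fallback: Analyzer) -> Analyzer:
    """Consult the fallback only for words the primary analyzer misses."""
    def analyze(word: str) -> Optional[str]:
        result = primary(word)
        return result if result is not None else fallback(word)
    return analyze


def coverage(analyzer: Analyzer, tokens: Iterable[str]) -> float:
    """Fraction of tokens for which the analyzer returns some analysis."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    return sum(analyzer(t) is not None for t in tokens) / len(tokens)


# Toy data, for illustration only (not taken from the thesis).
existing = make_lexicon_analyzer({"ladka": "ladka", "ghar": "ghar"})
learned = make_suffix_analyzer({"ladk", "ghar"}, {"a", "e", "on"})
merged = combine(existing, learned)

sample = ["ladka", "ladke", "ghar", "gharon", "kitab"]
print(coverage(existing, sample), coverage(merged, sample))
```

Because the learned component is consulted only when the existing analyzer fails, the merged system's coverage in this sketch can never fall below that of the existing analyzer alone. The figures printed for the toy sample say nothing about the coverage reported in the thesis.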
Description
M.E. Computer Sc. & Engg. Department
