Rule Based Semi-Supervised Morphological Analyzer for Extending the Range of Existing System

dc.contributor.author: Bajaj, Teena
dc.contributor.supervisor: Bhatia, Prateek
dc.date.accessioned: 2008-08-12T05:50:18Z
dc.date.available: 2008-08-12T05:50:18Z
dc.date.issued: 2008-08-12T05:50:18Z
dc.description: M.E. Computer Sc. & Engg. Department [en]
dc.description.abstract: The Internet today has to face the complexity of dealing with multilinguality. People speak many different languages; the number of natural languages, together with their dialects, is estimated to be close to 4000. Among the top 100 languages in the world, Hindi occupies the fifth position, with close to 200 million speakers. The information needs of this large section of humanity place a unique demand on the web, calling for knowledge processing of Hindi documents. A morphological analyzer is an essential and basic tool for building any language processing application for a natural language. There are two main approaches to learning morphology: supervised and unsupervised. The existing morph analyzer, freely downloadable at http://www.iiit.net/ltrc/morph/, has a coverage of around 50%. This thesis focuses on how the existing morph analyzer can be strengthened by merging it with a semi-supervised approach to learning morphology. In working towards morphological analysis for Hindi, we used in our system the algorithm of Utpal Sharma, Jugal Kalita and Rajib Das from their paper ‘Unsupervised learning of Morphology for Building Lexicon for a Highly Inflectional language’ and merged it with the existing morph analyzer to strengthen it. We then tested our system on some new text files and discuss the effects of the implemented algorithm, which improves the coverage of the existing morph analyzer. Our system has around 20% more coverage than the existing system. Coverage can be improved further by running the system on new text files and applying the algorithm iteratively. [en]
dc.format.extent: 1154110 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10266/567
dc.language.iso: en_US [en]
dc.subject: Morph Analyzer [en]
dc.title: Rule Based Semi-Supervised Morphological Analyzer for Extending the Range of Existing System [en]
dc.type: Thesis [en]

Files

Original bundle

Name: T567.pdf
Size: 1.1 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.79 KB
Format: Item-specific license agreed upon to submission