Rule Based Semi-Supervised Morphological Analyzer for Extending the Range of Existing System
| dc.contributor.author | Bajaj, Teena | |
| dc.contributor.supervisor | Bhatia, Prateek | |
| dc.date.accessioned | 2008-08-12T05:50:18Z | |
| dc.date.available | 2008-08-12T05:50:18Z | |
| dc.date.issued | 2008-08-12T05:50:18Z | |
| dc.description | M.E. Computer Sc. & Engg. Department | en |
| dc.description.abstract | The Internet today has to deal with the complexity of multilinguality. People speak different languages, and the number of natural languages, along with their dialects, is estimated to be close to 4000. Among the top 100 languages in the world, Hindi occupies the fifth position, with close to 200 million speakers. The information needs of this large section of humanity place a unique demand on the web, calling for knowledge processing of Hindi documents. A morphological analyzer is an essential and basic tool for building any language processing application for a natural language. There are two main approaches to learning morphology: supervised and unsupervised. The existing morph analyzer, freely downloadable at http://www.iiit.net/ltrc/morph/, has a coverage of around 50%. This thesis focuses on how the existing morph analyzer can be strengthened by merging it with a semi-supervised approach to learning morphology. In working towards morphological analysis for Hindi, we adopted in our system the algorithm described by Utpal Sharma, Jugal Kalita and Rajib Das in their paper ‘Unsupervised learning of Morphology for Building Lexicon for a Highly Inflectional language’ and merged it with the existing morph analyzer in order to increase its strength. We then tested our system on new text files and discuss the effect of the implemented algorithm, which improves the coverage of the existing morph analyzer. The resulting system has around 20% more coverage than the existing system. Coverage can be improved further by running the system on new text files and applying the algorithm iteratively. | en |
| dc.format.extent | 1154110 bytes | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.uri | http://hdl.handle.net/10266/567 | |
| dc.language.iso | en_US | en |
| dc.subject | Morph Analyzer | en |
| dc.title | Rule Based Semi-Supervised Morphological Analyzer for Extending the Range of Existing System | en |
| dc.type | Thesis | en |
