A Cross-modal System for Image Annotation and Retrieval

dc.contributor.author: Kaur, Parminder
dc.contributor.supervisor: Pannu, Husanbir Singh
dc.contributor.supervisor: Malhi, Avleen Kaur
dc.date.accessioned: 2023-09-26T11:33:19Z
dc.date.available: 2023-09-26T11:33:19Z
dc.date.issued: 2023-09-26
dc.description.abstract: Human beings experience life through a spectrum of modes such as vision, taste, hearing, smell, and touch. These modes are integrated for information processing in the brain through a complex network of neuronal connections. Likewise, for artificial intelligence to mimic the human way of learning and evolve into its next generation, it must fuse multi-modal information efficiently. A modality is a channel that conveys information about an object or an event, such as image, text, video, or audio. A research problem is said to be multi-modal or cross-modal when it incorporates information from more than one modality. Multi-modal systems allow one modality of data to be queried for results in any (same or different) modality, whereas a cross-modal system strictly retrieves information from a different modality. Because the input and output queries belong to different modal families, comparing them coherently remains an open challenge, owing to their heterogeneous raw representations and the subjective definition of content similarity. Cross-modal retrieval has lately attracted considerable attention due to the enormous volume of multi-modal data generated every day in the form of audio, video, image, and text. One vital requirement of cross-modal retrieval is to reduce the heterogeneity gap among modalities so that results in one modality can be effectively retrieved from another. This thesis therefore proposes a novel unsupervised cross-modal retrieval framework (associating the image and text modalities) based on associative learning, in which two traditional self-organizing maps (SOMs) are trained separately for images and collateral text and then integrated through a Hebbian learning network to facilitate cross-modal retrieval. Experimental results on the popular Wikipedia dataset and the primary endoscopy data demonstrate that the presented technique outperforms several existing state-of-the-art techniques.
dc.description.sponsorship: Thapar Institute
dc.identifier.uri: http://hdl.handle.net/10266/6614
dc.language.iso: en
dc.publisher: Thapar Institute
dc.subject: Machine learning
dc.subject: Cross-modal
dc.subject: Image and text
dc.subject: Associative learning
dc.subject: Data analysis
dc.title: A Cross-modal System for Image Annotation and Retrieval
dc.type: Thesis
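The abstract above describes an associative cross-modal framework: one SOM is trained on image features, a second on the collateral text features, and the two maps are then linked by a Hebbian association so that an image query can retrieve text (or vice versa). A minimal sketch of that idea is shown below; the 1-D map geometry, feature dimensions, and random features are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

class SOM:
    """Minimal self-organizing map with units on a 1-D grid (illustrative)."""
    def __init__(self, n_units, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_units, dim))  # one weight vector per unit

    def bmu(self, x):
        # Best-matching unit: the unit whose weight vector is closest to x.
        return int(np.argmin(np.linalg.norm(self.w - x, axis=1)))

    def train(self, data, epochs=20, lr=0.5, sigma=1.5):
        for e in range(epochs):
            a = lr * (1 - e / epochs)  # linearly decaying learning rate
            for x in data:
                b = self.bmu(x)
                # Gaussian neighborhood around the winner on the 1-D grid.
                d = np.arange(len(self.w)) - b
                h = np.exp(-(d ** 2) / (2 * sigma ** 2))
                self.w += a * h[:, None] * (x - self.w)

# Hypothetical paired features; a real system would use extracted
# image descriptors and text embeddings for each (image, caption) pair.
rng = np.random.default_rng(42)
n_pairs, img_dim, txt_dim, n_units = 50, 8, 6, 10
img_feats = rng.normal(size=(n_pairs, img_dim))
txt_feats = rng.normal(size=(n_pairs, txt_dim))

img_som = SOM(n_units, img_dim, seed=1)
txt_som = SOM(n_units, txt_dim, seed=2)
img_som.train(img_feats)
txt_som.train(txt_feats)

# Hebbian association: whenever an (image, text) pair co-occurs, strengthen
# the link between the winning units of the two maps ("fire together, wire
# together").
eta = 0.1
hebb = np.zeros((n_units, n_units))
for xi, xt in zip(img_feats, txt_feats):
    hebb[img_som.bmu(xi), txt_som.bmu(xt)] += eta

def retrieve_text_unit(image_feature):
    """Image -> text retrieval: follow the strongest Hebbian link."""
    return int(np.argmax(hebb[img_som.bmu(image_feature)]))
```

In a full retrieval system, the winning text unit would index a cluster of captions ranked by similarity; this sketch only shows how the Hebbian matrix bridges the two unimodal maps, which is what reduces the heterogeneity gap the abstract refers to.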

Files

Original bundle
Name: PhDThesis_ParminderKaurCSED.pdf
Size: 9.46 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2.03 KB
Description: Item-specific license agreed upon to submission