Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/6645
Title: Semantic Aware Generative Adversarial Network for Image Super-resolution
Authors: Sharma, Shailza
Supervisors: Kumar, Vinay; Dhall, Abhinav
Keywords: Super-resolution; Deep learning; GAN; CNN; Auto-encoder; Face hallucination
Issue Date: 26-Oct-2023
Abstract: This thesis investigates super-resolution using deep learning methodologies, with a specific focus on Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs). The primary objective is to enhance the resolution and quality of low-resolution images by proposing novel architectures and methodologies that address the inherent challenges of this domain. The first contribution of this thesis is a novel GAN-based architecture for super-resolution. The proposed architecture uses a dual-stage upsampling approach for an upscaling factor of 4, with inter- and intra-residual dense connections. This design enables the model to capture high-frequency texture details in images. Furthermore, integrating semantic information with the input image improves the depiction of objects, yielding visually compelling results. To stabilize training, spectral normalization is applied in the discriminator. The second contribution is the Generative Adversarial Based SRINet model. This model obviates the need for linear filters by integrating complex filter structures within the network. The architecture also incorporates dense skip connections that strengthen the network's learning capability while retaining computational efficiency. A progressive upscaling approach is employed to preserve high-frequency components and produce output images with fine texture details. Third, this thesis presents a novel GAN-based progressive face hallucination network. To embed 3D parametric information in the generated 2D face images, an auxiliary supervision network leverages the shape model of a 3D Morphable Model (3DMM). Additionally, an autoencoder is proposed that incorporates high-frequency components using the high-resolution coefficients of the Discrete Cosine Transform (DCT).
An Inverse DCT (IDCT) block within the network converts frequency-domain coefficients to the spatial domain, effectively embedding high-resolution DCT information into the face hallucination network. Lastly, this work investigates the benefits of incorporating audio signals in the video face hallucination task. Experiments show that audio signals help recover lost visual information and maintain visual consistency across consecutive frames. A novel lip-reading loss, inspired by visual speech recognition, enables the proposed architecture to generate facial images with fine texture details in regions such as the mouth and lips. Additionally, a frequency-based loss function is incorporated to capture salient frequency features.
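The abstract describes a dual-stage upsampling path for a ×4 factor but does not say which upsampling operator each stage uses. The sketch below assumes sub-pixel convolution (pixel shuffle), a common choice in super-resolution, and shows only how two ×2 rearrangements compose into ×4. The names `pixel_shuffle` and `two_stage_upsample_x4` are hypothetical, and the learned convolution that would normally precede each shuffle is deliberately omitted.

```python
from typing import List

# A feature map is a list of channels; each channel is a list of rows of floats.
FeatureMap = List[List[List[float]]]


def pixel_shuffle(x: FeatureMap, r: int) -> FeatureMap:
    """Sub-pixel rearrangement: (C*r*r, H, W) -> (C, H*r, W*r)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r output cell
            for j in range(r):      # column offset inside each r x r output cell
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for xx in range(w):
                        out[c][y * r + i][xx * r + j] = src[y][xx]
    return out


def two_stage_upsample_x4(x: FeatureMap) -> FeatureMap:
    """Hypothetical x4 upsampler built from two x2 pixel-shuffle stages.

    A real generator would apply a learned convolution before each shuffle to
    produce the channels it consumes; that convolution is omitted here so only
    the spatial rearrangement is shown.
    """
    stage1 = pixel_shuffle(x, 2)      # (16C, H, W) -> (4C, 2H, 2W)
    return pixel_shuffle(stage1, 2)   # (4C, 2H, 2W) -> (C, 4H, 4W)
```

For example, 16 single-pixel channels become one 4×4 channel: `two_stage_upsample_x4([[[float(k)]] for k in range(16)])`. Splitting ×4 into two ×2 steps lets a network refine features at the intermediate 2× resolution, which is the usual motivation for staged or progressive upscaling.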
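The abstract states that spectral normalization is used in the discriminator to stabilize training. As a minimal sketch of the general technique (not the thesis's code), the snippet below estimates the largest singular value sigma of a weight matrix by power iteration and divides the weight by it, so the normalized layer has spectral norm close to 1. The helper names and the deterministic starting vector are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def _matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def _matvec_t(w: Matrix, y: List[float]) -> List[float]:
    # Computes W^T y without materialising the transpose.
    return [sum(w[i][j] * y[i] for i in range(len(w))) for j in range(len(w[0]))]


def _normalize(x: List[float], eps: float = 1e-12) -> List[float]:
    norm = math.sqrt(sum(v * v for v in x))
    return [v / (norm + eps) for v in x]


def spectral_normalize(w: Matrix, n_iters: int = 20) -> Tuple[Matrix, float]:
    """Return (W / sigma, sigma), with sigma estimated by power iteration."""
    # Deterministic start for reproducibility; training code usually samples u randomly.
    u = _normalize([1.0] * len(w))
    v = _normalize(_matvec_t(w, u))
    for _ in range(n_iters):
        v = _normalize(_matvec_t(w, u))   # right singular vector estimate
        u = _normalize(_matvec(w, v))     # left singular vector estimate
    sigma = sum(a * b for a, b in zip(u, _matvec(w, v)))  # u^T W v
    return [[x / sigma for x in row] for row in w], sigma
```

For `[[3.0, 0.0], [0.0, 1.0]]` the estimate converges to sigma = 3, and the normalized matrix is approximately `[[1.0, 0.0], [0.0, 1/3]]`. Library implementations typically run a single iteration per training step and carry `u` over between steps, which amortizes the cost across training.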
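The IDCT block mentioned in the abstract converts DCT coefficients back to pixels. The sketch below shows that transform pair on a square block using the orthonormal 2D DCT-II and its inverse. The 8×8 block size in the usage example is an assumption (JPEG-style); the abstract does not state the block size or how the transform is wired into the network, and `dct2`/`idct2` are illustrative names.

```python
import math
from typing import List

Block = List[List[float]]


def _alpha(k: int, n: int) -> float:
    # Orthonormal scaling factors for the DCT-II basis.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def _basis(k: int, pos: int, n: int) -> float:
    return _alpha(k, n) * math.cos(math.pi * (2 * pos + 1) * k / (2 * n))


def dct2(block: Block) -> Block:
    """Orthonormal 2D DCT-II of a square N x N block (spatial -> frequency)."""
    n = len(block)
    return [[sum(block[x][y] * _basis(u, x, n) * _basis(v, y, n)
                 for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def idct2(coeffs: Block) -> Block:
    """Inverse of dct2 (frequency -> spatial), i.e. an orthonormal 2D DCT-III."""
    n = len(coeffs)
    return [[sum(coeffs[u][v] * _basis(u, x, n) * _basis(v, y, n)
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]
```

Because the basis is orthonormal, `idct2(dct2(block))` reproduces the block up to floating-point error, and a constant 8×8 block of ones maps to a single DC coefficient of 8 with all other coefficients zero. Since the transform is fixed and linear, an IDCT layer in a network can be realized as a non-learned matrix multiplication.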
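The abstract does not define its frequency-based loss. One plausible form, assumed here purely for illustration, is the mean absolute difference between the 2D DFT magnitudes of the generated and target images. The hypothetical `frequency_l1_loss` below uses a naive DFT so it stays dependency-free; it is not claimed to be the loss used in the thesis.

```python
import cmath
import math
from typing import List

Image = List[List[float]]


def dft2(img: Image) -> List[List[complex]]:
    """Naive 2D discrete Fourier transform (O(H^2 W^2); fine for tiny examples)."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)] for u in range(h)]


def frequency_l1_loss(pred: Image, target: Image) -> float:
    """Hypothetical loss: mean absolute difference of DFT magnitudes over all bins."""
    fp, ft = dft2(pred), dft2(target)
    h, w = len(fp), len(fp[0])
    return sum(abs(abs(fp[u][v]) - abs(ft[u][v]))
               for u in range(h) for v in range(w)) / (h * w)
```

Comparing magnitudes alone ignores phase, so an image and a circularly shifted copy of it score a loss of zero. This is one reason a frequency term like this is typically added alongside a pixel-space loss rather than used on its own, consistent with the abstract's description of it as an additional loss.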
URI: http://hdl.handle.net/10266/6645
Appears in Collections: Doctoral Theses@ECED

Files in This Item:
File: Semantic Aware GAN for Image Super-Resolution.pdf
Description: PhD Thesis
Size: 37.27 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.