Semantic Aware Generative Adversarial Network for Image Super-resolution


Abstract

This thesis investigates super-resolution using deep learning methodologies, with a specific focus on Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs). The primary objective is to enhance the resolution and quality of low-resolution images by proposing novel architectures and methodologies that address the inherent challenges of this domain.

The first contribution is a novel GAN-based architecture for super-resolution. The proposed architecture adopts a dual-stage upsampling approach for an upscaling factor of 4, utilizing inter- and intra-residual dense connections. This design enables the model to effectively capture high-frequency texture details in images. Furthermore, integrating semantic information with the input image improves the depiction of objects, yielding visually compelling results. To ensure stable training, spectral normalization is employed in the discriminator architecture.

The second contribution is the Generative Adversarial Based SRINet model, which obviates the need for linear filters by integrating complex filter structures within the network. The architecture also incorporates dense skip connections to enhance the network's learning capability while retaining computational efficiency, and employs a progressive upscaling approach to preserve high-frequency components and produce output images with fine texture details.

Third, this thesis presents a novel GAN-based progressive face hallucination network. An auxiliary supervision network, leveraging the shape model of a 3D Morphable Model (3DMM), enables the network to generate 2D output images enriched with 3D parametric information. Additionally, an autoencoder is proposed to incorporate high-frequency components using high-resolution coefficients of the Discrete Cosine Transform (DCT).
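The first contribution above stabilizes discriminator training with spectral normalization. As a rough illustration of the underlying operation (not the thesis code), the following NumPy sketch rescales a weight matrix by an estimate of its largest singular value, obtained via power iteration, which is how spectral normalization constrains a layer's Lipschitz constant:

```python
import numpy as np

def spectral_normalize(W, n_iters=30):
    """Rescale W by an estimate of its largest singular value (power
    iteration), so the normalized matrix has spectral norm ~1."""
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # estimated top singular value
    return W / sigma

W = np.random.default_rng(1).normal(size=(8, 16))  # toy discriminator weight
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))  # close to 1.0
```

In practice, frameworks apply this per training step to the discriminator's convolutional and dense layers (e.g. PyTorch's `torch.nn.utils.spectral_norm`), reusing the power-iteration vectors across steps rather than restarting them.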
An Inverse DCT (IDCT) block is introduced within the network to convert frequency-domain coefficients to the spatial domain, effectively embedding high-resolution DCT information into the face hallucination network.

Lastly, this work investigates the benefits of incorporating audio signals in the video face hallucination task. Empirical evidence demonstrates that audio signals help recover lost visual information and maintain visual consistency across consecutive frames. A novel lip-reading loss, inspired by visual speech recognition, enables the proposed architecture to generate facial images with fine texture details in areas such as the mouth and lips. Additionally, a frequency-based loss function is incorporated to capture salient frequency features effectively.
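To make the DCT/IDCT embedding concrete, here is a minimal NumPy sketch (illustrative only, not the thesis architecture): it builds an orthonormal DCT-II basis, zeroes the high-frequency coefficients to mimic a low-resolution input, and shows that an IDCT with the high-resolution coefficients restored returns exactly to the spatial domain:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0] /= np.sqrt(2.0)
    return C

N = 16
C = dct_matrix(N)
hr = np.random.default_rng(0).random((N, N))  # stand-in for an HR face patch

coeffs = C @ hr @ C.T          # 2D DCT: spatial -> frequency domain
low = coeffs.copy()
low[N // 2:, :] = 0            # discard high-frequency bands (LR-like input)
low[:, N // 2:] = 0

lr_like = C.T @ low @ C        # IDCT of truncated coefficients: blurry image
restored = C.T @ coeffs @ C    # IDCT with the HR coefficients re-embedded

print(np.allclose(restored, hr))  # True: the IDCT exactly inverts the DCT
```

The thesis's IDCT block plays the role of the `C.T @ ... @ C` step inside the network: predicted high-resolution DCT coefficients are converted back to pixels so that subsequent layers operate in the spatial domain.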
