Semantic Aware Generative Adversarial Network for Image Super-resolution


Abstract

This thesis investigates super-resolution using deep learning methodologies, with a specific focus on Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs). The primary objective is to enhance the resolution and quality of low-resolution images by proposing novel architectures and methodologies that address the inherent challenges of this domain.

The first contribution is a novel GAN-based architecture for super-resolution. The proposed architecture adopts a dual-stage upsampling approach for an upscaling factor of 4, utilizing inter- and intra-residual dense connections. This design enables the model to effectively capture high-frequency texture details in images. Furthermore, integrating semantic information with the input image improves the depiction of objects, yielding visually compelling results. To ensure stable training, spectral normalization is employed in the discriminator architecture.

The second contribution is the Generative Adversarial Based SRINet model, which obviates the need for linear filters by integrating complex filter structures within the network. The architecture also incorporates dense skip connections to enhance the network's learning capability while retaining computational efficiency, and employs a progressive upscaling approach to preserve high-frequency components and produce output images with fine texture details.

Third, this thesis presents a novel GAN-based progressive face hallucination network. An auxiliary supervision network, leveraging the shape model of a 3D Morphable Model (3DMM), enables the network to generate 2D output images enriched with 3D parametric information. Additionally, an autoencoder is proposed to incorporate high-frequency components using high-resolution coefficients of the Discrete Cosine Transform (DCT).
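The first contribution above stabilizes discriminator training with spectral normalization. As a rough illustration of the underlying operation (not the thesis code), the following NumPy sketch rescales a weight matrix by an estimate of its largest singular value, obtained via power iteration, which is how spectral normalization constrains a layer's Lipschitz constant:

```python
import numpy as np

def spectral_normalize(W, n_iters=30):
    """Rescale W by an estimate of its largest singular value (power
    iteration), so the normalized matrix has spectral norm ~1."""
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # estimated top singular value
    return W / sigma

W = np.random.default_rng(1).normal(size=(8, 16))  # toy discriminator weight
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))  # close to 1.0
```

In practice, frameworks apply this per training step to the discriminator's convolutional and dense layers (e.g. PyTorch's `torch.nn.utils.spectral_norm`), reusing the power-iteration vectors across steps rather than restarting them.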
An Inverse DCT (IDCT) block is introduced within the network to convert frequency-domain coefficients to the spatial domain, effectively embedding high-resolution DCT information into the face hallucination network.

Lastly, this work investigates the benefits of incorporating audio signals in the video face hallucination task. Empirical evidence demonstrates that audio signals help recover lost visual information and maintain visual consistency across consecutive frames. A novel lip-reading loss, inspired by visual speech recognition, enables the proposed architecture to generate facial images with fine texture details in areas such as the mouth and lips. Additionally, a frequency-based loss function is incorporated to capture salient frequency features effectively.
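To make the DCT/IDCT embedding concrete, here is a minimal NumPy sketch (illustrative only, not the thesis architecture): it builds an orthonormal DCT-II basis, zeroes the high-frequency coefficients to mimic a low-resolution input, and shows that an IDCT with the high-resolution coefficients restored returns exactly to the spatial domain:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0] /= np.sqrt(2.0)
    return C

N = 16
C = dct_matrix(N)
hr = np.random.default_rng(0).random((N, N))  # stand-in for an HR face patch

coeffs = C @ hr @ C.T          # 2D DCT: spatial -> frequency domain
low = coeffs.copy()
low[N // 2:, :] = 0            # discard high-frequency bands (LR-like input)
low[:, N // 2:] = 0

lr_like = C.T @ low @ C        # IDCT of truncated coefficients: blurry image
restored = C.T @ coeffs @ C    # IDCT with the HR coefficients re-embedded

print(np.allclose(restored, hr))  # True: the IDCT exactly inverts the DCT
```

The thesis's IDCT block plays the role of the `C.T @ ... @ C` step inside the network: predicted high-resolution DCT coefficients are converted back to pixels so that subsequent layers operate in the spatial domain.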
