Meta-heuristic Based Optimization of Deep Neural Networks
Abstract
Deep Learning (DL) has emerged as the most important sub-area of machine learning (ML). It deals with the design and application of deep neural networks (DNNs), which are multi-layered adaptations of artificial neural networks (ANNs). A machine learning model is typically a formula that learns its parameters from the data, but there are some higher-level parameters, known as 'hyper-parameters', that cannot be learnt from the data. DNNs involve various hyper-parameters such as the number of layers and nodes, the activation function, the optimizer, the regularization rate, and the loss function. DNNs are architecturally complex and need to be trained on large datasets. The number of possible hyper-parameter combinations is enormous, and it is challenging to pick the best one. Discovery of suitable hyper-parameters is especially important for DNNs implemented to recognize the complex multimedia data being generated by various devices at very high speed.
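To see why exhaustive search over these hyper-parameters is impractical, consider that the size of a grid-search space is the product of the number of choices per hyper-parameter. The sketch below illustrates this combinatorial growth; the hyper-parameter names and counts are illustrative assumptions, not the thesis's actual search space.

```python
from math import prod

# Illustrative counts of choices per hyper-parameter (assumed values,
# not taken from the thesis's experiments).
choices = {
    "num_layers": 4,        # e.g. 1, 2, 3, 4 hidden layers
    "nodes_per_layer": 4,   # e.g. 32, 64, 128, 256
    "activation": 3,        # e.g. relu, tanh, sigmoid
    "optimizer": 3,         # e.g. sgd, adam, rmsprop
    "regularization_rate": 4,
    "loss_function": 2,
}

# Grid search must train one DNN per combination.
total = prod(choices.values())  # 4*4*3*3*4*2 = 1152 configurations
```

Even this modest space requires 1,152 full training runs under grid search, which is what motivates cheaper meta-heuristic alternatives.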
In this research work, traditional and meta-heuristic optimization approaches have been analyzed for DNN optimization. Convolutional and recurrent variants of DNNs have been implemented to recognize image objects and to predict streaming data of the Indian stock market. Four experimental cases have been designed, and a Genetic Algorithm (GA)-based approach is used to find the optimal hyper-parameter combination for DNN design. The proposed optimization process includes two phases. The first phase quickly returns the optimal set of hyper-parameters to design a DNN; it is applied to both the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN). Compared to traditional grid-search-based methods, it provided an average speed-up of 8 times for the CNN and 6.5 times for the RNN. The second phase has been applied only to RNNs deployed to process streaming data; it finds an appropriate subset of the training data for near-optimal prediction performance. The optimized RNN version has been experimentally observed to be 74.34% faster than a single-layered Long Short-Term Memory (LSTM) architecture and 75.86% faster than the deep LSTM model, with a decline in accuracy of 7.17% and 10.78% respectively.
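The GA-based search described above can be sketched as follows. This is a minimal, generic genetic algorithm over a hyper-parameter space, not the thesis's actual implementation: the search-space entries, population size, and the toy fitness function are all illustrative assumptions, and in practice the fitness function would train a DNN with the candidate hyper-parameters and return its validation accuracy.

```python
import random

# Hypothetical hyper-parameter search space (names and values are
# illustrative, not the thesis's actual space).
SPACE = {
    "num_layers": [1, 2, 3, 4],
    "nodes_per_layer": [32, 64, 128, 256],
    "activation": ["relu", "tanh", "sigmoid"],
    "optimizer": ["sgd", "adam", "rmsprop"],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def random_individual():
    # One candidate = one value picked per hyper-parameter.
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    # Uniform crossover: each gene comes from either parent.
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(ind, rate=0.1):
    # With probability `rate`, resample a gene from the search space.
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in ind.items()}

def genetic_search(fitness, pop_size=10, generations=5, elite=2):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)          # rank by fitness
        parents = pop[:pop_size // 2]                # selection
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - elite)]
        pop = pop[:elite] + children                 # elitism + offspring
    return max(pop, key=fitness)

# Toy stand-in for fitness; a real run would train and validate a DNN here.
def toy_fitness(ind):
    return ind["num_layers"] + ind["nodes_per_layer"] / 256

best = genetic_search(toy_fitness)
```

Because each generation evaluates only a small population instead of the full grid, such a search can explore large hyper-parameter spaces with far fewer training runs, which is consistent with the speed-ups reported above.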
