Image and Video Denoising using DnCNN
Business Problem
As the number of digital images grows, so does the demand for images that are both visually pleasing and accurate. However, images captured by modern cameras are often degraded by noise, i.e. random digital distortions of the colour information in an image. Images captured in low light, such as at night, tend to be noticeably noisier. This case study builds a predictive model that takes a noisy image as input and outputs its denoised counterpart.
Use of Deep Learning
This is a Computer Vision problem. Deep learning models such as Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in image denoising. The model used here is DnCNN (Denoising Convolutional Neural Network).
Dataset
Both the BSD300 and BSD500 datasets were used as training data, and BSD68 was used as validation data. Because the training data is limited, each image was used 4 times, at scales of [1.0, 0.9, 0.8, 0.7].
Each scaled image was split into 50x50 patches with a stride of 20, and Gaussian noise with a standard deviation drawn from [1, 55] was added to each patch.
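The patch-generation steps above can be sketched in NumPy as follows. This is a minimal sketch, not the author's code: it assumes images arrive as uint8 arrays of shape (H, W, C), assumes the scale list [1.0, 0.9, 0.8, 0.7], and uses a crude nearest-neighbour resize as a stand-in for a proper interpolation such as bicubic. The helper names are hypothetical.

```python
import numpy as np

# Settings from the post (scale list assumed to be [1.0, 0.9, 0.8, 0.7]).
SCALES = (1.0, 0.9, 0.8, 0.7)
PATCH_SIZE = 50
STRIDE = 20
SIGMA_RANGE = (1.0, 55.0)  # noise s.d. on the 0-255 scale


def resize_nearest(img, scale):
    """Hypothetical nearest-neighbour resize of an (H, W, ...) array."""
    h, w = img.shape[:2]
    rows = (np.arange(int(h * scale)) / scale).astype(int)
    cols = (np.arange(int(w * scale)) / scale).astype(int)
    return img[rows][:, cols]


def extract_patches(img, patch_size=PATCH_SIZE, stride=STRIDE):
    """Slide a patch_size x patch_size window over img with the given stride."""
    h, w = img.shape[:2]
    return [
        img[i:i + patch_size, j:j + patch_size]
        for i in range(0, h - patch_size + 1, stride)
        for j in range(0, w - patch_size + 1, stride)
    ]


def make_training_pairs(images, rng):
    """Return (noisy, clean) patch arrays scaled to [0, 1]."""
    clean = []
    for img in images:
        img = img.astype(np.float32) / 255.0
        for s in SCALES:
            clean.extend(extract_patches(resize_nearest(img, s)))
    clean = np.stack(clean)
    # One noise level per patch, drawn uniformly from [1, 55] (0-255 scale).
    sigma_shape = (len(clean),) + (1,) * (clean.ndim - 1)
    sigma = rng.uniform(*SIGMA_RANGE, size=sigma_shape) / 255.0
    noisy = clean + rng.normal(size=clean.shape) * sigma
    return noisy.astype(np.float32), clean
```

For a single 100x100 colour image, this yields 9 + 9 + 4 + 4 = 26 patches across the four scales.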
DnCNN Architecture
DnCNN uses three types of layers:
- Conv + ReLU (first layer): 64 filters of size 3x3, a stride of 1, and zero padding so the convolution preserves the spatial size, followed by a ReLU activation. For 50x50 patches, the output shape is (batch_size, 50, 50, 64).
- Conv + Batch Normalization + ReLU (middle layers): 64 filters of size 3x3, a stride of 1, and zero padding to preserve the spatial size, followed by Batch Normalization for better convergence and a ReLU activation. The output shape is (batch_size, 50, 50, 64).
- Conv (last layer): c filters of size 3x3 (c = 3 for a colour image, 1 for a grayscale image), a stride of 1, and zero padding to preserve the spatial size. The output shape is (batch_size, 50, 50, c).
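To show how the three layer types stack, here is a framework-free NumPy sketch of a DnCNN forward pass on a single image with random, untrained weights. It is an illustration of the layer layout, not the author's Keras model; the He-style initialisation and the identity-like Batch Normalization statistics are assumptions.

```python
import numpy as np


def conv3x3_same(x, kernel, bias):
    """3x3 convolution, stride 1, zero padding of 1 (spatial size preserved).

    x: (H, W, C_in), kernel: (3, 3, C_in, C_out), bias: (C_out,)
    """
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero padding
    out = np.zeros((h, w, kernel.shape[3]), dtype=np.result_type(x, kernel))
    for di in range(3):
        for dj in range(3):
            out += xp[di:di + h, dj:dj + w] @ kernel[di, dj]
    return out + bias


def batch_norm_inference(x, gamma, beta, mean, var, eps=1e-3):
    """Batch Normalization with fixed (inference-time) statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def init_dncnn(depth, channels, filters=64, seed=0):
    """Random weights: first Conv+ReLU, middle Conv+BN+ReLU, last Conv."""
    rng = np.random.default_rng(seed)
    layers, c_in = [], channels
    for i in range(depth):
        c_out = channels if i == depth - 1 else filters
        std = np.sqrt(2.0 / (9 * c_in))  # assumed He-style initialisation
        layer = {"w": rng.normal(scale=std, size=(3, 3, c_in, c_out)),
                 "b": np.zeros(c_out)}
        if 0 < i < depth - 1:  # BN only in the middle layers
            layer["bn"] = (np.ones(c_out), np.zeros(c_out),
                           np.zeros(c_out), np.ones(c_out))
        layers.append(layer)
        c_in = c_out
    return layers


def dncnn_forward(noisy, layers):
    """Return the predicted residual image for one (H, W, C) input."""
    x = noisy
    for i, layer in enumerate(layers):
        x = conv3x3_same(x, layer["w"], layer["b"])
        if "bn" in layer:
            x = batch_norm_inference(x, *layer["bn"])
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)  # ReLU on every layer except the last
    return x
```

In Keras, the same layout would typically be built from `Conv2D(64, 3, padding="same")` layers with `BatchNormalization` and `Activation("relu")` in the middle block.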
The output of the DnCNN model is a residual image, i.e. an estimate of the noise in the input. Therefore, Denoised Image = Noisy Image - Residual Image.
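The residual formulation can be made concrete with a small NumPy sketch. The training target is the noise v = y - x rather than the clean image x, and the loss follows the objective stated in the paper, (1/2N) * sum_i ||R(y_i) - (y_i - x_i)||_F^2, where R is the network. The function names are illustrative only.

```python
import numpy as np


def residual_target(noisy, clean):
    """Training target for residual learning: the noise v = y - x."""
    return noisy - clean


def residual_loss(predicted_residual, noisy, clean):
    """1/(2N) * sum of squared errors between R(y) and (y - x)."""
    n = noisy.shape[0]  # number of patches in the batch
    diff = predicted_residual - residual_target(noisy, clean)
    return np.sum(diff ** 2) / (2 * n)


def denoise(noisy, predicted_residual):
    """Recover the clean estimate: x_hat = y - R(y)."""
    return noisy - predicted_residual
```

A perfect residual prediction gives zero loss and recovers the clean patch exactly.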
In DnCNN, zeros are padded before the convolution at each layer so that every feature map in the middle layers has the same size as the input image. According to the paper, this simple zero-padding strategy does not result in any boundary artifacts.
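The effect of the padding follows from the usual convolution output-size formula, floor((n + 2p - k) / s) + 1. The short sketch below applies it layer by layer; the function names are illustrative.

```python
def conv_output_size(n, kernel=3, stride=1, padding=1):
    """Spatial size after one convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def size_after_layers(n, depth, padding):
    """Spatial size after `depth` stacked 3x3, stride-1 convolutions."""
    for _ in range(depth):
        n = conv_output_size(n, padding=padding)
    return n
```

With a padding of 1, a 50x50 patch stays 50x50 through all 17 layers; without padding, each 3x3 layer trims 2 pixels, leaving only 16x16 after 17 layers.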
The paper recommends a depth of 17 for Gaussian denoising with a known noise level, but this case study experiments with depths of 12 and 8.
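One way to see what depth buys is the receptive field: each stride-1 3x3 convolution grows it by 2 pixels, so a depth-d network sees a (2d + 1) x (2d + 1) neighbourhood, which the paper notes is 35x35 for depth 17. A one-line sketch:

```python
def receptive_field(depth, kernel=3):
    """Receptive field of `depth` stacked stride-1 convolutions: 1 + depth * (k - 1)."""
    return 1 + depth * (kernel - 1)
```

So the depth-12 and depth-8 models used here see 25x25 and 17x17 neighbourhoods respectively, trading some context for fewer parameters and faster inference.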
Evaluation Metric
The evaluation metric is PSNR (Peak Signal-to-Noise Ratio), measured in decibels (dB). It quantifies how close a denoised image is to the original clean image: the higher the PSNR, the closer the reconstruction.
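PSNR is defined from the mean squared error (MSE) between the reference and the estimate as 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value (255 for 8-bit images, 1.0 for images scaled to [0, 1]). A minimal NumPy sketch:

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(MAX^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

For example, a uniform error of 25.5 on an 8-bit image gives 10 * log10(255^2 / 25.5^2) = 20 dB. Make sure `max_value` matches the range the images are stored in.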
Model Training
Batch size = 128, steps per epoch = 2000, epochs = 30, Adam optimizer with learning-rate decay.
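The training settings imply 128 x 2000 = 256,000 patches per epoch. The exact decay schedule is not specified, so the step-decay function below is a hypothetical example (initial rate 1e-3, divided by 10 every 10 epochs); the constants other than batch size, steps and epochs are assumptions.

```python
BATCH_SIZE = 128
STEPS_PER_EPOCH = 2000
EPOCHS = 30

# Hypothetical decay settings: the post only says "learning rate decay".
INITIAL_LR = 1e-3
DECAY_FACTOR = 0.1
DECAY_EVERY = 10


def step_decay(epoch, initial_lr=INITIAL_LR, factor=DECAY_FACTOR, every=DECAY_EVERY):
    """Learning rate for a 0-indexed epoch under step decay."""
    return initial_lr * factor ** (epoch // every)


patches_per_epoch = BATCH_SIZE * STEPS_PER_EPOCH  # 256,000
total_patches = patches_per_epoch * EPOCHS        # 7,680,000
```

In Keras, such a function can be passed to `tf.keras.callbacks.LearningRateScheduler`; wrapping it as `lambda epoch: step_decay(epoch)` keeps the callback from passing the current learning rate in as `initial_lr`.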
Results
PSNR on the BSD68 dataset is about 28 dB for noise s.d. 25 and about 25 dB for noise s.d. 50.
With depth = 12, PSNR on the BSD68 dataset is 28.30 dB for s.d. 25 and 26.13 dB for s.d. 50.
Application: Video Denoising
We can extend this idea to video by denoising frame by frame: each frame is passed as input to the DnCNN model, and the resulting denoised frame is passed to the video writer.
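The frame-by-frame loop can be sketched as a generator that is independent of any particular model or video library. It assumes `predict_residual` is any callable mapping a float32 batch of shape (1, H, W, C) in [0, 1] to a residual of the same shape, for example a trained Keras model's `predict` method; this interface is an assumption, not the author's code.

```python
import numpy as np


def denoise_frames(frames, predict_residual):
    """Denoise an iterable of uint8 (H, W, C) frames one at a time.

    predict_residual: assumed callable, (1, H, W, C) float32 -> residual.
    """
    for frame in frames:
        x = frame.astype(np.float32)[np.newaxis] / 255.0  # add batch axis
        denoised = x - predict_residual(x)                 # noisy - residual
        out = np.clip(denoised[0] * 255.0, 0, 255)
        yield out.round().astype(np.uint8)
```

With OpenCV, frames can be read with `cv2.VideoCapture(path).read()` and written with `cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))`. OpenCV frames are BGR, so convert them if the model was trained on RGB images.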
Future Work
We could further improve video denoising by using CNNs that take a sequence of consecutive frames as input, so the model can exploit the similarity between neighbouring frames.
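One simple, hypothetical way to feed a sequence of frames to a CNN is to stack each frame with its neighbours along the channel axis. The sketch below builds such windows of 2 * radius + 1 frames, repeating the first and last frames at the borders; the function name and padding choice are assumptions for illustration.

```python
import numpy as np


def frame_windows(frames, radius=2):
    """Yield (H, W, C * (2 * radius + 1)) stacks of neighbouring frames.

    Border frames are padded by repeating the first or last frame.
    """
    frames = list(frames)
    n = len(frames)
    for t in range(n):
        idx = [min(max(t + o, 0), n - 1) for o in range(-radius, radius + 1)]
        yield np.concatenate([frames[i] for i in idx], axis=-1)
```

The first convolution of such a model would then take C * (2 * radius + 1) input channels instead of C.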
References
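Zhang, K., Zuo, W., Chen, Y., Meng, D., and Zhang, L. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. https://arxiv.org/pdf/1608.03981.pdf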
Final Note
Thank you very much for reading my blog. I hope it helps you in your Computer Vision journey. Feel free to comment with your thoughts :)
You can connect with me on LinkedIn: https://www.linkedin.com/in/saproo-varun/
You can find my GitHub repo here: https://github.com/saproovarun/DnCNN-Keras