Image and Video Denoising using DnCNN

Varun Saproo
Feb 23, 2021

Business Problem

With the growing number of digital images, the demand for accurate, visually pleasing images is increasing. However, images captured by modern cameras are often degraded by noise, a random distortion of the colour information in an image. Images tend to be noisier when captured at night. This case study builds a predictive model that takes a noisy image as input and outputs its denoised counterpart.

Use of Deep Learning

This is a Computer Vision problem. Deep Learning methods such as CNNs (Convolutional Neural Networks) have achieved state-of-the-art performance in image denoising. The model used here to perform image denoising is DnCNN (Denoising Convolutional Neural Network).

Dataset

Both the BSD300 and BSD500 datasets were used as training data, and BSD68 was used as validation data. Because the data is limited, each image was used 4 times, at scales of [1.0, 0.7, 0.8, 0.7].

Each scaled image was split into 50x50 patches with a stride of 20. Gaussian noise with a standard deviation in the range [1, 55] was added to each patch. The code for data generation is given below.

Code to Generate Data
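As a rough illustration of the patch extraction and noise steps described above, here is a minimal NumPy sketch. It is not the code from the repo: the function names are made up for this example, the resizing step is left out, and it assumes pixel values on a 0-255 scale with one noise level drawn uniformly per patch.

```python
import numpy as np

PATCH_SIZE = 50         # patch side length, from the text
STRIDE = 20             # patch stride, from the text
SIGMA_RANGE = (1, 55)   # noise standard deviation range, from the text


def extract_patches(image, patch_size=PATCH_SIZE, stride=STRIDE):
    """Slide a window over an (H, W, C) image and stack the patches."""
    height, width = image.shape[:2]
    patches = [
        image[top:top + patch_size, left:left + patch_size]
        for top in range(0, height - patch_size + 1, stride)
        for left in range(0, width - patch_size + 1, stride)
    ]
    return np.stack(patches)


def add_gaussian_noise(patches, rng, sigma_range=SIGMA_RANGE):
    """Add zero-mean Gaussian noise with one random sigma per patch.

    Assumption: sigma is drawn uniformly from sigma_range and is expressed
    on the same 0-255 scale as the pixel values.
    """
    low, high = sigma_range
    # One sigma per patch, shaped so it broadcasts over the remaining axes.
    sigma_shape = (len(patches),) + (1,) * (patches.ndim - 1)
    sigmas = rng.uniform(low, high, size=sigma_shape)
    noise = rng.standard_normal(patches.shape) * sigmas
    return patches + noise, noise


rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(90, 110, 3))   # stand-in for a scaled image
clean = extract_patches(image)
noisy, noise = add_gaussian_noise(clean, rng)
print(clean.shape)   # (12, 50, 50, 3)
```

A 90x110 image gives 3 window positions vertically and 4 horizontally, hence 12 patches. The noisy patches would be the model input and the noise itself the training target.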

DnCNN Architecture

There are three types of layers in DnCNN:

  1. Conv + ReLU: 64 filters of size 3x3 with a stride of 1, using zero padding to maintain the spatial shape after convolution and ReLU as the activation function. The output is of shape (batch_size, 50, 50, 64).
  2. Conv + Batch Normalization + ReLU: 64 filters of size 3x3 with a stride of 1, using zero padding to maintain the spatial shape after convolution, a Batch Normalization layer for better convergence, and ReLU as the activation function. The output is of shape (batch_size, 50, 50, 64).
  3. Conv: c filters of size 3x3 (c = 3 for a colour image, 1 for a grayscale image) with a stride of 1, using zero padding to maintain the spatial shape after convolution. The output is of shape (batch_size, 50, 50, c).
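To make the stacking of the three layer types concrete, here is a small pure-Python sketch that lists the layers and their output shapes for a given depth. The function name is hypothetical, and it assumes that depth counts every convolution, including the first and the last, so that a depth of d has d - 2 middle blocks.

```python
def dncnn_layers(depth, channels, patch_size=50, filters=64):
    """List (layer type, output shape) pairs for a DnCNN with `depth` convolutions.

    Assumption: `depth` includes the first Conv + ReLU and the final Conv,
    leaving depth - 2 Conv + BatchNorm + ReLU blocks in between.
    """
    if depth < 2:
        raise ValueError("depth must be at least 2 (first and last convolution)")
    hidden_shape = ("batch_size", patch_size, patch_size, filters)
    layers = [("Conv + ReLU", hidden_shape)]
    layers += [("Conv + BatchNorm + ReLU", hidden_shape)] * (depth - 2)
    layers.append(("Conv", ("batch_size", patch_size, patch_size, channels)))
    return layers


# Print the layer stack for a colour model of depth 8.
for name, shape in dncnn_layers(depth=8, channels=3):
    print(f"{name:<24} -> {shape}")
```

In a Keras model, each entry would correspond to a `Conv2D` layer with `padding="same"`, optionally followed by `BatchNormalization` and a ReLU activation.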

The output of the DnCNN model is a residual image, i.e. the predicted noise. Therefore, Original Image = Noisy Image - Residual Image.
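The residual relationship can be checked with a few lines of NumPy. This is only a sketch: the "prediction" is faked by reusing the true residual, and the variable names are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.uniform(0, 255, size=(50, 50, 3))     # stand-in clean patch
noise = rng.normal(0.0, 25.0, size=clean.shape)   # Gaussian noise, s.d. 25
noisy = clean + noise

# Residual learning: the training target is the noise, not the clean image.
target_residual = noisy - clean

# A perfect model would predict exactly this residual; we fake that here.
predicted_residual = target_residual

# Subtracting the predicted residual from the noisy input recovers the original.
denoised = noisy - predicted_residual
print(np.allclose(denoised, clean))   # True
```

With a trained model, `predicted_residual` would come from the network, and `denoised` would only approximate the original image.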

In DnCNN, zeros are padded before convolution at each layer to make sure that each feature map of the middle layers has the same size as the input image. According to the paper, the simple zero-padding strategy does not result in any boundary artifacts.
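The effect of zero padding on the feature map size follows from the standard convolution output-size formula, floor((n + 2p - k) / s) + 1. The sketch below applies it to a 50x50 patch through 12 layers, with and without a padding of 1 pixel; the helper name is made up for this example.

```python
def conv_output_size(input_size, kernel_size=3, stride=1, padding=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


padded = 50
for _ in range(12):   # depth = 12, one of the depths used in this case study
    padded = conv_output_size(padded, padding=1)
print(padded)         # 50

unpadded = 50
for _ in range(12):
    unpadded = conv_output_size(unpadded, padding=0)
print(unpadded)       # 26
```

Without padding, each 3x3 convolution trims 2 pixels per dimension, so the output would no longer line up with the input and the residual could not be subtracted pixel by pixel.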

The paper suggests a depth of 17, but this case study uses depth = 12 and depth = 8.

Evaluation Metric

The evaluation metric is the PSNR (Peak Signal-to-Noise Ratio) score. It is a single number, measured in decibels (dB), that indicates how close a denoised image is to the original image; a higher PSNR means a better reconstruction.
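For reference, PSNR is defined as 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the two images. A minimal NumPy sketch, assuming 8-bit images with MAX = 255 (the function name and default are choices made for this example):

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


reference = np.zeros((50, 50))
estimate = np.full((50, 50), 25.0)       # every pixel off by 25, so MSE = 625
print(round(psnr(reference, estimate), 2))   # 20.17
```

The same arithmetic gives a useful baseline: unclipped Gaussian noise with s.d. 25 has an MSE of about 625, so a noisy input sits at roughly 20 dB before any denoising.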

Model Training

The model was trained with a batch size of 128, 2000 steps per epoch, and 30 epochs, using the Adam optimizer with learning rate decay.
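The exact decay schedule is not stated here, so the sketch below is purely illustrative. The initial learning rate, drop factor, and drop interval are hypothetical values, not the ones used in this case study. Only the batch size, steps per epoch, and epoch count come from the text.

```python
BATCH_SIZE = 128         # from the text
STEPS_PER_EPOCH = 2000   # from the text
EPOCHS = 30              # from the text


def step_decay(epoch, initial_lr=1e-3, drop=0.5, epochs_per_drop=10):
    """Hypothetical schedule: multiply the learning rate by `drop`
    every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


patches_per_epoch = BATCH_SIZE * STEPS_PER_EPOCH
schedule = [step_decay(epoch) for epoch in range(EPOCHS)]

print(patches_per_epoch)                        # 256000
print(schedule[0], schedule[10], schedule[20])  # 0.001 0.0005 0.00025
```

In Keras, a function like this could be passed to the `LearningRateScheduler` callback, which calls it at the start of each epoch.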

Results

[Images: noisy input (noise standard deviation 25); denoised output, depth = 8; denoised output, depth = 12]

PSNR on the BSD68 dataset is ~28 dB for a noise standard deviation (s.d.) of 25 and ~25 dB for s.d. 50.

With depth = 12, PSNR on the BSD68 dataset is 28.30 dB for s.d. 25 and 26.13 dB for s.d. 50.

Application: Video Denoising

We can extend this idea to video frames. Each frame is passed as input to the DnCNN model, and the resulting denoised frame is passed to the video writer.
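The frame-by-frame loop can be sketched as follows. To keep the example self-contained, the frame source, the model, and the writer are stand-ins: `denoise_frame` is assumed to be any callable that maps a batch of noisy frames to denoised frames (for a residual model, that means subtracting the predicted residual from the input), and `write_frame` is any callable that accepts a finished frame.

```python
import numpy as np


def denoise_video(frames, denoise_frame, write_frame):
    """Denoise frames one at a time and hand each result to a writer callback.

    Returns the number of frames processed.
    """
    count = 0
    for frame in frames:
        batch = frame[np.newaxis].astype(np.float32)   # add the batch axis
        denoised = denoise_frame(batch)[0]             # drop the batch axis
        # Clip back to the valid 8-bit range before writing.
        write_frame(np.clip(denoised, 0, 255).astype(np.uint8))
        count += 1
    return count


# Stand-ins: random frames, an identity "model", and a list as the writer.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8) for _ in range(5)]
written = []
processed = denoise_video(frames, lambda batch: batch, written.append)
print(processed, written[0].shape)   # 5 (48, 64, 3)
```

With OpenCV, the frames would typically come from repeated `cv2.VideoCapture.read()` calls and `write_frame` would be the `write` method of a `cv2.VideoWriter`. Since the network contains only convolutional layers, frames do not have to be 50x50 like the training patches.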

Video Denoising

Future Work

We can further improve video denoising performance by using CNNs that take a sequence of frames as input.

References

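Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, Lei Zhang. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. https://arxiv.org/pdf/1608.03981.pdf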

Final Note

Thank you very much for reading my blog. I hope it helps you on your Computer Vision journey. Feel free to comment with your thoughts :)

You can connect with me on LinkedIn: https://www.linkedin.com/in/saproo-varun/

You can find the code in my GitHub repo: https://github.com/saproovarun/DnCNN-Keras
