Image and Video Denoising using DnCNN

4 min readFeb 23, 2021

Business Problem

With the increase in the number of digital images, the demand for pleasing and accurate images is increasing. However, the images captured by modern cameras get degraded by noise. Noise in an image is a distortion of colour information in images. Noise is the term to coin digital distortion. The image turns out to be noisier when captured at night. The case study attempts to build a predictive model which takes the noisy image as input and outputs its denoised counterpart.

Use of Deep Learning

This problem is based on Computer Vision. Advancements in Deep Learning like CNNs have been able to provide State-of-the-Art performance in Image Denoising. The model used to perform Image Denoising is DnCNN (Denoising Convolutional Neural Networks).

Dataset

Both BSD300 and BSD500 datasets were used as training data. BSD68 was used for validation data. Because of limited data, each image was used 4 times, i.e. scaling to [1.0, 0.7, 0.8, 0.7].

Each scaled image was split into patches of 50x50 with a stride of 20. Each patch was added a Gaussian Noise with a standard deviation between [1, 55]. The code for data generation is given below.

Code to Generate Data

DnCNN Architecture

There are three types of layers in DnCNN-

Conv + ReLU: filter size of 3, no of filters as 64, a stride of 1, using zero paddings to maintain the output shape after convolution, using ReLU as the activation function. The output is of shape (batch_size, 50, 50, 64)
Conv + Batch Normalization + ReLU: filter size of 3, no of filters as 64, a stride of 1, using zero paddings to maintain the output shape after convolution, using Batch Normalization layer for better convergence, ReLU as the activation function. The output is of shape (batch_size, 50, 50, 64).
Conv: filter size of 3, a stride of 1, no of filters as c (3 if a colour image, 1 if a grayscale image), using zero paddings to maintain output shape after convolution. The output shape is (batch_size, 50, 50, c).

The output of the DnCNN model is a residual image. Therefore, Original Image = Noise Image — Residual Image

In DnCNN, zeros are padded before convolution at each layer to make sure that each feature map of the middle layers has the same size as the input image. According to the paper, the simple zero-padding strategy does not result in any boundary artifacts.

The paper suggests the depth of 17, but this case study works on both depth = 12 and depth = 8.

Evaluation Metric

The evaluation metric is PSNR(Peak Signal to Noise Ratio) score. It is simply a numeric value that represents how good is a constructed denoise image compared to the original image.