Deep sound-field denoiser

K. Ishikawa et al., "Deep sound-field denoiser: optically-measured sound-field denoising using deep neural network," Opt. Express, vol. 31, no. 20, pp. 33405-33420 (2023). [Paper] [GitHub]

This paper proposes the deep sound-field denoiser, a deep-neural-network (DNN) based method for denoising optically measured sound-field images. Sound-field imaging using optical methods has gained considerable attention because it can achieve high-spatial-resolution imaging of acoustic phenomena that conventional acoustic sensors cannot. However, optically measured sound-field images are often heavily contaminated by noise owing to the low sensitivity of optical interferometric measurements to airborne sound. Here, we propose a DNN-based sound-field denoising method. Time-varying sound-field image sequences are decomposed into harmonic complex-amplitude images by a time-directional Fourier transform. The complex images are converted into two-channel images consisting of real and imaginary parts and denoised by a nonlinear-activation-free network. The network is trained on a sound-field dataset obtained from numerical acoustic simulations with randomized parameters. We compared the method with conventional ones, such as image filters and a spatiotemporal filter, on numerical and experimental data. The experimental data were measured by parallel phase-shifting interferometry and holographic speckle interferometry. The proposed deep sound-field denoiser significantly outperformed the conventional methods on both the numerical and experimental data.

Sound-field image dataset

We have created a sound-field image dataset containing 50,000 clean-and-noisy pairs of sound-field images in the frequency domain. The images are computed by 2D acoustic simulation with randomized parameters, and each image represents the complex-valued amplitude at a certain frequency. Two types of noise are considered: white noise and speckle noise. The histogram shows the SNRs of the noisy images.
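As a minimal sketch of how such a noisy pair can be generated, the snippet below adds complex white Gaussian noise to a synthetic plane-wave field at a target SNR. The plane-wave field, the `add_white_noise` helper, and the SNR convention are illustrative assumptions, not the dataset-generation code from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_white_noise(field, target_snr_db):
    """Add complex Gaussian white noise to a complex sound-field image
    at a target SNR in dB (illustrative convention)."""
    signal_power = np.mean(np.abs(field) ** 2)
    noise_power = signal_power / (10 ** (target_snr_db / 10))
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(field.shape) + 1j * rng.standard_normal(field.shape)
    )
    return field + noise

def snr_db(clean, noisy):
    """Measured SNR of a noisy image against its clean reference."""
    return 10 * np.log10(
        np.mean(np.abs(clean) ** 2) / np.mean(np.abs(noisy - clean) ** 2)
    )

# toy 64x64 complex-amplitude field: a single plane wave
x = np.linspace(0, 1, 64)
X, Y = np.meshgrid(x, x)
clean = np.exp(1j * 2 * np.pi * 5 * X)
noisy = add_white_noise(clean, target_snr_db=0.0)
print(round(snr_db(clean, noisy), 1))  # close to 0 dB
```

Speckle noise would be modeled differently (multiplicative, phase-dependent), but the same SNR bookkeeping applies.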

Method

The sound-field dataset is used to train the network. The complex-valued amplitude images are converted into two-channel images holding the real and imaginary parts, and the network learns a noisy-to-clean image mapping. At inference time, an input noisy time-domain image sequence, e.g., a sound-field video, is transformed into complex-valued amplitude images by a time-directional Fourier transform and then denoised by the trained network. An inverse Fourier transform yields the denoised sound-field video.
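The inference steps above can be sketched as follows. The function name `denoise_video`, the single-frequency-bin handling, and the identity "network" standing in for the trained model are assumptions for illustration; the actual repository code may organize this differently.

```python
import numpy as np

def denoise_video(video, net, freq_bin):
    """Sketch of the inference pipeline.

    video: real array of shape (T, H, W), a noisy sound-field video.
    net: maps a (2, H, W) real/imag image to its denoised version.
    """
    # 1) time-directional Fourier transform -> complex amplitude per frequency
    spectrum = np.fft.fft(video, axis=0)            # (T, H, W), complex
    amp = spectrum[freq_bin]                        # (H, W) complex amplitude

    # 2) two-channel real image in, denoised two-channel image out
    two_ch = np.stack([amp.real, amp.imag])         # (2, H, W)
    out = net(two_ch)
    denoised_amp = out[0] + 1j * out[1]

    # 3) inverse Fourier transform back to a time-domain video
    spectrum_out = np.zeros_like(spectrum)
    spectrum_out[freq_bin] = denoised_amp
    spectrum_out[-freq_bin] = np.conj(denoised_amp)  # keep the output real
    return np.fft.ifft(spectrum_out, axis=0).real

# identity "network" stands in for the trained nonlinear-activation-free model
identity_net = lambda x: x

T, H, W = 32, 16, 16
t = np.arange(T)[:, None, None]
video = np.cos(2 * np.pi * 4 * t / T) * np.ones((1, H, W))
restored = denoise_video(video, identity_net, freq_bin=4)
print(np.allclose(restored, video))  # True for the identity network
```

Replacing `identity_net` with the trained network gives the full denoising path for a single harmonic component.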

Results

The proposed method, Ours (W), is compared with several conventional denoising methods: Gaussian filter, median filter, Non-Local Means, BM3D, windowed Fourier filter, and spatiotemporal bandpass filter. It is also compared with other DNN architectures: DnCNN and LRDUNet. The table above shows that the proposed method outperforms the other methods under all conditions (N denotes the number of sound sources in the acoustic simulation). Ours (W) achieves a PSNR improvement of more than 30 dB.
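For reference, a PSNR for complex amplitude images can be computed as below. Taking the peak as the maximum magnitude of the clean field is one common convention and an assumption here; the paper's exact definition may differ.

```python
import numpy as np

def psnr_db(clean, denoised):
    """PSNR (dB) for complex-valued images, with the peak taken as the
    maximum magnitude of the clean field (illustrative convention)."""
    mse = np.mean(np.abs(clean - denoised) ** 2)
    peak = np.max(np.abs(clean)) ** 2
    return 10 * np.log10(peak / mse)

clean = np.exp(1j * np.linspace(0, 2 * np.pi, 64)).reshape(8, 8)
noisy = clean + 0.1 * (np.ones((8, 8)) + 1j * np.ones((8, 8)))
print(round(psnr_db(clean, noisy), 1))  # 17.0
```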

The denoised images show that Ours (W) restores the sound fields very accurately from the noisy data, even when almost no sound wave can be recognized by eye in the noisy data. We also confirmed the superiority of the proposed method on real data measured by a parallel phase-shifting interferometer and a holographic speckle interferometer.