With the exponential growth in unmanned aerial vehicle (UAV)-based applications, there is a need to ensure safe and secure operations. From a security perspective, detecting and localizing intruder UAVs remains a challenge, and accurately estimating the number of intruder UAVs in a scene is harder still. In this work, we propose a simple acoustic-based technique to detect UAVs and estimate their number. Our method uses the acoustic signals generated by UAV motors and propellers. The signals are captured by flying different combinations of up to ten UAVs in an indoor setting. The recorded signals are trimmed, processed, and arranged to create a UAV audio dataset, which is then subjected to time-frequency transformations to generate audio spectrogram images. These spectrogram images are fed to a custom lightweight convolutional neural network (CNN) architecture that estimates the number of UAVs in the scene. Following training, the proposed model achieves an average test accuracy of 93.33% and is compared against state-of-the-art benchmark models. Furthermore, the deployment feasibility of the proposed model is validated by measuring inference time on edge computing devices, namely the Raspberry Pi 4, NVIDIA Jetson Nano, and NVIDIA Jetson AGX Xavier.
© 2023 Acoustical Society of America.