A call recognition approach for endangered or threatened chorusing amphibian species using deep learning architectures
Audio signal analysis has become prominent in biological domains for detecting endangered or threatened species such as the Houston toad and the Crawfish frog. Researchers at Texas State University and Texas A&M University are working on a project to steward these species and understand the causes of their decline. The researchers currently use an Automated Recording Device (ARD), the Toadphone 1, which is an embedded solution. The hardware platform can perform detection tasks without human intervention and can provide near real-time notification. However, the predictive model in the device's software has had limited success in serving the primary purpose for which it was developed, which is to correctly identify Houston toad calls. In addition, the current predictive model for Toadphone 1 was designed only for Houston toad calls. Another chorusing amphibian, the near-threatened Crawfish frog, has also become a concern for the researchers working to protect it.
This thesis research experimented with a modified predictive model for the existing Toadphone 1 software solution that predicts Houston toad calls with a lower false-positive rate. The model can also perform call recognition for Crawfish frog calls. This work used audio data for the Houston toad and the Crawfish frog collected by the Department of Biology to train the predictive model. Before training, the spectrum of the audio data was studied to find the frequency ranges of Houston toad and Crawfish frog calls. Next, the audio data were iteratively preprocessed with digital filters and then split into frames, with a Hamming window function applied to each frame. For each frame, one of three audio feature sets was extracted: Mel-frequency Cepstral Coefficients (MFCCs) with their first and second derivatives, Spectral Sub-band Centroids (SSCs), or Mel-spectrograms. These features were used to train the predictive (classification) model for Houston toad or Crawfish frog call prediction. The classifiers were deep learning architectures: advanced Recurrent Neural Networks (RNNs), namely Long Short-Term Memory (LSTM) units or Gated Recurrent Units (GRUs), and Convolutional Neural Networks (CNNs). Several model architectures were tried, using different combinations of classifiers and audio features with tuned hyperparameters, to build the best predictive model. An ensemble-learning voting mechanism was developed to make the final prediction from the three best models. Lastly, the predictive model was evaluated on a near real-time prediction system.
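As a rough illustration of two steps in the pipeline described above, the following Python sketch shows framing with a Hamming window and a simple majority vote over three model outputs. It is not the thesis implementation: the function names, the frame and hop lengths, the label strings, and the choice of hard (majority) voting are all assumptions made for the example, and the digital filtering, feature extraction, and neural network stages are omitted.

```python
import math
from collections import Counter


def hamming_window(n):
    """Return n Hamming coefficients: 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames and window each frame.

    frame_len and hop_len are in samples; trailing samples that do not
    fill a complete frame are dropped (an assumption of this sketch).
    """
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


def majority_vote(predictions):
    """Return the label predicted by the most models (hard voting)."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical usage: a constant toy signal and three made-up model outputs.
frames = frame_signal([1.0] * 10, frame_len=4, hop_len=2)
final_label = majority_vote(["houston_toad", "houston_toad", "background"])
```

With three models and two candidate labels a tie cannot occur; with more labels, a tie-breaking rule (for example, preferring the most confident model) would have to be chosen.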