Herpetofauna Species Classification from Camera Trap Images Using Deep Neural Network for Conservation Monitoring
Protection of endangered species requires continuous monitoring and up-to-date information about their presence, location, and behavioral changes in their habitat. Remotely activated cameras, or “camera traps,” are a reliable and effective means of photo-documenting the local population size, locomotion, and predator-prey relationships of wild species. However, species recognition from the gathered images is challenging because of large intra-class variability, viewpoint variation, changes in illumination, occlusion, background clutter, and deformation. Manual processing of large volumes of images and captured video is laborious, time-consuming, and expensive. There is an urgent need to establish a framework for automated wildlife species recognition by image classification. Recent advances in deep learning methods have produced significant results for object and species identification in images. This thesis proposes an automated animal species recognition system based on image classification using computer vision algorithms and machine learning techniques. The goal is to train and validate a convolutional neural network (CNN) architecture that classifies three herpetofauna species (snake, lizard, and toad) from camera trap samples.
The proposed solution offers two self-trained deep convolutional neural network (DCNN) classification algorithms, CNN-1 and CNN-2, to solve binary and multiclass problems. The machine learning block is the same in both architectures; CNN-2 additionally applies several data augmentation processes, such as rotation, zoom, flip, and shift, to the existing samples during training. The impact of changing CNN parameters, optimizers, and regularization techniques on classification accuracy is also investigated in this study. The initial experiment involves building a flexible binary and multiclass CNN architecture with labeled images collected from several online sources. Once the baseline model is formulated and tested with satisfactory accuracy, new camera trap imagery is fed to the model for recognition. Each of the three species is classified individually against background samples to detect the presence of the target species in a camera trap dataset. Performance is evaluated by the classification accuracy within each group, using two separate sets of validation and testing data. Finally, both models are tested on predicting the category of new examples, in order to compare their generalization ability on challenging camera trap data.
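The four augmentation operations named above (rotation, zoom, flip, and shift) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names (`flip_horizontal`, `shift`, `rotate90`, `zoom_in`, `augment`) are hypothetical, images are represented as nested lists of pixel values, and rotation is restricted to 90 degrees for simplicity, whereas a real pipeline would typically rotate by small random angles using an image library.

```python
import random

def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def shift(img, dx, dy, fill=0):
    """Translate by dx columns and dy rows; uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out

def rotate90(img):
    """Rotate the image 90 degrees clockwise (simplified rotation)."""
    return [list(row) for row in zip(*img[::-1])]

def zoom_in(img, factor):
    """Zoom in on the image center (factor >= 1), keeping the output size.

    Uses nearest-neighbor sampling: each output pixel reads the source
    pixel closest to its position scaled back toward the center.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sy = round(cy + (y - cy) / factor)
            sx = round(cx + (x - cx) / factor)
            row.append(img[sy][sx])
        out.append(row)
    return out

def augment(img, rng):
    """Apply a random combination of the four operations to one image."""
    out = img
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    out = shift(out, rng.randint(-1, 1), rng.randint(-1, 1))
    if rng.random() < 0.5:
        out = rotate90(out)
    return zoom_in(out, rng.uniform(1.0, 1.5))

if __name__ == "__main__":
    sample = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy 3x3 "image"
    for row in augment(sample, random.Random(0)):
        print(row)
```

Applying `augment` with fresh random draws each time a training sample is used yields a different variant of that sample on every pass, which is the general idea behind augmenting existing samples during training; the ranges chosen here (one-pixel shifts, zoom up to 1.5) are arbitrary illustrative values.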