Brief Communication: Three Errors and Two Problems in a Recent Paper: gazeNet: End-to-end Eye-movement Event Detection with Deep Neural Networks (Zemblys, Niehorster, and Holmqvist, 2019)
A final version of this research was published on April 13, 2020 in Behavior Research Methods. The published article is available at https://rdcu.be/b3z4n

Zemblys et al. (Behavior Research Methods, 51(2), 840–864, 2019) reported on a method for the classification of eye movements (“gazeNet”). I have found three errors and two problems with that paper, which are explained herein. Error 1: The gazeNet classification method was built on the assumption that a hand-scored dataset from Lund University was collected entirely at 500 Hz, but six of the 34 recording files were in fact collected at 200 Hz. Of the six datasets that were used as the training set for the gazeNet algorithm, two were collected at 200 Hz. Problem 1: Even among the 500 Hz data, the inter-timestamp intervals varied widely. Problem 2: The saccade trajectories in the Lund University dataset contain many unusual discontinuities, which make the dataset a very poor choice for the construction of an automatic classification method. Error 2: The gazeNet algorithm was trained on the Lund dataset and then compared, in terms of performance on this same dataset, to other methods that were not trained on it. This is an inherently unfair comparison, and yet nowhere in the gazeNet paper is this unfairness mentioned. Error 3 arises out of the novel event-related agreement analysis employed by the gazeNet authors. Although the authors intended to classify unmatched events as either false positives or false negatives, many are actually being classified as true negatives. True negatives are not errors, so any unmatched event misclassified as a true negative drives kappa higher, whereas unmatched events should drive kappa lower.
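The direction of the effect described under Error 3 can be illustrated with a small calculation. The following is a minimal sketch, assuming a simplified two-category agreement table; it is not the event-matching procedure used in the gazeNet paper. The function name `cohen_kappa` and all counts are hypothetical and chosen only for illustration.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 agreement table (two raters, two categories)."""
    n = tp + fp + fn + tn
    # Observed agreement: proportion of cases where both raters agree.
    p_observed = (tp + tn) / n
    # Chance agreement, computed from each rater's marginal proportions.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_expected = p_yes + p_no
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: 80 matched events, 20 unmatched events,
# and 100 genuine true negatives.

# Unmatched events counted as errors (10 false positives, 10 false negatives).
kappa_as_errors = cohen_kappa(tp=80, fp=10, fn=10, tn=100)

# The same 20 unmatched events wrongly counted as true negatives.
kappa_as_true_negatives = cohen_kappa(tp=80, fp=0, fn=0, tn=120)

print(f"unmatched counted as FP/FN: kappa = {kappa_as_errors:.3f}")
print(f"unmatched counted as TN:    kappa = {kappa_as_true_negatives:.3f}")
```

With these illustrative counts, moving the unmatched events from the error cells into the true-negative cell raises kappa from roughly 0.80 to 1.00, even though the underlying disagreement between the two classifications has not changed.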