Detecting computer-generated random responding in questionnaire-based data: A comparison of seven indices

Behav Res Methods. 2019 Oct;51(5):2228-2237. doi: 10.3758/s13428-018-1103-y.

Abstract

With the development of online data collection and instruments such as Amazon's Mechanical Turk (MTurk), the appearance of malicious software that generates survey responses in order to earn money represents a major issue for both economic and scientific reasons. Indeed, even if paying one respondent to complete one questionnaire represents a very small cost, the multiplication of botnets providing invalid response sets may ultimately reduce study validity while increasing research costs. Several techniques have been proposed thus far to detect problematic human response sets, but little research has tested the extent to which they actually detect nonhuman response sets. We therefore conducted an empirical comparison of these indices. Assuming that most botnet programs generate responses from a uniform random distribution, we present and compare seven indices for detecting nonhuman response sets. A sample of 1,967 human respondents was mixed with different percentages (from 5% to 50%) of simulated random response sets. Three of the seven indices (response coherence, Mahalanobis distance, and person-total correlation) proved to be the best estimators for detecting nonhuman response sets. Given that two of those indices, Mahalanobis distance and person-total correlation, are easily calculated, every researcher working with online questionnaires could use them to screen for the presence of such invalid data.
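The sketch below illustrates, under stated assumptions, the two easily computed indices highlighted in the abstract: each respondent's Mahalanobis distance from the sample centroid and the person-total correlation between a respondent's item vector and the item means of the remaining sample. The simulated data, scale length, sample sizes, and flagging cutoffs are illustrative assumptions, not details taken from the article; bot response sets are simulated as uniform random draws, matching the assumption stated in the abstract.

```python
# Minimal sketch (not the authors' code) of two indices for screening
# random response sets: Mahalanobis distance and person-total correlation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_items = 20          # assumed questionnaire length
n_humans = 200        # assumed number of human respondents
n_bots = 20           # assumed number of random (bot) response sets
likert_points = 5     # assumed 5-point Likert scale

# Simulate loosely correlated "human" responses (placeholder for real data).
latent = rng.normal(size=(n_humans, 1))
human = np.clip(
    np.round(3 + latent + rng.normal(scale=0.8, size=(n_humans, n_items))),
    1, likert_points,
)

# Simulate bot responses as uniform random draws over the response scale.
bots = rng.integers(1, likert_points + 1, size=(n_bots, n_items))

data = np.vstack([human, bots]).astype(float)

# --- Mahalanobis distance of each response vector from the sample centroid ---
mean_vec = data.mean(axis=0)
cov = np.cov(data, rowvar=False)
cov_inv = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
diff = data - mean_vec
md_sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
# Under multivariate normality, squared distances follow a chi-square(n_items) law.
md_pvals = stats.chi2.sf(md_sq, df=n_items)
md_flag = md_pvals < 0.001  # assumed cutoff for flagging outlying response sets

# --- Person-total correlation: correlate each respondent's item vector with ---
# --- the item means computed from all other respondents.                    ---
pt_corr = np.empty(len(data))
for i in range(len(data)):
    others_mean = np.delete(data, i, axis=0).mean(axis=0)
    pt_corr[i] = np.corrcoef(data[i], others_mean)[0, 1]
pt_flag = pt_corr < 0.0  # assumed cutoff: negative or near-zero correlation

print(f"Flagged by Mahalanobis distance: {md_flag.sum()} of {len(data)}")
print(f"Flagged by person-total correlation: {pt_flag.sum()} of {len(data)}")
```

In practice, the cutoffs would be chosen to balance false positives against missed bots; the chi-square flag and the correlation flag can also be combined, since the abstract reports that both indices performed well alongside response coherence.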

Keywords: Botnet; Functional method; Mahalanobis distance; Mechanical Turk; Person–total correlation; Random responding; Response coherence.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computers
  • Humans
  • Surveys and Questionnaires*