Evaluating CloudResearch's Approved Group as a solution for problematic data quality on MTurk

Behav Res Methods. 2023 Dec;55(8):3953-3964. doi: 10.3758/s13428-022-01999-x. Epub 2022 Nov 3.

David J Hauser¹, Aaron J Moss², Cheskie Rosenzweig² ³, Shalom N Jaffe² ⁴, Jonathan Robinson² ⁵, Leib Litman² ⁴

Affiliations

¹ Department of Psychology, Queen's University, Kingston, ON, Canada. david.hauser@queensu.ca.
² CloudResearch, Queens, NY, USA.
³ Department of Clinical Psychology, Columbia University, New York, NY, USA.
⁴ Department of Psychology, Lander College, Flushing, NY, USA.
⁵ Department of Computer Science, Lander College, Flushing, NY, USA.

PMID: 36326997
PMCID: PMC10700412
DOI: 10.3758/s13428-022-01999-x

David J Hauser et al. Behav Res Methods. 2023 Dec.

Abstract

Maintaining data quality on Amazon Mechanical Turk (MTurk) has always been a concern for researchers. These concerns have grown recently due to the bot crisis of 2018 and observations that past safeguards of data quality (e.g., approval ratings of 95%) no longer work. To address data quality concerns, CloudResearch, a third-party website that interfaces with MTurk, has assessed ~165,000 MTurkers and categorized them into those who provide high- (~100,000, Approved) and low- (~65,000, Blocked) quality data. Here, we examined the predictive validity of CloudResearch's vetting. In a pre-registered study, participants (N = 900) from the Approved and Blocked groups, along with a Standard MTurk sample (95% HIT acceptance ratio, 100+ completed HITs), completed an array of data-quality measures. Across several indices, Approved participants (i) identified the content of images more accurately, (ii) answered more reading comprehension questions correctly, (iii) responded to reverse-coded items more consistently, (iv) passed a greater number of attention checks, (v) self-reported less cheating and actually left the survey window less often on easily Googleable questions, (vi) replicated classic psychology experimental effects more reliably, and (vii) answered AI-stumping questions more accurately than Blocked participants, who performed at chance on multiple outcomes. Data quality of the Standard sample was generally in between that of the Approved and Blocked groups. We discuss how MTurk's Approval Rating system is no longer an effective data-quality control, and we discuss the advantages afforded by using the Approved group for scientific studies on MTurk.

Keywords: Data quality; Participant recruitment; Response bias; Test validity.

Conflict of interest statement

Author DJH declares no competing financial or non-financial interests. Authors AJM, CR, SNJ, JR, and LL are employees of and receive salaries from CloudResearch.

Figures

**Fig. 1**
Store Location Effect in the Soda Task. Responses were rank transformed to minimize the impact of implausible answers. Higher numbers indicate a willingness to pay more for the soda. * indicates a significant difference at p < .001. Error bars show standard errors.

**Fig. 2**
Squared Discrepancy Scores by Group. Z-transformed squared discrepancy scores range from 0 to 5 with higher scores indicating greater response consistency. Error bars show standard errors.

**Fig. 3**
Coders’ Judgments for the Image Identification Task. Coders judged whether participants accurately identified the content of three simple images and whether participants appeared to Google the answer. Error bars show standard errors.

**Fig. 4**
Rank Transformed Estimates for the Population of Chicago. Responses were rank transformed to minimize the impact of implausible answers. Higher numbers indicate higher population estimates. * indicates a significant difference at p < .001. Error bars show standard errors.

**Fig. 5**
Rank Transformed Estimates for the Math Problem. Responses were rank transformed to minimize the impact of implausible answers. Higher numbers indicate higher product estimates. * indicates a significant difference at p < .01. Error bars show standard errors.
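
Several captions above note that responses were rank transformed to limit the influence of implausible answers. The page does not say how ties were handled or which software was used, so the following is only a minimal sketch of one common variant (1-based ranks, with tied values sharing the average rank). The function name `rank_transform` and the example estimates are hypothetical and are not taken from the study.

```python
from typing import Sequence


def rank_transform(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their ranks.

    Assumption: average-rank tie handling. The source does not state
    which tie rule the authors used.
    """
    # Indices of the inputs, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values equal to values[order[start]].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) share one average 1-based rank.
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


# Hypothetical population estimates, including one implausibly large answer.
estimates = [2_700_000, 3_000_000, 2_700_000, 9_000_000_000, 500]
ranks = rank_transform(estimates)
print(ranks)  # [2.5, 4.0, 2.5, 5.0, 1.0]
```

In this made-up example the answer of 9,000,000,000 receives rank 5, one step above the next-highest answer, so it cannot dominate a group mean the way the raw value would.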
