Ecological prediction at macroscales using big data: Does sampling design matter?

Patricia A Soranno; Kendra Spence Cheruvelil; Boyang Liu; Qi Wang; Pang-Ning Tan; Jiayu Zhou; Katelyn B S King; Ian M McCullough; Joseph Stachelek; Meridith Bartley; Christopher T Filstrup; Ephraim M Hanks; Jean-François Lapierre; Noah R Lottig; Erin M Schliep; Tyler Wagner; Katherine E Webster

doi:10.1002/eap.2123

Ecol Appl. 2020 Sep;30(6):e02123. doi: 10.1002/eap.2123. Epub 2020 Apr 27.

Authors

Patricia A Soranno¹, Kendra Spence Cheruvelil¹,², Boyang Liu³, Qi Wang³, Pang-Ning Tan³, Jiayu Zhou³, Katelyn B S King¹, Ian M McCullough¹, Joseph Stachelek¹, Meridith Bartley⁴, Christopher T Filstrup⁵, Ephraim M Hanks⁴, Jean-François Lapierre⁶, Noah R Lottig⁷, Erin M Schliep⁸, Tyler Wagner⁹, Katherine E Webster¹

Affiliations

¹ Department of Fisheries and Wildlife, Michigan State University, 480 Wilson Road, East Lansing, Michigan, 48824, USA.
² Lyman Briggs College, Michigan State University, 919 East Shaw Lane, East Lansing, Michigan, 48825, USA.
³ Department of Computer Science and Engineering, Michigan State University, 428 South Shaw Lane, East Lansing, Michigan, 48824, USA.
⁴ Department of Statistics, The Pennsylvania State University, 324 Thomas Building, University Park, Pennsylvania, 16802, USA.
⁵ Natural Resources Research Institute, University of Minnesota Duluth, 5013 Miller Trunk Highway, Duluth, Minnesota, 55811, USA.
⁶ Sciences Biologiques, Université de Montréal, Pavillon Marie-Victorin, CP 6128, succursale Centre-Ville, Montreal, Quebec, H3C 3J7, Canada.
⁷ Center for Limnology Trout Lake Station, University of Wisconsin Madison, Boulder Junction, Wisconsin, 54512, USA.
⁸ Department of Statistics, University of Missouri, 146 Middlebush Hall, Columbia, Missouri, 65211, USA.
⁹ U.S. Geological Survey, Pennsylvania Cooperative Fish and Wildlife Research Unit, Pennsylvania State University, Forest Resources Building, University Park, Pennsylvania, 16802, USA.

PMID: 32160362
DOI: 10.1002/eap.2123

Abstract

Although ecosystems respond to global change at regional to continental scales (i.e., macroscales), model predictions of ecosystem responses often rely on data from targeted monitoring of a small proportion of sampled ecosystems within a particular geographic area. In this study, we examined how the sampling strategy used to collect data for such models influences predictive performance. We subsampled a large and spatially extensive data set to investigate how macroscale sampling strategy affects prediction of ecosystem characteristics in 6,784 lakes across a 1.8-million-km² area. We estimated model predictive performance for different subsets of the data set to mimic three common sampling strategies for collecting observations of ecosystem characteristics: random sampling design, stratified random sampling design, and targeted sampling. We found that sampling strategy influenced model predictive performance such that (1) stratified random sampling designs did not improve predictive performance compared to simple random sampling designs and (2) although one of the scenarios that mimicked targeted (non-random) sampling had the poorest performing predictive models, the other targeted sampling scenarios resulted in models with similar predictive performance to that of the random sampling scenarios. Our results suggest that although potential biases in data sets from some forms of targeted sampling may limit predictive performance, compiling existing spatially extensive data sets can result in models with good predictive performance that may inform a wide range of science questions and policy goals related to global change.

Keywords: data-intensive ecology; ecological context; extrapolation; interpolation; lakes; macroscale; monitoring; prediction; sampling; sampling design.
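
The subsampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the lake records are synthetic, the field names (region, area_ha) and the function names are hypothetical, and the targeted scenario is assumed here to favor the largest lakes, since the abstract does not say which attributes drove the targeted scenarios in the study.

```python
import random
from collections import defaultdict


def simple_random_sample(lakes, n, rng):
    """Draw n lakes uniformly at random, without replacement."""
    return rng.sample(lakes, n)


def stratified_random_sample(lakes, n, rng, key="region"):
    """Draw about n lakes, allocating draws to strata in proportion to stratum size."""
    strata = defaultdict(list)
    for lake in lakes:
        strata[lake[key]].append(lake)
    sample = []
    for members in strata.values():
        # Proportional allocation, with at least one lake per stratum.
        k = max(1, round(n * len(members) / len(lakes)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


def targeted_sample(lakes, n, key="area_ha"):
    """Mimic targeted (non-random) sampling by taking the n largest lakes.

    Assumption for illustration only: real targeted monitoring may be
    biased toward other attributes.
    """
    return sorted(lakes, key=lambda lake: lake[key], reverse=True)[:n]


# Synthetic population of 1,000 lakes (hypothetical attributes).
rng = random.Random(0)
lakes = [
    {"id": i, "region": rng.choice("ABCD"), "area_ha": rng.lognormvariate(3, 1)}
    for i in range(1000)
]

random_subset = simple_random_sample(lakes, 100, rng)
stratified_subset = stratified_random_sample(lakes, 100, rng)
targeted_subset = targeted_sample(lakes, 100)
```

Each subset could then be used to fit a model whose predictions are evaluated on the lakes left out, which is one way to compare predictive performance across sampling strategies.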
