TY - JOUR
T1 - Statistical solutions for error and bias in global citizen science datasets
AU - Bird, Tomas J.
AU - Bates, Amanda E.
AU - Lefcheck, Jonathan S.
AU - Hill, Nicole A.
AU - Thomson, Russell J.
AU - Edgar, Graham J.
AU - Stuart-Smith, Rick D.
AU - Wotherspoon, Simon
AU - Krkosek, Martin
AU - Stuart-Smith, Jemina F.
AU - Pecl, Gretta T.
AU - Barrett, Neville
AU - Frusher, Stewart
PY - 2014
Y1 - 2014
AB - Networks of citizen scientists (CS) have the potential to observe biodiversity and species distributions at global scales. Yet the adoption of such datasets in conservation science may be hindered by a perception that the data are of low quality. This perception likely stems from the propensity of data generated by CS to contain greater levels of variability (e.g., measurement error) or bias (e.g., spatio-temporal clustering) in comparison to data collected by scientists or instruments. Modern analytical approaches can account for many types of error and bias typical of CS datasets. It is possible to (1) describe how pseudo-replication in sampling influences the overall variability in response data using mixed-effects modeling, (2) integrate data to explicitly model the sampling process and account for bias using a hierarchical modeling framework, and (3) examine the relative influence of many different or related explanatory factors using machine learning tools. Information from these modeling approaches can be used to predict species distributions and to estimate biodiversity. Even so, achieving the full potential from CS projects requires meta-data describing the sampling process, reference data to allow for standardization, and insightful modeling suitable to the question of interest.
KW - biodiversity
KW - reef organisms
KW - statistical analysis
KW - surveys
KW - volunteers
UR - http://handle.uws.edu.au:8081/1959.7/uws:35679
U2 - 10.1016/j.biocon.2013.07.037
DO - 10.1016/j.biocon.2013.07.037
M3 - Article
SN - 0006-3207
VL - 173
SP - 144
EP - 154
JO - Biological Conservation
JF - Biological Conservation
ER -