Chapter 5 Evaluation
Like many ecological models, NYBEM relies on multiple variables and ecological processes, and unsurprisingly, the complexity of the model escalated to include many components, inputs, assumptions, and modules. Model evaluation is the overaching process for ensuring that numerical tools are scientifically defensible and transparently developed. Evaluation is often referred to as verification or validation, but it in fact includes a family of methods ranging from peer review to model testing to error checking (Schmolke et al. 2010). In this more general sense, evaluation should include the following (Grant and Swannack 2008): (1) assessing the reasonableness of model structure, (2) assessing functional relationships and verifying code, (3) evaluating model behavior relative to expected patterns, (4) comparing outcomes to empirical data, if possible, and (5) analyzing uncertainty in predictions. The USACE has established an ecological model certification process to ensure that planning models are sound and functional. This review process evaluates tools relative to three categories: system quality, technical quality, and usability (EC 1105-2-412).
5.1 System Quality
Ecological models must not only maintain an appropriate theoretical and technical basis, but also must be computationally accurate (S. McKay, Richards, and Swannack 2022). System quality refers to the computational integrity of a model (or modeling system). For instance, is the tool appropriately programmed, has it been verified or stress-tested, and do outcomes behave in expected ways? System quality has three primary phases for avoiding errors (quality assurance), detecting errors through formal testing (quality control), and updating models based on review and use (model update) (S. McKay, Richards, and Swannack 2022), and NYBEM was evaluated relative to each.
Quality assurance procedures were central to the NYBEM development process. First, models were version controlled via github. Second, model documentation was developed in parallel to code and also version controlled via github. Third, models were developed modularly, which facilitated separate testing and independent applications. Fourth, the computational engine of all six ecosystem-specific models flowed through two central functions (SIcalc
and HSIcalc
), which reduced versioning issues. Finally, these two main functions were adapted from a prior habitat modeling package, ecorest
.
Additionally, quality control procedures were applied to find and correct any errors. Model coding was primarily done by one team member (MPD) and then quality controlled by another (SKM), both of whom have over 10-years experience with R-programming. All functions in the nybem
package were also formally tested during every build of the model following industry best practice using the R package testthat
. Testing frameworks like testthat
allow a battery of custom tests to be created and run regularly as development proceeds to validate behavior as new capabilities are added and modified. Each function was tested with known extreme values (e.g., suitability index = 0) and other pertinent tests. Tests are recorded within a dedicated folder within the package itself for ease of access and long-term archival. This battery of tests were run iteratively during the development of each function using the devtools::test
function. R packages should also be subjected to regular diagnostic checks using the definitive R CMD check
function provided as a part of R. This large battery of tests verifies that the structure of your package is correct and is ready to be distributed and used by other users.
Documentation is a standard component of R packages to ensure that all components are clearly described for users. All model functions were documented using the strict requirements for R packaging. Vignettes (R user guides) were included to help users get started with using the model. A website of the package documentation was created using pkgdown
.
Model errors are often uncovered during peer review and/or applications (i.e., “bugs”), which can be particularly important for large-scale or complex models. The model and associated report were reviewed through USACE certification procedures, and review comments are archived here for future reference (Appendix B). The report and package will both remain available and updated through github, and any additional bugs will be amended through this long-term model update and archival process.
5.2 Technical Quality
The technical quality of a model is assessed relative to its reliance on contemporary theory, consistency with design objectives, and degree of verification and validation against independent field data. To date, NYBEM’s technical quality has revolved around three main techniques: reliance on standard methods, adaptation of existing models, and informal model review by technical stakeholders.
The overarching approach for the NYBEM uses a habitat-style, quantity-quality assessment framework. This index-based model type has been applied extensively to assess outcomes ranging from the species to ecosystem levels of hierarchy (e.g., habitat evaluation procedures and the hydrogeomorphic methods, respectively). The framework is common within the management context of the USACE and familiar to decision-makers, reviewers, and partner agencies.
The NYBEM has also drawn heavily from existing analytical models for coastal ecosystems in the region. Suitability curves were directly adopted or adapted from existing peer-reviewed models whenever possible. For instance, the marine intertidal model alone adopted suitability curves, variables, or methods from Farmer et al. (2000), USACE (2009), Avissar (2006), Nancy Jackson, Karl Nordstorm, and David R. Smith (2010), Lathrope et al. (2013), Brady and Schrading (1996), Sellars and Jolls (2007), Dunkin et al. (2016), and others. When adaptation was not feasible, methods and suitability thresholds were sought in the peer-reviewed literature (e.g., light attenuation for seagrass from Short et al. (2002)).
Transparent modeling methods and mediated modeling workshops also provide a source of technical quality for the NYBEM. A series of meetings were held with technical stakeholders representing diverse disciplines, agencies, and geographies (Appendix A). Each meeting provided attendees with opportunities to share resources, data, and ideas as well as steer the direction of model development in terms of recommending variables.
Additional examinations of technical quality should be pursued systematically to further evaluate the NYBEM. Some key elements of evaluation that should be explored include:
Habitat zones and quality should be compared to existing habitat maps for the region. For instance, the National Wetlands Inventory and analogous state data sets could provide important source of model verification data for the estuarine intertidal zone. Similarly, submerged aquatic vegetation and shellfish data sets may provide important tools for verification of estuarine subtidal models.
Sensitivity analyses could be conducted to explore the relative importance of model variables and thresholds in suitability curves.
Additional mediated modeling workshops could be pursued to continue obtaining input from regional experts.
5.3 Usability
The usability of a model can influence the repeatable and transparent application of a tool. This type of evaluation typically examines the ease of use, availability of inputs, transparency, error potential, and education of the user. As such, defining the intended user(s) is a crucial component of assessing usability. The NYBEM was developed for application by the USACE technical team of the NJBB and HATS studies. The tool is not currently intended for broader application by local sponsors, other regional teams, or by other USACE Districts. As such, there is currently no graphical user interface for the model beyond the script itself.
To date, the primary mechanism for maintaining usability is code sharing and use of reproducible research methods. R is a primary coding language for a substantial and growing proportion of research experts developing novel statistical algorithms. CRAN (Comprehensive R Archive Network) makes it simple to learn from others’ work, share your own, and receive comments on possible changes. Packaging the NYBEM improves usability by providing a system for developing software that includes documentation and tests, resulting in higher software quality and development productivity. Similarly, this document was developed with Rmarkdown to facilitate model adaptation by others.
Another simple, but important, aspect of NYBEM’s usability is the flexibility of data inputs. The model can use hydrodynamic outcomes based on empirical or modeled data, and the major environmental inputs largely derive from regional and national data sets.
Future enhancements to model usability could be pursued through means such as:
Open data sharing of all site-specific inputs and outputs for interrogation by internal and external partners.
Interactive data visualization platforms could provide an important mechanisms for reducing the barrier-to-entry on post-processing NYBEM data and increasing the utility of the tool for end-users.
Mediated modeling workshops could be expanded to address usability by both gaging user needs but also providing informal training on model application.