vignettes/a3_coastwide_detailed.Rmd
a3_coastwide_detailed.RmdThis vignette highlights the ability to generate coastwide indices
for stocks using the indexwc package.
As a case study, we’ll focus on yellowtail rockfish. The package
includes yellowtail, which is the yellowtail data from the
WCGBTS 2003 – 2023.
library(dplyr)
library(indexwc)
library(purrr)
library(sdmTMB)
library(tibble)
options(repos = c(CRAN = "https://cloud.r-project.org"))First we’ll load in the configuration file. This step is not needed –
and the formulas, etc. may be specified from scratch, but this file is
used to keep models reproducible across assessment cycles. The
configuration file brings in specific settings by survey and species
that are used to subset the data and define the parameterization of the
index estimation. This file is included as both an rda file
that is loaded with the package but is also available for users who
download the source code in the data-raw folder (download
the package to see this).
# Load from the rda file
configuration_ytk <- indexwc::configuration |>
dplyr::filter(species == "yellowtail rockfish")The configuration file contains settings you may wish to change, including attributes of filters (years, latitudes, depths), and model components (anisotropy, knots, spatiotemporal effects, etc).
The configuration includes multiple entries for a single species for separate surveys and error distributions (e.g., delta-lognormal, delta-gamma, etc.).
In this example, we are going to only run a single-index for the Northwest Fisheries Science Center (NWFSC) West Coast Groundfish Bottom Trawl (WCGBT) survey (i.e., also referred to as the “NWFSC.Combo” survey in the data NWFSC data warehouse, or R package nwfscSurvey).
configuration_ytk_wcgbt <- configuration_ytk |>
dplyr::filter(source == "NWFSC.Combo" & family == "sdmTMB::delta_gamma()")For most indices developed on the west coast of the USA, the main
formula is usually something similar to the following, where
fyear represents a fixed year effect, and
pass_scaled is factor describing whether the haul is part
of pass 1 (early season) or pass 2 (late season).
There are two unique index formula’s available for yellowtail rockfish:
configuration_ytk_wcgbt[, "formula"]
#> [1] "catch_weight ~ 0 + fyear*split_mendocino + pass_scaled"
#> [2] "catch_weight ~ 0 + fyear + pass_scaled"we are going to use the default configuration. Please see the vignette on multiple area indices for an example run with the other formulation available for yellowtail rockfish.
configuration_to_run <- configuration_ytk_wcgbt |>
dplyr::filter(
formula == "catch_weight ~ 0 + fyear + pass_scaled"
)This block pulls trawl survey from the NWFSC data warehouse and applies filters (based on latitude, depth, and year) to each observation.
data("yellowtail")
data <- yellowtail |>
dplyr::filter(
depth <= configuration_to_run$min_depth,
depth >= configuration_to_run$max_depth,
latitude >= configuration_to_run$min_latitude,
latitude <= configuration_to_run$max_latitude,
year >= configuration_to_run$min_year,
year <= configuration_to_run$max_year
)The filtered data has several thousand less rows than the raw data.
Now we can use the indexwc package to fit the model. The
indexwc package acts as a wrapper for sdmTMB
here, combining the estimation process (sdmTMB()) with the
index generation (get_index()).
A key point to highlight is that in this code, we use
sdmTMBcontrol() to pass in a list that contains the
variables above we use to map off and initialize the parameters. For
datasets without missing values, you may not want to include this
sdmTMBcontrol() line, but it may be useful to adjust the
newton_loops, etc.
# the character string needs to be parsed before used as input
family_obj <- eval(rlang::parse_expr(configuration_to_run$family))
fit <- run_sdmtmb(
data = data,
family = family_obj,
formula = configuration_to_run$formula,
n_knots = configuration_to_run$knots,
share_range = configuration_to_run$share_range,
sdmtmb_control = sdmTMB::sdmTMBcontrol(newton_loops = 3)
)In the above example, dir is specified to be NULL, so
the fitted object will be returned. But if this is a directory name, it
will be created and used to save data and the model fit.
All diagnostics and plots are generated by the
diagnose() function. This function takes a fitted object
and prediction grid as input, and optionally a directory name
(dir) for saving results to.
For simplicity and speed, we can change simplify our previous yellowtail model to be spatial only.
family_obj <- eval(rlang::parse_expr(configuration_to_run$family))
fit_simple <- run_sdmtmb(
data = data,
family = family_obj,
formula = configuration_to_run$formula,
n_knots = configuration_to_run$knots,
share_range = configuration_to_run$share_range,
spatial = "off",
spatiotemporal = "off",
anisotropy = FALSE,
sdmtmb_control = sdmTMB::sdmTMBcontrol(newton_loops = 1)
)Next, we can run the diagnostics with the California Current prediction grid
pred_grid <- sdmTMB::replicate_df(california_current_grid,
time_name = "year",
time_values = unique(data$year)
)
pred_grid$fyear <- as.factor(pred_grid$year)
diagnostics <- diagnose(
fit = fit_simple,
prediction_grid = pred_grid
)If dir is specified, output and diagnostics for all
models and predictions are stored in a folder that indexwc
creates. If no directory is specified
diagnose(dir = NULL,...) then named list is returned.
Stepping through what elements are output:
The sdmTMB::sanity() checks can be accessed with
diagnostics$sanityThe model is returned with
diagnostics$modelThe formula is returned with
diagnostics$formulaThe log-likelihood and AIC are returned with
diagnostics$loglike
diagnostics$aicA tibble with fixed and random effects (with standard error and confidence intervals) is returned with
diagnostics$effectsThe SPDE mesh plot can be accessed with
diagnostics$mesh_plotThe QQ plot (appearing as a single plot when a non-delta family is used, or 2 plots when a delta family is used) is accessed with
diagnostics$qq_plotWhen anisotropy is not included (as in our model) no plot is available, but when included it can be seen by accessing
diagnostics$anisotropy_plotA plot of the fixed effects parameters can be accessed with
diagnostics$fixed_effects_plotDensity plots are made separately for each year, and stored 2 per page in a list of lists. They can be accessed with
diagnostics$density_plots[[10]]Maps are also made of the residuals. Accessing these can be a little funny, but is done with pagination
plot1 <- diagnostics$residual_maps_by_year[[1]]
ggforce::n_pages(plot1) # check number of pages
# View page 2
print(plot1 + ggforce::facet_wrap_paginate("year", nrow = 1, ncol = 2, page = 2))For reproducibility, the diagnostics list also includes a time stamp
diagnostics$dateand the session info with
diagnostics$session_infoThe original data is returned (with residual values added) as
diagnostics$data_with_residualsFinally, predictions on the survey grid (or other data passed in) are stored in a dataframe
diagnostics$predictionsLast, we can use the calc_index_areas function to
calculate indices, with associated uncertainty. We can calculate areas
for any region in the following list:
By default, the Coastwide index and 3 state indices are generated, but these indices may be calculated simultaneously (note: there is not really a speed up of doing several at the same time, because they are done sequentially). Again we’ll leave the ‘dir’ argument set to NULL and return everything
index <- calc_index_areas(
data = fit_simple$data,
fit = fit_simple,
prediction_grid = pred_grid,
boundaries = c("Coastwide"),
cog = TRUE
)The indices may be visualized by extracting the time series plot
index$plot_indicesor more custom code could be used to make a plot based off of
index$indicesSimilarly the center of gravity (COG) estimates are in
index$cogsAs demonstrated above, there are a lot of plots and diagnostics that
can be inspected, and in many cases we might want to save everything.
The save_index_outputs function in indexwc
will write everything locally (for the directory structure, see
?save_index_outputs)