a1_single_index.Rmd
This vignette highlights the ability to generate coastwide indices
for stocks using the indexwc
package.
As a case study, we’ll focus on yellowtail rockfish
First we’ll load in the configuration file. This is bringing in
specific settings by survey and species that are used to subset the data
and define the parameterization of the index estimation. This file is
included as both an rda
file that is loaded with the
package but is also available for users who download the source code in
the data-raw
folder (download the package to see this).
# Load from the rda file
configuration_ytk <- configuration |>
dplyr::filter(species == "yellowtail rockfish")
The configuration file contains settings you may wish to change, including attributes of filters (years, latitudes, depths), and model components (anisotropy, knots, spatiotemporal effects, etc).
The configuration includes multiple entries for a single species for separate surveys and error distributions (e.g., delta-lognormal, delta-gamma, etc.).
In this example, we are going to only run a single-index for the Northwest Fisheries Science Center (NWFSC) West Coast Groundfish Bottom Trawl (WCGBT) survey (i.e., also referred to as the NWFSC.Combo survey in the data NWFSC data warehouse).
configuration_ytk_wcgbt <- configuration_ytk |>
dplyr::filter(source == "NWFSC.Combo" & family == "sdmTMB::delta_gamma()")
For most indices developed on the west coast of the USA, the main
formula is usually something similar to the following, where
fyear
represents a fixed year effect, and
pass_scaled
is factor describing whether the haul is part
of pass 1 (early season) or pass 2 (late season).
There are two unique index formula’s available for yellowtail rockfish:
configuration_ytk_wcgbt[, "formula"]
#> # A tibble: 2 × 1
#> formula
#> <chr>
#> 1 catch_weight ~ 0 + fyear*split_mendocino + pass_scaled
#> 2 catch_weight ~ 0 + fyear + pass_scaled
we are going to use the default configuration. Please see the vignette on multiple area indices for an example run with the other formulation available for yellowtail rockfish.
configuration_to_run <- configuration_ytk_wcgbt |>
dplyr::filter(
formula == "catch_weight ~ 0 + fyear + pass_scaled"
)
This block pulls trawl survey from the NWFSC data warehouse and applies filters (based on latitude, depth, and year) to each observation.
data <- configuration_to_run |>
dplyr::rowwise() |>
dplyr::mutate(
data_raw = list(format_data(eval(parse(text = fxn)))),
data_filtered = list(data_raw |>
dplyr::filter(
depth <= min_depth, depth >= max_depth,
latitude >= min_latitude, latitude <= max_latitude,
year >= min_year, year <= max_year
))
) |>
dplyr::ungroup()
The filtered data that will be used in the model can be viewed by
examining data$data_filtered[1]
.
Now we can use the indexwc
package to fit the model. The
indexwc
package acts as a wrapper for sdmTMB
here, combining the estimation process (sdmTMB()
) with the
index generation (get_index()
).
This code is optimized to fit multiple models (across stocks /
species, or model configurations). In our case, the configurations
dataframe has 1 row – but this could include multiple rows. A key point
to highlight is that in this code, we use sdmTMBcontrol()
to pass in a list that contains the variables above we use to map off
and initialize the parameters. For datasets without missing values, you
may not want to include this sdmTMBcontrol()
line, but it
may be useful to adjust the newton_loops
, etc.
Note: this code will fit a single model for each row
best <- data |>
dplyr::mutate(
# Evaluate the call in family
family = purrr::map(family, .f = ~ eval(parse(text = .x))),
# Run the model on each row in data
results = purrr::pmap(
.l = list(
data = data_filtered,
formula = formula,
family = family,
anisotropy = anisotropy,
n_knots = knots,
share_range = share_range,
spatiotemporal = purrr::map2(spatiotemporal1, spatiotemporal2, list)
),
.f = indexwc::run_sdmtmb
)
)
The above script saves the model run in the working directory. If you
would like to save the output to a different location, the
dir_main
argument should be specified in the list passed to
the function.
Output and diagnostics for all models and predictions are stored in a
folder that indexwc
creates; it’s location is
“yellowtail_rockfish/wcgbts/delta_gamma..”. The “data” folder contains
the raw data used for estimation (data.rdata
) and a plot of
the SPDE mesh is in mesh.png
. The “index” folder contains
diagnostics and predictions for 10 sets of indices. These 10 indices
represent (1) N/S of Cape Mendocino, (2) N/S of Monterrey, (3) N/S of Pt
Conception, a coastwide index, and state-specific indices. Plots
include:
density plots (density01.png
,
density02.png
, …) which represent predictions for pairs of
years
a coastwide index (index_coastwide.png
)
fixed effect parameters
(parameters_fixed_effects.png
)
QQ plot (qq.png
)
anisotropy plot of the presence/absence and catch rate models for
the spatial and spatiotemporal fields
(anisotropy.png
)
residual plots (residuals_1_01.png
,
residuals_1_02.png
, …, residuals_2_01.png
,
residuals_2_02.png
,…) where pairs of years are shown, and
separate residual plots are made for the presence-absence and positive
model components
Additional output includes:
indices generated by area (est_by_area.csv
), the
year
, est
, and se
columns are the
needed inputs for SS3 filtered down to the appropriate area
the dataframe of sdmTMB::sanity()
checks (in
sanity_data_frame.csv
)
text file with the AIC and NLL
(aic_nll.txt
)
text file with TRUE/FALSE for the hessian
(hess_logical.txt
)
Rdata file with model fits, all data objects, and parameter
estimates (sdmTMB_save.Rdata
)
Rdata file with model predictions
(predictions.Rdata
)
If all the diagnostic plots are not created, this often means that
parameter warnings were generated during the model fitting. To see what
those warnings, the ignore_fit.rds
file can be loaded into
R and calling sdmTMB::sanity(fit)
will output the parameter
warnings.