Overview

This vignette highlights the ability to generate coastwide indices for stocks using the indexwc package.

Yellowtail rockfish example

As a case study, we’ll focus on yellowtail rockfish

First we’ll load in the configuration file. This is bringing in specific settings by survey and species that are used to subset the data and define the parameterization of the index estimation. This file is included as both an rda file that is loaded with the package but is also available for users who download the source code in the data-raw folder (download the package to see this).

# Load from the rda file
configuration_ytk <- configuration |>
  dplyr::filter(species == "yellowtail rockfish")

The configuration file contains settings you may wish to change, including attributes of filters (years, latitudes, depths), and model components (anisotropy, knots, spatiotemporal effects, etc).

The configuration includes multiple entries for a single species for separate surveys and error distributions (e.g., delta-lognormal, delta-gamma, etc.).

In this example, we are going to only run a single-index for the Northwest Fisheries Science Center (NWFSC) West Coast Groundfish Bottom Trawl (WCGBT) survey (i.e., also referred to as the NWFSC.Combo survey in the data NWFSC data warehouse).

configuration_ytk_wcgbt <- configuration_ytk |>
  dplyr::filter(source == "NWFSC.Combo" & family == "sdmTMB::delta_gamma()")

Statistical model

For most indices developed on the west coast of the USA, the main formula is usually something similar to the following, where fyear represents a fixed year effect, and pass_scaled is factor describing whether the haul is part of pass 1 (early season) or pass 2 (late season).

There are two unique index formula’s available for yellowtail rockfish:

configuration_ytk_wcgbt[, "formula"]
#> # A tibble: 2 × 1
#>   formula                                               
#>   <chr>                                                 
#> 1 catch_weight ~ 0 + fyear*split_mendocino + pass_scaled
#> 2 catch_weight ~ 0 + fyear + pass_scaled

we are going to use the default configuration. Please see the vignette on multiple area indices for an example run with the other formulation available for yellowtail rockfish.

configuration_to_run <- configuration_ytk_wcgbt |>
  dplyr::filter(
    formula == "catch_weight ~ 0 + fyear + pass_scaled"
  )

Preparing the data

This block pulls trawl survey from the NWFSC data warehouse and applies filters (based on latitude, depth, and year) to each observation.

data <- configuration_to_run |>
  dplyr::rowwise() |>
  dplyr::mutate(
    data_raw = list(format_data(eval(parse(text = fxn)))),
    data_filtered = list(data_raw |>
      dplyr::filter(
        depth <= min_depth, depth >= max_depth,
        latitude >= min_latitude, latitude <= max_latitude,
        year >= min_year, year <= max_year
      ))
  ) |>
  dplyr::ungroup()

The filtered data that will be used in the model can be viewed by examining data$data_filtered[1].

Fitting the model with indexwc and sdmTMB

Now we can use the indexwc package to fit the model. The indexwc package acts as a wrapper for sdmTMB here, combining the estimation process (sdmTMB()) with the index generation (get_index()).

This code is optimized to fit multiple models (across stocks / species, or model configurations). In our case, the configurations dataframe has 1 row – but this could include multiple rows. A key point to highlight is that in this code, we use sdmTMBcontrol() to pass in a list that contains the variables above we use to map off and initialize the parameters. For datasets without missing values, you may not want to include this sdmTMBcontrol() line, but it may be useful to adjust the newton_loops, etc.

Note: this code will fit a single model for each row

best <- data |>
  dplyr::mutate(
    # Evaluate the call in family
    family = purrr::map(family, .f = ~ eval(parse(text = .x))),
    # Run the model on each row in data
    results = purrr::pmap(
      .l = list(
        data = data_filtered,
        formula = formula,
        family = family,
        anisotropy = anisotropy,
        n_knots = knots,
        share_range = share_range,
        spatiotemporal = purrr::map2(spatiotemporal1, spatiotemporal2, list)
      ),
      .f = indexwc::run_sdmtmb
    )
  )

The above script saves the model run in the working directory. If you would like to save the output to a different location, the dir_main argument should be specified in the list passed to the function.

Examining the index and diagnostics

Output and diagnostics for all models and predictions are stored in a folder that indexwc creates; it’s location is “yellowtail_rockfish/wcgbts/delta_gamma..”. The “data” folder contains the raw data used for estimation (data.rdata) and a plot of the SPDE mesh is in mesh.png. The “index” folder contains diagnostics and predictions for 10 sets of indices. These 10 indices represent (1) N/S of Cape Mendocino, (2) N/S of Monterrey, (3) N/S of Pt Conception, a coastwide index, and state-specific indices. Plots include:

  • density plots (density01.png, density02.png, …) which represent predictions for pairs of years

  • a coastwide index (index_coastwide.png)

  • fixed effect parameters (parameters_fixed_effects.png)

  • QQ plot (qq.png)

  • anisotropy plot of the presence/absence and catch rate models for the spatial and spatiotemporal fields (anisotropy.png)

  • residual plots (residuals_1_01.png, residuals_1_02.png, …, residuals_2_01.png, residuals_2_02.png,…) where pairs of years are shown, and separate residual plots are made for the presence-absence and positive model components

Additional output includes:

  • indices generated by area (est_by_area.csv), the year, est, and se columns are the needed inputs for SS3 filtered down to the appropriate area

  • the dataframe of sdmTMB::sanity() checks (in sanity_data_frame.csv)

  • text file with the AIC and NLL (aic_nll.txt)

  • text file with TRUE/FALSE for the hessian (hess_logical.txt)

  • Rdata file with model fits, all data objects, and parameter estimates (sdmTMB_save.Rdata)

  • Rdata file with model predictions (predictions.Rdata)

If all the diagnostic plots are not created, this often means that parameter warnings were generated during the model fitting. To see what those warnings, the ignore_fit.rds file can be loaded into R and calling sdmTMB::sanity(fit) will output the parameter warnings.