2 Modeling choices, notes and best practices

2.1 Selecting maximum age (population and data)

The comments below are based on a team discussion on how to select a population and data maximum age for modeling:

Choice 1, Population length bins: should extend beyond something like the 97.5% quantile of the length at age distribution for the largest age (easy to add more bins to see if it makes any difference).

Choice 2, Max age: should be large enough for growth to be asymptotic, and at least as big as the largest data age bins. It’s easy to test the impact of changing to different value, just requires updating ageing error matrices. Look at the stable age-distribution plot and examine the number of fish in the plus group. Additionally, the period of time when data are available (after extensive exploitation), the data plus group may be lower.

Choice 3, Data length bins should probably be big enough that the plus group doesn’t have much more in it than the previous bin. We should check to make sure that the selection of length and age bins are consistent with each other. Typically, we often have max length bins where there are only a few fish, but a larger proportion of data in the data age plus group.

Choice 4, Data age bins should probably be big enough that the plus group doesn’t have much more in it than the previous bin. In regards to population maximum age, there is no negative repercussions within SS3 for having a larger value, beyond a slower run time. Revising the age bins based on the data, rather than a priori rules about how to set this, may be considered “data mining”. Could create a check within the survey and PacFIN comp codes that creates a flag when the data plus group has more than a certain percentage (i.e., 5%). Also, add a warning to {r4ss} about the percentage in the population and data plus groups.

2.2 Determining prior for natural mortality

Owen’s Advice on $M$; July 6, 2022

I prefer using age data alone to estimate the natural mortality rate (see accompanying document: M2017_for_Methods_Review_Hamel; Hamel and Cope (in review)), except in cases where getting a reasonable estimate of the maximum age is problematic.
The age based prior is simply: \[\text{lognormal}(\ln(5.4/\text{max age}), 0.31).\]
The fixed value for $M$ if not estimating is even more simply the median of that prior: \[M = 5.4/\text{max age}\]
Can explore a range approaches for M estimates and priors here; reference is Cope and Hamel, in press.

2.3 Length-weight relationships

Use the nwfscSurvey::estimate_weight_length(). It is common to rely only on data from the WCGBT Survey to get the parameters used in the model under the assumption that this survey is most representative of the population as a whole. These parameters should always be fixed in the model as no data type is available in SS3 to accurately estimate them internally. Other data sources (e.g., PacFIN if weights are available, etc.) can be passed to this function to estimate weight-length relationships. The col_length and col_weight arguments should be used to specify the specific string used to identify these quantities in your data.

2.4 Estimating ageing error

See the {AgeingError} package.

2.5 Maturity

Talk to Melissa Head about maturity. In 2017, she provided estimated maturity curves based on samples from the NWFSC survey for many species.

Data from the NWFSC survey on maturity includes a column indicating mature or immature and another indicating spawning and not spawning. The latter considers all “mature” fish with over 25% atresia as not spawning (along with all immature fish). The spawning/not spawning column is the one we commonly use to estimate the maturity curve since that is really what we care about. In some cases a simple logistic will fit, but if there is much skip spawning/atresia for older/larger females, a logistic type curve which asymptotes to a lower value or a non-parametric fit is more appropriate. A column with percent atresia is also provided if you with to use a percentage other than 25% for the cutoff. Finally, the mature/immature column can be used instead if the atresia/skip spawning is taken into account in specifying the fecundity relationship.

An additional column has been added to the NWFSC survey table indicating uncertainty in the designation. This can be used to weight or exclude data.

Note: John Field has expressed concern that we are too focused on recent samples from the NWFSC survey, so if you aren’t going to include samples from past collections, think about a justification for that choice.

2.6 Fecundity

Dick et al. (2017) has estimates for the $a$ and $b$ parameters in the functional form $F = aL^b$ for many rockfish. The estimates are based on length in mm and predicted number of eggs, fitted in log-space. If you use length in cm (like in SS3), and don’t want huge spawning biomass values (eggs), you can convert the values in the paper to units of cm and millions, billions, or trillions of eggs. First, find the values “$\exp(a)$” (referred to as $a'$ below) and “$b$” from Table 6 in Dick et al. for your species (or subgenus, if no estimate for your species is reported). If your subgenus is not reported, you can use the “Sebastes” values.

The correct value of the “$a$” parameter in Stock Synthesis (using fecundity at length option 2: eggs=a*L^b) with length in cm and spawning output in millions of eggs is:

\[a = \frac{a'\cdot10^b}{1000}\]

The division is by 1 thousand instead of 1 million because recruitment in SS3 is in thousands. The value of $b$ is unchanged, and can be used directly in the assessment.

Other options include spawning output in billions of eggs via:

\[a = \frac{a'\cdot10^b}{10^6}\]

or spawning output in trillions of eggs (e.g. 2017 Yellowtail Rockfish):

\[a = \frac{a'\cdot10^b}{10^9}.\]

The {r4ss} figures in the “Bio” tab show the resulting fecundity relationship as a function of length, weight, and age so you can use that to determine whether your parameter values produce the correct final relationship by comparing to original studies on the relationship between fecundity and length.

2.7 Modeling selectivity

There was a CAPAM workshop on selectivity in 2013. Report, presentations, and recordings are available at here and the associated Fisheries Research special issue is here.

2.7.1 Conceptual reasoning for using different approaches to selectivity

The following set of circumstances might cause selectivity to be dome-shaped:

Contact selectivity causing older fish to outswim the trawl, or escape the gillnet/hooks
Incomplete spatial coverage in terms of depth or untrawlable habitat
Spatial heterogeneity in fishing intensity (see Sampson paper). This probably applies more to fishery selectivity than surveys.

Reasons for justifying asymptotic selectivity

It can help estimate $L_\infty$ and variability in growth because the mode of the length comps is often representative of where the oldest fish are piling up.
It prevents the estimation of a large amounts of cryptic biomass

2.7.2 General advice on selectivity in SS3

Start with a functional form that is commonly used (double normal)
Find some initial values using either the Excel spreadsheets or the {r4ss} widgets
Put the initial values in the model and run without estimating anything until you get a model that runs. This can be done by setting the maximum phase in the starter file to 0 or (better), by using the command line inputs: -maxfn 0 -nohess.
Read the output into R using SS_output and plot the selectivity in {r4ss} using either SS_plots(model, plot=2) or SSplotSelex(model, subplot=1), where model is the object created by SS_output().
Set the PHASE for as many parameters to negative values as possible so that you start with estimating relatively few parameters (such as parameters 1 and 3 of the double normal, which control the peak and ascending slope).

2.7.3 Guidelines for SS3 double normal initial setup

Fix parameters 5 and 6 (negative phase).
- If selectivity is thought to be zero at the youngest/smallest or the oldest/biggest fish set the value to zero (e.g -15)
- If selectivity is thought to be larger than zero at the youngest/smallest or the oldest/biggest fish set the value to -999 (does not scale the selectivity for the youngest or oldest age, independently from the normal curve).
Fix the plateau (parameter 2) to be small values (e.g. -15).
Set the initial value for the peak (parameter 1) at the age/length equal to the mode of the composition data
Set the ascending (parameter 3) and descending (parameter 4) slopes at $log(8 \cdot (a_{peak}-a_{min}))$ and $log(8 \cdot (a_{max}-a_{peak}))$ (substitute min and max lengths and length at peak when modeling length-base selectivity).
Don’t estimate selectivity at youngest age/size (parameter 5) unless there are observations of fish in the smallest age- or length-bins, either fix at -5 or -999
Use the double normal instead of the logistic for asymptotic selectivity to have flexibility of dome shape without major changes to control file. This also provides control over selectivity at the youngest age. To force a logistic shape, you can do one of the following (where parameters 2, 4, and 6 should not be estimated under any of the options):
- Fix descending slope (parameter 4) at a large number (e.g. 15)
- Alternative 1: Fix plateau (parameter 2) to a large number (e.g. 15)
- Alternative 2: Fix selectivity of the oldest age (parameter 6) at a large number (e.g. 15).

2.8 Modeling recruitment deviations

Choices to be made:
1. allow recruitment deviations or not?
2. range of years?
3. breaking into “early”, “main”, “late” vectors?
  1. early and late vectors are intended to add uncertainty to the model for years with little or no data with information about recruitment
What was done in the 2011 assessments? Graphical description is here.
Guidance on bias adjustment settings
1. Multiple simulation analyses have shown that applying the bias adjustment settings given by the r4ss::SS_fitbiasramp() perform well on average. However, there’s no guarantee that this will work well in any given circumstance. User discretion is advised.
What to do about $\sigma_R$.
1. Simulation in Methot and Taylor (2011) looked at a few options.
  1. Estimating $\sigma_R$ . Performed well under ideal circumstances.
  2. Tune $\sigma_R$ to match the observed variability in recruitment deviations. \[ {\sigma_R}^2 = \text{sd}(r')^2 \] Performed less well.
  3. Tune $\sigma_R$ so that \[{\sigma_R}^2 = \text{sd}(r')^2 + \text{mean}(\text{SE}(r'))^2\] where $\text{sd}(r')$ is the standard deviation of the vector of estimated recruitments over a range of years that seem reasonably well informed by the data, and $\text{mean}(\text{SE}(r'))$ is the mean of the estimated variability of those values. Performed best.
MCMC
1. If you can get MCMC to work for your model, all the worry about bias adjustment goes away. You would need to either estimate $\sigma_R$ in the MCMC and hope it converges, or fix it at a value that has been tuned to the MLE.
Autocorrelated recruitment deviations
1. Estimate # SR_autocorr (Stock-Recruit parameter in control file)
2. Here are results from hake in 2019 from the ss_new file: -1 1 -0.155106 0 99 0 6 0 0 0 0 0 0 0 # SR_autocorr
3. Johnson et al. (2016) estimated autocorrelation in a simulation context

2.9 Rockfish steepness profile

Up until the 2019 assessment cycle, a meta-analysis approach was used to develop a prior for steepness for West Coast rockfish species. This method was originally developed to develop this prior by Martin Dorn in 2007, and updated by Dorn in 2009/2011. It was then revised and recoded by James Thorson, who updated it in 2013/2015/2017, with Chantel Wetzel conducting the update in 2019. When the meta-analysis was updated post the 2017 assessment cycle, the estimated mean from the prior distribution was considered unreasonably high, particularly for rockfish species, and the new prior was not approved by the SSC for use in future West Coast rockfish assessments. In the instance that the SSC does not approve a new estimate (or methodology), the approved approach reverts to the previous approved estimate. In this instance that is the prior estimated for use in the 2017 assessment cycle of a mean = 0.72 with a standard deviation of 0.158.

Below is detailed information regarding the meta-analysis approach applied between 2007 - 2017:

The method is fully described in Thorson, Dorn, and Hamel (2018) Fisheries Research
The method can be replicated for any data set 2007/2009/2011/2013/2015/2017 using R package {SebastesSteepness}
The rationale for excluding a species when developing a prior for a given focal species (termed “Type-C” meta-analysis) is explained in this paper
- If you are assessing any of the following species you will need to obtain a Type-C prior: aurora, black, bocaccio, canary, chilipepper, darkblotched, gopher, splitnose, widow, yelloweye, yellowtail north
The estimated prior by year was as follows:
- 2007: mean = 0.58, sd = 0.181
- 2009: mean = 0.69, sd = 0.218
- 2011: mean = 0.76, sd = 0.170
- 2013: mean = 0.78, sd = 0.152
- 2015: mean = 0.77, sd = 0.147
- 2017: mean = 0.72, sd = 0.158

2.10 Projections (a.k.a. Forecasts)

The time-varying Pstar for your forecast.ss file can be calculated using PEPtools::get_buffer(),see PEPtools github repo.

The default sigma input to that function is 0.5 for Category 1 stocks and 1.0 for Category 2 but if uncertainty in the model (one of model$Pstar_sigma or model$OFL_sigma as output from r4ss::SS_output()–check with SSC folks on which) is larger, use that instead. The pstar is set by the Council and the adopted value can be found online table GMT015. It is recommended to get confirmation from the GMT representative as well.

Make sure you are using the right control rule for the PFMC in your forecast file: 3 # Control rule method (0: none; 1: ramp does catch=f(SSB), buffer on F; 2: ramp does F=f(SSB), buffer on F; 3: ramp does catch=f(SSB), buffer on catch; 4: ramp does F=f(SSB), buffer on catch)

Fixed catches for first 2 years of forecast (the current year and next) should be provided by GMT rep to STAR Panel. The values are input at the bottome of the SS3 forecast file and should be split by fleet in the model. Adopted OFL/ACL values are in the GMT016 table available from the PacFIN reports dashboard.

Creating the forecast table of OFLs and ABCs (ACLs if the relative stock size is below the management target) for your assessment document you can use r4ss::table_all() for all tables created by r4ss, or r4ss::table_exec_summary() for the exec summary tables including one called “projections”.

For projections with the high and low states of nature used to populate the decision table, you will need to fix the catches in the forecast file equal to those estimated by the harvest control rule in the base model. This can be done with the help of the r4ss::SS_ForeCatch() function which will gather the forecast catches by fleet from the timeseries output of the base model and format them as required to go into the forecast file for the low and high states of nature. For those models, you should set the buffer to 1.0 rather than the values calculated by PEPtools::get_buffer().