Run step-wise model selection to facilitate the exploration of several modelling configurations using Akaike information criterion (AIC).
Usage
StepwiseFn(
SearchMat,
Data,
NDataSets,
KnotAges,
MinAge,
MaxAge,
RefAge,
MaxSd,
MaxExpectedAge,
SaveFile,
EffSampleSize = 0,
Intern = TRUE,
InformationCriterion = c("AIC", "AICc", "BIC"),
SelectAges = TRUE
)
Arguments
- SearchMat
A matrix explaining stepwise model selection options. One row for each readers error and one row for each readers bias + 2 rows, one for
MinusAge
, i.e., the age where the proportion at age begins to decrease exponentially with decreasing age, and one forPlusAge
, i.e., the age where the proportion-at-age begins to decrease exponentially with increasing age.Each element of a given row is a possible value to search across for that reader. So, the number of columns of
SearchMat
will be the maximum number of options that you want to include. Think of it as several vectors stacked row-wise where shorter rows are filled in withNA
values. If reader two only has two options that the analyst wants to search over the remainder of the columns should be filled withNA
values for that row.- Data
This is the data set with the first column being an integer providing the number of otoliths that are included in the row and the subsequent columns are the reader or lab estimated ag,e where each reader/lab has a unique reading error and bias. The modeling framework allows for, at most, 15 readers, i.e., 16 columns. There should not be any identical rows in the data frame because otoliths that have the exact same read from every reader/lab should be combined into a single row with the count as the first column. If you failed to combine identical rows prior to running the model, you will be alerted with an error and the
XXX.rep
file will have a properly formatted data which can be' cut-pasted into aXXX.dat
file for use. Missing reads from a given reader/lab should be entered as-999
. Order your reader/lab columns such that similar readers/labs are located next to one another because columns to the right can mirror columns to their immediate left in terms of parameter estimates.- NDataSets
This is generally
1
and other values are not implemented.- KnotAges
Ages associated with each knot. This is a necessary input for
SigOpt = 5
orSigOpt = 6
.- MinAge
An integer, specifying the minimum possible "true" age.
- MaxAge
An integer, specifying the maximum possible "true" age.
- RefAge
An arbitrarily chosen age from which "true" age-composition fixed-effects are calculated as an offset. This has no effect on the answer but could potentially effect estimation speed.
- MaxSd
An upper bound on possible values for the standard deviation of reading error.
- MaxExpectedAge
Set to MaxAge.
- SaveFile
Directory where
agemat.exe
is located and where all ADMB intermediate and output files should be located. IfAdmbFile
is specified thenagemat.exe
is copied from that directory toSaveFile
.- EffSampleSize
Indicating whether effective sample size should be calculated. Missing values in the data matrix will cause this to be ineffective, in which case this should be set to
0
.- Intern
A logical input that controls the amount of output displayed, where
TRUE
indicates that ADMB output should be displayed in R andFALSE
leads to the suppression of this information.- InformationCriterion
A string specifying the type of information criterion that should be used to choose the best model. The default is to use AIC, though AIC corrected for small sample sizes and BIC are also available.
- SelectAges
A logical input specifying if the boundaries should be based on
MinusAge
andPlusAge
. The default isTRUE
.
Details
AIC seems like an appropriate method to select among possible
values for PlusAge
, i.e., the last row of SearchMat
, because PlusAge
determines the number of estimated fixed-effect hyperparameters that are
used to define the true proportion-at-age hyperdistribution. This
hyperdistribution is in turn used as a prior when integrating across a true
age associated with each otolith. This true age, which is a latent effect,
can be interpreted as a random effect with one for each observation. So, the
use of AIC to select among parameterizations of the fixed effects defining
this hyperdistribution is customary (Pinheiro and Bates, 2009). This was
tested for sablefish, where AIC lead to a true proportion at age that was
biologically plausible.
References
Punt, A.E., Smith, D.C., KrusicGolub, K., and Robertson, S. 2008. Quantifying age-reading error for use in fisheries stock assessments, with application to species in Australia's southern and eastern scalefish and shark fishery. Can. J. Fish. Aquat. Sci. 65: 1991-2005.
Pinheiro, J.C., and Bates, D. 2009. Mixed-Effects Models in S and S-PLUS. Springer, Germany.
See also
RunFn()
will run a single model, where this function runs multiple models.PlotOutputFn()
will help summarize the output fromRunFn()
.
Examples
example(RunFn)
#>
#> RunFn> example(SimulatorFn)
#>
#> SmltrF> # Parameters for generating data
#> SmltrF> # This represents 2 unique readers
#> SmltrF> # Row 1 -- Otoliths read only once by reader
#> SmltrF> # Row 2 -- Otoliths read twice by reader 1
#> SmltrF> # Row 2 -- Otoliths read only once by reader 2
#> SmltrF> # Row 4 -- Otoliths read twice by reader 2
#> SmltrF> # Row 5 -- Otoliths read once by reader 1 and once by reader 2
#> SmltrF> ReadsMat <- structure(matrix(
#> SmltrF+ nrow = 5, ncol = 5,
#> SmltrF+ c(
#> SmltrF+ rep(25, 5),
#> SmltrF+ 1, 1, 0, 0, 1,
#> SmltrF+ 0, 1, 0, 0, 0,
#> SmltrF+ 0, 0, 1, 1, 1,
#> SmltrF+ 0, 0, 0, 1, 0
#> SmltrF+ )
#> SmltrF+ ), dimnames = list(
#> SmltrF+ c(
#> SmltrF+ "Reader1_Only", "Reader1_DoubleReads",
#> SmltrF+ "Reader2_Only", "Reader2_DoubleReads",
#> SmltrF+ "Reader1_&_Reader2"
#> SmltrF+ ),
#> SmltrF+ c(
#> SmltrF+ "NumberOfReads",
#> SmltrF+ "Reader1", "Reader1_DoubleReads",
#> SmltrF+ "Reader2", "Reader2_DoubleReads"
#> SmltrF+ )
#> SmltrF+ ))
#>
#> SmltrF> # Generate data
#> SmltrF> set.seed(2)
#>
#> SmltrF> AgeReads <- SimulatorFn(
#> SmltrF+ Nreaders = 4, M = 0.2,
#> SmltrF+ SelexForm = "Logistic",
#> SmltrF+ SelexParams = c(5, 0.2), BiasParams = c(1, 1, 1.1, 1.1),
#> SmltrF+ ErrorParams = c(0.2, 0.2, 0.2, 0.2), ReadsMat = ReadsMat,
#> SmltrF+ RecCv = 0.6, RecAr1 = 0.8, Amax = 100
#> SmltrF+ )
#>
#> RunFn> ## Not run:
#> RunFn> ##D utils::write.csv(AgeReads,
#> RunFn> ##D file = file.path(getwd(), "Simulated_data_example.csv")
#> RunFn> ##D )
#> RunFn> ## End(Not run)
#> RunFn>
#> RunFn> ##### Format data
#> RunFn> Nreaders <- ncol(AgeReads)
#>
#> RunFn> # Change NA to -999 (which the Punt software considers missing data)
#> RunFn> AgeReads <- ifelse(is.na(AgeReads), -999, AgeReads)
#>
#> RunFn> # Potentially eliminate rows that are only read once
#> RunFn> # These rows have no information about reading error, but are potentially
#> RunFn> # informative about latent age-structure. It is unknown whether eliminating
#> RunFn> # these rows degrades estimation of error and bias, and is currently
#> RunFn> # recommended to speed up computation
#> RunFn> if (FALSE) {
#> RunFn+ KeepRow <- ifelse(
#> RunFn+ rowSums(ifelse(AgeReads == -999, 0, 1), na.rm = TRUE) <= 1,
#> RunFn+ FALSE, TRUE
#> RunFn+ )
#> RunFn+ AgeReads <- AgeReads[KeepRow, ]
#> RunFn+ }
#>
#> RunFn> # AgeReads2 is the correctly formatted data object
#> RunFn> AgeReads2 <- rMx(c(1, AgeReads[1, ]))
#>
#> RunFn> # Combine duplicate rows
#> RunFn> for (RowI in 2:nrow(AgeReads)) {
#> RunFn+ DupRow <- NA
#> RunFn+ for (PreviousRowJ in 1:nrow(AgeReads2)) {
#> RunFn+ if (all(
#> RunFn+ AgeReads[RowI, 1:Nreaders] == AgeReads2[PreviousRowJ, 1:Nreaders + 1]
#> RunFn+ )) {
#> RunFn+ DupRow <- PreviousRowJ
#> RunFn+ }
#> RunFn+ }
#> RunFn+ if (is.na(DupRow)) { # Add new row to AgeReads2
#> RunFn+ AgeReads2 <- rbind(AgeReads2, c(1, AgeReads[RowI, ]))
#> RunFn+ }
#> RunFn+ if (!is.na(DupRow)) { # Increment number of samples for previous duplicate
#> RunFn+ AgeReads2[DupRow, 1] <- AgeReads2[DupRow, 1] + 1
#> RunFn+ }
#> RunFn+ }
#>
#> RunFn> ######## Determine settings for ADMB
#> RunFn> # Define minimum and maximum ages for integral across unobserved ages
#> RunFn> MinAge <- 1
#>
#> RunFn> MaxAge <- ceiling(max(AgeReads2[, -1]) / 10) * 10
#>
#> RunFn> BiasOpt <- c(0, -1, 0, -3)
#>
#> RunFn> SigOpt <- c(1, -1, 6, -3)
#>
#> RunFn> # Necessary for SigOpt option 5 or 6
#> RunFn> KnotAges <- list(NA, NA, c(1, 10, 20, MaxAge), NA)
#>
#> RunFn> ##### Run the model (MAY TAKE 5-10 MINUTES)
#> RunFn> ## Not run:
#> RunFn> ##D fileloc <- file.path(tempdir(), "age")
#> RunFn> ##D dir.create(fileloc, showWarnings = FALSE)
#> RunFn> ##D RunFn(
#> RunFn> ##D Data = AgeReads2, SigOpt = SigOpt, KnotAges = KnotAges,
#> RunFn> ##D BiasOpt = BiasOpt,
#> RunFn> ##D NDataSets = 1, MinAge = MinAge, MaxAge = MaxAge, RefAge = 10,
#> RunFn> ##D MinusAge = 1, PlusAge = 30, SaveFile = fileloc,
#> RunFn> ##D AdmbFile = file.path(system.file("executables",
#> RunFn> ##D package = "nwfscAgeingError"
#> RunFn> ##D ), .Platform$file.sep),
#> RunFn> ##D EffSampleSize = 0, Intern = FALSE, JustWrite = FALSE, CallType = "shell"
#> RunFn> ##D )
#> RunFn> ## End(Not run)
#> RunFn>
#> RunFn>
#> RunFn>
if (FALSE) { # \dontrun{
##### Run the model (MAY TAKE 5-10 MINUTES)
fileloc <- file.path(tempdir(), "age")
dir.create(fileloc, showWarnings = FALSE)
RunFn(
Data = AgeReads2, SigOpt = SigOpt, KnotAges = KnotAges,
BiasOpt = BiasOpt,
NDataSets = 1, MinAge = MinAge, MaxAge = MaxAge, RefAge = 10,
MinusAge = 1, PlusAge = 30, SaveFile = fileloc,
AdmbFile = file.path(system.file("executables",
package = "nwfscAgeingError"
), .Platform$file.sep),
EffSampleSize = 0, Intern = FALSE, JustWrite = FALSE, CallType = "shell"
)
# Plot output
PlotOutputFn(
Data = AgeReads2, MaxAge = MaxAge,
SaveFile = fileloc, PlotType = "PDF"
)
} # }
##### Stepwise selection
# Parameters
MaxAge <- ceiling(max(AgeReads2) / 10) * 10
MinAge <- 1
##### Stepwise selection
StartMinusAge <- 1
StartPlusAge <- 30
# Define matrix explaining stepwise model selection options
# One row for each reader + 2 rows for
# PlusAge (age where the proportion-at-age begins to
# decrease exponentially with increasing age) and
# MinusAge (the age where the proportion-at-age begins to
# decrease exponentially with decreasing age)
# Each element of a given row is a possible value to search
# across for that reader
SearchMat <- array(NA,
dim = c(Nreaders * 2 + 2, 7),
dimnames = list(
c(
paste("Error_Reader", 1:Nreaders),
paste("Bias_Reader", 1:Nreaders), "MinusAge", "PlusAge"
),
paste("Option", 1:7)
)
)
# Readers 1 and 3 search across options 1-3 for ERROR
SearchMat[c(1, 3), 1:3] <- rep(1, 2) %o% c(1, 2, 3)
# Reader 2 mirrors reader 1
SearchMat[2, 1] <- -1
# Reader 4 mirrors reader 3
SearchMat[4, 1] <- -3
# Reader 1 has no BIAS
SearchMat[5, 1] <- 0
# Reader 2 mirrors reader 1
SearchMat[6, 1] <- -1
# Reader 3 search across options 0-2 for BIAS
SearchMat[7, 1:3] <- c(1, 2, 0)
# Reader 4 mirrors reader 3
SearchMat[8, 1] <- -3
# MinusAge searches with a search kernal of -10,-4,-1,+0,+1,+4,+10
SearchMat[9, 1:7] <- c(
StartMinusAge,
StartMinusAge - 10,
StartMinusAge - 4,
StartMinusAge - 1,
StartMinusAge + 1,
StartMinusAge + 4,
StartMinusAge + 10
)
SearchMat[9, 1:7] <- ifelse(SearchMat[9, 1:7] < MinAge,
NA, SearchMat[9, 1:7]
)
# PlusAge searches with a search kernal of -10,-4,-1,+0,+1,+4,+10
SearchMat[10, 1:7] <- c(
StartPlusAge,
StartPlusAge - 10,
StartPlusAge - 4,
StartPlusAge - 1,
StartPlusAge + 1,
StartPlusAge + 4,
StartPlusAge + 10
)
SearchMat[10, 1:7] <- ifelse(SearchMat[10, 1:7] > MaxAge,
NA, SearchMat[10, 1:7]
)
# Run model selection
# This outputs a series of files
# 1. "Stepwise - Model loop X.txt" --
# Shows the AIC/BIC/AICc value for all different combinations
# of parameters arising from changing one parameter at a time
# according to SearchMat during loop X
# 2. "Stepwise - Record.txt" --
# The Xth row of IcRecord shows the record of the
# Information Criterion for all trials in loop X,
# while the Xth row of StateRecord shows the current selected values
# for all parameters at the end of loop X
# 3. Standard plots for each loop
# WARNING: One run of this stepwise model building example can take
# 8+ hours, and should be run overnight
if (FALSE) { # \dontrun{
StepwiseFn(
SearchMat = SearchMat, Data = AgeReads2,
NDataSets = 1, MinAge = MinAge, MaxAge = MaxAge,
RefAge = 10, MaxSd = 40, MaxExpectedAge = MaxAge + 10,
SaveFile = fileloc, InformationCriterion = c("AIC", "AICc", "BIC")[3]
)
} # }