Pull biological data from PacFIN (PACFIN_MARTS.comprehensive_bds_comm).

PullBDS.PacFIN(
  pacfin_species_code,
  username = getUserName("PacFIN"),
  password = ask_password(),
  savedir = getwd(),
  verbose = TRUE
)

Arguments

pacfin_species_code

A vector of strings specifying the PacFIN species code(s) you are interested in. This has sometimes been referred to as "SPID" in legacy SQL scripts. An example for sablefish would be pacfin_species_code = "SABL". Lists of species codes in hierarchical order, by organization, and in alphabetical order can be found on the PacFIN website. Often you will want to include nominal species categories, where nominal (i.e., existing in name only) means information for a given species that is "derived" from non-species-specific information, e.g., species complexes that are split out by species compositions like "nominal aurora rockfish", which would be ARR1. For some functions, these nominal categories can be added automatically; see the argument addnominal.

username

Most often, this is a string containing your username for the database of interest. You can use getUserName() if you prefer to not enter this argument and assume the default search and/or rules for finding your username will work. This is the default behavior if you leave username as a missing argument, i.e., username <- getUserName(database = database). Sometimes this search will fail because of legacy rules, which are unknown to the development team, that were used to create your username. Please email the maintainer of this package if you need more functionality here.

password

Most often, this is a string containing your password for the database of interest. You can use the function ask_password() if you would prefer to be prompted for your password. Please do not share this password with anyone or push code to a repository that has your password saved in it.

savedir

A file path to the directory where the results will be saved. The default is the current working directory. The path can be relative or absolute.

verbose

A logical specifying if output should be written to the screen or not. Good for testing and exploring your data but can be turned off when output indicates information that you already know. The printing of output to the screen does not affect any of the returned objects. The default is to always print to the screen, i.e., verbose = TRUE.

Value

An RData file is saved to the disk and the pulled data are returned as an invisible() data frame. The saved data can be read back in using load(), but note that upon loading, the object will be named bds.pacfin, which is its name inside of the .RData file, and thus the object will retain this name within your workspace unless you rename it. The data are in their raw form, i.e., just as they were extracted from PacFIN, and will need to be cleaned using cleanPacFIN() prior to their use in downstream functions.
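The naming behavior of load() can be seen in the following minimal sketch. It does not contact PacFIN; a small made-up data frame stands in for the pulled data, and the file name is hypothetical rather than the name that PullBDS.PacFIN() actually writes.

```r
# Stand-in for the data frame returned by PullBDS.PacFIN(); values are made up
bds.pacfin <- data.frame(FISH_ID = c(101, 102), FISH_LENGTH = c(450, 520))

# Hypothetical file name; the real file name is chosen by PullBDS.PacFIN()
rdata_file <- file.path(tempdir(), "example_bds.RData")
save(bds.pacfin, file = rdata_file)
rm(bds.pacfin)

# load() restores the object under its saved name and returns that name
loaded_env <- new.env()
object_names <- load(rdata_file, envir = loaded_env)

# Assign the data to a name of your choosing
my_bds <- get(object_names[1], envir = loaded_env)
```

Loading into a separate environment is optional; it only keeps bds.pacfin from overwriting an object of the same name in your workspace.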

Details

Data structure

Upon downloading, the data are changed from a long table to a wide table using the combination of unique FISH_ID and AGE_SEQUENCE_NUMBER. This change from long to wide allows for rows equating to a single fish with columns containing information about all measurements for that fish. Multiple age reads and information about those reads such as age reader will be in the columns. The age read number, e.g., 1, 2, 3, 4, ..., is pasted onto the column name separated by an underscore. So, the maximum number you see is the maximum number of times an otolith was read in your data set. Not all double reads are currently available within PacFIN and users should contact the ageing labs if they wish to inform ageing-error matrices.
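As an illustration of the long-to-wide change, the sketch below uses base R's reshape() on a tiny made-up table. It is not the package's internal code; AGE_READER is a hypothetical column name chosen for the example, and the values are invented.

```r
# Long format: one row per age read; fish 1 was read twice, fish 2 once
long <- data.frame(
  FISH_ID = c(1, 1, 2),
  AGE_SEQUENCE_NUMBER = c(1, 2, 1),
  AGE_IN_YEARS = c(5, 6, 3),
  AGE_READER = c("A", "B", "A"), # hypothetical column name
  stringsAsFactors = FALSE
)

# Wide format: one row per fish, with the read number appended after "_"
wide <- reshape(
  long,
  idvar = "FISH_ID",
  timevar = "AGE_SEQUENCE_NUMBER",
  direction = "wide",
  sep = "_"
)

# The largest suffix is the maximum number of reads for any fish
age_columns <- grep("^AGE_IN_YEARS_", names(wide), value = TRUE)
max_reads <- max(as.integer(sub("^AGE_IN_YEARS_", "", age_columns)))
```

Fish that were read fewer times than max_reads have NA in the columns for the reads they are missing.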

AGE_COUNT is a somewhat cryptic column name and does not always make sense when compared to AGE_SEQUENCE_NUMBER. It was determined that the former is useful to identify how many potential agers were exposed to this fish. For example, if AGE_SEQUENCE_NUMBER has a maximum value of three for a given FISH_ID, then you can expect AGE_COUNT to be three for all three rows in the PacFIN database for that fish. This is not always true though. Sometimes, not all AGE_SEQUENCE_NUMBERs are present and they can skip numbers for a given FISH_ID, and in this case, AGE_COUNT will be the maximum AGE_SEQUENCE_NUMBER for a given FISH_ID.
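One way to find fish whose sequence numbers skip values is to compare the maximum AGE_SEQUENCE_NUMBER with the number of rows per FISH_ID. The following is a minimal sketch on made-up long-format data, not a function provided by this package.

```r
# Fish 1 has reads 1, 2, and 3; fish 2 has reads 1 and 3 (read 2 is absent)
reads <- data.frame(
  FISH_ID = c(1, 1, 1, 2, 2),
  AGE_SEQUENCE_NUMBER = c(1, 2, 3, 1, 3)
)

# Maximum sequence number and number of rows present for each fish
max_sequence <- tapply(reads$AGE_SEQUENCE_NUMBER, reads$FISH_ID, max)
n_rows <- tapply(reads$AGE_SEQUENCE_NUMBER, reads$FISH_ID, length)

# Fish for which at least one sequence number is missing
fish_with_gaps <- names(max_sequence)[max_sequence != n_rows]
```

Under the description above, AGE_COUNT would be expected to equal max_sequence rather than n_rows for the fish flagged here.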

FINAL_FISH_AGE_IN_YEARS is known as the best age for a given fish. This will not always match an age reader or be a number determinable from the individual age reads in AGE_IN_YEARS. Patrick explained to me that when age reads do not agree, particularly for younger fish, then the senior reader will work together with the junior reader to determine an agreed-upon age. Other times, the senior reader's value will always be used, or it could be that together they determine that they were both wrong and a new age is proposed as the resolved age. Nevertheless, it can be quite messy and there is no way to predict the best age.

FISH_WEIGHT_GUTTED is typically only available for a small subset of samples that were sampled "purposively" by Washington state. For example, if a fish is weighed whole and then headed and gutted and weighed again, there would be two rows with the same FISH_ID but different FISH_WEIGHT entries in the PacFIN BDS table. The downloaded data are reshaped such that this second, gutted weight is placed in FISH_WEIGHT_GUTTED and the fish is represented in a single row. Granted, these purposive samples should not be used in an assessment of the population status, but they are included in the download for completeness.

Searching for species

Values passed to pacfin_species_code are searched for using regular expression matching, which is different from the exact matching that is done in PullCatch.PacFIN(). The use of pattern matching allows species codes with mistakes, like leading and trailing spaces, to be found. This is feasible for the biological data because there are few data for nominal species codes. In my experience, these mistakes in the species codes are more common for PacFIN species codes that are three letters rather than the standard four letters.
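The difference between the two kinds of matching can be sketched with base R on a made-up vector of codes. This only illustrates the idea and is not the query that PullBDS.PacFIN() sends to the database; the code values, including "POP1", are invented for the example.

```r
# Made-up codes, two of which carry stray spaces
codes <- c("SABL", " POP", "POP ", "POP1", "PTRL")

# Exact matching misses the entries with leading or trailing spaces
exact_match <- codes %in% "POP"

# Pattern matching finds them, along with any longer code containing "POP"
pattern_match <- grepl("POP", codes)
```

Because pattern matching also returns longer codes that contain the search string, it is worth checking the unique species codes in the returned data.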

See also

  • cleanColumns() to change to legacy column names

  • cleanPacFIN() to subset the data frame to those records that should be used within West Coast assessments of marine populations

Author

John R. Wallace and Kelli F. Johnson

Examples

if (FALSE) { # \dontrun{
# You will be asked for your password
pd <- PullBDS.PacFIN(pacfin_species_code = "POP")
} # }