PullBDS.PacFIN.Rd
Pull biological data from PacFIN (PACFIN_MARTS.comprehensive_bds_comm).
PullBDS.PacFIN(
pacfin_species_code,
username = getUserName("PacFIN"),
password = ask_password(),
savedir = getwd(),
verbose = TRUE
)
A vector of strings specifying the PacFIN species code(s) you are
interested in. This has sometimes been referred to as "SPID" in legacy sql
scripts. An example for sablefish would be pacfin_species_code = "SABL".
Lists of species codes in hierarchical order, by organization, and
alphabetically organized can be found on the PacFIN website. Often you
will want to include nominal species categories, where nominal (i.e.,
existing in name only) means information for a given species that is
"derived" from non-species-specific information, e.g., species complexes
that are split out by species compositions, like "nominal aurora rockfish",
which would be ARR1. For some functions, these nominal categories can
automatically be added; see the argument addnominal.
Most often, this is a string containing your username for the
database of interest. You can use getUserName() if you prefer to not
enter this argument and assume the default search and/or rules for finding
your username will work. This is the default behavior if you leave
username as a missing argument, i.e.,
username <- getUserName(database = database). Sometimes this search will
fail because of legacy rules, which are unknown to the development team,
that were used to create your username. Please email the maintainer of this
package if you need more functionality here.
Most often, this is a string containing your password for
the database of interest. You can use the function ask_password() if you
would prefer to be prompted for your password. Please do not share this
password with anyone or push code to a repository that has your password
saved in it.
A file path to the directory where the results will be saved. The default is the current working directory. The path can be relative or absolute.
A logical specifying if output should be written to the
screen or not. Good for testing and exploring your data but can be turned
off when output indicates information that you already know. The printing
of output to the screen does not affect any of the returned objects. The
default is to always print to the screen, i.e., verbose = TRUE.
An RData file is saved to the disk and the pulled data are returned as an
invisible() data frame. The saved data can be read back in using load(),
but note that upon loading, the object will be named bds.pacfin, which is
its name inside of the .RData file, and thus, the object will retain this
name within your work space unless you rename it. The data are in their raw
form, i.e., just as they were extracted from PacFIN, and will need to be
cleaned using cleanPacFIN() prior to their use in downstream functions.
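The naming behavior of load() described above can be sketched with a toy data frame; no PacFIN connection is involved. The file name toy_bds.RData, the environment holder, and the object my_bds are hypothetical and used only for illustration. This is one possible way to rename the object on load, not a required workflow.

```r
# Toy stand-in for the saved object; real pulls contain many more columns.
bds.pacfin <- data.frame(FISH_ID = 1:2, FISH_LENGTH = c(400, 450))

# Save to a temporary file (hypothetical file name) and remove the original.
rdata_file <- file.path(tempdir(), "toy_bds.RData")
save(bds.pacfin, file = rdata_file)
rm(bds.pacfin)

# load() returns the names of the restored objects; loading into a separate
# environment keeps the global work space clean and makes renaming explicit.
holder <- new.env()
loaded_names <- load(rdata_file, envir = holder)
my_bds <- get(loaded_names[1], envir = holder)
```

Loading into a dedicated environment is a design choice that avoids silently overwriting an existing bds.pacfin object when several species are pulled in one session.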
Upon downloading, the data are changed from a long table to a wide table
using the combination of unique FISH_ID and AGE_SEQUENCE_NUMBER. This
change from long to wide allows for rows equating to a single fish with
columns containing information about all measurements for that fish. Multiple
age reads and information about those reads such as age reader will be in the
columns. The age read number, e.g., 1, 2, 3, 4, ..., is pasted onto the
column name separated by an underscore. So, the maximum number you see is the
maximum number of times an otolith was read in your data set. Not all double
reads are currently available within PacFIN and users should contact the
ageing labs if they wish to inform ageing-error matrices.
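As a rough illustration of the long-to-wide change, the sketch below reshapes a toy table with base R's reshape(). It is an assumption that this mirrors the result of the package's internal reshaping; the package may use different tooling, and the toy table carries only one age-related column (AGE_IN_YEARS) plus a made-up FISH_LENGTH.

```r
# Toy long table: fish 1 was read twice, fish 2 was read once.
long <- data.frame(
  FISH_ID = c(1, 1, 2),
  AGE_SEQUENCE_NUMBER = c(1, 2, 1),
  AGE_IN_YEARS = c(5, 6, 3),
  FISH_LENGTH = c(400, 400, 320)
)

# One row per fish; the read number is pasted onto the column name
# with an underscore, e.g., AGE_IN_YEARS_1 and AGE_IN_YEARS_2.
wide <- reshape(
  long,
  idvar = "FISH_ID",
  timevar = "AGE_SEQUENCE_NUMBER",
  v.names = "AGE_IN_YEARS",
  direction = "wide",
  sep = "_"
)
```

In the toy result, fish 2 has a missing value in AGE_IN_YEARS_2 because it was only read once.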
AGE_COUNT is a somewhat cryptic column name and does not always make sense
when compared to AGE_SEQUENCE_NUMBER. It was determined that the former is
useful to identify how many potential agers were exposed to this fish.
For example, if AGE_SEQUENCE_NUMBER has a maximum value of three for a
given FISH_ID, then you can expect AGE_COUNT to be three for all three
rows in the PacFIN database for that fish. This is not always true though.
Sometimes, not all AGE_SEQUENCE_NUMBERs are present and they can skip
numbers for a given FISH_ID, and in this case, AGE_COUNT will be the
maximum AGE_SEQUENCE_NUMBER for a given FISH_ID.
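The relationship between AGE_COUNT and AGE_SEQUENCE_NUMBER can be checked on a long-format table with a few lines of base R. The sketch below uses made-up values and assumes a table with one row per age read, i.e., the layout in the database rather than the wide table returned by this function.

```r
# Toy long table: fish 10 has reads 1, 2, 3; fish 20 skips read 2.
bds_long <- data.frame(
  FISH_ID = c(10, 10, 10, 20, 20),
  AGE_SEQUENCE_NUMBER = c(1, 2, 3, 1, 3),
  AGE_COUNT = c(3, 3, 3, 3, 3)
)

# Maximum sequence number and number of rows present for each fish.
max_seq <- tapply(bds_long$AGE_SEQUENCE_NUMBER, bds_long$FISH_ID, max)
n_rows <- tapply(bds_long$AGE_SEQUENCE_NUMBER, bds_long$FISH_ID, length)

# Fish with fewer rows than their maximum sequence number skipped a read.
skipped <- names(max_seq)[n_rows < max_seq]
```

Here skipped holds the identifier of fish 20, whose AGE_COUNT still equals its maximum AGE_SEQUENCE_NUMBER even though only two rows are present.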
FINAL_FISH_AGE_IN_YEARS is known as the best age for a given fish.
This will not always match an age reader or be a number determinable
from the individual age reads in AGE_IN_YEARS. Patrick explained to me
that when age reads do not agree, particularly for younger fish, then
the senior reader will work together with the junior reader to determine
an agreed-upon age. Other times, the senior reader's value will always
be used, or it could be that together they determine that they were both
wrong and a new age is proposed as the resolved age. Nevertheless,
it can be quite messy and there is no way to predict the best age.
FISH_WEIGHT_GUTTED is typically only available for a small subset of
samples that were sampled "purposively" by Washington state. E.g., if a
fish is weighed whole and then headed and gutted and weighed again, then
there would be two rows with the same FISH_ID but different FISH_WEIGHT
entries in the PacFIN BDS table. The downloaded data are reshaped such that
this second gutted weight is placed in FISH_WEIGHT_GUTTED and the fish is
represented in a single row. Granted, these purposive samples should not be
used in an assessment of the population status but they are included in the
download for completeness.
Values passed to PACFIN_SPECIES_CODE are searched for using regular
expression matching, which is different from the exact matching that is done
in PullCatch.PacFIN(). The use of pattern matching allows for species codes
with mistakes like leading and trailing spaces to be found. This is doable in
the biological data because data for nominal species codes are few. In my
experience, these mistakes in the species codes are more common for PacFIN
species codes that are three letters rather than the standard four letters.
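The difference between exact and pattern matching can be sketched in plain R. The padded codes below are made up for illustration, and grepl() is only a stand-in: how this function builds its database query is not shown here.

```r
# Toy species codes; the second and third carry stray spaces.
codes <- c("SABL", " SABL", "SABL ", "POP")

# Exact matching misses the padded entries.
exact <- codes %in% "SABL"

# Pattern matching finds the code anywhere in the string.
pattern <- grepl("SABL", codes)
```

With these toy values, exact matching keeps only the first entry, whereas pattern matching keeps the first three. The trade-off is that a pattern can also match longer codes that merely contain the requested string, so the pulled species codes are worth inspecting.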
cleanColumns() to change to legacy column names
cleanPacFIN() to subset the data frame to those records that should be
used within West Coast assessments of marine populations
if (FALSE) { # \dontrun{
# You will be asked for your password
pd <- PullBDS.PacFIN(pacfin_species_code = "POP")
} # }