cleanPacFIN.Rd
Clean raw PacFIN data to remove unsuitable samples if CLEAN = TRUE and
convert units of measured quantities to work with downstream functions.
Raw data are meant to be inclusive of everything from PacFIN so users can
explore all that is available, but this means that raw data will ALWAYS
include information that is not appropriate for use in
US West Coast stock assessments.
cleanPacFIN(
Pdata,
keep_INPFC = lifecycle::deprecated(),
keep_gears,
keep_sample_type = c("M"),
keep_sample_method = "R",
keep_length_type,
keep_age_method = NULL,
keep_missing_lengths = lifecycle::deprecated(),
keep_states = c("WA", "OR", "CA"),
CLEAN = TRUE,
spp = NULL,
verbose = TRUE,
savedir = NULL
)
A data frame returned from PullBDS.PacFIN() containing
biological samples. These data are stored in the Pacific Fisheries
Information Network (PacFIN) data warehouse, which originated in 2014, and
are pulled using SQL calls.
Deprecated. Areas are now defined using different methods.
A character vector including the gear types you want
to label as unique fleets. Order matters and will define fleet numbering.
If the argument is missing, which is the default, then all found gear
groups are maintained and ordered alphabetically. For more details see
getGearGroup(), which lists a link where you can find the available gear
groupings and how they link to "GRID" within your data. The vector
supplied to this argument should consist of only options available in
unique(GearTable[["GROUP"]]).
GRID is a legacy term from PacFIN, now identified as PACFIN_GEAR_CODE
in the biological and fish ticket data, where GR is short for gear and ID
is short for identification. Typical entries will include character values
such as HKL, POT, TWL, and TWS, where TWL is short for all non-shrimp
trawls and TWS is shrimp trawls. Other gear identification codes and
their definitions include DRG, which is dredge gear; MSC, which is all
other miscellaneous gear such as diving or river trawls; NET, which is all
non-trawl net gear; NTW, which is non-trawl gear; and TLS, which is
trolling gear. As a special case, MID is available for spiny dogfish to
extract mid-water trawl data as a separate fleet.
A vector of character values specifying the types of
samples you want to keep. The default is to keep c("M"). Available types
include market (M), research (R), special request (S), and commercial
on-board (C). There are additional samples without a SAMPLE_TYPE, but
they are only kept if you include NA in your call. All sample types from
California are assigned to M. Including commercial on-board samples is
not recommended because they might also be in WCGOP data and would lead to
double counting.
A vector of character values specifying the types
of sampling methods you want to keep. The default is to keep "R",
which refers to samples that were sampled randomly. Available types include
random (R), stratified (S), systematic (N), purposive (P), and special (X).
As of February 17, 2021, Washington is the only state with a sample type of
"", and it was limited to two special samples of yelloweye rockfish.
A vector of character values specifying the types of
length samples to keep. There is no default value, though users will
typically want to keep c("", "F", "A"), but should also think about using
c("", "F", "A", NA). Note that types other than those listed below can be
present, especially if you are dealing with a skate.
A is alternate length,
D is dorsal length,
F is fork length,
S is standard length, and
T is total length.
A vector of ageing methods to retain in the data. All
fish aged with methods other than those listed will no longer be considered
aged. A value of NULL, the default, will keep all ageing methods.
However, a vector of c("B", "BB", "S", "", NA, 1, 2) will keep all unaged
fish and those that were aged with break and burn and surface reads. You do
not need to include such a verbose vector of values, though, because
numbers are converted to appropriate character codes in getAge.
Therefore, something like c("B", "S") would be sufficient to keep all
break and burn and surface reads.
Deprecated. If you want to remove records with missing lengths, subset
the data using is.na(Pdata[, 'length']) after running cleanPacFIN, though
there is no need because the package accommodates keeping them in.
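The subsetting suggested for the deprecated argument can be sketched as follows. This is a minimal illustration on a toy data frame standing in for the output of cleanPacFIN; the SAMPLE_NO column and all values are hypothetical, and only the length column name comes from the text above.

```r
# Toy stand-in for data returned by cleanPacFIN().
# ASSUMPTION: SAMPLE_NO and the values are made up for illustration.
cleaned <- data.frame(
  SAMPLE_NO = c("a", "b", "c", "d"),
  length = c(350, NA, 412, NA)
)

# Keep only the rows with a recorded length.
with_lengths <- cleaned[!is.na(cleaned[, "length"]), ]
print(with_lengths)
```

Dropping these rows is optional because, as noted above, the package accommodates records with missing lengths.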
A vector of states that you want to keep, where each state
is defined using a two-letter abbreviation, e.g., WA. The default is to
keep data from all three states, keep_states = c("WA", "OR", "CA"). Add
'UNK' to the vector if you want to keep data not assigned to a state.
A logical value used when you want to remove data from the input
data set. The default is TRUE. Setting CLEAN = FALSE returns the original
data with additional columns and reports on what would have been removed.
A character string giving the species name to ensure that the
methods are species specific. Leave NULL if generic methods work for
your species. Currently, sablefish is the only species with
species-specific code.
A logical specifying if output should be written to the
screen or not. Good for testing and exploring your data but can be turned
off when output indicates information that you already know. The printing
of output to the screen does not affect any of the returned objects. The
default is to always print to the screen, i.e., verbose = TRUE.
A file path to the directory where the results will be saved. The default is NULL.
The input data filtered for desired areas and record types specified, with added columns
year: initialized from SAMPLE_YEAR
fleet: initialized to 1
fishery: initialized to 1
season: initialized to 1. Change using getSeason
state: initialized from SOURCE_AGID. Change using getState
length: length in mm, where NA indicates length is not available
lengthcm: floored cm from FORK_LENGTH when available, otherwise FISH_LENGTH
geargroup: the gear group associated with each GRID
weightkg: fish weight in kg from FISH_WEIGHT and FISH_WEIGHT_UNITS
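The lengthcm rule in the list above can be sketched as follows. This is only an illustration of the described fallback and flooring, not the package's actual implementation; it assumes the raw FORK_LENGTH and FISH_LENGTH values are recorded in mm, and the toy values are made up.

```r
# Toy raw data. ASSUMPTION: both length columns are in mm.
raw <- data.frame(
  FORK_LENGTH = c(355, NA, 419, NA),
  FISH_LENGTH = c(360, 298, NA, NA)
)

# Prefer FORK_LENGTH when available, otherwise fall back to FISH_LENGTH.
length_mm <- ifelse(is.na(raw$FORK_LENGTH), raw$FISH_LENGTH, raw$FORK_LENGTH)

# Convert mm to cm and floor, so 355 mm becomes 35 cm.
lengthcm <- floor(length_mm / 10)
print(lengthcm)
```

Records with neither length available stay NA after the conversion.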
The original fields in the returned data are left untouched, with the exception of
SEX: modified using nwfscSurvey::codify_sex() and upon return will
only include character values such that fish with an unidentified sex are
now "U".
Age: the best ages to use going forward rather than just the first age read.
The data are put through various tests before they are returned
and the results of these tests are stored in the CLEAN column.
Thus, sometimes it is informative to run cleanPacFIN(CLEAN = FALSE)
and use frequency tables to inspect which groups of data will be removed
from the data set when you change the code to be CLEAN = TRUE.
For example, many early length compositions do not have information on
the weight of fish that were sampled, and thus, there is no way to infer
how much the entire sample weighed or how much the tow/trip weighed.
Therefore, these data cannot be expanded and are removed using
CLEAN = TRUE. Some stock assessment authors, and even previous
versions of this very code, attempted to use adjacent years to inform
weights. Doing so required many assumptions, and state
representatives discouraged inferring data that did not exist.
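The inspection with frequency tables described above can be sketched as follows. The data frame is a toy stand-in for the output of cleanPacFIN(Pdata, CLEAN = FALSE), and it assumes that the CLEAN column is logical with FALSE marking records that would be removed; check the column in your own output before relying on this.

```r
# Toy stand-in for the output of cleanPacFIN(Pdata, CLEAN = FALSE).
# ASSUMPTION: CLEAN is logical and FALSE flags a record for removal.
checked <- data.frame(
  year = c(1985, 1985, 1985, 2010, 2010, 2010),
  state = c("CA", "CA", "OR", "CA", "OR", "WA"),
  CLEAN = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Cross-tabulate flagged versus retained records by year and by state.
flagged_by_year <- table(year = checked$year, CLEAN = checked$CLEAN)
flagged_by_state <- table(state = checked$state, CLEAN = checked$CLEAN)
print(flagged_by_year)
print(flagged_by_state)
```

Tables like these make it easier to see whether removals are concentrated in particular years or states before committing to CLEAN = TRUE.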
The values created as new columns are for use by other functions in this package.
In particular, fishyr and season are useful if there are multiple
seasons (e.g., winter and summer, as in the petrale sole assessment), and the
year is adjusted so that "winter" occurs in one year, rather than across two.
The fleet, fishery, and state columns are meant for use in
stratifying the data according to the particulars of an assessment.