cleanPacFIN.Rd
Clean raw PacFIN data to remove unsuitable samples if CLEAN = TRUE and
convert units of measured quantities to work with downstream functions.
Raw data are meant to be inclusive of everything from PacFIN so users can
explore all that is available, but this means that raw data will ALWAYS
include information that is not appropriate for use in
US West Coast stock assessments.
cleanPacFIN(
Pdata,
keep_INPFC = lifecycle::deprecated(),
keep_gears,
keep_sample_type = c("M"),
keep_sample_method = "R",
keep_length_type,
keep_age_method = NULL,
keep_missing_lengths = lifecycle::deprecated(),
keep_states = c("WA", "OR", "CA"),
CLEAN = TRUE,
spp = NULL,
verbose = TRUE,
savedir
)
A data frame of biological samples originating from the
Pacific Fisheries Information Network (PacFIN) data warehouse,
which originated in 2014. Data are pulled using SQL calls; see
PullBDS.PacFIN().
Deprecated. Areas are now defined using different methods.
A character vector including only the gear types you want
to label as unique fleets. Order matters and will define fleet numbering.
If the argument is missing, which is the default, then all found gear groups
are maintained and ordered alphabetically. For more details, see
getGearGroup, which lists a web link to the
available gear groupings and how they link to "GRID"
within your data.
GRID is a legacy term from PacFIN, now identified as
PACFIN_GEAR_CODE in the biological and fish ticket data, where
GR is short for gear and
ID is short for identification.
Typical entries will include character values such as HKL, POT, and TWL,
where the latter is short for all non-shrimp trawls, and
TWS, which is shrimp trawls.
Other gear identification codes and their definitions include
DRG, which is dredge gear;
MSC, which is all other miscellaneous gear such as diving or river trawls;
NET, which is all non-trawl net gear;
NTW, which is non-trawl gear; and
TLS, which is trolling gear.
As a special case, MID is available for spiny dogfish to extract
mid-water trawl data as a separate fleet.
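The statement that the order of keep_gears defines fleet numbering can be illustrated with a minimal base R sketch. This is not the package's internal code; the geargroup vector is toy data, and match() is used here only as one plausible way to turn an ordered vector of gear groups into fleet numbers.

```r
# Toy gear groups standing in for the geargroup column (hypothetical data).
geargroup <- c("TWL", "HKL", "POT", "TWL", "NET")

# Order matters: HKL is listed first, so it becomes fleet 1; TWL becomes fleet 2.
keep_gears <- c("HKL", "TWL")

# match() returns the position of each gear in keep_gears,
# or NA for gears that were not requested.
fleet <- match(geargroup, keep_gears)

# Keep only the records whose gear group was requested.
kept <- data.frame(geargroup, fleet)[!is.na(fleet), ]
print(kept)
```

Reversing keep_gears to c("TWL", "HKL") in this sketch would swap the two fleet numbers without changing which records are kept.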
A vector of character values specifying the types of
samples you want to keep. The default is to keep c("M"). Available
types include market (M), research (R), special request (S), and
commercial on-board (C). There are additional samples without a SAMPLE_TYPE,
but they are only kept if you include NA in your call.
All sample types from California are assigned to M.
Including commercial on-board samples is not recommended because
they might also be in WCGOP data and would lead to double counting.
A vector of character values specifying the types of
sampling methods you want to keep. The default is to keep "R", which
refers to samples that were sampled randomly. Available types include
random (R), stratified (S), systematic (N), purposive (P), and special (X).
As of February 17, 2021,
Washington is the only state with a sample type of "", and it was limited
to two special samples of yelloweye rockfish.
A vector of character values specifying the types of
length samples to keep. There is no default value, though users will typically
want to keep c("", "F", "A"), but should also think about using
c("", "F", "A", NA). Note that types other than those listed below
can be present, especially if you are dealing with a skate.
A is alternate length,
D is dorsal length,
F is fork length,
S is standard length, and
T is total length.
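The difference between c("", "F", "A") and c("", "F", "A", NA) can be seen in a minimal base R sketch. The length_type vector is toy data, and the use of %in% is an assumption about how such a filter might be written rather than the package's confirmed implementation.

```r
# Toy length-type codes (hypothetical data), including a missing value.
length_type <- c("F", "T", "", NA, "A", "S")

# %in% treats NA as a matchable value, so records with a missing
# length type are kept only when NA is listed explicitly.
keep_with_na <- length_type %in% c("", "F", "A", NA)
keep_without_na <- length_type %in% c("", "F", "A")

# Compare the two filters side by side.
print(data.frame(length_type, keep_with_na, keep_without_na))
```

Under these assumptions, the only record that changes status between the two calls is the one with a missing length type.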
A vector of ageing methods to retain in the data. All fish
aged with methods other than those listed will no longer be considered aged.
A value of NULL, the default, will keep all ageing methods. However,
a vector of c("B", "BB", "S", "", NA, 1, 2) will keep all unaged fish and those
that were aged with break and burn and surface reads. You do not really need
to include such a verbose vector of values though because numbers are converted
to appropriate character codes in getAge. Therefore, something like
c("B", "S") would be sufficient to keep all break and burn and surface reads.
Deprecated. Just subset them using
is.na(Pdata[, 'length']) after running
cleanPacFIN if you want to remove missing
lengths, though there is no need because the package accommodates keeping them in.
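The subsetting suggested above can be sketched in base R as follows. The Pdata object here is a toy stand-in for the output of cleanPacFIN, and the SAMPLE_NO values are hypothetical.

```r
# Toy stand-in for cleaned data (hypothetical values); length is in mm.
Pdata <- data.frame(
  SAMPLE_NO = c("a", "b", "c"),
  length = c(410, NA, 385)
)

# Drop the records with a missing length.
no_missing <- Pdata[!is.na(Pdata[, "length"]), ]
print(no_missing)
```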
A vector of states that you want to keep, where each state
is defined using a two-letter abbreviation, e.g., WA. The default is to keep
data from all three states, keep_states = c("WA", "OR", "CA").
Add 'UNK' to the vector if you want to keep data not assigned to a state.
A logical value used when you want to remove data from the input
data set. The default is TRUE. Setting CLEAN = FALSE instead returns the original
data with additional columns and reports on what would have been removed.
A character string giving the species name to
ensure that the methods are species specific. Leave as NULL
if generic methods work for your species.
Currently, sablefish is the only species with species-specific code.
A logical specifying if output should be written to the
screen or not. Good for testing and exploring your data but can be turned
off when output indicates information that you already know. The printing
of output to the screen does not affect any of the returned objects. The
default is to always print to the screen, i.e., verbose = TRUE.
A file path to the directory where the results will be saved. The default is the current working directory. The path can be relative or absolute.
The input data filtered for desired areas and record types specified, with added columns
year: initialized from SAMPLE_YEAR
fleet: initialized to 1
fishery: initialized to 1
season: initialized to 1. Change using getSeason
state: initialized from SOURCE_AGID. Change using getState
length: length in mm, where NA indicates length is not available
lengthcm: floored cm from FORK_LENGTH when available, otherwise FISH_LENGTH
geargroup: the gear group associated with each GRID
weightkg: fish weight in kg from FISH_WEIGHT and FISH_WEIGHT_UNITS
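The description of lengthcm can be illustrated with a minimal base R sketch. This is not the package's code: the input values are toy data, both length columns are assumed to already be in mm, and length_mm is a hypothetical intermediate name.

```r
# Toy lengths (hypothetical values, assumed to be in mm); NA means not measured.
FORK_LENGTH <- c(412, NA, 389, NA)
FISH_LENGTH <- c(415, 507, NA, NA)

# Prefer FORK_LENGTH when it is available, otherwise fall back to FISH_LENGTH.
length_mm <- ifelse(is.na(FORK_LENGTH), FISH_LENGTH, FORK_LENGTH)

# Convert mm to cm and round down to the whole centimeter.
lengthcm <- floor(length_mm / 10)
print(data.frame(FORK_LENGTH, FISH_LENGTH, length_mm, lengthcm))
```

Flooring rather than rounding means a 389 mm fish falls in the 38 cm bin in this sketch.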
The original fields in the returned data are left untouched, with the exception of
SEX: modified using nwfscSurvey::codify_sex() and upon return will
only include character values such that fish with an unidentified sex are
now "U".
Age: the best ages to use going forward rather than just the first age read.
The data are put through various tests before they are returned
and the results of these tests are stored in the CLEAN column.
Thus, sometimes it is informative to run cleanPacFIN(CLEAN = FALSE)
and use frequency tables to inspect which groups of data will be removed
from the data set when you change the code to be CLEAN = TRUE.
For example, many early length compositions do not have information on
the weight of fish that were sampled, and thus, there is no way to infer
how much the entire sample weighed or how much the tow/trip weighed.
Therefore, these data cannot be expanded and are removed using
CLEAN = TRUE. Some stock assessment authors or even previous
versions of this very code attempted to use adjacent years to inform
weights. This required a large number of assumptions, and state
representatives discouraged inferring data that did not exist.
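The frequency-table inspection described above can be sketched in base R. The data frame below is a toy stand-in for the output of cleanPacFIN(CLEAN = FALSE), and it assumes that the CLEAN column is logical, with FALSE marking records that would be removed; check the actual column type in your own data.

```r
# Toy stand-in for the output of cleanPacFIN(CLEAN = FALSE) (hypothetical data).
# Assumption: CLEAN is logical and FALSE flags records that would be removed.
Pdata <- data.frame(
  SAMPLE_YEAR = c(1985, 1985, 1999, 1999, 2010),
  state = c("CA", "WA", "OR", "CA", "WA"),
  CLEAN = c(FALSE, TRUE, TRUE, FALSE, TRUE)
)

# Cross-tabulate to see which states would lose records with CLEAN = TRUE.
removed_by_state <- table(Pdata$state, Pdata$CLEAN, dnn = c("state", "CLEAN"))
print(removed_by_state)

# The same idea by year shows when the flagged records were sampled.
removed_by_year <- table(Pdata$SAMPLE_YEAR, Pdata$CLEAN, dnn = c("year", "CLEAN"))
print(removed_by_year)
```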
The values created as new columns are for use by other functions in this package.
In particular, fishyr and season are useful if there are multiple
seasons (e.g., winter and summer, as in the petrale sole assessment), and the
year is adjusted so that "winter" occurs in one year, rather than across two.
The fleet, fishery, and state columns are meant for use in
stratifying the data according to the particulars of an assessment.