getState.Rd
Create a categorical column that specifies which state each row is from.
getState(Pdata, source = c("AGENCY_CODE", "SOURCE_AGID"), verbose = TRUE)
A data frame of biological samples
originating from the
Pacific Fisheries Information Network (PacFIN) data warehouse,
which originated in 2014. Data are pulled using SQL calls, see
PullBDS.PacFIN()
.
The column name where state information is located in
Pdata
. See the function call for options, where only the first
value will be used.
A logical specifying if output should be written to the
screen or not. Good for testing and exploring your data but can be turned
off when output indicates information that you already know. The printing
of output to the screen does not affect any of the returned objects. The
default is to always print to the screen, i.e., verbose = TRUE
.
The input data frame is returned with an additional column,
state
, which is filled with two-character values identifying the state or
a three-character value UNK
for all rows that do not have an assigned state.
All rows are returned, but users should pay attention to the warning that is
returned for rows that have no state id.
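The behavior described above can be illustrated with a minimal base R sketch. It is not the PacFIN.Utilities implementation: the helper name assign_state_sketch and the lookup vector are hypothetical, and treating every code other than C, O, or W (including M) as unassigned is an assumption made only for illustration.

```r
# Hypothetical helper for illustration; NOT the PacFIN.Utilities code.
assign_state_sketch <- function(data, source = "AGENCY_CODE") {
  # Assumed lookup from single-letter agency codes to two-character states.
  lookup <- c(C = "CA", O = "OR", W = "WA")
  state <- unname(lookup[as.character(data[[source]])])
  # Codes without a match (including NA) are labeled "UNK".
  state[is.na(state)] <- "UNK"
  data[["state"]] <- state
  # All rows are kept; a warning flags rows without a state.
  n_unknown <- sum(state == "UNK")
  if (n_unknown > 0) {
    warning(
      n_unknown,
      " row(s) could not be assigned a state and were labeled 'UNK'."
    )
  }
  data
}

example <- data.frame(
  AGENCY_CODE = c("W", "O", "C", "M"),
  info = 1:4
)
# Suppress the warning here only to keep the example output short.
result <- suppressWarnings(assign_state_sketch(example))
table(result[["state"]])
```

Tabulating the new column, as in the last line, is a quick way to see how many rows ended up as UNK before deciding whether to drop or reassign them.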
With the creation of the comprehensive bds table in PacFIN, the column called
SAMPLE_AGENCY
was deprecated; more specifically, the column is
available but filled with NULL
values.
Thus, PacFIN.Utilities no longer maintains SAMPLE_AGENCY
, which was previously converted
to SOURCE_AGID
. All identification of
state records should be based on AGENCY_CODE
or SOURCE_AGID
, where the latter is
just the converted column name (see cleanColumns).
AGENCY_CODE
is a column created by PacFIN to identify which agency provided
the data. In the 2019 sablefish data there were four unique values in AGENCY_CODE
,
C - CDFW
O - ODFW
W - WDFW
M - SAMPLE_AGENCY == NMFS Tiburon
samples from 1997.
None of the M
samples were in the comprehensive table as of 2021.
As of February 14, 2021, it is no longer advisable to assign states based on
PSMFC_CATCH_AREA_CODE
or PSMFC_ARID
because areas are not mutually
exclusive to a state. Previous code set areas 1[a-z]
to Washington,
2[a-z]
to Oregon, and 3[a-z]
to California.
The PacFIN documentation
suggests that the following area codes can be assigned to the following states:
WA: 1C, 2A, 2B, 2C, 2E, 2F, 3A, 3B, 3C, 3N, 3S
OR: 1C, 2A, 2B, 2C, 2E, 2F, 3A, 3B, CS
CA: 1A, 1B, 1C
Rather than supporting one or the other, users are now left to decipher on their own which PSMFC areas they want to assign to which states. Hopefully, the guidance above will be helpful. If you see the need to use PSMFC_CATCH_AREA_CODE to set states, please contact the package maintainer.
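To see why area codes cannot identify a state on their own, the lists above can be compared directly. The sketch below uses only base R; the object names areas, membership, and shared are hypothetical, and the codes are copied from the lists above rather than queried from PacFIN.

```r
# Area codes per state, copied from the lists above.
areas <- list(
  WA = c("1C", "2A", "2B", "2C", "2E", "2F", "3A", "3B", "3C", "3N", "3S"),
  OR = c("1C", "2A", "2B", "2C", "2E", "2F", "3A", "3B", "CS"),
  CA = c("1A", "1B", "1C")
)

# Logical matrix with one row per area code and one column per state.
all_codes <- sort(unique(unlist(areas)))
membership <- sapply(areas, function(x) all_codes %in% x)
rownames(membership) <- all_codes

# Area codes listed under more than one state are ambiguous.
shared <- rownames(membership)[rowSums(membership) > 1]
shared
```

Any code that appears in shared would need an explicit, user-supplied rule before it could be mapped to a single state.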
cleanPacFIN calls getState
.
data <- data.frame(
AGENCY_CODE = rep(c("W", "O", "C"), each = 2),
info = 1:6
)
testthat::expect_true(
all(getState(data)[["state"]] == rep(c("WA", "OR", "CA"), each = 2))
)
#>
#> There are 0 records for which the state (i.e., 'CA', 'OR', 'WA')
#> could not be assigned and were labeled as 'UNK'.
#>
#> CA OR WA
#> 2 2 2