Create a categorical column that specifies which state each row is from.

getState(Pdata, source = c("AGENCY_CODE", "SOURCE_AGID"), verbose = TRUE)

Arguments

Pdata

A data frame of biological samples originating from the Pacific Fisheries Information Network (PacFIN) data warehouse, which originated in 2014. Data are pulled using SQL calls; see PullBDS.PacFIN().

source

The name of the column in Pdata that holds the state information. See the function call for the available options; only the first value will be used.

verbose

A logical specifying whether output should be written to the screen. Printing is good for testing and exploring your data but can be turned off when the output reports information that you already know. Printing output to the screen does not affect any of the returned objects. The default is to always print to the screen, i.e., verbose = TRUE.
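The "only the first value will be used" behavior of source resembles R's match.arg() convention. The sketch below illustrates that convention with a hypothetical helper, pick_source(); whether getState() uses match.arg() internally is an assumption and is not confirmed by this page.

```r
# Hypothetical helper (not part of PacFIN.Utilities) showing how a
# vector-valued default is reduced to its first element by match.arg().
pick_source <- function(source = c("AGENCY_CODE", "SOURCE_AGID")) {
  match.arg(source)
}

default_source <- pick_source()               # no argument: first option is used
chosen_source <- pick_source("SOURCE_AGID")   # explicit choice overrides the default
```

Under this convention, passing source = "SOURCE_AGID" is only needed when the data have already had their column names converted.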

Value

The input data frame is returned with an additional column, state, which is filled with two-character values identifying the state or the three-character value UNK for all rows that do not have an assigned state. All rows are returned, but users should pay attention to the warning that is returned for rows that have no state ID.

Details

With the creation of the comprehensive bds table in PacFIN, the column called SAMPLE_AGENCY was deprecated; more specifically, the column is still available but is filled with NULL values. Thus, PacFIN.Utilities no longer maintains SAMPLE_AGENCY, which was previously converted to SOURCE_AGID, and all identification of state records should be based on AGENCY_CODE or SOURCE_AGID, where the latter is just the converted column name (see cleanColumns). AGENCY_CODE is a column created by PacFIN to identify which agency provided the data. In the 2019 sablefish data there were four unique values in AGENCY_CODE:

  • C - CDFW

  • O - ODFW

  • W - WDFW

  • M - SAMPLE_AGENCY == NMFS Tiburon samples from 1997. None of the M samples were in the comprehensive table as of 2021.
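The mapping from agency codes to states can be sketched with a named lookup vector in base R. This is a minimal illustration, not the package's implementation: agency_to_state and codes are hypothetical names, and treating M and missing codes as UNK is an assumption, since this page does not say how getState() handles them.

```r
# Hypothetical lookup from AGENCY_CODE values to two-character states.
# "M" is deliberately left out so that it falls through to "UNK" (assumption).
agency_to_state <- c(C = "CA", O = "OR", W = "WA")

# Example codes, including one unmapped code and one missing value.
codes <- c("W", "O", "C", "M", NA)

# Indexing by name returns NA for anything not in the lookup.
state <- unname(agency_to_state[codes])
state[is.na(state)] <- "UNK"

table(state)
```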

As of February 14, 2021, it is no longer advisable to assign states based on PSMFC_CATCH_AREA_CODE or PSMFC_ARID because areas are not mutually exclusive to a state. Previous code set areas 3[a-z] to Washington, 2[a-z] to Oregon, and 1[a-z] to California. The PacFIN documentation suggests that the following area codes can be assigned to the following states:

  • WA: 1C, 2A, 2B, 2C, 2E, 2F, 3A, 3B, 3C, 3N, 3S

  • OR: 1C, 2A, 2B, 2C, 2E, 2F, 3A, 3B, CS

  • CA: 1A, 1B, 1C

Rather than supporting one or the other, users are now left to decide on their own which PSMFC areas they want to assign to which states. Hopefully, the guidance above will be helpful. If you see the need to use PSMFC_CATCH_AREA_CODE to set states, please contact the package maintainer.
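To see why area codes cannot be mapped to states mechanically, the lists above can be checked for overlap in base R. The sketch below is only one possible user-side approach: the object names are hypothetical, and labeling every shared area as UNK is an illustrative choice, not a recommendation from the package.

```r
# Area lists copied from the guidance above.
areas_by_state <- list(
  WA = c("1C", "2A", "2B", "2C", "2E", "2F", "3A", "3B", "3C", "3N", "3S"),
  OR = c("1C", "2A", "2B", "2C", "2E", "2F", "3A", "3B", "CS"),
  CA = c("1A", "1B", "1C")
)

# Count how many state lists mention each area code.
n_states <- table(unlist(areas_by_state, use.names = FALSE))
shared_areas <- names(n_states)[n_states > 1]
single_state_areas <- names(n_states)[n_states == 1]

# Hypothetical rule: only areas listed for exactly one state get that state.
area_owner <- unlist(lapply(names(areas_by_state), function(st) {
  a <- intersect(areas_by_state[[st]], single_state_areas)
  stats::setNames(rep(st, length(a)), a)
}))

# Apply the rule to a few example area codes; shared areas become "UNK".
example_areas <- c("3N", "1A", "2B", "CS")
state <- unname(area_owner[example_areas])
state[is.na(state)] <- "UNK"
```

Most area codes appear in more than one list, so any rule that goes beyond this one requires a decision by the analyst.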

See also

cleanPacFIN calls getState.

Examples

data <- data.frame(
  AGENCY_CODE = rep(c("W", "O", "C"), each = 2),
  info = 1:6
)
testthat::expect_true(
  all(getState(data)[["state"]] == rep(c("WA", "OR", "CA"), each = 2))
)
#> 
#> There are 0 records for which the state (i.e., 'CA', 'OR', 'WA')
#> could not be assigned and were labeled as 'UNK'.
#> 
#> CA OR WA 
#>  2  2  2