Load/Clean Excel sheet — loadExcel • baytrends

Load and clean one sheet from an Excel file

loadExcel(
  file = NA,
  sheet = 1,
  folder = ".",
  pk = NA,
  remDup = TRUE,
  remNAcol = TRUE,
  remNArow = TRUE,
  convDates = TRUE,
  tzSel = "America/New_York"
)

Arguments

file: file (can use wildcards, e.g., "*.xlsx")
sheet: worksheet name to load from Excel file
folder: folder (i.e., directory to look in, can use relative path )
pk: vector of columns that form the primary key for data set
remDup: logical field indicating whether duplicate rows are deleted
remNAcol: logical field indicating whether columns with all NA are deleted
remNArow: logical field indicating whether rows with all NA are deleted
convDates: vector or logical field indicating whether date-like columns should be converted to POSIXct format (see details)
tzSel: time zone to use for date conversions (default: "America/New_York")

Value

Returns data frame

Details

This function reads in a single sheet from an Excel file using readxl::read_excel to load the data

After reading data in with readxl::read_excel, some specific additional steps are implemented:

1. Double quotes are removed from beginning and ending of all fields. The purpose is to maintaion leading zeroes (e.g., USGS flow gages). To effectively use this functionality, data maintained in a spreadsheet would be enclosed in quotes (e.g., "01578310"). If exported to csv or txt files the field would be in triple quotes (e.g., """01578310"""). Any column read in as integer is converted to numeric.

2. Rows and columns with no data (i.e., all NA) are deleted unless default settings for remNAcol and remNArow are changed to FALSE.

3. Completely duplicate rows are deleted unless default setting for remDup is changed to FALSE.

4. If a primary key (either single or multiple columns) is selected, the function enforces the primary key by deleting duplicate entries based on the primary key. Columns corresponding to the primary key (when specified) are moved to the first columns.

5. If convDates is a vector (i.e., c('beginDate', 'endDate')), then a date conversion to as.POSIXct is attempted for the corresponding columns found in the input file. If TRUE, then a date conversion is attempted for all columns found in the input file with 'date' in the name, If FALSE, no date conversion is attempted.

Some other common time zones include the following: America/New_York, America/Chicago, America/Denver, America/Los_Angeles, America/Anchorage, America/Honolulu, America/Jamaica, America/Managua, America/Phoenix, America/Metlakatla

A brief table reporting the results of the import are printed.

Note that columns containing just F, T, FALSE, TRUE are stored as logical fields