This function creates a seasonally detrended salinity data set for selected stations. The created data set is used to support application of GAMs that include a hydrologic term as one of the independent variables. The output from this function should be stored as an .rda file for repeated use with baytrends.
detrended.salinity(
df.sal,
dvAvgWinSel = 30,
lowess.f = 0.2,
minObs = 40,
minObs.sd = 10
)
df.sal – Data frame with salinity data. Required variables are station, date, layer, and salinity.
dvAvgWinSel – Averaging window (days) used to pool data when computing summary statistics (default is 30).
lowess.f – Lowess smoother span applied to the computed standard deviation (see Details). This is the proportion of points that influence the smooth at each value; larger values give more smoothness (default is 0.2).
minObs – Minimum number of observations required to perform the analysis (default is 40).
minObs.sd – Minimum number of observations in the averaging window required to calculate the standard deviation (default is 10).
Returns a list of seasonally detrended salinity data. You should save the resulting list as salinity.detrended for use with baytrends. This function also creates diagnostic plots that can be saved to a report when this function is called from an .Rmd script.
This function returns a list of seasonally detrended salinity data and companion statistics. It relies on a user-supplied data frame that contains the variables station, date, layer, and salinity. See the structure of the sal data in the example below.
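As a quick illustration of the required input layout, the sketch below builds a small data frame with the four required variables. The station IDs, dates, and salinity values are invented for illustration (the object name df.example is hypothetical); only the column names come from the description above.

```r
# Hypothetical input for detrended.salinity(): station IDs, dates, and
# salinity values are made up; only the column names are documented.
set.seed(1)
dates <- seq(as.Date("2000-01-15"), as.Date("2004-12-15"), by = "month")
df.example <- expand.grid(
  date    = dates,
  layer   = c("S", "AP", "BP", "B"),
  station = c("STATION1", "STATION2"),
  stringsAsFactors = FALSE
)
# Seasonal signal plus noise, with saltier water in the bottom layers
doy <- as.numeric(format(df.example$date, "%j"))
df.example$salinity <- 15 + 3 * sin(2 * pi * doy / 366) +
  ifelse(df.example$layer %in% c("B", "BP"), 4, 0) +
  rnorm(nrow(df.example), sd = 1)
# Put the required variables in the documented order
df.example <- df.example[, c("station", "date", "layer", "salinity")]
# Check that all required variables are present
missing.vars <- setdiff(c("station", "date", "layer", "salinity"),
                        names(df.example))
str(df.example)
```

Storing date as a Date object keeps the day-of-year calculation straightforward.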
It is the user's responsibility to save the resulting list as salinity.detrended for integration with baytrends.
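A minimal sketch of saving and restoring the list with base R. A placeholder list stands in for the real output of detrended.salinity(), and the file location (a temporary directory) is an assumption for the demo; in practice the .rda file would go in the project folder. Because save() stores the object under its own name, load() restores it as salinity.detrended.

```r
# Placeholder standing in for the output of detrended.salinity()
salinity.detrended <- list(placeholder = TRUE)  # hypothetical stand-in

# Save under the object name salinity.detrended (file path is illustrative)
rda.file <- file.path(tempdir(), "salinity.detrended.rda")
save(salinity.detrended, file = rda.file)

# Later (e.g., in a new session): restore the object before running baytrends
rm(salinity.detrended)
load(rda.file)  # recreates the object salinity.detrended
```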
For the purposes of baytrends, the user is expected to assemble one data set containing all salinity data to be evaluated, so that a single data file is created. The following computation steps are performed:
1) Extract the list of stations and the minimum and maximum year in the data set. Initialize the salinity.detrended list with this information along with metadata documenting the retrieval parameters.
2) Downselect the input data frame to only include data where the layer is equal to 'S', 'AP', 'BP' or 'B'.
3) Average the 'S' and 'AP' salinity data together, and the 'B' and 'BP' salinity data together, to create average salinity values for SAP (surface and above pycnocline) and BBP (bottom and below pycnocline), respectively. These values are stored as the variables salinity.SAP and salinity.BBP, together with the date and day of year (doy), in a data frame corresponding to the station ID.
4) For each station/layer combination with at least minObs observations, a seasonal GAM, i.e., gamoutput <- gam(salinity ~ s(doy, bs='cc')), is evaluated, and the predicted values are stored in the above data frame as salinity.SAP.gam and salinity.BBP.gam.
5) The GAM residuals, i.e., residuals(gamoutput), are extracted and stored as the variable SAP or BBP in the above data frame. (These are the values used for GAMs that include salinity.)
6) After the above data frame is created and appended to the list salinity.detrended, the following four (4) additional data frames are created for each station.
mean – For each day of year d (doy = 1, ..., 366), the mean across all years. Because samples are not collected daily, data are pooled from within a +/- one-half dvAvgWinSel-day window around d. The window wraps around the calendar year; for example, the window for January 2 includes values from the last part of December as well as the first part of January. The variables in the mean data frame are doy, SAP, and BBP.
sd – For each day of year d, the standard deviation across all years. (See the mean calculations for additional details.)
nobs – For each day of year d, the number of observations across all years. (See the mean calculations for additional details.)
lowess.sd – Lowess-smoothed standard deviations. Some stations are not sampled regularly in all months of the year, or for other reasons have few observations from which to compute standard deviations. Visual inspection of plots showed that the standard deviation can become unstable when the number of observations is small. For this reason, when the number of observations is less than minObs.sd, the corresponding value of lowess.sd is removed and interpolated from the remaining observations.
Once the above four data frames (mean, sd, nobs, and lowess.sd) are created, they are added to a list using a station.sum naming convention and appended to the list salinity.detrended.
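Steps 2 through 5 can be sketched for a single station as follows. This is an illustrative reconstruction, not the package's internal code: the function name detrend_station_sketch and its input df.sta are hypothetical, the per-date averaging of layers is an assumption about how step 3 pools S with AP and B with BP, and gam() is assumed to be the mgcv implementation, which provides the cyclic cubic basis bs = "cc".

```r
library(mgcv)  # gam() with the cyclic cubic regression spline basis "cc"

# Hypothetical sketch of steps 2-5 for one station. `df.sta` has columns
# date (Date), layer, and salinity.
detrend_station_sketch <- function(df.sta, minObs = 40) {
  # Step 2: keep only the S, AP, BP, and B layers
  df.sta <- df.sta[df.sta$layer %in% c("S", "AP", "BP", "B"), ]
  # Step 3: average S with AP (SAP) and B with BP (BBP) for each date
  df.sta$grp <- ifelse(df.sta$layer %in% c("S", "AP"), "SAP", "BBP")
  avg <- aggregate(salinity ~ date + grp, data = df.sta, FUN = mean)
  out <- data.frame(date = sort(unique(avg$date)))
  for (g in c("SAP", "BBP")) {
    sub <- avg[avg$grp == g, ]
    out[[paste0("salinity.", g)]] <- sub$salinity[match(out$date, sub$date)]
  }
  out$doy <- as.numeric(format(out$date, "%j"))
  # Steps 4-5: seasonal GAM per layer group; keep fitted values and residuals
  for (g in c("SAP", "BBP")) {
    y <- out[[paste0("salinity.", g)]]
    ok <- !is.na(y)
    out[[paste0("salinity.", g, ".gam")]] <- NA_real_
    out[[g]] <- NA_real_
    if (sum(ok) < minObs) next  # too few observations for this layer group
    fit <- gam(salinity ~ s(doy, bs = "cc"),
               data = data.frame(salinity = y[ok], doy = out$doy[ok]))
    out[[paste0("salinity.", g, ".gam")]][ok] <- fitted(fit)
    out[[g]][ok] <- residuals(fit)  # detrended salinity used downstream
  }
  out
}
```

For a Gaussian GAM, the residuals stored in SAP and BBP equal the observed averages minus the seasonal fit, so they represent departures from the typical salinity for that day of year.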
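The windowed statistics in step 6 can be sketched in base R for one detrended variable of one station. The function name window_stats_sketch and its inputs are hypothetical, and it returns a single data frame with mean, sd, nobs, and lowess.sd columns rather than the separate data frames described above. The order of operations for lowess.sd is also an assumption: days with fewer than minObs.sd observations are dropped, lowess() smooths the remaining standard deviations, and approx() fills the dropped days. The wraparound is handled with a circular day-of-year distance.

```r
# Hypothetical sketch of step 6 for one variable (e.g., SAP residuals).
# `doy` = day of year of each observation, `x` = detrended values.
window_stats_sketch <- function(doy, x, dvAvgWinSel = 30, lowess.f = 0.2,
                                minObs.sd = 10) {
  days <- 1:366
  half <- dvAvgWinSel / 2
  res <- data.frame(doy = days, mean = NA_real_, sd = NA_real_, nobs = 0L)
  for (d in days) {
    # Circular distance, so early January pools with late December
    dist <- abs(doy - d)
    dist <- pmin(dist, 366 - dist)
    w <- x[dist <= half & !is.na(x)]
    res$nobs[d] <- length(w)
    if (length(w) > 0) res$mean[d] <- mean(w)
    if (length(w) > 1) res$sd[d] <- sd(w)
  }
  # Drop poorly supported sd values, smooth the rest, interpolate the gaps
  keep <- res$nobs >= minObs.sd & !is.na(res$sd)
  res$lowess.sd <- NA_real_
  if (sum(keep) >= 2) {
    sm <- lowess(res$doy[keep], res$sd[keep], f = lowess.f)
    res$lowess.sd <- approx(sm$x, sm$y, xout = days, rule = 2)$y
  }
  res
}
```

Note that the smoothing step in this sketch does not wrap around the year; only the pooling window does.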
if (FALSE) { # \dontrun{
# Load baytrends, which provides the example data set (sal)
library(baytrends)
# Show Example Dataset (sal)
str(sal)
# Define Function Inputs
df.sal <- sal
dvAvgWinSel <- 30
lowess.f <- 0.2
minObs <- 40
minObs.sd <- 10
# Run Function
salinity.detrended <- detrended.salinity(df.sal, dvAvgWinSel,
lowess.f, minObs, minObs.sd)
} # }