In response to item 3 above
Assuming we want to predict a daily average, we could use the high-frequency sample sites to derive an error model for using a discrete sample as the daily average value. We can treat the daily average from the high-frequency records as the true daily average, and then compute the error obtained from using only one sample per day from the continuous record. That gives an estimate of the error distribution associated with using daily discrete samples to represent daily averages. The distribution might be conditional on site features and could vary by season and/or flow conditions. But even with those nuances, we would have an estimated error distribution that we can use to describe the tolerable model error for the discrete samples.
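A rough sketch of that error calculation, in case it helps the discussion. This is only an illustration under some assumptions: the high-frequency record is a pandas Series with a DatetimeIndex, the discrete sample is drawn uniformly at random from the measurements on that day (real sampling times are probably biased toward daytime), and the function name `daily_sampling_errors` and its parameters are hypothetical, not part of the project pipeline.

```python
import numpy as np
import pandas as pd


def daily_sampling_errors(series: pd.Series, n_draws: int = 100, seed: int = 0) -> pd.DataFrame:
    """Error from using one randomly chosen sample per day as the daily mean.

    `series` is a high-frequency record indexed by a DatetimeIndex.
    Returns one row per (day, draw): sampled value minus that day's mean.
    """
    rng = np.random.default_rng(seed)
    clean = series.dropna()
    days = clean.index.normalize()  # midnight timestamp of each measurement's day

    records = []
    for day, values in clean.groupby(days):
        # Treat the mean of the high-frequency values as the "true" daily average.
        true_mean = values.mean()
        # Assumption: the discrete sample could fall on any measurement that day.
        picks = rng.choice(values.to_numpy(), size=n_draws, replace=True)
        records.append(pd.DataFrame({"date": day, "error": picks - true_mean}))
    return pd.concat(records, ignore_index=True)
```

The `error` column can then be summarized with quantiles, and grouping by `date` month or by a flow class before summarizing would give the seasonal or flow-conditional versions of the distribution.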
-
A few questions with the theme of how to integrate the discrete and continuous samples for prediction models. I'd expect these questions have come up for other projects and would appreciate lessons learned and insights. @jsadler2 @aappling-usgs
Originally posted by @lekoenig in Add daily nwis data to project pipeline #8
ActivityStartTime.Time for discrete samples to be useful? About 6% (~6000 samples) do not have a time recorded.
Here are my thoughts:
Static prediction model:
Discrete sites: If they have NAs, we can safely drop them. That is the same as having no sample info for that time.
Continuous sites - daily average: NAs can be dropped. We may also consider dropping days that have few measurements. We could implement a cutoff for the number of non-NA time steps required to compute a daily average.
Time-aware prediction model:
Discrete sites: Pad at the timestep of prediction? It seems like that would increase file size by a lot. I'd expect the code would be faster if we check that these discrete sites have data in each timestep.
Continuous sites: Yes, I think padding with NAs is useful at the timestep of prediction. For example, if we predict hourly and all 4 measurements in that hour are NA, we set the hour to NA. If we have at least one measurement, we use the average of the available measurements. We could create a function that accepts the timestep and the timeseries and returns the timeseries on the simulation timestep.
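Something like the following could serve as a starting point for that function. It is a minimal sketch assuming the continuous record is a pandas Series with a DatetimeIndex; the name `to_timestep` and the `min_count` argument are hypothetical. The same `min_count` argument also covers the cutoff idea for daily averages in the static case.

```python
import pandas as pd


def to_timestep(series: pd.Series, timestep: str, min_count: int = 1) -> pd.Series:
    """Average a timeseries onto a regular timestep, padding gaps with NA.

    `timestep` is a pandas frequency string such as "60min" or "1D".
    A bin is set to NA when it has fewer than `min_count` non-NA measurements.
    """
    resampled = series.resample(timestep)
    mean = resampled.mean()      # NA for bins that are empty or all NA
    n_valid = resampled.count()  # number of non-NA measurements per bin
    return mean.where(n_valid >= min_count)
```

For example, `to_timestep(record, "60min")` would give the hourly behavior described above, and `to_timestep(record, "1D", min_count=48)` would drop days with fewer than 48 valid measurements (the 48 is just an illustrative cutoff).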