-
@lmullany Sounds great! @nickreich and I were talking about how this effort ties in nicely to the ongoing infrastructure work for cloud-enabled hubs. Because there are no active submissions to worry about, we could use the data you're creating to test the cloud sync and parquet conversions without the risk of disrupting a live hub. Pushing this historic data to the cloud would also provide a tangible talking point for our conversations about cloud data access, optimization, etc. Very excited!
-
@bsweger @nickreich: the initial version of the hubverse-formatted flusight1 repo is here: https://github.com/lmullany/flusight1_hub This should be considered preliminary. There are a number of issues that we might want to sort through including:
Maybe we can set up a time to discuss further.
-
Very exciting--thanks, @lmullany! I'll await the more specific issues before chiming in, but in the meantime, I'm continuing to codify the function that converts model-output data to parquet (and adds columns for round_id, team, and model) so it will trigger automatically when a hub's data is synced to the cloud. This dovetails nicely with the work you're doing. Once we have an official hubverse-compatible version of the flusight archive, we can add the cloud infrastructure pieces to make the data accessible via S3. 🚀
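For illustration only, here is a minimal base-R sketch of the "add columns" step described above. It is not the actual conversion function: the name `add_model_output_columns` is hypothetical, and it assumes file names of the form `<round_id>-<team>-<model>.<ext>` with a YYYY-MM-DD round_id and no hyphens inside the team or model abbreviations. Writing the result to parquet would be a separate step and is left out here.

```r
# Hypothetical helper (not the real hubverse function): derive round_id,
# team, and model from a model-output file name and add them as columns.
# ASSUMPTION: file names look like "<round_id>-<team>-<model>.<ext>", where
# round_id is YYYY-MM-DD and team/model contain no hyphens.
add_model_output_columns <- function(df, file_name) {
  # drop the directory and the file extension
  stem <- sub("\\.[^.]+$", "", basename(file_name))
  pattern <- "^([0-9]{4}-[0-9]{2}-[0-9]{2})-([^-]+)-([^-]+)$"
  parts <- regmatches(stem, regexec(pattern, stem))[[1]]
  if (length(parts) != 4) {
    stop("Unexpected file name: ", file_name)
  }
  # rep() keeps this safe for data frames with zero rows
  df$round_id <- rep(parts[2], nrow(df))
  df$team <- rep(parts[3], nrow(df))
  df$model <- rep(parts[4], nrow(df))
  df
}

# Usage with made-up data and a made-up path
example <- add_model_output_columns(
  data.frame(value = c(0.1, 0.2)),
  "model-output/teamA-modelX/2016-01-04-teamA-modelX.csv"
)
```

Keeping the three values as ordinary columns means they stay with each row once files from many teams and rounds are combined.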
-
@bsweger the current version of the repo is structured like this: Note that Also note that within the sub-folder, we have a list of parquet files, each one labeled as "YYYY-EW", where EW is the epiweek. This 7-character string is also the Does the constant column have to be included? Or can the
-
Comment moved to thread
-
fyi @annakrystalli, I'm not sure that snippet is correct for converting epi weeks to a date. I use the following for that. Perhaps there is a better built-in function that I'm not aware of.
#' Get the start date of an epiweek, given its epiweek and epiyear
#'
#' Function returns the date of the first day of the epiweek defined by
#' user provided epiweek (`ew`) and epiyear (`ey`)
date_from_epiweekyear <- function(ey, ew) {
  # internal function gets the start date of the first week of the
  # given epi year: the Sunday on or before January 4
  f_epi_week_start <- function(y) {
    j4 <- strptime(paste0(y, "-01-04"), "%Y-%m-%d", tz = "UTC")
    # wday runs from 0 (Sunday) to 6 (Saturday), so this steps back to Sunday
    as.Date(j4) - j4$wday
  }
  # get the start date of this epi year
  start_1 <- f_epi_week_start(ey)
  # check that the epi week passed by user does not exceed the
  # maximum available for this epi year
  max_week <- as.numeric(f_epi_week_start(ey + 1) - start_1) / 7
  if (any(ew > max_week)) {
    stop("The provided epiweek number is not valid for this year")
  }
  # return the start date of this epiweek/epiyear
  return(start_1 + (ew - 1) * 7)
}
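Since the parquet files in the repo are labeled "YYYY-EW", the same week arithmetic can be applied directly to those labels. The following is only a sketch under that assumption: `epiweek_label_to_date` is a hypothetical name, it uses base R only, and it treats epiweek 1 as the Sunday-to-Saturday week containing January 4.

```r
# Hypothetical helper: turn a "YYYY-EW" label (e.g. "2015-40") into the
# date of the Sunday that starts that epiweek.
# ASSUMPTION: labels are exactly 4 digits, a hyphen, and 2 digits.
epiweek_label_to_date <- function(label) {
  stopifnot(all(grepl("^[0-9]{4}-[0-9]{2}$", label)))
  ey <- as.integer(substr(label, 1, 4))
  ew <- as.integer(substr(label, 6, 7))

  # start of epiweek 1: the Sunday on or before January 4 of that year
  week1_start <- function(y) {
    j4 <- as.Date(sprintf("%d-01-04", y))
    j4 - as.integer(format(j4, "%w"))  # %w: 0 = Sunday ... 6 = Saturday
  }

  start <- week1_start(ey)
  # number of epiweeks in the epi year (52 or 53)
  n_weeks <- as.numeric(week1_start(ey + 1L) - start) / 7
  if (any(ew < 1 | ew > n_weeks)) {
    stop("The provided epiweek number is not valid for this year")
  }
  start + (ew - 1L) * 7L
}

# Usage
epiweek_label_to_date(c("2015-01", "2015-40"))
# "2015-01-04" "2015-10-04"
```

The validity check rejects a label such as "2015-53", because the 2015 epi year has only 52 weeks, while "2014-53" is accepted.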
-
Question: how does the group feel about changing model names to all lower case (or all upper case)? Similarly for team names.
Not a requirement, but it would look a bit more standard, and would help a bit as we are renaming / merging different misspellings, etc.
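As a rough sketch of what that could look like in base R: lower-case and trim the names first, then map known misspellings to one canonical spelling. The function name `normalize_name` and the example names are made up for illustration and are not taken from the FluSight data.

```r
# Hypothetical helper: lower-case and trim names, then replace known
# misspellings using a named lookup vector (misspelling -> canonical name).
normalize_name <- function(x, fixes = character()) {
  x <- tolower(trimws(x))
  hit <- x %in% names(fixes)
  x[hit] <- unname(fixes[x[hit]])
  x
}

# Made-up example: the lookup keys are already lower case, because the
# lookup is applied after lower-casing.
fixes <- c("exmaple_team" = "example_team")
cleaned <- normalize_name(c("Example_Team", " EXMAPLE_team ", "other_team"), fixes)
```

Lower-casing before the lookup keeps the misspelling table small, since case variants of the same name collapse to one entry.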
-
Definitely in favour of all lower case personally!
-
We are going to try to convert one or more archived/no-longer-active hubs to hubverse format. The first one that we will attempt to do this for is the 2015-2019 FluSight hub.