Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to Optionally get site.info if not present #3324

Open
wants to merge 59 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 43 commits
Commits
Show all changes
59 commits
Select commit Hold shift + click to select a range
cb41971
Accept site.data as function input
Sweetdevil144 Jul 3, 2024
8b82f2b
Pass site.data via args
Sweetdevil144 Jul 3, 2024
8c998fd
Read Command Line arguments
Sweetdevil144 Jul 3, 2024
7a5a848
Update str_ns declaration
Sweetdevil144 Jul 3, 2024
74365fd
Update site$lon
Sweetdevil144 Jul 3, 2024
a0ae689
Update if-else block for site.id
Sweetdevil144 Jul 4, 2024
18dd086
Add helper function to get new.site
Sweetdevil144 Jul 5, 2024
7a69113
Roxygenise the documentations
Sweetdevil144 Jul 5, 2024
604634b
Update Function Name
Sweetdevil144 Jul 5, 2024
0f154e5
Export dependency to DB rather than utils
Sweetdevil144 Jul 5, 2024
fae37a8
Update test case
Sweetdevil144 Jul 5, 2024
cb42033
Update 'exampes'
Sweetdevil144 Jul 5, 2024
5ee450a
Return null when site$id is not present
Sweetdevil144 Jul 6, 2024
e37b93b
Remove NULL calling
Sweetdevil144 Jul 6, 2024
144ce9c
Fixing 'no global binding err'
Sweetdevil144 Jul 7, 2024
54ce644
update global variable handling
Sweetdevil144 Jul 8, 2024
6d76455
Remove global binding changes
Sweetdevil144 Jul 8, 2024
ca0b50e
Update global variables
Sweetdevil144 Jul 8, 2024
8082194
Ipdate all instances of latlon
Sweetdevil144 Jul 8, 2024
65cdea7
Roxygenise documents
Sweetdevil144 Jul 8, 2024
acb8507
Usage Updates added
Sweetdevil144 Jul 8, 2024
59f451a
Doc Updates added
Sweetdevil144 Jul 8, 2024
f0ca425
Update test files
Sweetdevil144 Jul 9, 2024
e59c981
Merge branch 'PecanProject:develop' into siteID-refactor
Sweetdevil144 Jul 9, 2024
d66503f
Refactor get.new.site function to generate site IDs and handle missin…
Sweetdevil144 Jul 9, 2024
6d0b730
Update site utilizations
Sweetdevil144 Jul 9, 2024
f5fac44
Update documentations
Sweetdevil144 Jul 9, 2024
9d328eb
Refactor Errors in check
Sweetdevil144 Jul 9, 2024
45fc781
Update conn pass
Sweetdevil144 Jul 9, 2024
16a9b93
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Jul 11, 2024
6a4b131
Merge branch 'PecanProject:develop' into siteID-refactor
Sweetdevil144 Jul 13, 2024
798d09f
Merge branch 'PecanProject:develop' into siteID-refactor
Sweetdevil144 Jul 13, 2024
5ba853d
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Jul 18, 2024
e2cf4f1
Update ic_process
Sweetdevil144 Jul 18, 2024
18cc1ae
Update Logger statements
Sweetdevil144 Jul 19, 2024
3ee93a5
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Jul 25, 2024
e7be402
Merge branch 'PecanProject:develop' into siteID-refactor
Sweetdevil144 Jul 31, 2024
a314897
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Jul 31, 2024
92b9b06
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Aug 5, 2024
e76124d
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Aug 9, 2024
dff7ab8
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Aug 12, 2024
903efc9
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Aug 14, 2024
14950aa
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Aug 16, 2024
d392847
Update base/db/R/get.new.site.R
Sweetdevil144 Aug 21, 2024
0ac5b39
remove browndog support from benchmark
Sweetdevil144 Aug 28, 2024
2639e88
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Aug 28, 2024
308380e
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Aug 31, 2024
9edc268
Refactor changes from review
Sweetdevil144 Aug 31, 2024
a7c4b7d
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Sep 3, 2024
59b293e
Implement temporary hashing to generate UID
Sweetdevil144 Sep 5, 2024
b685c58
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Sep 5, 2024
65b976d
Remove unrequired changes which were accidently pushed to this branch
Sweetdevil144 Sep 5, 2024
99e4345
Refactor Dependency chart
Sweetdevil144 Sep 5, 2024
a4025a8
Using openssl for hashing lat/lon
Sweetdevil144 Sep 5, 2024
f6390eb
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Sep 15, 2024
5135e82
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Sep 21, 2024
c411a6e
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Oct 7, 2024
0009c6e
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Nov 20, 2024
9039c23
Merge branch 'develop' into siteID-refactor
Sweetdevil144 Nov 21, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions base/db/NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ export(derive.traits)
export(dplyr.count)
export(fancy_scientific)
export(get.id)
export(get.new.site)
export(get.trait.data)
export(get.trait.data.pft)
export(get_postgres_envvars)
Expand Down
153 changes: 153 additions & 0 deletions base/db/R/get.new.site.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,153 @@
#-------------------------------------------------------------------------------
# Copyright (c) 2012 University of Illinois, NCSA.
# All rights reserved. This program and the accompanying materials
# are made available under the terms of the
# University of Illinois/NCSA Open Source License
# which accompanies this distribution, and is available at
# http://opensource.ncsa.illinois.edu/license.html
#-------------------------------------------------------------------------------
Sweetdevil144 marked this conversation as resolved.
Show resolved Hide resolved

##' Get new site info using provided site information
Sweetdevil144 marked this conversation as resolved.
Show resolved Hide resolved
##'
##' @title Get New Site Info
##' @param site a dataframe with site information including id, lat, lon, and time_zone.
##' @param con Database connection object.
##' @param latlon Optional global latlon object which will store updated lat/lon.
##' @return a dataframe with new site information on lat, lon and time_zone
##' @export
##' @author Abhinav Pandey
##'
##' @examples
##' get.new.site(site=data.frame(id=1,lat=40.1,lon=-88.2,time_zone="UTC"),con=NULL,latlon=NULL)

get.new.site <- function(site, con = NULL, latlon = NULL) {
if (is.null(con)) {
PEcAn.logger::logger.debug("DB connection is closed. Trying to generate a new site ID or use pre-existing one.")
# No DB connection present. Generate a new ID using one of below steps:

if (is.null(site$id) | is.na(site$id)) {
if ((!is.null(site$lat) && !is.null(site$lon)) |
(!is.na(site$lat) && !is.na(site$lon))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird little gotcha here -- the !is.null will fire when lat and lon are NA, and the !is.na will return a missing value when lat and lon are null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want && instead of |

) {
site.id <- generate_siteID(site$lat, site$lon)
new.site <- data.frame(
id = as.numeric(site.id),
lat = site$lat,
lon = site$lon
)
str_ns <- paste0(new.site$lat, "-", new.site$lon)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to use a string other than "-" to concatenate here, to avoid confusion between separator characters and negative lon values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced current logic with a "_" sign

} # lat/lon present but ID is absent
# Use Pre-existing normalised lat/lon
else {
PEcAn.logger::logger.warn("Site dataframe does not have an id column")
site.id <- generate_new_siteID()
PEcAn.logger::logger.info(paste0("Generated siteID:", site.id))
# Optionally, create a new site data frame with the random ID
new.site <- data.frame(
id = site.id,
lat = NULL,
lon = NULL
)
str_ns <- paste0(new.site$id %/% 1e+09, "-", new.site$id %% 1e+09)
# We used a different str_ns as identifier here due to absence of lat/lon
}
# ID as well as lat/lon absent.
# Return a WARN as we will be unable to identify such an instance due to lack of information.
# We'll try to Generate a new ID similar to previous ones.
} else {
if ((!is.null(site$lat) && !is.null(site$lon)) |
(!is.na(site$lat) && !is.na(site$lon))
) {
new.site <- data.frame(
id = as.numeric(site$id),
lat = site$lat,
lon = site$lon
)
str_ns <- paste0(new.site$lat, "-", new.site$lon)
} # siteId as well as lat/lon present
else {
new.site <- data.frame(
id = site$id,
lat = NULL,
lon = NULL
)
str_ns <- paste0(new.site$id %/% 1e+09, "-", new.site$id %% 1e+09)
} # siteId present but lat/lon absent
}
} else {
# Check if site dataframe has an id column
if (is.null(site$id) | is.na(site$id)) {
PEcAn.logger::logger.warn("Site dataframe does not have an id column. Generating a unique ID")
site.id <- generate_new_siteID()
PEcAn.logger::logger.info(paste0("Generated siteID:", site.id))
if ((!is.null(site$lat) && !is.null(site$lon)) |
(!is.na(site$lat) && !is.na(site$lon))
) {
new.site <- data.frame(
id = as.numeric(site.id),
lat = site$lat,
lon = site$lon
)
str_ns <- paste0(site$lat, "-", site$lon)
} else {
new.site <- data.frame(
id = site.id,
lat = NULL,
lon = NULL
)
str_ns <- paste0(new.site$id %/% 1e+09, "-", new.site$id %% 1e+09)
# We used a different str_ns as identifier here due to absence of lat/lon
}
} else {
# setup site database number, lat, lon and name and copy for format.vars if new input
if ((!is.null(site$lat) && !is.null(site$lon)) |
(!is.na(site$lat) && !is.na(site$lon))
) {
new.site <- data.frame(
id = as.numeric(site$id),
lat = site$lat,
lon = site$lon
)
str_ns <- paste0(site$lat, "-", site$lon)
} else {
latlon <- query.site(site$id, con = con)[c("lat", "lon")]
new.site <- data.frame(
id = as.numeric(site$id),
lat = latlon$lat,
lon = latlon$lon
)
str_ns <- paste0(new.site$lat, "-", new.site$lon)
}
}
}

site.info <- list(new.site = new.site, str_ns = str_ns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q in live review: why this structure? A: to match what's expected at met.process line 165

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detailed Explanation : Current structure is followed to match data flow of what information is later on utilised in met.process for other function calls


return(site.info)
}

# Function to generate a normalized site ID using lat / lon
# Example usage
# siteID <- generate_siteID(-34.9284989, 138.6007456)
generate_siteID <- function(lat, lon) {
# Normalize latitude and longitude
norm_lat <- lat + 90 # Shift range to [0, 180]
norm_lon <- lon + 180 # Shift range to [0, 360]

# Scale normalized values (Assuming 4 digits each for lat and lon)
scaled_lat <- as.integer((norm_lat / 180) * 9999) # Scaled latitude
scaled_lon <- as.integer((norm_lon / 360) * 9999) # Scaled longitude

# Create a unique ID by concatenating scaled values
siteID <- as.integer(paste0(scaled_lat, scaled_lon))

# Return the siteID
return(siteID)
}
Sweetdevil144 marked this conversation as resolved.
Show resolved Hide resolved

# Function to generate a new siteID (db-less runs ONLY)
generate_new_siteID <- function() {
# Generate a random number. Assuming higher order integers to increase randomness in IDs
random_id <- sample(10000:99999999, 1)
return(as.numeric(random_id))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want IDs to be deterministic -- don't need this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you suggest a method to perform so? I am unable to determine one right now :) !!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One approach I suggest is as follow : Hashing to generate a unique id using lat and lon of the site. This would ensure greater precision upto 8 decimal points.

27 changes: 27 additions & 0 deletions base/db/man/get.new.site.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion base/workflow/R/do_conversions.R
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
##' @author Ryan Kelly, Rob Kooper, Betsy Cowdery, Istem Fer

do_conversions <- function(settings, overwrite.met = FALSE, overwrite.fia = FALSE, overwrite.ic = FALSE) {

if (PEcAn.settings::is.MultiSettings(settings)) {
return(PEcAn.settings::papply(settings, do_conversions))
}
Expand Down Expand Up @@ -52,7 +53,7 @@ do_conversions <- function(settings, overwrite.met = FALSE, overwrite.fia = FALS
# IC conversion : for now for ED only, hence the css/pss/site check
# <useic>TRUE</useic>
if (ic.flag) {
settings <- PEcAn.data.land::ic_process(settings, input, dir = dbfiles, overwrite = overwrite.ic)
settings <- PEcAn.data.land::ic_process(settings, input, dir = dbfiles, overwrite = overwrite.ic, site = settings$run$site)
Sweetdevil144 marked this conversation as resolved.
Show resolved Hide resolved
needsave <- TRUE
}

Expand Down
11 changes: 6 additions & 5 deletions modules/data.atmosphere/R/met.process.R
Original file line number Diff line number Diff line change
Expand Up @@ -161,11 +161,12 @@ met.process <- function(site, input_met, start_date, end_date, model,
}

# setup site database number, lat, lon and name and copy for format.vars if new input
latlon <- PEcAn.DB::query.site(site$id, con = con)[c("lat", "lon")]
new.site <- data.frame(id = as.numeric(site$id),
lat = latlon$lat,
lon = latlon$lon)
str_ns <- paste0(new.site$id %/% 1e+09, "-", new.site$id %% 1e+09)
latlon <- NULL
site.info <- PEcAn.DB::get.new.site(site, con=con, latlon = latlon)

# extract new.site and str_ns from site.info
new.site <- site.info$new.site
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hypothesis to confirm before acting: No downstream object uses new.site$id, can remove it here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new.site$id is later on passed on to download.raw.met.module for further conversions in convert_input function. I guess for that instance we will have to keep it rather than duplicating the code.

str_ns <- site.info$str_ns

if (is.null(format.vars$lat)) {
format.vars$lat <- new.site$lat
Expand Down
4 changes: 2 additions & 2 deletions modules/data.atmosphere/tests/testthat/test.met.process.R
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
test_that("`met.process` able to call .download.raw.met.module based on met process stage params", {
test_that("'met.proces' able to call .download.raw.met.module based on met process stage params", {
input_met <- list(source = 'CRUNCEP', id = '1')

mockery::stub(met.process, 'PEcAn.DB::db.open', 1)
Expand All @@ -8,7 +8,7 @@ test_that("`met.process` able to call .download.raw.met.module based on met proc
mockery::stub(met.process, 'PEcAn.DB::query.format.vars', list())
mockery::stub(met.process, 'PEcAn.DB::dbfile.check', list(id = 1))
mockery::stub(met.process, 'assign', 1)
mockery::stub(met.process, 'PEcAn.DB::query.site', list(lat = 0, lon = 0))
mockery::stub(met.process, 'PEcAn.DB::get.new.site', list(new.site = data.frame(id = 1, lat = 0, lon = 0), str_ns = "0-0"))
mockery::stub(met.process, 'met.process.stage', list(download.raw = TRUE, met2cf = FALSE, standardize = FALSE, met2model = FALSE))

mocked_res <- mockery::mock(1)
Expand Down
26 changes: 18 additions & 8 deletions modules/data.land/R/ic_process.R
Original file line number Diff line number Diff line change
Expand Up @@ -6,14 +6,14 @@
##' @param input Taken from settings$run$inputs. This should include id, path, and source
##' @param dir settings$database$dbfiles
##' @param overwrite Default = FALSE. whether to force ic_process to proceed
##' @param site Current site information
##'
##' @author Istem Fer, Hamze Dokoohaki
ic_process <- function(settings, input, dir, overwrite = FALSE){
ic_process <- function(settings, input, dir, overwrite = FALSE, site = settings$run$site){
Sweetdevil144 marked this conversation as resolved.
Show resolved Hide resolved


#--------------------------------------------------------------------------------------------------#
# Extract info from settings and setup
site <- settings$run$site
model <- list()
model$type <- settings$model$type
model$id <- settings$model$id
Expand Down Expand Up @@ -49,15 +49,25 @@ ic_process <- function(settings, input, dir, overwrite = FALSE){
con <- PEcAn.DB::db.open(dbparms$bety)
on.exit(PEcAn.DB::db.close(con), add = TRUE)

#grab site lat and lon info
latlon <- PEcAn.DB::query.site(site$id, con = con)[c("lat", "lon")]
# setup site database number, lat, lon and name and copy for format.vars if new input
new.site <- data.frame(id = as.numeric(site$id),
lat = latlon$lat,
lon = latlon$lon)
latlon <- NULL
if(is.null(site$lat) | is.null(site$lon)) {
site.info <- PEcAn.DB::get.new.site(site, con=con, latlon = latlon)

new.site$name <- settings$run$site$name
# extract new.site and str_ns from site.info
new.site <- site.info$new.site
str_ns <- site.info$str_ns
} else {
latlon <- list(lon = site$lat, lon=site$lon)
new.site <- data.frame(
id = as.numeric(site$id),
lat = site$lat,
lon = site$lon
)
str_ns <- paste0(site$lat, "-", site$lon)
}

new.site$name <- settings$run$site$name

str_ns <- paste0(new.site$id %/% 1e+09, "-", new.site$id %% 1e+09)

Expand Down
15 changes: 8 additions & 7 deletions modules/data.land/R/soil_process.R
Original file line number Diff line number Diff line change
Expand Up @@ -28,13 +28,14 @@ soil_process <- function(settings, input, dbfiles, overwrite = FALSE,run.local=T
# set up bety connection
con <- PEcAn.DB::db.open(dbparms$bety)
on.exit(PEcAn.DB::db.close(con), add = TRUE)
# get site info
latlon <- PEcAn.DB::query.site(site$id, con = con)[c("lat", "lon")]
new.site <- data.frame(id = as.numeric(site$id),
lat = latlon$lat,
lon = latlon$lon)

str_ns <- paste0(new.site$id %/% 1e+09, "-", new.site$id %% 1e+09)

# setup site database number, lat, lon and name and copy for format.vars if new input
latlon <- NULL
site.info <- PEcAn.DB::get.new.site(site, con=con, latlon = latlon)

# extract new.site and str_ns from site.info
new.site <- site.info$new.site
str_ns <- site.info$str_ns
Sweetdevil144 marked this conversation as resolved.
Show resolved Hide resolved

outfolder <- file.path(dbfiles, paste0(input$source, "_site_", str_ns))

Expand Down
4 changes: 3 additions & 1 deletion modules/data.land/man/ic_process.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading