Commit

Merge branch 'develop' into siteID-refactor

Sweetdevil144 authored Aug 16, 2024
2 parents 903efc9 + bb2cda9 commit 14950aa
Showing 3 changed files with 200 additions and 67 deletions.
45 changes: 45 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,45 @@
# Contributor Covenant Code of Conduct

**Our Pledge**

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.

**Our Standards**

Examples of behavior that contributes to creating a positive environment include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting



**Our Responsibilities**

Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.

**Scope**

This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.

**Enforcement**

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at pecanproj[at]gmail.com. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.

**Attribution**

This Code of Conduct is adapted from the [Contributor Covenant](http://contributor-covenant.org/) version 1.4, available at [http://contributor-covenant.org/version/1/4](http://contributor-covenant.org/version/1/4/).
15 changes: 7 additions & 8 deletions DEV-INTRO.md
@@ -78,7 +78,6 @@ You can copy the [`docker/env.example`](docker/env.example) file as .env in your
cp docker/env.example .env
```


The variables we want to modify are:

- `COMPOSE_PROJECT_NAME`, the prefix for all containers. Set this to "pecan".
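
A minimal sketch of the corresponding `.env` line (other variables from [`docker/env.example`](docker/env.example) follow the same `KEY=value` pattern):

```bash
# .env -- sketch; only the variable named above is shown
COMPOSE_PROJECT_NAME=pecan
```
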
@@ -181,13 +180,13 @@ Next copy the R packages from a container to volume `pecan_lib`. This is not rea

You can copy all the data using the following command. This will copy all compiled packages to your local machine.

```
```bash
docker run -ti --rm -v pecan_R_library:/rlib pecan/base:develop cp -a /usr/local/lib/R/site-library/. /rlib/
```

If you have set a custom UID or GID in your `.env`, change ownership of these files as described above for the data volume. E.g. if you use the same UID in the containers as on your host machine, run:

```
```bash
docker run -ti --rm -v pecan_R_library:/rlib pecan/base:develop chown -R "$(id -u):$(id -g)" /rlib/
```

@@ -210,7 +209,7 @@ For Windows
copy docker\web\config.docker.php web\config.php
```

## PEcAn Development
## PEcAn Development Setup

To begin development we first have to bring up the full PEcAn stack. This assumes you have already completed the steps above. You don't need to stop any running containers; you can use the following command to start all containers. At this point you have PEcAn running in Docker.
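
A minimal sketch of that startup command, assuming the compose file at the repository root and the `.env` created earlier (so `COMPOSE_PROJECT_NAME` is picked up automatically):

```bash
# Start all PEcAn containers in the background;
# the project name is read from COMPOSE_PROJECT_NAME in .env
docker compose up -d
```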

@@ -239,13 +238,13 @@ R CMD ../web/workflow.R --settings docker.sipnet.xml

A better way of doing this was developed as part of GSoC: you can leverage the RESTful interface it defines, or use the new R PEcAn API package.

# PEcAn URLs
## PEcAn URLs

You can check the RabbitMQ server used by PEcAn at <https://rabbitmq.pecan.localhost> on the same server that the docker stack is running on. You can use RStudio either at <http://server/rstudio> or at <http://rstudio.pecan.localhost>. To check the traefik dashboard you can use <http://traefik.pecan.localhost>.

If the stack is running on a remote machine, you can use ssh port forwarding to connect to the server. For example, `ssh -L 8000:localhost:80` will allow you to use <http://rabbitmq.pecan.localhost:8000/> in your browser to connect to RabbitMQ on the remote PEcAn server.
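
A complete form of that command might look as follows; `user@remote-host` is a placeholder for your own server:

```bash
# Forward local port 8000 to port 80 on the remote docker host
ssh -L 8000:localhost:80 user@remote-host
```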

# Directory Structure
## Directory Structure

Following are the main folders inside the pecan repository.

@@ -281,9 +280,9 @@ Some of the docker build files. The Dockerfiles for each model are placed in the

Small scripts that are used as part of the development and installation of PEcAn.

# Advanced Development Options
## Advanced Development Options

## Reset all containers/database
### Reset all containers/database

If you want to start from scratch and remove all old data, but keep your pecan checked-out folder, you can remove the folders where you have written the data (see `folders` below). You will also need to remove any of the docker-managed volumes. To see all volumes you can do `docker volume ls -q -f name=pecan`. If you are sure, you can either remove them one by one, or remove them all at once using the command below. **THIS DESTROYS ALL DATA IN DOCKER MANAGED VOLUMES.**
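
A one-line sketch of that bulk removal, built from the `docker volume ls -q -f name=pecan` filter above:

```bash
# DESTRUCTIVE: removes every docker-managed volume whose name matches "pecan"
docker volume rm $(docker volume ls -q -f name=pecan)
```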

207 changes: 148 additions & 59 deletions modules/assim.sequential/R/downscale_function.R
@@ -62,6 +62,38 @@ SDA_downscale_preprocess <- function(data_path, coords_path, date, carbon_pool)
return(list(input_data = input_data, site_coordinates = site_coordinates, carbon_data = carbon_data))
}

##' @title Create folds function
##' @name create_folds
##' @author Sambhav Dixit
##'
##' @param y Vector. A vector of outcome data or indices.
##' @param k Numeric. The number of folds to create.
##' @param list Logical. If TRUE, returns a list of fold indices. If FALSE, returns a vector.
##' @param returnTrain Logical. If TRUE, returns indices for training sets. If FALSE, returns indices for test sets.
##' @details This function creates k-fold indices for cross-validation. It can return either training or test set indices, and the output can be in list or vector format.
##'
##' @description This function generates k-fold indices for cross-validation, allowing for flexible output formats.
##'
##' @return A list of k elements (if list = TRUE), each containing indices for a fold, or a vector of indices (if list = FALSE).

create_folds <- function(y, k, list = TRUE, returnTrain = FALSE) {
n <- length(y)
indices <- seq_len(n)
folds <- split(indices, cut(seq_len(n), breaks = k, labels = FALSE))

if (returnTrain) {
folds <- lapply(folds, function(x) indices[-x]) # return training indices instead
} # otherwise folds already hold the test indices

if (!list) {
folds <- unlist(folds)
}

return(folds)
}
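
A quick, hypothetical illustration of what `create_folds` returns for ten observations and five folds:

```r
folds <- create_folds(y = seq_len(10), k = 5, list = TRUE, returnTrain = FALSE)
str(folds)
#> List of 5
#>  $ 1: int [1:2] 1 2
#>  $ 2: int [1:2] 3 4
#>  $ 3: int [1:2] 5 6
#>  $ 4: int [1:2] 7 8
#>  $ 5: int [1:2] 9 10
```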

##' @title SDA Downscale Function
##' @name SDA_downscale
##' @author Joshua Ploshay, Sambhav Dixit
@@ -140,84 +172,141 @@ SDA_downscale <- function(preprocessed, date, carbon_pool, covariates, model_typ
predictions[[i]] <- stats::predict(models[[i]], test_data)
}
} else if (model_type == "cnn") {
# Number of cross-validation folds and of bagged models per fold;
# k_folds * num_bags CNN models are trained for each ensemble member
k_folds <- 5
num_bags <- 5

# Reshape input data for CNN
x_train <- keras3::array_reshape(x_train, c(nrow(x_train), 1, ncol(x_train)))
x_test <- keras3::array_reshape(x_test, c(nrow(x_test), 1, ncol(x_test)))

for (i in seq_along(carbon_data)) {
# Define the CNN model architecture
# Used dual batch normalization and dropout as the first set of batch normalization and dropout operates on the lower-level features extracted by the convolutional layer, the second set works on the higher-level features learned by the dense layer.
model <- keras3::keras_model_sequential() |>
# 1D Convolutional layer: Extracts local features from input data
keras3::layer_conv_1d(filters = 64, kernel_size = 1, activation = 'relu', input_shape = c(1, length(covariate_names))) |>
# Batch normalization: Normalizes layer inputs, stabilizes learning, reduces internal covariate shift
keras3::layer_batch_normalization() |>
# Dropout: Randomly sets some of inputs to 0, reducing overfitting and improving generalization
keras3::layer_dropout(rate = 0.3) |>
# Flatten: Converts 3D output to 1D for dense layer input
keras3::layer_flatten() |>
# Dense layer: Learns complex combinations of features
keras3::layer_dense(units = 64, activation = 'relu') |>
# Second batch normalization: Further stabilizes learning in deeper layers
keras3::layer_batch_normalization() |>
# Second dropout: Additional regularization to prevent overfitting in final layers
keras3::layer_dropout(rate = 0.3) |>
# Output layer: Single neuron for regression prediction
keras3::layer_dense(units = 1)
all_models <- list()

# Learning rate scheduler
lr_schedule <- keras3::learning_rate_schedule_exponential_decay(
initial_learning_rate = 0.001,
decay_steps = 1000,
decay_rate = 0.9
)
# Create k-fold indices
fold_indices <- create_folds(y = seq_len(nrow(x_train)), k = k_folds, list = TRUE, returnTrain = FALSE)

# Compile the model
model |> keras3::compile(
loss = 'mean_squared_error',
optimizer = keras3::optimizer_adam(learning_rate = lr_schedule),
metrics = c('mean_absolute_error')
)

# Early stopping callback
early_stopping <- keras3::callback_early_stopping(
monitor = 'val_loss',
patience = 10,
restore_best_weights = TRUE
)
# Initialise operations for each fold
for (fold in 1:k_folds) {
cat(sprintf("Processing ensemble %d, fold %d of %d\n", i, fold, k_folds))

# Split data into training and validation sets for this fold
train_indices <- setdiff(seq_len(nrow(x_train)), fold_indices[[fold]])
val_indices <- fold_indices[[fold]]

x_train_fold <- x_train[train_indices, , drop = FALSE]
y_train_fold <- y_train[train_indices, i]
x_val_fold <- x_train[val_indices, , drop = FALSE]
y_val_fold <- y_train[val_indices, i]

# Create bagged models for this fold
fold_models <- list()
for (bag in 1:num_bags) {
# Create bootstrap sample
bootstrap_indices <- sample(1:nrow(x_train_fold), size = nrow(x_train_fold), replace = TRUE)
x_train_bag <- x_train_fold[bootstrap_indices, ]
y_train_bag <- y_train_fold[bootstrap_indices]

# Define the CNN model architecture
# Dual batch normalization and dropout are used: the first set operates on the lower-level features extracted by the convolutional layer, the second on the higher-level features learned by the dense layers.
model <- keras3::keras_model_sequential() |>
# Reshape: recast each row of covariates as a (features, 1, 1) "image" for the convolutional layer
keras3::layer_reshape(target_shape = c(ncol(x_train), 1, 1), input_shape = ncol(x_train)) |>
# 2D convolutional layer with a (3, 1) kernel, effectively a 1D convolution extracting local features across covariates
keras3::layer_conv_2d(
filters = 32,
kernel_size = c(3, 1),
activation = 'relu',
padding = 'same'
) |>
# Flatten: Converts 3D output to 1D for dense layer input
keras3::layer_flatten() |>
# Dense layer: Learns complex combinations of features
keras3::layer_dense(
units = 64,
activation = 'relu',
kernel_regularizer = keras3::regularizer_l2(0.01)
) |>
# Batch normalization: Normalizes layer inputs, stabilizes learning, reduces internal covariate shift
keras3::layer_batch_normalization() |>
# Dropout: Randomly sets some of inputs to 0, reducing overfitting and improving generalization
keras3::layer_dropout(rate = 0.3) |>
# Dense layer: Learns complex combinations of features
keras3::layer_dense(
units = 32,
activation = 'relu',
kernel_regularizer = keras3::regularizer_l2(0.01)
) |>
# Batch normalization: Further stabilizes learning in deeper layers
keras3::layer_batch_normalization() |>
# Dropout: Additional regularization to prevent overfitting in final layer
keras3::layer_dropout(rate = 0.3) |>
# Output layer: Single neuron for regression prediction
keras3::layer_dense(
units = 1,
kernel_regularizer = keras3::regularizer_l2(0.01)
)

# Learning rate scheduler
lr_schedule <- keras3::learning_rate_schedule_exponential_decay(
initial_learning_rate = 0.001,
decay_steps = 1000,
decay_rate = 0.9
)

# Early stopping callback; monitors training loss, since the bagged fits below use no validation split
early_stopping <- keras3::callback_early_stopping(
monitor = 'loss',
patience = 10,
restore_best_weights = TRUE
)

# Train the model
model |> keras3::fit(
x = x_train,
y = y_train[, i],
epochs = 500, # Increased max epochs
batch_size = 32,
validation_split = 0.2,
callbacks = list(early_stopping),
verbose = 0
)
# Compile the model
model |> keras3::compile(
loss = 'mean_squared_error',
optimizer = keras3::optimizer_adam(learning_rate = lr_schedule),
metrics = c('mean_absolute_error')
)

# Store the trained model
models[[i]] <- model
# Train the model
model |> keras3::fit(
x = x_train_bag,
y = y_train_bag,
epochs = 500,
batch_size = 32,
callbacks = list(early_stopping),
verbose = 0
)

# CNN predictions
cnn_predict <- function(model, newdata, scaling_params) {
# Store the trained model for this bag in the fold_models list
fold_models[[bag]] <- model
}

# Add fold models to all_models list
all_models <- c(all_models, fold_models)
}

# Store all models for this ensemble
models[[i]] <- all_models

# Use all models for predictions
cnn_ensemble_predict <- function(models, newdata, scaling_params) {
newdata <- scale(newdata, center = scaling_params$mean, scale = scaling_params$sd)
newdata <- keras3::array_reshape(newdata, c(nrow(newdata), 1, ncol(newdata)))
predictions <- stats::predict(model, newdata)
return(as.vector(predictions))
predictions <- sapply(models, function(m) stats::predict(m, newdata))
return(rowMeans(predictions))
}
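
# Note: cnn_ensemble_predict averages the point predictions of all
# k_folds * num_bags (here 5 * 5 = 25) models via rowMeans(), i.e. a simple
# unweighted bagging ensemble over the fold/bootstrap models.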

# Create a prediction raster from covariates
prediction_rast <- terra::rast(covariates)

# Generate spatial predictions using the trained model ensemble
maps[[i]] <- terra::predict(prediction_rast, model = models[[i]],
fun = cnn_predict,
fun = cnn_ensemble_predict,
scaling_params = scaling_params)

# Make predictions on held-out test data
predictions[[i]] <- cnn_predict(models[[i]], x_data[-sample, ], scaling_params)
predictions[[i]] <- cnn_ensemble_predict(models[[i]], x_data[-sample, ], scaling_params)

}
} else {
stop("Invalid model_type. Please choose either 'rf' for Random Forest or 'cnn' for Convolutional Neural Network.")
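
For orientation, a hypothetical end-to-end call of the two functions this file defines; all paths and argument values below are illustrative placeholders:

```r
# Hypothetical usage sketch: paths, date, and pool name are placeholders
preprocessed <- SDA_downscale_preprocess(
  data_path = "path/to/sda_output.Rdata",
  coords_path = "path/to/site_coordinates.csv",
  date = "2021-07-15",
  carbon_pool = "AbvGrndWood"
)
result <- SDA_downscale(
  preprocessed,
  date = "2021-07-15",
  carbon_pool = "AbvGrndWood",
  covariates = terra::rast("path/to/covariates.tif"),
  model_type = "cnn"  # or "rf" for the random-forest branch
)
```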
