Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static #3

Open
wants to merge 23 commits into
base: main
Choose a base branch
from
Open

Static #3

wants to merge 23 commits into from

Conversation

yonicd
Copy link
Collaborator

@yonicd yonicd commented Feb 2, 2024

Pull Request

initial version of clindata package.

todo:

  • construct api for caching
  • write function documentation
  • pass initial r cmd check locally
  • write unit tests
  • write vignettes
  • pass complete r cmd check

Copy link

github-actions bot commented Feb 2, 2024

Unit Tests Summary

11 tests   11 ✅  0s ⏱️
 2 suites   0 💤
 1 files     0 ❌

Results for commit 47c5f30.

♻️ This comment has been updated with latest results.

@yonicd
Copy link
Collaborator Author

yonicd commented Feb 6, 2024

@danielinteractive @wlandau I am still adding a few more unit tests for the api, but the simulation scripts are ready to be looked at if you have time.

I set up the target pipelines in a way that scenarios to generate various flavors of the data can be easily added by either us or other users.

The data from {mmrm} is currently being stored in inst/benchmark

Copy link
Collaborator

@danielinteractive danielinteractive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @yonicd ! cool stuff

DESCRIPTION Outdated
Version: 0.0.1
Date: 2023-11-09
Authors@R: person("Jonathan","Sidi" , , "[email protected]", role = c("aut", "cre"))
Description: Open clinical data for openpharma packages.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Description: Open clinical data for openpharma packages.
Description: Open clinical data for use in R packages.

make it more general?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about lean more into the idea that the package is a vehicle to transport reproducible simulations of clinical trial data? That has more potential to gain traction i think

Copy link

@wlandau wlandau Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about non-simulated datasets from real trials? From our original discussion that led to this package, I think that would also be valuable for scope because it removes the burden from packages like mmrm and brms.mmrm from having to prove that the data are publicly available. Also, it is helpful to test on datasets which were not explicitly generated from any model we have. We would get a better sense of robustness and flexibility, especially under slight misspecification.

Copy link
Collaborator Author

@yonicd yonicd Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the work on the package I have come to understand those weren't "real" data sets. They were realizations of simulated script. But I am to keep the option for real data if we come across it. Those datasets are being stored in the package (inst/benchmark) to allow for mmrm/brms.mmrm to depend on it.

But I think in our last conversation in the group we talked about brms.mmrm can just use a seed on the simulation script from clindata simulated set and detach from the mmrm static dataset.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue of storing real data iirc was the scalability of the size of the package. This led to the caching api.

DESCRIPTION Outdated Show resolved Hide resolved
R/api_cache.R Outdated Show resolved Hide resolved
R/api_cache.R Outdated Show resolved Hide resolved
R/api_cache.R Outdated Show resolved Hide resolved
@@ -1,188 +1,48 @@
# r.pkg.template
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommend to have a source README.Rmd and then knit that that README.md

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the documentation that I need to add for usage. I will convert to Rmd once there is something to render :)

_pkgdown.yml Show resolved Hide resolved
_targets.R Outdated Show resolved Hide resolved
expected <- "Hello, Tim"
expect_identical(result, expected)
})
testthat::describe("caching logic", {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would recommend to load the testthat package and then avoid the many prefixes with its name in the tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old habits... adding ns to every fn as a reflex

@@ -0,0 +1,35 @@
testthat::describe("data api logic", {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting why not use test_that()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is just a convention i am used to. i like it more for creating hierarchies in reporting. It lets me get a better reporting structure when tests fail

https://testthat.r-lib.org/reference/describe.html

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, good to know! looks like a nice idea

@wlandau
Copy link

wlandau commented Feb 6, 2024

Interesting approach to generating fixed datasets with targets pipelines. I wonder though, since the bcva and fev datasets are simulated and the simulation code runs quickly, would it be easier to instead expose these as functions that the user could run to generate a dataset on the fly?

It's neat that you cover MCAR and MAR in data-raw/pipelines/helpers.R

@yonicd
Copy link
Collaborator Author

yonicd commented Feb 6, 2024

@wlandau thanks. My initial intention was to make an example pipeline for people to "fork" and then add/extend their own simulation sets/designs. I thought that embedding it inside the package would create another hurdle for non pkg dev to join. Maybe the data agnostic fns can be put in the pkg?

The adv that i can see working through targets is that it enforces structure and is scalable and efficient (but you know that ;))

@wlandau
Copy link

wlandau commented Feb 6, 2024

Yeah, I think if there are functions that generate datasets and do not require any fixed clinical data to start with, then I think those would make sense to document and export as usual.

move _targets into data-raw
add _targets.yaml to accomodate non root deployment
clean up license
update targets pipelines to depend on package
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants