Checking formatting of input data frame #27

Closed
caitlinch opened this issue Dec 2, 2024 · 4 comments
@caitlinch
Collaborator

New function checkInputData() in script checkInputData.R.
New tests for checkInputData() in script checkInputData.R (see #26)

Function includes the following checks:

Warnings
  • Check whether empty rows are present
  • Check whether rows containing only NA are present
  • Check whether rows have one or more missing values

Errors
  • Check that the result column is present
  • Check that the poolSize column is present
  • Check that result column values are numeric or integer
  • Check that poolSize column values are numeric or integer
  • Check that the result column contains only 0 and 1
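
A minimal sketch of how checks like these could be laid out in R, for illustration only. This is not the package's `checkInputData()`: the function name `check_input_sketch()`, its arguments, and the message wording are all hypothetical. It also assumes that an "empty" row means a row where every value is `NA`, so it does not report empty rows and all-`NA` rows separately.

```r
# Hypothetical sketch, NOT the package implementation.
# `result` and `poolSize` are column names given as strings (an assumption).
check_input_sketch <- function(data, result, poolSize) {
  # Errors: both columns must be present
  if (!result %in% names(data)) {
    stop("Result column '", result, "' not found in data")
  }
  if (!poolSize %in% names(data)) {
    stop("Pool size column '", poolSize, "' not found in data")
  }
  # Errors: is.numeric() is TRUE for both integer and double columns
  if (!is.numeric(data[[result]])) {
    stop("Result column must be numeric or integer")
  }
  if (!is.numeric(data[[poolSize]])) {
    stop("Pool size column must be numeric or integer")
  }
  # Error: non-missing results must be 0 or 1
  observed <- data[[result]][!is.na(data[[result]])]
  if (!all(observed %in% c(0, 1))) {
    stop("Result column must contain only 0 and 1")
  }
  # Warnings: rows that are entirely NA, and rows with some missing values
  n_missing <- rowSums(is.na(data))
  all_na <- n_missing == ncol(data)
  some_na <- n_missing > 0 & !all_na
  if (any(all_na)) {
    warning(sum(all_na), " row(s) contain only NA")
  }
  if (any(some_na)) {
    warning(sum(some_na), " row(s) have one or more missing values")
  }
  invisible(data)
}
```

Problems that make the data unusable stop execution, while missing-data problems only warn and hand the data back unchanged.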

@caitlinch caitlinch self-assigned this Dec 2, 2024
@caitlinch caitlinch added the tests label Dec 2, 2024
@caitlinch
Collaborator Author

For hierarchical data, we want to check that the Site column has unique values.
This is difficult: we can't just check for repeated names, because even if the Site column has a unique name for each location, there could be multiple pools (and so multiple rows) for that location.
Check how Angus handles this within PoolTestR.

@caitlinch
Collaborator Author

caitlinch commented Dec 3, 2024

Managing hierarchical data:

  • Update doco - users should enter hierarchical variables in order of largest to smallest
  • Add check to test for uniqueness in hierarchical variables (each Site should only appear in one Village, each Village should only appear in one Region)
    • Should work no matter how many columns are input
    • Check if something similar is implemented in PoolTools
  • Add new helper function to create unique identifiers for each location, given the full hierarchical sampling scheme
  • Document helper function
  • Add tests for helper function
  • Add examples for helper function
  • Export helper function (make it available to users)
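
One way the nesting check and the identifier helper could look, as a rough sketch only. The names `check_nesting_sketch()` and `make_unique_ids_sketch()`, the `sep` argument, and the example data are hypothetical and are not the functions in the package. The sketch assumes the hierarchy is passed as a character vector of column names ordered largest to smallest, and works for any number of levels (two or more).

```r
# Hypothetical sketch, NOT the package implementation.
# For each level below the top, report whether every label at that level
# appears under exactly one label of the level directly above it.
check_nesting_sketch <- function(data, hierarchy) {
  stopifnot(length(hierarchy) >= 2, all(hierarchy %in% names(data)))
  child_idx <- seq_along(hierarchy)[-1]
  nested <- vapply(child_idx, function(i) {
    # Unique (parent, child) pairs: repeated rows for the same location
    # (multiple pools) collapse to one pair, so they are not flagged.
    pairs <- unique(data[, hierarchy[c(i - 1, i)], drop = FALSE])
    # A child label that is still duplicated sits under more than one parent.
    anyDuplicated(pairs[[2]]) == 0
  }, logical(1))
  names(nested) <- hierarchy[child_idx]
  nested
}

# Build unique identifiers by pasting each level onto all levels above it.
make_unique_ids_sketch <- function(data, hierarchy, sep = "_") {
  stopifnot(all(hierarchy %in% names(data)))
  out <- data
  for (i in seq_along(hierarchy)) {
    upper <- unname(as.list(data[hierarchy[seq_len(i)]]))
    out[[hierarchy[i]]] <- do.call(paste, c(upper, sep = sep))
  }
  out
}

# Made-up example: village "V1" is reused in two regions, and site "S1"
# has two pools in the North region.
pools <- data.frame(
  Region  = c("North", "North", "South", "South"),
  Village = c("V1", "V1", "V1", "V2"),
  Site    = c("S1", "S1", "S1", "S2"),
  Result  = c(0, 1, 0, 1)
)
hierarchy <- c("Region", "Village", "Site")

before <- check_nesting_sketch(pools, hierarchy)
fixed <- make_unique_ids_sketch(pools, hierarchy)
after <- check_nesting_sketch(fixed, hierarchy)
```

In this example `before` flags the Village level because "V1" appears in both regions, while the two pools at the same North site are not treated as a problem. After pasting the levels together, every level in `fixed` passes the check.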

@caitlinch
Collaborator Author

Internal functions to check data: CheckInputData(), CheckClusterVars()
Exported function for users to check data with hierarchical/clustered sampling scheme: PrepareClusterData()

@caitlinch
Collaborator Author

See file tests/testthat/test-CheckInputData.R on test-suite branch for completed tests.
