R inference error in calculating likelihood when data is missing/NA #286
Labels: bug, high priority, r-inference
Describe the bug
Inconsistent/unwanted behaviour when there are NAs in the ground truth fitting data. Currently in classical R inference, in `inference_slot.R` (~line 270), the observations/fitting data are read in and all NAs are replaced with 0. This is not wanted behaviour: we want to keep the NAs, to be dealt with later when we aggregate to the time period we require (e.g. to a week).
There is also a downstream issue if ALL values are NA for a given subpopulation-outcome combination (e.g. in the Disparities round there is no Latino population in North Carolina, so all ground truth values are NA). Calculating the likelihood then fails with an error.
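To make the difference concrete, here is a minimal base R sketch of keeping NAs through a weekly aggregation instead of replacing them with 0 up front. It is not the code in `inference_slot.R`; the data frame `gt`, the column `incidH`, the subpopulation names and the helper `sum_or_na` are all made up for illustration.

```r
# Hypothetical daily ground truth for two subpopulations (illustrative names only).
gt <- data.frame(
  date   = rep(seq(as.Date("2021-01-04"), by = "day", length.out = 14), times = 2),
  subpop = rep(c("A", "B"), each = 14),
  incidH = c(1, NA, 2, 0, 3, 1, 1, rep(NA, 7),  # subpop A: partly missing
             rep(NA, 14))                        # subpop B: entirely missing
)

# Sum that stays NA when every value in the window is missing,
# rather than silently turning "no data" into 0.
sum_or_na <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)

# Week start (Monday) for each date.
gt$week <- as.Date(cut(gt$date, "week"))

# Aggregate to weeks; NAs are passed through to sum_or_na.
weekly <- aggregate(gt["incidH"], by = gt[c("subpop", "week")], FUN = sum_or_na)

# Flag subpopulations whose ground truth is entirely missing.
all_missing <- tapply(is.na(gt$incidH), gt$subpop, all)

print(weekly)
print(all_missing)
```

With this approach a week with some data gets a partial sum, while a week (or a whole subpopulation) with no data stays NA and can be handled explicitly downstream. Whether a partially observed week should be summed or also treated as NA is a separate choice that this sketch does not settle.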
To Reproduce
Using ground truth data file from Disparities round
This gives a statistic
But this needs to be compared to a simulation that presumably has values for each date, so (1) the lengths of the variables do not match when calculating the likelihood, and (2) if the simulation were all NAs to reflect the data, the likelihood cannot be computed.
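One possible way around both problems, sketched here only as an option and not as the agreed behaviour, is to align observations to the simulation dates and score only the dates that have a non-missing observation. The function `loglik_non_missing`, the Poisson choice and the example values are assumptions for illustration; this is not the `logLikStat` implementation.

```r
# Hypothetical helper: Poisson log-likelihood over non-missing observations only.
loglik_non_missing <- function(obs, sim) {
  stopifnot(length(obs) == length(sim))
  keep <- !is.na(obs)
  # No usable observations: return NA explicitly instead of erroring.
  if (!any(keep)) return(NA_real_)
  sum(dpois(round(obs[keep]), lambda = sim[keep], log = TRUE))
}

# Illustrative weekly data: the simulation covers 4 weeks, the observations 3,
# and one observed week is NA.
obs_df <- data.frame(date = as.Date("2021-01-04") + 7 * 0:2, obs = c(8, NA, 5))
sim_df <- data.frame(date = as.Date("2021-01-04") + 7 * 0:3, sim = c(7.5, 6, 4.2, 3))

# Align on the simulation dates so both vectors have the same length;
# dates without an observation get obs = NA.
both <- merge(sim_df, obs_df, by = "date", all.x = TRUE)

ll_partial <- loglik_non_missing(both$obs, both$sim)
ll_all_na  <- loglik_non_missing(rep(NA_real_, nrow(both)), both$sim)

print(ll_partial)
print(ll_all_na)
```

Returning NA for an all-NA subpopulation-outcome combination makes the case visible to the caller. Returning 0 (no contribution to the total log-likelihood) would be the alternative if such combinations should simply be ignored during fitting; which of the two is wanted is the open question in this issue.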
Expected behavior
I am not entirely sure what behaviour we want here.
Currently the workflow is:
remove_na: TRUE)
Re: 3 - there is an error in the logic here, I think (?)
In the example above (code from the `logLikStat` function in the R `inference` package): if `add_one = TRUE`, then [...] and there is an error in calculating `rc`. If `add_one = FALSE`, then `eval` is a vector of 1's and we end up with `rc` being a vector of NAs, but this will give us a likelihood of NA anyway.
I'm lost about what exactly should happen here instead. 🫠 Will think about this more but adding it here for the moment.