The software package currently deals with tabular data only. However, one important aspect has not yet been dealt with: categorical variables.
To improve this:
We need to add detection of categorical variables in the features, covariates and factors files.
Apply correct handling of these variables. A commonly used strategy is conversion to one-hot encoding.
For age modelling, we should ensure that these variables are treated appropriately by the scaler.
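A minimal sketch of the detection and encoding steps above, assuming the features, covariates and factors are loaded as pandas DataFrames indexed by subject. The rule that non-numeric and boolean columns count as categorical is an assumption (integer-coded categories, such as sex stored as 0/1, would need to be declared explicitly), and the function names `detect_categorical` and `one_hot_and_scale` are hypothetical. The point it illustrates is that only the continuous columns go through the scaler, while the one-hot columns are left as 0/1.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def detect_categorical(df: pd.DataFrame) -> list[str]:
    """Assumed rule: non-numeric columns (strings, categories) and booleans are categorical."""
    return [
        col for col in df.columns
        if is_bool_dtype(df[col]) or not is_numeric_dtype(df[col])
    ]


def one_hot_and_scale(df: pd.DataFrame, categorical: list[str]) -> pd.DataFrame:
    """One-hot encode the categorical columns and z-score only the continuous ones."""
    continuous = [c for c in df.columns if c not in categorical]
    numeric = df[continuous].astype(float)
    # Standardise continuous columns (population std); dummy columns are not scaled.
    scaled = (numeric - numeric.mean()) / numeric.std(ddof=0)
    # One 0/1 column per observed category, e.g. sex -> sex_F, sex_M.
    dummies = pd.get_dummies(df[categorical], columns=categorical, dtype=float)
    return pd.concat([scaled, dummies], axis=1)
```

A constant continuous column would have zero standard deviation and produce NaN after scaling, so a real implementation would need to guard against that case.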
Another aspect of data handling is data imputation. Currently, any subject with missing data in any of the submitted files is discarded. However, some basic imputation strategies could be implemented.
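As an example of a basic imputation strategy, the sketch below fills numeric columns with their median and categorical columns with their most frequent value. It assumes pandas DataFrames; `simple_impute` is a hypothetical name, and median/mode filling is just one simple choice, not a decided design.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def simple_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Fill NaNs with the column median (numeric) or the most frequent value (other)."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        missing = out[col].isna()
        # Nothing to fill, or nothing to estimate a fill value from.
        if not missing.any() or missing.all():
            continue
        if is_numeric_dtype(out[col]):
            fill = out[col].median()
        else:
            fill = out[col].mode(dropna=True).iloc[0]
        out[col] = out[col].fillna(fill)
    return out
```

If imputation is adopted for age modelling, the fill values should be computed on the training subjects only and then applied to the test subjects, so that no information leaks across the split.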
When multiple systems are specified, we should also handle missing data per system: if a subject has missing data for one system but not for another, that subject should only be removed when fitting the age model of the system with missing data.
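A sketch of per-system handling of missing data, assuming a single features DataFrame indexed by subject and a hypothetical mapping from each system name to its feature columns. Each system keeps every subject that is complete for that system's own columns, so a subject missing a blood marker still contributes to the liver model.

```python
import pandas as pd


def split_by_system(
    features: pd.DataFrame, systems: dict[str, list[str]]
) -> dict[str, pd.DataFrame]:
    """One table per system; a subject is dropped only if it has NaNs in that system's columns."""
    per_system = {}
    for name, columns in systems.items():
        per_system[name] = features[columns].dropna(how="any")
    return per_system
```

The system names and column lists used with it are illustrative; whatever structure the package uses to map systems to features would take their place.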
We have also found a new bug. If you upload a .csv file whose index is not numeric, an error is thrown. We should test and fix this so that files whose first column is named subject, with values sub001, sub002, sub003, ..., work. Otherwise, we should specify that files must have a column called ID (this will cause fewer problems), and when loading a .csv the ID column should be made the index. Either way, we should ensure that the indices can be random numbers or alphanumeric values.
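A sketch of the loading convention proposed above, assuming the ID column is required and read as text so that identifiers like `sub001`, random numbers, or zero-padded values such as `007` all survive unchanged as the index. The function name `load_table` and the duplicate-ID check are assumptions; passing `id_col="subject"` would cover files that use a subject column instead.

```python
import pandas as pd


def load_table(path: str, id_col: str = "ID") -> pd.DataFrame:
    """Load a .csv and use the (required) ID column, read as strings, as the index."""
    # Read only the header first so a missing ID column gives a clear error.
    header = pd.read_csv(path, nrows=0).columns
    if id_col not in header:
        raise ValueError(f"expected a column named {id_col!r} with subject identifiers")
    # dtype=str keeps values such as 'sub001' or '007' exactly as written.
    df = pd.read_csv(path, dtype={id_col: str}).set_index(id_col)
    if df.index.has_duplicates:
        raise ValueError(f"duplicate values found in {id_col!r}")
    return df
```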
When looking at clinical factors, we should not remove every subject that has a NaN in any factor. In many studies, some subjects have taken some tests and other subjects have taken others, so this drastically reduces the number of subjects. I would go for an approach where we report the number of subjects used for each factor but keep as many subjects as possible. Imputation would not be a good strategy here.
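One way to report per-factor sample sizes without imputation, sketched under the assumption that the age model produces a per-subject value (called `age_gap` here, a hypothetical name) and that the clinical factors are a DataFrame indexed by subject ID. Each factor is paired with that value separately, so a missing value in one factor does not remove the subject from the analysis of the others.

```python
import pandas as pd


def per_factor_subsets(age_gap: pd.Series, factors: pd.DataFrame):
    """Pair age_gap with each factor separately and count the subjects kept per factor."""
    subsets: dict[str, pd.DataFrame] = {}
    counts: dict[str, int] = {}
    for factor in factors.columns:
        # Align on subject ID, then drop rows missing either value (no imputation).
        pair = pd.concat(
            [age_gap.rename("age_gap"), factors[factor]], axis=1, join="inner"
        ).dropna()
        subsets[factor] = pair
        counts[factor] = len(pair)
    return subsets, counts
```

The `counts` dictionary is what would be reported alongside each factor's result.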