I'm wondering if you've considered allowing the user to pass in separate training sets for the xgboost model and the survival model?
For example, in XGBSEStackedWeibull, the current state is this:
1. Train xgboost on X_train, y_train
2. Predict back on X_train using model from (1), resulting in risk scores
3. Train Weibull AFT model with risk scores from (2) and y_train
I'm proposing this:
1. Train xgboost on X_train, y_train
2. Predict risk scores of X_train_2 using model from (1)
3. Train Weibull AFT model using risk scores from (2) and y_train_2
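The proposed steps can be sketched independently of any library. In this minimal sketch, `fit_stacked`, `fit_slope`, and `fit_scale` are hypothetical names that are not part of xgbse, and the two toy fitters only stand in for xgboost and the Weibull AFT model:

```python
# Hypothetical helper, NOT part of xgbse: names and signatures are assumptions.
def fit_stacked(fit_first_stage, fit_second_stage,
                X_train, y_train, X_train_2, y_train_2):
    """Fit stage 1 on one dataset and stage 2 on risk scores of a second dataset."""
    first_stage = fit_first_stage(X_train, y_train)          # step 1
    risk_scores = [first_stage(x) for x in X_train_2]        # step 2: second dataset only
    second_stage = fit_second_stage(risk_scores, y_train_2)  # step 3
    return first_stage, second_stage


# Toy stand-ins (assumptions): least-squares slopes through the origin.
def fit_slope(X, y):
    slope = sum(x * t for x, t in zip(X, y)) / sum(x * x for x in X)
    return lambda x: slope * x


def fit_scale(scores, y):
    scale = sum(s * t for s, t in zip(scores, y)) / sum(s * s for s in scores)
    return lambda s: scale * s


X_train, y_train = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
X_train_2, y_train_2 = [4.0, 5.0], [4.0, 5.0]

first_stage, second_stage = fit_stacked(
    fit_slope, fit_scale, X_train, y_train, X_train_2, y_train_2
)
prediction = second_stage(first_stage(3.0))
```

The only difference from the current behaviour is which rows feed steps 2 and 3; passing the same data twice would reproduce the existing workflow.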
The rationale for using different datasets for the two models is that it reduces the chance of overfitting. I've found that the risk scores from step 2 indicate a tighter relationship between risk score and y_train than actually exists, because we are predicting back on the same dataset the xgboost model was trained on (and then relating those scores back to the original outcome variable, y_train).
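A toy illustration of that effect, assuming an extreme case: the first stage here is a 1-nearest-neighbour memorizer (a stand-in for an overfit model, not xgboost itself), and the outcome is pure noise, so any relationship between score and outcome is spurious. The names `pearson` and `fit_memorizer` are made up for this sketch.

```python
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def fit_memorizer(X, y):
    """Toy 1-nearest-neighbour scorer: returns the outcome of the closest training row."""
    pairs = list(zip(X, y))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


rng = random.Random(0)
n = 500
X_train = [rng.random() for _ in range(n)]
y_train = [rng.random() for _ in range(n)]    # noise: unrelated to X_train
X_train_2 = [rng.random() for _ in range(n)]
y_train_2 = [rng.random() for _ in range(n)]  # noise: unrelated to X_train_2

scorer = fit_memorizer(X_train, y_train)

# Scoring the training rows: each row finds itself, so the score equals its outcome.
in_sample_corr = pearson([scorer(x) for x in X_train], y_train)
# Scoring the second dataset: the scores carry no information about y_train_2.
held_out_corr = pearson([scorer(x) for x in X_train_2], y_train_2)

print(f"in-sample: {in_sample_corr:.2f}, held-out: {held_out_corr:.2f}")
```

The in-sample correlation comes out essentially perfect even though there is no signal, while the correlation on the second dataset stays close to zero. A second-stage model fit on the in-sample scores would inherit that inflated relationship.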
Thanks for the awesome package
Thanks for the suggestion, @crew102.
We are currently working on a way to replace the first-step xgboost model with a pre-trained one.
Both XGBSEDebiasedBCE and XGBSEStackedWeibull modules will be able to use this feature, which will cover your use case.