
Ability to pass separate training data sets in for xgboost model vs survival model #56

Open · crew102 opened this issue Jul 28, 2022 · 1 comment · May be fixed by #60
Labels: enhancement (New feature or request), next minor release

Comments


crew102 commented Jul 28, 2022

I'm wondering if you've considered allowing the user to pass in separate training sets for the xgboost model vs. the survival model?

For example, in XGBSEStackedWeibull, the current state is this:

  1. Train xgboost on X_train, y_train
  2. Predict back on X_train using model from (1), resulting in risk scores
  3. Train Weibull AFT model with risk scores from (2) and y_train
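For reference, fitting and predicting with the released API looks roughly like this (a minimal sketch; `df_train`, `X_train`, and `X_test` are placeholder names):

```python
from xgbse import XGBSEStackedWeibull
from xgbse.converters import convert_to_structured

# structured array of (event, time) pairs expected by xgbse
y_train = convert_to_structured(df_train["duration"], df_train["event"])

# fit() runs steps 1-3 above on the same X_train/y_train: both the
# xgboost model and the Weibull AFT model see this one dataset
model = XGBSEStackedWeibull()
model.fit(X_train, y_train)

# survival curves for new observations
preds = model.predict(X_test)
```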

I'm proposing this:

  1. Train xgboost on X_train, y_train
  2. Predict risk scores of X_train_2 using model from (1)
  3. Train Weibull AFT model using risk scores from (2) and y_train_2
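In the meantime, the proposed flow can be approximated outside of xgbse by stacking the two models by hand. A rough sketch (the `survival:aft` objective mirrors what xgbse uses for step 1, though its exact internal setup may differ; `X_train_2`, `t_train_2`, and `e_train_2` are placeholders for the second dataset's features, durations, and event indicators):

```python
import numpy as np
import pandas as pd
import xgboost as xgb
from lifelines import WeibullAFTFitter

# 1. Train xgboost on the first dataset with an AFT survival objective;
#    right-censored rows get an upper bound of +inf
dtrain = xgb.DMatrix(X_train)
dtrain.set_float_info("label_lower_bound", t_train)
dtrain.set_float_info("label_upper_bound", np.where(e_train, t_train, np.inf))
params = {"objective": "survival:aft", "aft_loss_distribution": "normal"}
bst = xgb.train(params, dtrain, num_boost_round=100)

# 2. Predict risk scores on the *second* dataset
risk_scores = bst.predict(xgb.DMatrix(X_train_2))

# 3. Fit the Weibull AFT model on the held-out risk scores and outcomes
aft_df = pd.DataFrame(
    {"risk": risk_scores, "duration": t_train_2, "event": e_train_2}
)
aft = WeibullAFTFitter()
aft.fit(aft_df, duration_col="duration", event_col="event")
```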

The rationale for using different datasets for the two models is that it reduces the chance of overfitting. I've found that the risk scores coming out of step 2 suggest a tighter relationship between risk score and y_train than actually exists, simply because we are predicting back on the same dataset the xgboost model was trained on (and then relating those in-sample predictions back to the original outcome variable, y_train).

Thanks for the awesome package

crew102 added the enhancement (New feature or request) label Jul 28, 2022
davivieirab (Contributor) commented
Thanks for the suggestion, @crew102.
We are currently working on a way to replace the 1st-step xgboost model with a pre-trained one.
Both the XGBSEDebiasedBCE and XGBSEStackedWeibull modules will be able to use this feature, which should cover your use case.
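As a rough illustration, usage might then look something like this (the `pretrained_model` argument is a placeholder name; the final API will be whatever the linked PR settles on):

```python
import xgboost as xgb
from xgbse import XGBSEStackedWeibull

# pre-train a booster however you like, e.g. on a first dataset
# (params/dtrain_1 as in any standard xgboost AFT setup)
bst = xgb.train(params, dtrain_1, num_boost_round=100)

# hypothetical: hand the pre-trained booster to xgbse so that fit()
# skips step 1 and only trains the 2nd-stage survival model,
# here on a second dataset
model = XGBSEStackedWeibull()
model.fit(X_train_2, y_train_2, pretrained_model=bst)
```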

davivieirab linked pull request #60 on Aug 8, 2022 that may close this issue