Support for Scikit-Survival models that is compatible with Sklearn? #174

Open
WeijiaZhang24 opened this issue Jun 6, 2022 · 4 comments

@WeijiaZhang24

Is it possible to export models trained using Scikit-Survival (sksurv)?
This is the repository for sksurv: https://github.com/sebp/scikit-survival

sksurv contains a RandomSurvivalForest algorithm, which extends RandomForest to right-censored survival data.
In a standard RandomForest, the regression target y is a number, but in survival data the labels take the form [time, event_indicator]. If event_indicator == 1, the event was observed and time equals y. If event_indicator == 0, the event was not observed up to the recorded time, so we only know that y > time.
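
For readers new to censoring, here is a minimal, library-free Python sketch of what one such label tells us about the true event time y. The `SurvivalLabel` class and its `true_time_bounds` method are hypothetical names for illustration only; sksurv itself expects `y` as a NumPy structured array with a boolean event field and a float time field (for example, built with `sksurv.util.Surv.from_arrays`). The values below are made up.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SurvivalLabel:
    """Hypothetical container for one [time, event_indicator] observation."""
    time: float   # event time if observed, otherwise the censoring time
    event: bool   # True (1) = event observed, False (0) = right-censored

    def true_time_bounds(self):
        """Return (lower, upper) bounds on the unknown true event time y.

        For a censored label the lower bound is exclusive: we only know y > time.
        """
        if self.event:
            return (self.time, self.time)  # event observed: y == time
        return (self.time, math.inf)       # censored: y > time, no upper bound


# Made-up labels: one observed event, one right-censored observation.
labels = [SurvivalLabel(72.0, True), SurvivalLabel(411.0, False)]
for label in labels:
    print(label, label.true_time_bounds())
```

A plain regressor would treat both times as exact targets, which is exactly what survival models avoid.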

Any help would be appreciated!

@vruusmann
Member

> Is it possible to export models trained using Scikit-Survival (sksurv)?

I'm going to explore your earlier XGBoost example a bit to get a better understanding of the state of the art in survival analysis.

The fundamental problem here is that "survival" appears to be a different endpoint than "regression".

The PMML specification does not provide a dedicated "survival" mining function type: https://dmg.org/pmml/v4-4-1/GeneralStructure.html#xsdType_MINING-FUNCTION

The obvious fix would be to define a new mining function type ourselves. I think it's safe to say that we can't count on the Data Mining Group's help here, because they are largely non-operational (I'm still waiting for initial feedback on some feature requests that I sent them over a year ago).

> sksurv contains a RandomSurvivalForest algorithm, which extends RandomForest to right-censored survival data.

The JPMML-SkLearn library already provides a PMML converter for the RandomForest class.

RandomSurvivalForest and RandomForest should use identical tree ensemble data structures. Therefore, it should be possible to build a PMML converter for RandomSurvivalForest by simply applying some post-processing to RandomForest predictions.
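
To make the "identical data structures" point concrete, here is a minimal pure-Python sketch under stated assumptions. Each leaf's payload becomes a cumulative hazard function (CHF) sampled on a shared time grid instead of a scalar. Here it is estimated with the Nelson-Aalen estimator, the terminal-node estimator used in Ishwaran-style random survival forests. The ensemble prediction is the element-wise average of the leaf curves, the same averaging RandomForestRegressor applies to scalar leaf values. The function names and toy leaves are hypothetical. A real converter would precompute each leaf curve at training time rather than per query, and whether sksurv's leaves match this exactly is an assumption to verify.

```python
from collections import Counter


def nelson_aalen(times, events, grid):
    """Nelson-Aalen cumulative hazard H(t) = sum over event times t_i <= t of
    d_i / n_i (d_i = events at t_i, n_i = samples still at risk at t_i),
    evaluated at every point of `grid`. Hypothetical helper name."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    event_times = sorted(deaths)
    chf = []
    for g in grid:
        h = 0.0
        for t in event_times:
            if t > g:
                break
            at_risk = sum(1 for s in times if s >= t)
            h += deaths[t] / at_risk
        chf.append(h)
    return chf


def ensemble_chf(leaves, grid):
    """Average the per-tree leaf CHFs element-wise over the shared time grid.

    `leaves` holds one (times, events) pair per tree: the training samples
    that ended up in the leaf the query row falls into.
    """
    curves = [nelson_aalen(times, events, grid) for times, events in leaves]
    return [sum(values) / len(curves) for values in zip(*curves)]


# Two toy "trees"; each tuple is the content of the leaf reached by one query row.
leaves = [
    ([1.0, 2.0, 3.0], [True, False, True]),
    ([2.0, 4.0], [True, True]),
]
grid = [1.0, 2.0, 3.0]
print(ensemble_chf(leaves, grid))
```

Only the leaf payload and the final averaging step differ from a regression forest; the tree traversal is unchanged.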

@vruusmann vruusmann transferred this issue from jpmml/sklearn2pmml Jun 6, 2022
@vruusmann
Member

The PMML specification currently defines a "survival" endpoint for linear models (jump to the "Cox Regression Model Explanation and Examples" section):
https://dmg.org/pmml/v4-4-1/GeneralRegression.html

This approach should be generalizable to other model types (e.g. decision tree ensembles).
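
For reference, the core of a Cox model's survival prediction can be sketched in a few lines of Python. A linear predictor scales a tabulated baseline cumulative hazard H0(t), and S(t | x) = exp(-H0(t) * exp(beta . x)). This is a simplified sketch: it leaves out parts of PMML's Cox regression such as baseline strata and parameter reference points, and the function names and numbers are hypothetical. A tree-based generalization could replace the linear predictor and shared baseline table with per-leaf cumulative hazard tables.

```python
import bisect
import math


def baseline_cum_hazard(table, t):
    """Step-function lookup in a baseline cumulative hazard table, given as a
    list of (time, cum_hazard) pairs sorted by time; H0(t) = 0 before the first
    row. Hypothetical helper name."""
    times = [row[0] for row in table]
    i = bisect.bisect_right(times, t)
    return table[i - 1][1] if i > 0 else 0.0


def cox_survival(table, coefs, x, t):
    """Cox proportional hazards: S(t | x) = exp(-H0(t) * exp(beta . x))."""
    linear_predictor = sum(b * xi for b, xi in zip(coefs, x))
    return math.exp(-baseline_cum_hazard(table, t) * math.exp(linear_predictor))


# Made-up baseline table and one coefficient beta = ln(2), i.e. a hazard ratio of 2.
table = [(10.0, 0.1), (20.0, 0.3)]
print(cox_survival(table, [math.log(2.0)], [1.0], 15.0))  # approximately exp(-0.2)
```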

@WeijiaZhang24
Author

WeijiaZhang24 commented Jun 6, 2022

I found that an older version of the R package "pmml" can export Random Survival Forests constructed by an old version of the "randomForestSRC" package. I'm not sure why later versions of the pmml R package removed this function.

I can help with Python and R code related to survival analysis, but I'm not familiar with the PMML format...
Here is working R code that replicates this older converter. (The documentation for pmml 1.5.4 can be found at https://mran.microsoft.com/snapshot/2018-02-12/web/packages/pmml/pmml.pdf)

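install.packages("remotes")
library("remotes")
# Pin the older package versions that still provide the random survival forest export
install_version("randomForestSRC", "2.5.0")
install_version("pmml", "1.5.4")
library(pmml)
library(randomForestSRC)

# Fit a small random survival forest on the veteran lung cancer dataset
data(veteran)
veteran.out <- rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 5, forest = TRUE, membership = TRUE)
# Export the fitted forest to PMML
pmml.rfsrc(veteran.out)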

@vruusmann
Member

> I found that an older version of the R package "pmml" can export Random Survival Forests constructed by an old version of the "randomForestSRC" package.

This converter used a proprietary, super-hackish way of encoding the "survival" transformation.

Basically, it was a tool for enriching the standard randomForest object with some extra information. For the record, these models could not be evaluated by other PMML engines.

I'm just saying that it might be worthwhile to take some time and design a proper, future-proof extension to the latest PMML standard.

As for RandomSurvivalForest, I believe that the JPMML software stack can already do 90% of what is required (pre-processing, decision tree ensemble data structure). What remains is to design the missing 10%, which takes care of post-processing.
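
One hypothetical shape for that missing post-processing step, sketched in Python under the assumption that the ensemble yields an averaged cumulative hazard function (CHF) sampled on a shared time grid. It derives a survival probability at a requested horizon and a single risk score. The function names and numbers are illustrative only. S(t) = exp(-H(t)) is one option; an implementation could instead estimate survival directly (for example, by averaging leaf Kaplan-Meier curves). The sum-over-grid risk score follows how, as far as I know, sksurv documents RandomSurvivalForest.predict; verify against its docs before relying on it.

```python
import bisect
import math


def chf_at(grid, chf, t):
    """Step-function lookup of the cumulative hazard at time t (0 before grid[0]).
    Hypothetical helper name."""
    i = bisect.bisect_right(grid, t)
    return chf[i - 1] if i > 0 else 0.0


def survival_at(grid, chf, t):
    """Survival probability via S(t) = exp(-H(t)); one possible choice, see above."""
    return math.exp(-chf_at(grid, chf, t))


def risk_score(chf):
    """Scalar risk score: cumulative hazard summed over the time grid (assumed convention)."""
    return sum(chf)


# Hypothetical averaged ensemble CHF over a shared time grid.
grid = [1.0, 2.0, 3.0]
chf = [1 / 6, 5 / 12, 11 / 12]
print(survival_at(grid, chf, 2.5), risk_score(chf))
```

Each of these outputs is a simple function of the CHF vector, which is why the post-processing part looks small compared to the tree ensemble itself.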
