Mboost - Test Predict Function - stmboost #3
Comments
@avinashbarnwal Can you also post the error log? |
@hcho3, please find the error -
|
What is the content of |
These are internal calls. I am not sure about the internals. |
Try adding some printing statements in the lines in the error trace. |
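One way to follow this suggestion without editing package internals is to wrap the failing call and record the message and call stack at the point of the error. The following is a minimal base-R sketch; `inner_step`, `outer_call`, and `debug_call` are hypothetical stand-ins for illustration, not functions from tbm or mboost.

```r
# Hypothetical stand-ins for the internal calls seen in an error trace.
inner_step <- function(z) {
  if (!is.numeric(z)) stop("z must be numeric")
  z * 2
}
outer_call <- function(z) inner_step(z)

# Run a zero-argument function, capturing the error message and the
# call stack instead of aborting.
debug_call <- function(expr_fun) {
  stack <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr_fun(),
      error = function(e) {
        # The calling handler runs before unwinding, so sys.calls()
        # still contains the frames that led to the error.
        stack <<- sys.calls()
      }
    ),
    error = function(e) {
      message("Error: ", conditionMessage(e))
      NULL
    }
  )
  list(result = result, stack = stack)
}

out <- debug_call(function() outer_call("a"))
print(is.null(out$result))  # TRUE when the call failed
print(length(out$stack))    # number of frames recorded at the error
```

In the real case, the wrapped function would contain the failing `predict()` call, and `out$stack` would show which internal call raised the error.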
It is throwing an error when we are calling -
|
IMO the example is still not minimal enough. Does the error message happen on all data sets or just this one? |
example(stmboost) works for me,
> example(stmboost)
|
this is more minimal
|
Here is another MRE with simulated data.
library(tram)
library(tbm)
N <- 100
set.seed(1)
x <- runif(N)
y <- 10+30*x+rnorm(N)
y.lower <- y-runif(N)
y.upper <- y+runif(N)
plot(y ~ x)
segments(x, y.lower, x, y.upper)
df <- data.frame(x, y.lower, y.upper)
f <- survival::Surv(y.lower, y.upper, type="interval2") ~ x
m_mlt <- Survreg(f, data = df, dist = "lognormal")
bm <- stmboost(m_mlt, formula = f, data = df,control = boost_control(mstop=200,nu=0.01,trace=TRUE),method = quote(mboost::mboost))
q = seq(from = min(df$y.lower), to = max(df$y.upper), length.out = 100)
sessionInfo()
d = predict(bm, newdata = df, type = "density", q = q)
output:
|
Thanks @tdhock. |
Torsten's new code works for me @avinashbarnwal
library(tram)
library(tbm)
N <- 100
set.seed(1)
x <- runif(N, min = -.5, max = .5)
### I made this a log-normal DGP
y <- exp(1+2*x+rnorm(N))
### smaller than 0 is not allowed
y.lower <- pmax(y-runif(N), 0.001)
y.upper <- y+runif(N)
plot(y ~ x)
segments(x, y.lower, x, y.upper)
df <- data.frame(x, y.lower, y.upper)
### safer: generate a new variable with a simple name
### because the deparsed call might be a trouble maker
df$y <- survival::Surv(y.lower, y.upper, type="interval2")
### the model is UNCONDITIONAL (no x here)
### (you can have conditional models, but this is more complex)
m_mlt <- Survreg(y ~ 1, data = df, dist = "lognormal")
### but boosting needs the x
f <- y ~ x
bm <- stmboost(m_mlt, formula = f, data = df,control = boost_control(mstop=200,nu=0.01,trace=TRUE),method = quote(mboost::mboost))
q = seq(from = min(df$y.lower), to = max(df$y.upper), length.out = 10)
sessionInfo()
### this is dim(grid) x dim(df), so obs are in columns !!!
dim(d <- predict(bm, newdata = df, type = "density", q = q))
Above is the contents of the toby.R script from Torsten. Below is the output of running that in R:
|
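The comment in the script says the prediction matrix has grid points in rows and observations in columns. A small base-R sketch of what that layout means for indexing follows; `d_demo` is a stand-in matrix of random placeholder values with the same shape (10 grid points, 100 observations), not actual output from tbm.

```r
# Stand-in for the prediction matrix: 10 grid points (rows) by
# 100 observations (columns). Values are random placeholders.
set.seed(1)
n_grid <- 10
n_obs <- 100
d_demo <- matrix(runif(n_grid * n_obs), nrow = n_grid, ncol = n_obs)
print(dim(d_demo))

# Values for the first observation across the whole grid: one column.
curve_obs1 <- d_demo[, 1]
print(length(curve_obs1))

# Transpose if downstream code expects one row per observation.
d_by_obs <- t(d_demo)
print(dim(d_by_obs))
```

Indexing by row instead of by column would silently mix values from different observations, so it is worth checking `dim()` before using the predictions.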
Hi Prof. @tdhock, I am not sure why the two formulas differ (y~1 and y~x). m_mlt <- Survreg(y ~ 1, data = df, dist = "lognormal") I generally keep the same formula in both places.
I feel we should build software in which these things can be reproduced more clearly. Things are very haywire right now. |
Hi Prof. @tdhock, I have updated the minimal reproducible example code and I have changed.
to
Then it gave the same error as before.
Also for
d is null. Check this notebook: https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/src/R/testing/testing_mboost/mboost_MRE1.ipynb I am not sure what the difference is between y~x and y~1 in Survreg. |
Hi Prof. @tdhock, Please let me know if you have had time to look into this issue. |
I think the two formulas are for different purposes. In Torsten's email he indicated that y ~ 1 should be used in one case and not the other. So please do that in your code. |
Thanks for the update Prof. @tdhock. I will use it accordingly. But I am confused about what the difference is. Do you think we should ask Prof. Torsten? Also, the predicted values from y~1 are NAs. |
I don't know what the difference is; you should ask Torsten for clarification. But I don't think it is relevant to what you are doing. I think you should just copy/modify the code he sent us, and that should be fine for baseline predictions. |
Yes, but even using y~1, the predictions are null. |
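The thread uses "null" and "NA" interchangeably, but in R these are different outcomes: `NULL` means no object was returned, while `NA` means an object was returned with missing entries. A minimal base-R sketch for telling them apart follows; `describe_prediction` is a hypothetical helper written for this illustration, and the inputs are made-up values, not tbm output.

```r
# Hypothetical helper: report which kind of "empty" prediction we have.
describe_prediction <- function(d) {
  if (is.null(d)) {
    return("NULL: no object was returned")
  }
  if (length(d) > 0 && all(is.na(d))) {
    return("all NA: an object was returned, but every entry is missing")
  }
  if (anyNA(d)) {
    return("some NA: only part of the result is missing")
  }
  "no missing values"
}

print(describe_prediction(NULL))
print(describe_prediction(matrix(NA_real_, nrow = 2, ncol = 3)))
print(describe_prediction(c(0.1, NA, 0.3)))
print(describe_prediction(c(0.1, 0.2, 0.3)))
```

Reporting which of these cases occurs, together with `dim()` of the result, makes it easier for others to compare results across machines.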
I don't understand / can't replicate the nulls you describe. When I do
library(tram)
library(tbm)
N <- 100
set.seed(1)
x <- runif(N, min = -.5, max = .5)
### I made this a log-normal DGP
y <- exp(1+2*x+rnorm(N))
### smaller than 0 is not allowed
y.lower <- pmax(y-runif(N), 0.001)
y.upper <- y+runif(N)
plot(y ~ x)
segments(x, y.lower, x, y.upper)
df <- data.frame(x, y.lower, y.upper)
### safer: generate a new variable with a simple name
### because the deparsed call might be a trouble maker
df$y <- survival::Surv(y.lower, y.upper, type="interval2")
m_mlt <- Survreg(
y ~ 1, #DO NOT CHANGE THIS.
data = df,
dist = "lognormal")
bm <- stmboost(
m_mlt,
formula = y ~ x, #CHANGE x to all input variables.
data = df,
control = boost_control(mstop=200,nu=0.01,trace=TRUE),
method = quote(mboost::mboost))
pred.mat <- predict(
bm,
newdata = df,
type = "density",
q = 0.5) # see help(predict.mlt); q: quantiles at which to evaluate the model
pred.mat[, 1:5]
sessionInfo()
I get the following result:
|
the important part with the real-valued predictions is
|
Hi Prof. @tdhock, I am still finding null values. Below is the code -
I am not sure why you are getting different results. |
Hi Prof. @tdhock, We are running on different operating systems: I am using Mac and you are using Windows. Also, my version of tram is the latest, tram_0.3-1, while you are using tram_0.3-0. Please let me know if you think these could be the reasons for the difference in results. |
Hi Prof. @tdhock, I have been able to reproduce the minimal reproducible example on Linux, on an AWS EC2 instance provided by @hcho3. |
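To compare environments more systematically than by reading `sessionInfo()` output by eye, the version strings and platform can be checked directly. A small base-R sketch follows; the two version strings are the ones quoted in this thread, and the list of package names to report is an assumption for illustration.

```r
# compareVersion() returns 1 if the first version is newer,
# 0 if they are equal, and -1 if the first is older.
cmp <- compareVersion("0.3-1", "0.3-0")
print(cmp)

# Report the operating system of the current session.
print(Sys.info()[["sysname"]])

# Report installed versions, guarding against packages that are absent.
for (pkg in c("tram", "tbm", "mboost")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    cat(pkg, as.character(utils::packageVersion(pkg)), "\n")
  } else {
    cat(pkg, "is not installed\n")
  }
}
```

Posting this output from each machine alongside the MRE makes it clearer whether a version or platform difference lines up with the differing results.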
Hi Prof. @tdhock and @hcho3,
Please find the minimal reproducible example to test the predict function of mboost -