Cannot produce Test(ll) results locally #5
Sorry for the delay, I've been unable to check GitHub these last couple of days. Regarding … does this answer your question? Have you tried running …? Thank you for the kind words on the Kaggle thread.
Thanks for the reply. After writing, I tested using libFM on the CLI only and had the same problem. Basically, the -out flag isn't producing predictions that relate to the Test(LL). I'm 100% sure this is a user error on my side with the I/O usage, as I don't think libFM is … I'll continue to investigate, FYI. I also opened a thread on this in the …
I saw that thread on … Which data are you using to produce the 8.13 value? Could you post the output from that run? You say that you used "test predictions against the test label". Shouldn't you be using the train data against the predictions? Remember that each time you run …
Like a fool, I didn't set the seed in the script, so reproduction of the results isn't exact. For this case I am trying to produce the predictions that give rise to the final Test(ll) value. So in theory, if I run a logloss on the predictions and the labels from the test set, I should get that same value.
But the Test(LL) is specific to the train/test data you are working with at that moment. To my knowledge, logloss is just an error measure between the predictions and the real values. Even if you train a model with a 0.5 logloss error, if you then use the same train data with test data that is skewed (say the train data you have is skewed towards false values, and the prediction values you have are skewed towards true values), then you might get a higher logloss value. Does this help at all?
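To make that point concrete, here is a minimal, self-contained sketch (the probability and label arrays below are made up for illustration, not taken from the thread's data): the same predictions scored against two differently-skewed sets of test labels give very different logloss values.

```python
import numpy as np
from sklearn.metrics import log_loss

# The same predicted probabilities for the positive class...
pred = np.array([0.8, 0.7, 0.9, 0.6, 0.2])

# ...scored against two hypothetical label vectors with different skew.
y_agrees = np.array([1, 1, 1, 1, 0])   # labels line up with the predictions
y_skewed = np.array([0, 0, 1, 0, 0])   # labels skewed towards the negative class

print(log_loss(y_agrees, pred))  # low logloss
print(log_loss(y_skewed, pred))  # noticeably higher logloss, same predictions
```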
You are correct about the way logloss works; you've actually encapsulated the issue: "the test(LL) are specific for the train/test data you are working with in that moment". This is exactly what I want to produce, the final output Test(ll) from the run, e.g. `#Iter= 0 Train=0.666074 Test=0.668503 Test(ll)=0.665911`, using the … It is the final stage of using these probabilities and the labels (y_test) … Does this make sense?
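As a side note, the Test(ll) value to match can be read straight out of libFM's console output; here is a small sketch using the exact line format quoted above (only that single sample line is used, nothing else is assumed about the log):

```python
import re

# One line of libFM's console output, as quoted in the comment above.
line = "#Iter= 0 Train=0.666074 Test=0.668503 Test(ll)=0.665911"

match = re.search(r"Test\(ll\)=([0-9.]+)", line)
if match:
    test_ll = float(match.group(1))
    print(test_ll)  # 0.665911
```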
But are we talking about the same data (train and test) giving different values? Did you change the train or the test data?
I've added random_state to get reproducible results for the data split. This should give the output: `Loading train...` … `Out[59]: logloss = 7.3850759798108481`. So in this reproducible example, 0.46 != 7.385.
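On the reproducibility fix mentioned above, a minimal sketch of the idea (the feature matrix and labels here are randomly generated placeholders): fixing random_state makes the train/test split deterministic, so the same rows land in the test set on every run and the resulting logloss is comparable across runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows of 5 features and binary labels.
rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = rng.randint(0, 2, size=100)

# A fixed random_state makes the split identical on every run,
# so downstream predictions and logloss values are reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```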
I guess the key part here is why the line …
I haven't used sklearn's log_loss …
Yes, that's correct. If we pretend there is only one class we can get the same result, akin to the logloss I'm using in the example above. `In [66]: log_loss(["spam", "ham", "ham", "spam"], [[.9], [.1], [.2], [.65]])` `Out[66]: 0.21616187468057912`
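For completeness, a small sketch of the equivalence being described, reusing the toy spam/ham values from the example above: passing only the positive-class probability gives the same result as passing the full per-class probability matrix, because sklearn fills in the complementary column (classes are ordered alphabetically, so the columns are [P(ham), P(spam)]).

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = ["spam", "ham", "ham", "spam"]
p_spam = np.array([0.9, 0.1, 0.2, 0.65])  # predicted probability of "spam"

# 1-D positive-class probabilities...
single = log_loss(y_true, p_spam)

# ...and the full [P(ham), P(spam)] matrix give the same value.
full = log_loss(y_true, np.column_stack([1 - p_spam, p_spam]))

print(single, full)  # both ~= 0.2162
```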
I guess that this also yields the same result: … I guess it's either how libFM is computing logloss, or how sklearn is. Have you tried computing logloss manually (actually implementing the function yourself, or doing it with pen & paper) to compare results? I.e., which of the two actually gives the correct result?
Yes:

```python
import scipy as sp

def logloss(act, pred):
    # Clip predictions away from 0 and 1 so the log terms stay finite.
    epsilon = 1e-15
    pred = sp.maximum(epsilon, pred)
    pred = sp.minimum(1 - epsilon, pred)
    # Standard binary cross-entropy, averaged over all samples (natural log).
    ll = sum(act * sp.log(pred) + sp.subtract(1, act) * sp.log(sp.subtract(1, pred)))
    ll = ll * -1.0 / len(act)
    return ll
```

OK, let's close this issue, as it's clearly something with libFM that I'm not doing correctly to get the same result. Thanks very much for your time looking at this.
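As a quick cross-check of the snippet above (toy 0/1 labels and probabilities matching the earlier spam/ham example), a NumPy rewrite of the same formula agrees with sklearn's log_loss; both use the natural log, which turns out to matter for the resolution further down.

```python
import numpy as np
from sklearn.metrics import log_loss

def logloss_np(act, pred, epsilon=1e-15):
    # Same formula as the scipy-based snippet above, written with NumPy.
    pred = np.clip(pred, epsilon, 1 - epsilon)
    return -np.mean(act * np.log(pred) + (1 - act) * np.log(1 - pred))

act = np.array([1, 0, 0, 1])            # "spam" encoded as 1, "ham" as 0
pred = np.array([0.9, 0.1, 0.2, 0.65])  # predicted probability of class 1

print(logloss_np(act, pred))  # ~= 0.2162
print(log_loss(act, pred))    # sklearn agrees (natural log in both cases)
```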
Feel free to chat if you want someone to discuss that issue with :)
I'm experiencing the same problem: log_loss reported by libFM is incorrect.
But it's a libFM problem and not a pywFM one, correct? Are you getting the same output from libFM directly (without the Python wrapper)?
Yes, I wrote about it on the libFM GitHub. They're using log10, not the natural log, to compute the loss.
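If libFM's reported value is indeed computed with log10, as described above, then it differs from a natural-log logloss only by a constant factor, since ln(x) = log10(x) * ln(10). A minimal sketch of the conversion (the 0.2162 figure is just a placeholder, not a value from this thread's data):

```python
import math

natural_ll = 0.2162                   # a logloss computed with the natural log
log10_ll = natural_ll / math.log(10)  # what a log10-based implementation would report

# Converting a log10-based value back to the natural-log convention:
print(log10_ll * math.log(10))  # recovers 0.2162
```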
OK, great! Could you link that issue here for future reference?
Hi,
I've been testing the pywFM package, and my question involves understanding how `model.prediction` links to the information that is produced in the output.
My specific example: if I run libFM with a train and a test dataset, I can see in the output that Test(ll) drops to 0.515385. If I take the predictions and run the test predictions against the test labels, I get a logloss value of 8.134375875846, where I should get 0.515385.
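For reference, the local check being described could look roughly like the sketch below. It assumes the file written via libFM's -out option contains one prediction per line, and `test_labels.txt` is a hypothetical file holding the matching 0/1 test labels in the same row order; neither file name comes from the thread.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical file names: predictions written by `libFM ... -out preds.txt`
# and the corresponding test labels, one value per line, in the same order.
preds = np.loadtxt("preds.txt")
y_test = np.loadtxt("test_labels.txt")

# This value should match the Test(ll) column libFM prints, if both sides
# compute logloss the same way.
print(log_loss(y_test, preds))
```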
For clarity, please see the thread I started on Kaggle, which also enables you to download the data and reproduce the error.
Full example code: https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/forums/t/19319/help-with-libfm/110652#post110652