Cannot produce Test(ll) results locally #5

Closed

mpearmain opened this issue Mar 29, 2016 · 19 comments

@mpearmain

Hi,

I've been testing the pywFM package, and my question involves understanding how model.prediction links to the information produced in the output.

My specific example: if I run libFM with a train and test dataset, I can see in the output that test(ll) drops to 0.515385. However, if I take the predictions and score them against the test labels, I get a logloss of 8.134375875846, where I should get 0.515385.

For clarity, please see the thread I started on Kaggle, which also lets you download the data and reproduce the error.

Full example code: https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/forums/t/19319/help-with-libfm/110652#post110652

@jfloff
Owner

jfloff commented Mar 31, 2016

Sorry for the delay, I've been unable to check GitHub these last couple of days.

Regarding model.prediction and the output: all the output that you see from pywFM is the same output you would see from using libFM directly. Regarding the variables that pywFM outputs, here is a rundown:

  • predictions: taken from libFM's -out file option. I do some processing just to convert that file into an array.
  • global_bias, weights, pairwise_interactions: these three are taken from the model file that libFM produces if you pass the -save_model flag (more info in srendle/libfm@19db0d1). I do some processing here to split the three outputs (given in the same file) into three variables.
  • rlog: taken from the CSV produced by libFM, and loaded as a pandas DataFrame.
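
For reference, here is a rough sketch of how those come out of the wrapper (the data below is a made-up placeholder, just to show the call shape):

    import numpy as np
    import pywFM

    # hypothetical toy data, only to illustrate the call shape
    x_train = np.random.rand(100, 10)
    y_train = np.random.randint(0, 2, 100)
    x_test = np.random.rand(20, 10)
    y_test = np.random.randint(0, 2, 20)

    fm = pywFM.FM(task='classification', num_iter=10)
    model = fm.run(x_train=x_train, y_train=y_train, x_test=x_test, y_test=y_test)

    model.predictions             # array parsed from libFM's -out file
    model.global_bias             # from the -save_model file
    model.weights                 # from the -save_model file
    model.pairwise_interactions   # from the -save_model file
    model.rlog                    # libFM's rlog CSV, loaded as a pandas DataFrame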

Does this answer your question?

Have you tried running libFM (without the wrapper) to see if the results differ from the ones with pywFM? Which data did you use to get the 8.13 value?

Thank you for the kind words on the Kaggle thread.

@mpearmain
Author

Thanks for the reply.

After writing, I did test using libFM on the CLI only and had the same problem.

Basically the -out file isn't producing predictions that relate to the test(LL) as I would expect.

I'm 100% sure this is a user error on my part with the I/O usage, as I don't think libFM is broken :)

I'll continue to investigate. FYI, I also opened a thread on this in the libFM Google group.

@jfloff
Owner

jfloff commented Mar 31, 2016

I saw that thread on the libFM user group; that's why I asked if you had compared with libFM alone (not with pywFM).

Which data are you using to produce the 8.13 value? Could you post the output from that run? You are saying that you used "test predictions against the test label". Shouldn't you be using train data against predictions?

Remember that each time you run libFM you are running a new model. There is a way to use the same model on a new prediction set, but I haven't done that. Is that what you are looking for?

@mpearmain
Author

Like a fool, I didn't set the seed in the script, so reproducing the results isn't easy (I need it in the train_test_split of the data). However, simply downloading the data and running the script will highlight the problem (even if the results are not identical).

In this case I am trying to produce the predictions that give rise to the test(LL) while the model is training. In theory (unless I am very much mistaken), model.prediction is, as you've stated, the same as the -out flag from the CLI. Therefore, since we passed both train and test to libFM, the output predictions should be for the test set that was supplied.

So, in theory, if I run a logloss on the predictions and the labels from the test set, I should get the same logloss value as produced in the printed output. This is the crux of the problem: I don't get anything like a close match. (Running in standalone mode gives rise to the same problem, i.e. it's not actually a pywFM issue.)


@jfloff
Owner

jfloff commented Mar 31, 2016

But the test(LL) is specific to the train/test data you are working with at that moment. From what I know, logloss is just an error measure between the predictions and the real values. Even if you train a model with a 0.5 logloss error, if you then use the same train data but with test data that is skewed (say the train data you have is skewed towards false values, and the prediction values are skewed towards true values), then you might get a higher logloss value.
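
A toy illustration with made-up numbers (sklearn's log_loss is used here just to show how the measure reacts when the predictions don't line up with the labels):

    from sklearn.metrics import log_loss

    # mostly-false test labels
    y_true = [0, 0, 0, 0, 1]

    # predictions skewed towards true: confidently wrong, so the loss is large
    print(log_loss(y_true, [0.9, 0.8, 0.9, 0.7, 0.6]))   # roughly 1.59

    # predictions that line up with the labels: the loss is small
    print(log_loss(y_true, [0.1, 0.2, 0.1, 0.3, 0.6]))   # roughly 0.26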

Does this help at all?

@mpearmain
Author

You are correct about the way logloss works; you've actually encapsulated the problem I'm trying to solve in your first sentence:

"the test(LL) is specific to the train/test data you are working with at that moment."

This is exactly what I want to reproduce: the final output test(LL) from the model that has been built and applied to the test set that was given. I.e. running the simple example yields:

#Iter= 0 Train=0.666074 Test=0.668503 Test(ll)=0.665911
#Iter= 1 Train=0.693502 Test=0.694918 Test(ll)=0.606683
#Iter= 2 Train=0.707983 Test=0.693256 Test(ll)=0.570645
#Iter= 3 Train=0.730493 Test=0.711274 Test(ll)=0.526459
#Iter= 4 Train=0.731135 Test=0.711274 Test(ll)=0.513271
#Iter= 5 Train=0.692782 Test=0.714686 Test(ll)=0.515833
#Iter= 6 Train=0.702832 Test=0.70419 Test(ll)=0.516339
#Iter= 7 Train=0.698818 Test=0.7097 Test(ll)=0.514831
#Iter= 8 Train=0.709859 Test=0.707076 Test(ll)=0.515032
#Iter= 9 Train=0.714223 Test=0.711624 Test(ll)=0.515385

Using model.prediction via pywFM, or -out via the CLI with libFM, should provide me with the predicted probabilities (between 0 and 1, which it does) that the libFM model that was just built produced for the test set provided.

It is the final stage of scoring these probabilities against the labels (y_test) that doesn't return the same result (0.515385 in this case).

Does this make sense?


@jfloff
Owner

jfloff commented Mar 31, 2016

But are we talking about using the same data (train and test) giving different values? Did you change the train or the test?

@mpearmain
Author

I've added random_state to get reproducible results for the data (downloadable from Kaggle) so you can see the issue.

    import pandas as pd
    import pywFM  # Using the python wrapper https://github.com/jfloff/pywFM
    from sklearn.metrics import log_loss
    from sklearn.cross_validation import train_test_split

    random_seed = 1234

    print('Load data...')
    train = pd.read_csv("./input/train.csv")
    target = train['target'].values
    train = train.drop(['ID', 'target'], axis=1)
    test = pd.read_csv("./input/test.csv")
    id_test = test['ID'].values
    test = test.drop(['ID'], axis=1)

    print('Clearing...')
    for (train_name, train_series), (test_name, test_series) in zip(train.iteritems(), test.iteritems()):
        if train_series.dtype == 'O':
            # for objects: factorize
            train[train_name], tmp_indexer = pd.factorize(train[train_name])
            test[test_name] = tmp_indexer.get_indexer(test[test_name])
            # but now we have -1 values (NaN)
        else:
            # for int or float: fill NaN
            tmp_len = len(train[train_series.isnull()])
            if tmp_len > 0:
                # print "mean", train_series.mean()
                train.loc[train_series.isnull(), train_name] = -9999
            # and Test
            tmp_len = len(test[test_series.isnull()])
            if tmp_len > 0:
                test.loc[test_series.isnull(), test_name] = -9999

    xtrain, xtest, ytrain, ytest = train_test_split(train, target, train_size=0.9, random_state=1234)

    clf = pywFM.FM(task='classification',
                   num_iter=10,
                   init_stdev=0.1,
                   k2=5,
                   learning_method='mcmc',
                   verbose=False,
                   silent=False)

    model = clf.run(x_train=xtrain, y_train=ytrain, x_test=xtest, y_test=ytest)
    log_loss(ytest, model.predictions, eps=1e-15)

This should give the output:

Loading train...
has x = 0
has xt = 1
num_rows=102888 num_values=12582078 num_features=131 min_target=0
max_target=1
Loading test...
has x = 0
has xt = 1
num_rows=11433 num_values=1397626 num_features=131 min_target=0 max_target=1
#relations: 0
Loading meta data...
logging to /var/folders/44/q92fcr8n26gc377b_x_4g85m0000gp/T/tmp_jMdqT
#Iter= 0 Train=0.750719 Test=0.755795 Test(ll)=0.49171
#Iter= 1 Train=0.749942 Test=0.75597 Test(ll)=0.491415
#Iter= 2 Train=0.731115 Test=0.75492 Test(ll)=0.484581
#Iter= 3 Train=0.745646 Test=0.75597 Test(ll)=0.4842
#Iter= 4 Train=0.726314 Test=0.750634 Test(ll)=0.477684
#Iter= 5 Train=0.717489 Test=0.750809 Test(ll)=0.474122
#Iter= 6 Train=0.70846 Test=0.743112 Test(ll)=0.469061
#Iter= 7 Train=0.716293 Test=0.745561 Test(ll)=0.464739
#Iter= 8 Train=0.706069 Test=0.738476 Test(ll)=0.46401
#Iter= 9 Train=0.728258 Test=0.740488 Test(ll)=0.464135
Writing FM model...

Out[59]:logloss = 7.3850759798108481

So in this reproducible example 0.46 != 7.385


@mpearmain
Author

I guess the key part here is why the line log_loss(ytest, model.predictions, eps=1e-15) doesn't equal the final Test(ll).

@jfloff
Owner

jfloff commented Mar 31, 2016

I haven't used sklearn's log_loss, but from the documentation, it appears that the predictions need to be in a specific format:

y_pred : array-like of float, shape = (n_samples, n_classes)
              Predicted probabilities, as returned by a classifier’s predict_proba method.
(...)
>>> log_loss(["spam", "ham", "ham", "spam"], [[.1, .9], [.9, .1], [.8, .2], [.35, .65]])
0.21616...

@mpearmain
Author

Yes, that's correct.

If we pretend there is only one class, we can get the same result, akin to the logloss I'm using in the example above.

    In[66]: log_loss(["spam", "ham", "ham", "spam"], [[.9], [.1], [.2], [.65]])
    Out[66]: 0.21616187468057912

@jfloff
Owner

jfloff commented Mar 31, 2016

I guess that this also yields the same result: log_loss(["spam", "ham", "ham", "spam"], [.9, .1, .2, .65]) ?

I guess it's either how libFM is computing logloss, or how sklearn is? Have you tried computing logloss manually (actually implementing the function yourself, or doing it with pen and paper) to compare the results? I.e. which of the two actually gives the correct result?

@mpearmain
Author

Yes, it's fairly trivial to implement.

    import numpy as np

    def logloss(act, pred):
        # binary logloss with the natural log; clip predictions away from 0 and 1
        act = np.asarray(act, dtype=float)
        pred = np.clip(np.asarray(pred, dtype=float), 1e-15, 1 - 1e-15)
        ll = np.sum(act * np.log(pred) + (1 - act) * np.log(1 - pred))
        return -ll / len(act)
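
As a quick sanity check, it gives the same value as sklearn's log_loss on the spam/ham numbers from above:

    import numpy as np
    from sklearn.metrics import log_loss

    y_true = np.array([1, 0, 0, 1])             # "spam" = 1, "ham" = 0
    y_pred = np.array([0.9, 0.1, 0.2, 0.65])

    print(logloss(y_true, y_pred))              # 0.21616..., natural log
    print(log_loss(y_true, y_pred, eps=1e-15))  # same value from sklearn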

OK, let's close this issue, as it's clearly something with libFM that I'm not doing correctly to get the same result.

Thanks very much for your time looking at this.

@jfloff
Owner

jfloff commented Mar 31, 2016

Feel free to chat if you want someone to discuss that issue with :)

@erlendd

erlendd commented Sep 23, 2016

I'm experiencing the same problem: log_loss reported by libFM is incorrect.

@jfloff
Owner

jfloff commented Sep 26, 2016

But it's a libFM problem and not a pywFM one, correct? Are you getting the same output from libFM (without the Python wrapper)?

@erlendd

erlendd commented Sep 26, 2016

Yes, I wrote about it on the libFM GitHub. They're using log10, not the natural log, to compute the loss.
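
If that's what libFM does, a natural-log logloss can be converted to a base-10 one simply by dividing by ln(10), e.g.:

    import numpy as np
    from sklearn.metrics import log_loss

    y_true = [1, 0, 0, 1]
    y_pred = [0.9, 0.1, 0.2, 0.65]

    ll_natural = log_loss(y_true, y_pred)   # ~0.2162, natural log (what sklearn reports)
    ll_base10 = ll_natural / np.log(10)     # ~0.0939, the same loss measured in log10
    print(ll_natural, ll_base10)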

@jfloff
Owner

jfloff commented Sep 26, 2016

Ok great! Could you link that issue here for future reference?

@erlendd

erlendd commented Sep 26, 2016

srendle/libfm#21
