Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Lda.php #46

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

hamzarabbani
Copy link

There's an error on line 243. Which is resolved with try catch.

@angeloskath
Copy link
Owner

Hi,

Thanks for the pull request! Could you perhaps provide a test that fails without the change? If you don't feel like providing one I will look into it more when I have next week.

Thanks again, it may take some time but I will probably merge it.

Angelos

@hamzarabbani
Copy link
Author

hamzarabbani commented May 13, 2017

Simple example to generate error:

use NlpTools\FeatureFactories\DataAsFeatures;
use NlpTools\Tokenizers\WhitespaceTokenizer;
use NlpTools\Documents\TokensDocument;
use NlpTools\Documents\TrainingSet;
use NlpTools\Models\Lda;

if (file_exists($argv[1])
    $docs = file($argv[1]);
else
    die(1);
$tok = new WhitespaceTokenizer();
$tset = new TrainingSet();
foreach ($docs as $f) {
    $tset->addDocument(
        '', // the class is not used by the lda model
        new TokensDocument(
            $tok->tokenize(
                file_get_contents($f)
            )
        )
    );
}
$lda = new Lda(
    new DataAsFeatures(), // a feature factory to transform the document data
    5, // the number of topics we want
    1, // the dirichlet prior assumed for the per document topic distribution
    1  // the dirichlet prior assumed for the per word topic distribution
);
// run the sampler 50 times
$lda->train($tset,50);
print_r(
    $lda->getDocumentsPerTopicsProbabilities(10)
);

It will generate error. Because at line 243 in Lda.php it is written.

$count_topics_docs = array();
         foreach ($this->count_docs_topics as $doc=>$topics) {
             foreach ($topics as $t=>$c)
                $count_topics_docs[$doc][$t]++;
         }

$count_topics_docs[$doc][$t] has not been assigned any value at this point so we just cannot add ++ to it.
I solved this problem by using try catch

$count_topics_docs = array();
         foreach ($this->count_docs_topics as $doc=>$topics) {
             foreach ($topics as $t=>$c){
                 try {
                     $count_topics_docs[$doc][$t]++;
                 } catch (\Exception $ex){
                     $count_topics_docs[$doc][$t] = 1;
                 }
             }

There's another issue as well. On line 250 & 252 it is using variable named $limit_words. Which is not defined. It should have been $limit_docs (as this variable is being passed into parameter).

if ($limit_words>0) {
 arsort($p);
 $p = array_slice($p,0,$limit_words,true); // true to preserve the keys
}

to

if ($limit_docs > 0) {
 arsort($p);
 $p = array_slice($p,0,$limit_docs,true); // true to preserve the keys
}

@angeloskath
Copy link
Owner

So, do you care about getting the merge or should I just go ahead and fix it?

I am thinking the following if you care to do it yourself for some reason

  • if (!isset(...)) {} would be more PHP-like
  • Let's fix the $limit_docs as well
  • Your example should be made into a test case

Tell me if you want to do this and thanks again for the issue and pull request!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants