Extending 'lattice-compose.cc' to compose with ark of fsts, #4692

Merged (4 commits), Jan 31, 2022

KarelVesely84
Contributor

- This is a follow-up of kaldi-asr#4571

- Refactoring 'lattice-compose.cc' to support composition with ark
  of fsts, so that it is done as Dan suggested before:

  I am thinking this can be done with a string arg called e.g.
  "--compose-with-fst", defaulting to "auto" which is the old behavior,
  meaning: rspecifier=lats, rxfilename=FST; and true/True or false/False
  is FST or lattice respectively.

- I added the possibility of rho-composition, which is useful for
  biasing lattices with word sequences. Thanks to rho-composition,
  the biasing graph does not need to contain all words from the lexicon.

- Would you be interested in an example of how to use this?
  (i.e. creating graphs from a text file with a Python script
   using OpenFst as a library; that would require changing
   the OpenFst build to enable the Python extensions)

- Also which 'egs' recipe would be convenient to use it with?
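
To make the quoted "--compose-with-fst" semantics concrete, here is a minimal sketch of how the three values could be resolved. The helper name `Arg2HoldsFsts` and the choice to throw on an unknown value are assumptions for illustration, not code from this PR; the caller is assumed to already know whether the 2nd argument is an rxfilename.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the PR): decides whether the 2nd argument
// should be read as FST(s) or as lattices.
//   "auto"  -> old behavior: an rxfilename means FST, an rspecifier means lattices
//   "true"  -> FST(s)
//   "false" -> lattices
inline bool Arg2HoldsFsts(std::string compose_with_fst,
                          bool arg2_is_rxfilename) {
  // Lower-case in place so that true, True and TRUE are all accepted.
  std::transform(compose_with_fst.begin(), compose_with_fst.end(),
                 compose_with_fst.begin(), [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });
  if (compose_with_fst == "auto") return arg2_is_rxfilename;
  if (compose_with_fst == "true") return true;
  if (compose_with_fst == "false") return false;
  // Assumption: any other value is rejected rather than silently ignored.
  throw std::invalid_argument(
      "--compose-with-fst must be auto, true or false, got: " +
      compose_with_fst);
}
```

With this sketch, `Arg2HoldsFsts("auto", false)` keeps the old behavior (an rspecifier is read as lattices), while `Arg2HoldsFsts("True", false)` would read the same rspecifier as a table of FSTs.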
@@ -121,7 +164,23 @@ int main(int argc, char *argv[]) {
}
}
delete fst2;
} else {
}
Contributor

Thanks!

Would you mind formatting the else in a more standard way that won't upset auto-formatters?
E.g.

} else {
  /* comment here */
}

.. I see why you did what you did, but let's be consistent with other code.
Are you confident that this won't break existing code?

@@ -39,22 +39,34 @@ int main(int argc, char *argv[]) {
"or lattices with FSTs (rspecifiers are assumed to be lattices, and\n"
"rxfilenames are assumed to be FSTs, which have their weights interpreted\n"
"as \"graph weights\" when converted into the Lattice format.\n"
"Or, rspecifier can be ark of biasing FSTs, see --compose-with-fst=true.\n"
Contributor

This whole message is not very clear, especially with the modification. Can write:

Depending on the command-line arguments, either composes lattices with lattices,
or lattices with a single FST or multiple FSTs (whose weights are interpreted as "graph weights").

Then change the lines below to:

Usage: lattice-compose [options] <lattice-rspecifier1> "
        "<lattice-rspecifier2|fst-rxfilename2|fst-rspecifier2> <lattice-wspecifier>\n"
 If the 2nd arg is an rspecifier, it is interpreted by default as a table of lattices,
 or as a table of FSTs if you specify --compose-with-fst=true.

{  // convert 'compose_with_fst' to lowercase to support: true, True, TRUE
  std::string tmp_lc(compose_with_fst);
  std::transform(compose_with_fst.begin(), compose_with_fst.end(),
                 tmp_lc.begin(), ::tolower);  // lc
Contributor

Suggested change
tmp_lc.begin(), ::tolower); // lc
compose_with_fst.begin(), std::tolower); // lc

It can be an in-place operation.

Contributor Author

I see strange behavior: the std::tolower version produces an error:

/usr/local/include/c++/9.4.0/bits/stl_algo.h:4369:5: note:   template argument deduction/substitution failed:
lattice-compose.cc:87:71: note:   candidate expects 5 arguments, 4 provided
   87 |       std::transform(str.begin(), str.end(), str.begin(), std::tolower);  // lc

I found that I can work around it by wrapping it in a lambda:
[](unsigned char c){ return std::tolower(c);}
but that looks quite complicated compared to ::tolower, which compiles fine
and is already used in util/parse-options.cc and nnet/nnet-component.cc ...

* arg2 is rspecifier that contains a table of lattices
* - composing arg1 lattices with arg2 lattices
*/
else if (not arg2_is_rxfilename &&
Contributor

Suggested change
else if (not arg2_is_rxfilename &&
else if (!arg2_is_rxfilename &&

@jtrmal
Contributor

jtrmal commented Jan 27, 2022 via email

@KarelVesely84
Contributor Author

KarelVesely84 commented Jan 27, 2022

It seems the symbol std::tolower is overloaded. There is the unary variant we want:
int tolower( int ch ) in <cctype>
and a two-argument one:
charT tolower( charT ch, const locale& loc ) from <locale>
so std::transform cannot tell which overload is meant (the unary or the two-argument one).

@KarelVesely84
Contributor Author

the messages are:

/usr/local/include/c++/9.4.0/bits/stl_algo.h:4332:5: note:   template argument deduction/substitution failed:
lattice-compose.cc:84:71: note:   couldn’t deduce template parameter ‘_UnaryOperation’
   84 |       std::transform(str.begin(), str.end(), str.begin(), std::tolower);  // lc

and

/usr/local/include/c++/9.4.0/bits/stl_algo.h:4369:5: note:   template argument deduction/substitution failed:
lattice-compose.cc:84:71: note:   candidate expects 5 arguments, 4 provided
   84 |       std::transform(str.begin(), str.end(), str.begin(), std::tolower);  // lc

@KarelVesely84
Contributor Author

Okay, I'll put the lambda there; I did not find a better solution...
(such as wrapping std::tolower to resolve the overloading...)

@KarelVesely84
Contributor Author

interesting, this one works too:
std::transform(str.begin(), str.end(), str.begin(), (int(*)(int))std::tolower); // lc
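
The two workarounds mentioned in this thread can be put side by side as a small sketch. The function names `ToLowerLambda` and `ToLowerCast` are made up for illustration. Passing bare `std::tolower` fails because the name denotes an overload set (the `<cctype>` and `<locale>` versions), so the functor type cannot be deduced; both variants below give the compiler a single callable.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Variant 1: a lambda is a single callable, so template deduction succeeds.
// Taking unsigned char avoids passing a negative value to std::tolower.
inline std::string ToLowerLambda(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Variant 2: the cast selects the int(int) overload from <cctype>.
// Note: each char is converted to int as-is, so this is only safe for
// input whose characters are non-negative (e.g. plain ASCII).
inline std::string ToLowerCast(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 static_cast<int (*)(int)>(std::tolower));
  return s;
}
```

For option values such as "True" or "TRUE" both variants return "true". The lambda is probably the safer choice in general, since it handles the `unsigned char` conversion and does not rely on taking the address of a standard library function.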

@KarelVesely84
Contributor Author

KarelVesely84 commented Jan 27, 2022

For testing, I tested the variant ark:lattices - ark:fsts with RhoComposition, and it works well.

I did not test the other combinations (ark:lattices - ark:lattices and ark:lattices - fst),
but I did not change the code that was already there for them (only RhoCompose was added in the if-then-else).
There are also 3 composition types: Compose, PhiCompose and RhoCompose.

So, out of these 9 combinations, should I test everything?
Or just some of them?

@KarelVesely84
Contributor Author

KarelVesely84 commented Jan 27, 2022

Also, lattices normally contain no phi or rho symbol, but the PR code allows it...
(could someone possibly introduce phi/rho into a lattice manually?)

@KarelVesely84
Contributor Author

I did some of the tests. I did the variants:

  • lattice o fsts (rho composition, phi composition)
  • lattice o 1fst (rho composition)
  • lattices o lattices

In all these cases it produces a lattice ark of about the same size as the input.
In addition, "lattice o fsts (rho composition)" gave a 1% WER improvement
on my air-traffic communication task.

Otherwise, all the previous remarks on the code should be resolved now.
K.

po.Register("write-compact", &write_compact, "If true, write in normal (compact) form.");
po.Register("phi-label", &phi_label, "If >0, the label on backoff arcs of the LM");
po.Register("rho-label", &rho_label,
"If >0, the label to forward fst1 paths not present in biasing graph fst2 "
"(rho is input and output symbol on special arc in biasing graph)");
Contributor

perhaps could specify that rho is like phi but the label is rewritten to the specific symbol.

Contributor Author

ok, done
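
To illustrate why a rho arc lets the biasing graph omit most of the lexicon, here is a toy model of rho matching. It is not OpenFst and not the code in this PR: the types `Arc` and `State`, the function `ScoreWithRho`, and the example phrase are all made up. The only behavior assumed is the usual one for a rho ("rest") arc: it is taken for any word that has no explicit arc at the current state, and the word itself passes through unchanged.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy acceptor: each state has explicit word arcs plus an optional rho arc.
struct Arc {
  int next_state;
  double cost;  // a negative cost acts as a boost
};

struct State {
  std::map<std::string, Arc> arcs;  // explicit word arcs
  bool has_rho = false;
  Arc rho{0, 0.0};  // taken for any word without an explicit arc
};

// Walks `words` through `graph` starting at state 0 and accumulates the cost.
// Returns false if some word has neither an explicit arc nor a rho arc.
inline bool ScoreWithRho(const std::vector<State>& graph,
                         const std::vector<std::string>& words,
                         double* total_cost) {
  int state = 0;
  double total = 0.0;
  for (const std::string& w : words) {
    const State& s = graph[state];
    std::map<std::string, Arc>::const_iterator it = s.arcs.find(w);
    if (it != s.arcs.end()) {  // explicit match wins over rho
      total += it->second.cost;
      state = it->second.next_state;
    } else if (s.has_rho) {  // rho consumes any other word
      total += s.rho.cost;
      state = s.rho.next_state;
    } else {
      return false;
    }
  }
  *total_cost = total;
  return true;
}

// Hypothetical biasing graph that boosts the phrase "descend level".
inline std::vector<State> MakeExampleGraph() {
  std::vector<State> g(2);
  g[0].arcs["descend"] = Arc{1, -1.0};
  g[0].has_rho = true;
  g[0].rho = Arc{0, 0.0};
  g[1].arcs["level"] = Arc{0, -1.0};
  g[1].has_rho = true;
  g[1].rho = Arc{0, 0.0};
  return g;
}
```

The example graph lists only two words, yet any word sequence is accepted: unlisted words follow the rho arc at zero cost, and only the listed phrase collects the boost. Without the rho arcs, the same graph would reject every sentence containing a word other than "descend" or "level".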

@danpovey
Contributor

OK, I'm going to merge this so that we have some forward movement.
It might be more ideal to have an example script, but we can work on that later.

@danpovey danpovey merged commit fe230a0 into kaldi-asr:master Jan 31, 2022
4 participants