On-the-fly contextual adaptation #4571
base: master
Conversation
```sh
## Set up features.
if [ -f $srcdir/online_cmvn ]; then online_cmvn=true
else online_cmvn=false; fi

if ! $online_cmvn; then
```
`online_cmvn` is used only once.

Suggested change:

```sh
## Set up features.
if [ -f $srcdir/online_cmvn ]; then online_cmvn=true
else online_cmvn=false; fi
if ! $online_cmvn; then
```

becomes

```sh
## Set up features.
if [[ ! -f $srcdir/online_cmvn ]]; then
```
I prefer the script not to deviate too much from steps/nnet3/decode.sh; people might want to diff against it.
But yes, it could be shorter, and the original idea was that '$online_cmvn' might be used in more than one place...
This is a super interesting observation, thanks @vesis84!
Very related to #4564. While we're trying to squeeze 5% out of it...
Hm... this is interesting. IIRC, the ConstFst is actually faster to read, write, and access than VectorFst, as it has less fragmentation, so the ReadFstKaldiGeneric-related functions were added to cast the VectorFst HCLG to a ConstFst when the graph is built. As for the specific iterators being faster than generic iterators, I remember that's why the code that follows was added.
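For context, a minimal sketch of the kind of VectorFst-to-ConstFst conversion being discussed here, using plain OpenFst; the helper name `ComposeAndFreeze` is made up for illustration and is not Kaldi code (ReadFstKaldiGeneric itself is not shown):

```c++
#include <fst/fstlib.h>

// Hypothetical helper: compose two FSTs and "freeze" the result.  ConstFst
// stores states and arcs in contiguous arrays -- the "less fragmentation"
// mentioned above -- but building it copies every state and every arc once.
// 'boost' is assumed to be ilabel-sorted (see the ArcSort call in the diff
// below), as required by the default composition matcher.
fst::ConstFst<fst::StdArc> ComposeAndFreeze(const fst::StdFst &hclg,
                                            const fst::StdFst &boost) {
  fst::VectorFst<fst::StdArc> composed;   // Compose() writes a mutable VectorFst.
  fst::Compose(hclg, boost, &composed);
  return fst::ConstFst<fst::StdArc>(composed);  // one full pass over the graph
}
```

Whether that final copy pays off is exactly the question raised later in this thread: it only helps if the frozen graph is traversed more than once.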
@LvHang, AFAICR... Also, it's important to do the comparison with optimization turned up to 11: a recent compiler, like GCC 10.2 or Clang 11 or 12, ...
```c++
fst::ILabelCompare<StdArc> ilabel_comp;
ArcSort(fst2, ilabel_comp);
}
/* // THIS MAKES ALL STATES FINAL STATES! WHY?
```
Fishy line
Does the ρ-matcher treat ε specially? I vaguely remember it doesn't. If so, and ε falls under the "everything else" category, I can guess why this might happen.
@galv What is wrong with that? Are you reacting to the present ArcSort or to the disabled PropagateFinal?
He may have been referring to the comment
// THIS MAKES ALL STATES FINAL STATES! WHY?
...
It is only relevant if the --phi-label option was specified (0 is not allowed for phi label),
in which case it will use backoff semantics.
I think it makes sense to keep that commented-out code since the phi stuff is supported.
It is intended that all states in G.fst should be final, because EOS is allowed from all states.
.. PropagateFinal() fixes the semantics of EOS, because it is done as a final-prob, not an arc, so the phi-matcher does not naturally handle it. (In k2 this kind of problem could not happen because final-probs are done as a special transition to the superfinal-state, so we wouldn't need special-purpose code).
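To make that concrete, here is a rough sketch of the pattern being described. It assumes Kaldi's fstext helpers PhiCompose() and PropagateFinal() with the signatures I remember from fstext/fstext-utils.h, so treat it as illustration rather than the PR's actual code:

```c++
#include "fst/fstlib.h"
#include "fstext/fstext-utils.h"  // PhiCompose(), PropagateFinal() (assumed location)

// When composing against a backoff-LM-like graph with phi (backoff) arcs,
// final-probs must first be propagated down the phi arcs, because the
// phi-matcher only follows arcs, not final-probs.  This is the step that
// makes all states of the right-hand FST final.
void ComposeMaybePhi(const fst::StdFst &left, fst::StdVectorFst *right,
                     int phi_label, fst::StdVectorFst *out) {
  if (phi_label > 0) {
    fst::PropagateFinal(phi_label, right);      // fix EOS / final-prob semantics
    fst::PhiCompose(left, *right, phi_label, out);
  } else {
    fst::Compose(left, *right, out);
  }
}
```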
Hi @vesis84, the main concern is that, from a software engineering point of view, you are trying to commit code with a commented-out block that you apparently don't understand.
Also, I thought it was odd that you say PropagateFinal makes all states final, since fst2 is expected to be a single-state FST based on what you wrote; in that case I would expect you to set the only state in the FST to be final anyway. Unfortunately, I don't know how phi-symbol matching works, so I can't really say more about what your specific problem may be.
In FST composition with the ρ-matcher, when there are ε symbols, the output of the composition should be the same regardless of whether ε is removed or not. On the other hand, for me the composition ran faster when I removed ε from the boosting graph B in the HCLG ∘ B composition.
It might be that there are many ε symbols on the HCLG output labels, and it is better to remove ε from B so that only one of the composition arguments has ε.
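A small sketch of the preparation step described above (generic OpenFst, not the PR's code; the function name is made up):

```c++
#include <fst/fstlib.h>

// Remove epsilons from the small boosting graph B so that only one side of
// the HCLG o B composition carries epsilons, then ilabel-sort B for the
// composition matcher.  B is tiny, so both passes are cheap.
void PrepareBoostingGraph(fst::StdVectorFst *B) {
  fst::RmEpsilon(B);
  fst::ArcSort(B, fst::ILabelCompare<fst::StdArc>());
}
```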
```c++
timer_compose.Reset();

// RmEpsilon saved 30% of composition runtime...
// - Note: we are loading 2-state graphs with eps back-link to the initial state.
```
I thought that these are one-state graphs, with a bunch of self-loops on that single state.
This is another reason why I would want to see a recipe to create an appropriate boosting FST.
Well, it could be 1 state; I made it 2 states with an <eps> link back to state 1 so that the figures look better.
Semantically, it is identical...
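Since a recipe for the boosting FST was requested above, here is a purely illustrative sketch of the single-state variant; the word-ids and the bonus value are made up, and this is not the graph format shipped with the PR:

```c++
#include <fst/fstlib.h>
#include <vector>

// One final state with a self-loop per boosted word; the negative cost acts
// as a bonus for those words during the HCLG o B composition.
fst::StdVectorFst MakeBoostingGraph(const std::vector<int> &boosted_word_ids,
                                    float bonus) {
  fst::StdVectorFst B;
  int s = B.AddState();
  B.SetStart(s);
  B.SetFinal(s, fst::TropicalWeight::One());
  for (int w : boosted_word_ids)
    B.AddArc(s, fst::StdArc(w, w, fst::TropicalWeight(-bonus), s));
  return B;
}
```

In practice such a graph also needs a way to pass non-boosted words through (e.g. a rho/sigma catch-all, or loops over the whole vocabulary), which is exactly what the rho-composition discussion later in this thread is about.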
Is it a good idea to merge this PR as-is, without an example WFST that this is known to work with? Think about how there was an example recipe with the lookahead decoding recipe PR.

@LvHang @kkm000 I suspect something much, much simpler is going on. FST composition creates a VectorFst as output, and that VectorFst is used only once (to decode a single utterance). Why convert to ConstFst if you're only going to run the FST just once? Keep in mind that decoding with FSTs typically reads <10% of the FST states, so the overhead of conversion (which touches 100% of states and 100% of arcs) is not worth it here.
Hi, thank you for the feedback. I adopted some of the suggestions. Some other suggestions point into the 'old' code that was taken, copied, and modified. Another thing is that sometimes the code seems 'over-explicit'. I'll process your inputs and re-open later.
```c++
// latbin/lattice-compose-fsts.cc

// Copyright 2020 Brno University of Technology; Microsoft Corporation
```
Was this file actually intended to be added? It seems to be the same as lattice-compose, with some relatively small changes and essentially the same interface; I don't understand why it is a new program instead of a modification to the old one.
Yes, they are similar. The original lattice-compose composes 'N lattices' with 'N lattices', or 'N lattices' with '1 FST'...
Yes, it could be told externally whether the rspecifier holds 'N lattices' (the default) or 'N FSTs', with some boolean option like '--read-fsts-from-rxspec', but that would lead to a more complicated interface, which goes against the 'copy and modify, to keep things clear' principle we used in the past...
And yes, I'll do the change if it is needed.

```c++
"The FST weights are interpreted as \"graph weights\" when converted into the Lattice format.\n"
"\n"
"Usage: lattice-compose-fsts [options] lattice-rspecifier1 "
"(fst-rspecifier2|fst-rxfilename2) lattice-wspecifier\n"
```
I am thinking this can be done with a string arg called e.g. "--compose-with-fst", defaulting to "auto" which is the old behavior, meaning: rspecifier=lats, rxfilename=FST; and true/True or false/False is FST or lattice respectively.
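A sketch of how that option might look with Kaldi's ParseOptions; the option name and semantics are taken from the suggestion above, and nothing here is merged code:

```c++
#include <string>
#include "util/parse-options.h"

int main(int argc, char *argv[]) {
  using namespace kaldi;
  ParseOptions po("Sketch of the proposed --compose-with-fst option.");
  // "auto": old behavior (rspecifier2 = lattices, rxfilename2 = single FST);
  // "true"/"True": archive of FSTs; "false"/"False": archive of lattices.
  std::string compose_with_fst = "auto";
  po.Register("compose-with-fst", &compose_with_fst,
              "Whether argument 2 holds FSTs ('true'), lattices ('false'), "
              "or is decided by its type ('auto').");
  po.Read(argc, argv);
  // ... the rest of lattice-compose would branch on 'compose_with_fst' here.
  return 0;
}
```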
I'd like to consult about this PropagateFinal(phi_label, fst2).
Is it needed there?
I did not want its behavior of making all states in the FST graph (the 2nd arg of FST composition) final, and the contextual adaptation was not working well because of that...
Maybe it is helpful when composing lattices with lattices, to make them 'match' better and avoid empty outputs?
Could it also be disabled by a CLI option (--propagate-final=false)?
I'd rather modify the existing program if it can be done while clearly not affecting existing usages, just to avoid bloat.
The final propagation is needed to get the expected behavior when the FST represents a backoff n-gram language model: without it, the proper semantics of backing off to lower-order states does not happen for final-probs. I'm not sure what your topology is... perhaps you can describe it?
Readability = good. I have not seen anyone annoyed because he had too little trouble reading code :) I've been annoyed by unreadable code myself. Unless you are referring to the comment about a stack object destroyed at end of scope, which is really excessive, I'd say. One needs to know basic C++ to read code.

I think that we should do what we say we do and follow the coding style, even though we have old code which doesn't, and fix the old code when we touch it. Or stop saying that we have the standard. Being of two minds in this respect is the worst. One's logical order of includes is another's illogical order. Alphabetical order is unambiguous. Imagine I edit something unrelated in this file and also casually rearrange includes because I think my order is more logical. You then rearrange them again. The last thing I want to see is a battle of commits. Unless you have an idea how to unambiguously standardize the "logical order," it's better to stick tight to the written guidelines.

One day I'll unleash clang-format on the codebase, and it will ultimately squish all style arguments. Which is a good thing, as I want to spend the first day of the rest of my life kaldying, not arguing about our coding style. :)
That ClassifyRspecifier() call has no side effects, at least if the last 2 args are null, so that should be OK.
…On Wed, Jun 23, 2021 at 3:04 AM kkm000 ***@***.***> wrote:
In src/nnet3bin/nnet3-latgen-faster-compose.cc
<#4571 (comment)>:
```diff
+ if (word_syms_filename != "") {
+   word_syms.reset(fst::SymbolTable::ReadText(word_syms_filename));
+   if (!word_syms)
+     KALDI_ERR << "Could not read symbol table from file "
+               << word_syms_filename;
+ }
+
+ double tot_like = 0.0;
+ kaldi::int64 frame_count = 0;
+ int num_success = 0, num_fail = 0;
+ // this compiler object allows caching of computations across
+ // different utterances.
+ CachingOptimizingCompiler compiler(am_nnet.GetNnet(),
+                                    decodable_opts.optimize_config);
+
+ KALDI_ASSERT(ClassifyRspecifier(hclg_fst_rxfilename, NULL, NULL) == kNoRspecifier);
```
Oh. This is a bug. Don't put any code inside the ASSERT. It compiles to
nothing if NDEBUG is defined.
https://github.com/kaldi-asr/kaldi/blob/9d235864c3105c3b72feb9f19a219dbae08b3a41/src/base/kaldi-error.h#L184-L194
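For readers following along, a minimal sketch of the pattern that remark implies. The helper is hypothetical; as noted above, the actual call is harmless because ClassifyRspecifier has no side effects, but a check that must always run should not live inside KALDI_ASSERT:

```c++
#include <string>
#include "base/kaldi-error.h"   // KALDI_ERR, KALDI_ASSERT
#include "util/kaldi-table.h"   // ClassifyRspecifier(), kNoRspecifier

// KALDI_ASSERT compiles to nothing when NDEBUG is defined, so keep the call
// outside the macro and fail explicitly if the check matters in release builds.
void CheckIsRxfilename(const std::string &hclg_fst_rxfilename) {
  kaldi::RspecifierType t =
      kaldi::ClassifyRspecifier(hclg_fst_rxfilename, NULL, NULL);
  if (t != kaldi::kNoRspecifier)
    KALDI_ERR << "Expected an rxfilename (single FST), got an rspecifier: "
              << hclg_fst_rxfilename;
}
```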
Okay, I give up for the moment. And Dan's comment about extending ... One more thing: when I was going through Google papers, I noticed they are using the sigma-matcher for contextual adaptation. I am not sure yet how difficult it would be to add support for sigma-composition to Kaldi for composing ...
@vesis84 I am quite interested in this feature, and appreciate you making the PR. Regardless of whether it gets merged right now, I would be curious to see a full example including the "adaptation graph" FST. Even seeing a pre-prepared FST would be enlightening, if you don't want to release the code to generate them.
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.
- This is a follow-up of kaldi-asr#4571.
- Refactoring 'lattice-compose.cc' to support composition with an ark of FSTs, so that it is done as Dan suggested before: "I am thinking this can be done with a string arg called e.g. "--compose-with-fst", defaulting to "auto" which is the old behavior, meaning: rspecifier=lats, rxfilename=FST; and true/True or false/False is FST or lattice respectively."
- I added the possibility of rho-composition, which is useful for biasing lattices with word-sequences. Thanks to rho-composition, the biasing graph does not need to contain all words from the lexicon.
- Would you be interested in an example of how to use this? (i.e. create graphs from a text file with a Python script using OpenFst as a library, but that would require changing the OpenFst build to enable the Python extensions)
- Also, which 'egs' recipe would it be convenient to use it with?
I'm going to see if I can find someone to address the issues in this PR. I think it is an important topic that we should pursue.
… support RhoMatcher (#4692)

* Extending 'lattice-compose.cc' to compose with an ark of FSTs.
  - This is a follow-up of #4571.
  - Refactoring 'lattice-compose.cc' to support composition with an ark of FSTs, so that it is done as Dan suggested before: "I am thinking this can be done with a string arg called e.g. "--compose-with-fst", defaulting to "auto" which is the old behavior, meaning: rspecifier=lats, rxfilename=FST; and true/True or false/False is FST or lattice respectively."
  - I added the possibility of rho-composition, which is useful for biasing lattices with word-sequences. Thanks to rho-composition, the biasing graph does not need to contain all words from the lexicon.
  - Would you be interested in an example of how to use this? (i.e. create graphs from a text file with a Python script using OpenFst as a library, but that would require changing the OpenFst build to enable the Python extensions)
  - Also, which 'egs' recipe would it be convenient to use it with?
* lattice-compose.cc, resolving remarks from PR #4692.
* Fixing an issue in std::transform with std::tolower, by suggesting a variant of the overloaded function.
* lattice-compose, extending the rho explanation.
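Not the code from #4692, but a generic OpenFst sketch of what rho-composition looks like, in case it helps readers of this thread; `rho_label` is whatever label the biasing graph uses as its "everything else" catch-all, and the function name is made up:

```c++
#include <fst/fstlib.h>
#include <fst/matcher.h>

// Compose 'lat_fst' with a biasing graph 'B' whose catch-all arcs carry
// 'rho_label'.  The RhoMatcher rewrites the rho arc to the actually matched
// label, so B does not need to enumerate the whole lexicon.  B must be
// ilabel-sorted for the SortedMatcher underneath.
fst::StdVectorFst RhoComposeSketch(const fst::StdVectorFst &lat_fst,
                                   const fst::StdVectorFst &B,
                                   int rho_label) {
  using FST = fst::Fst<fst::StdArc>;
  using RM = fst::RhoMatcher<fst::SortedMatcher<FST>>;
  fst::ComposeFstOptions<fst::StdArc, RM> opts;
  opts.matcher1 = new RM(lat_fst, fst::MATCH_NONE);        // left side: no rho
  opts.matcher2 = new RM(B, fst::MATCH_INPUT, rho_label);  // right side: rho on input
  fst::ComposeFst<fst::StdArc> lazy(lat_fst, B, opts);     // takes ownership of matchers
  return fst::StdVectorFst(lazy);                          // expand the lazy result
}
```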
Adding suggestions from code review. Co-authored-by: Cy 'kkm' Katsnelson <[email protected]>
Hi @vesis84, why? This is a very useful feature. Was it merged in another PR?
Hi, the remainder is the HCLG boosting part. It worked for me in the paper with noisy input data: ...

I was doing the composition here: ...

The HCLG boosting can also be dangerous: if the boosting values are too big, ...

Are you sure you want to continue with the HCLG boosting?

K.
… On Tue, May 10, 2022 at 11:06 AM Karel Vesely ***@***.***> wrote:
Reopened #4571 <#4571>.
I think that boosting individual words would be a great thing already! We
used Kaldi decoder to automate customer service for multiple clients,
including clothing manufacturers and road assistance. The road assistance
application sometimes spit out stuff like "I need to change attire." A
simple lexicon boosting would have resolved that.
More significant HCLG changes on the fly would be likely expensive.
yeah, I agree
@vesis84 do you have the bandwidth to work on this? It would be a welcome feature.
Adding code for doing 'lattice boosting' and 'HCLG boosting', to do on-the-fly contextual adaptation by WFST composition.