wrong tabulation in window mode #79

Open
sjanssen2 opened this issue Apr 27, 2021 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@sjanssen2
Member

sjanssen2 commented Apr 27, 2021

In principle, a given sub-word should always evaluate to the same value. However, there are rare cases where this does not hold. Currently, these are hidden in the rnalib.c library. The energy functions hl_energy, dl_energy, dr_energy and ext_mismatch_energy do not only operate on the provided terminals, but also look ahead and back in the input to identify the base that dangles onto the stem (the stem is the provided terminal). Typically, this is the character directly adjacent to the stem in the input sequence.

aaaACcaaagGaaa
...*({...})...

dl_energy(<4,5>, x, <10,11>): the closing stem is <4,11>=CcaaagG (structure: ({...})), and the dangling base is <3,4>=A (structure: *).

However, if we are in an alignment, the next/previous character might be a gap, so the energy function needs to continue its search:

aaaACcaaagGaaa#
-a--CcaaagGaaa#
...*({...})...

dl_energy for the second sequence will identify <1,2> instead of <3,4> as the dangling base. This is correct in general.
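
The gap-skipping search described above can be sketched as follows. This is a minimal Python illustration, not the code in rnalib.c: the function name `find_left_dangle` and the use of `-` as the gap character are assumptions made for this example. Positions are 0-based, so the sub-word <i,j> corresponds to `seq[i:j]`.

```python
GAP = "-"  # assumed gap character, as in the alignment rows above


def find_left_dangle(seq, stem_start):
    """Return the index of the nearest non-gap character left of stem_start.

    Returns None if only gaps (or nothing) lie to the left.
    Hypothetical helper; rnalib.c may implement the search differently.
    """
    pos = stem_start - 1
    while pos >= 0 and seq[pos] == GAP:
        pos -= 1  # skip gap columns
    return pos if pos >= 0 else None


row1 = "aaaACcaaagGaaa"
row2 = "-a--CcaaagGaaa"

# The stem starts at position 4 (sub-word <4,5> = "C").
left1 = find_left_dangle(row1, 4)  # 3, i.e. sub-word <3,4> = "A"
left2 = find_left_dangle(row2, 4)  # 1, i.e. sub-word <1,2> = "a"
```

Without any bound on the search, the second row reaches back across two gap columns, which is exactly the behaviour that becomes a problem once windows are involved.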

But if we generate code for --window-mode we need to make sure that dl_energy cannot look outside of its current window. Consider the three windows 1, 2 and 3:

aaaACcaaagGaaa#
-a--CcaaagGaaa#
...*({...})...
11111111111
 22222222222
  33333333333

In window 1, dl_energy will still identify <1,2> as the dangling base. The same holds for window 2. But window 3 does not have a dangling base at all: it would be wrong to use <1,2> for dangling, since this base is not within window 3.
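
A window-aware variant of the search could look like the sketch below. It is only an illustration under assumptions: the helper `find_left_dangle_in_window`, its `window_start` parameter, and the `-` gap character are hypothetical and not taken from rnalib.c. The window start positions 0, 1 and 2 are read off the diagram above.

```python
GAP = "-"  # assumed gap character


def find_left_dangle_in_window(seq, stem_start, window_start):
    """Search left of stem_start for a non-gap base, never leaving the window.

    Returns the index of the dangling base, or None if the window
    contains no base to the left of the stem. Hypothetical helper.
    """
    pos = stem_start - 1
    while pos >= window_start and seq[pos] == GAP:
        pos -= 1  # skip gap columns, but stop at the window border
    return pos if pos >= window_start else None


row2 = "-a--CcaaagGaaa"
window_starts = {1: 0, 2: 1, 3: 2}  # window number -> first position

dangles = {
    window: find_left_dangle_in_window(row2, 4, start)
    for window, start in window_starts.items()
}
# dangles == {1: 1, 2: 1, 3: None}
```

Windows 1 and 2 both find the base at position 1, while window 3 finds none, so the same stem sees a different context depending on where the window lies.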

If we now consider that sub-words of windows are stored in tables to speed up computation, we see that dl_energy behaves differently for the sub-word <3,11> depending on the window, which means that the same sub-word may evaluate to different values depending on the position of the window. If this is the case, backtracing will break, since it first evaluates the "score" of each sub-word in the forward phase and depends on re-identifying this score in the backtrace phase.
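
The effect on tabulation can be made concrete with a toy model. Everything in the sketch below is an assumption for illustration: the energy values are made up, the table is a plain dictionary keyed by sub-word, and `score` stands in for an algebra function that calls dl_energy. For simplicity it uses the stem sub-word <4,11> as the table key. It is not how the generated code is structured.

```python
GAP = "-"            # assumed gap character
STEM_ENERGY = -300   # made-up value for the stem
DANGLE_ENERGY = -50  # made-up bonus if a dangling base exists


def left_dangle(seq, stem_start, window_start):
    """Window-bounded search for the left dangling base (hypothetical)."""
    pos = stem_start - 1
    while pos >= window_start and seq[pos] == GAP:
        pos -= 1
    return pos if pos >= window_start else None


def score(seq, i, j, window_start):
    """Toy score of the stem on sub-word <i,j>, evaluated inside one window."""
    has_dangle = left_dangle(seq, i, window_start) is not None
    return STEM_ENERGY + (DANGLE_ENERGY if has_dangle else 0)


table = {}  # keyed by sub-word only and shared by all windows


def tabulated_score(seq, i, j, window_start):
    """Look up <i,j> in the table; compute and store it on first use."""
    if (i, j) not in table:
        table[(i, j)] = score(seq, i, j, window_start)
    return table[(i, j)]


row2 = "-a--CcaaagGaaa"
forward_w1 = tabulated_score(row2, 4, 11, window_start=0)  # computed and stored
forward_w3 = tabulated_score(row2, 4, 11, window_start=2)  # reused from window 1
fresh_w3 = score(row2, 4, 11, window_start=2)              # value window 3 should see
mismatch = forward_w3 != fresh_w3                          # True in this toy model
```

In this toy model the table entry filled in window 1 includes the dangle contribution, whereas a fresh evaluation in window 3 does not. A backtrace that recomputes the score in window 3 would therefore fail to re-identify the tabulated value.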

scratch that:
We can avoid this issue by explicitly not tabulating non-terminals whose production rules contain affected energy functions in their algebra functions. However, it would be great to
a) warn the user about these specific circumstances when compiling
b) provide a mechanism for a blacklist of non-terminals that are not to be tabulated.

Not tabulating the directly affected non-terminal only defers the issue to the next non-terminal above it, which has to re-use the result of the first non-terminal but is itself tabulated to obtain a DP scheme. I've also checked that not applying the choice function is subject to the same argument.

Thus, I currently see no way to prevent this effect within window mode.
