In principle, a given sub-word should always evaluate to the same value. However, there are rare cases where this does not hold. Currently, these are hidden in the rnalib.c library. The energy functions hl_energy, dl_energy, dr_energy and ext_mismatch_energy operate not only on the provided terminals, but also look ahead and back in order to identify the base that dangles onto the stem (the stem is the provided terminal). Typically, this is the very next character in the input sequence.
aaaACcaaagGaaa
...*({...})...
dl_energy(<4,5>, x, <10,11>): closing stem is <4,11>=CcaaagG (structure: ({...})), dangling base is <3,4>=A (structure *).
However, if we are in an alignment, the next/previous character might be a GAP (-), and thus the energy function needs to continue its search:
aaaACcaaagGaaa#
-a--CcaaagGaaa#
...*({...})...
dl_energy for the second sequence will identify <1,2> instead of <3,4> as the dangling base. This is correct in general.
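The gap-skipping search described above can be sketched as follows. This is a toy model in Python, not the code from rnalib.c; the function name left_dangle_index and the convention of returning None are assumptions made for illustration.

```python
GAP = "-"

def left_dangle_index(row, stem_start, gap=GAP):
    """Return the index of the nearest non-gap character left of stem_start.

    Hypothetical helper: scans leftwards over gap characters and returns
    None if the start of the row is reached without finding a base.
    """
    k = stem_start - 1
    while k >= 0 and row[k] == gap:
        k -= 1
    return k if k >= 0 else None

row1 = "aaaACcaaagGaaa#"
row2 = "-a--CcaaagGaaa#"

# The stem starts at index 4 in both rows.
print(left_dangle_index(row1, 4))  # 3, i.e. sub-word <3,4>
print(left_dangle_index(row2, 4))  # 1, i.e. sub-word <1,2>
```

For the ungapped row the dangling base is the direct neighbour; for the gapped row the search has to skip two gap columns first.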
But if we generate code for --window-mode we need to make sure that dl_energy cannot look outside of its current window. Consider the three windows 1, 2 and 3:
In window 1, dl_energy will still identify <1,2> as the dangling base. The same holds for window 2. But window 3 does not have a dangling base at all. It would be wrong to use <1,2> for dangling, since this base is not within window 3.
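A minimal sketch of a window-aware search, assuming that the three windows differ only in their left border. The borders 0, 1 and 2 used here are hypothetical, as are the function name and the window_start parameter; none of this is taken from rnalib.c.

```python
def left_dangle_in_window(row, stem_start, window_start, gap="-"):
    """Like a plain leftward gap-skipping search, but never leaves the window.

    Returns the index of the dangling base, or None if no base exists
    between window_start and the stem.
    """
    k = stem_start - 1
    while k >= window_start and row[k] == gap:
        k -= 1
    return k if k >= window_start else None

row2 = "-a--CcaaagGaaa#"

# Assumed left borders of windows 1, 2 and 3.
for window_start in (0, 1, 2):
    print(window_start, left_dangle_in_window(row2, 4, window_start))
# 0 1
# 1 1
# 2 None
```

With the first two borders the base at <1,2> is still visible; with the third border the search stops at the window edge and reports that there is no dangling base.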
If we now consider that sub-words of windows are stored in tables to speed up computation, we see that the sub-word <3,11> causes different behaviour of dl_energy: it may evaluate to different values depending on the position of the window. If this is the case, backtracing breaks, since the computation first evaluates the "score" of each sub-word in the forward phase and depends on re-identifying this score in the backtrace phase.
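The conflict between tabulation and window-dependent energies can be illustrated with a toy table that is keyed by the sub-word only. The energy values, the function names and the window borders are all made up for this sketch; it only shows the mechanism, not the behaviour of the generated code.

```python
STEM_ENERGY = -200   # hypothetical energy of the stem <4,11>
DANGLE_BONUS = -30   # hypothetical contribution of a dangling base

def left_dangle_in_window(row, stem_start, window_start, gap="-"):
    """Leftward gap-skipping search that stops at the window border."""
    k = stem_start - 1
    while k >= window_start and row[k] == gap:
        k -= 1
    return k if k >= window_start else None

def subword_score(row, i, j, window_start):
    """Toy score of sub-word <i,j>, assuming the stem starts at i + 1."""
    dangle = left_dangle_in_window(row, i + 1, window_start)
    return STEM_ENERGY + (DANGLE_BONUS if dangle is not None else 0)

table = {}

def tabulated_score(row, i, j, window_start):
    """Memoised score; the key ignores the window position."""
    key = (i, j)
    if key not in table:
        table[key] = subword_score(row, i, j, window_start)
    return table[key]

row2 = "-a--CcaaagGaaa#"

forward = tabulated_score(row2, 3, 11, window_start=0)  # computed and stored
fresh = subword_score(row2, 3, 11, window_start=2)      # recomputed for a later window
reused = tabulated_score(row2, 3, 11, window_start=2)   # stale table entry

print(forward, fresh, reused)  # -230 -200 -230
```

The table entry for <3,11> carries the dangle contribution from an earlier window, while a fresh evaluation in the later window does not, so a backtrace that recomputes the score cannot re-identify the tabulated value.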
Scratch that: We can avoid this issue by explicitly not tabulating non-terminals whose production rules contain affected energy functions in their algebra functions. However, it would be great to
a) warn the user of these specific circumstances when compiling
b) provide a mechanism for a blacklist of non-terminals that are not to be tabulated.
Not tabulating the directly affected non-terminal only defers the issue to the next non-terminal above it, which has to re-use the result of the first non-terminal but is itself tabulated to obtain a DP scheme. I've also checked that not applying the choice function runs into the same problem.
Thus, I currently see no way to prevent this effect within window mode.