Fixes #68: in the binary quiz case, updateRecall could misbehave, either returning very wrong answers or throwing exceptions. The fix is to calculate moments in the log domain.
If you're testing with old quizzes, this version's output will differ from the previous one for models in the affected regions. Compare:
import ebisu
print(ebisu.updateRecall((531, 531, 37.98), 0, 1, 24.0))
# old: (36.55688622754491, 36.886227544910184, 38.089740065719965)
# new: (531.9583078300888, 531.9583078290626, 37.920753773390835)
(We actually already figured this out in the JavaScript version: fasiha/ebisu.js#21 and fasiha/ebisu.js#24.)
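To see why working in the log domain helps, here is a minimal sketch (not Ebisu's actual code) that computes the mean of a Beta(α, β) distribution as a ratio of Beta functions, B(α+1, β)/B(α, β). The helper names betaln, beta_mean_log_domain and gamma_overflows are hypothetical, and only the Python standard library is assumed. With parameters as large as 531, evaluating the Gamma function directly overflows a double, while the log-domain route stays finite.

```python
import math


def betaln(a: float, b: float) -> float:
    """Log of the Beta function, via log-gamma so nothing overflows."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_mean_log_domain(a: float, b: float) -> float:
    """Mean of Beta(a, b), computed as exp(log B(a+1, b) - log B(a, b))."""
    return math.exp(betaln(a + 1, b) - betaln(a, b))


def gamma_overflows(x: float) -> bool:
    """Report whether evaluating Gamma(x) directly overflows a double."""
    try:
        math.gamma(x)
    except OverflowError:
        return True
    return False


alpha = beta = 531.0
print(gamma_overflows(alpha))             # direct evaluation is unusable here
print(beta_mean_log_domain(alpha, beta))  # close to 0.5, since alpha == beta
```

The subtraction of two log-Beta values replaces a ratio of two astronomically large numbers, which is the general idea behind computing moments in the log domain.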
updateRecall can now take floating-point quizzes between 0 and 1 (inclusive) as successes, to handle the case when your quiz isn’t quite correct or wasn’t fully incorrect. Varying this number between 0 and 1 will smoothly vary the halflife of the updated model. Under the hood, there's a noisy-Bernoulli statistical model: check for the math here.
A new function has been added to the API, rescaleHalflife, for those cases when the halflife of a flashcard is just wrong and you need to multiply it by ten (so you see it less often) or divide it by two (so you see it more often).
updateRecall will by default rebalance models, so that updated models will have α=β (to within machine precision) and t will be the halflife.
(This does have a performance impact, so I may have to flip the default to not always rebalance in a future release if this turns out to be problematic.)
(This means running updateRecall in 2.1.0 with the same inputs will yield different numbers than 2.0.0. However, the differences are statistically very minor: they both represent very nearly the same probabilistic belief on recall.)
Closes long-standing issues #23 and #31—thank you to all participants who weighed in, offered advice, and waited patiently.
The API for updateRecall has changed because boolean results don't make sense for quiz apps that have a sense of "review sessions" in which the same flashcard can be reviewed more than one time, e.g., if a review session consists of conjugating the same verb twice. Therefore, updateRecall accepts two integers: successes, the number of times the user correctly produced the memory encoded in this flashcard, out of total, the number of times it was presented to the user.
The old behavior can be recovered by setting total=1, with successes=1 upon success and successes=0 upon failure.
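As a small illustration of that mapping, the sketch below converts an old-style boolean result into the (successes, total) pair. The helper name bool_to_counts is hypothetical and not part of Ebisu.

```python
from typing import Tuple


def bool_to_counts(passed: bool) -> Tuple[int, int]:
    """Map an old-style boolean quiz result to (successes, total)."""
    return (1 if passed else 0, 1)


successes, total = bool_to_counts(True)
print(successes, total)  # 1 1
```

The two values would then be passed to updateRecall in place of the old boolean argument.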
The memory models from previous versions remain fully compatible with this update.
While this new feature allows more freedom in designing quiz applications, it does open up the possibility of numerical instability when the function receives a very surprising input. Please wrap calls to updateRecall in a try block to gracefully handle this possibility, and get in touch in case it happens to you a lot.
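One way to follow this advice is sketched below. The wrapper safe_update and the stand-in failing_update are hypothetical names; the update function is passed in as a parameter so the example runs without Ebisu installed, and in an app you would pass ebisu.updateRecall instead. Catching the broad Exception class is an assumption, since the exact exception types are not listed here.

```python
from typing import Callable, Tuple

Model = Tuple[float, float, float]  # (alpha, beta, t)


def safe_update(update_fn: Callable[..., Model], prior: Model, successes: int,
                total: int, tnow: float) -> Tuple[Model, bool]:
    """Call update_fn; if it raises, keep the prior model and return ok=False."""
    try:
        return update_fn(prior, successes, total, tnow), True
    except Exception as err:  # assumption: exception types are not documented here
        print(f"update failed, keeping prior model: {err!r}")
        return prior, False


def failing_update(prior, successes, total, tnow):
    """Stand-in for an update that hits numerical instability."""
    raise ArithmeticError("simulated numerical instability")


model, ok = safe_update(failing_update, (3.0, 3.0, 24.0), 0, 5, 0.1)
print(model, ok)
```

Falling back to the prior model is only one possible policy; logging the inputs that triggered the failure makes it easier to report the problem.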
Breaking changes:
- predictRecall returns log-probabilities by default, as a computational speedup. These are numbers between -∞ and 0 (log(0) being -∞ and log(1) being 0). The returned values can still be sorted, and the lowest value corresponds to the lowest recall probability. Use exact=True to get true probabilities (at the cost of an exp function evaluation).
- The half-life function is now named modelToPercentileDecay and has a new API.
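The point about sorting can be checked with a few made-up numbers. In the sketch below, the card names and recall probabilities are invented for illustration, and no Ebisu call is made: the logarithms simply stand in for what predictRecall would return by default.

```python
import math

# Hypothetical log-probabilities for three flashcards.
log_recalls = {
    "card_a": math.log(0.9),
    "card_b": math.log(0.2),
    "card_c": math.log(0.5),
}

# Sorting by log-probability gives the same order as sorting by probability,
# because exp is monotonically increasing.
by_log = sorted(log_recalls, key=log_recalls.get)
by_prob = sorted(log_recalls, key=lambda k: math.exp(log_recalls[k]))
print(by_log)             # ['card_b', 'card_c', 'card_a']
print(by_log == by_prob)  # True

# Only the card you are about to show needs the exp call.
most_at_risk = by_log[0]
print(most_at_risk, math.exp(log_recalls[most_at_risk]))
```

Skipping the exp call for every card except the one being displayed is where the speedup comes from.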
Robert Kern's discovery that Beta random variables, when time-traveled through Ebbinghaus’ exponential decay function, transform into GB1 random variables, which have analytic moments, was a major breakthrough. His contribution to this update, in code and ideas and time, cannot be overstated.
With the GB1 mathematical infrastructure, I was able to completely rethink the update step. Both passing and failing a quiz yield exact analytical moments of the posterior over any time horizon, not just when the test was taken. These are fit to a Beta at the very last minute. There is also a rebalancing step (which Robert foreshadowed in the GitHub issue above as a “telescoping” posterior), wherein if one of the Beta’s parameters is large compared to the other, the update is rerun at the approximate half-life of the original unbalanced posterior fit.
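Fitting a Beta to known moments can be illustrated with the textbook method of moments. The sketch below is a generic version of that technique under the assumption that only the mean and variance are matched; the function names are hypothetical and this is not claimed to be Ebisu's internal code.

```python
from typing import Tuple


def beta_moments(alpha: float, beta: float) -> Tuple[float, float]:
    """Mean and variance of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return alpha / s, alpha * beta / (s * s * (s + 1))


def mean_var_to_beta(mean: float, var: float) -> Tuple[float, float]:
    """Find (alpha, beta) of the Beta with the given mean and variance."""
    if not 0 < mean < 1 or not 0 < var < mean * (1 - mean):
        raise ValueError("no Beta distribution has these moments")
    common = mean * (1 - mean) / var - 1  # equals alpha + beta
    return mean * common, (1 - mean) * common


# Round trip: moments of Beta(3, 4) should recover (3, 4) up to rounding.
mean, var = beta_moments(3.0, 4.0)
print(mean_var_to_beta(mean, var))
```

A fit like this is exact only in the first two moments, which is why postponing it until the very last step keeps the intermediate computations analytical.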
All of these changes are transparent to the user, who will just see more accurate behavior in extreme over- and under-reviewing.
This version was tested by a couple of users of the Curtiz app (including the developer of Ebisu). Its updateRecall function returned models at the test time.