Design Considerations for OS Libraries (Life Modelling Problem) #33
Replies: 18 comments
-
Great summary! Let me address the challenge in @lewisfogden's comment on LinkedIn.
I had the same challenge when developing modelx. On Linux,
-
@fumitoh, that seems like a hack to get around Python's lack of built-in tail call optimization, but it doesn't solve the underlying issue. Have you looked at trampolines or other ways to implement that optimization instead? How about scrapping recursion entirely?
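For context, by a trampoline I mean something like this minimal sketch (illustrative only, not tied to modelx; the function names are made up): instead of recursing directly, the function returns a thunk, and a small driver loop keeps invoking thunks until a plain value comes back, so the Python call stack never grows.

```python
# Minimal trampoline sketch (illustrative only).
def trampoline(fn, *args):
    """Keep calling returned thunks until a non-callable value appears."""
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def cashflow(t):
    return 100.0  # placeholder cashflow

def account_value(t, acc=0.0):
    # Tail-recursive style: return a thunk instead of recursing directly.
    if t == 0:
        return acc
    return lambda: account_value(t - 1, acc + cashflow(t))

print(trampoline(account_value, 100_000))  # no RecursionError despite the depth
```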
-
@fumitoh said:
@Houstonwp said:
I've developed heavymodel with novice programmers in mind, so TCO/trampolines are probably a step too far (at least for me to implement!). I've decided to accept the recursive nature of the model, and to look for ways of minimising the risk of stack overflows. My thinking for heavymodel has been:

```python
def total_sick(t):
    return sum(sick_by_dur(t, dur) for dur in range(t))
```

In practice you will usually want to see aggregated results (e.g. the net cashflow), so I think this risk is low. The user could do a manual projection to avoid this. As to coming up with an alternative without recursion - that might need a completely different way of thinking about these kinds of problems (i.e. multi-state models). You could do a loop with a lot of state information, but this becomes messy (and code order is vital), and it is close to recursion:

```python
# initialise state 0
if0, d0, l0 = 1, 0, 0  # in force, deaths, lapses
for t in range(N):
    d1 = q[t] * if0
    l1 = w[t] * if0
    if1 = if0 - d1 - l1
    # store states here, e.g. assign to dict/list
    if0, d0, l0 = if1, d1, l1  # move everything along one
```
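To illustrate the "manual projection" idea (a toy sketch using `functools.lru_cache`, not heavymodel's actual API; the functions and rates are placeholders): if the cache is warmed in increasing time order, each recursive call only ever needs to look one step back, so the stack stays shallow even for projections longer than Python's recursion limit.

```python
from functools import lru_cache

N = 1200  # projection length beyond Python's default recursion limit

@lru_cache(maxsize=None)
def in_force(t):
    # Recursive definition: each value depends only on the previous period.
    if t == 0:
        return 1.0
    return in_force(t - 1) * (1 - q(t - 1) - w(t - 1))

def q(t):
    return 0.001  # placeholder mortality rate

def w(t):
    return 0.05   # placeholder lapse rate

# "Manual projection": evaluate in time order so each call finds its
# predecessor already cached and never recurses more than one level deep.
for t in range(N):
    in_force(t)

print(in_force(N - 1))
```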
-
@lewisfogden thanks for sharing this. I've been working on flow management for projection models for quite a while now, and I find it extremely interesting to hear other people's ideas. For me the most challenging part is how to manage the flow of bigger models that cannot be solved analytically and where each time-dependent function can refer at time T1 to any other function (or to itself) at any other time T2. Where I got to is building a graph of dependencies between functions, additionally marking the dependencies (edges of the graph) as either forward-looking (where function F1 at time T1 refers to function F2 at time T2 and T1 <= T2) or backward-looking (similarly, where function F1 at time T1 refers to function F2 at time T2 and T1 >= T2). Having that graph built, I'm working on heuristics that would know which functions at which times (mostly either t=0 or t=T_MAX) and in what order would need to be called to ensure a minimal number of calls while staying away from stack overflow errors (something like the targeted version you've mentioned, but the list would potentially be time-dependent and would be generated automatically).
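Roughly sketching the kind of graph I mean (just an illustration, not my actual system; the function names and time offsets are made up): each edge records whether the reference looks forward or backward in time, which is what the ordering heuristic would later consume.

```python
# Illustrative only: a tiny dependency graph with edges tagged by direction.
import networkx as nx

# Each entry: calling function -> list of (called function, time offset).
# An offset of -1 means "refers to t-1" (backward-looking), +1 means t+1.
dependencies = {
    "in_force":     [("in_force", -1), ("deaths", -1), ("lapses", -1)],
    "deaths":       [("in_force", 0)],
    "lapses":       [("in_force", 0)],
    "reserve":      [("reserve", +1), ("net_cashflow", 0)],  # discounting looks forward
    "net_cashflow": [("in_force", 0)],
}

g = nx.DiGraph()
for caller, calls in dependencies.items():
    for callee, offset in calls:
        direction = "backward" if offset < 0 else "forward" if offset > 0 else "same_t"
        g.add_edge(caller, callee, offset=offset, direction=direction)

for u, v, data in g.edges(data=True):
    print(f"{u} -> {v}: {data}")
```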
-
@Houstonwp said:
modelx has a dependency trace feature for debugging like this:

```python
>>> net_cashflow.preds(6)
[Model1.sample.premiums(t=6)=49.35859623121889,
 Model1.sample.claims(t=6)=61.69824528902361]

>>> premiums.succs(6)
[Model1.sample.net_cashflow(t=6)=-12.33964905780472]
```

In order to implement the feature, modelx internally implements its own call stack to record callers and callees with their arguments in a networkx directed graph, as well as its own memoization mechanism. This graph is also used to auto-clear the dependents of a node when the node's value has changed. Is it possible to implement a feature to auto-detect tail calls in user code and apply TCO/trampolines where possible, given that the graph creation relies on the internal call stack?
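As a rough illustration of that mechanism (a toy sketch, not modelx's actual implementation; the decorator and names are invented): a memoizing wrapper can maintain its own call stack and record caller/callee pairs, with their arguments, as edges in a directed graph.

```python
# Illustrative sketch only; the real machinery is more involved.
import networkx as nx

call_graph = nx.DiGraph()   # nodes are (function name, args) tuples
_call_stack = []            # explicit stack of nodes currently executing
_cache = {}                 # (function name, args) -> value

def traced(fn):
    def wrapper(*args):
        node = (fn.__name__, args)
        if _call_stack:                     # record who called us
            call_graph.add_edge(_call_stack[-1], node)
        if node in _cache:
            return _cache[node]
        _call_stack.append(node)
        try:
            value = fn(*args)
        finally:
            _call_stack.pop()
        _cache[node] = value
        return value
    return wrapper

@traced
def premiums(t):
    return 100.0

@traced
def claims(t):
    return 60.0

@traced
def net_cashflow(t):
    return premiums(t) - claims(t)

net_cashflow(6)
# callers of premiums(6), i.e. its dependents:
print(list(call_graph.predecessors(("premiums", (6,)))))
```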
-
I added a Rust implementation to the gist earlier today if anyone is interested.
-
For those not following the linked gist, there's been an interesting comparison as different approaches and languages have been used to calculate the sample problem (a simple 10-period NPV). Here's a current comparison of the results:

(results table not reproduced here)
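For readers who haven't opened the gist, the problem is roughly this (my own Python paraphrase based on the loop versions posted later in this thread, not the gist's exact code; the parameter values are placeholders): project in-force, deaths and lapses period by period and discount the net cashflow.

```python
# Rough paraphrase of the sample problem; values are placeholders.
from functools import lru_cache

q = [0.001] * 10                   # mortality rates
w = [0.05] * 10                    # lapse rates
P, S, r = 100.0, 25_000.0, 0.02    # premium, sum assured, interest rate
v = 1 / (1 + r)

@lru_cache(maxsize=None)
def in_force(t):
    # Policies still in force at the start of period t.
    if t == 0:
        return 1.0
    return in_force(t - 1) * (1 - q[t - 1] - w[t - 1])

def net_cashflow(t):
    return in_force(t) * P - in_force(t) * q[t] * S

def npv():
    return sum(net_cashflow(t) * v ** (t + 1) for t in range(len(q)))

print(npv())
```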
-
Thanks @alecloudenback - I've added in Nim recursive (which should be close in speed to C). I would like to see a few more of the recursive versions (as they can be applied to more general problems - the loop only works on a subset), so I will add in Python. @mglinicka: interesting - have you any bi-directional problems (i.e. referring to both t-1 and t+1), or are they all either one or the other (such as projecting or discounting)? If so, would you be able to share an example on a gist? I thought about doing this - with Python I could look at the AST to see what each function was looking up.
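A quick sketch of that AST idea (illustrative only; the projection functions are made up, `ast`/`inspect` are standard library): parse a function's source and collect which names it calls.

```python
# Illustrative sketch: find which functions a model function calls, via the AST.
import ast
import inspect
import textwrap

def survivors(t):
    if t == 0:
        return 1.0
    return survivors(t - 1) - deaths(t - 1)

def deaths(t):
    return survivors(t) * 0.001

def called_names(fn):
    """Return the set of function names referenced in calls inside fn."""
    tree = ast.parse(textwrap.dedent(inspect.getsource(fn)))
    return {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }

print(called_names(survivors))   # {'survivors', 'deaths'}
print(called_names(deaths))      # {'survivors'}
```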
-
@lewisfogden re: Nim - I didn't add Nim to the table because, with the memoization, the timings don't really measure "how fast can I calculate this actuarial problem" so much as "how fast can I hash and look things up". In practice I think it could be a valid technique, so I don't think it's totally unfair to include it in the table. But the question is how to measure it? The typical benchmarking doesn't really work, because it usually assumes that you are rerunning the same computation each time. With the memoization, it should be "slow" the first time and then run very fast subsequently. Thus the mean would be really sensitive to how many times the benchmark ran the computation. So use the median for everything?
-
Benchmarking is really difficult at the best of times, and in this case we are all running on different hardware/OS, etc. Julia is running under Rosetta 2 on an M1 (ARM) and I'm on x86_64 (Windows), for instance. I could not understand why Julia would be able to beat Rust (as there is no technical reason I know of why this would be the case, even if Julia's GC is never used). I asked on the Rust user forum and it seems it's most likely just down to my laptop being old, or my dev setup. Another user re-ran the benchmarks on their system and Rust was twice as fast as Julia, coming in at 76 ns. So it's hard to conclude too much with such a small problem and so many differences in environment. Interesting and kinda fun though...
-
You still need to calculate the problem (all values), but with memoisation you only calculate them once. I've outlined an example on my site digitalactuary, and there are a few guides if you look for dynamic programming (e.g. geeksforgeeks). For benchmarking with memoisation, you need to clear the cache between benchmark runs so that the full problem is recalculated (or, with the heavymodel library, create a new model instance with an empty cache) - in the nim version there is a call to do this. It's sometimes a little tricky to think about, as both brains and Excel use memoisation by default 😁. Overall, I think we are conflating a few different really interesting topics here (at least!).
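For example, something like this (illustrative only, using `functools.lru_cache` rather than any of the libraries above): clear the cache before each timed run so every run pays the full cost of the calculation.

```python
# Illustrative: benchmark a memoised function fairly by clearing its cache each run.
import timeit
from functools import lru_cache

@lru_cache(maxsize=None)
def in_force(t):
    if t == 0:
        return 1.0
    return in_force(t - 1) * (1 - 0.001 - 0.05)

def run_cold():
    in_force.cache_clear()        # force a full recalculation
    return in_force(500)

def run_warm():
    return in_force(500)          # mostly measures the cache lookup

print("cold:", min(timeit.repeat(run_cold, number=1, repeat=100)))
print("warm:", min(timeit.repeat(run_warm, number=1, repeat=100)))
```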
@paddyhoran: agree - the Julia benchmarks put Julia and Rust fairly close.
-
@lewisfogden There are obviously more possibilities than being purely forward/backward looking. Besides referring to both t+1 and t-1, you could also have functions like sum(funcX[t0:t1]) that refer to a range of times T, or even a T that is only known at runtime (so we cannot decide how that affects the flow). The other thing is that the dependencies between functions can depend on (static) data, so sometimes it might be useful to take this into account as well. Given the variety of dependency types, there's no one correct way of solving this, and I would usually use more or less refined methods of flow management that fit the problem.

But generally speaking it all goes like this: in an ideal world we would like to calculate the model policy by policy from t=0 to t=T_MAX. So, given the graph, as a first step you look for all the functions that can be calculated that way. Then you remove these functions from the graph and check whether there are any functions that could be calculated backwards, assuming the removed ones are already calculated. Then you remove these functions and go back to looking for forward-looking ones. That continues until all the functions are covered. All of the "problematic" ones are edge cases - in the simplest approach you might choose to call them for their full projection length one by one. But again, there's always a way to get to a heuristic that works better in most cases.

I think the other advantage of building the graph is the possibility of speeding up the runs when you run sensitivities or stochastic scenarios. When you know the dependencies, you are able to figure out which functions need to be recalculated when you change a certain assumption. That gives a massive performance improvement compared to the standard models and mode of calculation. Also, sometimes you only need to calculate part of the model (given the functions that you want in your output); then the graph also comes in handy - it would only cover the functions that are needed to get the values of the functions requested in the output.

I'm afraid I do not have anything to share at hand - these were always part of a bigger system and would not be understandable standalone.
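A compact sketch of the peeling idea, though (a paraphrase for illustration, not my production code; it assumes each dependency has already been tagged forward/backward as in the earlier sketch): repeatedly strip off the functions whose remaining dependencies are all backward-looking (computable forwards in time), then those whose remaining dependencies are all forward-looking (computable backwards), until the graph is empty or only "problematic" functions remain.

```python
# Illustrative only: peel a dependency graph into forward/backward passes.
# deps maps each function to a list of (dependency, direction) pairs, where
# direction is "backward" (refers to t-1 or earlier) or "forward" (t+1 or later).
deps = {
    "in_force":     [("in_force", "backward")],
    "net_cashflow": [("in_force", "backward")],
    "reserve":      [("reserve", "forward"), ("net_cashflow", "forward")],
}

def peel(deps):
    remaining = dict(deps)
    passes = []
    while remaining:
        for direction, label in (("backward", "forward pass"), ("forward", "backward pass")):
            batch = [
                f for f, ds in remaining.items()
                if all(d == direction or dep not in remaining for dep, d in ds)
            ]
            if batch:
                passes.append((label, batch))
                for f in batch:
                    remaining.pop(f)
                break
        else:
            passes.append(("problematic", list(remaining)))
            break
    return passes

print(peel(deps))
# [('forward pass', ['in_force', 'net_cashflow']), ('backward pass', ['reserve'])]
```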
-
@alecloudenback my curiosity got the better of me, I could not figure out why my impl was so "slow" on my machine. Turns out it's the exponentiation in the discounting:

```rust
pub fn npv(mortality_rates: &[f64], lapse_rates: &[f64], interest_rate: f64, sum_assured: f64, premium: f64, init_pols: f64, term: Option<usize>) -> f64 {
    let term = term.unwrap_or_else(|| mortality_rates.len());
    let mut result = 0.0;
    let mut inforce = init_pols;
    let v = 1.0 / (1.0 + interest_rate);
    let mut v_t = v;
    for (t, (q, w)) in mortality_rates.iter().zip(lapse_rates).enumerate() {
        let no_deaths = if t < term { inforce * q } else { 0.0 };
        let no_lapses = if t < term { inforce * w } else { 0.0 };
        let premiums = inforce * premium;
        let claims = no_deaths * sum_assured;
        let net_cashflow = premiums - claims;
        result += net_cashflow * v_t;
        v_t *= v;
        inforce = inforce - no_deaths - no_lapses;
    }
    result
}
```

This version runs in 31 ns. I'm curious, as the same optimization should speed up the Julia version too?
-
Been swamped at work, so just getting back to this now. Indeed, the exponentiation is expensive. You are a performance wizard, @paddyhoran! The following runs in 15 ns, compared with 15 ns for the Rust code above on my machine.

```julia
@inline function npv3(q,w,P,S,r,term=nothing)
    term = term === nothing ? length(q) + 1 : term + 1
    inforce = 1.0
    result = 0.0
    v = (1 / ( 1 + r))
    v_t = v
    for (t,(q,w)) in enumerate(zip(q,w))
        deaths = t < term ? inforce * q : 0.0
        lapses = t < term ? inforce * w : 0.0
        premiums = inforce * P
        claims = deaths * S
        ncf = premiums - claims
        result += ncf * v_t
        v_t *= v
        inforce = inforce - deaths - lapses
    end
    return result
end
```

Note that I added `@inline`. Once Julia is available natively for the Mac, I will try this again. I couldn't get Rust working on my Windows machine, so I had to test on the laptop.
-
And starting to sacrifice readability, but this version is 12 ns:

```julia
@inline function npv4(q,w,P,S,r,term=nothing)
    term = term === nothing ? length(q) : term
    inforce = 1.0
    result = 0.0
    v = (1 / ( 1 + r))
    v_t = v
    for (t,(q,w)) in enumerate(zip(q,w))
        t > term && return result
        result += inforce * (P - S * q) * v_t
        inforce -= inforce * q + inforce * w
        v_t *= v
    end
    return result
end
```
-
Pretty interesting, I'd say we have pushed it about as far as it goes. It's crazy how good the emulation is on the M1 - there's basically no overhead.
-
I started to benchmark the code everyone shared on the same hardware for comparison's sake. See the code here: https://github.com/JuliaActuary/Learn/tree/master/LifeModelingProblemBenchmarks

Times are in nanoseconds:

(benchmark table not reproduced here)

And in graphical format (mean if available, otherwise median):

(chart not reproduced here; the distance comparisons are inspired by Grace Hopper trying to explain what a nanosecond is like)

Ideally I'd get samples from some of the other examples shared and bundle it up into a single runnable script, assuming you have the requisite languages/packages installed. I had a hard time getting some of the data; e.g. Rust only appears to log the median time to the terminal, and Python only the average.

Software: all languages/libraries (Julia, Rust, Python) are Mac M1 native unless otherwise noted.
-
Are the Julia and Rust ones single-threaded? I also wonder what the comparison using the standard model would look like.
-
@alecloudenback, @Houstonwp and I have looked at some options for the life modelling problem: Gist Here, evaluating R & Julia using a non-recursive approach (which should be about as fast as you can get), and I've tried out Nim (a statically typed Python-like language which compiles to C), which runs at broadly the same speed with memoized recursion.
Some of my thoughts