-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sector and income #230
Sector and income #230
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
clang-tidy review says "All clean, LGTM! 👍" |
… checking for unknown gender and sector.
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got a few queries and small suggestions, but otherwise this all looks pretty sensible!
correlations(row, col) = | ||
std::any_cast<double>(correlations_table.column(col).value(row)); | ||
|
||
for (size_t i = 0; i < opt["RiskFactorModels"].size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debatable as to whether this would be an improvement, but FYI you can zip
ranges together in C++20, as you would with Python: https://en.cppreference.com/w/cpp/ranges/zip_view
In this case you could zip opt["RiskFactorModels"]
and correlations_table
(which is also a range because it has cbegin
and cend
methods) and then you avoid defining i
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be a C++23 feature :(
@@ -272,6 +287,9 @@ load_kevinhall_risk_model_definition(const poco::json &opt, const host::Configur | |||
} | |||
|
|||
// Food groups. | |||
std::unordered_map<hgps::core::Identifier, std::map<hgps::core::Identifier, double>> | |||
nutrient_equations; | |||
std::unordered_map<hgps::core::Identifier, std::optional<double>> food_prices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using floating-point numbers for currency is a common gotcha, because of rounding errors. Do you think it would work to use an integer type instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I didn't do that as I didn't think much arithmetic would happen with them. They aren't used yet, and will presumably be just logged for financial impact of interventions. They can be changed to unsigned integral types for pence/cents/... I'll leave that for another time, if and as needed.
throw core::HgpsException( | ||
"DynamicHierarchicalLinearModel::generate_risk_factors not yet implemented."); | ||
} | ||
[[maybe_unused]] RuntimeContext &context) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this model not need to generate risk factors or is this just something you haven't implemented yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't generate risk factors. That's done in e.g. StaticHierarchicalLinearModel
. Alas, the confusion of using the generate (static) model and update (dynamic) model terminology when both models have generate and update methods. That's why I will do #231.
src/HealthGPS/kevin_hall_model.cpp
Outdated
auto logits = std::vector<double>(income_models_.size()); | ||
for (size_t i = 0; i < income_models_.size(); i++) { | ||
logits[i] = income_models_[i].intercept; | ||
for (const auto &[factor_name, coefficient] : income_models_[i].coefficients) { | ||
logits[i] += coefficient * person.get_risk_factor_value(factor_name); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative implementation using ranges (you'll also need #include <ranges>
):
auto logits = std::vector<double>(income_models_.size()); | |
for (size_t i = 0; i < income_models_.size(); i++) { | |
logits[i] = income_models_[i].intercept; | |
for (const auto &[factor_name, coefficient] : income_models_[i].coefficients) { | |
logits[i] += coefficient * person.get_risk_factor_value(factor_name); | |
} | |
} | |
auto logits = std::vector<double>{}; | |
logits.reserve(income_models_.size()); | |
std::ranges::transform(income_models_, std::back_inserter(logits), | |
[&person](const auto &model) { | |
double logit = model.intercept; | |
for (const auto &[factor_name, coefficient] : model.coefficients) { | |
logit += coefficient * person.get_risk_factor_value(factor_name); | |
} | |
return logit; | |
}); |
src/HealthGPS/kevin_hall_model.cpp
Outdated
// Compute softmax probabilities for each income category. | ||
auto e_logits = std::vector<double>(income_models_.size()); | ||
double e_logits_sum = 0.0; | ||
for (size_t i = 0; i < income_models_.size(); i++) { | ||
e_logits[i] = exp(logits[i]); | ||
e_logits_sum += e_logits[i]; | ||
} | ||
|
||
// Compute income category probabilities. | ||
auto probabilities = std::vector<double>(income_models_.size()); | ||
for (size_t i = 0; i < income_models_.size(); i++) { | ||
probabilities[i] = e_logits[i] / e_logits_sum; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could also use std::ranges::transform
, though the way you've done it is fine too!
One thing I would say is that generally it's better to append to std::vector
s iteratively rather than creating them with a particular size upfront (it means you can skip the zeroing of the memory). You can use the reserve()
method to hint at how many elements it will have eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough with reserve, but I think I prefer the readability of for loops for such simple ops.
Co-authored-by: Alex Dewar <[email protected]>
…amp method, and disallowed nonsensical intervals at construction.
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from one query re changing the behaviour of Interval
, this LGTM.
And a clang failure, presumably clang-14 from the platform. This version of clang seems to be particularly troublesome.. I don't know why it suddenly throws a tantrum now, though. Tempted to just bypass the blocking and deal with it later. |
It's a known problem, at least. |
In the repo settings you can remove the clang runner from the list of merge requirements. Then you could maybe open an issue to fix it later. |
Another option vis-a-vis clang would be to just use the latest version rather than the one shipped with Ubuntu, which might be a more useful test anyway. |
In this PR, we add two models for initialising and updating a
Person
's sector (region) and income level category. The changes are summarised as follows:Adds
generate
andupdate
methods toKevinHallModel
for sector and income. This is not the ideal place for them, as this class is becoming bloated. In My future refactor plans, these both would become their own 'modules', where the user has control of number of modules and ordering.Allows both
generate
andupdate
methods of risk factor models to always run -- see [UMBRELLA] Generalise form 'static' and 'dynamic' risk factor models #231.Adds new fields and accessors to
Person
, and some error checking for unknown data.A general cleanup as best as possible for now
Fixes #228
Fixes #229