Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sector and income #230

Merged
merged 29 commits into from
Oct 27, 2023
Merged

Sector and income #230

merged 29 commits into from
Oct 27, 2023

Conversation

jamesturner246
Copy link
Contributor

@jamesturner246 jamesturner246 commented Oct 20, 2023

In this PR, we add two models for initialising and updating a Person's sector (region) and income level category. The changes are summarised as follows:

  • Adds generate and update methods to KevinHallModel for sector and income. This is not the ideal place for them, as this class is becoming bloated. In My future refactor plans, these both would become their own 'modules', where the user has control of number of modules and ordering.

  • Allows both generate and update methods of risk factor models to always run -- see [UMBRELLA] Generalise form 'static' and 'dynamic' risk factor models #231.

  • Adds new fields and accessors to Person, and some error checking for unknown data.

  • A general cleanup as best as possible for now

Fixes #228

Fixes #229

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/HealthGPS.Console/model_parser.cpp Outdated Show resolved Hide resolved
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/HealthGPS.Console/model_parser.cpp Show resolved Hide resolved
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@jamesturner246 jamesturner246 marked this pull request as ready for review October 25, 2023 15:46
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a few queries and small suggestions, but otherwise this all looks pretty sensible!

example_new/KevinHall.json Show resolved Hide resolved
correlations(row, col) =
std::any_cast<double>(correlations_table.column(col).value(row));

for (size_t i = 0; i < opt["RiskFactorModels"].size(); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debatable as to whether this would be an improvement, but FYI you can zip ranges together in C++20, as you would with Python: https://en.cppreference.com/w/cpp/ranges/zip_view

In this case you could zip opt["RiskFactorModels"] and correlations_table (which is also a range because it has cbegin and cend methods) and then you avoid defining i.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be a C++23 feature :(

src/HealthGPS.Console/model_parser.cpp Outdated Show resolved Hide resolved
@@ -272,6 +287,9 @@ load_kevinhall_risk_model_definition(const poco::json &opt, const host::Configur
}

// Food groups.
std::unordered_map<hgps::core::Identifier, std::map<hgps::core::Identifier, double>>
nutrient_equations;
std::unordered_map<hgps::core::Identifier, std::optional<double>> food_prices;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using floating-point numbers for currency is a common gotcha, because of rounding errors. Do you think it would work to use an integer type instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I didn't do that as I didn't think much arithmetic would happen with them. They aren't used yet, and will presumably be just logged for financial impact of interventions. They can be changed to unsigned integral types for pence/cents/... I'll leave that for another time, if and as needed.

throw core::HgpsException(
"DynamicHierarchicalLinearModel::generate_risk_factors not yet implemented.");
}
[[maybe_unused]] RuntimeContext &context) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this model not need to generate risk factors or is this just something you haven't implemented yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't generate risk factors. That's done in e.g. StaticHierarchicalLinearModel. Alas, the confusion of using the generate (static) model and update (dynamic) model terminology when both models have generate and update methods. That's why I will do #231.

src/HealthGPS/kevin_hall_model.cpp Show resolved Hide resolved
Comment on lines 180 to 186
auto logits = std::vector<double>(income_models_.size());
for (size_t i = 0; i < income_models_.size(); i++) {
logits[i] = income_models_[i].intercept;
for (const auto &[factor_name, coefficient] : income_models_[i].coefficients) {
logits[i] += coefficient * person.get_risk_factor_value(factor_name);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative implementation using ranges (you'll also need #include <ranges>):

Suggested change
auto logits = std::vector<double>(income_models_.size());
for (size_t i = 0; i < income_models_.size(); i++) {
logits[i] = income_models_[i].intercept;
for (const auto &[factor_name, coefficient] : income_models_[i].coefficients) {
logits[i] += coefficient * person.get_risk_factor_value(factor_name);
}
}
auto logits = std::vector<double>{};
logits.reserve(income_models_.size());
std::ranges::transform(income_models_, std::back_inserter(logits),
[&person](const auto &model) {
double logit = model.intercept;
for (const auto &[factor_name, coefficient] : model.coefficients) {
logit += coefficient * person.get_risk_factor_value(factor_name);
}
return logit;
});

Comment on lines 188 to 200
// Compute softmax probabilities for each income category.
auto e_logits = std::vector<double>(income_models_.size());
double e_logits_sum = 0.0;
for (size_t i = 0; i < income_models_.size(); i++) {
e_logits[i] = exp(logits[i]);
e_logits_sum += e_logits[i];
}

// Compute income category probabilities.
auto probabilities = std::vector<double>(income_models_.size());
for (size_t i = 0; i < income_models_.size(); i++) {
probabilities[i] = e_logits[i] / e_logits_sum;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could also use std::ranges::transform, though the way you've done it is fine too!

One thing I would say is that generally it's better to append to std::vectors iteratively rather than creating them with a particular size upfront (it means you can skip the zeroing of the memory). You can use the reserve() method to hint at how many elements it will have eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough with reserve, but I think I prefer the readability of for loops for such simple ops.

src/HealthGPS/kevin_hall_model.cpp Outdated Show resolved Hide resolved
src/HealthGPS/person.cpp Show resolved Hide resolved
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from one query re changing the behaviour of Interval, this LGTM.

src/HealthGPS.Core/interval.h Show resolved Hide resolved
src/HealthGPS.Tests/AgeGenderTable.Test.cpp Show resolved Hide resolved
@jamesturner246
Copy link
Contributor Author

jamesturner246 commented Oct 26, 2023

And a clang failure, presumably clang-14 from the platform. This version of clang seems to be particularly troublesome.. I don't know why it suddenly throws a tantrum now, though. Tempted to just bypass the blocking and deal with it later.

@jamesturner246
Copy link
Contributor Author

It's a known problem, at least.

actions/runner-images#8659

@alexdewar
Copy link
Contributor

In the repo settings you can remove the clang runner from the list of merge requirements. Then you could maybe open an issue to fix it later.

@alexdewar
Copy link
Contributor

alexdewar commented Oct 27, 2023

Another option vis-a-vis clang would be to just use the latest version rather than the one shipped with Ubuntu, which might be a more useful test anyway.

@jamesturner246 jamesturner246 merged commit 1fda35b into main Oct 27, 2023
4 of 5 checks passed
@jamesturner246 jamesturner246 deleted the sector_and_income branch October 27, 2023 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Initialise a Person's income Initialise a Person's sector (region)
2 participants