Sector and income #230

jamesturner246 · 2023-10-20T14:47:28Z

In this PR, we add two models for initialising and updating a Person's sector (region) and income level category. The changes are summarised as follows:

Adds generate and update methods to KevinHallModel for sector and income. This is not the ideal place for them, as this class is becoming bloated. In my future refactor plans, these would each become their own 'module', where the user controls the number of modules and their ordering.
Allows both generate and update methods of risk factor models to always run -- see [UMBRELLA] Generalise form 'static' and 'dynamic' risk factor models #231.
Adds new fields and accessors to Person, and some error checking for unknown data.
A general cleanup, as far as is possible for now.

Fixes #228

Fixes #229

github-actions

clang-tidy made some suggestions

src/HealthGPS.Console/model_parser.cpp

github-actions · 2023-10-20T18:22:39Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-10-23T14:07:47Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-10-23T16:45:49Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions

clang-tidy made some suggestions

src/HealthGPS.Console/model_parser.cpp

github-actions · 2023-10-25T13:53:45Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-10-25T15:29:01Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-10-25T15:51:33Z

clang-tidy review says "All clean, LGTM! 👍"

alexdewar

I've got a few queries and small suggestions, but otherwise this all looks pretty sensible!

example_new/KevinHall.json

alexdewar · 2023-10-26T08:54:13Z

src/HealthGPS.Console/model_parser.cpp

-            correlations(row, col) =
-                std::any_cast<double>(correlations_table.column(col).value(row));
+
+    for (size_t i = 0; i < opt["RiskFactorModels"].size(); i++) {


Debatable as to whether this would be an improvement, but FYI you can zip ranges together in C++20, as you would with Python: https://en.cppreference.com/w/cpp/ranges/zip_view

In this case you could zip opt["RiskFactorModels"] and correlations_table (which is also a range because it has cbegin and cend methods) and then you avoid defining i.

Seems to be a C++23 feature :(
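For reference, a minimal sketch of the zip idea with a fallback for toolchains that lack the C++23 library feature. The function `pair_up` and its two vectors are hypothetical stand-ins for opt["RiskFactorModels"] and correlations_table, not code from this PR; `__cpp_lib_ranges_zip` is the standard feature-test macro for std::views::zip.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>
#include <version>

#if defined(__cpp_lib_ranges_zip)
#include <ranges>
#endif

// Hypothetical helper: pair each name with its weight, walking both
// containers in lockstep and stopping at the shorter one.
std::vector<std::pair<std::string, double>>
pair_up(const std::vector<std::string> &names, const std::vector<double> &weights) {
    std::vector<std::pair<std::string, double>> out;
    out.reserve(std::min(names.size(), weights.size()));
#if defined(__cpp_lib_ranges_zip)
    // C++23: zip yields tuples of references, so no index variable is needed.
    for (const auto &[name, weight] : std::views::zip(names, weights)) {
        out.emplace_back(name, weight);
    }
#else
    // Pre-C++23 fallback: plain index loop over the common length.
    const std::size_t n = std::min(names.size(), weights.size());
    for (std::size_t i = 0; i < n; i++) {
        out.emplace_back(names[i], weights[i]);
    }
#endif
    return out;
}
```

Both branches produce the same pairs, so the index loop can stay until the project moves to C++23.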

src/HealthGPS.Console/model_parser.cpp

alexdewar · 2023-10-26T08:57:35Z

src/HealthGPS.Console/model_parser.cpp

@@ -272,6 +287,9 @@ load_kevinhall_risk_model_definition(const poco::json &opt, const host::Configur
    }

    // Food groups.
+    std::unordered_map<hgps::core::Identifier, std::map<hgps::core::Identifier, double>>
+        nutrient_equations;
+    std::unordered_map<hgps::core::Identifier, std::optional<double>> food_prices;


Using floating-point numbers for currency is a common gotcha, because of rounding errors. Do you think it would work to use an integer type instead?

True, I didn't do that as I didn't think much arithmetic would happen with them. They aren't used yet, and will presumably be just logged for financial impact of interventions. They can be changed to unsigned integral types for pence/cents/... I'll leave that for another time, if and as needed.
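If prices are switched to integers later, one possible shape is sketched below. The alias `MinorUnits` and both helpers are hypothetical, and the sketch assumes 100 minor units per major unit. It uses a signed type rather than the unsigned one mentioned above, so that price differences can go negative; that is a design choice, not something decided in this PR.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical alias: a price held as a whole number of minor units
// (pence, cents, ...).
using MinorUnits = std::int64_t;

// Convert a decimal price read from config (e.g. 1.99) to minor units,
// rounding to the nearest whole unit. Assumes 100 minor units per major unit.
inline MinorUnits to_minor_units(double price) {
    return static_cast<MinorUnits>(std::llround(price * 100.0));
}

// Cost of `quantity` items at `unit_price`. Integer arithmetic, so there is
// no rounding error to accumulate.
inline MinorUnits total_cost(MinorUnits unit_price, std::int64_t quantity) {
    return unit_price * quantity;
}
```

The only floating-point step is the single conversion at load time; every sum or product after that is exact.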

alexdewar · 2023-10-26T08:59:22Z

src/HealthGPS/dynamic_hierarchical_linear_model.cpp

-    throw core::HgpsException(
-        "DynamicHierarchicalLinearModel::generate_risk_factors not yet implemented.");
-}
+    [[maybe_unused]] RuntimeContext &context) {}


Does this model not need to generate risk factors or is this just something you haven't implemented yet?

Doesn't generate risk factors. That's done in e.g. StaticHierarchicalLinearModel. Alas, the confusion of using the generate (static) model and update (dynamic) model terminology when both models have generate and update methods. That's why I will do #231.

src/HealthGPS/kevin_hall_model.cpp

alexdewar · 2023-10-26T09:17:51Z

src/HealthGPS/kevin_hall_model.cpp

+    auto logits = std::vector<double>(income_models_.size());
+    for (size_t i = 0; i < income_models_.size(); i++) {
+        logits[i] = income_models_[i].intercept;
+        for (const auto &[factor_name, coefficient] : income_models_[i].coefficients) {
+            logits[i] += coefficient * person.get_risk_factor_value(factor_name);
+        }
+    }


Alternative implementation using ranges (you'll also need #include <ranges>):

Suggested change

-    auto logits = std::vector<double>(income_models_.size());
-    for (size_t i = 0; i < income_models_.size(); i++) {
-        logits[i] = income_models_[i].intercept;
-        for (const auto &[factor_name, coefficient] : income_models_[i].coefficients) {
-            logits[i] += coefficient * person.get_risk_factor_value(factor_name);
-        }
-    }
+    auto logits = std::vector<double>{};
+    logits.reserve(income_models_.size());
+    std::ranges::transform(income_models_, std::back_inserter(logits),
+                           [&person](const auto &model) {
+                               double logit = model.intercept;
+                               for (const auto &[factor_name, coefficient] : model.coefficients) {
+                                   logit += coefficient * person.get_risk_factor_value(factor_name);
+                               }
+                               return logit;
+                           });

alexdewar · 2023-10-26T09:20:30Z

src/HealthGPS/kevin_hall_model.cpp

+    // Compute softmax probabilities for each income category.
+    auto e_logits = std::vector<double>(income_models_.size());
+    double e_logits_sum = 0.0;
+    for (size_t i = 0; i < income_models_.size(); i++) {
+        e_logits[i] = exp(logits[i]);
+        e_logits_sum += e_logits[i];
+    }
+
+    // Compute income category probabilities.
+    auto probabilities = std::vector<double>(income_models_.size());
+    for (size_t i = 0; i < income_models_.size(); i++) {
+        probabilities[i] = e_logits[i] / e_logits_sum;
+    }


These could also use std::ranges::transform, though the way you've done it is fine too!

One thing I would say is that generally it's better to append to std::vectors iteratively rather than creating them with a particular size upfront (it means you can skip the zeroing of the memory). You can use the reserve() method to hint at how many elements it will have eventually.

Fair enough with reserve, but I think I prefer the readability of for loops for such simple ops.
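A sketch of where this discussion lands: plain for loops, with reserve() plus push_back() instead of sizing the vector upfront. The free function `softmax` is hypothetical. Subtracting the largest logit before exponentiating is an addition of this sketch to avoid overflow; the code in the PR exponentiates the logits directly.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: softmax probabilities for a set of logits.
std::vector<double> softmax(const std::vector<double> &logits) {
    std::vector<double> probabilities;
    if (logits.empty()) {
        return probabilities;
    }
    probabilities.reserve(logits.size()); // hint at the final size, no zeroing

    // Not in the PR: shift by the maximum so exp() cannot overflow.
    const double max_logit = *std::max_element(logits.begin(), logits.end());

    double sum = 0.0;
    for (const double logit : logits) {
        const double e = std::exp(logit - max_logit);
        probabilities.push_back(e);
        sum += e;
    }

    // Normalise in place so the values sum to one.
    for (double &p : probabilities) {
        p /= sum;
    }
    return probabilities;
}
```

The shift does not change the result mathematically, because the common factor cancels in the division.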

src/HealthGPS/kevin_hall_model.cpp

src/HealthGPS/person.cpp

github-actions · 2023-10-26T13:55:32Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-10-26T14:12:37Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-10-26T15:04:26Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-10-26T15:37:02Z

clang-tidy review says "All clean, LGTM! 👍"

alexdewar

Aside from one query re changing the behaviour of Interval, this LGTM.

src/HealthGPS.Core/interval.h

src/HealthGPS.Tests/AgeGenderTable.Test.cpp
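For context on the Interval change (a clamp method, and rejecting nonsensical intervals at construction), a rough sketch of what such a class can look like. `DoubleInterval` below is entirely hypothetical and is not the project's Interval from interval.h; the exception type and the NaN handling are assumptions.

```cpp
#include <algorithm>
#include <stdexcept>

// Hypothetical closed interval [lower, upper] over doubles.
class DoubleInterval {
  public:
    DoubleInterval(double lower, double upper) : lower_{lower}, upper_{upper} {
        // Written as !(lower <= upper) so that NaN bounds are rejected too.
        if (!(lower <= upper)) {
            throw std::invalid_argument("Interval lower bound must not exceed upper bound.");
        }
    }

    double lower() const noexcept { return lower_; }
    double upper() const noexcept { return upper_; }

    // Return the value limited to the interval's bounds.
    double clamp(double value) const { return std::clamp(value, lower_, upper_); }

  private:
    double lower_;
    double upper_;
};
```

Validating in the constructor means clamp() can rely on lower <= upper, which std::clamp requires.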

jamesturner246 · 2023-10-26T16:55:44Z

And a clang failure, presumably clang-14 from the platform. This version of clang seems to be particularly troublesome. I don't know why it suddenly throws a tantrum now, though. Tempted to just bypass the blocking and deal with it later.

jamesturner246 · 2023-10-26T17:10:27Z

It's a known problem, at least.

actions/runner-images#8659

alexdewar · 2023-10-27T08:13:46Z

In the repo settings you can remove the clang runner from the list of merge requirements. Then you could maybe open an issue to fix it later.

alexdewar · 2023-10-27T08:14:42Z

Another option vis-a-vis clang would be to just use the latest version rather than the one shipped with Ubuntu, which might be a more useful test anyway.

jamesturner246 added 3 commits October 20, 2023 12:13

Add SectorPrevalence to KevinHall.json config.

de623fd

Load Rural Prevalence into model parser from JSON.

d638cef

Load Rural Prevalence into KH model.

585e936

github-actions bot reviewed Oct 20, 2023

src/HealthGPS.Console/model_parser.cpp (outdated, resolved)

jamesturner246 and others added 7 commits October 20, 2023 16:16

We want all our RF models' generate_ methods to run.

1922906

Add stub initialise_sector method to KH model class.

3a6534e

Add enum type for sector (region).

e5a312c

Add 'unknown' sector enum type, and inititalise people to unknown sector.

87c4eac

Finish initialise_sector code in KevinHall.

5f9a7fe

Clarify that we are using rural prevalence here.

eb35572

Don't std move trivial type

c404c0a

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

model parser clang-tidy.

f4d6638

jamesturner246 added 3 commits October 23, 2023 16:35

Add Income model parameters to KevinHall config JSON.

b4d72ab

Load income models from JSON in mdoel parser.

d7f5b61

Load income models from mdoel parser into kevin hall model.

7c0846f

jamesturner246 added 3 commits October 24, 2023 13:40

Add missing category_1 income model.

1aea169

LinearModelParams should be a vector.

3c6ad90

Add initialise_income to kevin hall model.

c72c2bc

github-actions bot reviewed Oct 24, 2023

src/HealthGPS.Console/model_parser.cpp (resolved)

jamesturner246 added 3 commits October 25, 2023 11:32

Fix read after std::move.

61ab5b7

Call initialise_income in KH model.

06e553e

Add sector update code to kevinhall.

e373e53

jamesturner246 added 3 commits October 25, 2023 15:40

Code to convert sector to a real, for use as a coefficient. Add error checking for unknown gender and sector.

f5a22d2

Age group name consistency.

2326ca9

Add person method to check over 18.

3985efa

Add kevinhall code to update income.

905223a

jamesturner246 marked this pull request as ready for review October 25, 2023 15:46

jamesturner246 requested review from dalonsoa and alexdewar October 25, 2023 15:47

alexdewar approved these changes Oct 26, 2023

jamesturner246 and others added 3 commits October 26, 2023 13:48

Change weird if

71926bd

Co-authored-by: Alex Dewar

nutrient_ranges is now a Double interval, interval now has its own clamp method, and disallowed nonsensical intervals at construction.

d32fd44

KH: rural_prevalence should be an unordered_map.

a44f19c

KH: use vector reserve, rather than letting it zero init.

0ab8ef6

KH: push_back, don't assign, into reserved vector.

1ebaa1a

jamesturner246 requested a review from alexdewar October 26, 2023 15:25

alexdewar approved these changes Oct 26, 2023

src/HealthGPS.Core/interval.h (resolved)

src/HealthGPS.Tests/AgeGenderTable.Test.cpp (resolved)

jamesturner246 merged commit 1fda35b into main Oct 27, 2023
4 of 5 checks passed

jamesturner246 deleted the sector_and_income branch October 27, 2023 08:40
