
Scale spend data to improve budget allocation efficacy #945

Open · wants to merge 154 commits into base: main

Conversation

@MobiusLooper commented Aug 19, 2024

Description

This change adds scaling to the budget, bounds, and initial guess inside the BudgetOptimizer before handing them to the minimize function. It then unscales the values inside the objective function, and unscales them once more when passing the result back to the MMM at the end of the allocate_budget method. This is because the new changes from PR933 can lead to very large values being handed to the SLSQP minimizer, and from what I understand this optimisation is best suited to well-scaled data. On my MMM, the larger scale of the inputs (spend in the $100,000 region) meant the optimizer got stuck in a very flat part of the search space and effectively regurgitated the equally weighted initial guess as optimal, which it was far from being.
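
For illustration, here is a minimal, self-contained sketch of the scale / optimise / unscale pattern described above (hypothetical names and signature, not the actual BudgetOptimizer code):

import numpy as np
from scipy.optimize import minimize

def allocate_budget_sketch(total_budget, bounds, neg_response, n_channels, scale):
    """Sketch only. neg_response takes spends in original units and returns the negative response."""
    scaled_budget = total_budget / scale
    scaled_bounds = [(low / scale, high / scale) for low, high in bounds]
    x0 = np.full(n_channels, scaled_budget / n_channels)  # equally weighted initial guess

    def objective(scaled_spends):
        return neg_response(scaled_spends * scale)  # unscale before evaluating the response

    result = minimize(
        objective,
        x0,
        bounds=scaled_bounds,
        constraints={"type": "eq", "fun": lambda s: scaled_budget - s.sum()},
        method="SLSQP",
    )
    return result.x * scale  # unscale the allocation before handing it back to the MMM

With the scale taken to be, for example, the maximum channel scale, the optimizer works with values of order one rather than hundreds of thousands.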

I didn't add any tests because it would require instantiating a fitted MMM with actual parameters and demonstrating that the output deviates from the initial guess, but given the presumably stochastic nature of the optimisation, that didn't feel like a very good test anyway. I could think of something if necessary, though.

Related Issue

Checklist

Modules affected

  • MMM
  • CLV

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc-marketing--945.org.readthedocs.build/en/945/

wd60622 and others added 30 commits April 2, 2024 15:07
* add scaling into lift test method

* test scaling methods

* test change of MMM likelihood

* link up beta_channel parameters

* change name to sigma

* reorganize and edit

* closes pymc-labs#406

* address the feedback in docstrings

* add more to docstring

* format the docstring

* be verbose for future devs

* be explicit for the column max values

* incorporate the feedback

* hide cells and add to intro

* add conclusion

* switch to header 2

* run notebook through

* move the types together

* separate model estimated from empirical

* fix typos
* drop python 3.9

* try python 3.12

* undo try python 3.12
* improve nb

* rm warnings and add link to lifetimes quickstart

* address comments

* feedback part 3

* remove warnings manually
* add more info to the notebook

* hide plots code

* fix plot y labels

* fix plot outputs and remove model build

* improve final note probability plots

* address comments

* use quickstart dataset

* feedback part 3

* remove warnings manually

* feedback part 4
* improve mmm docs init

* add more code examples to docstrings

* minor improvements

* typo

* better phrasing

* add thomas suggestion
* move fixtures to conftest

* docstrings and moved set_model_fit to conftest

* fixed pandas quickstart warnings

* revert to MockModel and add ParetoNBD support

* quickstart edit for issue 609

* notebook edit
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.3.5 → v0.3.7](astral-sh/ruff-pre-commit@v0.3.5...v0.3.7)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Juan Orduz <[email protected]>
* fix potential bugs

* minor improvements

* remove rule for nb

* fix test

* improve tests syntax

* use stacklevel 2 for warnings

* use stacklevel 1 for warnings as they are used in public methods

* ignore B904

* undo change

* ricardos feedback

* use fit_posterior

* Update pymc_marketing/clv/models/gamma_gamma.py

Co-authored-by: Ricardo Vieira <[email protected]>

* last fix XD

---------

Co-authored-by: Ricardo Vieira <[email protected]>
* notebook opening and imports

* model definition markdown

* Data Load Notebook section

* WIP model fitting section

* added notebook to docs directory

* notebook edits and graph code

* ppc section and nb cleanup

* demz sampling and WIP plotting

* WIP predictive plots

* WIP heatmap plots

* predictive plots

* WIP covariates and nbqa-ruff edits

* covariate section

* plot additions

* fig sizes

* remove model file
…book (pymc-labs#651)

* add spaces, increase indentation, and fix number order

* explicit with 6
* Creating plot waterfall

Co-Authored-By: Carlos Trujillo <[email protected]>

* requested changes

* pre-commit

---------

Co-authored-by: Carlos Trujillo <[email protected]>
Databricks should have a lower-case b.
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.4.1 → v0.4.2](astral-sh/ruff-pre-commit@v0.4.1...v0.4.2)
- [github.com/pre-commit/mirrors-mypy: v1.9.0 → v1.10.0](pre-commit/mirrors-mypy@v1.9.0...v1.10.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* support for negative values and dates (if used)

* fix terrible spelling

* test dates in coords

* cover numpy objects

* consolidate the tests
cetagostini and others added 5 commits August 14, 2024 15:38
* Correcting typo num_days by horizon

* Correcting typo num_days by horizon and scaler

* Running notebooks

* Update UML Diagrams

* Rename horizon by periods

* Adding test requested to check budget outputs

* Running notebooks

* Update UML Diagrams

* Small notebook missing change.

* Correction in tests

* Change on name

* running notebook modifying function

* Update UML Diagrams
* helper command to view the artifacts from test

* pass tune from kwargs

* test for support of all samplers

* add mlflow as a mock import

* actual import as autolog is missing from docs
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.5.7 → v0.6.1](astral-sh/ruff-pre-commit@v0.5.7...v0.6.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@juanitorduz (Collaborator)

Thank you for your contribution and context, @MobiusLooper!

At the moment, we have just one test failing:

FAILED tests/mmm/test_delayed_saturated_mmm.py::TestDelayedSaturatedMMM::test_allocate_budget_to_maximize_response

From your remark

I didn't add any tests because it would require instantiating a fitted MMM with actual parameters and demonstrating that the output deviates from the initial guess, but given the presumably stochastic nature of the optimisation, that didn't feel like a very good test anyway. I could think of something if necessary, though.

Feel free to modify (or even delete) old tests you think are not good enough. If you delete one test, please add one (or more) to make sure the code is doing what you expect.

Feel free to ask for support in the tests :)

@MobiusLooper (Author) commented Aug 21, 2024

I realised I missed handling the (common) case of bounds being provided to the allocate_budget method, hence the update to scale the bounds as well.

@cetagostini self-assigned this Aug 21, 2024
@cetagostini (Contributor) commented Aug 21, 2024

Hey @MobiusLooper, thanks for raising a PR. I'm not quite sure about the solution; a few things worry me, but could we think through them together?

a) You are scaling budgets by self.scales.max(), meaning the budget gets scaled by the maximum across channels, whereas the channel scales were computed within each channel.

I think this is similar to the initial implementation, and my personal opinion is that it can introduce errors as well as complications in interpretation. The resulting number is not scaled back with the same value used during training, so it cannot be understood in the same way.

I left a small exercise below: given a random allocation and specific scales per channel, scaling the allocation by the cross-channel maximum can lead to an under- or over-estimation (depending on the case) of the global budget and its response. Although the response is directionally the same, the amount of budget allocated to each channel varies, and the larger the numbers, the greater the variation, because small decimals in scaled space represent larger original units.

b) Even if there is a workaround for this, and we decide that some scaling is not representative or might even be correct, is this really the solution? If the problem is that the optimizer is not exploring the search space optimally, we can expose the initial search values, the tolerance, the number of iterations, or adjust the bounds to be more informative than zero to infinity, in order to guide it.

PS: We could even expose the method argument so you could switch to another, more suitable optimizer if needed.
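
For concreteness, all of these knobs already exist on scipy.optimize.minimize itself (these are real SciPy arguments; the toy objective below is only there to make the snippet runnable and is not BudgetOptimizer code):

import numpy as np
from scipy.optimize import minimize

def objective(spends):
    return -np.sum(np.sqrt(spends))  # toy diminishing-returns response, illustration only

total_budget = 150_000.0
result = minimize(
    objective,
    x0=np.full(3, total_budget / 3),           # expose / override the initial guess
    method="SLSQP",                            # the method itself could also be exposed
    bounds=[(0.0, total_budget)] * 3,          # informative bounds instead of (0, inf)
    constraints={"type": "eq", "fun": lambda s: total_budget - s.sum()},
    tol=1e-9,                                  # overall tolerance
    options={"maxiter": 1_000, "ftol": 1e-9},  # SLSQP-specific iteration / tolerance controls
)
print(result.x, result.fun)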

I could not find any reference in the official SciPy documentation saying the optimizer works better in a scaled space than in an unscaled one. If you have an official source, it would be good to consider it.

I'm short on time this week, but this certainly calls for discussion. Ideally, we'll validate this with examples. If we can build a couple of cases showing that this behavior is repeatable and not solvable by manipulating the internal parameters of minimize, then we can think about it. I want to verify that this is not an isolated case, because I have been using the optimizer without seeing this behavior.

Example use case

### Using a global scaler even when channels have individual scales.

import numpy as np
import matplotlib.pyplot as plt

# Given parameters
lam_x1 = 0.2
beta_x1 = 1.2
lam_x2 = 0.7
beta_x2 = 2.1

scaler_x1 = 175 # e.g: original scale for x1 (MaxAbs for X1 based on model transformations)
scaler_x2 = 198 # e.g: original scale for x2 (MaxAbs for X2 based on model transformations)
scaler_target = 400 # e.g: Target scale (MaxAbs for target)

total_budget = 300 # Global budget to optimize
print("real total budget", total_budget)

split_x1 = 0.45  # 45% to x1 - e.g. a random example allocation decided by scipy minimize during the iteration process
split_x2 = 0.55  # 55% to x2 - e.g. a random example allocation decided by scipy minimize during the iteration process

# Scale the budget based on the max scaler between channels
max_scaler = max(scaler_x1, scaler_x2)

# Normalize the budget based on the global scale
normalized_budget_x1 = (total_budget * split_x1) / max_scaler
normalized_budget_x2 = (total_budget * split_x2) / max_scaler

# Total input budget to optimize, normalized by the global scaler.
print("normalized total budget", normalized_budget_x2+normalized_budget_x1)

# Total input budget to optimize if each split were normalized by its respective existing scale (current implementation)
print("normalized budget on original scale", ((total_budget * split_x1) / scaler_x1) + ((total_budget * split_x2) / scaler_x2))

# Logistic saturation function
def logistic_saturation(x, lam):
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))

# Calculate the response for each channel
response_x1 = logistic_saturation(normalized_budget_x1, lam_x1) * beta_x1
response_x2 = logistic_saturation(normalized_budget_x2, lam_x2) * beta_x2

# Combine the responses (assuming additive effects for simplicity)
total_response = response_x1 + response_x2

# Rescale the response to the original target scale
rescaled_response = total_response * scaler_target

# Rescale the total budget (x1+x2) using the original per-channel scales (even though the split was made using the MaxAbs across channels, not within each channel)
rescaled_total_budget = (normalized_budget_x2*scaler_x2) + (normalized_budget_x1*scaler_x1) # Total budget doesn't match

# Rescale the total budget (x1+x2) using the global scale
rescaled_total_budget_global_scale = (normalized_budget_x2+normalized_budget_x1) * max_scaler # Total budget matches - doesn't mean the implementation is accurate.

print("Total Budget Original Scale", rescaled_total_budget)
print("Total Budget Original Global Scale", rescaled_total_budget_global_scale)

# Print the results
print("Normalized Budget X1:", normalized_budget_x1)
print("Normalized Budget X2:", normalized_budget_x2)
print("Response X1:", response_x1)
print("Response X2:", response_x2)
print("Total Response:", total_response)
print("Rescaled Total Response:", rescaled_response)

# Plot the saturation curves and the points
x_values = np.linspace(0, 1, 100)
y_values_x1 = logistic_saturation(x_values, lam_x1) * beta_x1
y_values_x2 = logistic_saturation(x_values, lam_x2) * beta_x2

plt.figure(figsize=(12, 6))
plt.plot(x_values, y_values_x1, label='Channel X1 Response Curve')
plt.plot(x_values, y_values_x2, label='Channel X2 Response Curve')
plt.scatter([normalized_budget_x1], [response_x1], color='red', label='X1 Budget Allocation', zorder=5)
plt.scatter([normalized_budget_x2], [response_x2], color='blue', label='X2 Budget Allocation', zorder=5)
plt.xlabel('Normalized Budget')
plt.ylabel('Response')
plt.title('Logistic Saturation Response Curves')
plt.legend()
plt.grid(True)
plt.show()

cc: @juanitorduz @wd60622

@MobiusLooper (Author)

Thanks for the comment and thoughts @cetagostini!

My understanding of why this doesn't cause the same issue as the initial implementation is that that implementation treated the scale of local channel spend and the total budget differently. This change shouldn't have that issue, since everything is scaled by the same number. I'm only about 50% sure I understood your concern in a); I think the fact that it's a shared scaling factor sidesteps it, but let me know if I'm mistaken.

On b), I agree that this doesn't seem like the ideal way to solve the problem. Ideally, as you suggest, the solver kwargs could be used to reach a better optimisation, but I've not come up with anything from my (limited) research down that avenue. I also can't find any literature about scaling in optimisation beyond a Stack Overflow post, so certainly nothing authoritative. So I'll have to rely on examples! I've created a simplified example below that might help demonstrate the problem, and I added at the end a demonstration of the invariance under scaling I was attempting to explain in the first paragraph.

import numpy as np
from scipy.optimize import minimize

def hill(x, gamma, alpha):
    return x ** alpha / (gamma ** alpha + x ** alpha)

def objective(spends, normalise=False):
    spends = spends / scalers
    if normalise:
        spends = spends * max(scalers)
    return -sum(hill(spends[i], gammas[i], alphas[i]) for i in range(3))

def constraint(spends, total_budget):
    return total_budget - sum(spends)

unscaled_budget = 150000
scalers = [70000, 60000, 90000]
alphas = [1.2, 0.5, 0.6]
gammas = [0.7, 0.3, 0.65]

unscaled_initial_spends = np.full(3, unscaled_budget / 3)
unscaled_bounds = [(0, unscaled_budget) for _ in range(3)]

scaled_budget = unscaled_budget / max(scalers)
scaled_initial_spends = np.full(3, scaled_budget / 3)
scaled_bounds = [(0, scaled_budget) for _ in range(3)]

unscaled_result = minimize(
    objective, 
    unscaled_initial_spends, 
    args=(False,),
    bounds=unscaled_bounds,
    constraints={'type': 'eq', 'fun': constraint, 'args': (unscaled_budget,)},
    method='SLSQP',
    tol=1e-9
)

scaled_result = minimize(
    objective, 
    scaled_initial_spends, 
    args=(True,),
    bounds=scaled_bounds,
    constraints={'type': 'eq', 'fun': constraint, 'args': (scaled_budget,)},
    method='SLSQP', 
    tol=1e-9
)

# demonstrating better optimisation results
print(f"Optimal spends without scaling: {unscaled_result.x}, Objective: {-unscaled_result.fun:.3f}")
print(f"Optimal spends with scaling: {scaled_result.x * max(scalers)}, Objective: {-scaled_result.fun:.3f}\n")

# demonstrating that the scaling doesn't change the calculation of the objective function
print(f"Objective for initial guess without scaling: {objective(unscaled_initial_spends):.6f}")
print(f"Objective for initial guess with scaling: {objective(scaled_initial_spends, True):.6f}")

* Default saturation_from_dict to default_priors

* Default to AdstockTransformation.default priors in adstock_from_dict
@AlfredoJF

Hey @MobiusLooper

Thanks for sharing this example. Here is the output:

Optimal spends without scaling: [50000. 50000. 50000.], Objective: 1.608
Optimal spends with scaling: [76298.5413179  33295.11262302 40406.34605908], Objective: 1.651

Objective for initial guess without scaling: -1.607527
Objective for initial guess with scaling: -1.607527

I see your point; however, the implementation is not correct because we have individual scalers per channel.

I modified your objective function to test individual scalers in the minimize function, and I got the same result as without scaling.

def objective(spends, normalise=False, original_scale=False):
    spends = spends / scalers
    if normalise:
        spends = spends * max(scalers)
    if original_scale:
        spends = spends * np.array(scalers)
    return -sum(hill(spends[i], gammas[i], alphas[i]) for i in range(3))
    
ind_scaled_budget = unscaled_budget / np.array(scalers)
ind_scaled_initial_spends = np.full(3, ind_scaled_budget / 3)
ind_scaled_bounds = [(0, ind_scaled_budget[i]) for i in range(3)]

ind_scaled_result = minimize(
    objective, 
    ind_scaled_initial_spends, 
    args=(False, True),
    bounds=ind_scaled_bounds,
    constraints={'type': 'eq', 'fun': constraint, 'args': (ind_scaled_budget,)},
    method='SLSQP', 
    tol=1e-9
)

# demonstrating better optimisation results
print(f"Optimal spends with individual scaling: {ind_scaled_result.x * np.array(scalers)}, Objective: {-ind_scaled_result.fun:.3f}\n")

# demonstrating that the scaling doesn't change the calculation of the objective function
print(f"Objective for initial guess with individual scaling: {objective(ind_scaled_initial_spends, False, True):.6f}")

Result:


Optimal spends with individual scaling: [50000. 50000. 50000.], Objective: 1.608

Objective for initial guess with individual scaling: -1.607527

@MobiusLooper (Author) commented Aug 22, 2024

Thanks for the comment @AlfredoJF

It's definitely interesting that this individual scaling also gives well-scaled inputs for the optimisation, and yet still doesn't move the allocations away from the initial spends.

When you say

not correct

could you clarify what you mean? As far as I can tell, the individual scaling isn't currently used in the BudgetOptimizer.

@AlfredoJF

Hey @MobiusLooper, sure. The main issue is this line in your objective function: spends = spends / scalers

If you set normalise=True, your spends variable will be completely different from the original spend, and it's like comparing apples to oranges. We want apples to apples.

@MobiusLooper (Author)

Isn't my implementation the same as what's currently in main?

My implementation:

spends = spends / scalers

main branch:

for idx, (_channel, params) in enumerate(self.parameters.items()):
    budget = budgets[idx] / self.scales[idx]

@camlaird

Hey @AlfredoJF, we were wondering whether you've managed to take a further look at this?
