-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creating Causal Identification module #1166
base: main
Are you sure you want to change the base?
Conversation
What is |
Old example, I did the correction! |
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1166 +/- ##
===========================================
- Coverage 95.59% 36.62% -58.97%
===========================================
Files 39 40 +1
Lines 4066 4117 +51
===========================================
- Hits 3887 1508 -2379
- Misses 179 2609 +2430 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool stuff @cetagostini ! I gave a quick look into the code and I think it brings a lot of value. Nevertheless, I think we need to improve the right level of abstraction and do not include this feature in the BaseMMM
class (see comment below)
""" | ||
|
||
def __init__( | ||
self, causal_model: CausalModel, treatment: list[str] | tuple[str], outcome: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we type treatment
as an iterator https://wiki.python.org/moin/Iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(applied to all below)
dag: str | None = Field( | ||
None, | ||
description="Optional DAG provided as a string Dot format for causal identification.", | ||
), | ||
treatment_nodes: list[str] | tuple[str] | None = Field( | ||
None, | ||
description="Column names of the variables of interest to identify causal effects on outcome.", | ||
), | ||
outcome_node: str | None = Field( | ||
None, description="Name of the outcome variable." | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where I would like to discuss the API. Our MMM class is already a huge monolith of many components, and I would like us to start modularizing more or even making it a subclass.
For instance, we can keep BaseMMM
as it is and have an additional
CausalMMM(BaseMMM)
, and if people want to use this class, they need to install DoWhy
. I am personally against adding DoWhy
as a required dependency, as in my experience, they sometimes hard-pin soma packages and can make it harder to resolve dependencies. WDYT?
Thoughts @wd60622 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. I can work quickly on it, will wait for William comments 🙌🏻 Probably will have a meeting with him on Tuesday!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the dependencies are the issue then I think we can get away with having the dowhy and networkx only be required if the dag is specified. That would make models with backward compat not needing to add the new dependencies for the same model. Would only checking for these depends in the case of using this functionality solve your concerns? @juanitorduz
I think going the route of subclassing could just add more code to manage 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think going the route of subclassing could just add more code to manage 😢
true ... what would be your suggestion @wd60622 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think less code to manage its better and users still import the same MMM class. The amount of code lines is only 20, I don't see it as something crazy. Whats your opinion?
attrs["dag"] = ( | ||
json.dumps(self.dag, default=default) if hasattr(self, "dag") else "None" | ||
) | ||
attrs["treatment_nodes"] = ( | ||
self.treatment_nodes if hasattr(self, "treatment_nodes") else "None" | ||
) | ||
attrs["outcome_node"] = ( | ||
self.outcome_node if hasattr(self, "outcome_node") else "None" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this has to be in the model builder class, which is much more general. For instance, this is not necessary for the CLV module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then where should I added? I got issues if I don't add them here, but let me try again.
"networkx", | ||
"dowhy", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should not be core dependencies but rather optional (like numpyro for sampling). See the comment above on abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm following your and @wd60622 idea and moving this to optional dependencies.
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-11-17T18:02:02Z Did anything chane in the MMM example notebook? |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-11-17T18:20:27Z Add subtitle like: business problem |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-11-17T18:20:28Z Shall we remove the first data points which are generated by the natural fact that we can not adstock much for the initial point ? |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-11-17T18:20:28Z Again, lets remove the first point because this initial jump is just artificial and looks odd. |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-11-17T18:20:29Z Any idea on the divergences? |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-11-17T18:20:30Z Can we use |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-11-17T18:20:31Z Maybe we should display the HDI instead of the heat plots as we need to fix that the legend here is miningless (I will fix this soon) |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-11-17T18:20:31Z Do we need to display these warnings? |
@carlosagostini I really liked the notebook 🚀 ! I think if we improve the level of abstraction (see comments above) and update the notebooks, we can merge this one soon! @drbenvincent It would be great if you could review the notebook to provide feedback if you have time 🙏 :) |
outcome=self.outcome_node, | ||
) | ||
|
||
self.control_columns = self.causal_graphical_model.compute_adjustment_sets( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I understanding that this line will modify the control columns for the model? Then we don't need to change the build_model
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, thats why I added here.
outcome_node: str | None = Field( | ||
None, description="Name of the outcome variable." | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to specify this? We already have the output_var
. Could that be leveraged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check, not necessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we could simplify, that'd be great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wd60622 we can but then I'll need to restrict that output variable in the DAG is always y
(The return from output_var
). The use of sales
or revenue
or registrations
or any other would not be possible as name. Not crazy, but then we avoid one parameter. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments and questions
Description
Short description: Integration of CausalGraphModel in BaseMMM Class
This update integrates a CausalGraphModel into the BaseMMM class, allowing for automated causal identification based on backdoor criteria, assuming a given Directed Acyclic Graph (DAG).
Summary of Changes
Added Causal Graph Option:
BaseMMM
class now accepts an optionaldag
parameter, which can be provided either as a string (DOT format) or anetworkx.DiGraph
.dag
is provided, aCausalGraphModel
is instantiated to analyze causal relationships and determine necessary adjustment sets.Automatic Minimal Adjustment Set Handling:
BaseMMM
initialization now includes logic to calculate the minimal adjustment set required to estimate the causal effect of the treatment variables (assume to be media channels) on the outcome.control_columns
are automatically updated to include variables from the minimal adjustment set only.yearly_seasonality
is not in the minimal adjustment set, theyearly_seasonality
parameter is set toNone
, effectively disabling it in the model.Warnings for Missing Adjustment Sets:
Code Example
Here's how to initialize
BaseMMM
with a DAG for causal inference:Related Issue
Checklist
Modules affected
Type of change
📚 Documentation preview 📚: https://pymc-marketing--1166.org.readthedocs.build/en/1166/