Distributed consistent flag evaluations #396
-
Just to add - we (Flagsmith) use epoch markers as part of our real-time streaming infrastructure and it has worked well; it's simple, reliable, and easy to understand.
-
I've heard this hypothetical scenario discussed a few times, but I've never seen it happen in the wild. I'm somewhat skeptical that you'd really see different distributed services wanting to evaluate the same feature flag. I've definitely seen a service at the top of the stack change a parameter that's passed down to another service - for example, it checks whether a given user has access to same-day shipping in their market and changes the parameters to an email template, or something - but IMO you don't want multiple services coordinating their behavior via a shared feature flag. It's much better to make that explicit by having the first service evaluate the flag and then pass the relevant parameters directly when calling the downstream service(s).
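To make that concrete, here is a small hedged sketch (assuming the OpenFeature JS server SDK; the flag key, context fields, and the downstream email endpoint are made up for illustration): the first service evaluates the flag once and passes the result as an explicit parameter, so the downstream service never evaluates the flag itself.

```typescript
// Sketch only: the flag key, context fields, and email endpoint are assumptions.
import { OpenFeature } from '@openfeature/server-sdk';

const client = OpenFeature.getClient();

export async function handleCheckout(userId: string, market: string): Promise<void> {
  // The upstream service evaluates the flag exactly once...
  const sameDayShipping = await client.getBooleanValue(
    'same-day-shipping',
    false,
    { targetingKey: userId, market },
  );

  // ...and passes the result as an explicit parameter, so the downstream
  // email service never needs to evaluate (or even know about) the flag.
  await fetch('https://email.internal/send', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ template: 'order-confirmation', sameDayShipping }),
  });
}
```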
-
I think people sometimes over-think this problem, too. Rolling server-cluster upgrades will cause some traffic to get vN and some to get vN+1, but I've never, ever heard anyone worry or care about that situation.
-
I have definitely seen that at a customer, @moredip, and thinking of "coordinated feature deployments", I think this is not a bad pattern. There we just used …
-
Re "Distributed consistent flag evaluations with multiple downstream services": @sebastian-zahrhuber, I really like the idea, and we have actually implemented similar ideas in projects.
Re "This could be a good solution. In this case, we would have to accept that the …": I really appreciate that.
We also used it in several projects to add some … When handling a request, the SDK could just save the evaluated flags in a mechanism similar to the one we use for the evaluation context. I would be really in favor of going this route. Once we see a tendency that others are in favor of this too, I would love to build an experimental version for the JS SDKs.
(Edit) I see one huge problem: how do we make sure that the baggage is only set by a trusted party? I guess this could be a blocker. If a client can set baggage (which it always can), and a flag is used for "permission toggles", we cannot trust the flag information.
As an addition to this, I would also like to add an optional …
Re "Distributed consistent flag evaluations with asynchronous Execute/Poll pattern": I see the point, but I have seen this problem far less often than the other one. My main concern is that most of the proposed solutions either require knowledge about this condition in the first entity of the call chain (Version 1) or a separate cache, which I think is hard to spec as OpenFeature. It also makes this condition opaque to the services/clients.
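As a rough, unofficial sketch of what "the SDK saving evaluated flags" could look like (the wrapper function, the `flag.` baggage-key prefix, and the flag names are assumptions, not OpenFeature APIs), using OpenTelemetry Baggage so the value travels with outgoing requests. As the edit above points out, baggage set by untrusted clients must not be honored for permission-style flags.

```typescript
// Sketch: reuse a flag value from OpenTelemetry Baggage if an upstream service
// already evaluated it, otherwise evaluate via the provider and record the result.
import { context, propagation } from '@opentelemetry/api';
import { OpenFeature } from '@openfeature/server-sdk';

const client = OpenFeature.getClient();

export async function withConsistentFlag<T>(
  flagKey: string,
  defaultValue: boolean,
  evalContext: Record<string, unknown>,
  fn: (value: boolean) => Promise<T>,
): Promise<T> {
  const baggageKey = `flag.${flagKey}`; // assumed naming convention
  const baggage = propagation.getBaggage(context.active()) ?? propagation.createBaggage();

  // Reuse the value an upstream service already evaluated, if present.
  const entry = baggage.getEntry(baggageKey);
  if (entry) {
    return fn(entry.value === 'true');
  }

  // Otherwise evaluate via the provider and record the result in baggage, so that
  // outgoing calls made inside `fn` (with baggage propagation enabled) carry it.
  const value = await client.getBooleanValue(flagKey, defaultValue, evalContext);
  const ctx = propagation.setBaggage(
    context.active(),
    baggage.setEntry(baggageKey, { value: String(value) }),
  );
  return context.with(ctx, () => fn(value));
}
```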
-
Oh I did not see that one, nice point!
But we would have to have the concept in the SDKs too then, right? Maybe generally a thing to add next to … Should we open an issue to discuss that in the spec repo?
-
Happy to see that there's already a conversation around this! Thanks for the initial writeup @sebastian-zahrhuber. A counter-argument to transporting evaluated flag values is payload size: our in-house flagging system uses a YAML file as the source of data, which is now 2 MB+ for all our flags and rules. The reason the total size is relevant for us is that, for better or worse, the context schema applied to the rules can vary by service, so we need to transport all the rules and not just the evaluated values. I can't disclose too much information, but epoch markers are the synchronization mechanism that makes the most sense for us. We plan to implement them internally either way, but would love to adopt OpenFeature and contribute if possible.
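For readers unfamiliar with the term, here is a rough, hypothetical sketch of the epoch-marker idea (the `Ruleset` type, `loadRuleset`, and the `x-flag-epoch` header are all made up for illustration and not part of OpenFeature or any vendor API): the entry service pins the current ruleset version and forwards the marker, so downstream services evaluate against the same rules.

```typescript
// Hypothetical sketch of epoch markers: the entry service pins the current ruleset
// "epoch" and forwards it, so downstream services evaluate against the same rules.

type Ruleset = {
  epoch: string; // e.g. the commit SHA or publish timestamp of the flag rules
  evaluate: (flagKey: string, ctx: Record<string, unknown>) => boolean;
};

// Each service keeps a few recent epochs so slightly "old" evaluations still resolve.
const latestEpoch = '2024-06-01T12:00:00Z';
const rulesetsByEpoch = new Map<string, Ruleset>([
  [latestEpoch, { epoch: latestEpoch, evaluate: () => true }], // toy ruleset for illustration
]);

function loadRuleset(epoch?: string): Ruleset {
  const ruleset = rulesetsByEpoch.get(epoch ?? latestEpoch);
  if (!ruleset) throw new Error('epoch no longer cached; fall back to latest or re-evaluate');
  return ruleset;
}

// Entry service: resolve the epoch once, evaluate locally, forward the marker.
export async function handleEntryRequest(): Promise<void> {
  const ruleset = loadRuleset();
  const sameDay = ruleset.evaluate('same-day-shipping', { market: 'AT' });
  await fetch('https://downstream.internal/checkout', {
    headers: { 'x-flag-epoch': ruleset.epoch, 'x-same-day-shipping': String(sameDay) },
  });
}

// Downstream service: evaluate against the pinned epoch instead of "latest".
export function handleDownstreamRequest(headers: Headers): boolean {
  const epoch = headers.get('x-flag-epoch') ?? undefined;
  return loadRuleset(epoch).evaluate('same-day-shipping', { market: 'AT' });
}
```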
-
As part of my master's thesis, I am currently investigating how distributed consistent flag evaluations could be implemented with OpenFeature, so I want to share my ideas with you.
Imagine a service architecture where requests are forwarded to multiple downstream services. In this scenario, it could be important that all services down the call stack evaluate a shared feature flag to the same value, even if it was changed in the meantime.
Distributed consistent flag evaluations with multiple downstream services
To implement this, I see the following options available:
Since the OpenFeature SDKs do not yet support a versioning concept, the best option currently implementable, in my opinion, is transporting the evaluated flag value with the request down the stream. A Baggage header could be used for this.
With this approach, each service is responsible for forwarding the baggage header and needs custom logic to use the value from the header instead of evaluating the feature flag itself.
To avoid having to implement this logic in all services, a further development of the SDKs would be to use this baggage header as an input to OpenFeature flag evaluation. The SDK would then decide whether a value should be taken from the baggage header or evaluated by the provider. This way, the service would not have to check the header itself.
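A minimal sketch of the first variant, where the service itself checks and forwards the header (the flag key, the `flag.` baggage-key prefix, and the downstream URL are assumptions; the OpenFeature JS server SDK is assumed for the fallback evaluation):

```typescript
// Sketch: prefer the flag value carried in the W3C `baggage` header, otherwise
// evaluate locally, then forward the header so further services stay consistent.
import { OpenFeature } from '@openfeature/server-sdk';

const client = OpenFeature.getClient();
const FLAG_KEY = 'same-day-shipping';

function readFlagFromBaggage(header: string | null, flagKey: string): boolean | undefined {
  if (!header) return undefined;
  // W3C Baggage format: "key1=value1,key2=value2;prop" (URL-encoding ignored here)
  for (const member of header.split(',')) {
    const [key, value] = member.trim().split(';')[0].split('=');
    if (key === `flag.${flagKey}`) return value === 'true';
  }
  return undefined;
}

export async function handleRequest(incomingBaggage: string | null, userId: string): Promise<void> {
  // Prefer the value evaluated by the upstream service, otherwise evaluate locally.
  const fromUpstream = readFlagFromBaggage(incomingBaggage, FLAG_KEY);
  const sameDay =
    fromUpstream ?? (await client.getBooleanValue(FLAG_KEY, false, { targetingKey: userId }));

  // Forward (or create) the baggage so services further down the stack stay consistent.
  const outgoingBaggage = incomingBaggage ?? `flag.${FLAG_KEY}=${sameDay}`;
  await fetch('https://downstream.internal/work', { headers: { baggage: outgoingBaggage } });
}
```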
Distributed consistent flag evaluations with asynchronous Execute/Poll pattern
A special extension is an asynchronous execute/poll pattern, where an operation is started with an execute request and the status of the operation can be queried with a poll request. Here too, it can be important that the shared feature flag evaluates to the same value during the poll request as it did during the execute request.
To ensure distributed consistent flag evaluation in this case as well, I have looked into two possible approaches:
Distributed consistent flag evaluations using a cache for the first service + baggage header for downstream services
This approach would offer the best performance, since only one OFREP call needs to be made and only one cache write/read is necessary. However, the downside is that feature flag values can only be stored within the scope of the shared feature flags, as only one service has access to the cache.
Distributed consistent flag evaluations using a cache for all services
The other option would be to give all services access to the cache. This raises questions such as how to support multiple scopes, and whether only the feature flags of the shared scope or the flags of all scopes should be kept consistent for the execute/poll pattern. Depending on this, one must also be careful about data leaks on the cache side.
The downside of this approach would be poorer performance, due to multiple accesses to Redis and possibly multiple OFREP calls.
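A hedged sketch of the cache-based idea for the execute/poll pattern (the operation ID, the key naming, and the `SnapshotCache` interface are assumptions; Redis is just one possible backing store): the shared flags are evaluated once at execute time, the snapshot is stored under the operation ID, and the poll handler reuses that snapshot even if the flag has changed since.

```typescript
// Sketch: snapshot the shared flag values at execute time, reuse them at poll time.
import { OpenFeature } from '@openfeature/server-sdk';

interface SnapshotCache {
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  get(key: string): Promise<string | null>;
}

const client = OpenFeature.getClient();

// Execute: evaluate the shared flags once and persist the snapshot for the operation.
export async function startOperation(cache: SnapshotCache, operationId: string, userId: string) {
  const snapshot = {
    'same-day-shipping': await client.getBooleanValue('same-day-shipping', false, { targetingKey: userId }),
  };
  await cache.set(`flag-snapshot:${operationId}`, JSON.stringify(snapshot), 24 * 60 * 60);
  return snapshot;
}

// Poll: reuse the snapshot taken at execute time, even if the flag changed since.
export async function pollOperation(cache: SnapshotCache, operationId: string, userId: string) {
  const stored = await cache.get(`flag-snapshot:${operationId}`);
  if (stored) return JSON.parse(stored) as Record<string, boolean>;
  // Snapshot expired or missing: fall back to a fresh evaluation.
  return {
    'same-day-shipping': await client.getBooleanValue('same-day-shipping', false, { targetingKey: userId }),
  };
}
```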