What is the reason behind DynamoDB streams for the event store? #86
To be honest, this library started just as a proof of concept that GraphQL subscriptions can be done using the new API Gateway v2. Then I added abstractions so it should be fairly easy to completely change the source of events, etc. Back when I started with this library, there wasn't an option to use, for example, SQS as a Lambda event source, so I used DynamoDB to guarantee that an event will be processed even if something fails during event processing.
Do you mean sending the event "manually" by invoking the Lambda with it? Or do you mean processing the event directly in the same process?
I mean invoking the Lambda with the event. It would be better to separate the event processor logic from the mutations which publish said event.
I am not sure that using DynamoDB as a guarantee of event processing actually does anything useful.
I have a few questions, because I'm not sure I understand. If you invoke such a function with an event, what'll happen if the function fails? How'd you retry the event processing?
I think this is the only case. Direct invocation is not suitable for retrying, which is why it is not the best option. Here are the reasons I can think of that may cause the EventProcessor to fail:
I think that the DynamoDB store should eventually be discarded and replaced, for example by a queue-based event source. In the DynamoDB approach we don't have a mechanism for such granular operations, because if the whole batch fails, it's processed again. The problem in both cases is when you receive an event, need to send it to multiple connections, and it fails in the middle. That causes the whole batch to be retried, because the message is not directly connected to a specific connection but to subscriptions. It'd be easier to have an event per subscriber; this way we can rely on queue functionality.
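A minimal sketch of the event-per-subscriber idea under the assumptions above. SQS stands in for the "queue functionality"; the queue URL, environment variable, and message shape are hypothetical and not the library's API:

```ts
import * as AWS from 'aws-sdk';

const sqs = new AWS.SQS();

// One message per connection: if delivery to a single connection fails,
// only that message returns to the queue; the other deliveries stay acked.
export async function fanOutToQueue(
  payload: unknown,
  connectionIds: string[]
): Promise<void> {
  for (const connectionId of connectionIds) {
    await sqs
      .sendMessage({
        QueueUrl: process.env.DELIVERY_QUEUE_URL!, // hypothetical queue
        MessageBody: JSON.stringify({ connectionId, payload }),
      })
      .promise();
  }
}
```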
I think this approach would require a queue per subscriber, which might not be viable when you have lots of them.
Queue per subscriber, or event per subscriber, which is not viable either, because you'd need to fetch all the subscriptions for a given event and publish the event to all of them.

At the moment we have a really simple mechanism of subscription tracking which only tracks by event name, so it's not really optimal. It'd be better to have more information stored on the subscription so we can easily fetch the subscriptions that are relevant. For example, let's say that you're developing a chat app with rooms and you subscribe to messages in a specific room. This way it's easier to target only the subscriptions that are relevant to the event, at least from the variables point of view. (See the sketch below.)

Still, I don't like the idea of fetching all the subscriptions and creating an event for all the connections. And there are still possible edge cases, for example there is no way to send the message to a freshly subscribed connection (one that subscribed during the fan-out of events to all the connections in a subscription).

So the problem is basically that the PubSub mechanism is not efficient, because we need to keep track of connections and subscriptions and then somehow fan out events. Do you have any idea how it could be solved for the Lambda environment? On a normal server you have active PubSub connections, so it's easier to implement ad hoc queues because you don't need to solve event sourcing. In Lambda we still need to invoke a Lambda function, which is something that can be done automatically by DynamoDB, Kinesis, SQS, or SNS, but the fan-out part is problematic. (I'm not an expert in AWS, maybe there really is a way to do this, but for the last year and a half I haven't worked with anything on AWS, so this library has basically just evolved through its users.)
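A sketch of the "more information stored on the subscription" idea, assuming a DynamoDB table keyed by the event name plus serialized variables. All names are hypothetical, and the naive `JSON.stringify` key ignores variable-ordering issues; it only illustrates targeting by variables:

```ts
import * as AWS from 'aws-sdk';

const ddb = new AWS.DynamoDB.DocumentClient();

// Key the subscription by event name *and* variables, e.g.
// 'messageAdded:{"roomId":"1"}', instead of the bare event name.
const keyFor = (eventName: string, variables: Record<string, unknown>) =>
  `${eventName}:${JSON.stringify(variables)}`;

export async function subscribe(
  connectionId: string,
  eventName: string,
  variables: Record<string, unknown>
): Promise<void> {
  await ddb
    .put({
      TableName: process.env.SUBSCRIPTIONS_TABLE!, // hypothetical table
      Item: { subscriptionKey: keyFor(eventName, variables), connectionId },
    })
    .promise();
}

// Publishing with the same variables now fetches only the relevant
// subscriptions instead of everything registered under the event name.
export async function relevantConnectionIds(
  eventName: string,
  variables: Record<string, unknown>
): Promise<string[]> {
  const res = await ddb
    .query({
      TableName: process.env.SUBSCRIPTIONS_TABLE!,
      KeyConditionExpression: 'subscriptionKey = :k',
      ExpressionAttributeValues: { ':k': keyFor(eventName, variables) },
    })
    .promise();
  return (res.Items ?? []).map((item) => item.connectionId as string);
}
```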
I am not sure that the way you encode the subscription event name would make any difference, whether it is all in the name or stored separately on the subscription.
I think with the current implementation we keep the list of subscribers which are relevant to a given event, and it works fine on that part.
Not sure if we can actually do anything in this case, but this seems okay to me.
I think the possible solution here, to make these subscriber lists more manageable, is to split the EventProcessor functionality. This way we would have one Lambda to do the fan-out, a queue for a retry mechanism, and Lambdas to handle the actual sending.
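A sketch of the sending half of that split, assuming the fan-out Lambda enqueues the per-subscriber SQS messages from the earlier sketch (handler and variable names are hypothetical). With a batch size of 1, SQS retries each failed delivery on its own and can move poison messages to a dead-letter queue:

```ts
import { SQSEvent } from 'aws-lambda';
import * as AWS from 'aws-sdk';

const api = new AWS.ApiGatewayManagementApi({
  endpoint: process.env.WS_ENDPOINT, // hypothetical API Gateway endpoint
});

// Sender Lambda: consumes per-subscriber messages and pushes each one
// to its WebSocket connection. A throw here makes SQS redeliver the message.
export async function sender(event: SQSEvent): Promise<void> {
  for (const record of event.Records) {
    const { connectionId, payload } = JSON.parse(record.body);
    await api
      .postToConnection({
        ConnectionId: connectionId,
        Data: JSON.stringify(payload),
      })
      .promise();
  }
}
```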
Yes, it doesn't make any difference, but the first one (in the name) basically pushes the responsibility onto you as a developer, while the second could be "automatic".
Yes, they are relevant to an event, so if we solve event targeting (mentioned above), then this one is solved too.
I'm not sure either.
Yes, I like the idea you proposed. The only thing is that it's not compatible, but that can be addressed, for example, by introducing a new package for this type of event processor. (I was thinking that maybe we should have multiple packages, each for a different source, for example Redis, DynamoDB, SQS, etc.)
You mean not compatible with the serverless template? Anyway, the library is currently in an alpha version, so API changes are to be expected. And if you don't like that a developer is required to introduce a bunch of Lambda functions in their serverless template, this could be addressed by putting all of our handlers (subscribersHandler, eventProcessor, or even webSocketHandler as well) into one Lambda and managing the event sources. The different-packages idea is lovely, but I don't think that splitting into packages is a top priority right now.
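A sketch of that single-Lambda idea: one exported handler dispatching on the shape of the incoming event. The internal handlers are hypothetical stand-ins, not the library's exports:

```ts
import {
  APIGatewayProxyEvent,
  APIGatewayProxyResult,
  DynamoDBStreamEvent,
  SQSEvent,
} from 'aws-lambda';

// One entry point for every event source, so users wire a single function
// in their serverless template.
export async function handler(
  event: APIGatewayProxyEvent | DynamoDBStreamEvent | SQSEvent
) {
  if ('Records' in event) {
    return event.Records[0]?.eventSource === 'aws:sqs'
      ? sqsEventProcessor(event as SQSEvent)
      : dynamoDbEventProcessor(event as DynamoDBStreamEvent);
  }
  return webSocketHandler(event);
}

// Hypothetical internal handlers standing in for the real ones.
async function sqsEventProcessor(event: SQSEvent): Promise<void> {
  /* deliver per-subscriber messages */
}
async function dynamoDbEventProcessor(event: DynamoDBStreamEvent): Promise<void> {
  /* process stream records */
}
async function webSocketHandler(
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> {
  return { statusCode: 200, body: '' };
}
```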
Yes, now that I'm thinking about it, it's not really a breaking change, because, as you said, it only breaks how the infrastructure is deployed. In that case we can treat this as a new event processor and document how it needs to be deployed in order to work correctly. This gives me the idea that the only things that are common are the WebSocket and HTTP handlers; the rest is up to the event source you choose to use. So basically we'd just need to document each possible event source in its own "manual" and maybe provide some examples.
So right now the EventStore is using a DynamoDB table to store incoming events and a DynamoDB stream to relay those events one by one to the event processor Lambda. Basically something like this:
An event is posted by a mutation -> the event is written to the Events table in DynamoDB -> the DynamoDB stream invokes the event processor Lambda -> the event processor Lambda decides to which connections the event should be posted, and posts it.
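A minimal sketch of that flow, not the library's actual code; the table names, the WS_ENDPOINT variable, and the subscription lookup are assumptions for illustration:

```ts
import { DynamoDBStreamEvent } from 'aws-lambda';
import * as AWS from 'aws-sdk';

const ddb = new AWS.DynamoDB.DocumentClient();

// 1) A mutation "publishes" by writing the event to the Events table.
export async function publish(eventName: string, payload: unknown): Promise<void> {
  await ddb
    .put({
      TableName: process.env.EVENTS_TABLE!, // hypothetical table
      Item: { id: Date.now().toString(), event: eventName, payload },
    })
    .promise();
}

// 2) The DynamoDB stream invokes the event processor with a batch of records;
//    the fan-out to connections happens inside this Lambda.
export async function eventProcessor(event: DynamoDBStreamEvent): Promise<void> {
  const api = new AWS.ApiGatewayManagementApi({ endpoint: process.env.WS_ENDPOINT });

  for (const record of event.Records) {
    if (record.eventName !== 'INSERT' || !record.dynamodb?.NewImage) continue;
    const item = AWS.DynamoDB.Converter.unmarshall(
      record.dynamodb.NewImage as AWS.DynamoDB.AttributeMap
    );

    for (const connectionId of await subscribedConnectionIds(item.event)) {
      await api
        .postToConnection({
          ConnectionId: connectionId,
          Data: JSON.stringify(item.payload),
        })
        .promise();
    }
  }
}

// Hypothetical lookup of connections subscribed to an event name.
async function subscribedConnectionIds(eventName: string): Promise<string[]> {
  const res = await ddb
    .query({
      TableName: process.env.SUBSCRIPTIONS_TABLE!,
      KeyConditionExpression: '#e = :e',
      ExpressionAttributeNames: { '#e': 'event' },
      ExpressionAttributeValues: { ':e': eventName },
    })
    .promise();
  return (res.Items ?? []).map((i) => i.connectionId as string);
}
```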
My question is: why do we need this complex process of relaying events through a DynamoDB table? I can guess that this is some sort of event bus which decouples the logic that receives events from the logic that publishes them.
However, as far as I can see, this approach has no architectural benefits.
We do not use DynamoDB stream event batching to process multiple events at the same time, as it would delay events coming through.
And this system is not acting as a fan-out to subscribers, as the 'fan-out' of the event happens in the event processor Lambda.
Wouldn't it be better to directly invoke the event processor Lambda on each incoming event?
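For reference, a sketch of the direct invocation being proposed, with a hypothetical function name. Note that asynchronous ('Event') invocations only get two automatic retries from Lambda, which is the retry concern discussed in the comments above:

```ts
import * as AWS from 'aws-sdk';

const lambda = new AWS.Lambda();

// Publish by invoking the event processor directly, skipping the Events
// table and its stream. Failed async invocations are retried by Lambda
// only twice before the event is dropped (or dead-lettered, if configured).
export async function publishDirect(eventName: string, payload: unknown): Promise<void> {
  await lambda
    .invoke({
      FunctionName: process.env.EVENT_PROCESSOR_FN!, // hypothetical name
      InvocationType: 'Event', // asynchronous invocation
      Payload: JSON.stringify({ event: eventName, payload }),
    })
    .promise();
}
```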