Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronized access to MessageService restricts performance #296

Open
magnusbaeck opened this issue Aug 16, 2024 · 6 comments · May be fixed by #302
Open

Synchronized access to MessageService restricts performance #296

magnusbaeck opened this issue Aug 16, 2024 · 6 comments · May be fixed by #302
Assignees
Labels
bug Something is wrong and needs fixing.

Comments

@magnusbaeck
Copy link
Member

Description

The ProducerController synchronizes all calls to the MessageService. This serializes all publish operations and thus has severe performance implications when the rate of incoming requests reaches a certain point. In our case our requests usually take about 35 ms but during our peak hours publishing a single message can take several seconds. This increases the total execution time for our ETOS executions significantly.

Additional Context

The problematic synchronization was added in #219 in an attempt to address a problem with corrupt HTTP responses without addressing the root cause of the poor implementation of MessageServiceRMQImpl. My code review comment in said PR (#219 (comment)) explains the problem but was never addressed.

The following graphs illustrate what we're seeing every night. Note how the request latency shoots up by several orders of magnitude without being constrained by thread pools, CPU, or memory. (The CPU usage on the right y axis of the top-left graph is in millicores.)

image

This serialization also effectively disables the attempt at a connection pool in RabbitMqProperties. Which perhaps is just as well since that class doesn't appear to be very thread safe and that its giveMeRandomChannel method doesn't return a random channel as advertised but actually just returns the first open channel in the list of channels.

Logs

No response

Expected Behavior

I expect Eiffel REMReM Publish to perform well under reasonable loads and that it its performance will be contrained by CPU, memory, thread pool size, and other tunable parameters.

Steps To Reproduce

No response

The version of this project/repo, if applicable

master

The version/edition of the Eiffel Protocol used, if applicable

N/A

@magnusbaeck magnusbaeck added the bug Something is wrong and needs fixing. label Aug 16, 2024
@z-sztrom
Copy link
Contributor

z-sztrom commented Sep 3, 2024

Confirmed, the synchronization sequence limits publish service to publish one event in a time, i.e. an event is published only when publishing of a previous one has been finished.

@z-sztrom
Copy link
Contributor

z-sztrom commented Nov 1, 2024

I tried to compare current implementation of publish service with one without the synchronized section and I didn't observe any performance difference. Publishing took roughly the same time: non-synchronized version 1% faster.

I tested in Docker environment and I suspect that MB was a bottleneck. I'll continue testing.

@magnusbaeck, can you perform testing if I provide you non-synchronized version?

@z-sztrom
Copy link
Contributor

z-sztrom commented Nov 7, 2024

I did testing using real machines (not Docker containers) and I found that publishing without synchronized section is cca 15-20% faster (publishing up to cca 100 events).

@magnusbaeck
Copy link
Member Author

Sorry, I was away last week and have had to spend some time this week on getting back on track. Yes, I can do some load testing.

100 events/s sounds very low. How many concurrent client connections did you run?

@z-sztrom
Copy link
Contributor

z-sztrom commented Nov 8, 2024

No, it's a misunderstanding. Version without synchronization is 15-20% faster when publishing up to 100 events. Then the spped advantage drops down gradually and when publishing cca 10000 events, performance of both, synchronized and non-synchronized versions, becomes the same.

Overall performance I achieved on my setup was cca 3000events/sec.

@z-sztrom
Copy link
Contributor

z-sztrom commented Nov 12, 2024

As you stated in your comment (#219 (comment)), the problem arises from inside of MessageServiceRMQImpl. The implementation has never been prepared to work in parallel/concurrent environment.

Originally, I had an impression that the implementation is located in semantics library. But, fortunately, it isn't. So that could be altered as part of publish service.

@z-sztrom z-sztrom linked a pull request Nov 14, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong and needs fixing.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants