plugin behaviour when rabbitMQ target is down #108

Closed
kript opened this issue Sep 30, 2022 · 5 comments

@kript

kript commented Sep 30, 2022

Hi folks,

We recently had a disk-full incident on our test RabbitMQ system, which meant that the audit plugin installed and configured on our dev zones couldn't reach a RabbitMQ system to report the audited PEPs.

We discovered the following things:

  1. We were still able to iget and iput files 👍
  2. The logs were not helpful, as they contained just lots of
send: Broken pipe
send: Connection refused
recv: Connection refused
send: Broken pipe

This was happening quite often:

$ grep -c recv /var/lib/irods/log/rodsLog.2022.09.26
6944247
$ grep -c "send:" /var/lib/irods/log/rodsLog.2022.09.26
7691342
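
For anyone wanting a breakdown by error text rather than two totals, here is a minimal Python sketch. It assumes the socket errors appear as bare lines beginning with `send:` or `recv:`, as in the excerpt above; the function name `count_socket_errors` is hypothetical and not part of iRODS or the plugin. Unlike `grep -c recv`, it only matches at the start of a line, so its totals may differ slightly from the grep counts.

```python
from collections import Counter


def count_socket_errors(lines, prefixes=("send:", "recv:")):
    """Count identical log lines that start with one of the given prefixes.

    Assumption: the errors are logged as bare lines such as
    'send: Broken pipe', with no timestamp in front of them.
    """
    counts = Counter()
    for line in lines:
        text = line.strip()
        if text.startswith(prefixes):
            counts[text] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the log excerpt in this issue.
    sample = [
        "send: Broken pipe",
        "send: Connection refused",
        "recv: Connection refused",
        "send: Broken pipe",
    ]
    for message, n in count_socket_errors(sample).most_common():
        print(f"{n:>8}  {message}")
```

To run it against a real log, pass an open file object instead of the sample list, since iterating a file yields one line at a time.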

As soon as the RabbitMQ service was restored, the odd messages went away. However, the system did not have any queues, so it wasn't processing any of the messages. It would be helpful to have a way to know that a message was acknowledged by the RabbitMQ system (perhaps a debug setting in the options?).

Can we have the plugin log something more helpful, please? It required debugging from first principles to find the cause, as we also had the indexing and tiering plugins installed and had also made a minor database change...

N.B. This is tangentially related to #106

@trel
Member

trel commented Sep 30, 2022

Thanks for the report.

Yes, we need to make the plugin a lot more helpful / resilient in this scenario.

@alanking
Contributor

alanking commented Jun 7, 2023

When I kill the message broker (in my case, ActiveMQ 5.14) with a 4.3.0 server and the audit plugin as of #118, I get a message like this in the logs:

{
  "error_condition::description": "Connection refused - on read from localhost:5672",
  "error_condition::name": "proton:io",
  "error_condition::what": "proton:io: Connection refused - on read from localhost:5672",
  "log_category": "rule_engine",
  "log_facility": "local0",
  "log_level": "error",
  "log_message": "Transport error in proton messaging handler",
  "request_api_name": "GENERAL_ADMIN_AN",
  "request_api_number": 701,
  "request_api_version": "d",
  "request_client_user": "rods",
  "request_host": "192.168.16.3",
  "request_proxy_user": "rods",
  "request_release_version": "rods4.3.0",
  "rule_engine_plugin": "audit_amqp",
  "server_host": "d6b557785867",
  "server_pid": 1063015,
  "server_timestamp": "2023-06-07T18:30:36.354Z",
  "server_type": "agent"
}

I think this provides all the information that was missing before (the server, the plugin name, more detailed messages, etc.). Perhaps this is resolved?
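
Since the entry above is structured, it can be filtered by field instead of grepping for raw text. A minimal Python sketch follows, assuming each log entry is written as one JSON object per line with nothing in front of it (if your syslog setup adds a prefix, it would need stripping first). The function name `audit_amqp_errors` is hypothetical; the field names and values are taken from the entry above.

```python
import json


def audit_amqp_errors(lines):
    """Yield parsed log entries that are errors from the audit_amqp plugin.

    Assumption: one JSON object per line. Lines that are not valid JSON
    objects are skipped rather than treated as failures.
    """
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        if (
            entry.get("rule_engine_plugin") == "audit_amqp"
            and entry.get("log_level") == "error"
        ):
            yield entry


if __name__ == "__main__":
    # A shortened version of the entry shown in this comment.
    sample = [
        json.dumps({
            "error_condition::description":
                "Connection refused - on read from localhost:5672",
            "log_level": "error",
            "log_message": "Transport error in proton messaging handler",
            "rule_engine_plugin": "audit_amqp",
            "server_timestamp": "2023-06-07T18:30:36.354Z",
        }),
        "not a JSON line",
    ]
    for entry in audit_amqp_errors(sample):
        print(entry["server_timestamp"], entry["log_message"])
```

Filtering on `rule_engine_plugin` is what separates these entries from the output of other installed plugins, which was the hard part in the original report.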

@korydraughn
Contributor

I believe this does resolve the issue.

Please assign appropriate labels and developer to issue before closing.

@SwooshyCueb
Member

Oh yeah, I did add a whole bunch more logging in my first refactor PR. I didn't know about this issue (or had forgotten about it) or I'd have tagged it in the commits.

@trel
Member

trel commented Jun 11, 2023

Looks like this was handled by 2b3a398 from #105.
