Failing verification of LD-Signature does not remove it from inbox #14587

eternal-flame-AD · 2024-09-20T03:28:09Z

💡 Summary

"skip: LD-Signatureの検証に失敗しました" was marked as a "unrecoverableerror" but doesn't remove the message from inbox pipeline, stalling it.

🥰 Expected Behavior

If an inbox message can't be accepted, it should be removed from the pipeline immediately (or at least a retry that happens later in time) so as other messages don't get starved of workers.

🤬 Actual Behavior

I found today my instance has a >8h delay in receiving external notes. I went to see the Bull Dashboard and say 9000 pending inbox messages and growing. The CPU and memory utility was low. I investigated the running jobs and found they have been stalled for ~20 minutes. All show an error but are still in the "running state" (see reproduction section). Probably this stalled the pipeline causing the pile up.

Someone on misskey suggested it might be retrying behavior but (1) If it's a signature mismatch it should be permanent error (2) No job should take up a slot in the workers for 25 minutes without releasing it to someone else first.

Deleting the offending server alone did not resolve the issue (temporarily unblocks it but get's blocked again quickly), I need to manually purge the redis inbox of that relay for the problem to resolve.

Screenshot: https://mi.yumechi.jp/notes/9ydk0fspg51f000e (sorry I forgot to screenshot the "Active" tab, but it was the 16 jobs on the Active tab that got stalled.

📝 Steps to Reproduce

I don't know how to inject events manually into the queue so not sure how to exacly reproduce it again but I saved the event json, and the error on the Bull Dashboard. (I see an option to manually put in a JSON data+options but I don't know the underlying logic of misskey so wasn't comfortable just putting it in on my real instance, however I am will to try it out if someone on the team could give me a JSON to put in)

All activities stalled on the queue look like this (it's different events but same type and data structure, from the same relay).

{
  "jobData": {
    "activity": {
      "@context": [
        "https://www.w3.org/ns/activitystreams",
        "https://w3id.org/security/v1"
      ],
      "actor": "https://relay.infosec.exchange/actor",
      "id": "https://relay.infosec.exchange/activity/f390340e-a6d0-4b5b-83c4-130e025d770f",
      "object": "https://e-komik.org/chapter/413697",
      "to": [
        "https://relay.infosec.exchange/followers"
      ],
      "type": "Announce"
    },
    "signature": {
      "scheme": "Signature",
      "params": {
        "keyId": "https://relay.infosec.exchange/actor#main-key",
        "algorithm": "hs2019",
        "headers": [
          "(request-target)",
          "accept",
          "content-type",
          "date",
          "digest",
          "host"
        ],
        "signature": "pYJrmyxIXqEFqpJl/BZEyiNKcESqvrLNCjoHC7nWYk2cs51sC4EsdVIkBKYtJOVSwjC2F6s9YGixUiYMQs1P2NN6dBXWZzCz6EcXqjg/4wa83/a4sXi41ZTe4Zg25FtzrP4nmKcSXWnpqC0Yj9M1+Jhp066KRl4g948rN6YRG8UbCeqfyBO+3GsszlUspMy2it4MpSYUZkiYDgltNcsjtiJ6fQvcI7wYzTYPJU/YM1FRgxWJRNHrYQbUvuhZhbFXUDQgsxxA1+ntwv6umu4sDZSmSI9Kz2rq+fdsQseOEeVO7OjPF/SmFQaVBPPw13WFH3BA0G35nUCh/ZRnYQ5h0TInKG7C0bkw16V1aN16MOscz4ut3x1g9nxP5A+PerGIXjza5fRKrPdzcTOcPe6yRaR+N61OBEZLqSy77Z1ektCj4rhHY+HqJQwJWvP1vuCry0Jhukrp4pZOhJXePOBUy2rGqRkdTy7nMjh+5HVMwOywxM/lRyHrsDVG+iG/SFK7esSibZ491mH32lF1w+6CW5UCDgjsU0Qe3BXslXEkWiSJW57U+5IEvhgcwfKQnhaOY9xInFYSIAbyENnDvCNQKQM7lr4eLSXZwHTFHyLe0juoW7ClXkr4IGsEw1XY1xBUPpq6PbMklmZi/jwSjuCfhR1Hzq/IzVqY4R0e3rTx8kM="
      },
      "signingString": "(request-target): post /inbox\naccept: application/activity+json\ncontent-type: application/activity+json\ndate: Thu, 19 Sep 2024 18:12:52 GMT\ndigest: SHA-256=IWErZ8d8jaeErrZ7ICJ/l+3lfdNDR6w2Z9hbakNtSLY=\nhost: mi.yumechi.jp",
      "algorithm": "HS2019",
      "keyId": "https://relay.infosec.exchange/actor#main-key"
    }
  },
  "returnValue": null
}

Bulls dashboard shows this error:

Error: quote resolve failed
    at ApNoteService.createNote (file:///misskey/packages/backend/built/core/activitypub/models/ApNoteService.js:239:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async ApNoteService.resolveNote (file:///misskey/packages/backend/built/core/activitypub/models/ApNoteService.js:324:20)
    at async ApNoteService.createNote (file:///misskey/packages/backend/built/core/activitypub/models/ApNoteService.js:197:40)
    at async ApNoteService.resolveNote (file:///misskey/packages/backend/built/core/activitypub/models/ApNoteService.js:324:20)
    at async ApInboxService.announceNote (file:///misskey/packages/backend/built/core/activitypub/ApInboxService.js:282:26)
    at async ApInboxService.announce (file:///misskey/packages/backend/built/core/activitypub/ApInboxService.js:261:36)
    at async ApInboxService.performOneActivity (file:///misskey/packages/backend/built/core/activitypub/ApInboxService.js:165:20)
    at async ApInboxService.performActivity (file:///misskey/packages/backend/built/core/activitypub/ApInboxService.js:134:22)
    at async InboxProcessorService.process (file:///misskey/packages/backend/built/queue/processors/InboxProcessorService.js:187:28)
    at async Worker.processJob (/misskey/node_modules/.pnpm/[email protected]/node_modules/bullmq/dist/cjs/classes/worker.js:445:28)
    at async Worker.retryIfFailed (/misskey/node_modules/.pnpm/[email protected]/node_modules/bullmq/dist/cjs/classes/worker.js:634:24)

  e: {
    stack: 'UnrecoverableError: skip: LD-Signatureの検証に失敗しました\n' +
      '    at InboxProcessorService.process (file:///misskey/packages/backend/built/queue/processors/InboxProcessorService.js:1
32:27)\n' +
      '    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n' +
      '    at async Worker.processJob (/misskey/node_modules/.pnpm/[email protected]/node_modules/bullmq/dist/cjs/classes/worker.js
:445:28)\n' +
      '    at async Worker.retryIfFailed (/misskey/node_modules/.pnpm/[email protected]/node_modules/bullmq/dist/cjs/classes/worker
.js:634:24)',
    message: 'skip: LD-Signatureの検証に失敗しました',
    name: 'UnrecoverableError'
  }
}

docker and docker-compose logs for some reason stops working as soon as it's stalled ... i hope the above is enough information.

💻 Frontend Environment

Not a frontend problem. Firefox and aria both show delayed messages. 

git describe=2024.8.0-2-g882c8b93c

My instance is https://mi.yumechi.jp/

Here's note featuring some screenshots: https://mi.yumechi.jp/notes/9ydpeizaxdq8000f https://mi.yumechi.jp/notes/9ydpeizaxdq8000f

🛰 Backend Environment (for server admin)

Deployed with official docker-compose file, redis:7-alpine, postgres:15-alpine.

git describe=2024.8.0-2-g882c8b93c

OS is synology DSM, kernel 4.4, x86_64

Do you want to address this bug yourself?

I don't have time looking at the source today but I am willing to submit a PR if it is within my capabilities.

Yes, I will patch the bug myself and send a pull request

The text was updated successfully, but these errors were encountered:

CHN-beta · 2024-09-23T15:06:32Z

Same issue.

turkeysanwich · 2024-09-23T19:17:34Z

I had the same issue, stuff from e-komik.org through a relay stalled the whole queue. blocking the relay didn't work but blocking e-komik.org seems to have cleared it up.

KisaragiEffective · 2024-09-23T19:25:13Z

permalink:

misskey/packages/backend/src/queue/processors/InboxProcessorService.ts

Line 142 in 882c8b9

    
           throw new Bull.UnrecoverableError('skip: LD-Signatureの検証に失敗しました');

eternal-flame-AD · 2024-09-23T19:39:22Z

Update: I read the source and could not find exactly where the issue was. However I see similar code paths to 'skip: LD-Signatureの検証に失敗しました' in today's log after I removed the offending relay and there was no stall. It seems to be something specific to the e-komik event.

Maybe add a hard timeout to the jobs, I think if one job takes more than 1 minute most instance owners would rather not receive it? ref: https://docs.bullmq.io/patterns/timeout-jobs

eternal-flame-AD added the ⚠️bug? This might be a bug label Sep 20, 2024

eternal-flame-AD changed the title ~~Input queue stalled by certain relay messages~~ Inbox queue stalled by certain relay messages Sep 20, 2024

KisaragiEffective added packages/backend Server side specific issue/PR 🌌Federation The Federation/ActivityPub feature labels Sep 23, 2024

KisaragiEffective changed the title ~~Inbox queue stalled by certain relay messages~~ Failing verification of LD-Signature does not remove it from inbox Sep 23, 2024

samunohito added this to [実験中] 管理用 Nov 27, 2024

github-project-automation bot moved this to Todo in [実験中] 管理用 Nov 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failing verification of LD-Signature does not remove it from inbox #14587

Failing verification of LD-Signature does not remove it from inbox #14587

eternal-flame-AD commented Sep 20, 2024 •

edited

Loading

CHN-beta commented Sep 23, 2024

turkeysanwich commented Sep 23, 2024

KisaragiEffective commented Sep 23, 2024

eternal-flame-AD commented Sep 23, 2024 •

edited

Loading

Failing verification of LD-Signature does not remove it from inbox #14587

Failing verification of LD-Signature does not remove it from inbox #14587

Comments

eternal-flame-AD commented Sep 20, 2024 • edited Loading

💡 Summary

🥰 Expected Behavior

🤬 Actual Behavior

📝 Steps to Reproduce

💻 Frontend Environment

🛰 Backend Environment (for server admin)

Do you want to address this bug yourself?

CHN-beta commented Sep 23, 2024

turkeysanwich commented Sep 23, 2024

KisaragiEffective commented Sep 23, 2024

eternal-flame-AD commented Sep 23, 2024 • edited Loading

eternal-flame-AD commented Sep 20, 2024 •

edited

Loading

eternal-flame-AD commented Sep 23, 2024 •

edited

Loading