Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deliverability problems with emails coming from microsoft [+fix/workaround in v0.0.13] #237

Open
mjl- opened this issue Nov 6, 2024 · 8 comments

Comments

@mjl-
Copy link
Owner

mjl- commented Nov 6, 2024

Incoming delivery attempts from microsoft are aborted (by them) during SMTP STARTTLS.
Current workaround is disabling session tickets in our TLS config. Commit upcoming, more details will follow.

mjl- added a commit that referenced this issue Nov 6, 2024
@mjl-
Copy link
Owner Author

mjl- commented Nov 6, 2024

See https://www.xmox.nl/b/#22c8911bf3f768931d93f599b0eb03882d1c78e3 for a binary with the workaround.

@mjl-
Copy link
Owner Author

mjl- commented Nov 6, 2024

Thanks to @mdavids for reporting.

Incoming deliveries were aborted by microsoft's smtp client during starttls. I have an @outlook.com account and an office365 account, and testing showed messages did not come through. I am not getting DSNs or have access to error logs, but mdavids got an error "451 4.4.0 Security status Renegotiate". An initial hunch that renegotiation was in play (which the Go TLS stack doesn't implement) didn't lead to a solution. The connections turned out to be TLS1.3, which doesn't support renegotiation at all.

After some added TLS debugging and testing, the workaround is to disable session tickets in the Go TLS config for the server. I started receiving TLS reports from microsoft with "validation failure" errors (no further details) on Oct 24. They were about MTA-STS, which I checked and couldn't find any issues. (One limitation of TLS reporting is that there's no way to classify an error as just "TLS error". It can only be classified as a TLSA (DANE) error or an MTA-STS error, which can put you on the wrong track when analyzing problems). Anyway, after looking at the TLS messages being exchanged, I noticed the "session ticket message" the Go TLS library sends immediately after its "server finished" message. The TLS1.3 RFC seems to say that such messages can only be sent after having read a "client finished" message, https://datatracker.ietf.org/doc/html/rfc8446#section-4.6.1. Maybe Microsoft recently updated its TLS stack to be more strict, rejecting connections with messages coming in at a moment when not allowed. Probably something more is going on, I would expect the impact to be larger and more well-known.

@mjl- mjl- changed the title Deliverability problems with emails coming from microsoft Deliverability problems with emails coming from microsoft [+fix/workaround] Nov 6, 2024
@mjl-
Copy link
Owner Author

mjl- commented Nov 6, 2024

I have TLS reports without failures from Microsoft on Oct 23, so that points to changes on 24th.
Support for incoming DANE was announced at Oct 28: https://techcommunity.microsoft.com/blog/exchange/announcing-general-availability-of-inbound-smtp-dane-with-dnssec-for-exchange-on/4281292. It's for their incoming, not outgoing email. But any update may have come with more tls stack changes.

@mjl-
Copy link
Owner Author

mjl- commented Nov 6, 2024

For completeness: The tcp connection is aborted by remote during the TLS handshake, after mox has written the "server finished" and "sessionticket" messages. Remote doesn't write any bytes in response (no tls alert or client finished message).

@mjl-
Copy link
Owner Author

mjl- commented Nov 6, 2024

And TLS reports.
Still good on Oct 23:

{
	"organization-name": "Microsoft Corporation",
	"date-range": {
		"start-datetime": "2024-10-23T00:00:00Z",
		"end-datetime": "2024-10-23T23:59:59Z"
	},
	"contact-info": "[email protected]",
	"report-id": "133742711803146449+ueber.net",
	"policies": [
		{
			"policy": {
				"policy-type": "tlsa",
				"policy-string": [
					"[\"3 1 1 5C046FF012891B5F0D6176024C5C25FF486A7C12B8000FDF8B418AB3ECF6D309\",\"3 1 1 CEC87FB33D2A7499CA78E824E59B77531AC1FDEC7378FC81FCE7E5D213A364AB\"]"
				],
				"policy-domain": "ueber.net"
			},
			"summary": {
				"total-successful-session-count": 7,
				"total-failure-session-count": 0
			}
		},
		{
			"policy": {
				"policy-type": "sts",
				"policy-string": [
					"version: STSv1",
					"mode: enforce",
					"mx: mail.axillis.nl",
					"max_age: 604800"
				],
				"policy-domain": "ueber.net"
			},
			"summary": {
				"total-successful-session-count": 7,
				"total-failure-session-count": 0
			}
		}
	]
}

Errors on Oct 24:

{
	"organization-name": "Microsoft Corporation",
	"date-range": {
		"start-datetime": "2024-10-24T00:00:00Z",
		"end-datetime": "2024-10-24T23:59:59Z"
	},
	"contact-info": "[email protected]",
	"report-id": "133743605086584199+ueber.net",
	"policies": [
		{
			"policy": {
				"policy-type": "tlsa",
				"policy-string": [
					"[\"3 1 1 5C046FF012891B5F0D6176024C5C25FF486A7C12B8000FDF8B418AB3ECF6D309\",\"3 1 1 CEC87FB33D2A7499CA78E824E59B77531AC1FDEC7378FC81FCE7E5D213A364AB\"]"
				],
				"policy-domain": "ueber.net"
			},
			"summary": {
				"total-successful-session-count": 2,
				"total-failure-session-count": 0
			}
		},
		{
			"policy": {
				"policy-type": "sts",
				"policy-string": [
					"version: STSv1",
					"mode: enforce",
					"mx: mail.axillis.nl",
					"max_age: 604800"
				],
				"policy-domain": "ueber.net"
			},
			"summary": {
				"total-successful-session-count": 2,
				"total-failure-session-count": 6
			},
			"failure-details": [
				{
					"result-type": "validation-failure",
					"receiving-mx-hostname": "mail.axillis.nl",
					"receiving-ip": "2a02:2770:12:0:21a:4aff:fee0:fa47",
					"failed-session-count": 3,
					"failure-reason-code": "VALIDATION_FAILURE"
				},
				{
					"result-type": "validation-failure",
					"receiving-mx-hostname": "mail.axillis.nl",
					"receiving-ip": "84.22.96.237",
					"failed-session-count": 3,
					"failure-reason-code": "VALIDATION_FAILURE"
				}
			]
		}
	]
}

And the logging of a failing session:

Nov 01 13:19:56 mail mox[260590]: l=info m="new connection" pkg=smtpserver remote=52.101.43.122:46292 local=84.22.96.237:25 submission=false tls=false listener=public cid=192618eb71c delta="124.654µs"
Nov 01 13:19:56 mail mox[260590]: l=trace m="LS: 220 mail.axillis.nl ESMTP mox v0.0.12\r\n" pkg=smtpserver cid=192618eb71c delta="49.665µs"
Nov 01 13:19:56 mail mox[260590]: l=trace m="RC: EHLO SJ2PR03CU001.outbound.protection.outlook.com\r\n" pkg=smtpserver cid=192618eb71c delta=144.37905ms
Nov 01 13:19:56 mail mox[260590]: l=debug m="smtp command result" pkg=smtpserver kind=smtp cmd=ehlo code=250 ecode= duration="39.779µs" cid=192618eb71c delta="120.851µs"
Nov 01 13:19:56 mail mox[260590]: l=trace m="LS: 250-mail.axillis.nl\r\n250-PIPELINING\r\n250-SIZE 104857600\r\n250-STARTTLS\r\n250-ENHANCEDSTATUSCODES\r\n250-8BITMIME\r\n250-LIMITS RCPTMAX=1000\r\n250 SMTPUTF8\r\n" pkg=smtpserver cid=192618eb71c delta="26.945µs"
Nov 01 13:19:56 mail mox[260590]: l=trace m="RC: STARTTLS\r\n" pkg=smtpserver cid=192618eb71c delta=144.395457ms
Nov 01 13:19:56 mail mox[260590]: l=debug m="smtp command result" pkg=smtpserver kind=smtp cmd=starttls code=220 ecode=2.0.0 duration="515.3µs" cid=192618eb71c delta="625.824µs"
Nov 01 13:19:56 mail mox[260590]: l=trace m="LS: 220 2.0.0 go! (H0zC1Oe0Fiok3krEZ63f3w)\r\n" pkg=smtpserver cid=192618eb71c delta="23.203µs"
Nov 01 13:19:56 mail mox[260590]: l=debug m="starting tls server handshake" pkg=smtpserver cid=192618eb71c delta="551.909µs"
Nov 01 13:19:57 mail mox[260590]: l=info m="connection closed" err="starttls handshake: EOF (io error)" pkg=smtpserver cid=192618eb71c delta=295.626141ms

@mjl-
Copy link
Owner Author

mjl- commented Nov 6, 2024

See golang/go#70232

@mjl- mjl- changed the title Deliverability problems with emails coming from microsoft [+fix/workaround] Deliverability problems with emails coming from microsoft [+fix/workaround in v0.0.13] Nov 6, 2024
@mjl-
Copy link
Owner Author

mjl- commented Nov 6, 2024

The fix is in mox release v0.0.13, released just now.

@mdavids
Copy link

mdavids commented Nov 7, 2024

For the record: https://list.mailop.org/private/mailop/2024-November/029764.html

mjl- added a commit that referenced this issue Nov 29, 2024
… incoming deliveries

and add an alerting rule if the failure rate becomes >10% (e.g. expired
certificate).

the prometheus metrics includes a reason, including potential tls alerts, if
remote smtp clients would send those (openssl s_client -starttls does).

inspired by issue #237, where incoming connections were aborted by remote. such
errors would show up as "eof" in the metrics.
mjl- added a commit that referenced this issue Dec 6, 2024
…ssion tickets

the field is optional. if absent, the default behaviour is currently to disable
session tickets. users can set the option if they want to try if delivery from
microsoft is working again. in a  future version, we can switch the default to
enabling session tickets.

the previous fix was to disable session tickets for all tls connections,
including https. that was a bit much.

for issue #237
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants