Sending a mail in charsets other than Latin #413

Open
Mihara opened this issue Jun 13, 2023 · 7 comments

Mihara commented Jun 13, 2023

When I try to use the HTTP interface to compose a message using Cyrillic characters, the result is obviously useless:

Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=ISO-8859-1
....
Type: Private

?????????

When attempting to compose an email from the command line, the result is the same. I don't see a way to set a different content-type anywhere, but the presence of a content-type kind of implies that other charsets should be possible.

Winlink documentation says nothing regarding charsets in B2F, and incoming messages contain no charset information, so I'm not sure they are. Messages arriving from the Internet seem to assume ISO-8859-1 and likewise mangle Cyrillic. But if so, why have a content-type header at all?

Is this a limitation of Pat, or a limitation of the Winlink system in general? If this is a general limitation of Winlink, maybe Pat should prevent the user from inputting characters that will be mangled on saving, or display an error...

martinhpedersen commented Jun 13, 2023

Hi @Mihara,

According to the Open B2F (aka Winlink) protocol, "The body of the message is limited to ASCII characters ...". However, I've found that Winlink Express and the CMS' SMTP bridge encode and decode the body as ISO-8859-1. Since we're trying to be fully compatible with the Winlink system, Pat also uses ISO-8859-1.

Winlink Express and the CMS' SMTP bridge ignore the Content-Type header. It was added to Pat's underlying B2F implementation in an attempt to improve this situation. Pat will decode according to the Content-Type (as per standard MIME), so in theory we could open up to different charsets and still be backwards compatible with older Pat versions 🙂
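To illustrate what "decode according to the Content-Type" can look like, here is a minimal Python sketch. It is not Pat's code (Pat is written in Go); the function name `decode_body` and the fallback to ISO-8859-1 when no charset is declared are assumptions made for this example.

```python
from email.message import Message


def decode_body(content_type: str, raw: bytes) -> str:
    """Decode body bytes using the charset named in a Content-Type value.

    Falls back to ISO-8859-1 when the header is missing, carries no
    charset parameter, or names a charset Python does not know.
    """
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or "iso-8859-1"
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: treat the body the way Winlink clients do.
        return raw.decode("iso-8859-1")


utf8_text = decode_body("text/plain; charset=UTF-8", "Кириллица".encode("utf-8"))
latin1_text = decode_body("text/plain", b"caf\xe9")
```

A client that ignores the header would decode both bodies as ISO-8859-1, which is why the first one would come out garbled there.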

So yes, we can add an option allowing the user to set a different charset. However, this will not work well with other Winlink clients until they also adopt this.

We can certainly look into providing a warning if we detect Unicode characters not valid in ISO-8859-1.
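A check of the kind described above could be sketched as follows. This is an illustrative Python snippet, not Pat's implementation; the helper name `unrepresentable` and the sample strings are made up for the example.

```python
def unrepresentable(text: str, charset: str = "iso-8859-1") -> list[str]:
    """Return the distinct characters in text that charset cannot encode."""
    bad: list[str] = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


ok = unrepresentable("Grüße from the shack")   # Latin-1 covers ü and ß
problem = unrepresentable("Привет")            # Cyrillic is not in Latin-1

# What a lossy encoder does without such a check: every Cyrillic letter
# is replaced by "?".
mangled = "Кириллица".encode("iso-8859-1", errors="replace")
```

A composer could run such a check before saving and warn the user when the returned list is not empty.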

@martinhpedersen
Member

In my opinion, the best solution would be to get the Winlink ecosystem to migrate to UTF-8 (and set the Content-Type header).

Mihara commented Jun 13, 2023

> In my opinion, the best solution would be to get the Winlink ecosystem to migrate to UTF-8 (and set the Content-Type header).

Absolutely. But in my experience, this sort of problem is rarely, if ever, even acknowledged in amateur radio software. (Apparently, the huge number of Japanese amateurs doesn't do digital beyond FT8, or if they do, they don't let anyone join.) With something as extensive as Winlink, which seems to be running on Windows XP in the wild as often as not, chances of this happening aren't very high.

A warning if characters won't fit into ISO-8859-1 should be a sensible palliative measure until something can be done about Winlink at large. I'd write it myself and send in a PR, but my skill with golang is currently minimal, bordering on nonexistent.

@martinhpedersen
Member

> Absolutely. But in my experience, this sort of problem is rarely, if ever, even acknowledged in amateur radio software

Yes, I know 😞

I would love to add opt-in UTF-8 encoding, but given that it will only work P2P between two Pat users, I'm afraid it will cause more confusion than good.

A third alternative is to "Q" encode characters that are not representable in ISO-8859-1. Then we would still be compliant with ISO-8859-1, but the characters would be readable in supported clients (i.e. Pat). If we're in luck, they might be decoded properly when reaching regular email clients via SMTP. In fact, this is what we do with non-ASCII characters in the subject line. I think that would be more useful than the ? replacement character we output today.

We should still provide the warning though, as Winlink Express users will have a hard time manually deciphering them.
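As a rough illustration of the "Q" encoding idea, the Python sketch below wraps only the runs of characters outside ISO-8859-1 in RFC 2047 style encoded-words and leaves everything else untouched. This is an assumption about how such a scheme might look, not Pat's implementation; the names `q_encode_body` and `q_decode_body` are hypothetical.

```python
import re
from email.header import decode_header
from email.quoprimime import header_encode

# Runs of characters that do not fit in ISO-8859-1.
NON_LATIN1 = re.compile(r"[^\x00-\xff]+")
# A single Q-encoded word such as =?utf-8?q?=D0=9A?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[qQ]\?[^?\s]*\?=")


def q_encode_body(text: str) -> str:
    """Replace each non-Latin-1 run with a UTF-8 Q-encoded word."""
    return NON_LATIN1.sub(
        lambda m: header_encode(m.group(0).encode("utf-8"), charset="utf-8"),
        text,
    )


def q_decode_body(text: str) -> str:
    """Turn Q-encoded words back into the characters they stand for."""
    def decode(match: re.Match) -> str:
        raw, charset = decode_header(match.group(0))[0]
        return raw.decode(charset)
    return ENCODED_WORD.sub(decode, text)


original = "Привет, Pat! Grüße"
wire = q_encode_body(original)          # now fits in ISO-8859-1
wire_bytes = wire.encode("iso-8859-1")  # no "?" substitution needed
restored = q_decode_body(wire)
```

Two caveats for this sketch: a strict RFC 2047 decoder drops whitespace between adjacent encoded-words, and encoded-words have a length limit, so a real implementation would need to group and split runs with more care.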

Mihara commented Jun 13, 2023

> A third alternative is to "Q" encode characters that are not representable in ISO-8859-1.

I'm not entirely sure this is worth the trouble, if only because a response message sent from the Internet side will not be Q-encoded in turn, which will further confuse the users who don't even know Winlink is involved. 😅

Mihara commented Jun 13, 2023

For reference: out of curiosity, I tried to make WoAD send a message containing Cyrillic to the Winlink test message reflector. The result was rather curious.

Date: Tue, 13 Jun 2023 23:53:00 +0300 (GMT+03:00)
From: SERVICE
To: R2AZE
Message-ID: VN5YOW0SMBGX
Subject: Test Message
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

The following message was received by the Winlink test message reflector...

"Subject: Wait, what does WoAD do about charsets?
Message ID: A10L9Q84GLXI
Date: 2023/06/13 20:51
From: R2AZE
To: TEST=20
Source: R2AZE
CMS Site: CMS-B

=C3=90=C2=9A=C3=90=C2=B8=C3=91=C2=80=C3=90=C2=B8=C3=90=C2=BB=C3=90=C2=BB=C3=
=90=C2=B8=C3=91=C2=86=C3=90=C2=B0!"

Now, some of that is the result of WoAD exporting this to a .eml file, where encoding headers would be required. The interesting part is that the UTF-8 encoded Cyrillic was sent as a Q-encoding of individual single bytes, and was then encoded again on export. :)
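The double encoding can be undone by hand. The Python sketch below assumes one particular reading of the export above: the original UTF-8 bytes were each treated as an ISO-8859-1 character, then the result was encoded as UTF-8 again and wrapped in quoted-printable. The function name `undo_double_encoding` is made up for this example.

```python
import quopri

# The quoted-printable body lines from the exported message above.
exported = (
    b"=C3=90=C2=9A=C3=90=C2=B8=C3=91=C2=80=C3=90=C2=B8=C3=90=C2=BB=C3=90=C2=BB=C3=\n"
    b"=90=C2=B8=C3=91=C2=86=C3=90=C2=B0!"
)


def undo_double_encoding(qp_body: bytes) -> str:
    """Reverse quoted-printable, then reverse the extra UTF-8 layer."""
    # Step 1: quoted-printable -> bytes -> text, per the export's headers.
    once = quopri.decodestring(qp_body).decode("utf-8")
    # Step 2: each character is really one byte of the original UTF-8.
    return once.encode("iso-8859-1").decode("utf-8")


recovered = undo_double_encoding(exported)
print(recovered)  # prints "Кириллица!"
```

That the second step succeeds at all suggests the Cyrillic text left WoAD as valid UTF-8 and was only re-wrapped afterwards.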

Mihara commented Jun 14, 2023

Further reference information: https://winlink.userecho.com/communities/1/topics/94-rms-express-decoding-utf-8-charset

> RMS Express used to support unicode (UTF-8) in the email body part. This feature go lost somehow. It is very embarrasing to receive email replys stating "what are the many ? (question marks) in your text for? Please give again UTF-8 character support in message body parts. Otherwise RMS Express will NOT be international.
>
> Thank you
> Gert, OE3ZK

Dated 12 years ago...

ad1217 added a commit to ad1217/winlink-parser that referenced this issue Dec 16, 2024
RMS Express defaults to this encoding, and pat sends content-type. See
la5nta/pat#413 (comment) for
more details.