Character references in autolinks #727
Comments
Ah, I filed an issue about exactly the same problem in commonmark/commonmark.js#263. So it seems that the intention is to support character references inside autolinks. Maybe we can add an example to the spec with a character reference in an autolink?
I’m pretty strongly in the camp that character references should not work in autolinks. I don’t think there should be one edge case where backslashes don’t work but character references do.
I think the motivation was that autolinks can be URLs that you just copy from some other source, and these might contain character references.
I’m not sure about that reasoning: they might just as well be plain Unicode, particularly when coming from an address bar. I could see problems with double decoding.
On motivation: do you mean cmark is more in line with your motivation? That the absence in cmjs was because it was forgotten? That no test for it in the spec was intended? What do you think about the test on character escapes but no test of character references?
Yes, in the linked issue, I said I thought that cmark was getting it right.
I see why it would be nice if entities got resolved in exactly the places backslash escapes do -- but again, this is motivated by a desire to support URL copy-pasting.
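To make the two readings in this thread concrete, here is a minimal Python sketch, not taken from cmark or commonmark.js. The function name `render_autolink`, the `decode_refs` flag, and the `SAFE` character set are all hypothetical, and `html.unescape` is more lenient than CommonMark's definition of a character reference, so treat it only as an approximation of either behavior.

```python
import html
from urllib.parse import quote

# Characters left unescaped in the href; a hypothetical choice for this sketch.
SAFE = ";/?:@&=+$,-_.!~*'()#%"


def render_autolink(inner: str, decode_refs: bool) -> str:
    """Render the text between < and > of a URI autolink as an <a> element.

    decode_refs=True models "character references work in autolinks";
    decode_refs=False models "the autolink content is taken literally".
    """
    # Optionally resolve character references such as &amp; before rendering.
    dest = html.unescape(inner) if decode_refs else inner
    # Percent-encode unsafe characters, then HTML-escape for the attribute.
    href = html.escape(quote(dest, safe=SAFE), quote=True)
    # The link text is the same destination, HTML-escaped.
    text = html.escape(dest, quote=True)
    return f'<a href="{href}">{text}</a>'


pasted = "https://example.com/?a=1&amp;b=2"  # e.g. copied from HTML source
print(render_autolink(pasted, decode_refs=True))
print(render_autolink(pasted, decode_refs=False))
```

With `decode_refs=True` the `&amp;` collapses to a single `&` before output escaping; with `decode_refs=False` the literal `&amp;` is escaped again and ends up as `&amp;amp;` in the HTML. Which of the two is right depends on where the URL was copied from, which is the disagreement above.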
Consistency with character escapes is most important to me.
a <https://example.com>
b <https://example.com>
c <https://example.com>
d <https://example.com>
e <some.[email protected]>
f <some.user@example.com>
Note that C and D are not allowed per CommonMark as the protocol (part before and including
@jgm IMO there is an equally valid argument against character references if we are talking about copy-pasting: one could also copy-paste from a place that doesn't interpret character references, like the browser's URL bar, or a displayed webpage (as opposed to the HTML source).
@xiaq - granted.
Granting that there are these two possible sources for copy/paste, I think my reasoning was that if a valid character reference occurs in a copied URL, it's by far likeliest that its source is raw HTML rather than the browser's URL bar or a displayed web page. How often does one want to display something like
I mostly care about consistency, so then I’d also ask: how often does one want to display something like
But thinking some more about this, while the motivation of “allow copy/paste” is a good one, to get there I believe we should then also allow Unicode letters/punctuation in email atext, and Unicode letters + at likely
The spec doesn't specify whether character references are supported inside autolinks. The following Markdown:
is rendered as the following by cmark:
but as the following by commonmark.js: