This repository has been archived by the owner on Sep 18, 2021. It is now read-only.

Broken Conformance: URLs with unicode chars in them #104

Open
yaauie opened this issue Dec 19, 2013 · 1 comment

Comments

@yaauie
Contributor

yaauie commented Dec 19, 2013

A conformance spec is currently broken on master.

  1) Failure:
test_urls Autolink URLs with unicode chars in them(ConformanceTest) [test/conformance_test.rb:126]:
<"See: <a href=\"http://example.com/tsa-pre✓™\">http://example.com/tsa-pre✓™</a> is a link"> expected but was
<"See: <a href=\"http://example.com/tsa-pre\">http://example.com/tsa-pre</a>✓™ is a link">

In this case, the Unicode characters are not being included in the matched URL, although the spec expects them to be.
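The same behaviour can be reproduced in isolation with a minimal sketch. The pattern `ASCII_URL` and the helper `autolink_ascii` below are hypothetical names, and the regex is an ASCII-only stand-in rather than the pattern the library actually uses; it only illustrates how a path character class that lacks the relevant codepoints ends the match just before `✓™`.

```ruby
# Hypothetical ASCII-only URL pattern (an assumption for illustration,
# not the library's real regular expression).
ASCII_URL = %r{https?://[a-z0-9.-]+(?:/[a-z0-9_\-./]*)?}i

# Wrap every match of ASCII_URL in an anchor tag.
def autolink_ascii(text)
  text.gsub(ASCII_URL) { |url| %(<a href="#{url}">#{url}</a>) }
end

# The match stops at the first character outside the path class,
# so the trailing symbols end up outside the link.
puts autolink_ascii("See: http://example.com/tsa-pre✓™ is a link")
```

With this stand-in pattern, the symbols fall outside the closing `</a>`, which mirrors the "but was" half of the failure above.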

I believe @psychs addressed the rationale behind what should and shouldn't match in #91, but I don't think the spec is clear enough about which Unicode codepoint ranges should be considered part of the URL and which shouldn't.

A solution to this issue would be to fix the spec; alternatively, I can take on the task of fixing it given documentation on what codepoint ranges should be considered part of the URL.

@jakl
Contributor

jakl commented Dec 20, 2013

Our international team will be looking more deeply into this after the holidays. Offhand, I'm not sure how to pick the best Unicode ranges without researching all the supported languages. A safe immediate fix for this test is to link non-language characters like ✓™.
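
One possible reading of that immediate fix is to allow symbol characters in the URL path. The sketch below assumes that `✓` (U+2713) and `™` (U+2122) can be covered by the Unicode general category "Symbol, other" (`\p{So}`). The names `SYMBOL_URL` and `autolink_with_symbols` are hypothetical, and the choice of category is an illustrative assumption, not a range anyone in this thread has settled on.

```ruby
# Hypothetical URL pattern whose path class also accepts characters in the
# Unicode general category "Symbol, other" (an assumption for illustration).
SYMBOL_URL = %r{https?://[a-z0-9.-]+(?:/[a-z0-9_\-./\p{So}]*)?}i

# Wrap every match of SYMBOL_URL in an anchor tag.
def autolink_with_symbols(text)
  text.gsub(SYMBOL_URL) { |url| %(<a href="#{url}">#{url}</a>) }
end

# The symbols are now part of the path class, so they stay inside the link.
puts autolink_with_symbols("See: http://example.com/tsa-pre✓™ is a link")
```

A category-based class like this is broad: it would also pull other symbols into links, which is part of why the choice of ranges needs the research mentioned above.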
