Unicode and invisible characters #409

Open
AlexRMU opened this issue Feb 29, 2024 · 2 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@AlexRMU

AlexRMU commented Feb 29, 2024

Examples

When working with Unicode (that is, almost any text), there are many pitfalls to keep in mind. Here are a few of them:

  • Invisible characters
    Invisible characters can behave differently on different devices, browsers, and fonts. Although they usually show nothing, they are still part of the string and can still take up space.

    "឴" != "";
    "_឴_" != "__";

    [Screenshot: how VS Code highlights these characters]

  • Combining characters and "cursed" strings
    How combining characters are displayed depends on many factors, such as the font and the rendering engine. They often render strangely and can break the layout and styles of the interface.

    [Screenshot: how they are currently displayed in the editor]

    [Screenshot: how they are displayed in VS Code]

  • Surrogate pairs and normalization
    https://en.wikipedia.org/wiki/Unicode_equivalence
    https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize

    "_._".normalize(); // "_._"
    "_._".normalize("NFC"); // "_._"
    "_._".normalize("NFD"); // "_._"
    "_._".normalize("NFKC"); // "_._"
    "_._".normalize("NFKD"); // "_._"
    
    const name1 = "\u0041\u006d\u00e9\u006c\u0069\u0065";       // "Amélie", é as one precomposed code point (U+00E9)
    const name2 = "\u0041\u006d\u0065\u0301\u006c\u0069\u0065"; // "Amélie", e followed by U+0301 COMBINING ACUTE ACCENT
    name1 !== name2;               // true, although both render as "Amélie"
    name1.length !== name2.length; // true: 6 vs 7 code units

    const name1NFC = name1.normalize("NFC");
    const name2NFC = name2.normalize("NFC");
    name1NFC === name2NFC;               // true
    name1NFC.length === name2NFC.length; // true: both are 6

    [Screenshot: the strings before and after formatting]
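For the combining-character ("cursed" or Zalgo) sample, one possible mitigation is to cap how many consecutive combining marks are displayed. This is only a sketch: `limitCombiningMarks` and `hasCursedRun` are hypothetical helpers, and the limit of 3 marks is an arbitrary assumption. Some scripts legitimately stack several marks on one base character, so a real limit would need tuning.

```javascript
// Sketch: cap runs of combining marks ("Zalgo" text) for display purposes.
// MAX_MARKS = 3 is an assumed, arbitrary limit.
const MAX_MARKS = 3;
const COMBINING_MARK = /\p{M}/u; // general category M: Mn, Mc, Me

// Returns a copy of `text` keeping at most `maxMarks` consecutive combining marks.
function limitCombiningMarks(text, maxMarks = MAX_MARKS) {
  let result = "";
  let run = 0; // length of the current run of consecutive marks
  for (const ch of text) { // iterate by code point
    if (COMBINING_MARK.test(ch)) {
      run += 1;
      if (run > maxMarks) continue; // drop marks beyond the limit
    } else {
      run = 0;
    }
    result += ch;
  }
  return result;
}

// True if `text` contains more than `maxMarks` consecutive combining marks.
function hasCursedRun(text, maxMarks = MAX_MARKS) {
  return new RegExp(`\\p{M}{${maxMarks + 1},}`, "u").test(text);
}

const cursed = "Z" + "\u0300\u0301\u0302\u0303\u0304\u0305"; // Z + 6 marks
console.log(hasCursedRun(cursed));                                   // true
console.log([...limitCombiningMarks(cursed)].length);                // 4: "Z" plus 3 marks
console.log(limitCombiningMarks("Ame\u0301lie") === "Ame\u0301lie"); // true: a single mark is kept
```

This would only make sense for what is rendered; the stored value should stay unchanged.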

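The surrogate-pair and normalization sample can be made concrete with a small diagnostic. The hypothetical `inspectString` helper below reports the number of UTF-16 code units (what `.length` counts), the number of code points (what string iteration yields), the number of combining marks, whether the string is already in NFC, and whether it contains an unpaired surrogate, which is what appears when a surrogate pair is split by code-unit slicing.

```javascript
// Hypothetical diagnostic helper, not part of any editor API.
const COMBINING_MARK = /\p{M}/u;
// A high surrogate not followed by a low one, or a low one not preceded by a high one.
const LONE_SURROGATE = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

function inspectString(text) {
  const codePoints = [...text]; // string iteration keeps surrogate pairs together
  return {
    codeUnits: text.length, // what .length counts: UTF-16 code units
    codePoints: codePoints.length,
    combiningMarks: codePoints.filter((ch) => COMBINING_MARK.test(ch)).length,
    isNFC: text === text.normalize("NFC"),
    hasLoneSurrogate: LONE_SURROGATE.test(text),
  };
}

const amelie = "\u0041\u006d\u0065\u0301\u006c\u0069\u0065"; // "Amélie", decomposed
const smile = "\u{1F600}"; // U+1F600 GRINNING FACE, stored as the surrogate pair D83D DE00

console.log(inspectString(amelie)); // codeUnits 7, codePoints 7, combiningMarks 1, isNFC false
console.log(inspectString(smile));  // codeUnits 2, codePoints 1, combiningMarks 0, isNFC true
console.log(inspectString(smile.slice(0, 1)).hasLoneSurrogate); // true: slicing by code units split the pair
```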

Everything seems to be fine with this in the editor now.
I suggest:

  • highlight invisible characters
  • automatically normalize and decode all strings when pasting or formatting
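The normalization part of the second suggestion could look roughly like this, assuming the editor content is a JSON document and NFC is the chosen form (both are assumptions). `normalizeJsonStrings` and `normalizePastedJson` are hypothetical helpers built on the standard `String.prototype.normalize`; the "decode" part of the suggestion is not covered here.

```javascript
// Sketch: normalize every string (object keys and string values) in a JSON value.
// NFC is an assumed default.
function normalizeJsonStrings(value, form = "NFC") {
  if (typeof value === "string") {
    return value.normalize(form);
  }
  if (Array.isArray(value)) {
    return value.map((item) => normalizeJsonStrings(item, form));
  }
  if (value !== null && typeof value === "object") {
    // Object.fromEntries defines own properties, so a "__proto__" key stays a plain key.
    // If two keys become equal after normalization, the later one wins.
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [key.normalize(form), normalizeJsonStrings(item, form)])
    );
  }
  return value; // numbers, booleans and null are unchanged
}

// Hypothetical paste hook: parse, normalize, and re-serialize pasted JSON text.
function normalizePastedJson(text, form = "NFC") {
  return JSON.stringify(normalizeJsonStrings(JSON.parse(text), form), null, 2);
}

const pasted = '{"name": "Ame\\u0301lie"}'; // JSON text containing a decomposed é
const normalized = JSON.parse(normalizePastedJson(pasted));
console.log(normalized.name === "Am\u00e9lie"); // true
console.log(normalized.name.length);            // 6
```

Two caveats on the design: NFKC and NFKD also fold compatibility characters (for example the ligature U+FB01 becomes "fi"), which changes the text more aggressively than NFC; and normalizing keys can make two previously distinct keys collide.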
@josdejong
Owner

Would be nice indeed to render invisible characters in a visual way.

Anyone interested in looking into this? Help would be welcome.
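One way to render invisible characters visibly is to find them and show a readable escape in their place. Which characters count as "invisible" is a judgment call; this sketch assumes Unicode format characters (general category Cf) plus a short, hand-picked list of code points that typically render as blank, such as U+17B4 KHMER VOWEL INHERENT AQ and U+3164 HANGUL FILLER. `findInvisibleChars` and `showInvisibleChars` are hypothetical names.

```javascript
// Heuristic (assumption): format characters (Cf) plus a few code points that
// typically render as blank; the Hangul fillers, for example, are not in Cf.
const FORMAT_CHAR = /\p{Cf}/u;
const EXTRA_INVISIBLE = new Set([
  0x115f, // HANGUL CHOSEONG FILLER
  0x1160, // HANGUL JUNGSEONG FILLER
  0x17b4, // KHMER VOWEL INHERENT AQ
  0x17b5, // KHMER VOWEL INHERENT AA
  0x3164, // HANGUL FILLER
]);

function isInvisible(ch) {
  return FORMAT_CHAR.test(ch) || EXTRA_INVISIBLE.has(ch.codePointAt(0));
}

// Returns { index, codePoint } for each invisible character; `index` is a
// UTF-16 offset, so it can be used directly with String.prototype.slice.
function findInvisibleChars(text) {
  const found = [];
  let index = 0;
  for (const ch of text) { // iterate by code point
    if (isInvisible(ch)) {
      found.push({ index, codePoint: ch.codePointAt(0) });
    }
    index += ch.length; // 1 or 2 code units
  }
  return found;
}

// Replaces each invisible character with a visible escape such as \u200B.
// Intended for display only; the stored value is left untouched.
function showInvisibleChars(text) {
  let out = "";
  for (const ch of text) {
    if (isInvisible(ch)) {
      const hex = ch.codePointAt(0).toString(16).toUpperCase();
      out += ch.length === 1 ? "\\u" + hex.padStart(4, "0") : "\\u{" + hex + "}";
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(showInvisibleChars("_\u17B4_")); // _\u17B4_
console.log(findInvisibleChars("a\u200Bb")); // one entry: index 1, codePoint 0x200B
```

Note that Cf also contains U+200D ZERO WIDTH JOINER, which is part of many emoji sequences; escaping it unconditionally would break how those emoji display, so a real implementation would probably need an exception for them.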

josdejong added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Feb 29, 2024
josdejong changed the title from "Unicode" to "Unicode and invisible characters" Feb 29, 2024