Hashtag contains linked punctuation #1518
Comments
Did you add a space between your intended hashtag and the subsequent characters? |
No, but hashtags should never include punctuation. I put other punctuation
at the end like periods and exclamation marks and they don’t cause this
issue.
|
On Damus? On Twitter? |
Either one, but here’s an example.
The tag #zapathon is followed by a period but it’s not linked.
note1gcv2lskpdf4umthlfrjsqdtplvqmv8mpt3lzwaku405lcz9pnuysyf8sjz
|
Thanks, that helps. Here is what happens:
One period after the hashtag: the period is not added to the hashtag.
Two periods after the hashtag: the periods are not added to the hashtag.
Three periods after the hashtag: the periods are added to the hashtag.
Four periods after the hashtag: three are added to the hashtag, one is not.
I cannot replicate this with commas. |
Three periods as an ellipsis character … is what I believe caused the issue. |
…phanumeric characters such as punctuation marks. Closes: damus-io#1518
note12u7dx9gcm3tapau7ljqk7ymkc2lz7zkh0u3w85ndsed7s3kpl8ys943vz7 |
Here are some examples of how hashtags are displayed with the patch I am proposing (jonmarrs@336215a). Let me know if you would like me to run any other tests. |
Yes, that’s the expected behavior.
|
Thanks @jonmarrs Have you reviewed the contributing guidelines? |
Looks perfect. |
Thanks @alltheseas I read the contributing guidelines, and submitted the patch to [email protected] using git send-email. Now I'm working on running some test cases. |
@jonmarrs if you join the dev chat, it can be helpful sometimes to ping questions with the other devs |
Just wanted to give a quick update. I've been looking at how "..." is encoded in Damus, and it is indeed being encoded as an ellipsis rather than three separate periods. Apparently the ellipsis is encoded in UTF-8 as 0xE2 0x80 0xA6 (https://www.compart.com/en/unicode/U+2026). So those three hex values (0xE2 0x80 0xA6) that encode the ellipsis are not being detected as punctuation. |
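To make the byte-level observation above concrete, here is a minimal Python sketch (not Damus code). It prints the UTF-8 bytes of U+2026 and shows that a byte-wise ASCII punctuation test rejects all three of them. That byte-wise test is only an assumption about how a parser might check for punctuation, and the helper name `is_ascii_punctuation` is hypothetical.

```python
import string

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS, a single code point


def is_ascii_punctuation(byte: int) -> bool:
    """Hypothetical byte-wise check: true only for ASCII punctuation bytes."""
    return byte < 0x80 and chr(byte) in string.punctuation


ellipsis_bytes = ELLIPSIS.encode("utf-8")
period_bytes = "...".encode("utf-8")

print([hex(b) for b in ellipsis_bytes])  # ['0xe2', '0x80', '0xa6']
print([hex(b) for b in period_bytes])    # ['0x2e', '0x2e', '0x2e']

# None of the ellipsis bytes is ASCII, so a byte-wise test never fires on them.
print([is_ascii_punctuation(b) for b in ellipsis_bytes])  # [False, False, False]
print([is_ascii_punctuation(b) for b in period_bytes])    # [True, True, True]
```

If the keyboard substitutes the ellipsis character for three typed periods, a check like this would see three non-ASCII bytes rather than three periods, which would be consistent with the behavior reported in this thread.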
On Thu, Aug 31, 2023 at 04:20:30PM -0700, Jon Marrs wrote:
> Three periods as an ellipsis character … is what I believe caused the issue.
<img width="1013" alt="image" src="https://github.com/damus-io/damus/assets/40682667/d6e68635-5aa1-4d06-8ee1-e18fd42d34e6">
Perhaps we should check for an ellipsis when parsing a hashtag.
yeah we might need to be smarter about checking punctuation, since utf-8
chars are allowed in hashtags and we definitely want to keep that.
|
We could try to detect this block of unicode punctuation (https://www.compart.com/en/unicode/block/U+2000), which contains the ellipsis and other punctuation marks. |
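As a rough illustration of the block idea, the sketch below tests whether a decoded character falls in the General Punctuation block, which spans U+2000 to U+206F. It is written in Python for brevity and is not the Damus parser; the function name `in_general_punctuation` is hypothetical.

```python
GENERAL_PUNCTUATION_FIRST = 0x2000
GENERAL_PUNCTUATION_LAST = 0x206F


def in_general_punctuation(ch: str) -> bool:
    """Return True if the single character ch lies in U+2000..U+206F."""
    return GENERAL_PUNCTUATION_FIRST <= ord(ch) <= GENERAL_PUNCTUATION_LAST


print(in_general_punctuation("\u2026"))  # ellipsis: True
print(in_general_punctuation("\u2014"))  # em dash: True
print(in_general_punctuation("."))       # ASCII period: False (needs a separate check)
print(in_general_punctuation("\u00e9"))  # e with acute accent: False
```

One caveat with filtering the whole block: it also contains format characters such as U+200D (zero width joiner), which is used in emoji sequences, so a narrower range or a category check may be preferable.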
Here is my proposed solution for selectively filtering UTF-8 punctuation: master...jonmarrs:damus:2023-08-hashtag-linked-punctuation |
Out of curiosity, is there any reason to include non-alphanumeric characters in hashtags? |
Should we filter out currency symbols from hashtags as well? |
It seems like the overall strategy should be to decide which Unicode categories/blocks we want to filter out from hashtags. Right now I am filtering out General Punctuation. |
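One way to picture a category/block policy is the Python sketch below. It is not the patch linked in this thread. The names `TERMINATING_CATEGORIES`, `TERMINATING_BLOCKS`, `ends_hashtag`, and `hashtag_body` are hypothetical, and the choice of categories (most punctuation plus currency symbols, but not connector punctuation such as `_`) is an assumption for illustration only.

```python
import unicodedata

# Hypothetical policy: Unicode general categories whose characters end a hashtag.
# "Pc" (connector punctuation, which includes "_") is deliberately left out.
TERMINATING_CATEGORIES = {"Po", "Pd", "Ps", "Pe", "Pi", "Pf", "Sc"}

# Hypothetical block list as (first, last) code point pairs.
TERMINATING_BLOCKS = [(0x2000, 0x206F)]  # General Punctuation


def ends_hashtag(ch: str) -> bool:
    """Return True if ch should terminate a hashtag under this policy."""
    if ch.isspace():
        return True
    if unicodedata.category(ch) in TERMINATING_CATEGORIES:
        return True
    code_point = ord(ch)
    return any(first <= code_point <= last for first, last in TERMINATING_BLOCKS)


def hashtag_body(text: str, hash_index: int) -> str:
    """Return the hashtag text that follows the '#' at hash_index."""
    end = hash_index + 1
    while end < len(text) and not ends_hashtag(text[end]):
        end += 1
    return text[hash_index + 1:end]


print(hashtag_body("#zapathon\u2026 more", 0))  # zapathon
print(hashtag_body("#zapathon.", 0))            # zapathon
print(hashtag_body("#bitcoin$100", 0))          # bitcoin
print(hashtag_body("#snake_case done", 0))      # snake_case
```

Because the check works on decoded characters rather than raw bytes, non-ASCII letters stay part of the hashtag while the ellipsis and currency symbols end it.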
Here are some more test results. Twitter: https://twitter.com/Martian_BTC/status/1697676619530506353?s=20 Damus (iPhone app): note1l6x97pcynhvrfzpfg90pnkxa9tl3dhtwek3dgdray2pqmhr6x3kqadea4z Clearly there are some inconsistencies. As far as I know, there is no ISO standard for hashtags. Should the goal be for Damus to replicate how Twitter/X handles hashtags as closely as possible? |
> Clearly there are some inconsistencies. As far as I know, there is no ISO standard for hashtags. Should the goal be for Damus to replicate how Twitter/X handles hashtags as closely as possible?
probably, theirs works quite well.
|
Please review my pull request (#1546), which I believe closes this issue. |
Check for UTF-8 punctuation (such as ellipsis) in addition to regular punctuation in hashtags. Closes: damus-io#1518
At the end of the hashtag I included a … character which was not intended to be linked to the tag but Damus included it anyway.
note ID:
note1a8y42nfmeyz4gnvmjxta8k2l8dzl4sa9y6m726aa976ludlzq56sjde5dg