Remove UTF-8 punctuation from hashtags and add test cases for hashtags #1546
If you remove the recently added withAnimation(.bouncy) statements (16edc3f) from RelayConfigView.swift, the code should compile and run successfully.
On Wed, Sep 13, 2023 at 12:46:01AM -0700, Jon Marrs wrote:
> If you remove the recently added withAnimation(.bouncy) statements (16edc3f) from RelayConfigView.swift, the code should compile and run successfully.
oops, sorry about that. Must be an iOS 17 thing that isn't supported outside the Xcode beta. Fixed.
Branch updated from 76c0e16 to 56da2be.
Awesome! Comments inline:
On Thu, Aug 31, 2023 at 06:21:31PM -0700, Jon Marrs wrote:
Check for UTF-8 punctuation (such as ellipsis) in addition to regular punctuation in hashtags.
Closes: #1546
---
+static inline int parse_utf8_char(struct cursor *cursor, unsigned int *code_point, unsigned int *utf8_length)
+{
+ u8 first_byte;
+ if (!parse_byte(cursor, &first_byte))
+ return 0; // Not enough data
+
+ // Determine the number of bytes in this UTF-8 character
+ int remaining_bytes = 0;
+ if (first_byte < 0x80) {
+ *code_point = first_byte;
+ return 1;
+ } else if ((first_byte & 0xE0) == 0xC0) {
+ remaining_bytes = 1;
+ *utf8_length = remaining_bytes + 1;
+ *code_point = first_byte & 0x1F;
+ } else if ((first_byte & 0xF0) == 0xE0) {
+ remaining_bytes = 2;
+ *utf8_length = remaining_bytes + 1;
+ *code_point = first_byte & 0x0F;
+ } else if ((first_byte & 0xF8) == 0xF0) {
+ remaining_bytes = 3;
+ *utf8_length = remaining_bytes + 1;
+ *code_point = first_byte & 0x07;
+ } else {
+ remaining_bytes = 0;
+ *utf8_length = 1; // Assume 1 byte length for unrecognized UTF-8 characters
+ // TODO: We need to gracefully handle unrecognized UTF-8 characters
+ printf("Invalid UTF-8 byte: %x\n", *code_point);
+ *code_point = ((first_byte & 0xF0) << 6); // Prevent testing as punctuation
+ return 0; // Invalid first byte
+ }
+
+ // Peek at remaining bytes
+ for (int i = 0; i < remaining_bytes; ++i) {
+ u8 next_byte;
+ if (!(next_byte = peek_char(cursor, i+1))) {
peek_char returns -1 when out of bounds, but I don't see that handled here.
[..]
-static inline int parse_char(struct cursor *cur, char c) {
- if (cur->p >= cur->end)
- return 0;
-
- if (*cur->p == c) {
- cur->p++;
- return 1;
- }
-
- return 0;
-}
-
-static inline int peek_char(struct cursor *cur, int ind) {
- if ((cur->p + ind < cur->start) || (cur->p + ind >= cur->end))
- return -1;
-
- return *(cur->p + ind);
-}
-
This was a bit confusing to review since it looks like these functions
were moved unnecessarily. Next time let's just add the new function under
these ones.
Cheers,
Will
Check for UTF-8 punctuation (such as ellipsis) in addition to regular punctuation in hashtags. Closes: damus-io#1518
Branch updated from 56da2be to 2f32700.
Ok. I am handling peek_char returning -1 now.
Sorry about moving the functions, but I had to move them above my code to use them in my code. I think this pull request should be ready to merge now. Let me know if there is anything else you want me to change.
-Jon
On Wed, Sep 13, 2023 at 02:45:09PM -0700, Jon Marrs wrote:
> > This was a bit confusing to review since it looks like these functions were moved unnecessarily. Next time let's just add the new function under these ones.
> Sorry about moving the functions, but I had to move them above my code to use them in my code.
no problem
> I think this pull request should be ready to merge now. Let me know if there is anything else you want me to change.
Will take a look!
Closes: #1546
Signed-off-by: William Casarin <[email protected]>
Thanks! Merged in aa4ecc2
On Wed, Sep 13, 2023 at 10:40:30AM -0700, Jon Marrs wrote:
> > peek_char returns -1 when out of bounds, but I don't see that handled here.
> Ok. I am handling peek_char returning -1 now.
> > This was a bit confusing to review since it looks like these functions were moved unnecessarily. Next time let's just add the new function under these ones.
> Sorry about moving the functions, but I had to move them above my code to use them in my code.
> I think this pull request should be ready to merge now. Let me know if there is anything else you want me to change.
> -Jon
Patch 1:
damus-c: remove UTF-8 punctuation from hashtags
Check for UTF-8 punctuation in addition to regular punctuation in hashtags.
Examples of changes:
Closes: #1518
Patch 2:
test: add test cases for ASCII and UTF-8 characters in hashtags