Handle multibyte characters in indexing #2619
The PR looks great! Based on the type-checking errors, there are still a few places where the code needs to be updated to pass the right arguments. By the way, you can run type checking locally with
Oops! I fixed the type errors and the formatting problem.
This is great. Thank you very much for the contribution, and for your patience while we were unable to review it.
I tested this in our Core monolith, and it adds only about a 5% slowdown (roughly 2 extra seconds) to finish indexing. That is an acceptable trade-off in exchange for correctness.
Motivation
issue: #1251
prev PR: #2051 (comment)
When a Ruby file contains multibyte characters (such as Japanese or Chinese text, or emoji), the "go to definition" and hover features do not work correctly: the reported definition location or the hover documentation is wrong.
This issue arises because the current implementation assumes single-byte characters when calculating offsets during index building and document referencing. We need to properly handle multibyte characters to ensure these features work reliably for all users.
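To illustrate why single-byte assumptions break, here is a minimal sketch, assuming the parser reports columns in bytes (as Prism's `Location#start_column` does) while the editor counts columns in the code units of the negotiated LSP position encoding (`utf-8`, `utf-16`, or `utf-32`; LSP defaults to `utf-16`). The helpers `code_unit_width`, `byte_column_to_code_units`, and `code_units_to_byte_column` are hypothetical and are not the ruby-lsp API. One converts in the indexing direction, and the other converts in the request direction.

```ruby
# Hypothetical helpers (not the ruby-lsp API) showing why byte columns and
# editor columns diverge once a line contains multibyte characters.

# Position encodings defined by LSP 3.17 (PositionEncodingKind).
POSITION_ENCODINGS = ["utf-8", "utf-16", "utf-32"].freeze

# Width of a single character in the code units of `encoding`.
def code_unit_width(char, encoding)
  case encoding
  when "utf-8" then char.bytesize              # 1 unit = 1 byte
  when "utf-16" then char.ord > 0xFFFF ? 2 : 1 # astral chars need a surrogate pair
  when "utf-32" then 1                         # 1 unit = 1 codepoint
  else raise ArgumentError, "unknown position encoding: #{encoding}"
  end
end

def check_encoding!(encoding)
  return if POSITION_ENCODINGS.include?(encoding)

  raise ArgumentError, "unknown position encoding: #{encoding}"
end

# Indexing direction: byte column (as reported by the parser) -> editor column.
def byte_column_to_code_units(line, byte_column, encoding)
  check_encoding!(encoding)
  line.byteslice(0, byte_column).each_char.sum { |char| code_unit_width(char, encoding) }
end

# Request direction: editor column -> byte column usable against parser offsets.
def code_units_to_byte_column(line, column, encoding)
  check_encoding!(encoding)
  units = 0
  bytes = 0
  line.each_char do |char|
    break if units >= column

    units += code_unit_width(char, encoding)
    bytes += char.bytesize
  end
  bytes
end

line = "😀 = 1; Foo"
# The emoji takes 4 bytes in UTF-8, 2 UTF-16 code units, and 1 codepoint.
byte_column = line[0, line.index("Foo")].bytesize
puts byte_column                                            # => 10
puts byte_column_to_code_units(line, byte_column, "utf-16") # => 8
puts byte_column_to_code_units(line, byte_column, "utf-32") # => 7
puts code_units_to_byte_column(line, 8, "utf-16")           # => 10
```

If the byte column `10` were sent to a UTF-16 client unchanged, the editor would place the cursor two columns past `Foo`. That is the kind of drift that makes definition and hover results land in the wrong place.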
Implementation
I implemented this following the approach suggested in the comment below, to fix the handling of multibyte characters in indexing.
ref. #2051 (comment)
Automated Tests
Added a test case for each entity type.
Manual Tests