Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle multibyte characters in indexing #2619

Merged

Conversation

NotFounds
Copy link
Contributor

Motivation

issue: #1251
prev PR: #2051 (comment)

When a Ruby file contains multibyte characters (such as Japanese, Chinese, emojis, etc.), the "go to definition" and hover features do not work correctly. The definition location or hover documentation will be incorrect.

This issue arises because the current implementation assumes single-byte characters when calculating offsets during index building and document referencing. We need to properly handle multibyte characters to ensure these features work reliably for all users.

Implementation

I implemented this based on the following comment to resolve the handling problem of multibyte characters in indexing.

Let's take the approach of saving the code units with the new Prism API in the index. So essentially, we need to:

  1. Configure the index with the encoding that was negotiated between editor and server. After setting the global state encoding here, you want to grab the @index.configuration object and set the encoding there, so that we can check what is the encoding being used during indexing

  2. You then need to pass the encoding (or maybe the entire config object?) to the declaration listener, where we will use the encoding to invoke the Prism location API location.start_code_units_column(encoding) to get the proper locations for multibyte characters

  3. Finally, we should add a few tests to ensure that we don't accidentally regress. One test per entity type should be okay. These would be:

ref. #2051 (comment)

Automated Tests

Add a test case for each entity types.

Manual Tests

@NotFounds NotFounds requested a review from a team as a code owner September 25, 2024 09:08
@vinistock vinistock added bugfix This PR will fix an existing bug server This pull request should be included in the server gem's release notes labels Sep 25, 2024
@vinistock
Copy link
Member

The PR looks great! Based on the type checking errors, it looks like there are a few places where we still need to update the code to pass the right arguments.

Btw, you can run type checking locally with bundle exec srb tc.

@NotFounds
Copy link
Contributor Author

Oops! I fixed the type error and formatting problem.

Copy link
Member

@vinistock vinistock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Thank you very much for the contribution and for the patience while we couldn't review it.

I have tested this in our Core monolith and it represents only a 5% slow down (about an extra 2 seconds) to finish indexing. This is an acceptable trade off in exchange for the correctness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This PR will fix an existing bug server This pull request should be included in the server gem's release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants