Handle multibyte characters in indexing #2619

NotFounds · 2024-09-25T09:08:24Z

Motivation

issue: #1251
prev PR: #2051 (comment)

When a Ruby file contains multibyte characters (such as Japanese, Chinese, emojis, etc.), the "go to definition" and hover features do not work correctly. The definition location or hover documentation will be incorrect.

This issue arises because the current implementation assumes single-byte characters when calculating offsets during index building and document referencing. We need to properly handle multibyte characters to ensure these features work reliably for all users.
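To make the mismatch concrete, here is a minimal sketch in plain Ruby that measures the same position in a line in three ways. The helper `code_units_column` is hypothetical and is not part of ruby-lsp or Prism; it only illustrates why a byte-based column differs from the column an editor reports when the line contains multibyte characters.

```ruby
# Hypothetical helper (not part of ruby-lsp): converts a byte column within a
# single line into a column counted in code units of the given encoding.
def code_units_column(line, byte_column, encoding)
  prefix = line.byteslice(0, byte_column)

  case encoding
  when Encoding::UTF_8
    prefix.bytesize # one code unit per byte
  when Encoding::UTF_16LE
    prefix.encode(Encoding::UTF_16LE).bytesize / 2 # two bytes per code unit
  else
    prefix.length # assumed UTF-32: one code unit per code point
  end
end

line = 'x = "🙂"; FOO = 1'
# Byte column of FOO: the emoji before it occupies four bytes in UTF-8.
byte_column = line[0, line.index("FOO")].bytesize

puts code_units_column(line, byte_column, Encoding::UTF_8)
puts code_units_column(line, byte_column, Encoding::UTF_16LE)
puts code_units_column(line, byte_column, Encoding::UTF_32LE)
```

The emoji takes four bytes in UTF-8, two code units in UTF-16, and one in UTF-32, so the three columns for `FOO` differ. An index that stores the byte column while the editor sends UTF-16 positions will point at the wrong place.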

Implementation

I implemented this following the comment quoted below, which outlines how to handle multibyte characters during indexing.

Let's take the approach of saving the code units with the new Prism API in the index. So essentially, we need to:

Configure the index with the encoding that was negotiated between editor and server. After setting the global state encoding here, you want to grab the @index.configuration object and set the encoding there, so that we can check what is the encoding being used during indexing

You then need to pass the encoding (or maybe the entire config object?) to the declaration listener, where we will use the encoding to invoke the Prism location API location.start_code_units_column(encoding) to get the proper locations for multibyte characters

Finally, we should add a few tests to ensure that we don't accidentally regress. One test per entity type should be okay. These would be:

A class and a module

A constant

An instance variable

A method

ref. #2051 (comment)
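As a rough illustration of the Prism call named in the quoted comment, the sketch below compares a node's byte column with its column in UTF-16 code units. It assumes the prism gem is available (it ships with recent Ruby versions). The helper `last_statement_columns` is hypothetical and does not reflect how the declaration listener is actually structured.

```ruby
require "prism"

# Hypothetical helper: parses the source and reports the start column of its
# last top-level statement, both in bytes and in UTF-16 code units.
def last_statement_columns(source)
  node = Prism.parse(source).value.statements.body.last
  location = node.location

  {
    bytes: location.start_column,
    utf16: location.start_code_units_column(Encoding::UTF_16LE),
  }
end

# The emoji before FOO is four bytes in UTF-8 but two UTF-16 code units,
# so the two columns differ by two.
p last_statement_columns('x = "🙂"; FOO = 1')
```

Passing the negotiated encoding into the listener lets the index store columns in the same unit the editor uses for positions, which is the point of the approach described above.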

Automated Tests

Added a test case for each entity type.

Manual Tests

vinistock · 2024-09-25T14:20:46Z

The PR looks great! Based on the type checking errors, it looks like there are a few places where we still need to update the code to pass the right arguments.

Btw, you can run type checking locally with bundle exec srb tc.

NotFounds · 2024-09-26T00:30:57Z

Oops! I fixed the type error and formatting problem.

vinistock

This is great. Thank you very much for the contribution and for the patience while we couldn't review it.

I have tested this in our Core monolith, and it adds only about a 5% slowdown (roughly 2 extra seconds) to indexing. That is an acceptable trade-off in exchange for correctness.

NotFounds added 2 commits September 24, 2024 17:56

Add test for indexing multibyte characters

c54b0fb

Modify to consider encoding in indexing

63df3c1

NotFounds requested a review from a team as a code owner September 25, 2024 09:08

NotFounds requested review from alexcrocha and vinistock September 25, 2024 09:08

NotFounds commented Sep 25, 2024

lib/ruby_indexer/lib/ruby_indexer/declaration_listener.rb

NotFounds commented Sep 25, 2024

lib/ruby_indexer/test/classes_and_modules_test.rb

vinistock added the bugfix and server labels Sep 25, 2024

This was referenced Sep 25, 2024

Hover informations seems offset by non-ascii characters #1355

Closed

Definition jumps are not possible with files containing Japanese characters. #1347

Closed

NotFounds added 2 commits September 26, 2024 09:20

Fix type error

57f4f9f

Format code

d15ede6

vinistock approved these changes Sep 30, 2024

vinistock merged commit b4280d2 into Shopify:main Sep 30, 2024
21 checks passed

vinistock mentioned this pull request Sep 30, 2024

Better handle multibyte character locations #1251

Closed

NotFounds mentioned this pull request Oct 4, 2024

Handle multibyte characters in RubyDocument #2669

Merged

vinistock mentioned this pull request Oct 4, 2024

Ruby LSP Indexing is very slow on version 0.19.0 #2671

Closed

renovate bot mentioned this pull request Nov 2, 2024

fix(deps): update non-major dependencies Kong/docs.konghq.com#8090

Merged
