Improve performance of finding indexables #2082
fabdd05 to 460e046
I just resolved the merge conflict |
Great improvements 🚀
@vinistock I believe I've addressed all of your comments |
@vinistock do you need anything else from me on this? |
c104c96 to 1651c8a
I brought this back to the team for discussion and we reached a consensus. We do not want to copy whatever RuboCop is doing. It's hard to tell if there is legacy code or decisions baked in there that are not relevant for the Ruby LSP. In addition to that, the majority of the gains will come from ignoring the directories at the first level, so we can instead focus on that for the performance improvement. Can we please switch to doing this instead:
def combined_patterns
  # code that will return a glob pattern with only the relevant directories combined
  # on the first level below Dir.pwd
  "**/{lib,app,whatever}/**/*.rb"
end
Remember, we will still need to keep the excluded check loop, since someone could be excluding files that are a few directories below (which is fine). |
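One way the suggestion above could be sketched, as an illustration only and not the code that was merged: list the directories directly under the workspace root, drop the excluded ones, and combine the rest into a single brace pattern. The helper name `combined_pattern` and the `excluded_dirs` argument are hypothetical, and the sketch assumes at least one directory remains and that no directory name contains a comma or glob metacharacter. It anchors the pattern at the root rather than using a leading `**/`, and it does not cover `.rb` files sitting directly in the root.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: builds one glob that only descends into top-level
# directories of `root` whose names are not listed in `excluded_dirs`.
def combined_pattern(root, excluded_dirs)
  top_level = Dir.children(root)
    .select { |entry| File.directory?(File.join(root, entry)) }
    .reject { |entry| excluded_dirs.include?(entry) }
    .sort

  # Assumes top_level is non-empty; an empty brace group would need special handling.
  File.join(root, "{#{top_level.join(",")}}", "**", "*.rb")
end

# Small demonstration against a throwaway directory tree.
Dir.mktmpdir do |root|
  %w[lib/a.rb app/models/b.rb tmp/cache/c.rb node_modules/pkg/d.rb].each do |relative|
    full = File.join(root, relative)
    FileUtils.mkdir_p(File.dirname(full))
    FileUtils.touch(full)
  end

  pattern = combined_pattern(root, %w[tmp node_modules])
  # Only files under app/ and lib/ are visited; tmp/ and node_modules/ are never walked.
  puts Dir.glob(pattern).map { |path| path.delete_prefix("#{root}/") }.sort
end
```

Because the excluded directories never appear in the pattern, `Dir.glob` does not descend into them at all, which is where the saving would come from.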
That sounds good! I think this will end up being a lot simpler. Avoiding traversing |
@vinistock I just found a slight problem with only excluding top-level folders. So, we want to exclude the Bundler path if it's inside of the pwd. That's something the Thoughts? Do we just default exclude |
2nd question @vinistock. Where'd we land on treating included_patterns and excluded_patterns as relative to |
For the first question: I'm not sure we can always just exclude the entire For the second one, yeah, let's make all patterns relative to the workspace. |
Sounds good, thanks! |
I'll set this to draft until @natematykiewicz has a chance to return to it. |
Thanks @andyw8! One thing that's been tripping me up is the lack of normalization to the paths. Vini said a few comments above that everything can be assumed to be relative to the workspace. The default path is a full absolute path to the directory, but I believe users would be passing in relative paths. Perhaps you guys could get all paths to either be relative paths or absolute paths, instead of both? Then I'd have a much easier time doing this PR. I feel like I'm having to make a lot of decisions trying to normalize these paths (do the instance variables hold relative or absolute paths?), and realizing it's probably out of scope of this performance improvement PR anyways. |
That last comment was about both the |
I just saw that #2424 was merged. That should make this PR much simpler. |
3a8c7fe to 4279f64
@vinistock @andyw8 I just force pushed to this branch. I started over from This change actually made this all massively easier. No longer having to consider leading All in all, this PR is much simpler now. |
Note, since we're only excluding the very top-level directories, |
804afb0 to 39ca825
This is looking great
@vinistock just so you know, I think I've handled all of your feedback |
The changes look good to me, but could you please confirm if this produced any significant speed ups? I benchmarked these changes in two ways and saw no difference for our app, so I just want to confirm that it will actually achieve the desired outcome.
Benchmark 1
# frozen_string_literal: true
require "bundler/setup"
require "benchmark"
require "ruby_lsp/load_sorbet"
require "ruby_lsp/internal"
T::Utils.run_all_sig_blocks
index = RubyIndexer::Index.new
RubyVM::YJIT.enable
r = Benchmark.realtime do
  index.index_all
end
puts r
Benchmark 2 (IPS)
This needs
# frozen_string_literal: true
require "bundler/setup"
require "benchmark/ips"
require "ruby_lsp/load_sorbet"
require "ruby_lsp/internal"
T::Utils.run_all_sig_blocks
index = RubyIndexer::Index.new
RubyVM::YJIT.enable
Benchmark.ips do |x|
  x.report("old") { index.configuration.indexables }
  x.report("new") { index.configuration.indexables }
  x.hold!("tmp_results")
  x.compare!
end |
Currently, all folders and files in the current tree are turned into IndexablePath, and then excluded files are filtered out after. When there are large file trees that are meant to be excluded, this results in a lot of unnecessary work.
ActiveStorage stores files in the `tmp` directory in many, many small folders. So does Bootsnap. Additionally, `node_modules` can become quite large. Ruby LSP has to traverse all of these files, even though the entire directory should just be ignored.
Instead, we can skip any top-level directories whose paths have been excluded. We still need to loop through all IndexablePath objects and compare them to the exclude_patterns, in case nested folders or file name patterns were excluded. Still, skipping some large top-level directories proves to be a big performance improvement.
Before this PR in my Rails app, `indexables` took 76 seconds to run. Now it takes 0.17 seconds. The before and after code both return the exact same file list.
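The second step described in this commit message, the per-path check that remains after the top-level directories are skipped, could look roughly like the sketch below. It is an assumption-laden illustration: `excluded?` and `filter_excluded` are hypothetical names, the paths are taken to be relative to the workspace, and the flag combination is chosen for the example rather than confirmed as what Ruby LSP passes to `fnmatch`.

```ruby
# Flags assumed for this illustration: FNM_PATHNAME keeps "*" from crossing "/"
# and lets "**/" span directories; FNM_EXTGLOB enables "{a,b}" alternation.
MATCH_FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

# Hypothetical helper: true when any exclusion pattern matches the path.
def excluded?(path, patterns)
  patterns.any? { |pattern| File.fnmatch?(pattern, path, MATCH_FLAGS) }
end

# Hypothetical helper: keeps only the paths that no pattern excludes.
def filter_excluded(paths, patterns)
  paths.reject { |path| excluded?(path, patterns) }
end

patterns = ["**/{fixtures,vendor}/**/*", "**/*_test.rb"]
paths = [
  "app/models/user.rb",
  "app/vendor/gems/x.rb",
  "lib/test/fixtures/data/y.rb",
  "lib/foo/bar_test.rb",
]

puts filter_excluded(paths, patterns)
```

This loop still visits every remaining path, so it only stays cheap once the large excluded trees have already been kept out of the glob.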
1bf7699 to ed45843
So, how much this change helps entirely depends on how big the excluded directories are. I recently wiped my tmp/storage because it became unbearable to use RubyLsp with how large it was. I just rebased my branch. On Looking at just the
# "main" branch code
Benchmark.measure { RubyIndexer::Configuration.new.indexables }
=>
#<Benchmark::Tms:0x00000001197b3ee0
@cstime=0.0,
@cutime=0.0,
@label="",
@real=4.37744399998337,
@stime=1.801075,
@total=2.530652,
@utime=0.7295769999999999>
# my changes
Benchmark.measure { RubyIndexer::Configuration2.new.indexables }
=>
#<Benchmark::Tms:0x000000011b9da8c0
@cstime=0.0,
@cutime=0.0,
@label="",
@real=0.7563209999352694,
@stime=0.19453800000000054,
@total=0.754276,
@utime=0.5597379999999994> |
Thank you for the contribution! Let's move forward with this since you are seeing gains
Thank you @vinistock! |
Currently, all folders and files in the current tree are turned into IndexablePath, and then excluded files are filtered out after.
When there are large file trees that are meant to be excluded, this results in a lot of unnecessary work.
ActiveStorage stores files in the `tmp` directory in many, many small folders. So does Bootsnap. Ruby LSP has to traverse all of these files, even though the entire directory should just be ignored.
RuboCop has solved this by breaking the `includes` patterns up into many patterns, applying the exclusions before the `Dir.glob`, so I followed in their footsteps. This works great for exclusions that end in "**/*". We still need to loop through all IndexablePath objects and see if they're excluded, in case an extension was provided on the excluded path, but this can cut down load time dramatically.
Before this PR in my Rails app, `indexables` took 76 seconds to run. Now it takes 0.19 seconds. The before and after code both return the exact same file list.
Additionally, I added `node_modules` to the list of excluded trees, since that can be very large and never includes Ruby files.
I also removed the `*.rb` from the bundler path. Having a file extension on that means we need to scan all files. But we simply want to ignore the entire bundler path tree. I kind of wonder if we should always replace `*.rb` with `*`, to help people improve performance. Excluding an actual file name or partial file name makes sense. But excluding the only file extension we scan means we can do it faster by excluding the whole folder.
Motivation
Opening a ruby file caused my LSP server to print "Ruby LSP: indexing files" for 76 seconds at 0% before the progress bar starts moving.
Implementation
I knew that Rubocop has solved this problem before, so I looked at this file and followed what they did.
Automated Tests
I added tests for the new pattern `exclude_pattern` that gets used with `fnmatch`, while ensuring I didn't break any existing tests.
Manual Tests
I made a file that has both implementations of `indexables` on my computer. Then ran this in the Rails Console:
So here you can see that it finds the same ~14k files in 0.25% of the time.
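A check of this kind (same file list, different runtime) can be sketched as follows. This is a generic illustration, not the console session from the PR: `compare_implementations` and `percent_of` are hypothetical helpers, and the commented usage borrows the `Configuration` and `Configuration2` names from the benchmark comment earlier in the thread.

```ruby
require "benchmark"

# Hypothetical helper: runs both implementations once, records wall-clock
# time for each, and checks that they return the same set of entries.
def compare_implementations(old_impl, new_impl)
  old_result = new_result = nil
  old_seconds = Benchmark.realtime { old_result = old_impl.call }
  new_seconds = Benchmark.realtime { new_result = new_impl.call }

  {
    same_result: old_result.sort == new_result.sort,
    count: new_result.size,
    old_seconds: old_seconds,
    new_seconds: new_seconds,
  }
end

# Hypothetical helper: expresses the new runtime as a percentage of the old one.
def percent_of(new_seconds, old_seconds)
  (new_seconds / old_seconds * 100).round(2)
end

# With the timings quoted in the description (76 s before, 0.19 s after):
puts percent_of(0.19, 76.0)

# Inside a console with both implementations loaded, usage might look like:
#   compare_implementations(
#     -> { RubyIndexer::Configuration.new.indexables.map(&:to_s) },
#     -> { RubyIndexer::Configuration2.new.indexables.map(&:to_s) },
#   )
```

Comparing sorted results keeps the check independent of the order in which each implementation happens to walk the tree. The `map(&:to_s)` in the commented usage is an assumption that the returned objects have a meaningful string form to sort by.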