languages: copy over latest version from sourcegraph #828

keegancsmith · 2024-09-17T10:20:21Z

I realised we haven't been updating this package as we updated the package in the sourcegraph repo.

We don't need all the functionality it has, but it's easier to just copy and paste everything.

Test Plan: go test

github-actions · 2024-09-17T10:21:13Z

Fuzz test failed on commit 78a7ea4. To troubleshoot locally, use the GitHub CLI to download the seed corpus with

gh run download 10901106986 -n testdata

keegancsmith · 2024-09-17T13:50:56Z

OK, there are issues here. It boils down to how we changed normalization between the two versions of the package.

I actually have bigger concerns around this now. We just look up by query.Language.Language and do no normalization on it. We expect normalization to have been done beforehand. This relies on the normalization at index time being the same as the normalization when we construct the query. I think this is quite easy to mess up, so I need to read more code to see how this can be made more robust.

But essentially I feel that when we read index data from disk we should normalize by the rules of the running process. Then we do the same normalization when executing a query. There is one other issue: we look directly into the go-enry maps to get extensions out. That code also seems very fragile.

cc @jtibshirani who I think looked at all this code recently.

varungandhi-src

Let's move the lib/codeintel/languages package to a public repo and add a dependency on that? Otherwise, having to maintain this copy separately is going to get tedious quickly, and reviewing changes is also not that simple.

Wrt the normalization, I don't fully understand the requirements -- do we need to maintain some kind of compatibility between successive versions of the package?

keegancsmith · 2024-09-18T08:30:19Z

> Let's move the lib/codeintel/languages package to a public repo and add a dependency on that? Otherwise, having to maintain this copy separately is going to get tedious quickly, and reviewing changes is also not that simple.

Agreed, this makes sense to be shared. However, there is a tricky requirement around testing since the implementation of this package is tied directly to the version of go-enry. Maybe what we can do is expose a package which contains the tests, then as part of CI for sourcegraph and zoekt we run those tests to ensure we are compatible with the resolved version of go-enry?

> Wrt the normalization, I don't fully understand the requirements -- do we need to maintain some kind of compatibility between successive versions of the package?

We do case-sensitive string comparisons on the language value. While doing this upgrade I noticed that the casing we chose for some language names changed. I didn't track it down exactly, but I believe it has something to do with languages migrating between being handled inside go-enry and being handled by our custom support (I believe the code made different decisions about when to ToLower a string). I need to look into it more to give you a definite answer, but I saw enough to mark this PR as draft :)

jtibshirani · 2024-09-18T20:22:02Z

> Let's move the lib/codeintel/languages package to a public repo and add a dependency on that?

+1 let's do that! Could we commit to languages being our only interface for language info (including the IsVendor, etc. methods we added)? Then we could completely remove a direct dependency on go-enry.

languages: copy over latest version from sourcegraph

78a7ea4

keegancsmith requested review from mmanela, varungandhi-src and a team September 17, 2024 10:20

cla-bot bot added the cla-signed label Sep 17, 2024

keegancsmith marked this pull request as draft September 17, 2024 13:47

varungandhi-src reviewed Sep 17, 2024
