Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrupt pool name based on filename #4431

Closed
philrz opened this issue Mar 12, 2023 · 3 comments · Fixed by #5019
Closed

Corrupt pool name based on filename #4431

philrz opened this issue Mar 12, 2023 · 3 comments · Fixed by #5019
Assignees
Labels

Comments

@philrz
Copy link
Contributor

philrz commented Mar 12, 2023

Repro is with Zed commit 7882cee. This was originally originally reported by a community user in a Slack message.

Repro requires the attached test data
Alice test_full-report.json.gz, though it must be gunzip'ed first, as there's something about using this filename as a pool name that triggers the problem.

$ zed -version
Version: v1.6.0-4-g7882cee3

$ rm -rf lake && zed -lake lake serve
$ gunzip Alice\ test_full-report.json.gz

$ zed create "Alice test_full-report.json" 
pool created: Alice test_full-report.json 2MvSa0aIBXymAIFxgQ7TFogdAuD

$ zed ls
Alice test_full-report.json 2MvSa0aIBXymAIFxgQ7TFogdAuD key ts order desc

$ zed use "Alice test_full-report.json"
branch not found

$ zed use 2MvSa0aIBXymAIFxgQ7TFogdAuD
Switched to branch "main" on pool "2MvSa0aIBXymAIFxgQ7TFogdAuD"

$ zed query 'from "Alice test_full-report.json" | count()'
status code 404: AlichftestYfulosreporwtjson: pool not found

The branch not found and corrupt pool name AlichftestYfulosreporwtjson are signs of the bug. I was tinkering with some debug in the Go code and there's something highly suspicious about these all having the same number of characters:

AlichftestYfulosreporwtjson
2MvSa0aIBXymAIFxgQ7TFogdAuD
Alice test_full-report.json
@philrz philrz added bug Something isn't working community labels Mar 12, 2023
@nwt nwt self-assigned this Mar 13, 2023
@nwt
Copy link
Member

nwt commented Mar 13, 2023

This is caused by segmentio/ksuid#74. We can work around that, but before we do, let's see if it gets fixed over there.

@philrz
Copy link
Contributor Author

philrz commented Oct 12, 2023

There doesn't seem to be much activity in the https://github.com/segmentio/ksuid project. I've added the good first issue label in hopes that someone from our community might be interested in putting the workaround on the Zed side. 😄

@philrz philrz closed this as completed Oct 12, 2023
@philrz philrz reopened this Oct 12, 2023
@philrz philrz assigned mattnibs and unassigned nwt Oct 18, 2023
mattnibs added a commit that referenced this issue Oct 19, 2023
This commit changes the lakeapi to use the pool name (which call also be
the string version of the pool ID) over requiring the pool id when
operating on a given pool. Apart from reducing the amount of requests
needed when operating on a remote lake, this also allows for users to
create a pool with ksuid name with Zed knowing how to resolve this
correctly.

Closes #4431
mattnibs added a commit that referenced this issue Feb 3, 2024
The commit fixes an issue with pool looks that made it so pools with a
ksuid-like name could not be looked up for most zed commands. Fix the
logic for lake/api.PoolID so that looking up a pool by ID is tried first
and then falls back to looking up a pool by name if nothing is found.

Closes #4431
mattnibs added a commit that referenced this issue Feb 5, 2024
The commit fixes an issue with pool looks that made it so pools with a
ksuid-like name could not be looked up for most zed commands. Fix the
logic for lake/api.PoolID so that looking up a pool by ID is tried first
and then falls back to looking up a pool by name if nothing is found.

Closes #4431
mattnibs added a commit that referenced this issue Feb 6, 2024
The commit fixes an issue with pool looks that made it so pools with a
ksuid-like name could not be looked up for most zed commands. Fix the
logic for lake/api.PoolID so that looking up a pool by ID is tried first
and then falls back to looking up a pool by name if nothing is found.

Closes #4431
@philrz
Copy link
Contributor Author

philrz commented Feb 14, 2024

Verified in Zed commit 935c334.

Repeating the original repro steps, the name of the pool is no longer a problem.

$ zed -version
Version: v1.13.0-15-g935c3340

$ zed create "Alice test_full-report.json"
pool created: Alice test_full-report.json 2cNInaDXCQXBLNLxJH3f8aXsDdl

$ zed ls
Alice test_full-report.json 2cNInaDXCQXBLNLxJH3f8aXsDdl key ts order desc

$ zed use "Alice test_full-report.json"
Switched to branch "main" on pool "Alice test_full-report.json"

$ zed load Alice.test_full-report.json 
(2/1) 19B/19B 19B/s 100.00%
2cNIp3i8sNCkr4uDKqLRkmYv6Ld committed

$ zed query 'from "Alice test_full-report.json" | count()'
1(uint64)

Thanks @mattnibs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants