Unescaped config names with special characters in the URL #2992

Open
marcenacp opened this issue Jul 22, 2024 · 3 comments
Labels
bug Something isn't working name issue P1 Not as needed as P0, but still important/wanted

Comments

@marcenacp
Contributor

When playing with mlcroissant, we observed the following issue:

bigcode/commitpackft has both a c config and a c# config. When going to https://huggingface.co/api/datasets/bigcode/commitpackft/parquet/c#/train/0.parquet, it lists https://huggingface.co/api/datasets/bigcode/commitpackft/parquet/c/train/0.parquet (instead of https://huggingface.co/api/datasets/bigcode/commitpackft/parquet/c%23/train/0.parquet).

Should dataset names / config names be escaped in the URLs?
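For illustration, here is a minimal sketch of what a generic URL parser does with the unescaped name, using only Python's standard `urllib.parse`. It is not taken from the service code, and whether this is the exact root cause on the server side is an assumption. The `#` starts the URL fragment, so everything after `c` drops out of the path, while percent-encoding the config name keeps it inside the path:

```python
from urllib.parse import quote, urlsplit

BASE = "https://huggingface.co/api/datasets/bigcode/commitpackft/parquet"

# Unescaped config name: "#" begins the fragment, which clients
# normally do not send to the server at all.
raw_url = f"{BASE}/c#/train/0.parquet"
parts = urlsplit(raw_url)
print(parts.path)      # path stops at ".../parquet/c"
print(parts.fragment)  # "/train/0.parquet" ends up in the fragment

# Escaped config name: safe="" also escapes "/" so the name stays
# a single path segment.
escaped_url = f"{BASE}/{quote('c#', safe='')}/train/0.parquet"
print(escaped_url)     # ".../parquet/c%23/train/0.parquet"
```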

cc @severo @lhoestq

@severo
Collaborator

severo commented Jul 22, 2024

Sure. Thanks for reporting.

@severo added the labels bug (Something isn't working) and P1 (Not as needed as P0, but still important/wanted) on Jul 22, 2024
@severo
Collaborator

severo commented Jul 22, 2024

@marcenacp
Contributor Author

@severo Do I understand correctly that each service should:

  1. deserialize the names from the URL before using the name
  2. call other services with serialized names in the URL?

Do you see a way to fix it more gradually, service by service (e.g., starting with /parquet)? How can we make sure that we don't break anybody relying on names not being serialized in the URL?
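To make the two steps concrete, here is a rough sketch of what I have in mind, using Python's standard `urllib.parse`. The helper names `build_parquet_url` and `config_from_path` are hypothetical and do not exist in the codebase, and the path layout is only assumed from the URLs above:

```python
from urllib.parse import quote, unquote, urlsplit

API_ROOT = "https://huggingface.co/api/datasets"  # taken from the URLs above


def build_parquet_url(dataset: str, config: str, split: str, filename: str) -> str:
    """Hypothetical helper: serialize names when calling another service."""
    return "/".join([
        API_ROOT,
        quote(dataset, safe="/"),  # keep the namespace/name separator
        "parquet",
        quote(config, safe=""),    # "c#" becomes "c%23"
        quote(split, safe=""),
        quote(filename, safe=""),
    ])


def config_from_path(path: str) -> str:
    """Hypothetical helper: deserialize the config name before using it."""
    segments = path.split("/")
    # Assumes the config is the segment right after "parquet".
    return unquote(segments[segments.index("parquet") + 1])


url = build_parquet_url("bigcode/commitpackft", "c#", "train", "0.parquet")
print(url)
print(config_from_path(urlsplit(url).path))
```

Note that `unquote` leaves a name without any `%` unchanged, so callers that still send plain names like `c` would keep working; only names containing a literal `%` would be read differently.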

Thanks!
