Sub-process output parsing assumes UTF-8 but is not always UTF-8 #3591

Open
3 tasks done
Klaim opened this issue Nov 12, 2024 · 0 comments
Labels
type::bug Something isn't working

Comments

@Klaim
Member

Klaim commented Nov 12, 2024

Troubleshooting docs

  • My problem is not solved in the Troubleshooting docs

Anaconda default channels

  • I do NOT use the Anaconda default channels (pkgs/* etc.)

How did you install Mamba?

Micromamba

Search tried in issue tracker

yes

Latest version of Mamba

  • My problem is not solved with the latest version

Tried in Conda?

I have this problem with Conda as well, without using Mamba

Describe your issue

While investigating #3584, we realized that when micromamba (or mamba) calls Python and then parses its output, the code assumes that the output is UTF-8. However, Python is designed to write its output using the current system/console encoding. When that encoding is not UTF-8, one of two things happens: if the data is detected as invalid UTF-8, we get errors; otherwise, we silently process incorrect data.
This issue is most visible on Windows, whose default encoding is not UTF-8 (it can be set to UTF-8, which makes the issue disappear), but it can also appear on any other system whose default encoding is not UTF-8.
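
To illustrate the two failure modes described above, here is a minimal Python sketch. It does not reproduce mamba's actual parsing code; the helper `decode_as_utf8` is hypothetical, and Windows-1252 is used only as an example of a non-UTF-8 system encoding.

```python
def decode_as_utf8(raw: bytes) -> str:
    """Hypothetical stand-in for a parser that assumes UTF-8 input."""
    return raw.decode("utf-8")


# Case 1: the bytes are not valid UTF-8, so the wrong assumption is detected.
# U+00E9 (e with acute accent) is the single byte 0xE9 in Windows-1252.
cp1252_bytes = "\u00e9".encode("cp1252")
try:
    decode_as_utf8(cp1252_bytes)
    detected = False
except UnicodeDecodeError:
    detected = True

# Case 2: the bytes happen to be valid UTF-8, so nothing fails, but the
# decoded text differs from what the sub-process actually wrote.
# U+00C3 U+00A9 encode to 0xC3 0xA9 in Windows-1252, which UTF-8 reads
# as the single character U+00E9.
original = "\u00c3\u00a9"
silently_wrong = decode_as_utf8(original.encode("cp1252"))

print(detected)                    # True: explicit error
print(silently_wrong == original)  # False: wrong data, no error
```

The second case is the more dangerous one, since the caller has no signal that the text was misread.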

So far, that problem has been worked around by adding environment variables to the CI scripts that request Python to explicitly output UTF-8. This is why our CI did not detect the issue when new Python-calling code was added to mamba/micromamba, while users can hit it.

#3584 demonstrates that we could always set that variable through the sub-process launching command instead of asking users to set it externally. At that point we know that we are calling Python, and we also know which encoding we expect to receive.
We need to generalize this solution to the other sub-process launches: the remaining Python calls, but also calls to other programs. When parsed, the output of these sub-processes should always be treated as system-encoded (reproc apparently does not change that), and we need to make sure that any output we parse is either understood in that encoding or converted.
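
As a rough sketch of the two strategies above, written in Python for brevity rather than in mamba's own C++/reproc code: the first helper forces a child Python interpreter to emit UTF-8, and the second decodes the output of an arbitrary tool using the system encoding. Both function names are hypothetical, and the choice of `PYTHONIOENCODING` is an assumption, since the issue does not name the variable used in CI or in #3584.

```python
import locale
import os
import subprocess
import sys


def run_python_utf8(code: str) -> str:
    """Run `code` in a child Python interpreter and return its stdout as text.

    The launcher sets the encoding itself, so decoding the output as UTF-8
    is correct whatever the console encoding is.
    """
    env = dict(os.environ)
    # Assumption: PYTHONIOENCODING is the variable used; it overrides the
    # encoding of the child's stdin/stdout/stderr.
    env["PYTHONIOENCODING"] = "utf-8"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")


def decode_system_output(raw: bytes) -> str:
    """Decode output from a tool whose output encoding we cannot control.

    Assumes the tool wrote in the locale's preferred encoding. Bytes that do
    not fit are replaced rather than raising, so the text can still be logged.
    """
    encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, errors="replace")
```

For example, `run_python_utf8("print('\\u00e9')")` returns the accented character followed by a newline even when the parent runs in a non-UTF-8 console. Whether to replace or reject undecodable bytes in the second helper is a design choice; replacing keeps diagnostics readable, while rejecting surfaces the problem earlier.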

Once that is done, we can remove the CI script flags/env variables that hide the problem.

mamba info / micromamba info

micromamba 2.0.3 exposes that faulty behavior

Logs

N/A

environment.yml

N/A

~/.condarc

N/A
@Klaim Klaim added the type::bug Something isn't working label Nov 12, 2024
@Klaim Klaim self-assigned this Nov 12, 2024