Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that stdin/stdout/stderr are not mapped 1:1 under NODERAWFS #18163

Merged
merged 5 commits into from
Aug 9, 2024

Conversation

kleisauke
Copy link
Collaborator

@kleisauke kleisauke commented Nov 8, 2022

Previously, when linking with -sNODERAWFS, the stdin, stdout and stderr streams were respectively attached to file descriptors 0, 1 and 2 of the running Node process.

createStandardStreams: function() {
FS.streams[0] = FS.createStream({ nfd: 0, position: 0, path: '', flags: 0, tty: true, seekable: false }, 0, 0);
for (var i = 1; i < 3; i++) {
FS.streams[i] = FS.createStream({ nfd: i, position: 0, path: '', flags: 577, tty: true, seekable: false }, i, i);
}
},

This implicitly means that overriding the print, printErr, stdin, stdout and stderr handlers on the incoming module wasn't possible.

emscripten/src/shell.js

Lines 431 to 432 in 52e3805

{{{ makeModuleReceiveWithVar('out', 'print', 'console.log.bind(console)', true) }}}
{{{ makeModuleReceiveWithVar('err', 'printErr', 'console.warn.bind(console)', true) }}}

emscripten/src/library_fs.js

Lines 1496 to 1499 in 52e3805

// Allow Module.stdin etc. to provide defaults, if none explicitly passed to us here
Module['stdin'] = input || Module['stdin'];
Module['stdout'] = output || Module['stdout'];
Module['stderr'] = error || Module['stderr'];

This commit ensures that stdin/stdout/stderr uses the library_tty.js implementation for its printing, which leverages the console object, whenever the above handlers weren't overriden. This matches what is done when filesystem support isn't included.

Resolves: #17688.
Resolves: #22264.

@sbc100
Copy link
Collaborator

sbc100 commented Nov 9, 2022

Is this really the behaviour we want though? Doesn't NODERAWFS imply that you do want write directly the stdin/stdout of the node process?

@kleisauke
Copy link
Collaborator Author

Good point, it looks like this was included in the initial implementation (see #6008), so perhaps there should be some discussion about it first.

Writing directly to stdout/stderr is sometimes not desirable, at least from a library point of view. The behavior is still the same if these handlers on the incoming module are not overridden. For example, console.warn will output to stderr by default and stdin usage is still working on this branch.

$ node -v
v14.18.2
$ node -e "console.warn('hello, world\!')" > /dev/null
hello, world!
$ emcc test/module/test_stdin.c -sNODERAWFS -sFORCE_FILESYSTEM -sEXIT_RUNTIME
$ echo 'hello, world!' | node a.out.js
hello, world!
eof

This PR also unifies its behavior with the other file systems (e.g. WasmFS also uses out() and err() for printing on the Node backend, see system/lib/wasmfs/special_files.cpp) and when file system support is not needed/detected (the mentioned issue occurred only with -sFORCE_FILESYSTEM for a simple printf() program).

@kleisauke
Copy link
Collaborator Author

I'm marking this as a draft, since I'm a bit worried that this may cause performance issues like issue #16755, which doesn't affect the NODERAWFS setting currently.

IIUC, pyodide currently resolves(?) that issue within the above linked PR (/cc @hoodmane).

@kleisauke
Copy link
Collaborator Author

I've marked this PR as ready for review. I think issue #16755 should be addressed separately and not block this one.

test/test_other.py Outdated Show resolved Hide resolved
Previously, when linking with -sNODERAWFS, the stdin, stdout and
stderr streams were respectively attached to file descriptors 0, 1
and 2 of the running Node process.

This implicitly means that overriding the print, printErr, stdin,
stdout and stderr handlers on the incoming module wasn't possible.

This commit ensures that stdin/stdout/stderr uses the library_tty.js
implementation for its printing, which leverages the console object,
whenever the above handlers weren't overriden. This matches what is
done when filesystem support isn't included.

Resolves: emscripten-core#17688.
Resolves: emscripten-core#22264.
@kleisauke kleisauke force-pushed the wasm-vips-issue-23 branch from 4cb3b78 to f927a00 Compare August 8, 2024 08:08
test/unistd/isatty.c Outdated Show resolved Hide resolved
test/test_other.py Outdated Show resolved Hide resolved
test/test_other.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Landing as is since the browser test failures are unrelated (and fixed on main already)

@sbc100 sbc100 merged commit c614fa5 into emscripten-core:main Aug 9, 2024
21 of 27 checks passed
@kleisauke kleisauke deleted the wasm-vips-issue-23 branch August 10, 2024 07:50
@sbc100
Copy link
Collaborator

sbc100 commented Aug 14, 2024

It looks like this is breaking on windows sadly. I guess we don't have any noderawfs tests tagged as @crossplatform so it wasn't getting tested in github CI on windows :(

On the emscripten-releases roller I'm seeing lots of failures: https://logs.chromium.org/logs/emscripten-releases/buildbucket/cr-buildbucket/8740021917817007009/+/u/Emscripten_testsuite__other_/stdout

The all looks like this:

ErrnoError: No such file or directory
    at Object.open (C:\b\s\w\ir\x\t\emtest_jyle_eyr\emscripten_test_other_i3dmf5t9\a.out.js:3282:17)
    at Object.createStandardStreams (C:\b\s\w\ir\x\t\emtest_jyle_eyr\emscripten_test_other_i3dmf5t9\a.out.js:3615:24)
    at Object.init (C:\b\s\w\ir\x\t\emtest_jyle_eyr\emscripten_test_other_i3dmf5t9\a.out.js:3651:12)
    at Object.init (C:\b\s\w\ir\x\t\emtest_jyle_eyr\emscripten_test_other_i3dmf5t9\a.out.js:2199:13)
    at initRuntime (C:\b\s\w\ir\x\t\emtest_jyle_eyr\emscripten_test_other_i3dmf5t9\a.out.js:420:6)
    at doRun (C:\b\s\w\ir\x\t\emtest_jyle_eyr\emscripten_test_other_i3dmf5t9\a.out.js:4785:5)
    at run (C:\b\s\w\ir\x\t\emtest_jyle_eyr\emscripten_test_other_i3dmf5t9\a.out.js:4806:5)
    at runCaller (C:\b\s\w\ir\x\t\emtest_jyle_eyr\emscripten_test_other_i3dmf5t9\a.out.js:4726:19)
    at removeRunDependency (C:\b\s\w\ir\x\t\emtest_jyle_eyr\emscripten_test_other_i3dmf5t9\a.out.js:552:7)
    at receiveInstance (C:\b\s\w\ir\x\t\emtest_jyle_eyr\emscripten_test_other_i3dmf5t9\a.out.js:736:5)
    at receiveInstantiationResult (C:\b\s\w\ir\x\t\emtest_jyle_eyr\emscripten_test_other_i3dmf5t9\a.out.js:754:5) {
  errno: 44,
  code: 'ENOENT'

Which means I think the issue is with createStandardStreams, which is no longer overridden here:

emscripten/src/library_fs.js

Lines 1431 to 1434 in bd8b894

// open default streams for the stdin, stdout and stderr devices
var stdin = FS.open('/dev/stdin', {{{ cDefs.O_RDONLY }}});
var stdout = FS.open('/dev/stdout', {{{ cDefs.O_WRONLY }}});
var stderr = FS.open('/dev/stderr', {{{ cDefs.O_WRONLY }}});

@sbc100
Copy link
Collaborator

sbc100 commented Aug 14, 2024

I think we should perhaps revert, and at the same time add @crossplatform to test_noderawfs in test_other.py, then we can attempt to re-land with windows testing?

@kleisauke
Copy link
Collaborator Author

kleisauke commented Aug 14, 2024

PR #22372 addresses the issue by preventing that /dev/stdin from being incorrectly transformed into \dev\stdin on Windows here:

path = PATH.normalize(path);

$ node -p "require('node:path/win32').normalize('/dev/stdin')"
\dev\stdin
$ node -p "require('node:path/posix').normalize('/dev/stdin')"
/dev/stdin

This ensures that the split here works as intended:

// split the absolute path
var parts = path.split('/').filter((p) => !!p);

kleisauke added a commit to kleisauke/emscripten that referenced this pull request Aug 14, 2024
Avoids mangling `C:/tmp` to `C:\tmp`. Should be safe, the underlying
Windows API can accept either the backslash or slash to separate
directory and file components of a path.

See: https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file

Ensures `NODERAWFS` compatibility on Windows that was regressed since
emscripten-core#18163.
sbc100 added a commit to sbc100/emscripten that referenced this pull request Aug 14, 2024
Without this the decorated function doesn't mirror properties of the
underlying tests, which was breaking things like `@crossplatform`.

See emscripten-core#18163
sbc100 pushed a commit that referenced this pull request Aug 14, 2024
Avoids mangling `C:/tmp` to `C:\tmp`. Should be safe, the underlying
Windows API can accept either the backslash or slash to separate
directory and file components of a path.

See:
https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file

Ensures `NODERAWFS` compatibility on Windows that was regressed since
#18163.
sbc100 added a commit to sbc100/emscripten that referenced this pull request Aug 14, 2024
Without this the decorated function doesn't mirror properties of the
underlying tests, which was breaking things like `@crossplatform`.

See emscripten-core#18163
sbc100 added a commit to sbc100/emscripten that referenced this pull request Aug 14, 2024
Without this the decorated function doesn't mirror properties of the
underlying tests, which was breaking things like `@crossplatform`.

See emscripten-core#18163
sbc100 added a commit that referenced this pull request Aug 14, 2024
Without this the decorated function doesn't mirror properties of the
underlying tests, which was breaking things like `@crossplatform`.

See #18163
@dschuff
Copy link
Member

dschuff commented Aug 16, 2024

This is also breaking the ray test from the LLVM testsuite (the output is here FTR but honestly don't bother as that log isn't useful).
That test (source seems to just use cout to output text and then what appears to be binary data casted to char. I haven't looked in detail at what's happening yet, other than to verify that the output is indeed different.

@dschuff
Copy link
Member

dschuff commented Aug 16, 2024

With a very small reproducer like

int main(void) {
  std::cout << "Some ASCII text\n";
  for (int i = 0; i < 256; ++i)
    std::cout << char(i);
  return 0;
}

You can see differences in outputting 0-valued bytes, and when the counter rolls from 0x7f to 0x80, i.e. out of the ASCII range. So presumably because stdout is open in text mode, there may be a difference in how that gets handled, relative to what the test expects (and it's possible that it's relying on something it shouldn't). But I would still like to understand what's happening here, e.g. is this a behavior difference in the emscripten runtime or node, etc.

kleisauke added a commit to kleisauke/emscripten that referenced this pull request Aug 17, 2024
@kleisauke
Copy link
Collaborator Author

@dschuff Sorry for the breakage, looks like NODERAWFS is lacking some test coverage. I just opened PR #22393 for this.

You can see differences in outputting 0-valued bytes, and when the counter rolls from 0x7f to 0x80, i.e. out of the ASCII range.

Hmm, I was also able to reproduce this without NODERAWFS, it seems to trigger this warning:

if ((u0 & 0xF8) != 0xF0) warnOnce('Invalid UTF-8 leading byte ' + ptrToString(u0) + ' encountered when deserializing a UTF-8 string in wasm memory to a JS string!');

@kleisauke
Copy link
Collaborator Author

FWIW, here's a standalone reproducer for that ASCII issue:

Expected behavior:

$ curl -LO https://github.com/llvm/llvm-test-suite/raw/main/SingleSource/Benchmarks/Misc-C++/Large/ray.cpp
$ em++ -O3 -sEXIT_RUNTIME ray.cpp -o ray.js
$ node ray.js > x.ppm
$ sha256sum x.ppm
cd6fa6fa75b4ba29077f7e4e531b96f17ff0b82f3c70df526c36f748d6a38c5c  x.ppm
$ vips copy x.ppm x.png
x.png

x

Current/actual behavior:

$ em++ -O3 -sEXIT_RUNTIME ray.cpp -o ray.js
$ node ray.js > x.ppm
$ sha256sum x.ppm
acaa1db08ee30dafda03b41c77466b1435fe8ffdd4d88e5a5876e87dff126f58  x.ppm
$ vips copy x.ppm x.png
VipsImage: memory area too small --- should be 262144 bytes, you passed 79485

kleisauke added a commit to kleisauke/emscripten that referenced this pull request Aug 18, 2024
@kleisauke
Copy link
Collaborator Author

... I just revised PR #22393 to avoid library_tty.js for stdio streams when linking with -sNODERAWFS, which should fix this particular ray test.

Of course, the bug still occurs without -sNODERAWFS (i.e. when library_tty.js is used), but let's address that separately.

sbc100 pushed a commit that referenced this pull request Aug 20, 2024
This reverts commit c614fa5.

---

Here's a standalone reproducer:

Before:
```console
$ curl -LO https://github.com/llvm/llvm-test-suite/raw/main/SingleSource/Benchmarks/Misc-C++/Large/ray.cpp
$ em++ -O3 -sEXIT_RUNTIME -sNODERAWFS ray.cpp -o ray.js
$ node ray.js > x.ppm
$ sha256sum x.ppm
acaa1db08ee30dafda03b41c77466b1435fe8ffdd4d88e5a5876e87dff126f58  x.ppm
$ vips copy x.ppm x.png
VipsImage: memory area too small --- should be 262144 bytes, you passed 79485
```

After:
```console
$ em++ -O3 -sEXIT_RUNTIME -sNODERAWFS ray.cpp -o ray.js
$ node ray.js > x.ppm
$ sha256sum x.ppm
cd6fa6fa75b4ba29077f7e4e531b96f17ff0b82f3c70df526c36f748d6a38c5c  x.ppm
$ vips copy x.ppm x.png
```
<details>
  <summary>x.png</summary>


![x](https://github.com/user-attachments/assets/832af0bc-17af-4919-9c3c-72f0683079b8)
</details>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants