Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling __FILE__ in code size tests in CircleCI #23195

Open
aheejin opened this issue Dec 17, 2024 · 20 comments · May be fixed by #23212 or #23196
Open

Handling __FILE__ in code size tests in CircleCI #23195

aheejin opened this issue Dec 17, 2024 · 20 comments · May be fixed by #23212 or #23196

Comments

@aheejin
Copy link
Member

aheejin commented Dec 17, 2024

C++'s __FILE__ macro expands to the file's path. Whether that's a relative path or an absolute one depends on what's given to the compiler. For example,

$ clang++ ../../test.cpp

will expand test.cpp's __FILE__ into ../../test.cpp, whereas

$ clang++ ~/test.cpp

will expand it into /whatever/absolute/path/test.cpp.

In system_lib.py, we use both of them depending on the paths.

In CircleCI, tests that depend on build-libs, which is a part of build-linux, which uses Ninja, will be built with absolute paths when code includes __FILE__. This includes all core, other, and browser tests. Other tests (test-mac-arm64, test-windows, ...) don't use build-libs so they use build_objects and thus will include relative paths.

So, this will produce different builds for different CircleCI tests. deterministic_paths parameter, which uses -ffile-prefix-map in system_lib.py makes all relative paths the same string and all absolute paths the same string, but does not make an absolute path and a relative path the same. So even if we make CircleCI always set deterministic_paths, the problem remains:

if self.deterministic_paths:
source_dir = utils.path_from_root()
if batch_inputs:
relative_source_dir = os.path.relpath(source_dir, build_dir)
cflags += [f'-ffile-prefix-map={relative_source_dir}/=']
cflags += [f'-ffile-prefix-map={source_dir}=/emsdk/emscripten',
'-fdebug-compilation-dir=/emsdk/emscripten']

This problem was brought into attention because the new LLVM 19's libc++abi adds more usage of __FILE__ and one of them ended up in other.test_minimal_runtime_code_size_hello_embind, causing the code sizes to be different between different CircleCI tests. We can make this usage an empty string in release mode (which Zig did) but this does not fundamentally fix the problem that __FILE__ can end up in other code size tests. Currently we seem to use __FILE__ in several places in libraries:

aheejin@aheejin:~/emscripten/system/lib$ grep __FILE__ * -R
compiler-rt/lib/sanitizer_common/sanitizer_internal_defs.h:      __sanitizer::CheckFailed(__FILE__, __LINE__, \
compiler-rt/lib/builtins/int_util.h:#define compilerrt_abort() __compilerrt_abort_impl(__FILE__, __LINE__, __func__)
libc/musl/include/assert.h:#define assert(x) ((void)((x) || (__assert_fail(#x, __FILE__, __LINE__, __func__),0)))
libcxx/include/__assert:       : _LIBCPP_ASSERTION_HANDLER(__FILE__ ":" _LIBCPP_TOSTRING(__LINE__) ": assertion " _LIBCPP_TOSTRING(            \
libcxx/src/verbose_abort.cpp:  __assert2(__FILE__, __LINE__, __func__, buffer);
libcxxabi/src/abort_message.h:        ::abort_message("%s:%d: %s", __FILE__, __LINE__, __msg);                                                       \
libcxxabi/src/abort_message.cpp:    __assert2(__FILE__, __LINE__, __func__, buffer);
mimalloc/include/mimalloc/types.h:#define mi_assert(expr)     ((expr) ? (void)0 : _mi_assert_fail(#expr,__FILE__,__LINE__,__func__))
wasmfs/support.h:  wasmfs::handle_unreachable(msg, __FILE__, __LINE__)

Not sure what is the best way to proceed:

  1. Make __FILE__ an empty string in release mode via adding -D__FILE__="" in system_lib.py.
  2. Make __FILE__ an empty string when the environment variable CIRCLECI is set, which is set in all CIrcleCI tests. But then we have to make sure to clear cache and set CIRCLECI when rebaselining code size tests on our local machine, which is a pain, like
      $ ./emcc --clear-cache
      $ CIRCLECI=1 ./tools/maint/rebaseline_tests.py
  3. ???

I personally prefer 1, which can be as simple as #23196.

aheejin added a commit to aheejin/emscripten that referenced this issue Dec 17, 2024
This removes all absolute and relative file paths generated by
`__FILE__` macro from release build, making the release build
reproducible and the results of code size tests consistent.

Without `-Wno-builtin-macro-redefined`, the build errors out:
```console
In file included from <built-in>:365:
<command line>:3:9: error: redefining builtin macro [-Werror,-Wbuiltin-macro-redefined]
    3 | #define __FILE__ ""
      |         ^
1 error generated.
```

The more detailed description on why this is necessary is in emscripten-core#23195.
@aheejin
Copy link
Member Author

aheejin commented Dec 17, 2024

cc @dschuff @sbc100

@aheejin aheejin linked a pull request Dec 17, 2024 that will close this issue
@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

I think the better approach is to try to make deterministic_paths actually deterministic regardless of weather we use batch builds.

Alternatively, we should just have CI never use batch building?

@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

Actually its not just CI.. anyone who want to run this test needs to have the same contents on libc++abi.a.

Another approach then would be to somehow patch out or remove the usage of __FILE__ in libc++abi?

@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

I found some good info on this problem here: https://reproducible-builds.org/docs/build-path/

I sounds like -fmacro-prefix-map might be the answer?

@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

It batch builds are really the problem here I'm kind of tempted to remove them, or make them opt-in only, since they cause me other headaches from time to time myself.

@dschuff
Copy link
Member

dschuff commented Dec 17, 2024

If we can switch batch builds to use absolute paths like the non-batch and Ninja builds, then we could use absolute paths everywhere, and the appropriate command-line flags to make those paths deterministic (or not, for local development debugging). The comment for why it uses relative paths is to avoid response files, but given that we need response file support anyway (because the command-line lengths might be too long anyway) what is the real advantage of that?

@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

Sure at the very least we could use absolute paths when deterministic_paths is set?

I actually prefer absolute paths because one of the annoying things about the batch builds is that filenames in the error message it produces are not open-able (because they are relative to some build directory, not my directory where I ran the command).

@dschuff
Copy link
Member

dschuff commented Dec 17, 2024

Actually I think if we just make the new value (the strings on the RHS of the '=') of line 499 and 500 of

if self.deterministic_paths:
source_dir = utils.path_from_root()
if batch_inputs:
relative_source_dir = os.path.relpath(source_dir, build_dir)
cflags += [f'-ffile-prefix-map={relative_source_dir}/=']
cflags += [f'-ffile-prefix-map={source_dir}=/emsdk/emscripten',
'-fdebug-compilation-dir=/emsdk/emscripten']

match, then the relative and absolute paths should end up matching in deterministic mode, and non-deterministic mode should be unchanged.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

That would be a great solution!

One downside is that the codesize tests will fail without deterministic_paths set... I'm not sure how to fix that.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

How about we use -fmacro-prefix-map in all cases and -ffile-prefix-map/-fdebug-compilation-dir only for deterministic_paths?

@aheejin
Copy link
Member Author

aheejin commented Dec 17, 2024

I think the better approach is to try to make deterministic_paths actually deterministic regardless of weather we use batch builds.
Alternatively, we should just have CI never use batch building?

Some of the CI uses batch building while others use Ninja, so removing batch building from CI does not fix the problem. The ones that use Ninja are the ones dependent on build-libs (and build-linux):

- test-sanity:
requires:
- build-linux
- test-posixtest:
requires:
- build-linux
- test-core0:
requires:
- build-linux
- test-core2:
requires:
- build-linux
- test-core3:
requires:
- build-linux
- test-wasm64:
requires:
- build-linux
- test-wasm64-4gb:
requires:
- build-linux
- test-wasm2js1:
requires:
- build-linux
- test-other:
requires:
- build-linux
- test-browser-chrome:
requires:
- build-linux
- test-browser-chrome-2gb:
requires:
- build-linux
- test-browser-chrome-wasm64:
requires:
- build-linux
- test-browser-chrome-wasm64-4gb:
requires:
- build-linux
- test-browser-firefox:
requires:
- build-linux

Actually its not just CI.. anyone who want to run this test needs to have the same contents on libc++abi.a.
Another approach then would be to somehow patch out or remove the usage of __FILE__ in libc++abi?

That's basically what Zig guys did (at least for the release mode): ziglang/zig@98a30ac
This can be a quick solution, which I can do, but we still have some uses of __FILE__ in other parts of libraries. They just don't happen to affect the current code size tests.

I found some good info on this problem here: https://reproducible-builds.org/docs/build-path/
I sounds like -fmacro-prefix-map might be the answer?

I think we are already using -ffile-prefix-map which implies -fmacro-prefix-map? Even if we are using them we are using it separately for relative paths and absolute paths, so it makes all different relative paths into a single path and all different absolute paths into a single one but doesn't make an absolute path and a relative path into a single path. Maybe we should consider doing that.

It batch builds are really the problem here I'm kind of tempted to remove them, or make them opt-in only, since they cause me other headaches from time to time myself.

We only need to fix the relative/absolute path thing so I guess we don't need to remove the batch build itself unless it has other problems? But I'm not very familiar with that. It looks the batch build started using relative paths in order not to switch to response file too often.

If we can switch batch builds to use absolute paths like the non-batch and Ninja builds, then we could use absolute paths everywhere, and the appropriate command-line flags to make those paths deterministic (or not, for local development debugging). The comment for why it uses relative paths is to avoid response files, but given that we need response file support anyway (because the command-line lengths might be too long anyway) what is the real advantage of that?

Sure at the very least we could use absolute paths when deterministic_paths is set?

I actually prefer absolute paths because one of the annoying things about the batch builds is that filenames in the error message it produces are not open-able (because they are relative to some build directory, not my directory where I ran the command).

If we use absolute paths everywhere that's not gonna be deterministic because they are different in different CI bots (e.g. test-other and test-mac-arm64) and also different from our local build, no?

Actually I think if we just make the new value (the strings on the RHS of the '=') of line 499 and 500 of

if self.deterministic_paths:
source_dir = utils.path_from_root()
if batch_inputs:
relative_source_dir = os.path.relpath(source_dir, build_dir)
cflags += [f'-ffile-prefix-map={relative_source_dir}/=']
cflags += [f'-ffile-prefix-map={source_dir}=/emsdk/emscripten',
'-fdebug-compilation-dir=/emsdk/emscripten']

match, then the relative and absolute paths should end up matching in deterministic mode, and non-deterministic mode should be unchanged.

This can be good solution, but this will still generate an "not openable" file paths @sbc100 mentioned. Would that be OK? Also you need to clear cache and rebuild with the deterministic mode every time rebaselining code size tests.

@dschuff
Copy link
Member

dschuff commented Dec 17, 2024

-ffile-prefix-map subsumes both -fmacro-prefix-map and -fdebug-prefix-map. So I think it's what we want. I think the only problem is that in deterministic mode, the paths that started out relative differ from the paths that started out absolute. So making the RHS of those match as I mentioned would fix that.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

but we still have some uses of __FILE__ in other parts of libraries

BTW, some of these are never used. For example I just tried to make a test case using compilerrt_abort which looks like it uses __FILE__:

int_util.h:#define compilerrt_abort() __compilerrt_abort_impl(__FILE__, __LINE__, __func__)

But actually it ignores all its arguments in most configurations (including ours).

@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

-ffile-prefix-map subsumes both -fmacro-prefix-map and -fdebug-prefix-map. So I think it's what we want.

But for my local builds I want real debug path in by debug into, but I still want to be able to pass all the code size tests, so I think I want just the -fmacro-prefix-map, which is why I suggested adding -fmacro-prefix-map by default even when deterministic_paths is not set.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

Alternatively we just could just set deterministic_paths paths all the time (make it the only option). I don't actually do DWARF debugging enough to care that much about my local builds having local paths.

@aheejin
Copy link
Member Author

aheejin commented Dec 17, 2024

@sbc100 So where do you want the real local absolute paths to be? In debug info, or in macro __FILE__, or neither? Not sure if I understand you correctly. If you don't care about the debug info, do you think we can always use the fake /emsdk/emsripten paths for all occasions and remove all the uses of relative or "real" local absolute paths?

@aheejin
Copy link
Member Author

aheejin commented Dec 17, 2024

@sbc100 So where do you want the real local absolute paths to be? In debug info, or in macro __FILE__, or neither? Not sure if I understand you correctly. If you don't care about the debug info, do you think we can always use the fake /emsdk/emsripten paths for all occasions and remove all the uses of relative or "real" local absolute paths?

Also you added the option to opt out of the batch build due to the path problem: #20929
(I was gonna link this in the original issue and I ended up linking a wrong one)

@dschuff
Copy link
Member

dschuff commented Dec 17, 2024

-ffile-prefix-map subsumes both -fmacro-prefix-map and -fdebug-prefix-map. So I think it's what we want.

But for my local builds I want real debug path in by debug into, but I still want to be able to pass all the code size tests, so I think I want just the -fmacro-prefix-map, which is why I suggested adding -fmacro-prefix-map by default even when deterministic_paths is not set.

oh I see what you mean. Yeah, adding -fmacro-prefix-map unconditionally would be fine with me.
If you do debugging locally, you can just configure your debugger with a substitution rule that replaces /emsdk/emscripten with the actual path to your local emscripten. This is what I would expect users to do too if they want to debug libc.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

@sbc100 So where do you want the real local absolute paths to be? In debug info, or in macro __FILE__, or neither? Not sure if I understand you correctly. If you don't care about the debug info, do you think we can always use the fake /emsdk/emsripten paths for all occasions and remove all the uses of relative or "real" local absolute paths?

If I was going to care about real absolute paths it would only be in the debug info.

However, give the number of times I actually use the debug into I'm not sure its really worth that much to me to have this.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 17, 2024

Also you added the option to opt out of the batch build due to the path problem: #20929
(I was gonna link this in the original issue and I ended up linking a wrong one)

I added that option because the batch builds was making it hard to read the error message and build just one object at a time. It wasn't so much about that paths, although that is somewhat of an issue yes.

aheejin added a commit to aheejin/emscripten that referenced this issue Dec 18, 2024
When `deterministic_paths` is set, we are currently using
`-ffile-prefix-map` to produce the same path in data and debug info. In
the case of absolute paths, their emscripten path is replaced with a
fake path `/emsdk/emscripten`, and in the case of relative paths, all
path relative to the emscripten directory is removed, so
`../../system/lib/somefile.c` becomes `system/lib/somefiles.c`.
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L472-L477
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L495-L501

But this does not make relative paths and absolute paths the same, which
can be a problem when data generated by `__FILE__` macro is included in
one of code size tests. This problem is discussed in emscripten-core#23195.

This PR makes `__FILE__` macro produce the same data in all cases, so
that it wouldn't change any results for code size tests. This is done by
`-fmacro-prefix-map`.

For the debug info, when `deterministic_paths` is set, this uses a fake
path `/emsdk/emscripten` as a base emscripten path. When
`deterministic_paths` is not set, this uses real local absolute paths in
the debug info. This allows local developers to see their real paths in
the debug info while continuing to use the same (fake) path
`/emsdk/emscripten` we have used so far for the release binaries. Users
can set their debug base path to whatever path they like, but given that
we have used `/emsdk/emscripten` in release binaries for a while, it is
possible that some users have set their configuration with this
directory, so it would be better not to break them by changing it.

This is basically implementing what's suggested in
emscripten-core#23195 (comment)
and
emscripten-core#23195 (comment)

This also turns `deterministic_paths` on for the Ninja path in
embuilder for consistency with the non-Ninja path.

Fixes #23915.
aheejin added a commit to aheejin/emscripten that referenced this issue Dec 18, 2024
When `deterministic_paths` is set, we are currently using
`-ffile-prefix-map` to produce the same path in data and debug info. In
the case of absolute paths, their emscripten path is replaced with a
fake path `/emsdk/emscripten`, and in the case of relative paths, all
path relative to the emscripten directory is removed, so
`../../system/lib/somefile.c` becomes `system/lib/somefiles.c`.
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L472-L477
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L495-L501

But this does not make relative paths and absolute paths the same, which
can be a problem when data generated by `__FILE__` macro is included in
one of code size tests. This problem is discussed in emscripten-core#23195.

This PR makes `__FILE__` macro produce the same data in all cases by
using the fake path `/emsdk/emscripten` as its base, so that it wouldn't
change any results for code size tests. This is done by
`-fmacro-prefix-map`. This differs from the current behavior because we
don't handle relative and absolute paths differently.

For the debug info, when `deterministic_paths` is set, this uses a fake
path `/emsdk/emscripten` as a base emscripten path. When
`deterministic_paths` is not set, this uses real local absolute paths in
the debug info. This allows local developers to see their real paths in
the debug info while continuing to use the same (fake) path
`/emsdk/emscripten` we have used so far for the release binaries. Users
can set their debug base path to whatever path they like, but given that
we have used `/emsdk/emscripten` in release binaries for a while, it is
possible that some users have set their configuration with this
directory, so it would be better not to break them by changing it. This
is done by `-ffile-prefix-map` as we have done so far, which is an alias
for both `-fdebug-prefix-map` and `-fmacro-prefix-map`.

This is basically implementing what's suggested in
emscripten-core#23195 (comment)
and
emscripten-core#23195 (comment)

This also turns `deterministic_paths` on for the Ninja path in
embuilder for consistency with the non-Ninja path.

Fixes #23915.
@aheejin aheejin linked a pull request Dec 18, 2024 that will close this issue
aheejin added a commit to aheejin/emscripten that referenced this issue Dec 19, 2024
When `deterministic_paths` is set, we are currently using
`-ffile-prefix-map` to produce the same path in data and debug info. In
the case of absolute paths, their emscripten path is replaced with a
fake path `/emsdk/emscripten`, and in the case of relative paths, all
path relative to the emscripten directory is removed, so
`../../system/lib/somefile.c` becomes `system/lib/somefiles.c`.
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L472-L477
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L495-L501

So this does not make relative paths and absolute paths the same, which
can lead to different builds depending on whether the command line uses
absolute paths vs. relative ones. Currently we use relative paths when
`EMCC_BATCH_BUILD` is set. And Ninja builds cannot use relative paths.

This is also what was suggested in
emscripten-core#23195 (comment)
while discussins `__FILE__` problem in #23915.
aheejin added a commit to aheejin/emscripten that referenced this issue Dec 19, 2024
When `deterministic_paths` is set, we are currently using
`-ffile-prefix-map` to produce the same path in data and debug info. In
the case of absolute paths, their emscripten path is replaced with a
fake path `/emsdk/emscripten`, and in the case of relative paths, all
path relative to the emscripten directory is removed, so
`../../system/lib/somefile.c` becomes `system/lib/somefiles.c`.
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L472-L477
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L495-L501

So this does not make relative paths and absolute paths the same, which
can lead to different builds depending on whether the command line uses
absolute paths vs. relative ones. Currently we use relative paths when
`EMCC_BATCH_BUILD` is set. And Ninja builds cannot use relative paths.

This is also what was suggested in
emscripten-core#23195 (comment)
while discussins `__FILE__` problem in #23915.
aheejin added a commit to aheejin/emscripten that referenced this issue Dec 19, 2024
When `deterministic_paths` is set, we are currently using
`-ffile-prefix-map` to produce the same path in data and debug info. In
the case of absolute paths, their emscripten path is replaced with a
fake path `/emsdk/emscripten`, and in the case of relative paths, all
path relative to the emscripten directory is removed, so
`../../system/lib/somefile.c` becomes `system/lib/somefiles.c`.
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L472-L477
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L495-L501

So this does not make relative paths and absolute paths the same, which
can lead to different builds depending on whether the command line uses
absolute paths vs. relative ones. Currently we use relative paths when
`EMCC_BATCH_BUILD` is set. And Ninja builds cannot use relative paths.

This is also what was suggested in
emscripten-core#23195 (comment)
while discussins `__FILE__` problem in #23915.

This does not change any code size tests because no code size tests
happens to contain `__FILE__` at the moment. (One will be added in
 emscripten-core#22994)
aheejin added a commit that referenced this issue Dec 19, 2024
#23222)

When `deterministic_paths` is set, we are currently using
`-ffile-prefix-map` to produce the same path in data and debug info. In
the case of absolute paths, their emscripten path is replaced with a
fake path `/emsdk/emscripten`, and in the case of relative paths, all
path relative to the emscripten directory is removed, so
`../../system/lib/somefile.c` becomes `system/lib/somefiles.c`.
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L472-L477
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L495-L501

So this does not make relative paths and absolute paths the same, which
can lead to different builds depending on whether the command line uses
absolute paths vs. relative ones. Currently we use relative paths when
`EMCC_BATCH_BUILD` is set. And Ninja builds cannot use relative paths.

This is also what was suggested in
#23195 (comment)
while discussins `__FILE__` problem in #23195.

This does not change any code size tests because no code size tests
happens to contain `__FILE__` at the moment. (One will be added in
#22994)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants