-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec tests that conflict with mature proposals #1788
Comments
Thanks for writing this up! I might expand upon the proposed wording to say:
Basically clarifying that when the proposal-in-question is enabled that the upstream spec tests still pass which would require changing binary formats/etc if necessary and encapsulates the goal of being able to run all the spec tests with a bleeding-edge implementation. |
I commented elsewhere that if we ignored enabled/disabled proposals in binary decoding there was only a few failures, but turns out I didn't investigate that enough and forgot some other tests, specifically tests like this which assert multi-memory is not implemented. This makes me realize that fixing this is not going to be that easy because it also involves sending PRs to a lot of other proposals which have additionally copied this new test too. For example if an upstream spec test is updated then any proposal created before that fix will also need to be updated which can end up in a lot of PRs to a lot of places and can delay running spec tests. Ideally I think this could establish a few guidelines both for proposals and engine/tooling authors, along the lines of:
This is pretty similar to what's proposed here but a major change would be to segregate new tests into a specific directory rather than adding them to existing tests. Upon merging to the main specification they could get merged into the main tests, but it would be extremely helpful if test conflicts only required updating the main upstream spec rather than all active proposals that have forked the specification too. |
How is this different from the current practice of putting new tests in test/core/current_proposal/new_test.wast? e.g. https://github.com/WebAssembly/gc/tree/main/test/core/gc. I do think it would be good to avoid updating upstream spec tests to accommodate phase 1 proposals, but maybe it would be ok for phase 2 proposals. Either way, if an engine is implementing a very early phase proposal, I think there will still be some extra testing complexity no matter what. |
Oh that's perfect, but it's just not applied across all proposal repositories right now. That means that the |
I think we should make this a requirement rather than a recommendation, and be strict about it for every single little test. The multi-memory proposal, for example, would not fix/update/remove the existing no-multi-memory test that Alex linked above. That test would be left as-is. Instead, it would create new multi-memory tests in new files inside the proposal-specific directory (the way that most tests are done). This allows engines/tools to run the tests and use just a test's filename/path to determine which features to enable or disable, which means that both the old no-multi-memory test can pass when multi-memory is disabled and the multi-memory tests can pass when it is enabled. And then we would have the final step where, when we merge a proposal into the main spec, we remove the old spec tests that were asserting against things that are now always allowed, like the old no-multi-memory test. |
We could implement this by making the testsuite aggregator script ignore all the tests outside of a proposal's test subdirectory — even if the proposal repo ignored the above requirement/recommendation and made changes to tests outside that subdirectory. |
That sounds good to me.
I still think it's worth updating the upstream tests to avoid conflicts in phase 3, since otherwise this requires using feature flags in the decoder, which as we've seen, can be a source of inflexibility with real-world consequences. |
This sounds good to me 👍 |
That would be somewhat in conflict with the process. In theory, a feature is not considered standardised before phase 4, and may still be redesigned (or even rejected) in a way that obsoletes such a change. So it may be a bit problematic to upstream corresponding changes before that.
How would this work? Even with an implementation that has a feature flag (which not all will have), how does it recognise the outdated tests when the flag is turned on and distinguishes them from unexpected failures? A clean way for dealing with such obsolete negative(*) tests would be to somehow annotate such tests. For example, imagine test infras recognise an annotation (*) I believe our backwards-compat requirements imply that this problem only exists for negative tests. |
I would be happy with an obsoleted-by annotation. |
The way that Wasmtime does this is based on the file path. If the test is under
Is the idea that a proposal reaches phase 3 and then, as described by @tlively in the OP, the champions would add this annotation to the upstream spec for any tests that get obsoleted by their proposal? Are they also responsible for adding it to all other proposals that have forked the spec tests? I think this annotation would solve the issue in the context of a single test file or a single copy of the test suite in either the main repo or a proposal repo, but doesn't address the reality that engines and toolchains are testing against what are effectively multiple copies of the test suite, at various states of lagging-behind- I think #1788 (comment) is still the best way to avoid |
Are you saying that none of the pre-existing tests would run with the new feature turned on? That's very sub-ideal, isn't it? Many old tests might potentially be affected by the implementation of a new feature.
No, no upstream changes at all. You only annotate (individual test assertions) as obsolete in the respective proposal repo. And since it's just an annotation, it should usually merge effortlessly when combining multiple proposal repos. This addresses exactly that scenario, but without losing all the existing test coverage for new features. |
There are several spec tests that test that a given module is malformed, where that same module would be considered well-formed under some mature-but-not-yet-standard proposal. For example, there is a test that modules with multiple memories are malformed, but this test is invalidated by the multi-memory proposal, and indeed the multi-memory proposal adds tests in the other direction, testing that modules with multiple memories are well-formed.
These contradictory tests are a source of complexity for engine and tool developers when they implement mature proposals like multi-memory because they either must change their decoding behavior based on enabled features or must selectively skip tests to resolve conflicts. WebAssembly/tool-conventions#233 discusses an instance of real-world toil caused by the former strategy, and of course the latter strategy is not ideal, either.
I propose that we reduce the complexity for implementers here by updating the spec tests to avoid conflicts with mature proposals. Note that we already have to update the spec tests this way once a proposal is finished and merged into the spec; I'm simply proposing that we remove the conflicts earlier.
Concretely, I propose that we add the following item to phase 3, where it is expected that engines and tools will implement the proposal and run into these conflicts:
Does this seem like a good idea?
The text was updated successfully, but these errors were encountered: