Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input events dispatch to top-level frame #1847

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

sadym-chromium
Copy link
Contributor

@sadym-chromium sadym-chromium commented Oct 22, 2024

As discussed in w3c/webdriver-bidi#795, the actions should be dispatched from the top-level browsing context.


Preview | Diff

index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
@sadym-chromium
Copy link
Contributor Author

@whimboo WDYT?

@OrKoN
Copy link
Contributor

OrKoN commented Oct 23, 2024

The following two things seem to be still missing:

  1. steps to convert the coordinates to the top-level browsing context coordinate space if they are frame scoped. Edit: unless this is covered by the implementation-defined bits?
  2. apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

@whimboo
Copy link
Contributor

whimboo commented Oct 23, 2024

The following two things seem to be still missing:
1. steps to convert the coordinates to the top-level browsing context coordinate space if they are frame scoped. Edit: unless this is covered by the implementation-defined bits?

Yes, absolutely. By changing the way where we dispatch the events the coordinates should be exactly the same. Not taking care of offsets would cause quite a lot of regressions for those users who make use of actions a lot.

Therefore see:

web-platform-tests/wpt#48147
web-platform-tests/wpt#48123

Given that Chrome seems to already dispatch actions in the top-level browsing context the referenced tests are failing.

2. apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

Yes, we would have to do it for all the pointer and wheel input sources.

Keyboard actions are more challenging because they could allow users to trigger shortcuts that access restricted browser features. This would require additional checks to determine what actions are permitted and which ones should be blocked. Maybe we should have a follow-up issue for it?

@sadym-chromium
Copy link
Contributor Author

  1. apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

Aren't all the actions end up in perform implementation-specific action dispatch steps?

@sadym-chromium
Copy link
Contributor Author

@OrKoN @whimboo I added calculation of the offset relative to the parent browsing context, but IDK how well it will work

@sadym-chromium sadym-chromium requested a review from OrKoN October 23, 2024 13:38
Comment on lines +8293 to +8294
such that trusted events corresponding to the entries in
<var>list of events</var> are dispatched.
Copy link
Contributor Author

@sadym-chromium sadym-chromium Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we can "hand-wave" hear like "These steps must be equivalent to user trying to perform the given input device manipulations on context through the top-level browsing context."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@sadym-chromium sadym-chromium Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OrKoN @whimboo we can keep specification implementation-specific, just clarify the events are dispatched via top-level browsing context. I'm not sure if that approach would work with the events intercepted by the top-level frame.

@whimboo
Copy link
Contributor

whimboo commented Oct 23, 2024

@OrKoN @whimboo I added calculation of the offset relative to the parent browsing context, but IDK how well it will work

I think that we should have more wdspec tests (similar to mine for mouse) so we cover at least each input source (including sub sources) and verify implementations against. Did you get Chrome working for my example?

@OrKoN
Copy link
Contributor

OrKoN commented Oct 23, 2024

@sadym-chromium it looks like each action type dispatches on the context parameter https://w3c.github.io/webdriver/#dfn-dispatch-a-keydown-action

  1. apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

Aren't all the actions end up in perform implementation-specific action dispatch steps?

that's right, overlooked that.

@OrKoN
Copy link
Contributor

OrKoN commented Oct 23, 2024

Keyboard actions are more challenging because they could allow users to trigger shortcuts that access restricted browser features. This would require additional checks to determine what actions are permitted and which ones should be blocked. Maybe we should have a follow-up issue for it?

so actually I am not sure, even with this change it is not quite defined what it means that the actions are dispatched to the top-level browsing context. For example, in Chrome that is the case but still would not give you access to browser shortcuts.
Perhaps we need a better definition of what the dispatch of an event means, perhaps hook into native event handles in https://www.w3.org/TR/uievents/#handle-native-mouse-move-id?

@whimboo
Copy link
Contributor

whimboo commented Oct 23, 2024

so actually I am not sure, even with this change it is not quite defined what it means that the actions are dispatched to the top-level browsing context. For example, in Chrome that is the case but still would not give you access to browser shortcuts.

Yes you are right. I mixed it up with the native event dispatching that I'm working on as well right now. In those cases we would have that particular issue but not when dispatching it in the content process of the top-level browsing context.

must be equivalent to performing the given input device manipulations
on <var>context</var>, such that trusted events corresponding to the
entries in <var>list of events</var>are dispatched.
dispatch steps</dfn> on a <var>context</var>, and a <var>list of events</var>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep browsing context indicating the context var type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I missed it.

index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
@@ -8392,7 +8438,7 @@ <h3>Processing actions</h3>
</dd>
</dl>

<li><p>Return (<var>x</var>, <var>y</var>)
<li><p>Return (<var>x</var> + <var>parentOffsetLeft</var>, <var>y</var> + <var>parentOffsetTop</var>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the offset be applied to all origins or only the viewport? I think the origin pointer might already be in the top level context coordinates and not sure if the element origin does some adjustments already.

Copy link
Contributor Author

@sadym-chromium sadym-chromium Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the origin "pointer" to be relative to "context" param of "PerformActionsParameters" of the command.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does not seem to follow from the spec text since input sources are per top-level browsing context.

</li>
<li>
Let <var>navigable</var> be <var>context</var>'s <a>active document</a>'s
[=navigable/parent=].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[=navigable/parent=] is a property of a navigable and not the document. Did you mean to get the current navigable instead of a parent (e.g., https://html.spec.whatwg.org/#node-navigable)?

sadym-chromium and others added 2 commits October 24, 2024 10:08
Co-authored-by: Alex Rudenko <[email protected]>
Co-authored-by: Alex Rudenko <[email protected]>
Copy link
Member

@jgraham jgraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that in principle this is the model we need to change to in order to work with modern browser designs.

However I think the change as written is insufficient, and makes the spec contradictory. In each action we still claim we're dispatching events to |context| which, afaict after these changes, is not the parent context but the actual child context. The spec is unfortunately rather badly written here, so the text you've modified is apparently normative, but exists in what is essentially an informative section describing the overall model.

I also suggest (perhaps as a followup) that we make it possible for specific actions to provide the context that is used for calculating origins (which must in all cases be an ancestor of the top-level command context). That makes it much easier to construct action chains that interact with both the top level document and iframes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants