Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Support HTMLRewriter replacing HTML with content from a ReadableStream or Response #2758

Open
1000hz opened this issue Sep 20, 2024 · 24 comments
Assignees
Labels
api feature request Request for Workers team to add a feature

Comments

@1000hz
Copy link
Contributor

1000hz commented Sep 20, 2024

Currently, HTMLRewriter only supports replacing HTML content with strings:

kj::String unwrapContent(Content content) {
return kj::mv(JSG_REQUIRE_NONNULL(content.tryGet<kj::String>(), TypeError,
"Replacing HTML content using a ReadableStream or Response object is not "
"implemented. You must provide a string."));
}

Being able to accept content via a ReadableStream or Response would be very useful for improving the latency of the response to the end user when (for example) embedding content from an upstream fetch into another HTML document.

This depends on first adding underlying support for this in https://github.com/cloudflare/lol-html which will then need to be exposed in the runtime.

@jasnell jasnell added feature request Request for Workers team to add a feature api labels Sep 20, 2024
@IgorMinar
Copy link
Collaborator

Message from @dotjs from this morning:

We discussed this morning and there is some work from the Rust perspective. More problematic is how to integrate with kj. We need their streams to work with Rust streams and we would need some way to make async Rust work with kj-runtime. So I think we would need some runtime effort as well.

That's how we landed on this issue on workerd.

@jasnell @anonrig I'm not a kj expert but I know you have a lot of experience. Do you understand what Andrew's message above means and how much work it would be for us to fix it in the runtime? Is that something you would be comfortable doing or do we need someone else from the runtime.

As for priority, this is become a blocking issue for wider Fractus rollout because the lack of this feature means we can't support streaming SSR, which is a big performance problem for us.

@jasnell
Copy link
Member

jasnell commented Sep 20, 2024

It's been a while since I've dug through this code, would need some time to analyze it again.

@IgorMinar
Copy link
Collaborator

Could you take a peek please? We need to get unblocked soon (within the next few days), and if this path will take longer than that, then we'll most likely reach for an less performant alternative that supports streaming which in the end is going to be on overall perf boost for us even if the transformation itself is inefficient.

Our current top alternative is something like https://trumpet.gofunky.fun/ which seems fine, but we'd have to turn on the nodejs_compat layer which further complicates things but might be worth it if we get unblocked.

@dotjs
Copy link

dotjs commented Sep 23, 2024

cc @inikulin , @orium who can provide input from lol-html perspective

@kentonv
Copy link
Member

kentonv commented Sep 23, 2024

We don't need full Rust/KJ stream or async integration here.

We just need lol-html to support an API where we can incrementally provide chunks of bytes into the element body and it'll write those back out into the response stream. For now it's fine if lol-html even thinks the API is synchronous -- we'll continue tricking it with fibers just like we always have.

@kentonv
Copy link
Member

kentonv commented Sep 23, 2024

That said, I think it's more than a few days of work.

@jasnell
Copy link
Member

jasnell commented Sep 23, 2024

Yep, just going through the code a bit here and I'd estimate that it's probably about a week worth of dev at a minimum. The existing fiber model should work well. I think the limitation is more on the lol-html side (as @kentonv suggests, there needs to be an API where lol-html will accept content incrementally chunk-by-chunk, which is likely the most significant part of what's needed here.

@inikulin
Copy link
Contributor

inikulin commented Sep 24, 2024

It's doable in the sync manner, but can't say it's very straightforward:

  • We need to introduce 2 new traits:
/// A trait for byte chunk iterator that can be inserted as content in rewritable units mutations.
pub trait ChunkIterator {
    fn next(&mut self) -> Option<&[u8]>;
}

/// A trait for converting into [`ChunkIterator`].
pub trait IntoChunkIterator {
    type IntoChunkIter: ChunkIterator;

    fn into_chunk_iterator(self) -> Self::IntoChunkIter;
}

impl IntoChunkIterator will be passed as content to mutation-related functions (like this one: https://github.com/cloudflare/lol-html/blob/master/src/rewritable_units/mutations.rs#L56). For convince we would need to implement IntoChunkIterator for &str, String, etc.

@IgorMinar
Copy link
Collaborator

@inikulin do you have an idea about how much work this is? your plan is very detailed which gives me hope that you've done the hard part already, but would it be possible to implement this within the next few days or is that out of the question? thanks for your help

@inikulin
Copy link
Contributor

inikulin commented Oct 4, 2024

@IgorMinar no work is done on that, I'm just giving the directions as the original author of the library. We're discussing with Andrew G who can carry this work as I'm currently occupied with other stuff.

@IgorMinar
Copy link
Collaborator

@inikulin thank you for the update. I appreciate all your and @dotjs's support on this.

@IgorMinar
Copy link
Collaborator

@inikulin @dotjs do you have any update for us please?

@dotjs
Copy link

dotjs commented Oct 11, 2024

Apologies been a busy week - @kornelski should be able to start looking at this next week.

@IgorMinar
Copy link
Collaborator

@jasnell assuming that @kornelski will do that lol-html changes soon, would you be able to do the workerd integration this week please? We see escalations related to this issue so we'd like to get them resolved ASAP.

@jasnell
Copy link
Member

jasnell commented Oct 16, 2024

This week for me is unlikely. I'll talk to @kflansburg to see who may be able to work on this this week.

@kornelski
Copy link

I'm looking at this now

@IgorMinar
Copy link
Collaborator

Thank you @kornelski, I'm working with @jasnell and @southpolesteve on staffing for the runtime part of this.

@IgorMinar
Copy link
Collaborator

@kornelski any luck? could you please post an update? thank you

@kornelski
Copy link

I need reviewers: cloudflare/lol-html#229

@kornelski
Copy link

How text encodings should be supported?

  • Do you expect to only have UTF-8 inputs? (the current API expects UTF-8)
  • Do you expect to have inputs only in the same encoding as the output document? (pass through without conversion)
  • or do you expect to merge documents with any encoding with documents with any encoding, and have the handlers perform any-to-any text conversion for you?

@IgorMinar
Copy link
Collaborator

@kornelski we expect only utf-8 encoding support at the moment.

@kornelski
Copy link

You can try it out now: https://github.com/cloudflare/lol-html/tree/streaming-handlers-chunked

@IgorMinar
Copy link
Collaborator

Thank you @kornelski, @southpolesteve have you decided who from the runtime team could help with the runtime bits? Is this something @anonrig could take a look at? Based on the previous discussion it seems that the workerd part of this work is minimal, so my educated guess is that @anonrig should be able to tackle it quickly unless there are more surprises and we need more help from lol-html and @kornelski.

@southpolesteve
Copy link

This has been assigned to @npaun who is out today but will start looking tomorrow

@npaun npaun self-assigned this Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api feature request Request for Workers team to add a feature
Projects
None yet
Development

No branches or pull requests

9 participants