[RubyConf] Create scrubber for replacing double breakpoints into paragraph nodes #284
base: main
@torihuang @josecolella Thanks again for this PR! Looking at the failing tests, it seems like the HTML4 and HTML5 parsers handle newlines differently. That causes a failure one way or the other, depending on whether or not the expected result contains newlines. Demonstration:

```ruby
#!/usr/bin/env ruby
require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "loofah" # also pulls in nokogiri
end

input = "<html><body><h1>Hello</h1><div>World</div></body></html>"

# HTML4 (libxml2): "\n" follows block elements like <h1> and <p>, but not inline <b>
doc = Nokogiri::HTML4::Document.parse(input)
doc.at_css("h1").add_next_sibling("<b>there</b>")
doc.at_css("body").inner_html
# => "<h1>Hello</h1>\n" + "<b>there</b><div>World</div>"

doc = Nokogiri::HTML4::Document.parse(input)
doc.at_css("h1").add_next_sibling("<p>there</p>")
doc.at_css("body").inner_html
# => "<h1>Hello</h1>\n" + "<p>there</p>\n" + "<div>World</div>"

# HTML5 (libgumbo): no newlines are inserted in either case
doc = Nokogiri::HTML5::Document.parse(input)
doc.at_css("h1").add_next_sibling("<b>there</b>")
doc.at_css("body").inner_html
# => "<h1>Hello</h1><b>there</b><div>World</div>"

doc = Nokogiri::HTML5::Document.parse(input)
doc.at_css("h1").add_next_sibling("<p>there</p>")
doc.at_css("body").inner_html
# => "<h1>Hello</h1><p>there</p><div>World</div>"
```

In any case, now that I understand what's going on, I'll wrap this up by tomorrow!
Sounds good. Thanks @flavorjones!
@flavorjones, are you still interested in getting this merged?
@josecolella Totally! I've been really distracted the last few weeks, but I will absolutely circle back on this.
@flavorjones Any update here?
@josecolella Really sorry for the delay. This was harder to wrap up than I expected (at least in a way that I didn't think was gross). Starting in late October I will be spending a few weeks (at my new job!) on the sanitizer stack, and I will do my best to get this merged then.
Why?
What?
`:double_breakpoint`) that replaces double breakpoints with paragraph nodes (thank you @flavorjones)
How did we test?
Important
There is currently a failing test where the expected and actual results match except for newline characters. Based on a discussion with @flavorjones, this might be related to minitest and how it formats HTML.
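Since the mismatch is only in newline characters, one way to make such an assertion independent of the parser is to normalize the whitespace between tags before comparing. A minimal sketch, assuming a hypothetical `normalize_html` helper (it is not part of Loofah, Nokogiri, or minitest), applied to the HTML4 and HTML5 outputs from the demonstration in the conversation:

```ruby
# Hypothetical helper (not a Loofah/Nokogiri/minitest API): collapse any
# whitespace that sits entirely between ">" and the next "<", so the "\n"
# the HTML4 serializer adds after block elements no longer matters.
def normalize_html(html)
  html.gsub(/>\s+</, "><").strip
end

# Outputs taken from the HTML4 vs HTML5 demonstration
html4 = "<h1>Hello</h1>\n" + "<p>there</p>\n" + "<div>World</div>"
html5 = "<h1>Hello</h1><p>there</p><div>World</div>"

puts normalize_html(html4) == normalize_html(html5) # prints "true"
```

In a minitest case this would read `assert_equal normalize_html(expected), normalize_html(actual)`. The trade-off is that it also drops whitespace between inline elements (e.g. `<b>a</b> <i>b</i>`), which can be significant; comparing the parsed node trees is stricter but more work.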
References #279