-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CSSSanitizer to sanitize_html #731
Conversation
9436e63
to
7fef60c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock
|
||
# Clean the HTML content | ||
sanitized_content = bleach.clean( | ||
html_content, | ||
tags=allowed_tags, | ||
attributes=allowed_attrs, | ||
strip=True, # Strip disallowed tags completely | ||
protocols=["http", "https"], # Only allow http and https URLs | ||
protocols=["http", "https"], # Only allow http and https URLs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Looks like this accepts a dict vs a list,
https://bleach.readthedocs.io/en/latest/clean.html#allowed-protocols-protocols
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
def sanitize_html(html_content): | ||
""" | ||
Sanitize HTML content to allow only safe tags and attributes, | ||
while disallowing JavaScript and unsafe protocols. | ||
""" | ||
# Define allowed tags and attributes | ||
allowed_tags = bleach.ALLOWED_TAGS # Allow all standard HTML tags | ||
allowed_tags = set.union(bleach.ALLOWED_TAGS, set({"span"})) # Allow all standard HTML tags | ||
allowed_attrs = {"*": ["className", "class", "style", "id"]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[sanity check] className
is from JSX. When React renders JSX, it converts to class
in the resulting HTML. I'm not sure including the JSX className
attribute in this HTML field would have an effect, as class
does, when it renders. The dangerouslySetInnerHTML
assumes the input is raw HTML; it would not transpile JSX -> HTML, afaik.
allowed_attrs = {"*": ["className", "class", "style", "id"]} | ||
css_sanitizer = CSSSanitizer(allowed_css_properties=["color", "font-weight"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[suggestion] I wonder if it's worth supporting the default bleach css properties, similar to extending bleach.ALLOWED_TAGS
. Looks like not supplying allowed_css_properties
would fallback to the defaults.
[suggestion x2] Regardless, for styling, I might recommend considering relying on the Paragon CSS utility classes vs. hardcoding any colors, font-weights, etc. (e.g., font-weight-bold
, text-brand
, etc.) within the custom HTML.
|
||
def sanitize_html(html_content): | ||
""" | ||
Sanitize HTML content to allow only safe tags and attributes, | ||
while disallowing JavaScript and unsafe protocols. | ||
""" | ||
# Define allowed tags and attributes | ||
allowed_tags = bleach.ALLOWED_TAGS # Allow all standard HTML tags | ||
allowed_tags = set.union(bleach.ALLOWED_TAGS, set({"span"})) # Allow all standard HTML tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I believe {"span"}
is equivalent to set({"span"})
.
95eb176
to
3b2f8a7
Compare
3b2f8a7
to
c2b2aab
Compare
Description
Bleach documentation suggests that if you want to use stye elements in your HTML tags you must also include a CSS sanitizer when calling bleach.clean.
https://bleach.readthedocs.io/en/latest/clean.html#allowed-attributes-attributes