[GHSA-5jfw-gq64-q45f] HTML Cleaner allows crafted scripts in special contexts like svg or math to pass through #5031

byt3n33dl3 · 2024-11-22T06:37:54Z

Updates

CVSS v3
Description
References
Summary

Comments
This update addresses a critical Cross-Site Scripting (XSS) vulnerability in the lxml-html-clean library, affecting versions < 0.4.0. The vulnerability arises from improper handling of special HTML tags such as <svg>, <math>, and <noscript>, allowing malicious scripts to bypass the HTML cleaning process.

The proposed improvement includes additional context about the exploit scenario, mitigation techniques, and real-world implications of the vulnerability. It also provides actionable examples and references, making the advisory more comprehensive and user-friendly. These enhancements ensure developers understand the risks and adopt best practices to secure their applications effectively.

This contribution aligns with the goal of the GitHub Security Advisory to provide detailed, actionable, and accurate information to the developer community for maintaining software security.

github · 2024-11-22T06:37:54Z

Hi there @frenzymadness! A community member has suggested an improvement to your security advisory. If approved, this change will affect the global advisory listed at github.com/advisories. It will not affect the version listed in your project repository.

This change will be reviewed by our Security Curation Team. If you have thoughts or feedback, please share them in a comment here! If this PR has already been closed, you can start a new community contribution for this advisory.

darakian · 2024-11-22T21:22:07Z

I'm not sure I agree that this is an improvement. This reads to me as a fluffing up of the text which degrades readability. The two references you add are also duplicative with what we already have on record. Maybe I'm missing it, but can you point out what new context you're adding to the advisory?

byt3n33dl3 · 2024-11-23T02:30:04Z

I'm adding detail on lxml_html_clean and the other kinds of cross-site exploitation this vulnerability makes possible.

Phishing Attacks through SVG Payloads

Scenario: An attacker crafts an HTML payload containing an <svg> element with malicious JavaScript embedded in a <script> tag. This payload is passed through `lxml_html_clean`, which fails to sanitize it effectively due to improper handling of <svg> context-switching. When the sanitized output is rendered in a browser, the JavaScript executes.

Reflected XSS in Web Applications and DOM-Based XSS through JavaScript Integration

Scenario: A web application accepts untrusted input from query parameters or form submissions and sanitizes it using lxml_html_clean. An attacker crafts a payload with <math> tags containing event handlers such as onclick or onmouseover, which bypass the sanitizer and execute in the browser.

Scenario: An attacker embeds <svg> or <noscript> elements containing scripts that interact with client-side JavaScript. When the sanitized HTML is dynamically injected into the DOM via JavaScript, the browser interprets the malicious scripts embedded in these tags, bypassing the intended sanitization.
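The three scenarios above share one shape: output that was supposed to be clean still carries <svg>, <math>, or <noscript> elements, or inline event handlers. A minimal sketch of a post-sanitization tripwire follows, using only the Python standard library. The names `find_suspicious_markup` and `SPECIAL_CONTEXT_TAGS` are hypothetical and not part of lxml_html_clean. Note that `html.parser` does not reproduce browser parsing, so this is a coarse secondary check and not a replacement for the patched cleaner.

```python
from html.parser import HTMLParser

# Tags named in this thread as special parsing contexts (hypothetical constant).
SPECIAL_CONTEXT_TAGS = {"svg", "math", "noscript"}


class _SuspiciousMarkupFinder(HTMLParser):
    """Collects special-context tags and inline event-handler attributes."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag in SPECIAL_CONTEXT_TAGS:
            self.findings.append(("tag", tag))
        for name, _value in attrs:
            if name.startswith("on"):
                self.findings.append(("event-handler", name))


def find_suspicious_markup(html_text):
    """Return a list of (kind, name) pairs found in already-sanitized HTML."""
    finder = _SuspiciousMarkupFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings
```

An application could run this over the cleaner's output and reject or log anything that yields a non-empty list, on the assumption that it has no legitimate need for these elements.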

So mostly I want to add the other ways this vuln could be exploited. In short:

Stored XSS
(DoS) via Resource Exhaustion
DOM Clobbering
Open Redirects via Sanitized Links
Code Execution via Polyglot Payloads
Cross-Origin Data Exfiltration
Arbitrary Code Execution in Legacy Browsers
Bypassing Content Security Policy (CSP)

Summary

While XSS is the most prominent vulnerability due to the mismanagement of these tags, the improper handling of <svg>, <math>, and <noscript> elements in lxml_html_clean creates opportunities for various exploits, from DoS and DOM clobbering to sophisticated bypass techniques. These scenarios emphasize the importance of upgrading to the patched version of lxml_html_clean (0.4.0 or later) and implementing robust additional validation techniques when handling untrusted HTML content.
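For the upgrade recommendation, here is a rough sketch of a startup check that flags an installed release older than 0.4.0, the affected range stated at the top of this PR. The helper names are hypothetical, the distribution name passed to `importlib.metadata` is an assumption, and the parser is deliberately simplistic: it ignores pre-release suffixes, so "0.4.0rc1" is treated as 0.4.0.

```python
from importlib import metadata

PATCHED_VERSION = (0, 4, 0)  # versions < 0.4.0 are affected, per this PR


def parse_version(version):
    """Turn '0.3.1' into (0, 3, 1); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '0.4' compares equal to '0.4.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version):
    """True when the given version string is older than the patched release."""
    return parse_version(version) < PATCHED_VERSION


def installed_version(dist_name="lxml_html_clean"):
    """Return the installed version string, or None if not installed.

    The default distribution name is an assumption; adjust if it differs.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

A deployment script might call `installed_version()` and refuse to start when `is_affected()` returns True for the result.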

byt3n33dl3 · 2024-11-23T03:09:01Z

Scenario of Execution

HTML Injection

Attackers exploit the vulnerability to inject untrusted HTML content that appears sanitized but retains harmful structure due to context-switching issues. For example:

Misused <math> or <svg> elements with unexpected attributes.

Embedded malicious iframes or forms disguised in legitimate-looking content.
Impact: Enables phishing attacks or tricking users into submitting sensitive data to malicious endpoints.

Stored XSS

In applications that persist sanitized HTML in databases or logs, malicious content can bypass sanitization and remain dormant until displayed in a vulnerable context. For instance:

Injected <noscript> tags might trigger scripts in browser contexts with JavaScript disabled.

Hidden scripts in SVG animation elements may activate under certain conditions.
Impact: Persistent execution of malicious scripts whenever the compromised content is viewed, amplifying the attack surface.

Denial of Service (DoS) via Resource Exhaustion

Scenario: An attacker creates deeply nested <svg> or <math> elements with recursive attributes or oversized payloads designed to consume excessive parsing resources. Since lxml may not handle such payloads efficiently, this could lead to:
High memory or CPU consumption on the server during sanitization.
Application crashes or reduced availability.
Impact: DoS attacks could disrupt services relying on lxml for processing untrusted HTML inputs.
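If resource exhaustion is a concern, one option is to reject oversized or very deeply nested input before it reaches the sanitizer. The sketch below uses only the standard library. The function name `within_limits` and both default limits are hypothetical placeholders that would need tuning for a real application.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not add depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class _DepthCounter(HTMLParser):
    """Tracks the deepest element nesting seen while parsing."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_startendtag(self, tag, attrs):
        # <x/> opens and closes at once: it counts toward the peak only.
        self.max_depth = max(self.max_depth, self.depth + 1)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def within_limits(html_text, max_bytes=100_000, max_depth=50):
    """Return False for input that is too large or too deeply nested."""
    if len(html_text.encode("utf-8")) > max_bytes:
        return False
    counter = _DepthCounter()
    counter.feed(html_text)
    counter.close()
    return counter.max_depth <= max_depth
```

Checking the byte length first keeps the cost of the depth scan bounded by `max_bytes`.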

DOM Clobbering

An attacker leverages improperly sanitized HTML to insert elements with unexpected IDs or names that overwrite critical DOM properties. For example:

<svg id="submit" onclick="maliciousFunction()"> could override a legitimate form’s submit functionality.

Impact: Manipulation of the client-side DOM behavior, potentially hijacking user actions or breaking application functionality.
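A sketch of how an application might look for clobbering candidates in sanitized output: collect every `id` and `name` attribute and compare it against the names the page's own scripts depend on. The `RESERVED_NAMES` set and the function name are hypothetical; a real list would have to come from the application itself.

```python
from html.parser import HTMLParser

# Hypothetical names that the host page's scripts are assumed to rely on.
RESERVED_NAMES = {"submit", "action", "location", "cookie", "body"}


class _NameCollector(HTMLParser):
    """Records elements whose id/name matches a reserved name."""

    def __init__(self, reserved):
        super().__init__()
        self.reserved = reserved
        self.collisions = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for attributes written without a value.
            if name in ("id", "name") and value in self.reserved:
                self.collisions.append((tag, name, value))


def find_clobbering_candidates(html_text, reserved=RESERVED_NAMES):
    """Return (tag, attribute, value) triples that collide with reserved names."""
    collector = _NameCollector(reserved)
    collector.feed(html_text)
    collector.close()
    return collector.collisions
```

For the <svg id="submit" ...> example above, this reports one collision on the `id` attribute.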

Open Redirects via Sanitized Links

Attackers inject tags that survive sanitization with event handlers or redirection payloads hidden inside special-context elements. For instance:

<svg><a href="javascript:evilFunction()">Click here</a></svg>

Impact: Exploitation of open redirects to conduct phishing or malware distribution campaigns.
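Link payloads like the one above can also be caught by an explicit scheme allowlist applied to every `href` after sanitization. This is a minimal sketch with the standard library. `is_safe_href` and `ALLOWED_SCHEMES` are hypothetical names, and the choice to accept relative URLs (including protocol-relative ones such as //host/path) is an assumption that a stricter policy might reverse.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; anything else with a scheme is rejected.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# Browsers ignore leading/trailing control characters and spaces in URLs.
_EDGE_CHARS = "".join(chr(c) for c in range(0x21))


def is_safe_href(href):
    """Return True if href is relative or uses an allowlisted scheme."""
    cleaned = href.strip(_EDGE_CHARS)
    # Browsers also drop tabs and newlines that appear inside a URL.
    cleaned = "".join(ch for ch in cleaned if ch not in "\t\r\n")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        # Malformed URL (for example a broken IPv6 literal): reject.
        return False
    if not scheme:
        return True  # relative URL, resolved against the current page
    return scheme in ALLOWED_SCHEMES
```

The whitespace handling matters because a payload such as "java\tscript:" would otherwise slip past a naive prefix comparison.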

Code Execution via Polyglot Payloads

Context-switching behavior may allow injection of polyglot payloads that are interpreted differently depending on the parser or runtime environment. For example:
Mixed HTML, SVG, and JavaScript elements that produce varied behaviors when sanitized, rendered, or executed.
Impact: Bypasses both sanitization and execution safeguards, leading to remote code execution (RCE) in some contexts.

Cross-Origin Data Exfiltration

Malicious <svg> or <math> elements exploit context-switching to bypass same-origin policies indirectly. For example, embedding <svg><foreignObject> to extract sensitive data by manipulating browser rendering behavior.

Impact: Sensitive user data could be leaked to an attacker-controlled domain.

Improve GHSA-5jfw-gq64-q45f

300d06c

github-actions bot changed the base branch from main to byt3n33dl3/advisory-improvement-5031 November 22, 2024 06:38
