TextArea crashes Python on html language #4152
I think the problem is the size of the html file. I experimented with reducing this to ~1,000 lines and eventually the app does load. Running the original example with
Backtrace
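Trimming the file until the app loads can be done systematically by bisecting over the number of lines kept. The sketch below assumes a hypothetical `passes(n)` callback that returns `True` when the first `n` lines of the HTML file load successfully. Because the failure here is a hard crash rather than a Python exception, such a callback would most likely need to run each attempt in a separate process. The sketch also assumes the outcome is monotonic in `n`, which the observations in this thread suggest but do not prove.

```python
from typing import Callable


def largest_passing_size(lo: int, hi: int, passes: Callable[[int], bool]) -> int:
    """Return the largest n in [lo, hi] for which passes(n) is True.

    Assumes passes(lo) is True and that passes is monotonic: once it
    returns False for some n, it returns False for every larger n.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    if not passes(lo):
        raise ValueError("passes(lo) must be True")
    while lo < hi:
        # Round the midpoint up so that `lo = mid` always makes progress.
        mid = (lo + hi + 1) // 2
        if passes(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For example, `largest_passing_size(1, 10_000, lambda n: n <= 1000)` returns `1000` after about 14 calls to the predicate, instead of one attempt per candidate size.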
Yeah, I had a very quick play yesterday (not enough to draw any useful conclusions) and it felt like a tree-sitter issue; for example, when I loaded the file as-is but said it was Python code, it loaded pretty much instantly. Didn't seem to be an actual problem with
Delving deeper...

```python
from pathlib import Path
from time import perf_counter

from tree_sitter_languages import get_language, get_parser

html_path = Path("snapshot_report_snip.html")
# https://github.com/Textualize/textual/issues/4152
html_text = html_path.read_text()

language = get_language("html")
parser = get_parser("html")
# Parse once up front; only query compilation and capturing are timed below.
syntax_tree = parser.parse(bytes(html_text, "utf-8"))

textual_query_path = Path("html_textual.scm")
# https://github.com/Textualize/textual/blob/main/src/textual/tree-sitter/highlights/html.scm
textual_query_scm = textual_query_path.read_text()

test_query_path = Path("html_test.scm")
# https://github.com/tree-sitter/tree-sitter-html/blob/master/queries/highlights.scm
test_query_scm = test_query_path.read_text()

print("Running 'html_test.scm' highlight query...")
start = perf_counter()
test_query = language.query(test_query_scm)
test_captures = test_query.captures(syntax_tree.root_node)
end = perf_counter()
print(f"{len(test_captures)} captures took {end - start:.4f} seconds")
print("=====")

print("Running 'html_textual.scm' highlight query...")
start = perf_counter()
textual_query = language.query(textual_query_scm)
textual_captures = textual_query.captures(syntax_tree.root_node)
end = perf_counter()
print(f"{len(textual_captures)} captures took {end - start:.4f} seconds")
print("=====")
```

The issue seems to come from a combination of the query patterns in `tree-sitter/highlights/html.scm` and the size of the file. Reducing the example snapshot HTML to ~1,000 lines will eventually work, but experimenting with a different
I haven't delved deep enough yet to understand the

```python
from pathlib import Path
from time import perf_counter

from tree_sitter_languages import get_language, get_parser

# Highlight query to time against the full snapshot file.
highlights_query = """
(tag_name) @tag
(erroneous_end_tag_name) @html.end_tag_error
(comment) @comment
(attribute_name) @tag.attribute
(attribute
(quoted_attribute_value) @string)
(text) @text @spell
((attribute
(attribute_name) @_attr
(quoted_attribute_value (attribute_value) @text.uri))
(#any-of? @_attr "href" "src"))
[
"<"
">"
"</"
"/>"
] @tag.delimiter
"=" @operator
(doctype) @constant
"<!" @tag.delimiter
"""

html_path = Path("snapshot_report.html")
# https://github.com/Textualize/textual/issues/4152
html_text = html_path.read_text()

language = get_language("html")
parser = get_parser("html")
syntax_tree = parser.parse(bytes(html_text, "utf-8"))

start = perf_counter()
test_query = language.query(highlights_query)
test_captures = test_query.captures(syntax_tree.root_node)
end = perf_counter()
print(f"{len(test_captures)} captures took {end - start:.4f} seconds")
```
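One way to quantify the "query pattern plus file size" observation is to time the same query on a few truncated copies of the file (for example, with the `perf_counter` timing used in the scripts above) and fit the slope of log(time) against log(size). The sketch below covers only the fitting step: the sizes and timings are yours to supply, and wall-clock measurements are noisy, so the result is a rough indicator rather than a proof of the complexity.

```python
import math
from typing import Sequence


def growth_exponent(sizes: Sequence[float], seconds: Sequence[float]) -> float:
    """Estimate k in seconds ~ c * size**k with a least-squares fit on log-log axes.

    k near 1 suggests roughly linear scaling; k near 2 suggests quadratic.
    """
    if len(sizes) != len(seconds) or len(sizes) < 2:
        raise ValueError("need at least two matching (size, seconds) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope of ys against xs.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    if den == 0:
        raise ValueError("sizes must not all be equal")
    return num / den
```

Doubling the line count a few times (say 500, 1,000, 2,000, 4,000 lines) gives evenly spaced points on the log axis, which keeps the fit stable.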
Fixes Textualize#4152 by removing all `((element (start_tag (tag_name) @_tag)` patterns from the `html.scm` highlights query file. These patterns cause a segfault on relatively large documents, and from some quick testing even a single one seems to be a very expensive operation. All tests pass after removing them, and I couldn't see that they were actually used anywhere in syntax highlighting, but please correct me if I'm wrong!
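The removal described above can be sketched as a small script that strips every top-level pattern beginning with a given prefix from a query file. This is a hypothetical helper, not the code from the PR. It assumes the query file uses `;` line comments, double-quoted strings with backslash escapes, and `(...)`/`[...]` grouping, and it compares each top-level `(...)` pattern against the prefix after collapsing whitespace, so line breaks inside a pattern don't prevent a match.

```python
def _skip_string(text: str, i: int) -> int:
    """Return the index just past the string literal that starts at text[i]."""
    i += 1  # step over the opening quote
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character
        elif text[i] == '"':
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated string literal")


def _skip_comment(text: str, i: int) -> int:
    """Return the index of the newline that ends the ';' comment at text[i]."""
    end = text.find("\n", i)
    return len(text) if end == -1 else end


def _group_end(text: str, start: int) -> int:
    """Return the index just past the ')' matching the '(' at text[start]."""
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == '"':
            i = _skip_string(text, i)
            continue
        if ch == ";":
            i = _skip_comment(text, i)
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced parentheses")


def remove_patterns(scm: str, prefix: str) -> str:
    """Drop every top-level (...) pattern whose whitespace-collapsed text starts with prefix."""
    out: list[str] = []
    depth = 0
    i = 0
    while i < len(scm):
        ch = scm[i]
        if ch == '"':
            end = _skip_string(scm, i)
        elif ch == ";":
            end = _skip_comment(scm, i)
        elif ch == "(" and depth == 0:
            end = _group_end(scm, i)
            if " ".join(scm[i:end].split()).startswith(prefix):
                i = end  # skip the whole matching pattern
                continue
            # Not a match: copy the opening paren and keep scanning inside it.
            depth += 1
            end = i + 1
        else:
            if ch in "([":
                depth += 1
            elif ch in ")]":
                depth -= 1
            end = i + 1
        out.append(scm[i:end])
        i = end
    return "".join(out)
```

For example, `remove_patterns(Path("html.scm").read_text(), "((element (start_tag (tag_name) @_tag)")` (with `Path` from `pathlib`) returns the query text with those patterns gone; removed patterns leave blank lines behind, which tree-sitter queries ignore.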
* fix(tree-sitter): remove slow html highlight patterns
* run tests in ci
* Update changelog

Co-authored-by: Darren Burns
The `TextArea` widget crashes the Python process when trying to render the attached HTML file with the `html` language: snapshot_report.html.zip.

There typically isn't any stack trace for me; the entire terminal window closes with a "Python quit unexpectedly" message. I was able to observe the following error though:

I used the `pytest-textual-snapshot` package to create the original HTML file. I've tested this on an ARM Mac, including with the `syntax` extras installed. When you remove the `language` parameter, the file is rendered correctly.
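If the attachment isn't at hand, a large HTML file can be generated for experiments like the ones in this thread. The sketch below is hypothetical: the document shape (one table row per line) is an assumption and not what `pytest-textual-snapshot` produces, and whether a file of a given size reproduces the crash has not been verified.

```python
def make_html(n_rows: int) -> str:
    """Return an HTML document containing a table with n_rows rows, one row per line."""
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "<head><title>synthetic report</title></head>",
        "<body>",
        "<table>",
    ]
    for i in range(n_rows):
        # Each row has tags, a quoted attribute, text and an <a href>, so it
        # exercises the tag, attribute and href patterns of the highlight query.
        lines.append(f'<tr><td class="cell">{i}</td><td><a href="#row-{i}">row {i}</a></td></tr>')
    lines += ["</table>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"
```

The output is `n_rows + 8` lines long, so writing `make_html(5000)` to a file and loading it in a `TextArea` with `language="html"` gives a document a few times larger than the ~1,000-line threshold mentioned above.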