Improve performance x100 for bigger pages #53
Merged
This PR optimizes the process of finding and scoring individual candidates.
I had problems parsing big pages, like this one.
Previously, the process involved multiple calls to `Floki.text()`, which is quite slow. Parsing that page took more than 5 minutes on an M1 Pro. After my changes, it completes in 2 s.
How it works:
`Helpers.text_length`, `Helpers.count_character` and `Helpers.find_tag`, which either use these precomputed values or are faster than the Floki equivalents

All tests pass, and it works fine in production. Oh, and I bumped the dependencies a little so I could use Floki's traverse functions.
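A minimal sketch of the idea behind helpers like these, not the actual code from this PR: instead of building the full text of a subtree as a string and then measuring it, the tree can be walked once and the lengths summed directly. The module name `TextLengthSketch` and both function bodies are hypothetical; the only assumption taken from Floki is the shape of its parsed nodes (`{tag, attributes, children}` tuples for elements, binaries for text).

```elixir
defmodule TextLengthSketch do
  # Hypothetical illustration only; the real Helpers module may differ.
  # Nodes are assumed to follow Floki's parsed HTML shape:
  #   {tag, attributes, children} for elements, binaries for text nodes.

  # Sum the length of every text node without concatenating the text.
  def text_length(nodes) when is_list(nodes) do
    Enum.reduce(nodes, 0, fn node, acc -> acc + text_length(node) end)
  end

  def text_length(text) when is_binary(text), do: String.length(text)

  def text_length({tag, _attrs, children}) when is_binary(tag) do
    text_length(children)
  end

  # Comments, doctypes and any other node kinds contribute no text.
  def text_length(_other), do: 0

  # Count occurrences of a single character (for example ",") in all text nodes.
  def count_character(nodes, char) when is_list(nodes) do
    Enum.reduce(nodes, 0, fn node, acc -> acc + count_character(node, char) end)
  end

  def count_character(text, char) when is_binary(text) do
    text |> String.graphemes() |> Enum.count(&(&1 == char))
  end

  def count_character({tag, _attrs, children}, char) when is_binary(tag) do
    count_character(children, char)
  end

  def count_character(_other, _char), do: 0
end

tree = [
  {"div", [],
   [
     "Hello, ",
     {"p", [{"class", "x"}], ["world, again"]},
     {:comment, "ignored"}
   ]}
]

IO.inspect(TextLengthSketch.text_length(tree))          # 19
IO.inspect(TextLengthSketch.count_character(tree, ",")) # 2
```

Whether a walk like this is enough, or whether the values also need to be cached per node, depends on how often the scoring step revisits the same subtree.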
I see that the library is not maintained much anymore, but I still decided to give a PR a go. Maybe someone else will find my fork useful ;) @keepcosmos thanks for creating this!