Add explanation about fuzzy rules configuration
benoit74 committed Jun 25, 2024
1 parent 7aeba97 commit 47747b0
Showing 1 changed file with 17 additions and 0 deletions.
17 changes: 17 additions & 0 deletions rules/rules.yaml
@@ -1,3 +1,20 @@
# This file comes from an adaptation of rules present in
# https://github.com/webrecorder/wabac.js/blame/main/src/fuzzymatcher.js
#
# Rules are synced manually, based on expert knowledge, especially because in
# warc2zim we are not really fuzzy matching (searching for the best entry among
# existing ones) but just rewriting to the proper path.
#
# This file is in sync with content at commit 879018d5b96962df82340a9a57570bbc0fc67815
# from June 9, 2024
#
# This file should be updated at every release of warc2zim
#
# Some rules are intentionally missing because they have not been tested in warc2zim
# yet: Twitter, Washington Post, WixStatic, Facebook
#
# Generic rules are also omitted on purpose; we don't need them
#
fuzzyRules:
  - pattern: .*googlevideo.com/(videoplayback(?=\?)).*[?&](id=[^&]+).*
    replace: youtube.fuzzy.replayweb.page/\1?\2
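
To illustrate what "just rewriting to proper path" means for the rule above, here is a minimal Python sketch. It is not warc2zim's actual code: the function name `apply_fuzzy_rule` and the sample URL are hypothetical, and the sketch assumes a rule applies only when its pattern matches the whole input.

```python
import re

# Pattern and replacement copied from the fuzzyRules entry above.
PATTERN = re.compile(r".*googlevideo.com/(videoplayback(?=\?)).*[?&](id=[^&]+).*")
REPLACE = r"youtube.fuzzy.replayweb.page/\1?\2"


def apply_fuzzy_rule(path: str) -> str:
    """Rewrite `path` with the rule if it matches entirely, else return it unchanged.

    Hypothetical helper for illustration only; no lookup among existing
    entries is performed, the input is simply rewritten.
    """
    match = PATTERN.fullmatch(path)
    if match is None:
        return path
    # Expand \1 and \2 in the replacement template with the captured groups.
    return match.expand(REPLACE)


if __name__ == "__main__":
    # Made-up sample URL: only the `id` query parameter survives the rewrite.
    sample = "r1---sn-abc.googlevideo.com/videoplayback?expire=123&id=o-ABC&itag=18"
    print(apply_fuzzy_rule(sample))
```

Under these assumptions, volatile query parameters such as `expire` and `itag` are dropped, so different requests for the same video resolve to a single path.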