Skip to content

Parses webarchive binary plist and MHTML files to create a markdown checklist index

License

Notifications You must be signed in to change notification settings

elroyjetson/webarchivetomdlist

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

webarchivetomdlist

Parses webarchive binary plist and MHTML files to create a markdown checklist index.

This is simply a proof of concept so that I can save web pages offline in a single file archive, generate a checklist in markdown to index each file.

The parsing of a single file archive can be done on Linux/Windows/MacOS using Chromium based browser by saving as a Webpage, Single File. This creates an archive with all assets as an MTHML file which is a multi-part mime file similar to an email message.

Usage: mhtml.py

After completion, a new line will be added to the ril.md file with the title and original url for the archive.

Safari on MacOS uses the webarchive binary plist file format to save webpages offline.

Usage: webarchive.py

After completion, a new line will be added to the ril.md file with the title and original url for the archive.

It is best to run this as a cron batch or systemd service with find in order to update the reading list from time to time.

Add the following find command changing the -mtime and location of the parser and ril.md list as appropriate.

find . -name "*.mhtml" -mtime -1d -exec ./mhtml.py {} >> ril.md \;

-mtime uses the following codes for times

s       second
m       minute (60 seconds)
h       hour (60 minutes)
d       day (24 hours)
w       week (7 days)

About

Parses webarchive binary plist and MHTML files to create a markdown checklist index

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages