A Python script to easily sync a directory with a vault on Amazon Glacier.

luxor99/sync-glacier.py

 
 

sync-glacier.py

This is a fork of https://github.com/bitsofpancake/sync-glacier.py, which did not work with PDF files.

In addition, this fork updates a MySQL database with Glacier archive details (see below for the schema).

A Python script to easily sync a directory with a vault on Amazon Glacier. This makes it easy to upload a directory of backups, for example, into a vault. This script requires boto (see the boto documentation for installation instructions) and PyMySQL:

pip install pymysql

To use sync-glacier.py, first edit sync-glacier.py and put in your Amazon Web Services credentials:

access_key_id = ""
secret_key = ""

Then, create a configuration file (see sample.job) with the vault name, region, and directories you want to sync, separated with |.
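As a rough illustration of what such a job file might contain, here is a minimal parsing sketch. The layout is an assumption (vault name on the first line, region on the second, and the directories joined by | on the third); sample.job is the authoritative reference. The names `JobConfig` and `parse_job` are hypothetical and are not part of sync-glacier.py.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobConfig:
    vault: str
    region: str
    directories: List[str]


def parse_job(text: str) -> JobConfig:
    """Parse job-file text.

    ASSUMED layout (check sample.job for the real one):
      line 1: vault name
      line 2: region
      line 3: directories separated with '|'
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("expected vault name, region, and directories")
    vault, region, dirs = lines[0], lines[1], lines[2]
    # Split the directory list on '|' and ignore empty entries.
    directories = [d.strip() for d in dirs.split("|") if d.strip()]
    return JobConfig(vault, region, directories)
```

For example, `parse_job("my-vault\nus-east-1\n/backups/a|/backups/b\n")` yields a config with two directories.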

Run the script and pass in the config file with the command:

sync-glacier.py job_file.job

On the first run, it will download an inventory of the vault. This takes about four hours, after which you'll need to run the script again. The script will upload the files in the given directory that don't already appear in the vault (or that have been updated since your last upload). Once that's done, every time you want to sync changes to your vault, simply run the script again. It'll detect what's been updated and only upload those files.
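The "only upload what changed" step can be pictured with the sketch below. It is not the script's actual code: it assumes the script keeps some record of when each file was last uploaded (here a plain dict, `last_uploaded`, which is a hypothetical stand-in) and compares that against each file's modification time.

```python
import os
from typing import Dict, List


def files_needing_upload(directory: str, last_uploaded: Dict[str, float]) -> List[str]:
    """Return names of files that are new or were modified after their last upload.

    `last_uploaded` maps file name -> Unix timestamp of the previous upload
    (hypothetical bookkeeping; the real script may store this differently).
    """
    pending = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # this sketch ignores subdirectories
        uploaded_at = last_uploaded.get(name)
        # New file (never uploaded) or changed since the last upload.
        if uploaded_at is None or os.path.getmtime(path) > uploaded_at:
            pending.append(name)
    return pending
```

Note that this loops over every file in the directory and reads its metadata, which is the access pattern that becomes slow on filesystems with expensive metadata operations.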

NOTE: This script doesn't work very well if you have your files stored in an S3 bucket mounted as a directory with s3fs. This is because s3fs is slow at metadata operations, like listing files and directories, and the script currently loops through each file in the directory. As a workaround in this case, you can use sync-glacier2.py, which relies on the database to get metadata instead of the filesystem.


CREATE TABLE `tblDocs` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `size` int(10) DEFAULT NULL,
  `archiveid` varchar(200) DEFAULT NULL,
  `archivevault` varchar(100) DEFAULT NULL,
  `archivedate` int(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=13437 DEFAULT CHARSET=latin1;
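To show how archive details could be written into this table after an upload, here is a minimal sketch using a DB-API cursor such as the one PyMySQL provides. It makes several assumptions that the source does not confirm: that a row for the document already exists and is matched by `name`, and that `archivedate` holds a Unix timestamp. The function name `record_archive` is hypothetical.

```python
import time
from typing import Optional

# Parameterized statement; PyMySQL uses %s placeholders.
UPDATE_SQL = (
    "UPDATE `tblDocs` "
    "SET `archiveid` = %s, `archivevault` = %s, `archivedate` = %s "
    "WHERE `name` = %s"
)


def record_archive(cursor, name: str, archive_id: str, vault: str,
                   archived_at: Optional[int] = None) -> None:
    """Store Glacier archive details for the document called `name`.

    ASSUMPTIONS: the row already exists, and `archivedate` is a Unix timestamp.
    """
    if archived_at is None:
        archived_at = int(time.time())
    cursor.execute(UPDATE_SQL, (archive_id, vault, archived_at, name))
```

With PyMySQL, the cursor would come from `pymysql.connect(...).cursor()`, and the connection's `commit()` would need to be called afterwards for the change to persist.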
