Blob Storage
ElasticInbox is designed to easily store millions of emails with linear scalability. To achieve this, ElasticInbox stores message sources on scalable blob storage systems such as OpenStack Object Storage (Swift) and AWS S3.
The metadata record for each message contains a blob URI such as:
blob://blob-profile/f1ca99e0-99a0-11e2-95f0-040cced3bd7a:[email protected]?c=dfl&e=ekey2
blob://db/f1ca99e0-99a0-11e2-95f0-040cced3bd7a?c=dfl&b=1
Each blob URI has four parts:
- URI scheme. Always blob. This is currently the only supported scheme, and it indicates that the file is stored in blob storage.
- Host. The blob profile name configured in elasticinbox.yaml (see below). The db profile is special: it indicates that the blob is stored in the metadata store (database).
- Path. A unique file name generated by ElasticInbox according to the Blob Naming Policy.
- Parameters. Various blob attributes such as the compression algorithm, encryption key, total block count, etc.
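To make the four parts concrete, here is a minimal Python sketch that splits a blob URI with the standard urllib.parse module. BlobUri and parse_blob_uri are hypothetical names used only for illustration; they are not part of ElasticInbox. The only meanings taken from this page are that the host is the profile name (db for the metadata store) and that c=dfl marks Deflate compression. All other parameters are kept as raw strings.

```python
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlsplit


@dataclass
class BlobUri:
    """Hypothetical container for the parts of an ElasticInbox blob URI."""
    profile: str  # host: blob profile name, or "db" for the metadata store
    name: str     # path: unique blob file name
    params: dict  # query parameters as raw strings, e.g. {"c": "dfl"}

    @property
    def in_database(self) -> bool:
        return self.profile == "db"

    @property
    def compressed(self) -> bool:
        # c=dfl marks a Deflate-compressed blob
        return self.params.get("c") == "dfl"


def parse_blob_uri(uri: str) -> BlobUri:
    parts = urlsplit(uri)
    if parts.scheme != "blob":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return BlobUri(
        profile=parts.netloc,
        name=parts.path.lstrip("/"),
        params=dict(parse_qsl(parts.query)),
    )


uri = parse_blob_uri("blob://db/f1ca99e0-99a0-11e2-95f0-040cced3bd7a?c=dfl&b=1")
print(uri.profile, uri.in_database, uri.compressed, uri.params)
# db True True {'c': 'dfl', 'b': '1'}
```

Treating the URI as an ordinary URL works here because the host ends at the first "/", so characters such as ":" or "@" in the file name stay in the path.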
One or more blob stores can be configured in elasticinbox.yaml. For instance, AWS S3 and OpenStack can coexist in the same setup:
### Blob storage settings
blobstore_profiles:
    openstack-example:
        provider: swift
        endpoint: http://10.0.0.1:8066/auth/
        container: elasticinbox
        identity: user:elasticinbox
        credential: mysecret
        apiversion: 1.0

    aws-example:
        provider: aws-s3
        endpoint: https://s3-eu-west-1.amazonaws.com
        container: mybucket.mydomain.tld
        identity: AWSGENERATEDID
        credential: myverylongawssecret
NOTE: Once configured, blob store profile names should never be changed and must be identical on all ElasticInbox nodes. This is because profile names are stored in message metadata as the host part of the blob URI (see above). ElasticInbox does not verify profile configurations across nodes (this may change in the future).
Messages are written to the blob store configured in the blobstore_write_profile parameter, but can be read from any configured blob store.
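The read/write split can be illustrated with the small Python sketch below. BlobRouter is a hypothetical name, plain dictionaries of settings stand in for real blob store clients, and the special db profile is left out for brevity. The sketch also shows why renaming a profile is harmful: URIs already stored in metadata still carry the old name, so the lookup fails.

```python
from urllib.parse import urlsplit


class BlobRouter:
    """Hypothetical router: writes use one profile, reads use the profile named in the URI."""

    def __init__(self, profiles: dict, write_profile: str):
        if write_profile not in profiles:
            raise ValueError(f"write profile {write_profile!r} is not configured")
        self.profiles = profiles
        self.write_profile = write_profile

    def store_for_write(self) -> tuple:
        # New messages always go to the profile named by blobstore_write_profile
        return self.write_profile, self.profiles[self.write_profile]

    def store_for_read(self, uri: str) -> dict:
        # The URI host is the profile name recorded when the blob was written
        profile = urlsplit(uri).netloc
        if profile not in self.profiles:
            raise LookupError(f"blob profile {profile!r} is not configured on this node")
        return self.profiles[profile]


router = BlobRouter(
    {"openstack-example": {"provider": "swift"}, "aws-example": {"provider": "aws-s3"}},
    write_profile="aws-example",
)
name, settings = router.store_for_write()
old = router.store_for_read("blob://openstack-example/abc")  # older blobs stay readable
```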
Each blob profile has the following properties:
- Provider. Blob store provider. ElasticInbox supports most blob stores through jclouds. However, the default configuration only includes support for aws-s3 (AWS S3), swift (OpenStack), azure (Microsoft Azure) and filesystem.
- Endpoint. Endpoint of the blob store.
- Container. Container name. On AWS S3 it is known as a bucket.
- Identity. Identity or username of your blob store account.
- Credential. Credential or password of your blob store account.
- API version (apiversion). Optional.
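As a sketch of the property list, the hypothetical BlobProfile class below models one entry under blobstore_profiles and rejects providers outside the default set. It treats every property except the API version as required, following the list above. ElasticInbox's real configuration loader may be stricter or more lenient; for example, a filesystem profile might not need credentials.

```python
from dataclasses import dataclass
from typing import Optional

# Providers included in the default configuration, according to the list above
DEFAULT_PROVIDERS = frozenset({"aws-s3", "swift", "azure", "filesystem"})


@dataclass(frozen=True)
class BlobProfile:
    """Hypothetical model of one entry under blobstore_profiles."""
    provider: str
    endpoint: str
    container: str   # called a bucket on AWS S3
    identity: str
    credential: str
    apiversion: Optional[str] = None  # the only optional property

    def __post_init__(self):
        if self.provider not in DEFAULT_PROVIDERS:
            raise ValueError(f"provider {self.provider!r} is not included by default")
        for field_name in ("endpoint", "container", "identity", "credential"):
            if not getattr(self, field_name):
                raise ValueError(f"{field_name} must not be empty")


aws = BlobProfile(
    provider="aws-s3",
    endpoint="https://s3-eu-west-1.amazonaws.com",
    container="mybucket.mydomain.tld",
    identity="AWSGENERATEDID",
    credential="myverylongawssecret",
)
```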
Support for multiple cloud blob stores enables even greater scalability and flexibility. For instance, you can start with a public cloud such as AWS S3 and later move to a private cloud. The migration is smooth and requires no downtime: new messages are written to the new write profile, while existing messages remain readable from the profile recorded in their blob URI. This technique can also facilitate migration from legacy filesystem storage to ElasticInbox.
In hybrid mode, ElasticInbox uses the metadata store for small blobs and cloud storage for large ones.
Email conversations without large attachments typically produce very small blobs. Depending on the nature of the mail service, small blobs (less than 24K) can make up as much as 95% of total email traffic (based on our evaluations). Since most cloud providers charge per request, storing small files in the local metadata store is more efficient and economical.
To enable hybrid mode, set the maximum size of a blob that may be stored in the metadata store (e.g. Cassandra) in the configuration file:
# Maximum blob size in bytes which can be stored in the database.
# Blobs larger than this value will be stored with the default blob profile (blobstore_write_profile).
# If compression is enabled, this threshold is applied to the compressed blob size.
# Set to 0 to disable using the database as blob storage. Maximum allowed value is 128K.
database_blob_max_size: 32768
To disable hybrid storage, set database_blob_max_size to 0. The maximum blob size currently supported in the metadata store is 128K, and the recommended value is between 12K and 30K. When compression is enabled, the threshold is applied to the compressed blob size.
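The placement rule can be sketched as a single Python function. choose_profile is a hypothetical name. The sketch makes two assumptions: a blob exactly at the threshold stays in the database (the setting is described as a maximum), and "128K" means 128 * 1024 bytes. The caller passes the compressed size when compression is enabled.

```python
# Documented upper bound for database_blob_max_size (128K, assumed to be 128 * 1024 bytes)
MAX_DATABASE_BLOB_SIZE = 128 * 1024


def choose_profile(blob_size: int, database_blob_max_size: int, write_profile: str) -> str:
    """Return "db" or the write profile for a blob of blob_size bytes.

    blob_size should be the compressed size when compression is enabled.
    """
    if not 0 <= database_blob_max_size <= MAX_DATABASE_BLOB_SIZE:
        raise ValueError("database_blob_max_size must be between 0 and 128K")
    # 0 disables hybrid mode; otherwise blobs up to the threshold stay in the database
    if database_blob_max_size > 0 and blob_size <= database_blob_max_size:
        return "db"
    return write_profile


print(choose_profile(12_000, 32768, "aws-example"))  # db
print(choose_profile(50_000, 32768, "aws-example"))  # aws-example
print(choose_profile(12_000, 0, "aws-example"))      # aws-example
```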
To save space and network traffic, you can enable blob compression in the configuration file:
# Compress objects written to the blob store (including database blobs)
blobstore_enable_compression: true
ElasticInbox uses the standard Deflate algorithm (RFC 1951) for compression. Compressed blobs have the c=dfl parameter in their URI, which is how they are distinguished from uncompressed ones.
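A minimal Python sketch of the compression round trip uses the standard zlib module. A negative wbits value yields a raw RFC 1951 stream with no zlib header. This page cites only RFC 1951 and does not say whether ElasticInbox writes raw or zlib-wrapped data, so treat the wbits=-15 choice as an assumption. The decode_blob helper is hypothetical and decompresses only when the URI parameters contain c=dfl.

```python
import zlib


def deflate(data: bytes) -> bytes:
    # Negative wbits produces a raw DEFLATE stream (RFC 1951) without zlib header/trailer
    compressor = zlib.compressobj(level=zlib.Z_DEFAULT_COMPRESSION, method=zlib.DEFLATED, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def inflate(data: bytes) -> bytes:
    # Must use the same negative wbits to read a raw DEFLATE stream
    return zlib.decompress(data, wbits=-15)


def decode_blob(params: dict, payload: bytes) -> bytes:
    # Only blobs whose URI carries c=dfl are decompressed; others are returned unchanged
    return inflate(payload) if params.get("c") == "dfl" else payload


message = b"Subject: hello\r\n\r\n" + b"Lorem ipsum dolor sit amet. " * 200
stored = deflate(message)
restored = decode_blob({"c": "dfl"}, stored)
```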
Security is important when using a public cloud. ElasticInbox supports encrypting all blobs before sending them to your cloud provider. For more information, see Encryption.