huImages provides an infrastructure for storing, serving and scaling images. It might not work for Facebook- or Flickr-scale image pools, but for a few hundred thousand images it works very nicely. Currently it only supports JPEG files.
When you upload an image, you get back an alphanumeric ID for further access to the image. You can get the URL of the image via imageurl(ID) and scaled_imageurl(ID). You can get a complete XHTML <img> tag via scaled_tag(ID).
This module uses the concept of "sizes". A size can be a numeric specification like "240x160". If the numeric specification ends with "!" (as in "75x75!"), the image is scaled and cropped to be EXACTLY that size. If not, the image keeps its aspect ratio.
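The size rules above can be illustrated with a small sketch. The helper names `parse_size` and `fit_dimensions` are hypothetical and not part of huImages, and the decision never to enlarge an image beyond its original size is an assumption of this sketch, not documented behaviour.

```python
import re

# Matches numeric sizes such as "240x160" or "75x75!".
SIZE_RE = re.compile(r"^(\d+)x(\d+)(!?)$")


def parse_size(spec):
    """Split a size specification into (width, height, crop)."""
    match = SIZE_RE.match(spec)
    if match is None:
        raise ValueError("not a numeric size specification: %r" % (spec,))
    width, height, bang = match.groups()
    return int(width), int(height), bang == "!"


def fit_dimensions(orig_width, orig_height, spec):
    """Return the output dimensions for an original image and a size spec."""
    width, height, crop = parse_size(spec)
    if crop:
        # A trailing "!" means: scale and crop to exactly the requested box.
        return width, height
    # Otherwise shrink to fit inside the box while keeping the aspect ratio.
    # Assumption of this sketch: images are never scaled up.
    factor = min(width / float(orig_width), height / float(orig_height), 1.0)
    return (max(1, int(round(orig_width * factor))),
            max(1, int(round(orig_height * factor))))
```

Under these assumptions a square original fitted into "320x240" ends up as 240x240, because the smaller side of the box limits the scale factor.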
You can use get_random_imageid(), get_next_imageid(ID) and get_previous_imageid(ID) to implement image browsing.
server.py implements the image serving infrastructure. huImages assumes that images are served from a separate server. We strongly suggest serving them from a separate domain. This domain should never have been used for cookies, because the existence of cookies usually badly hurts caching behaviour. "yourdomain-img.net" would be a good choice for a domain name. We use "i.hdimg.net." for that purpose.
In the first few versions of huImages, metadata and images were stored in CouchDB. After the first few dozen gigabytes it turned out that the huge database files were a headache, and we moved to storing the original image data in Amazon S3. The server can still handle content stored in CouchDB and migrates it automatically to S3 when the need arises.
server.py works with any FastCGI-compliant web server. It needs the Flup toolkit installed to interface with a FastCGI-enabled server. We use lighttpd for connecting to it, and server.py contains configuration instructions for lighttpd. Of course you can also use other HTTP servers instead.
server.py assumes that you have a filesystem which is able to handle very large cache directories with no substantial performance penalty. We have been running the system on UFS2/dirhash and XFS systems with success, but it should also work well on modern ext2/3 implementations with directory indexing.
When an image is requested and the original image is not in the cache, the original is pulled from CouchDB/S3 and put into the filesystem cache. Then the Python Imaging Library (PIL) is used to generate the scaled version of the image. The result is cached again in the filesystem and sent to the client.
If the image is requested again, it is served directly from the filesystem by lighttpd without ever hitting the Python-based server.py.
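The request flow described above is a read-through filesystem cache. The following is a minimal sketch of that idea, not the code in server.py: the function names, the `fetch_original` and `scale` callables, and the directory layout (`o/` for originals, one directory per size) are all assumptions made for illustration. In a real setup `scale` would wrap PIL and `fetch_original` would talk to CouchDB or S3.

```python
import os
import tempfile


def serve_scaled(cache_dir, size, image_id, fetch_original, scale):
    """Return scaled image bytes, filling the filesystem cache on a miss."""
    scaled_path = os.path.join(cache_dir, size, image_id + ".jpeg")
    if os.path.exists(scaled_path):
        # Cache hit: in production the web server would answer this itself.
        with open(scaled_path, "rb") as handle:
            return handle.read()

    original_path = os.path.join(cache_dir, "o", image_id + ".jpeg")
    if os.path.exists(original_path):
        with open(original_path, "rb") as handle:
            original = handle.read()
    else:
        # Original not cached yet: pull it from the backing store.
        original = fetch_original(image_id)
        _write_atomically(original_path, original)

    # Generate the scaled version and cache it for later requests.
    scaled = scale(original, size)
    _write_atomically(scaled_path, scaled)
    return scaled


def _write_atomically(path, data):
    """Write via a temporary file so readers never see a partial image."""
    directory = os.path.dirname(path)
    if not os.path.isdir(directory):
        os.makedirs(directory)
    handle, temp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(handle, "wb") as temp_file:
        temp_file.write(data)
    # On POSIX systems rename replaces the target atomically.
    os.rename(temp_path, path)
```

Writing through a temporary file matters here because the web server may start serving a cached file the moment it appears in the directory.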
If you are short on disk space, you can expire files from the cache directory by simply removing the oldest files until you have enough space again.
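Expiring the cache this way could be scripted roughly as follows. This is a sketch rather than a tool shipped with huImages: `prune_cache` and its `max_bytes` budget are hypothetical, and "oldest" is interpreted here as oldest modification time.

```python
import os


def prune_cache(cache_dir, max_bytes):
    """Delete the oldest files under cache_dir until it holds at most max_bytes.

    Returns the list of removed paths, oldest first.
    """
    entries = []
    total = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            stat = os.stat(path)
            entries.append((stat.st_mtime, path, stat.st_size))
            total += stat.st_size

    removed = []
    # Sorting the tuples puts the oldest modification time first.
    for _mtime, path, size in sorted(entries):
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

Since every cached file can be regenerated from the original, removing too much only costs CPU time on the next request. A call such as `prune_cache('/mnt/huimages-cache', 10 * 1024 ** 3)` could be run periodically, for example from cron.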
We will show installation on an Ubuntu 9.10 based Amazon EC2 instance. huImages should work on every POSIX system but requires a recent CouchDB version. I assume you have an EC2 environment up and running and your EC2 SSH key is named "ssh-ec2" and located in the current directory.
INSTANCE=`ec2-run-instances ami-a62a01d2 --key ssh-ec2 --region eu-west-1 | cut -f2 | tail -n1`
sleep 60
IP=`ec2-describe-instances $INSTANCE | cut -f 17 | tail -n1`
ssh -i ssh-ec2 ubuntu@$IP
You should now be logged into the new Amazon instance.
sudo apt-get update -y
sudo apt-get install -y couchdb lighttpd git-core python-pip python-boto python-imaging python-couchdb python-flup
sudo git clone git://github.com/hudora/huImages.git /usr/local/huImages
cd /usr/local/huImages
sudo mkdir /mnt/huimages-cache
sudo ln -s /mnt/huimages-cache /usr/local/huImages/cache
sudo cp examples/lighttpd.conf /etc/lighttpd/lighttpd.conf
sudo vi /etc/lighttpd/lighttpd.conf
Change %%AWS_ACCESS_KEY_ID%%, %%AWS_SECRET_ACCESS_KEY%% and %%S3BUCKET%% to the appropriate values.
sudo /etc/init.d/lighttpd restart
sudo chown www-data.www-data /mnt/huimages-cache /usr/local/huImages/cache
curl -X PUT http://127.0.0.1:5984/huimages
curl -X PUT http://127.0.0.1:5984/huimages_meta
Now you can start putting images into the database. If you don't run the client on the same server, you must find a way to make CouchDB accessible to the client. Running a CouchDB cluster on Amazon EC2 might be a good starting point. Another (easier) approach is simply running the client on the same machine as the server. Under extreme circumstances image serving can happen without access to CouchDB, but you lose some of the features.
Now ensure the required environment variables are set. Here are some sample values:
AWS_ACCESS_KEY_ID=AAOWSMAKNATAM5
AWS_SECRET_ACCESS_KEY=aHo789V1H1Kzrs3yIaj7Uvxtskz6fUvgpa6n
IMAGESERVERURL=http://i.hdimg.net/
HUIMAGESCOUCHSERVER=http://admin:[email protected]:5984/
HUIMAGES3BUCKET=originals.i.hdimg.net
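Before importing the library it can help to verify that these variables are present. The check below is a small sketch: `missing_settings` is a hypothetical helper, and treating all five variables listed above as mandatory is an assumption based on the sample values.

```python
import os

# Variable names taken from the sample values above.
REQUIRED_VARIABLES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "IMAGESERVERURL",
    "HUIMAGESCOUCHSERVER",
    "HUIMAGES3BUCKET",
)


def missing_settings(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]
```

A start-up script might call `missing_settings()` and abort with a readable message if the returned list is not empty.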
Now you should be able to use it like this:
>>> import huimages
>>> imagedata = open('./test.jpeg', 'rb').read()
>>> huimages.save_image(imagedata, filename='test.jpeg')
'23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01'
>>> huimages.imageurl('23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01')
'http://i.hdimg.net/o/23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01/test.jpeg'
>>> huimages.scaled_imageurl('23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01', size="150x150!")
'http://i.hdimg.net/150x150!/23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01/test.jpeg'
>>> huimages.get_length('23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01')
87761
>>> huimages.scaled_dimensions('23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01', '320x240')
(240, 240)
Call pydoc huimages for further documentation. Most useful is scaled_tag(), which can create an image tag including dimensions for faster rendering and tries hard to generate a meaningful file name and alt attribute to make the image easier for search engines to find. You can use the environment variable HUIMAGESALTADDITION to add extra text to all alt attributes.
Malicious users who know the ID of an image can consume large amounts of CPU time, bandwidth and disk space.
Users who know the ID of an image can pass it on to unauthorized users.
Nobody should be able to see images on the server unless they know the ID or have access to CouchDB or the S3 bucket. Be sure that your S3 bucket does not provide public read access!
This distribution includes huimages.imagebrowser, a Django application using huImages to produce a (very basic) Flickr-like experience. It allows uploading and tagging of images and browsing by tag. Upload comes with a multi-file upload implemented with SWFUpload.
- huDjango comes with hudjango.storage.ImageServerStorage, which integrates huImages and Django
- Blog post about image serving (in German)
- django-photologue - somewhat similar application