Uploading images to Cloudinary directly from S3

If you want to serve images from a CDN for improved performance, Cloudinary provides a managed hosting solution: imagine a CMS-style, Flickr-like web interface, with something scalable like Amazon S3 sitting behind it all. Indeed, Cloudinary's API offers the option of uploading images from S3 without having to download and re-upload them via your client host, so I have a sneaking suspicion they make heavy use of it themselves! You do need to ask them to manually whitelist your S3 bucket before you start, but that's fairly straightforward.

Recently, a prospective client asked me about automating the import of many images from S3 into Cloudinary. Below I provide an improved version of the prototype I built to investigate their request. There are three files: a proxy handler object to wrap up both the S3 and Cloudinary APIs; a settings file; and a simple executable to bring the two together.

S3-to-Cloudinary proxy handler: cloud2cloud.py

Although our local scripts never download images directly, they do need to query S3's buckets in order to retrieve a list of (potentially thousands of) images. They then send the filename of each item on this list in turn to Cloudinary, with instructions to upload it remotely from the relevant bucket. The proxy object must therefore import both the boto (S3 and other Amazon web services) and cloudinary Python libraries, as shown below:

#!/usr/bin/env python
 
import boto
import cloudinary.uploader
 
class Cloud2Cloud:
    """
    S3-to-Cloudinary proxy handler.
    """
    s3conn = None
    bucket_name = None
 
    def __init__(self, settings):
        """
        Set up an S3 connection instance; configure global Cloudinary settings.
        """
        self.s3conn = boto.connect_s3(settings.s3_access, settings.s3_secret)
        cloudinary.config(
          cloud_name = settings.cloud_name,
          api_key = settings.cloud_access,
          api_secret = settings.cloud_secret
        )
        self.bucket_name = settings.s3_bucket
 
    def process_bucket(self):
        """
        Process the filenames of items in an S3 bucket.
        """
        bucket = self.s3conn.get_bucket(self.bucket_name)
 
        for key in bucket.list():
            self._process_key(key)
 
    def _process_key(self, key):
        """
        Process an S3 bucket item by telling Cloudinary to upload it.
        """
        s3_url = 's3://%s/%s' % (self.bucket_name, key.key)
        print "Transferring '%s' directly..." % key.key
        cloudinary.uploader.upload(s3_url)

I hope the structure is self-explanatory: set up the proxy object and process an S3 bucket, which in turn processes each filename by instructing Cloudinary to upload it. At no point are the images themselves present on the machine running this script.

Settings: settings.py

A simple settings file can be used to store all the required configuration. Here's an example:

# You will need to contact Cloudinary support and ask
# them to whitelist the S3 bucket mentioned here
s3_access = '...'
s3_secret = '...'
s3_bucket = '...'
 
cloud_access = '...'
cloud_secret = '...'
cloud_name = '...'

I don't include a shebang line (#!) as this code isn't really meant to be executed directly as a command, but there's no harm in doing so.

You can check in this empty example settings file as e.g. example.settings.py, but never check in your live settings: your S3 and Cloudinary secret keys should stay secret. Consider configuring your version control to always ignore files called settings.py, e.g. via a .gitignore file as shown below.
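For instance, a single .gitignore entry like this is enough (the settings.py filename is simply the convention used in this article):

# Keep live credentials out of version control
settings.py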

(The cool kids use environment variables these days, of course: that option has its own associated difficulties, but if you're interested try that instead.)
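If you're curious, a minimal sketch of that alternative might look like the following drop-in replacement for settings.py; the C2C_* variable names are purely my own invention, so use whatever fits your deployment:

import os

# Read configuration from the environment rather than hardcoding it here.
# These variable names are illustrative only, not a boto or Cloudinary convention.
s3_access = os.environ['C2C_S3_ACCESS']
s3_secret = os.environ['C2C_S3_SECRET']
s3_bucket = os.environ['C2C_S3_BUCKET']

cloud_access = os.environ['C2C_CLOUD_ACCESS']
cloud_secret = os.environ['C2C_CLOUD_SECRET']
cloud_name = os.environ['C2C_CLOUD_NAME']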

Executable: go.py

A simple executable ties these two files together. It's the simplest application of dependency injection: the proxy doesn't hardcode any particular settings; instead, the executable takes a settings bundle and fires it into the proxy's __init__() constructor method.

#!/usr/bin/env python
 
import settings
from cloud2cloud import Cloud2Cloud
 
c2c = Cloud2Cloud(settings)
c2c.process_bucket()

And that's it. c2c.process_bucket() is basically your go() function, and the proxy will now go off and process your bucket.

As I said at the start, this code has evolved from what was effectively a prototype. As such, it's quite basic and you might want to work on it yourself before running it in a production environment. But I hope it shows how easy it is to manipulate these two cloud-based APIs in turn and watch your own script silently tick away while the services do all the heavy lifting!
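If you do take it further, one obvious gap is error handling: as written, a single failed upload raises an exception and aborts the whole run. A minimal hardening sketch for _process_key() might look like this, with the broad catch-and-print standing in for whatever logging or retry policy you actually want:

    def _process_key(self, key):
        """
        Process an S3 bucket item, but don't let one failure stop the run.
        """
        s3_url = 's3://%s/%s' % (self.bucket_name, key.key)
        try:
            print "Transferring '%s' directly..." % key.key
            cloudinary.uploader.upload(s3_url)
        except Exception as e:
            # cloudinary.uploader.upload() raises on API and HTTP errors;
            # note the failure and carry on with the rest of the bucket
            print "Failed to transfer '%s': %s" % (key.key, e)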

Comments

Hi, please check out auto-upload mapping. You can set a folder in your Cloudinary account to read from your S3 bucket (with a prefix, optionally) automatically when an image is accessed.

Google "Cloudinary Lazy migration and automatic upload of S3 images" (I can't post links here)

Cheers,

Ran