Digital photo archiving and Amazon’s S3 online storage service.

It should be fairly safe to assume that any semi-serious digital photographer has some sort of archiving system in place, be it CDs, DVDs, or a second or third duplicate hard drive, because we’ve all experienced some form of data loss. Generally one approaches this topic with the goal of finding the ideal physical storage medium… But what about all the recent talk about cloud storage?

And what is cloud storage anyway? It’s data stored on remote servers, typically owned by someone else, and accessed through the internet. The benefit is that it’s physically separate from one’s other copies, which protects against total data loss due to a local disaster (fire, storm, war, etc.). The downsides can include the difficulty of quickly transferring multi-gigabyte file sets, even with decent high-speed internet access; dependence on a third party’s reliability (being able to serve the archived files, or even just staying in business for a few years); and cost. While there are some free options, those come with restrictions.
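To put a number on the transfer problem: home connections often upload far more slowly than they download, and upload time is simply data size divided by upload speed. The sketch below is a back-of-the-envelope estimate only; the 50 GB archive and the 1, 5 and 20 Mbit/s uplinks are hypothetical figures, and it assumes a fully saturated connection, which real uploads rarely achieve.

```python
def upload_hours(gigabytes: float, upload_mbps: float) -> float:
    """Best-case hours to send `gigabytes` over an uplink of `upload_mbps`.

    Uses decimal units (1 GB = 1000 MB, 1 byte = 8 bits) and assumes the
    connection stays fully saturated; real uploads are slower because of
    protocol overhead and other traffic on the line.
    """
    megabits = gigabytes * 1000 * 8
    seconds = megabits / upload_mbps
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical figures: a 50 GB archive over a few home upload speeds.
    for mbps in (1, 5, 20):
        print(f"50 GB at {mbps:>2} Mbit/s: {upload_hours(50, mbps):6.1f} hours")
```

At 1 Mbit/s the 50 GB example works out to more than four days of continuous uploading, which is why the initial upload of an archive tends to be the slowest part of the process.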

I’ve contemplated this for some time, and about 4-5 years ago I joined PhotoShelter as a possible solution. While PhotoShelter is an excellent service, it’s not a simple, no-frills online storage service, so one pays for additional features that may or may not be relevant to one’s needs. For pure online data storage, I’ve decided to give Amazon’s S3 service a try.

What is S3? It’s pay-as-you-go, redundant online data storage at a reasonable set rate, without volume restrictions. I can imagine some thinking, “Well, I already pay for web hosting, can’t I just use that?” Yes and no. It will depend a lot on your web host. You may get away with 5 or 10GB, but eventually you’ll be called out, as I discovered when I was too lax about letting certain FTP accounts accumulate files. My particular host has a no-storage clause tucked away in its terms of service (TOS), meaning that if files are not relevant to the operation of the website, they must be removed. In practice this means no bloated FTP accounts. For photographers it’s possible to keep files on the HTTP side of the hosting account by creating web galleries, etc., but at some point the web host will grow unhappy. In my case the threshold seems to be in the 30GB neighbourhood. And I’m not even storing photos for the purpose of online archiving, just galleries created for various clients and projects that have added up and eventually have to be removed or down-rezzed. Here’s a link to an unhappy DreamHost customer who thought she could use the service for storage purposes.

Another option could be a service like Flickr or SmugMug (which is actually hosted on S3). For $2 per month, Flickr will allow unlimited storage, though there are bandwidth restrictions. My issues with Flickr are whether one could actually download a large number of images quickly in one sitting, and some slightly sketchy privacy questions. While Flickr does allow users to set privacy to eliminate public access, will it also ensure that none of its third-party developers/partners have some sort of access to these files? You would think not, but I’m somewhat suspicious about this aspect of social networking sites.

Back to S3. If one spends some time on the Amazon Web Services website, things can quickly become confusing and/or overwhelming. Much of AWS is aimed at web developers. Photographers can skip straight to S3. Even there, some of the jargon is a bit obscure for the average photographer. What matters is that it’s an online redundant storage solution at a relatively low price.

Why S3 and not some other service? I suppose this is somewhat subjective, but from my point of view the primary factors were cost and reputation. I’ll address cost a bit later; in terms of reputation, Amazon is a huge company with the resources to create and maintain a redundant storage solution. The bottom line is that they’re likely to be in business for the long run, which can’t be guaranteed for smaller operations. Of course, I’m not advocating that anyone move their entire archive to S3 in place of maintaining a local, physical archive. As we’ve seen from the recent Great Recession, there is no such thing as too big to fail, so don’t rely on anyone else as your sole solution. For some history on this from a photographic perspective, read up on the collapse of the Digital Railroad photo site.

How to set up S3:

Sign up for Amazon Web Services (AWS) via the link at the top of the page.
If you don’t currently have an Amazon account, sign in as a new user.
Sign up for S3 on the S3 page by following the link under the Products category, or go here and click on the Sign up for Amazon S3 button.

When you sign up for AWS, Amazon creates a set of unique access credentials. These are different from the sign-in credentials used to access your AWS account. The access credentials are two alphanumeric keys: one is called the Access Key ID, the other the Secret Access Key. They act, respectively, as the user ID and password for accessing your S3 account.
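The "user ID and password" comparison is close, but the Secret Access Key is never actually sent to Amazon. Instead, each request is signed with it. The sketch below shows the Signature Version 2 scheme S3 used around the time of writing: an HMAC-SHA1 of a canonical "string to sign", sent in an `Authorization: AWS <AccessKeyId>:<signature>` header. It is illustrative only: AWS has since superseded this with Signature Version 4, tools like Cyberduck and Photo Mechanic do all of this for you, and the keys shown are Amazon's published placeholder examples, not real credentials.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional


def string_to_sign(verb: str, date: str, resource: str,
                   content_md5: str = "", content_type: str = "",
                   amz_headers: Optional[Dict[str, str]] = None) -> str:
    """Build the Signature Version 2 "string to sign" for one S3 request."""
    # x-amz-* headers are lowercased, sorted by name and written one per line.
    lowered = {k.lower(): v.strip() for k, v in (amz_headers or {}).items()}
    canonical_amz = "".join(f"{name}:{lowered[name]}\n" for name in sorted(lowered))
    # The canonical resource is /bucket/key (query sub-resources omitted here).
    return f"{verb}\n{content_md5}\n{content_type}\n{date}\n{canonical_amz}{resource}"


def authorization_header(access_key_id: str, secret_access_key: str,
                         to_sign: str) -> str:
    """Return the Authorization header value: 'AWS <AccessKeyId>:<signature>'."""
    # The Secret Access Key stays on your computer; only this
    # Base64-encoded HMAC-SHA1 signature of the request is sent.
    digest = hmac.new(secret_access_key.encode("utf-8"),
                      to_sign.encode("utf-8"), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"AWS {access_key_id}:{signature}"


if __name__ == "__main__":
    # Placeholder credentials from Amazon's documentation, not real keys.
    ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"
    SECRET_ACCESS_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

    sts = string_to_sign("GET", "Tue, 27 Mar 2007 19:36:42 +0000",
                         "/joesmitharchive1999/january/filename.jpg")
    print(authorization_header(ACCESS_KEY_ID, SECRET_ACCESS_KEY, sts))
```

The same mechanism covers upload options: for example, a PUT that asks for Reduced Redundancy Storage carries an `x-amz-storage-class: REDUCED_REDUNDANCY` header, which then becomes part of the signed string.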

Amazon has a number of user guides that explain S3 through various methods such as JavaScript, PHP, etc., but they are all aimed at web developers. Photographers can ignore these and instead use a number of current, photographer-friendly tools. Photo Mechanic includes an upload function for S3, as does the FTP program Cyberduck. I’m sure there are other options, such as Firefox plug-ins I’ve seen discussed on other blogs, but my current workflow already uses both Photo Mechanic and Cyberduck, so it’s welcome to be able to use familiar tools.

While Cyberduck is typically thought of as an FTP tool, S3 is not accessed via traditional FTP. Cyberduck, however, makes it appear as though one is accessing an S3 account via FTP, which keeps the user experience very friendly and consistent. Under the hood, S3 is accessed via HTTP, and Cyberduck connects over HTTPS, in other words secure, encrypted HTTP. You might read somewhere that S3 is a folderless storage system, but not to worry: you can upload content in folders, Cyberduck will translate the folder paths into the file names S3 expects, and it will display folders when you later access the files, allowing you to maintain a familiar organizational structure.

One S3 term that might seem a bit unfamiliar is the bucket. It’s more or less a root folder that one creates in which to store files, and one can create up to 100 buckets per account. The one catch with buckets is that their names must be unique, not just within one’s own account but across all S3 users. Each bucket name becomes the prefix of the web address for each file uploaded, meaning each file will have a unique, permanent web address. For example, if you’re uploading archive photos from 1999 and your name is Joe Smith, you could create a bucket called joesmitharchive1999. The bucket will appear as a folder in Cyberduck, into which you can then upload your files/folders. Because the system works via HTTP, one can access the files from a web browser, but it’s not quite that simple. Each file is set to be either publicly or privately visible. If one uploads via an FTP program like Cyberduck, that setting is located in the preferences. But even if left publicly visible, it’s not as if anyone will be able to see all of your uploads, or even find them with Google, etc. The only way to find a file is to know both the name of your bucket and the exact file name.
For example: https://joesmitharchive1999.s3.amazonaws.com/january/filename.jpg, where joesmitharchive1999 is the name of your bucket, which holds a folder called january, which in turn contains a file called filename.jpg. Since uploads to S3 are done via HTTPS (and are therefore encrypted), chances are very remote that someone will be able to sniff out this information. So while it might seem like you’re putting all your files out there in cyberspace, the odds of someone finding them are very low if they’re left publicly accessible, and essentially nil if they’re set to private. If this is a concern, you can change Cyberduck’s upload preferences, or change the privacy settings after upload. On the flip side, there are benefits to having images stored with public access. One is serving files to a website, although based on what I’ve read from others, S3 can at times be fairly slow for this purpose; Amazon CloudFront is the better Amazon service for that. An example would be a popular video file that is frequently accessed but sits on a bare-budget web hosting account. If your site suddenly gets a huge spike in traffic, your host might suspend service due to what it deems excessive bandwidth use, even if you’re on a so-called unlimited plan. Another use for S3 is remote file delivery to clients, such as when you’re away from the office and don’t have direct access to your regular archive. You could send a client a given file’s S3 web address and let them download it at their leisure, or you could access the file from the web link or via Cyberduck and then email it to them. Client delivery via S3 could be quite useful for very large files. The only catch is that it will cost you money. While it’s only pennies per download, it could add up with a relatively large file linked to a popular web page.
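The relationship between bucket, "folder" and web address can be sketched in a few lines. This is a minimal illustration, assuming the virtual-hosted style address shown in the example above; the function names are hypothetical, and the bucket-name check is a deliberately conservative subset of Amazon's naming rules rather than the full rule set.

```python
import re
from urllib.parse import quote

# Conservative subset of S3 bucket naming rules: 3-63 characters of lowercase
# letters, digits and hyphens, starting and ending with a letter or digit.
# S3 also allows periods, but a dotted name such as "joe.smith" no longer
# matches the *.s3.amazonaws.com certificate when used over HTTPS.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_safe_bucket_name(name: str) -> bool:
    return _BUCKET_NAME.fullmatch(name) is not None


def object_key(relative_path: str) -> str:
    """Turn a local path like 'january\\filename.jpg' into an S3 key.

    S3 has no real folders: 'january/filename.jpg' is one flat key, and
    tools such as Cyberduck display the part before each '/' as a folder.
    """
    return relative_path.replace("\\", "/").lstrip("/")


def object_url(bucket: str, key: str) -> str:
    """Virtual-hosted-style HTTPS address of an object in a bucket."""
    if not is_safe_bucket_name(bucket):
        raise ValueError(f"not a DNS-safe bucket name: {bucket!r}")
    # Percent-encode spaces and other special characters, but keep '/'.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


if __name__ == "__main__":
    key = object_key("january/filename.jpg")
    print(object_url("joesmitharchive1999", key))
    print(is_safe_bucket_name("JoeSmithArchive1999"))  # uppercase is rejected here
```

Note that a file name with a space, such as `IMG 0001.jpg`, ends up as `IMG%200001.jpg` in the address, which is worth remembering before pasting a link into an email to a client.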

This is a good time to address costs. S3 is an attractive option for long term storage where files will be infrequently accessed, most typically only for disaster recovery or when access to one’s main archive is not possible.

Full pricing details can be seen here.

As of November 2010 (prices in $US):

Storage (Designed for 99.999999999% Durability)
$0.140 per GB – first 1 TB / month of storage used
$0.125 per GB – next 49 TB / month of storage used

Reduced Redundancy Storage (Designed for 99.99% Durability)
$0.093 per GB – first 1 TB / month of storage used
$0.083 per GB – next 49 TB / month of storage used

Data Transfer
$0.000 per GB – first 1 GB of data transferred out per month
$0.150 per GB – up to 10 TB / month data transfer out

Requests
$0.01 per 1,000 PUT, COPY, POST, or LIST requests
$0.01 per 10,000 GET and all other requests (DELETE is free)
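Because storage is priced in tiers, the first terabyte is billed at one rate and the remainder at the next, much like tax brackets. The sketch below applies the November 2010 storage prices listed above. It assumes 1 TB = 1024 GB and one PUT request per uploaded file, and it ignores any charges not listed in the table; it is an estimate, not a reproduction of Amazon's billing.

```python
from typing import List, Tuple

GB_PER_TB = 1024  # assumption: tiers counted in binary terabytes

# (tier size in GB, price in US$ per GB-month), from the November 2010 table.
Tiers = List[Tuple[float, float]]
STANDARD: Tiers = [(1 * GB_PER_TB, 0.140), (49 * GB_PER_TB, 0.125)]
REDUCED_REDUNDANCY: Tiers = [(1 * GB_PER_TB, 0.093), (49 * GB_PER_TB, 0.083)]


def monthly_storage_cost(gb_stored: float, tiers: Tiers) -> float:
    """Fill each price tier in order and sum the cost for one month."""
    remaining = gb_stored
    cost = 0.0
    for tier_size, price_per_gb in tiers:
        used = min(remaining, tier_size)
        cost += used * price_per_gb
        remaining -= used
        if remaining <= 0:
            return cost
    raise ValueError("usage beyond 50 TB is not covered by the table")


def put_request_cost(files_uploaded: int) -> float:
    """Assumes one PUT per file, at $0.01 per 1,000 requests."""
    return files_uploaded / 1000 * 0.01


if __name__ == "__main__":
    for label, tiers in (("Standard", STANDARD), ("RRS", REDUCED_REDUNDANCY)):
        cost = monthly_storage_cost(2 * GB_PER_TB, tiers)
        print(f"{label}: 2 TB costs ${cost:.2f} per month")
    print(f"Uploading 10,000 files: ${put_request_cost(10_000):.2f} in PUT requests")
```

The request fees are tiny next to the storage fees, so for an archive that is rarely read, the amount stored is what drives the monthly bill.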

As of late June 2010, Cyberduck supports the Reduced Redundancy Storage (RRS) option. It can be set as the default in the preferences under the S3 tab, or individual files can be switched between regular and RRS storage via the info window.

By comparison, PhotoShelter’s basic plan offers 10GB for $10 US per month. For the same price, one could store roughly 71GB on S3 at the regular rate, or about 107GB at the reduced redundancy rate, though that doesn’t take request and data transfer fees into account. For pure storage, S3 is attractive and scales incrementally rather than in PhotoShelter’s coarsely tiered jumps (though as of July 1, 2010, PhotoShelter’s rates for additional storage have been reduced by as much as 50%). Where one has to be careful with S3 is in using it as a file hosting solution, because one has no control over how often visitors access a given page. With a very popular site, thousands of hits could quickly add up to relatively significant fees. In this respect, the best plan would be to host as much as possible with a regular ‘all inclusive’ web host, push that as far as it will go, and use S3 purely as an online long-term archive.
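Both halves of this comparison are simple arithmetic on the transfer and request prices in the table above: how much storage $10 buys at a flat rate, and what it costs to serve one file many times. The sketch assumes the file is the account's only outbound traffic that month (so it gets the whole free gigabyte), binary units of 1024 MB per GB, and a hypothetical 5 MB image; the function names are illustrative.

```python
FREE_GB_OUT_PER_MONTH = 1.0   # first 1 GB transferred out is free
PRICE_PER_GB_OUT = 0.150      # valid up to 10 TB out per month
PRICE_PER_10K_GETS = 0.01
MB_PER_GB = 1024              # assumption: binary units


def gb_for_budget(dollars: float, price_per_gb: float) -> float:
    """How many GB a monthly budget buys at a flat per-GB storage rate."""
    return dollars / price_per_gb


def monthly_download_cost(file_mb: float, downloads: int) -> float:
    """Transfer-out plus GET-request cost of serving one file repeatedly.

    Assumes this file is the only outbound traffic in the month, so it
    receives the whole free gigabyte, and that usage stays under 10 TB.
    """
    gb_out = file_mb * downloads / MB_PER_GB
    if gb_out > 10 * 1024:
        raise ValueError("beyond the 10 TB tier listed in the table")
    billable_gb = max(0.0, gb_out - FREE_GB_OUT_PER_MONTH)
    return billable_gb * PRICE_PER_GB_OUT + downloads / 10_000 * PRICE_PER_10K_GETS


if __name__ == "__main__":
    print(f"$10 buys {gb_for_budget(10, 0.140):.1f} GB standard, "
          f"{gb_for_budget(10, 0.093):.1f} GB reduced redundancy")
    # A hypothetical 5 MB image linked from a popular page.
    for hits in (1_000, 10_000, 100_000):
        print(f"{hits:>7} downloads of 5 MB: ${monthly_download_cost(5, hits):.2f}")
```

The transfer charge grows linearly with the number of hits, which is exactly the risk of using S3 as a web host: a single popular page can cost more in a month than storing many gigabytes.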

My immediate goal for S3 is to use it as an online archive for client work. The benefits are twofold. First, it mitigates potential data loss in case of a local disaster, either for me or for a client. Second, I can access client work while traveling, because it’s happened enough times that I’m out of town and get an email from a client who can’t find a given file. In the longer term I will also archive personal work, but first I want to see how much storage client work requires and how much it will cost. $0.14 per GB seems cheap, and it is, but my current client archive is nearing 2TB, which at the tiered rates above would translate into roughly $270 per month simply for storage, which isn’t cheap (for me). Therefore I’ve decided on the following plan: only final JPEG images will be archived to S3. Images with a resolution greater than 10MP will be resized to 3600 pixels and saved at approximately 2.5MB per file. At the same time as the upload to S3, I’ll create a low-resolution (800 pixel) set of images to keep on my laptop. This low-resolution set will let me quickly search for and identify images, even on the road, so I can then find them in S3. It’s currently the only solution I can think of to offset the fact that S3 offers no visual way to browse images: there is no way to preview and identify images in S3 without first downloading them (an advantage of PhotoShelter, Flickr, etc.).
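The resizing rules in this plan can be written down precisely. The sketch below only computes target dimensions and an approximate archive size; it assumes "3600 pixels" and "800 pixels" refer to the long edge and that 10MP means width × height above 10,000,000 pixels. The actual resampling would be done in Photoshop, Lightroom, or an imaging library (for example Pillow's `Image.thumbnail`, which also fits an image inside a box while keeping its aspect ratio). The function names and the example frame sizes are hypothetical.

```python
from typing import Tuple


def fit_long_edge(width: int, height: int, long_edge: int) -> Tuple[int, int]:
    """Scale (width, height) down so the longer side equals long_edge.

    Images already at or below that size are returned unchanged.
    """
    scale = long_edge / max(width, height)
    if scale >= 1:
        return width, height
    return round(width * scale), round(height * scale)


def archive_size(width: int, height: int, max_megapixels: float = 10.0,
                 long_edge: int = 3600) -> Tuple[int, int]:
    """Only images above max_megapixels are resized for the S3 archive."""
    if width * height <= max_megapixels * 1_000_000:
        return width, height
    return fit_long_edge(width, height, long_edge)


def laptop_preview_size(width: int, height: int,
                        long_edge: int = 800) -> Tuple[int, int]:
    """Low-resolution copy kept locally for searching and identification."""
    return fit_long_edge(width, height, long_edge)


def estimated_archive_gb(file_count: int, mb_per_file: float = 2.5) -> float:
    """Rough archive size, assuming ~2.5 MB per final JPEG and 1024 MB per GB."""
    return file_count * mb_per_file / 1024


if __name__ == "__main__":
    # A 21 MP frame (5616 x 3744) and a 6 MP frame (3008 x 2000).
    for w, h in ((5616, 3744), (3008, 2000)):
        print((w, h), "->", archive_size(w, h), "archive,",
              laptop_preview_size(w, h), "preview")
    print(f"40,000 finals at ~2.5 MB: {estimated_archive_gb(40_000):.1f} GB")
```

Multiplying the estimated size by the per-GB storage rate gives a quick monthly budget before committing to an upload.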

So there you have it. If you’re fed up with random hard drive crashes and the hassle of keeping things organized in a local archive, consider Amazon’s S3 service as an additional measure for safekeeping your valuable digital images. While there are other online archiving options available, such as the fully featured PhotoShelter site, S3 offers a simple, low-cost, redundant and secure service for those looking to park images (or any other files) in a long-term, low-traffic archive.

3 thoughts on “Digital photo archiving and Amazon’s S3 online storage service.”

  1. Very good information. And very good writing skills.
    I’ve been considering using S3 for a long time but always got confused with the pricing.
    Now I feel fairly confident about S3 after reading your article.
    Thanks for sharing 🙂
    Saloni

  2. I recently signed up for S3 to archive all of our baby photos as a recent HD crash got me thinking about what a catastrophe it would have been if we hadn’t had the original on a portable HD. My only question/problem is that it has literally taken it ALL DAY to upload my photos. Is this normal?
