I’ve been archiving photos. I have a lot.
I haven’t always been good at tracking all the pictures I’ve taken. But a while back, around the same time I bought my last desktop computer and finally had the hard drive capacity to put them all in one place, I decided to give Picasa a try for managing my collection. Before that point I had little micro collections, many of them overlapping each other, on CD-Rs, DVD-Rs, external hard drives, internal drives, memory sticks, memory cards, and scattered across a collection of systems. Picasa didn’t force me to organize them, but at that time I did put in the effort of amalgamating the collections onto one drive on the new computer, a mega collection I’ve diligently kept updated as my central collection ever since.
That was step one.
A few weeks ago I started thinking about how I could back up, archive, and protect that collection. The thing is that when I do a quick ‘get info’ on the folder, there are currently one hundred and sixty gigabytes of data in it. One. Six. Zero. Followed by nine zeros. Or, about twenty dual-layer DVDs’ worth of photos.
Online or cloud backup is not impossible, but there are limitations: simply transferring that much data anywhere is time and resource intensive, not to mention that hosting all that space remotely would cost me significantly, or at the very least it isn’t going to be free. Couple that with the fact that 160GB of photos is roughly a hundred thousand files, and… well… it becomes a data management issue in its own right.
I decided to simplify my problem using a three-fold approach.
Fold one… compress and zip. Most modern zip programs allow you to, through some clever and somewhat hidden options, create large zip archives that are broken into parts. So for, say, my 2006 photos (which I was just working on, so I know the numbers offhand), a folder with 26GB of photos, I compress it and break it into one zip archive split across 260 parts of 100MB each. At the end of this effort I have one archive made up of two hundred and sixty files (not compressed much, because most of the files are JPEGs and already compressed) rather than the eight thousand or so files and folders I had before.
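For anyone curious what that looks like under the hood, here is a minimal sketch of the idea in Python, doing the splitting by hand rather than through a zip program’s built-in multi-part option. The folder name, output name, and 100MB part size are placeholders for illustration, not the exact commands I ran.

```python
# Sketch: pack a photo folder into one zip, then cut it into 100MB parts.
# Folder and output names are hypothetical examples.
import zipfile
from pathlib import Path

PART_SIZE = 100 * 1024 * 1024  # 100MB per part

def zip_folder(src_dir: str, zip_path: str) -> None:
    """Pack every file under src_dir into a single deflated zip archive."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in src.rglob("*"):
            if f.is_file():
                zf.write(f, f.relative_to(src))

def split_file(zip_path: str, part_size: int = PART_SIZE) -> list[str]:
    """Cut the big zip into fixed-size numbered parts: name.zip.001, .002, ..."""
    parts = []
    with open(zip_path, "rb") as src:
        index = 1
        while chunk := src.read(part_size):
            part_name = f"{zip_path}.{index:03d}"
            with open(part_name, "wb") as dst:
                dst.write(chunk)
            parts.append(part_name)
            index += 1
    return parts

if __name__ == "__main__":
    zip_folder("2006", "photos-2006.zip")
    print(split_file("photos-2006.zip"))
```

Getting the archive back is just the reverse: concatenate the parts in order into one file, then unzip it.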
Fold two… parity file creation. A little trick I learned back in the days when I would occasionally download stuff from newsgroups was the parity file. A clever little program takes a collection of files (a set of zip-parts, for example) and analyzes them. The result is a collection of parity files: PAR files. The point of parity files is that, if anything is damaged in the storage or transfer of the original files, everything can be quickly restored to working order, as long as no more than (at the default setting) 10% of the originals has degraded and you still have the parity files alongside them. Don’t ask me about the math or science… it just works. But I’ve been taking my large collections of zip-parts, sub-dividing them into groups of at most 100 parts, then creating 10% parity files based on those originals.
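As a concrete illustration of that step, here is a hedged sketch in Python that leans on the par2cmdline tool (assumed to be installed and on the PATH) to build the parity sets: it walks the zip-parts in groups of up to 100 and asks for 10% recovery data for each group. The directory, file pattern, and group naming are made up for the example.

```python
# Sketch: group zip-parts into batches of <=100 and create 10% PAR2
# recovery files for each batch, by shelling out to par2cmdline.
import subprocess
from pathlib import Path

GROUP_SIZE = 100   # maximum zip-parts per parity set
REDUNDANCY = 10    # percent of recovery data to generate

def create_parity(parts_dir: str, pattern: str = "*.zip.*") -> None:
    parts = sorted(Path(parts_dir).glob(pattern))
    for i in range(0, len(parts), GROUP_SIZE):
        group = parts[i:i + GROUP_SIZE]
        par_name = Path(parts_dir) / f"group-{i // GROUP_SIZE + 1:02d}.par2"
        # Roughly: par2 create -r10 group-01.par2 part1 part2 ...
        subprocess.run(
            ["par2", "create", f"-r{REDUNDANCY}", str(par_name)]
            + [str(p) for p in group],
            check=True,
        )

if __name__ == "__main__":
    create_parity(".")
```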
Fold three… multiple and scattered backups. When I’m done running all the little compression and parity software, a process that will take hours and hours of CPU time before it’s done, I’ll have somewhere around 2,000 files representing 180GB of data. The plan, which I’ve already slowly started implementing, is to create at least two copies of each of those files somewhere: maybe DVDs, maybe external drives, maybe scattered across a couple of cloud services. If disaster ever strikes and I lose my original Picasa folder, I find the parts and rebuild that collection. If some of those files are damaged, I look to the second backup. And if both copies of the backup are damaged, I repair them with the parity files.
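The disaster-recovery half of that plan can be sketched the same way, again assuming par2cmdline and the group naming from the earlier example: verify each parity set against whatever files survived, and attempt a repair when the verify step reports damage.

```python
# Sketch: check every PAR2 set and repair whatever is damaged or missing.
import subprocess
from pathlib import Path

def verify_and_repair(backup_dir: str) -> None:
    for par_file in sorted(Path(backup_dir).glob("*.par2")):
        if ".vol" in par_file.name:
            continue  # skip the numbered recovery volumes; the index .par2 is enough
        # "par2 verify" checks the zip-parts against the parity data.
        result = subprocess.run(["par2", "verify", str(par_file)])
        if result.returncode != 0:
            # Something is damaged or missing; let par2 rebuild it from
            # the surviving parts plus the recovery blocks.
            subprocess.run(["par2", "repair", str(par_file)], check=True)

if __name__ == "__main__":
    verify_and_repair(".")
```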
It is an epic effort, but ten years of photography — and an ongoing plan for keeping future photos safe — is probably worth it.