Photo Archiving Adventures

8 April 02012

I’ve been archiving photos. I have a lot.

I haven’t always been good at tracking all the pictures I’ve taken. But a while back, around the same time I bought my last desktop computer and actually had hard drive capacity to put them all in one place, I decided I wanted to give Picasa a try for managing my collection. Before that point I had little micro collections, and collections with a lot of overlap between them, on CD-Rs, DVD-Rs, external hard drives, internal drives, memory sticks, memory cards, and scattered across a collection of systems. Picasa didn’t force me to organize them, but I did at that time put in the effort of amalgamating the collections onto one drive on the new computer, a mega collection I’ve since diligently kept updated as my central collection.

That was step one.

A few weeks ago I started thinking about how I could back up, archive, and protect that collection. The thing is that when I do a quick ‘get info’ on the folder, there are currently one hundred and sixty gigabytes of data. One. Six. Zero. Followed by nine zeros. Or, about twenty dual-layer DVDs’ worth of photos.

Online or cloud backup is not impossible, but there are limitations: simply transferring that much data to anywhere is time and resource intensive, not to mention that all that space would cost me significantly for remote hosting — or at least it isn’t going to be free. Couple that with the fact that 160GB of photos is roughly a hundred thousand files, and… well… data management issue.

I decided to simplify my problem using a three-fold approach.

Fold one… compress and zip. Most modern zip programs allow you to, through some clever and somewhat hidden options, create large zip files that break into parts. So on, say, my 2006 photos (which I was just working on and know the numbers offhand), a folder with 26GB of photos, I compress and break it into one zip file with 260 x 100MB parts. At the end of this effort I have one archive made up of two hundred and sixty files (not compressed much, because a lot of the files are JPGs and already compressed) rather than the eight thousand or so files and folders I had before.
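For the curious, here is a rough Python sketch of the same idea. It is not what my zip program does internally; it simply stores the files in one zip without recompressing them (the JPGs won't shrink anyway) and then cuts that zip into fixed-size pieces. The function names and the `.001`, `.002` part naming are my own assumptions for illustration.

```python
import zipfile
from pathlib import Path


def zip_folder(folder: Path, archive: Path) -> None:
    """Store every file under `folder` in one zip, without recompressing."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Keep paths relative so the folder structure survives.
                zf.write(path, path.relative_to(folder).as_posix())


def split_file(archive: Path, part_size: int) -> list[Path]:
    """Cut `archive` into numbered parts of at most `part_size` bytes."""
    parts = []
    with archive.open("rb") as src:
        index = 1
        while True:
            chunk = src.read(part_size)
            if not chunk:
                break
            part = archive.with_name(f"{archive.name}.{index:03d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_parts(parts: list[Path], target: Path) -> None:
    """Concatenate the parts, in order, back into a single zip file."""
    with target.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())
```

Calling `split_file(archive, 100 * 1000 * 1000)` on a 26GB archive would give roughly the 260 parts mentioned above. The archive should live outside the folder being zipped so it doesn't try to swallow itself.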

Fold two… parity file creation. A little trick I learned back in the days when I would occasionally download stuff from newsgroups was the parity file. A clever little program takes a collection of files (a set of zip-parts, for example) and analyzes them. The result is a collection of parity files, or PAR files. The point of parity files is that if anything is damaged in storage or transfer of the original files, everything can be quickly restored to working order, with up to 10% degradation of the originals at the default settings, as long as you still have the parity files. Don’t ask me about the math or science… it just works. But I’ve been taking my large collections of zip-parts, sub-dividing them into (max) 100-part groups, then creating 10% parity files based on those originals.
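To get a feel for why this can work at all, here is a deliberately simplified Python sketch. Real PAR2 files use Reed-Solomon codes and can repair several damaged blocks; the toy below swaps in plain XOR parity, which can only rebuild one missing part, and every name in it is made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, padding the shorter one with zero bytes."""
    size = max(len(a), len(b))
    x = int.from_bytes(a.ljust(size, b"\0"), "big")
    y = int.from_bytes(b.ljust(size, b"\0"), "big")
    return (x ^ y).to_bytes(size, "big")


def make_parity(parts: list[bytes]) -> bytes:
    """Build one parity block by XOR-ing all the parts together."""
    parity = b""
    for part in parts:
        parity = xor_bytes(parity, part)
    return parity


def rebuild_missing(surviving: list[bytes], parity: bytes, length: int) -> bytes:
    """Recover the single missing part from the survivors plus the parity."""
    missing = parity
    for part in surviving:
        missing = xor_bytes(missing, part)
    # Trim the zero padding back off using the part's known original length.
    return missing[:length]


if __name__ == "__main__":
    parts = [b"zip-part-one", b"zip-part-two!!", b"three"]
    parity = make_parity(parts)
    # Pretend the second part was lost to disc rot.
    recovered = rebuild_missing([parts[0], parts[2]], parity, len(parts[1]))
    assert recovered == parts[1]
```

The trick is that XOR-ing the parity block with every surviving part cancels them out and leaves only the missing one. The real PAR tools generalize this so that more than one damaged block can be repaired.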

Fold three… multiple and scattered backups. When I’m done running all the little compressions and parity software, a process that will take hours and hours of CPU time before it’s done, I’ll have somewhere around 2000 files representing 180GB of data. The plan, as I’ve slowly started implementing already, is to create at least two copies of each of those files somewhere; maybe DVDs, maybe external drives, maybe scattered on a couple of cloud services. If disaster ever strikes and I lose my original Picasa folder, I find the parts to rebuild that collection. If some of those files are damaged, I look to the second backup. And if both copies of the backup are damaged, I rebuild it with the parity files.
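The back-of-the-envelope numbers can be checked with a few lines of Python. This is only an estimate under assumptions of my own: decimal units (1GB = 1000MB), photos that don't shrink at all when zipped, and parity counted as one extra 100MB file for every ten zip-parts. The actual PAR software may split its output differently.

```python
import math


def estimate_archive(total_gb: int, part_mb: int = 100, parity_percent: int = 10):
    """Estimate part counts and total size for a split, parity-protected archive."""
    zip_parts = math.ceil(total_gb * 1000 / part_mb)
    # Assumption: parity overhead is stored in files the same size as a zip-part.
    parity_parts = math.ceil(zip_parts * parity_percent / 100)
    files = zip_parts + parity_parts
    size_gb = files * part_mb / 1000
    return zip_parts, parity_parts, files, size_gb


print(estimate_archive(160))  # (1600, 160, 1760, 176.0)
```

So the 160GB collection works out to about 1,760 files and 176GB, which is in the same ballpark as the rough figures above.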

It is an epic effort, but ten years of photography — and an ongoing plan for keeping future photos safe — is probably worth it.


Your Turn...

8 Comments

  • Chris Christou said:

    My plan is to archive everything to two sets of DVDs, and then store the second set at another family member’s house. Later on I’ll buy a cheap external drive as well, but this is a ‘good enough’ start.

  • 8r4d (author) said:

    Gotta start somewhere. I’m thinking no matter what I do, it’s going to be a long and costly process. I will say though, I much prefer to be taking the pictures. ‘Watching files compress’ is the new ‘watching paint dry.’

  • Chris Christou said:

    JPG files don’t compress much. I’m not sure about RAW files. Are you trying to compress your files to have one larger cabinet file? I’d think it’s only worth spending time on archiving (and possibly parity files if it gives you peace of mind) and not on compressing. If one large file corrupts, you risk losing all the cabinet’s files. If one portion of a disc/drive corrupts, you only risk losing select files.

  • 8r4d (author) said:

    Actually it’s a complicated sort of workaround to an online storage/backup issue. Through the shared hosting package I use for this site and my gallery, etc, I actually have a whopping 250GB of storage (of which I’m only using about 6GB right now) where I *could* (and eventually hope to) upload all those photos. But oddly enough the host limits me to a set quota count of files, a quota I’d eat through with only about 20% of my photos, given that, with all the other stuff I’ve got online, it’s about 75% full already. I wanted to create a set of files I could (at some future point) archive on my hosting package if I decided to go that route, even if that means uploading them at a trickle-pace over the next 6-12 months to avoid over-taxing my bandwidth. I.e., keeping the file count low, even though all the data is there, helps me avoid the quasi-deceptive limit of the host.

  • Chris Christou said:

    Make sure you check your terms of usage. I considered that, but I’m not allowed to use my web hosting for personal storage à la Dropbox :/ It does seem like the ultimate way to archive all one’s photos though.

  • 8r4d (author) said:

    Good tip. Though I specifically have activated (and am using on a small scale) a cloud storage feature in my hosting package, so it shouldn’t be a problem.

  • Stephen said:

    I have been archiving everything we have to two sets of DVDs (storing the second set offsite) and mirroring to a couple of hard drives (individual files and DVD iso images).

    We have a collection of personal photos from multiple cameras that is just over 400GB.

    I have been making md5/sha1/sha512 checksums of all files. This is important, as I have found files from the first photos (circa 2002) that have errors from DVDs rotting. That has resulted in one or two DVDs being reburned from either the second set (which checked out) or the iso file on the hard drive.

    I have written some scripts that simplify the process. Some sort the photos from a common folder to a reasonable directory structure $(Camera Owner)/$(YEAR)/$(MONTH)/$(DAY). I have other scripts that take care of all the checksums (actually these are no longer used, since md5deep and other utilities take care of it all in a command line or two, e.g. “md5deep >> md5sums.txt”). Sometimes a script will take all day to run (especially when processing tens of GB of files), but I don’t have to sit and watch it run.

    In terms of the offsite HD storage/dropbox/web storage issue, perhaps the best solution may be to populate a HD with a current image, and then trade HDs with a friend with broadband. Then every month or two sync them up (rsync via ssh).

    Don’t bother with zip files. On a DVD, you can at least put the disk into any computer (and some DVD players) and immediately start working with your photos.

  • 8r4d (author) said:

    All good ideas, too. Though, I’m still liking the idea of going a step further than just checksums and generating the parity files for the photos. Fights the “disc rot” syndrome if nothing else. I know the zips are a hassle, too, but I still have that file count problem w.r.t. online storage if I leave those 150K+ files unzipped.
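For anyone wanting to try the checksum approach from the comments without installing anything extra, a minimal Python sketch might look like the following. It is not Stephen's script and not how md5deep works internally; the function names and the in-memory manifest are my own assumptions, and I've picked sha512 from the algorithms he lists.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha512") -> str:
    """Hash one file in 1MB blocks so large photos don't fill up memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damage(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or no longer match their recorded checksum."""
    damaged = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or file_digest(path) != expected:
            damaged.append(name)
    return damaged
```

Build the manifest once when the photos are archived, keep a copy alongside each backup, and run `find_damage` against a disc or drive later. Anything it reports is a candidate for restoring from the second copy.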


About the Author
Brad has been filling the web with half-witted observations about his little universe for nearly as long as the web has been around. His first website was an awesome collection of animated GIFs displayed on a white background. (Did I mention it was awesome?)