Archiving 216,000 images - one approach


Thought this might be of interest. I work for a largish newspaper which is switching to a whole new system for handling pictures, both the live feed from the wire services and our own staff pictures. We store every picture we've actually published, wire or staff, plus 3 weeks' worth of "current" wire photos: 216,000+ images in total at the moment.

 

I asked the technology editor, who has been leading the team designing the hardware/software system, what we are using to store and back up this image load. The answer: a 3.5 TB hard-drive array, with 2 more 3.5 TB arrays as mirrored backup (one off-site). 10.5 TB in all.

 

She told me several interesting things that went into their decisions in designing the system.

 

a) Expect to transfer everything to new media every 5 years, regardless of what you use. NOTHING is reliable after 5 years, and you'll probably need new connectivity options anyway (FireWire 1600 and 3200, USB 3.0 and 4.0 will get here eventually). So use hard drives: they are as reliable as anything else up to 5 years, and much, much faster to replicate when the time comes. So no tape, no MO, no CD/DVD.

 

b) Her take on CDs/DVDs: think of them as sticky notes. Great for communicating with people, but you wouldn't write your marriage licence, will, or other documents you had to preserve on a sticky note (well, maybe Britney Spears would, grin).

 

c) Her final comment? "If you really want it to last? Well, you COULD just keep shooting film." I kid you not.

 

Incidentally, the individual photographers write their "takes" to CDs - one per day. One photographer showed me his CD collection - which fills a box for a 20" CRT tube. He HATES trying to find old pix.

 

At any rate, that's one organization's approach to mass image storage.

A couple of years ago it might not have been so feasible, but nowadays hard drives are cheap. If you're producing a large amount of digital photography and you want it to last forever, set up a RAID. A couple of grand in storage costs shouldn't mean anything (if you need help justifying it, remember what you would have spent on film). MO is a very reliable way to back up, but it's fairly expensive.

Well, that's a pretty sound approach. The latest HDs have an MTBF of 1,000,000+ hours, and HD technology has been around for 40+ years, so it's all mature. However, I think rules such as "5 years" are too hard and fast: it all comes down to probabilities, and the status of the storage array should be monitored instead of relying on such simple rules.

 

Shooting film has its own problems too: storage materials, temperature and humidity. Not really easier than digital, but different.

 

I'd also advise using common image formats, not exotic ones, as conversions can be time-consuming if support for a format ends.

I am just an individual, but I have recently chosen the "same" approach, adapted to my scale.

I did this after realizing that I'd have to copy all my CDs to DVDs to keep double copies of everything, since I had not done it yet. Finally, hard drives are cheaper per MB, quicker, and easier to browse... So I use an external HD for the "off-site" copy, and another in the computer. Easy and as safe as anything else.

As for film, I have a friend who was the victim of a flood, and his film was lost... So I guess even film isn't that safe!

Lenny

AFimage.com (http://afimage.com)

While I was working at a disc drive company, one customer where we had installed a RAID system took a direct lightning hit that fried the power supplies. The lightning that killed the drives came in through the modem, i.e. the administrator's dial-up link for checking on the server. This was when RAID first came out. Most of the drives in the rack-mount enclosure were damaged, and we had to have post-mortems done on them. The HDAs (head disk assemblies) had arcing paths from the recording heads to the discs. We saved a lot of the data via expensive data recovery methods, but the customer wasn't impressed with the supposedly bulletproof RAID system. The rack-mount system's grounding had been done by an expert ESD and lightning engineer with decades of experience. Every time I hear that a RAID system is bulletproof, I think of the mess we had in Florida at a huge client who lost data.

A good friend had the data on his PC in an IDE RAID configuration with new drives. The power supply failed and destroyed 3 of the 4 hard drives. He is a computer consultant with experience going back to the CP/M days. Now he keeps drives in separate PCs, so a single power supply failure can't kill all his drives at once again.

At our place we use a RAID system, keep critical data on another PC, and also use CD/DVD. We also upload the customer scanning work we do to a secure server. With many unrelated storage areas, the data is very secure.

Working in the computer industry, I would NOT archive to RAID subsystems alone. Many, many people have experienced single-drive failures and the like, from which a typical RAID system can easily recover. But many have also had other failures that have led to significant data loss. Since the drives are read-write, there is also the risk of inadvertently overwriting or wiping the data.

 

If you want a real archive system, you also need to write to read-only or write-protectable media and, as the original poster suggested, plan on upgrading the media and storage subsystems on a periodic basis. You also need to randomly test image retrieval on a quarterly (or so) basis.

 

Take care!
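
The quarterly retrieval test suggested in this post can be partly automated. A minimal sketch, assuming you record a manifest of SHA-256 checksums when the images are archived: it re-reads a random sample of files and reports any that are missing or have changed. `build_manifest`, `spot_check` and the default sample size are illustrative choices, not a standard tool.

```python
import hashlib
import random
from pathlib import Path
from typing import Optional


def file_sha256(path: Path) -> str:
    """SHA-256 of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file at the time the archive is written."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def spot_check(root: Path, manifest: dict[str, str],
               sample_size: int = 50, seed: Optional[int] = None) -> list[str]:
    """Re-read a random sample of archived files; return missing or changed paths."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(manifest), min(sample_size, len(manifest)))
    problems = []
    for rel in sample:
        path = root / rel
        if not path.is_file() or file_sha256(path) != manifest[rel]:
            problems.append(rel)
    return problems
```

Keeping the manifest (for example as a JSON file) alongside each copy of the archive lets any copy be checked on its own. An empty result only proves the sampled bytes are unchanged; opening a few of those images in your normal software is still worthwhile.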

"Her final comment? "If you really want it to last? Well, you COULD just keep shooting film." I kid you not."

 

And she wasn't kidding either.

 

"One photographer showed me his CD collection - which fills a box for a 20" CRT tube. He HATES trying to find old pix. "

 

I'm sure he does. And I'll bet that it would be easier to find an image in his collection if it had all been on film.

I used to work in the microfilm business where you have to keep track of large numbers of images.

 

The key is to have a system for identifying each roll or CD and making some kind of record on paper or in a computer of what is on each roll of film or CD.

 

When you need to find something you scan the index either visually or electronically to get the right roll or CD, and then you scan the film or CD to get the image.

 

Indexing can be done at varying degrees of granularity. It takes a lot of time to index each frame, so just knowing that a particular roll or CD has the pictures from a given story might be enough.

 

Good practice is to build your index day by day as you shoot the pictures. That avoids the burden of going back through huge piles of undocumented material.
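
The roll/CD index described in this post maps naturally onto a keyword lookup. A minimal sketch at story-level granularity (one entry per roll or disc, listing the stories on it); the class name, volume IDs and story names are made up for illustration.

```python
from collections import defaultdict


class VolumeIndex:
    """Story-level index: which rolls or CDs hold pictures from which story."""

    def __init__(self) -> None:
        self._by_keyword: dict[str, set[str]] = defaultdict(set)

    def add_volume(self, volume_id: str, stories: list[str]) -> None:
        # Index every word of each story description, case-insensitively,
        # so a search for "fire" also finds "Warehouse fire".
        for story in stories:
            for word in story.lower().split():
                self._by_keyword[word].add(volume_id)

    def find(self, *words: str) -> list[str]:
        """Return the volumes whose stories mention all of the given words."""
        sets = [self._by_keyword.get(w.lower(), set()) for w in words]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


index = VolumeIndex()
index.add_volume("CD-2004-0612", ["Warehouse fire", "Council meeting"])
index.add_volume("CD-2004-0613", ["Fire follow-up", "High school graduation"])
print(index.find("fire"))               # ['CD-2004-0612', 'CD-2004-0613']
print(index.find("warehouse", "fire"))  # ['CD-2004-0612']
```

Adding each day's volumes as they are shot keeps the index current, in line with the day-by-day practice above; the lookup then tells you which roll or disc to pull before you scan it for the frame.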

Kelly, I checked some numbers, and while there seems to be some confusion, IBM itself reports having shipped the RAMAC 305 "storage system" (i.e. the HD) in 1957 (http://www-1.ibm.com/ibm/history/history/decade_1950.html). The first disks were developed around 1953, as you say, but the RAMAC 305 was the first commercial product. In any case, I remembered the date as being around 1959, so I was a bit mistaken. Interesting trivia, though... :-)

 

"The key is to have a system for identifying each roll or CD and making some kind of record on paper or in a computer of what is on each roll of film or CD."

Exactly what I do. Searchable digital metadata (and some digital "contact prints") go a long way toward finding the right picture.

The idea that with thousands of images it is easier to find one if you use film is utter bull. It is FAR easier to find an image with a computer-based storage and backup system than with film, provided you follow these simple steps:

 

1. Use a database that contains all the EXIF info for the photos and a small thumbnail.

 

2. Add metadata to each image (or group of images) when you first load them into the computer: event names, people's names, category, etc. This is the key: if you do it religiously and the comments are relevant, finding images is no issue.

 

3. Load images into your database.

 

4. When you archive your images, have a field in the database recording where each one is archived.

 

5. Keep a copy of your database with your on-site and off-site archives.

 

Now you are done. When you want to look up a photo, just type in search criteria and your computer tells you the rest. Want photos of mom? Type in mom, and in seconds you have all the mom photos you EVER shot. Want to restrict it to this year? Filter by year. Want to know which lenses you use the most? Write a query that gives the percentage of use by lens (it's all in the EXIF data). Imagine how long it would take to get this information from a collection of thousands of slides (let alone negatives).

 

That's it.
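
The five steps above map onto a small relational catalog. A minimal sketch using Python's built-in `sqlite3` module; the table layout, column names and rows are hypothetical, and the EXIF values are assumed to have been extracted already by your import software (reading EXIF itself needs a separate library). It runs the two queries the post mentions: a keyword search filtered by year, and percentage of use by lens.

```python
import sqlite3

# In-memory database for the demo; pass a file path for a real catalog.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photos (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    taken TEXT NOT NULL,      -- ISO date, e.g. from EXIF DateTimeOriginal
    lens TEXT,                -- lens name from EXIF, if the camera records it
    keywords TEXT NOT NULL,   -- step 2: events, people, category
    archive_location TEXT     -- step 4: which drive or disc holds the original
);
""")

# Hypothetical rows standing in for real imported images (steps 1 and 3).
conn.executemany(
    "INSERT INTO photos (filename, taken, lens, keywords, archive_location) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("IMG_0001.jpg", "2004-05-09", "24-70mm", "mom birthday", "HD-A/2004"),
        ("IMG_0002.jpg", "2004-05-09", "70-200mm", "mom garden", "HD-A/2004"),
        ("IMG_0003.jpg", "2003-12-25", "24-70mm", "christmas family", "DVD-017"),
    ],
)

# "Type in mom", filtered to one year.
mom_2004 = conn.execute(
    "SELECT filename, archive_location FROM photos "
    "WHERE keywords LIKE ? AND substr(taken, 1, 4) = ? ORDER BY filename",
    ("%mom%", "2004"),
).fetchall()

# Lens usage: count per lens in SQL, turn the counts into percentages in Python.
counts = conn.execute(
    "SELECT lens, COUNT(*) FROM photos GROUP BY lens ORDER BY COUNT(*) DESC, lens"
).fetchall()
total = sum(n for _, n in counts)
lens_usage = [(lens, round(100 * n / total, 1)) for lens, n in counts]

print(mom_2004)    # [('IMG_0001.jpg', 'HD-A/2004'), ('IMG_0002.jpg', 'HD-A/2004')]
print(lens_usage)  # [('24-70mm', 66.7), ('70-200mm', 33.3)]
```

`LIKE '%mom%'` is a crude substring match (it would also hit "moment"); a larger catalog would keep keywords in their own table. For step 5, the database is a single file that can be copied next to each archive.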

I totally agree with the three points made. I was using CDs before, and was waiting for DVD prices to drop (which they have, by a factor of 3 for readers and 5 for media), but in the meantime I concluded that even DVDs are a waste of time. Hard drives are not that expensive. Electronics stores regularly have sales during which it is possible to get 250 GB drives for about $125 (after mail-in rebates). That is just $1/GB if you mirror your drives so that you can unplug one and store it somewhere else.

Terra Galleria stock photo (http://terragalleria.com/stock-photography.html)
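
The $1/GB figure is simple arithmetic: every gigabyte you keep is paid for once per copy. A small sketch; the function name is hypothetical, and the three-copy case is an illustrative extension (similar to the newspaper's primary array plus two mirrors), not a figure from this post.

```python
def cost_per_usable_gb(price_per_drive: float, capacity_gb: float, copies: int = 2) -> float:
    """Cost per GB of usable storage when every byte is kept on `copies` drives."""
    if capacity_gb <= 0 or copies < 1:
        raise ValueError("capacity must be positive and copies at least 1")
    return copies * price_per_drive / capacity_gb


print(cost_per_usable_gb(125, 250))            # 1.0, the mirrored pair from the post
print(cost_per_usable_gb(125, 250, copies=3))  # 1.5, adding an off-site third copy
```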

Oskar, the actual IBM drive was about 1956, but it took several years to design.

MTBF has ALWAYS been a bit of a bull-dung/BS spec in the disc drive industry. Marketing at company A releases a higher number than B, and B then publishes a higher number in its next glossy spec sheet for a new drive.

Most of you don't know the BS that goes into "creating" the MTBF specs at the lesser companies; much of it, sadly, is totally BS. The better companies run calculations on each component, i.e. MIL-spec MTBFs, but none of the new chips are ever in those old specs. The mechanics are run through a lot of testing, i.e. accelerated tests, to try to weed out possible failure modes. Many times this doesn't catch unknown failure modes: corrosion of the media, pole-tip corrosion of the heads, media lubricant failure, leaks in the HDAs.

Sometimes the field failures are directly opposite to the MTBF gurus' models. Once we installed temperature probes on drives and recorded failures at a customer. The hotter-running units had radically lower failure rates than the cooler-running units, directly opposite to the gurus' models.

A claimed MTBF of 1,000,000 hours does NOT mean a SINGLE drive will last for a million hours.

What it does mean is that the disc company claims the MTBF of a large set of drives is 1 million hours, over maybe a 1 or 2 year period. The MTBF of the group of drives drops radically as the group ages.

An MTBF of 1,000,000 hours means a group of 1000 drives will have one failure every 1000 hours, over the brief sweet spot while the drive is under warranty in a big contract with a major customer. When the computers are scrapped 4 years later, the MTBF of the group might only be 10,000 or 100,000 hours, so a drive in the group fails every 10 or 100 hours.

If a population of older computers has a poor drive MTBF, the MIS guys will often scrap the group sooner than normal.

Drives are a lot better today. I have some from 1985 that still work well.

When a drive has many thousands of start/stop cycles and is old, the heads tend to stick to the media, no matter how hard the engineers sweated over the head and media design. The accelerated testing I used to do at drive companies always gave far better results than a user will see with a computer that is rarely turned on. In humid areas, and with certain chemicals in the air, the combination is a bit deadly: sometimes the spindle motor rips the heads off the flexures instead of the sliders flying. This is why the MTBF spec should be taken with a grain of salt, and common sense used in data storage. Some chemicals actually degrade the heads in a drive, and then the real-world MTBF is massively lower than claimed.

Disc drives are often placed in pressurized chambers when they are used in a satellite, i.e. in the vacuum of space. Here the chamber is at higher than atmospheric pressure, i.e. above 14.7 psia; this is sometimes also done to make leak testing easier. At the higher pressure the heads fly higher, so many times one either uses higher gram-load flexures or a lower bit rate to go with the higher fly height.

As my first-grade teacher said, I know a lot about science but need improvement in spelling and English! :)
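
The population arithmetic in this post can be written out directly. A minimal sketch, assuming the constant-failure-rate (exponential) model that a single MTBF figure implies; as the post stresses, real drives leave that model as they age, so treat the results as rough fleet expectations, not predictions for one drive. The function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760; ignores leap years


def hours_between_failures(mtbf_hours: float, drive_count: int) -> float:
    """Expected gap between failures across a population of identical drives."""
    return mtbf_hours / drive_count


def annual_failure_probability(mtbf_hours: float,
                               powered_on_hours: float = HOURS_PER_YEAR) -> float:
    """Chance one drive fails within a year of running, at a constant failure rate."""
    return 1 - math.exp(-powered_on_hours / mtbf_hours)


print(hours_between_failures(1_000_000, 1000))  # 1000.0 hours, the warranty "sweet spot"
print(hours_between_failures(10_000, 1000))     # 10.0 hours, the aged population
print(round(100 * annual_failure_probability(1_000_000), 2))  # 0.87 (% per drive-year)
```

Even taken at face value, a 1,000,000-hour MTBF means roughly 1 drive in 115 fails in its first year of continuous running, which is why the thread keeps coming back to multiple copies.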

>"One photographer showed me his CD collection - which fills a box for a 20" CRT tube. He HATES trying to find old pix. "

 

Then his system sucks. I look into my database and find what I'm looking for in under ten seconds with my keywords. Look at the number, open up the box containing those dates--there are the negs.

 

[Mod: That's because he does not follow this forum ;)]

"the actual IBM drive was about 1956, but it took several years to design."

Yes, I'm aware of that; my reference was to the time the actual product was available and being shipped (however, I wasn't around back then, so I can only cite sources :-)

You also summed up MTBF pretty well. My original reference to MTBF was mostly pointing out that HD technologies are quite mature (even you have to admit that MTBF isn't 100% bull...), with the emphasis being on regular checks on data integrity. You can take five years or 1 million hours, but these are still just hard-and-fast rules which poorly approximate the actual failure time of the drive array. Suffice it to say, HDs are better now than ever, but nothing lasts forever, and nobody knows when a drive will fail or how (a slowly increasing number of bad sectors or a total crash).

Oskar, thanks for the reply.

The reason I mentioned a sample of drives for MTBF is that the disc drive companies I worked for had customers who kept track of their herd of drives in the PCs they built. Many times folks outside the drive industry wrongly assume that MTBF is a fixed number for a specific model of drive. It is actually a failure rate over the "useful period of operation", which might be 1, 2, 3, 5, etc. years, depending on the customer.

As the population of, say, model XYZ drives built in 2004 ages, the MTBF drops each year; then it may really tank (drop like a rock) when secondary failure modes kick in. This massive drop can never really be modeled well, since older drives may be left off more, become second-hand drives, or start to corrode.

A decent analogy is light bulbs, or people. A household bulb here in the USA might be rated at 750 hours; an average person might live 75 years. At 250 hours for the bulb, or age 25 for a person, the death rate is low. At 750 hours for the bulb, or 75 for the person, the death rate for that age of bulb or person is much higher. There is also infant mortality for both bulbs and people, where the death rate for the population may be higher.

Both bulbs and disc drives are greatly affected by on/off cycles; a large number of cycles hurts the filament, and the drive's heads and media. In AC bulbs, the filament actually moves a bit during the massive inrush of current at cold start. The filament moves because of the current flowing in the first part of the AC cycle, interacting with the earth's magnetic field. Bulbs that are turned on slowly have little "filament rock" and rarely fail on cold start. Cold-start failure rates for AC bulbs are higher when the filament is aligned so that the earth's field moves it a lot; a high-speed movie shows the filament moving. If the bulb is rotated about 90 degrees and tilted, the filament won't move.

With DC, the filament moves once and more or less stays there. The filament also gets thinner as the bulb ages, more so at one end... such is engineering; one can get really deep!
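
The light-bulb analogy describes what reliability engineers call the "bathtub curve": a high failure rate early (infant mortality), a low, roughly flat rate in mid-life, then a rising rate as parts wear out. One common way to sketch it, assumed here rather than taken from the post, is to add three Weibull hazard functions h(t) = (k/λ)(t/λ)^(k−1), with shape k < 1, k = 1 and k > 1. The parameter values below are invented for illustration and describe no real drive or bulb.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate of a Weibull distribution at age t (t > 0)."""
    if t <= 0:
        raise ValueError("age must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_hours: float) -> float:
    """Sum of three Weibull hazards; all parameters are hypothetical."""
    infant = weibull_hazard(t_hours, shape=0.5, scale=2_000_000)     # falls with age
    useful_life = weibull_hazard(t_hours, shape=1.0, scale=1_000_000)  # constant
    wear_out = weibull_hazard(t_hours, shape=5.0, scale=60_000)      # rises with age
    return infant + useful_life + wear_out


for hours in (10, 1_000, 20_000, 50_000):
    print(f"{hours:>6} h: {bathtub_hazard(hours):.2e} failures per drive-hour")
```

Only the middle, flat stretch corresponds to the single MTBF number on a spec sheet; the wear-out term is the "drop like a rock" the post describes, and it is exactly the part that is hardest to estimate in advance.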

Yup, statistics and physics are a fun combination... I remember modeling failure rates in a probability and statistics course during my first year at university, but I've happily forgotten the Poisson and t-distributions since then... :-)

 

Anyway, as a software guy, I might point out that thinking about file formats is important: standardize on a few formats and know them. Think about how software will evolve and how that will affect your file format choices.
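
One practical way to act on "standardize on a few formats" is to inventory the archive now and then and flag anything outside the chosen set. A minimal sketch; the allowed extensions are an example choice, not a recommendation from the thread, and checking extensions is only a proxy for checking the actual file format.

```python
from collections import Counter
from pathlib import Path

# Example set of formats you have decided to support; adjust to your own choice.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff", ".dng"}


def format_inventory(root: Path) -> tuple[Counter, list[str]]:
    """Count files by extension and list those outside ALLOWED_EXTENSIONS."""
    counts: Counter = Counter()
    unexpected = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        ext = path.suffix.lower() or "(none)"
        counts[ext] += 1
        if ext not in ALLOWED_EXTENSIONS:
            unexpected.append(path.relative_to(root).as_posix())
    return counts, unexpected
```

Running this before each media migration gives an early warning about files that may need converting while software that reads them is still around.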
