
Long term hard drive storage and data integrity



Reading over this thread http://www.photo.net/bboard/q-and-a-fetch-msg?msg_id=00NQRF

Edward pointed something out that makes sense, but it's not something I'd previously considered. Since hard drives are magnetic media, the information on them will slowly degrade as the written magnetic fields weaken if the drive is sitting around collecting dust in a closet.

 

So my question is, how would one go about "refreshing" a drive that isn't in continuous use? A defrag might work once, but only on some of the files. Would a virus scan be sufficient? Or running chkdsk? Or does the information actually have to be written again in order to bring it back to 100% strength? That would mean perhaps copying the contents of the entire drive to another drive and then back again, or at least copying the entire contents of the drive to a new drive (and leaving it there), then formatting the original and using it to start fresh with new data.

 

I never really considered it an issue before, and I'm not sure how much of an issue it really is, but it does have me wondering. I think that maybe the rate at which faster/larger hard drives are introduced is faster than the rate at which stored data degrades, and it's easy enough every few years to simply back up the contents of 2 or 3 HD's to one newer/faster/larger model, but I am curious now. How would one "refresh" an old hard drive that's been sitting around for years?


I'm more concerned with the sheer amount of crap that's in the closet, basement, garage, rented storage space, etc. -- I know at best I've only got two decades to do something about it :-)

 

The mere contemplation of this deplorable and steadily worsening state might lead me to sporadic, non-responsive, but eminently doable actions, like grabbing all my old hard drives, copying them to new ones, scrubbing the old ones, and copying the contents back from the new ones.

 

I emphasize the subjunctive mode in "might". Usually I just grab a good bottle of wine, which is one of the collectible problems I CAN manage, and vow to return to the issue when I'm in a better frame of mind.


PS: Actually, I suppose that unless I had begun managing my wine collection prematurely, I would realize after copying and conjoining all my old data from multiple, creaking drives to new, half-sized drives, that my chore was done and that it would be time to move on to whacking down the wine collection.

You wouldn't want the drive to just be sitting for years... it contains mechanical parts that ought to move now and then.

 

If refreshing is what you want (I don't know whether your concern is justified, so let's assume it is), then I would do this:

 

1. Run my ImageVerifier app (ImageIngester.com) on the drive to accumulate hashes for each image. (They're saved in a database.)

 

2. Copy the entire drive to another one.

 

3. Rerun ImageVerifier to verify the hashes.
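The three steps above can be sketched in a few lines of Python. This is not how ImageVerifier itself works (its internals are not described in this thread); it is only a minimal illustration of the hash, copy, re-verify idea. The function names, the choice of SHA-256, and keeping the manifest in a plain dictionary rather than a database are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Step 1: map each file's path (relative to root) to its hash."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_hash(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify(root, manifest):
    """Step 3: re-hash the files under root and list every mismatch."""
    root = Path(root)
    problems = []
    for relative_name, expected in manifest.items():
        target = root / relative_name
        if not target.is_file():
            problems.append((relative_name, "missing"))
        elif file_hash(target) != expected:
            problems.append((relative_name, "changed"))
    return problems
```

Typical use would be `manifest = build_manifest(old_drive)`, then the copy (step 2), then `verify(new_drive, manifest)`; an empty list means every file on the new drive matches the hash taken from the old one. Note that this only proves the copy matches what was read from the old drive at hashing time, not what was originally written there.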


Frank,

 

I had originally written a much longer response, that I felt was too technical in nature, so I'm going to try and summarize my thoughts better here.

 

1) "Bit rot" (the degradation of magnetic media over time) is not directly related to whether or not the media is in use. In fact, there isn't really much reliable research into the actual causes of bit rot. (There is plenty of research, but this is kind of a 'voodoo' subject.)

 

2) The powering up and down of hard drives is the number one cause of hard drive failure. Hard drives are mechanical devices that require lubrication; lubricant breakdown and mechanical seizure can occur after long periods without being powered up. There's no particular advantage to leaving hard drives idle for long periods of time except to save power, and there are lots of disadvantages. (Research the pros and cons of MAID - Massive Arrays of Idle Disks.)

 

3) While DVD media can be excellent backup material, caveat emptor. Not all DVDs are created the same, and cheap ones will degrade long before a cheap hard drive. (I've had them become unreadable in less than 90 days.) See this paper on helpful criteria for selecting your DVD media: http://unesdoc.unesco.org/images/0014/001477/147782E.pdf

 

 

4) Consider that you're thinking about a well-researched topic that has a LOT of background. Look outside of the photo realm for serious digital archive studies; consider the following, especially sections 3 and 4.4: http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html

 

5) "Copying" one drive to another does nothing to assure data integrity on its own - it only propagates errors through the archive. Similarly "simply accessing" the drive does not in any way check for errors. See point 6 --

 

6) While software packages like ImageVerifier above provide some level of protection, they're really just ways to avoid implementing well-known, researched, and documented strategies for managing data on hard drives. A simple RAID setup goes a long way - choosing the right one is another matter. If you're at all technically minded, consider using ZFS, which provides proven protection against "bit rot" as well as large-scale failure. http://fortuitous.com/docs/primers/ZFS_Primer.pdf (Once the NetApp vs. Sun issue gets hammered out in the courts, expect ZFS to be available in pre-bundled NAS offerings.)

 

7) As pointed out in section 4.4 of the paper I listed above, homogeneity is the killer for any archive - same location and/or same technology can destroy your archive. Keep off-site copies in case your house burns down, don't rely on just DVDs (which could be perfectly obsolete in five years) or just hard drives - mix and match all the variables you can. Duplicate, duplicate, duplicate.

 

8) Maybe digital negatives aren't so bad? While the plastic material they use probably isn't as long-lived as some of the cotton rag papers, I'd imagine you can get 60+ years out of a digital negative properly printed and stored. (Digital negative in the sense that you print a negative image on clear/slightly opaque material.) You could always re-scan later, or just contact print.

 

Personally, I use cheap USB hard drives for a home archive (no RAID), and back up my files off-site, digitally. I have physical access to the data center where my files are stored, and they are managed via ZFS, so I don't worry much about the home archive going away. I eventually plan to move to digital negatives to avoid media obsolescence.

 

!c (Realizing that it's still a pretty long response =)


"... back up the contents of 2 or 3 HD's ... curious now. How would one "refresh" an old hard drive that's been sitting around for years?"

 

Plug both drives into the computer, then just drag and drop. Don't worry about the viability of an offline drive that's been sitting for a couple of years. It'll spin up and be just fine.

 

Don't worry about data integrity. The probability of an uncorrectable error is very low; the probability of an undetected error is effectively nil.

 

What you do want to do is to keep multiple, geographically dispersed copies. Also be sure to migrate to the new media type of the day every few years (just try to get your files off of an 8in floppy now).

 

Lastly, print the really important images to archival paper using a pigment based inkjet. Keep the album in cool, dry, dark storage.


C. A. Church:

 

Sounds like you know exactly what you're talking about. I'm going to bookmark your reply so I can post it the next time this question comes up. Therefore I unofficially dub thee keeper of the Photo.net data integrity issue, with all privileges (of which none are known) and obligations (of which we currently have a large, overflowing bit bucket) thereunto.

 

I realize that writing flippantly is dangerous because people naturally read it the same way, but in your otherwise meticulous response, you missed the implications of one of the points under your #5 above, namely, that copying old data to new media serves two purposes:

 

1. Freezes the "bit rot" state of the old media and trades it for the presumably more reliable "bit rot" rate of the newer media.

 

2. Physically moves your data from a presumably "obsoletizing" medium to one with a currency of hopefully at least a half-decade.

 

Although there is a touch of affective hubris in worrying about one's photographic memorabilia, it is worth considering how to preserve some of it for succeeding generations of your family, who may wonder about where they came from -- assuming there are any such survivors :-)


Bruce,

 

Hehe, well, I understand exactly what I talk about when I talk about it - but not necessarily all of the associated things un-spoken. *grin* I'm hesitant to accept the privileges you have offered for I fear I may not be able to keep the obligations, and there are others who likely desire it more than I. =) That being said, thank you - and to respond...

 

You are very much correct on points 1 and 2, and not to put too fine of a lens (hehe) on it - but we all live with an unknown level of "bit" errors every day. You may have thousands, or none, and never know it. (Is that bad pixel noise in your image, or bit rot? Who knows? =)

 

I agree with the hubris -- all the talk I see of "permanent archives for our photos" has made me consider writing an article or something to the end of "acceptable permanence". For me, "acceptable permanence" is having the original, or whatever, on-hand long enough to profit and/or enjoy - in the vast majority of cases, it's a few years. After I'm dead - if the image is worth it to society, posterity will take care of it for me. (Did Rembrandt do light-fastness testing on his paint?)

 

Here's a thought to consider for DVD archives: archive on DVD for ten years, then - how long does it take to find, retrieve, and make useful image #135 from year 3, version C? (A version which was actually produced in Year 4.) Managing an archive is far more than storage -- the better part of which is finding and retrieving what you want, when you want. One would do best to figure that out before they go buying any media!

 

!c


!c,

 

We think so much alike that we'd have a dull night doing a pub crawl unless we ran into some interesting people :-)

 

I think your archive management comment is spot-on. I don't have much trouble messing around with images in PS, but keeping track of my huge but not particularly interesting collection is driving me nuts in PSE. I keep fantasizing about all sorts of nifty database types of things to make it easier.


Maybe folks are now calling the old 1950s/1960s *print-through* (for magnetic recording tape) bit rot? With tape, the signal can pass to the next layer with time, high temperature and pressure, i.e. if the tape is wound too tight. The tape's binder and lubricants also physically age. Thus the 3M tape company recommended that folks keep tape a maximum of 10 years and re-record to a new tape every 7 years. This recommendation is over 40 years old, but it was/is for tape, which has neighboring tape layers to be affected by and to get mechanically stuck to.

Data *bit rot* on a hard disc drive is often nil. But there can be *pole tip corrosion* of the read/write heads. This corrosion is typically small or nil. But I have seen cases where a huge batch of heads/sliders got into the field in drives that had built-in contaminants that ate away at the pole tips of the read/write gaps, and all hell broke loose. One loses the ability to read and write; the PW50 widens, and one cannot write as narrow a data pulse as when the disc drive was first built. Slowly the drive does more and more retries, then clusters tend to get lost, then large sections, all without the HDA crashing. At one disc drive company I worked at we had a massive recall due to bad heads.

With old non-fluid spindle bearings, the bearings died after many years. Today almost all drives use a fluid bearing, but they hope the lube batch is great.

Kelly, I am not referring (specifically) to "pole tip corrosion", and not (specifically) to the "de-magnetising of an HDD platter" - but to any situation where the data you write is not what you read. A quick survey around the room I'm currently in (with several 'storage experts' [we deal with terabytes of data incoming per day, and have to store and provide a full chain of authority and validation of it for years; i.e. regulatory compliance data and evidence for future legal cases]) gives me several different "personal favorite causes" of bit rot. Which leads me back to my original statement: there can be many causes. However, I do appreciate your addition to the list =)

 

Bruce, I doubt it would be boring, at least the first few times, hehe. Surprisingly, I can write database software all day (and do, actually...) but prefer physical manifests for my work - mostly because I like the act of flipping through the manifests. Interestingly, on Ovation TV's recent multi-episode series on photographers, they were interviewing one famous portrait photographer (whose name escapes me). During the interview he pulls up to a computer and says "here's a database of every shot I've ever taken. Let's look them up by name... Ok Reagan, here he is ..." Then he proceeds to show a small image on the screen, walks to the box it pointed him to and pulls out the photo.

 

That's good archive management, for him at least.
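A catalog like the one described above (look up a name, get pointed at a physical box) can be sketched with Python's built-in `sqlite3` module. This is only an illustration; the table layout, column names, and the sample box label are hypothetical and have nothing to do with the photographer's actual system.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) a small catalog mapping shots to storage boxes."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS shots ("
        " id INTEGER PRIMARY KEY,"
        " subject TEXT NOT NULL,"
        " taken TEXT,"           # ISO date string, e.g. '1984-05-01'
        " box TEXT NOT NULL)"    # label of the physical box or disc
    )
    # An index keeps lookups by subject fast as the catalog grows.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS shots_subject ON shots(subject)"
    )
    return conn


def add_shot(conn, subject, taken, box):
    """Record where one shot is physically stored."""
    conn.execute(
        "INSERT INTO shots (subject, taken, box) VALUES (?, ?, ?)",
        (subject, taken, box),
    )
    conn.commit()


def find_boxes(conn, subject):
    """Return (date, box) pairs for every shot of the given subject."""
    rows = conn.execute(
        "SELECT taken, box FROM shots WHERE subject = ? ORDER BY taken",
        (subject,),
    )
    return rows.fetchall()
```

The point of the design is that the database holds only the index, not the images, so it stays tiny and is easy to duplicate alongside every copy of the archive.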

 

!c


Hi

 

perhaps it's 'expensive', but my mate has 2 RAID arrays set up for the data storage for his business. Each is a RAID 5 system made up of 500 GB drives. He runs them "mirrored".
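For anyone wondering why a RAID 5 array survives the loss of one drive: it stores an XOR parity block alongside the data blocks, and any single missing block can be recomputed from the rest. The sketch below shows only that arithmetic. It is a simplification under stated assumptions: real RAID 5 rotates the parity across the drives and works at the controller or driver level, and the function names here are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*blocks)
    )


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (one block per drive)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the block at lost_index from all the surviving blocks."""
    survivors = [
        block for index, block in enumerate(stripe) if index != lost_index
    ]
    return xor_blocks(survivors)
```

Because XOR-ing everything in a stripe gives zero, XOR-ing all but one block gives back the missing one, whether it was data or parity. Two lost drives in the same array cannot be recovered this way, which is presumably why the arrays above are also mirrored.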

 

As this becomes 'old' he upgrades them to bigger platters (soon 500 GB will be small). By doing it more rather than less frequently he can also eBay his 'smaller' drives and recoup some of the costs.

 

CA's reference to ZFS is a good one, and perhaps a poor man's version of this could be found in the Mac's proposed 'database' file system (have they done it yet?). Databases offer speedier retrieval than a file system and less fragmentation; dynamically sized ones can even shift themselves a fair bit.

 

On the subject of bit rot, one sector that I don't think gets written to often (and gets read a lot) will be the boot sector of your computer. Even the slightest level of corruption in there will be evident. So, unless I'm missing something, boot sector failure rates would be a good indicator of 'bit rot rates'.

 

:-)


Ohh

 

Copying files from one place to another (like when you get a bigger storage system) need not introduce losses. Losses can perhaps be identified and corrected with an appropriate algorithm. ECC has been around for many years; if you don't mind the overhead, I think it's applicable to your storage. Mirrors and CRCs will also make the statistical likelihood of loss low.
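To make the mirror-plus-CRC idea concrete, here is a minimal Python sketch. It assumes each copy of a block is stored together with a CRC-32 of its contents (using the standard library's `zlib.crc32`), so a read can tell which of two mirrored copies is damaged and repair it from the other. The dictionary layout and function names are invented for the illustration; real file systems and RAID controllers do this at a much lower level.

```python
import zlib


def store(block):
    """Keep a block together with the CRC-32 taken when it was written."""
    return {"data": bytearray(block), "crc": zlib.crc32(block)}


def is_intact(copy):
    """True if the stored bytes still match the CRC taken at write time."""
    return zlib.crc32(bytes(copy["data"])) == copy["crc"]


def read_with_repair(primary, mirror):
    """Return good data, repairing the primary from the mirror if needed."""
    if is_intact(primary):
        return bytes(primary["data"])
    if is_intact(mirror):
        # The primary failed its check; overwrite it with the good copy.
        primary["data"] = bytearray(mirror["data"])
        return bytes(mirror["data"])
    raise IOError("both copies failed their CRC check")
```

A CRC on its own only detects damage; it is the second copy that makes repair possible. If both copies are damaged the read fails loudly, which is still better than silently returning bad data.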


C.A., one can also have the non-repeatable runout of the disc drive's spindle increase with time, and thus the positioner arm's servo loses some margin in following the tracks.

Certain gases also tend to challenge the disc's surface and the sliders/heads. One can also just have a dumb simple capacitor fail on the main board, a fatigue failure of solder joints, micro hairs growing inside chips, a bad batch of head preamp ICs, or a doofus opening up the case.

In one RAID system I saw in Florida years ago, the building got hit by lightning and the IT department's remote modem allowed the entire array of discs to get mostly goofed up. Post mortem on the units showed arcing on the balls of the spindle ball bearings in some units, and arcing on heads and disc platters. Here one sees some 'storage experts' cry because they assume.

Another decades-old rare failure is where the magnetic B-H loop on the discs gets *harder* because of heat, chemical impurities in the coating, or corrosion where the overcoat was too thin or got worn off in startup and landing cycles. The IEEE Magnetics Society paper writers tend to be a bit in a bubble compared to working at a disc drive maker and actually dealing with customers' drives.

The processes of writing and reading are different in mag recording. Writing is somewhat easier than reading. Reading requires a narrow gap; writing is done a bit with just the edges. Thus one might still be able to write to a platter in a great way, but readback gets wonkier as the gap widens, corrodes, or gets clogged/shorted. Read margins can drop with time and the average Joe doesn't know it. *data you write is not what you read* is ancient; probably as old as man.

Sometimes in HDAs the gurus' data checking/verify software can have a bug too, or it's not that robust. I worked in the magnetic and optical disc drive industry for a few decades; weird, obscure failures often crop up.

Once we had Marketing swap out the cover and place a clear top on a drive for Comdex, and it crashed; they did this in the Vegas hotel!

Chris, as I understand, Leopard supports ZFS. http://themachackers.com/2006/12/19/zfs-on-mac-os-x-105-a-closer-look/

 

(Could be wrong though, as I don't use a Mac myself.) As I alluded to earlier, I would expect to see a LOT more commercial ZFS implementations once Sun and NetApp decide to get over their patent issues. (And some of them being much easier to use!)

 

Also - I might have put that across wrong, but I meant to say that copying propagates errors, not introduces them.

 

Kelly, you definitely understand what you're talking about. I wouldn't want anyone to assume I was disagreeing with you - just not capable of explaining every possible cause, and I'm not sure that we need to. Being aware of the possibility of corrupted data and knowing how to account for it and deal with it is about the maximum most of us need to know. =)

 

!c


CA

 

sorry ... my hasty wording, not yours.

 

I meant that any half-decent copy process should look at the original for confirmation at the end. I also meant that, for whatever reason ("bit rot", say), there is no way for the process to tell if the bits it is copying are what were written there to begin with. Reed-Solomon (as in CD) will be able to cope with minor losses (by correcting them) and flag irreparable losses.
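The "correct minor losses, flag irreparable ones" behaviour can be shown with a far simpler code than Reed-Solomon. The sketch below is an extended Hamming(8,4) code, swapped in purely for illustration: it protects 4 data bits with 4 check bits, corrects any single flipped bit, and flags any two flipped bits as uncorrectable. Reed-Solomon works on whole symbols and handles much larger bursts, but the principle is the same. All names here are made up for the example.

```python
def encode(nibble):
    """Encode 4 data bits as 8 bits: index 0 is overall parity,
    indexes 1..7 are the classic Hamming(7,4) positions."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total parity even
    return code


def decode(code):
    """Return (nibble, status); nibble is None when damage is irreparable."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position            # XOR of the set bit positions
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Exactly one bit flipped; the syndrome is its position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but non-zero syndrome: two bits flipped.
        return None, "uncorrectable"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

For example, `decode(encode(9))` gives `(9, "ok")`; flip any one of the eight bits and it still returns 9 with status `"corrected"`; flip two and it reports `"uncorrectable"` instead of handing back wrong data. Three or more flips are beyond what this small code can promise anything about.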

 

:-)


As Kelly points out, no single system is foolproof.

 

I had a mate in Canada lose all his 8x10 glass plates (and a collection bequeathed to him by his teacher from the very early 1900s) in a house fire.

 

By taking backups and data seriously this can be avoided with digital; if not totally, we can at least reduce the risks. I'm sure many photographers carefully archive images for years and still don't have clients order many from 10 years ago. So if it's commercial there has to be a cost turning point.

 

I mentioned RAID arrays as they're much larger in capacity, and at least safer than a single hard disk. I'm already reaching 100 GB with my images, and I don't even store scans of 4x5. I imagine for working professionals using RAW it can quickly add up.

 

It's true that film is very slow in READ times (sort of like a WORM drive), but in conjunction with a good error-correcting reader (like a Coolscan 9000) it becomes a very high density WORM drive. Best yet, we pay for it incrementally.

 

Now, if a huge asteroid strikes the planet perhaps we'll have other things to worry about than our "off site backups"

 

:-)
