
Merging multiple hard drives and deleting duplicates



<p>Hi all,<br>

I have accumulated multiple hard drives over time, and there are duplicated photos on all of them. I've managed to put every file on a 4 TB hard drive, manually deleting a great number of obviously duplicated photos.<br>

Now, here is my plan to get rid of all the other 'not-so-obvious' duplicates, e.g., files with different file names. I'll start a brand new Lightroom 5 catalog, using both the option to copy the files to a new location and the option to not import suspected duplicates. In theory, at least in my head, this will go through all the files I compiled on the 4 TB drive and copy them to the new location while skipping duplicated ones.<br>

Does this make sense to anyone other than me? Any other suggestions on how to accomplish this?<br>

<br />Thanks,<br>

Simon</p>


<p>Hi Chas, <br>

I actually did a test with a few photos, i.e., duplicate photos with different names in different folders on my computer, and when I tried to import the duplicated version it was "greyed out" and I wasn't able to select it. Do you not see this? <br>

I thought that LR looked at various types of data (file name, time stamp, and other Exif and other metadata) to compare files.</p>


<p>There are many "Duplicate File Finder" programs available. Digital pictures are simply digital files. Of course, change one byte in the file in an edit and it is no longer a true duplicate.</p>
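To illustrate the "change one byte" point, here is a minimal sketch using Python's standard hashlib module. The byte strings are made-up stand-ins for image files, not real JPEG data.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Made-up stand-ins for file contents (assumption: not real image data).
original = b"\xff\xd8\xff\xe0 pretend JPEG bytes"
exact_copy = bytes(original)          # same bytes, as in a renamed copy
edited = original[:-1] + b"!"         # exactly one byte changed

copy_matches = sha256_hex(original) == sha256_hex(exact_copy)
edit_matches = sha256_hex(original) == sha256_hex(edited)

print(copy_matches)  # True
print(edit_matches)  # False
```

A tool that compares content this way treats the renamed copy as a duplicate and the edited file as a different file, whatever the names are.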

<p>Here are links to two programs (I have never used either of them):<br>

http://www.pcworld.com/article/2013264/review-auslogics-duplicate-file-finder-frees-up-hard-drive-space-quickly.html</p>

<p>http://www.pcworld.com/article/2025412/review-ashisoft-duplicate-finder-can-get-rid-of-duplicate-files-if-you-help-it.html</p>

<p>Before using any program to remove files en masse, be sure you have <strong>very, very good backups</strong> - two or more logical backups and at least one system image backup. Remember, "Never go nowhere you can't get back from no how."</p>

 


<p>One problem I have with deleting duplicates by name is that, through several changes of Canon cameras, they have always used the same file naming format, i.e. xxx_IMG.jpg. This means that unless I have changed the file name (which I often do) there may be several files with the same name (albeit taken on different dates etc.). I have now discovered that my iPad also uses this xxx_IMG.jpg format!<br>

All this means that if you use an automatic search for similar file names and auto-delete them, you could lose a lot of files.</p>
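A small sketch of why a name match alone is not enough: it groups files by name and then checks whether the contents really agree. The helper names, folder names and file contents are hypothetical, chosen only for the demonstration.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_name(root):
    """Return {file name: [paths]} for names that occur more than once."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[path.name.lower()].append(path)
    return {name: paths for name, paths in groups.items() if len(paths) > 1}


def same_content(paths):
    """True only if every file has the same SHA-256 digest."""
    # read_bytes() loads the whole file; fine for a sketch, slow for huge files.
    digests = {hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}
    return len(digests) == 1


# Demo: two cameras that both produced "001_IMG.jpg" with different pictures.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for folder, data in [("camera_a", b"sunset"), ("camera_b", b"birthday")]:
        (root / folder).mkdir()
        (root / folder / "001_IMG.jpg").write_bytes(data)

    clashes = group_by_name(root)
    verdicts = {name: same_content(paths) for name, paths in clashes.items()}

print(verdicts)  # {'001_img.jpg': False}
```

The name clashes, but the contents differ, so deleting on name alone would have thrown away a real photo.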


<p>Thanks for the warning Jeff! Definitely don't want that!<br>

However, don't these types of programs (including Lightroom) look for many different types of information to identify duplicates, e.g., file name, date, camera, shutter speed, etc?</p>


<p>I have successfully used a freeware program called VisiPics that uses an algorithm to find duplicates based on the actual images themselves, not filenames or EXIF data. It can "match" images that are edited slightly or resized, as well as images that may not be duplicates but are judged to be similar. Using it on 4 TB worth of images would be a tedious job though!</p>
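For a rough idea of how image-based matching can tolerate small edits, here is a sketch of an "average hash". This is an assumption for illustration only, not VisiPics' actual algorithm, which the thread does not describe. Real tools first decode the image and shrink it to a tiny grayscale grid; here the grids are hand-written so the example needs no imaging library.

```python
def average_hash(pixels):
    """Turn a small grayscale grid into bits: 1 where a pixel is above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming(bits_a, bits_b):
    """Count the positions where two bit lists differ (0 means identical hashes)."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


# Hand-written 4x4 grids standing in for downscaled images (made-up values).
base = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
brightened = [[value + 10 for value in row] for row in base]   # slight edit
inverted = [[255 - value for value in row] for row in base]    # different picture

near = hamming(average_hash(base), average_hash(brightened))
far = hamming(average_hash(base), average_hash(inverted))

print(near)  # 0
print(far)   # 16
```

A small distance suggests "similar", and the threshold you pick controls how loose the matching is, which is why such tools are better used for review than for automatic deletion.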

 

<p>I would suggest first slimming down your duplicates by using a program to rename all your images according to the date/time the photo was taken (from the EXIF data). Don't let the program automatically delete "duplicates" at this stage - that would be something you would do with some kind of manual intervention. At least, I would.</p>
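A minimal sketch of the renaming step, written as a dry run that only plans names and never touches or deletes files. It assumes you already have each photo's capture time from some EXIF reader (not shown); the `captures` list, the name pattern and the helper name are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def plan_renames(captures):
    """Map each source path to a date-based name in the same folder.

    Files in one folder that share a capture time get _1, _2, ... suffixes,
    so nothing is overwritten and likely duplicates end up next to each other.
    """
    plan = {}
    taken = set()
    for path, taken_at in sorted(captures, key=lambda item: (item[1], str(item[0]))):
        stem = taken_at.strftime("%Y%m%d_%H%M%S")
        suffix = path.suffix.lower()
        candidate = path.with_name(stem + suffix)
        counter = 1
        while candidate in taken:
            candidate = path.with_name(f"{stem}_{counter}{suffix}")
            counter += 1
        taken.add(candidate)
        plan[path] = candidate
    return plan


# Hypothetical input: (path, capture time) pairs obtained elsewhere.
captures = [
    (Path("photos/001_IMG.JPG"), datetime(2014, 1, 2, 3, 4, 5)),
    (Path("photos/copy.jpg"), datetime(2014, 1, 2, 3, 4, 5)),
    (Path("photos/002_IMG.JPG"), datetime(2014, 1, 2, 3, 5, 0)),
]
plan = plan_renames(captures)
for source, target in plan.items():
    print(source, "->", target)
```

The plan does not check for files that already exist on disk under the target name, so that check would be needed before actually renaming anything. Pairs that end up with a numeric suffix are the ones worth inspecting by hand.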


<p>In the past I have used ThumbsPlus to find similar images. It does a good job and doesn't require them to be an exact match. In fact shots taken at the same shoot are sometimes (rightly) identified as similar. You can control the amount of similarity required.</p>

<p>Thanks for the input Colin. Unfortunately, I'm using a Mac and VisiPics only runs on Windows. I'll do some searching around for a similar program for Mac.<br>

After I trimmed the obvious duplicates from my collection I ended up with 'only' 1 TB of images. I'll do some more testing with dummy sets using LR and other software and report back!</p>


<p>About a month ago I stumbled across a Lightroom add-on doodad that supposedly would seek and find duplicates from within the Lightroom environment. It seemed to work okay, although I tried only the free trial version which had very limited functionality. Might be worth Googling for if it sounds like it might do the job for you.</p>

There is software named "rsync" [0,1,2,3] that can sync, among other ways, based on the checksum of files[4], which fits the case here. There should be a port for Apple OS X[5] & MS Windows[5]; one does exist for Unix-like systems[1].

 

 

[0] The source: https://rsync.samba.org/
[1] Multiple ways to download: https://rsync.samba.org/download.html
[2] A tutorial: http://everythinglinux.org/rsync/
[3] How it works: https://rsync.samba.org/how-rsync-works.html
[4] Option "--checksum" in manual page: https://rsync.samba.org/ftp/rsync/rsync.html
[5] Resources for, among other things, running on Apple OS X & MS Windows: https://rsync.samba.org/resources.html
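In the same spirit of comparing by checksum, here is a rough Python sketch of merging one folder into another while skipping content the destination already holds. To be clear, this is not rsync and does not reproduce its behaviour; `merge_unique` and `sha256_of` are hypothetical helpers, and the sketch assumes the source and destination folders do not contain each other.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large photos are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def merge_unique(source, dest):
    """Copy files from source into dest, skipping content dest already contains."""
    source, dest = Path(source), Path(dest)
    seen = {sha256_of(p) for p in dest.rglob("*") if p.is_file()}
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        digest = sha256_of(path)
        if digest in seen:
            continue  # same bytes already present, whatever the name
        target = dest / path.relative_to(source)
        if target.exists():
            # Same relative path but different content: keep both files.
            target = target.with_name(f"{target.stem}_{digest[:8]}{target.suffix}")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        seen.add(digest)
        copied.append(target)
    return copied


# Demo with made-up contents: only one of three source files is new content.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "src", Path(tmp) / "dest"
    (src / "a").mkdir(parents=True)
    (src / "b").mkdir()
    dst.mkdir()
    (src / "a" / "x.jpg").write_bytes(b"one")
    (src / "b" / "y.jpg").write_bytes(b"one")   # duplicate of x.jpg
    (src / "b" / "z.jpg").write_bytes(b"two")   # already in dest as old.jpg
    (dst / "old.jpg").write_bytes(b"two")
    copied_names = [p.relative_to(dst).as_posix() for p in merge_unique(src, dst)]

print(copied_names)  # ['a/x.jpg']
```

Nothing in the source is deleted, so the original drive stays intact as a fallback.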


<p>Thank you parv.! This is turning out to be more complicated than I thought it would be.<br>

<br />Last night I did an experiment: I created a directory with a couple of subdirectories within it. I placed the same image in all of them, as well as that same image with a different name and a copy where I had edited the image, i.e., original file, original file with different names, duplicated original files, edited files. Lightroom was only able to identify the files with the same name but not the edited ones or some of the ones with different names... <br>

I then did the same experiment using a nice piece of free software that I found online, dupeGuru http://www.hardcoded.net/dupeguru/, and it was able to find all copies of the file, including the ones with different names. I'm guessing it's doing a checksum. Of course, it did not find the edited version of the file, since that file is effectively not a duplicate anymore.<br>

I think that I will have to do this slowly and on a directory-by-directory basis, using dupeGuru or another piece of software that does some sort of checksum, and not in a giant batch mode. Probably better and safer this way anyway. <br>

<br />Thank you all for the multiple suggestions!<br>

<br />Cheers,<br>

Simon</p>
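The experiment above can be reproduced with a short Python sketch that groups files by a SHA-256 checksum of their contents. This is only a guess at the general approach, not dupeGuru's actual implementation; the function names are hypothetical, and the script only reports groups, it never deletes anything.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, skip hashing
        for path in paths:
            by_digest[file_digest(path)].append(path)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


# Demo mirroring the experiment: a renamed copy and an edited copy.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "original.jpg").write_bytes(b"pixels")
    (root / "sub" / "renamed.jpg").write_bytes(b"pixels")
    (root / "edited.jpg").write_bytes(b"pixelz")
    groups = [[p.name for p in group] for group in find_duplicates(root)]

print(groups)  # [['original.jpg', 'renamed.jpg']]
```

Running it one directory at a time, as planned above, keeps each report small enough to check by eye before deleting anything.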


<p>Thanks for that reference, Simon. I'm going through my laptop's photo files folder by folder to eliminate duplicates while trying to avoid deleting similar but not identical photos. I somehow screwed up when I set up LR on this laptop and didn't realize, until the hard drive had filled up unusually quickly, that imported photos were not only being duplicated but the raw files and JPEGs were also being sorted into separate folders with apparently identical dates. Fortunately that mess only covers one year of photos.</p>

  • 3 years later...
Why not try Duplicate Files Deleter? It does a thorough search of your hard disk and finds two or more copies of the same file, even when they are stored at different locations. It gives you a comprehensive list of all those files, and you can decide for yourself what you want to do with them.

There is a risk in using software to delete duplicates: files with the same names may be deleted even if they're different. I keep the original image names intact, but they roll over after only 10,000 or so. Mine are distinguished by placing them in named directories. Cleanup software may not recognize the directory name as part of the unique identifier. Just as bad, you may be asked to decide for each duplicate. My conclusion is that such programs are a waste of time and pose a considerable risk. Software to synchronize two drives or directories, on the other hand, can be very useful.


It would depend upon how the programs determine that the files are duplicates. Most use some sort of checksum on the file contents. If the program used the SHA-256 hash of the file to define duplicates, I would have very, very high confidence that the files were, indeed, duplicates of one another.
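To put a rough number on that confidence, here is a back-of-the-envelope sketch. It assumes SHA-256 behaves like a uniformly random 256-bit function (an idealization) and uses the simple union bound: the chance that any two of n different files share a hash is at most the number of pairs divided by 2^256. The one-million-file collection is a made-up figure.

```python
from fractions import Fraction


def collision_upper_bound(num_files, hash_bits=256):
    """Upper bound on the chance that any two distinct files share a hash."""
    pairs = num_files * (num_files - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)


# Hypothetical collection size: one million different files.
bound = collision_upper_bound(1_000_000)
print(float(bound))  # on the order of 1e-66
```

For comparison, anyone who still wants certainty can follow a hash match with a byte-by-byte comparison of the two files before deleting one.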
