henry_s3 Posted February 28, 2015 <p>Hello!</p><p>I'm trying to do some research into my family tree and have been sent a file of almost 2,500 images (jpg) of text. Is there a way of searching through the text for given words (in my case the name of a ship), as in any other text-based page (e.g. with Command+F)? Are there any particular programmes for this? I've googled "How to search for text in images" but it didn't really produce anything... Thanks in advance for your help!</p>
Matt Laur Posted February 28, 2015 <p>Are you trying to find text that's actually portrayed IN the image (as in, a scan of someone's writing on the image, or the ship's name captured in the photograph itself) ... or are you trying to find a file that has the ship's name in the FILE name?</p>
Wouter Willemse Posted February 28, 2015 <p>If it is text inside the images, the trick you are looking for is called <em>OCR</em> - Optical Character Recognition. Many scanners include software which can do this; titles include OmniPage and ABBYY FineReader, but the MS Image Editor supplied with Office can also do some, as can Microsoft OneNote (which can be used free). FreeOCR seems a fine (and free) choice too (never used it!).<br> The problem, though, is the sheer volume, and OCR isn't infallible - complicated document structures can easily trip it up, and hand-written text certainly will.</p>
henry_s3 (Author) Posted February 28, 2015 <p>Thanks for your quick response, Matt!</p> <p>I have received just under 2,500 photos of typed files and letters of varying quality, some of which have also had notes scribbled on them in pencil. I'm trying to find any references that may have been made to particular people and ships, e.g. SS Ettrick. Here is a link to the file:<br> http://heritage.canadiana.ca/view/oocihm.lac_reel_c10327/757?r=0&s=2</p>
JosvanEekelen Posted February 28, 2015 <p>OCR is indeed the way to go. Its success depends on the quality of the images; if they are from a flatbed scanner you should be OK. I don't know if the smartphone industry is any good in this respect. Perhaps there is an app to convert all kinds of pictures to text. Perhaps something to research.</p>
henry_s3 (Author) Posted February 28, 2015 <p>Yes, Wouter, I was thinking along those lines but the thought of doing this to 2249 (I've just checked up on the exact number) pages is more than daunting...</p>
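A batch script is one way to make 2249 pages less daunting, since the OCR step can be looped over a folder instead of being run by hand. The following is only a minimal sketch under stated assumptions: it assumes the Tesseract engine plus the `pytesseract` and Pillow packages are installed, and the function names `tesseract_ocr` and `find_pages` are made up for illustration. None of this was tried on the files in this thread, and results will depend heavily on image quality.

```python
from pathlib import Path
from typing import Callable, Iterable


def tesseract_ocr(image_path: Path) -> str:
    """OCR one image with Tesseract (assumes pytesseract and Pillow are installed)."""
    # Imported lazily so find_pages can be used with any other OCR function.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def find_pages(image_paths: Iterable[Path], term: str,
               ocr: Callable[[Path], str] = tesseract_ocr) -> list[Path]:
    """Return the images whose OCR text contains `term`, ignoring case."""
    hits = []
    for path in image_paths:
        text = ocr(path)
        if term.lower() in text.lower():
            hits.append(path)
    return hits
```

A call might look like `find_pages(sorted(Path("scans").glob("*.jpg")), "Ettrick")`, where `scans` is a hypothetical folder holding the downloaded pages. The OCR function is passed in as a parameter so that a different engine can be swapped in without changing the search loop.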
henry_s3 (Author) Posted February 28, 2015 <p>It seems I could photograph every page with my smartphone using Evernote, Jos, and apparently it can read text, but I've heard widely varying reports about the results.</p>
JosvanEekelen Posted February 28, 2015 <p>I was referring to smartphones because these will do OCR from less than ideal pictures, skewed ones, etc. Looking at the example you provided, these seem to be quite difficult pictures to recognize, with closed letters etc. On the other hand, the typeface is a serif one, which is easier to recognize than sans serif ones, and the scans seem to be from a flatbed scanner or similar. I don't have OCR software installed so I can't run a test of the file at the moment.</p>
JDMvW Posted February 28, 2015 <p>Adobe Acrobat Pro is another program that will do OCR, maybe better for your application than some of the more dedicated OCR programs (I <em>used</em> to use OmniPage but that's another story).<br> However, expecting <em>any</em> OCR program to pick out the occasional text in a mass/mess of <em>graphic</em> images (pictures of ships, etc.) is a hit-and-miss process at best. If the jpgs are images <strong>of</strong> text, on the other hand, then any competent OCR program will do the job.</p> <p>The human eye is probably the quickest OCR program in existence for photographs and suchlike, and pretty reliable too.</p>
JosvanEekelen Posted February 28, 2015 <p>Out of curiosity I just tried FreeOCR with the image you referred to. It didn't do well: even after adjusting the lighting and contrast of the image, and upping the sharpening, it only recognised a few words. OCR doesn't look very promising for the pictures in question. OTOH lots of WWII documents were scanned and analysed electronically. Perhaps pattern recognition is something to look for as well.</p>
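When OCR only gets part of a page right, an exact search for a name will miss garbled readings such as "Ettr1ck". One possible workaround is approximate matching on the OCR output. The sketch below uses Python's standard `difflib` module; the function name `fuzzy_find` and the 0.75 similarity cutoff are illustrative assumptions, not tuned values, and this does nothing for pages where the OCR output is mostly noise.

```python
import difflib
import re


def fuzzy_find(text: str, term: str, cutoff: float = 0.75) -> list[str]:
    """Return the words in `text` that closely resemble `term`."""
    # Split the OCR output into runs of letters and digits.
    words = re.findall(r"[A-Za-z0-9]+", text)
    target = term.lower()
    matches = []
    for word in words:
        # ratio() is 1.0 for identical strings and falls towards 0.0
        # as the two strings share fewer characters in order.
        ratio = difflib.SequenceMatcher(None, word.lower(), target).ratio()
        if ratio >= cutoff:
            matches.append(word)
    return matches
```

A lower cutoff catches more misreadings but also more false hits, so any matches would still need checking by eye against the original page.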
lex_jenkins Posted February 28, 2015 Try a handwriting recognition app. I'm planning to try those with my tablet and PC for transcribing handwritten notes on my family snapshots. Eventually I'll add the metadata to a genealogy record my dad started. My granddad was fairly meticulous about noting names and basic info on the backs of snapshots, so I'm hoping this will ease the process.
henry_s3 (Author) Posted February 28, 2015 <p>Thanks all round for the suggestions.... Looks like I'll have to wade my way through all the pages after all...:-(</p>