Images disappear from Google Image Search - Could it be due to Copyright Issues?

Discussion in 'Business of Photography' started by frank_holub, Jul 29, 2004.

  1. I'm sorry if this is the wrong forum to post this, but I wasn't sure where it should go... For the past 6 months a number of my images have had some very good placements in the Google Images search. These images were driving 200 to 400(!) page views (not just hits) per day to my site! Last week I noticed that my images had completely disappeared from the Google Images site. That's ALL of them, from the listings AND the cache! My web pages (HTML) are still in the same place, so I know that my whole site isn't missing from Google's database. I do not have any images that are "mature" or would cause a problem with their "Safe Search" filters, so I find it kind of odd that every image dropped off.
    But here is my theory... A few months ago I embedded a copyright notice in the EXIF data of each of my images. I suspect that last week Google started filtering out all images that contained data in the EXIF copyright field... This is likely due to the fact that Google has been getting a lot of flak for caching images... For more info see http://news.com.com/2100-1032_3-1024234.html
    Has anyone experienced anything similar?
    Frank http://www.my-spot.com
     
  2. Frank, doing a "site:www.my-spot.com" search yields about 100 photos (300 estimated) on Google's image search. I downloaded one image to check the IPTC data and it had your copyright info there. Google constantly plays around with their algorithms, so it may be that your images just dropped in ranking because of some tweak. Of course, they might be removing copyrighted images and I just hit an older server which hasn't been updated.
     
  3. I don't think Google is removing images based on copyright notices in the EXIF information: my web logs indicate that Googlebot doesn't retrieve the image files, which means that they can't use EXIF information to make their decision. My belief is that the image search is based primarily on the value of the ALT attribute and the text surrounding the <img> tag, as well as the off-page ranking information that Google uses for text page searches.
    I'm going to guess that it is one of the following:
    • You changed the text in some of your web pages
    • Google updated their algorithm, causing your images to appear much lower in the rankings
    • Some link or series of links to your page were removed and/or penalized by Google as a ‘link farm’
     
  4. I wondered if any of my images (on www.bobatkins.com) were listed by Google, so I did a bunch of searches on suitable keywords with the search restricted to my website, and I was surprised to find that NONE of my images are listed! I'm well indexed by Google for the main (text) portion of my site, which has a Google page rank of 5/10, and some of my pages are at or near the top of Google searches. Even a couple of my gallery pages get a 4/10 page rank, so Google knows they are there, and so the fact that I could find NONE of my images was somewhat surprising. Does anyone know (or have a good guess) at what criteria Google use to index images? Do they have to be on a page with lots of relevant text? Does Google index via the "alt" tag in the image display HTML or by the filename (I notice many images don't have the search terms in the file name, so I doubt it's that).
     
  5. Bob Atkins wrote:
    Does anyone know (or have a good guess) at what criteria Google use to index images? Do they have to be on a page with lots of relevant text? Does Google index via the "alt" tag in the image display HTML or by the filename (I notice many images don't have the search terms in the file name, so I doubt it's that).
    Google's actual algorithm is a trade secret, and I haven't seen much in the way of experiments to figure out how their image search works. I don't think that Google depends on any one source of information, such as filenames. I think that they look at several factors in the page where the image is embedded or linked to, and then make a decision based on those. For example, I notice that pages which rank high in Google's image search tend to have the following properties:
    • The text on the page contains the search term multiple times and some other text.
    • The ALT and/or TITLE attribute of the <img> tag contain the search term
    • The image file name and/or directory name contains the search term
    • The page which the image is on has links from outside the site pointing to it
    A good example of this is when you search for site:bobatkins.com chart. This brings up http://bobatkins.com/photography/technical/testing2.html, a page which contains a lot of text, repeated references to the word ‘chart’, and a link to a file named ‘chart.gif’.
    In a similar vein, searching for great egret brings up http://www.naturegraphics.net/animals01.htm, which contains lots of text, the words ‘great’ and ‘egret’ several times, and an <img> tag whose title contains ‘great egret’ and which points to a file with ‘great egret’ in its name. In addition, there are several easy-to-find links to the page containing the image: you can get there from the home page in two clicks without needing to wade through a database.
    In contrast, a search for site:bobatkins.com polar bear turns up nothing. The obvious page http://bobatkins.com/photography/images/lores/slides/06_IMG_0021.html contains only one picture of a polar bear. Neither the file name nor the ALT and TITLE attributes of the <img> tag contain the words ‘polar bear’. There is little text on the page, and most of what there is happens to be automatically generated. The actual words ‘polar bear’ are some of the least visible words on the page; they're in a small, hard-to-see font, instead of in something like an <h1> tag or the title. Finally, I don't think that there is anybody in the world who is linking to your polar bear web page except you, and your link doesn't contain any descriptive text.
    A few other examples of searches which turn up images on your site:
    • Notice that these are all pages with text talking about the search term.
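    The on-page properties listed above can be checked mechanically. What follows is only a rough sketch in Python, not a reconstruction of Google's algorithm (which nobody in this thread knows): the `ImageSignalParser` class, the `image_signals` function and the sample HTML are all made up for illustration, and the off-site link factor is not covered because it cannot be read from the page itself.

```python
from html.parser import HTMLParser


class ImageSignalParser(HTMLParser):
    """Collect the src, alt and title of every <img> tag, plus the page text."""

    def __init__(self):
        super().__init__()
        self.images = []      # one dict per <img> tag
        self.text_parts = []  # text fragments found between tags

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            self.images.append({
                "src": a.get("src") or "",
                "alt": a.get("alt") or "",
                "title": a.get("title") or "",
            })

    def handle_data(self, data):
        self.text_parts.append(data)


def image_signals(html, term):
    """Report, per image, which on-page signals mention the search term."""
    parser = ImageSignalParser()
    parser.feed(html)
    parser.close()
    term = term.lower()
    page_text = " ".join(parser.text_parts).lower()
    mentions = page_text.count(term)
    report = []
    for img in parser.images:
        # File names usually use "-" or "_" where a search term has spaces.
        src = img["src"].lower().replace("-", " ").replace("_", " ")
        report.append({
            "src": img["src"],
            "in_alt": term in img["alt"].lower(),
            "in_title": term in img["title"].lower(),
            "in_filename": term in src,
            "text_mentions": mentions,
        })
    return report


# Hypothetical page: one well-described image and one auto-named slide.
sample = """
<html><body>
<h1>Great egret at dawn</h1>
<p>A great egret wading in the marsh.</p>
<img src="/birds/great_egret.jpg" alt="Great egret" title="Great egret wading">
<img src="/slides/06_IMG_0021.jpg" alt="">
</body></html>
"""

for row in image_signals(sample, "great egret"):
    print(row)
```

    Run against a gallery page, a report like this would show at a glance which images carry the search term in none of their attributes, which is the situation described for the polar bear page.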
     
  6. David - many thanks for your comments here. They all make perfect sense of course. I'm fairly familiar with optimizing text pages for good Google placement and I've done that for both photo.net articles and articles on my own website. However I've pretty much ignored the same rules when it comes to images. Most of the images do have descriptive tags, but like you say, they're not prominent on the page, they rarely appear in the "alt" tags and they don't usually appear in the file names. Looks like if I want to be indexed, I'm going to have to restructure the gallery section of my site - which is BADLY in need of restructuring anyway since it "just grew" out of several different experiments and has no overall theme or consistency. They're not even a very representative sample of what's in my files. Clearly change is needed. Google is indeed a mystery at times. They shift around their ranking algorithms with no warning, leaving webmasters scratching their heads as to why pages can shift ranking positions overnight. There seems to be an industry selling "Google ranking secrets", but mostly it looks like snake oil. So thanks again for putting me on the right track. Time to get down to creating a "Google Friendly" gallery section!
     
  7. I've been surprised how some of my pictures are indexed by the Google image search.
    This one has dropped a few spots in the standings since I first discovered it.
    It used to be #1. It's now #6. Go to http://images.google.com and type: most danger. No quotes. Or go here: http://tinyurl.com/4k2k2
    I'm guessing images are starting to be indexed in part through this project: http://www.espgame.org/. I have never linked to the above image using anything resembling "most danger" so I'm at a loss otherwise. :)
     
  8. I had noticed that my site was getting abundant hits from Google searches, but you know, I have never had one sale that resulted from those hits. I suspected that they were just 'people' looking for images, for whatever reason. I did a Google search about 3 weeks ago and found that most of my images were there and that Google had cached them. I didn't like that idea, particularly since there was COPYRIGHT written everywhere on the pages and images, nor the fact that the images were being used, without sales, and could potentially be copied. So I did an investigation on the Google site and found that there are HTML codes that you can place on your web pages that will tell Google and other search robots not to copy or cache your images. I would rather visitors to my website come through the front door rather than a window and browse around. The front door of my site explains copyright. Based on sales, someone who is looking for images to just copy will use whatever means, while if someone is looking to buy, it doesn't matter that they use the front door. BTW, my web company is raising the rent in a month so I may not be there after August, and will find a new host. Marc Epstein www.marcepstein.homestead.com
     
  9. I have had a similar problem with Google recently. A week ago several of my image pages (big images with H1 titles, long descriptions, proper ALT tags and keywords for my own site search function) were still appearing near the top of Google searches. A search on "photo of atlanta skyline", for instance, had placed my Atlanta Skyline at Night photo at #1 for months, maybe even the past year or more. This weekend I noticed a sharp drop in traffic to my stock photography site and decided to check Google keyword searches and sure enough, ALL of my big image pages have been dropped from Google's index! A week ago I had virtually every one of the 2-3K pages indexed, and now they've all been dropped. In fact, when I try "site:slrobertson.com" and then "repeat the search with the omitted results included", Google reports 2,370 pages indexed, but will only list 154 of them. I haven't made any changes to most of my site in a long while and many pages have been indexed with decent PageRank for months. It's most frustrating, to say the least.
    Just for fun, I checked another site with similar content and structure: danheller.com. I swear Dan used to have most of his big image pages indexed as well, but I can't seem to turn one of them up today. I'm hoping that this is just a transient problem with Google's index and cache.
     
  10. Marc Epstein wrote:
    I had noticed that my site was getting abundant hits from Google searches, but you know, I have never had one sale that resulted from those hits. I suspected that they were just 'people' looking for images, for whatever reason.
    I wonder if the lack of sales is simply because your visitors don't realize that you sell prints. The pages where search-driven visitors land don't say anything about being able to buy a print, so adding that information might result in sales. Copywriting is an art, and one which you need to carefully hone by making modest changes to your web pages and tracking the corresponding impact on your conversion rate.
    Marc Epstein later wrote:
    I would rather visitors to my website come through the front door rather than a window and browse around.
    People can and will bookmark your pages and forward links around. You could theoretically enforce something like this using cookies, but it would be a fair bit of work. My intuition is that you're probably better off making it easy for potential customers to buy images directly from the page they land on. An alternative, which could work if you really are getting enormous numbers of users, would be to sell advertising.
    Scott Robertson wrote:
    A week ago several of my image pages (big images with H1 titles, long descriptions, proper ALT tags and keywords for my own site search function) were still appearing near the top of Google searches.
    You still are appearing near the top of Yahoo image searches for many of your chosen keywords. I suspect that the Google ranking change is because that engine now prefers a larger amount of natural prose than it used to. Unfortunately the folks at Google have a huge incentive to keep tweaking their algorithm: it forces businesses to run a pay–per–click advertising campaign to bring in traffic.
     
    David S wrote:
    You still are appearing near the top of Yahoo image searches for many of your chosen keywords. I suspect that the Google ranking change is because that engine now prefers a larger amount of natural prose than it used to. Unfortunately the folks at Google have a huge incentive to keep tweaking their algorithm: it forces businesses to run a pay–per–click advertising campaign to bring in traffic.
    I really hope that's not what is going on. If that were the case, I would have expected my photo pages to simply get buried in the search results for various keywords, but it seems Google has dropped all of those pages entirely from the index. Other pages on my site, like thumbnail gallery pages which don't really have much natural prose either, have not suffered any reduction in PageRank. Curiously, both my photo library site and Dan Heller's have had every single page under the "images" folder removed from the index. At least, that is my reading of the results of the following searches on Google:

    allinurl: images site:www.slrobertson.com = zero pages found
    allinurl: images site:www.danheller.com = zero pages found

    I still hope to be re-indexed on the next deep crawl of my site.
     
  12. Scott Robertson wrote:
    I really hope that's not what is going on.
    I've got one more guess about what is happening to your site: maybe Google is noticing that much of the text on your pages is nearly identical, due to similar formatting HTML and the standard header and footer, and therefore ignoring images on your pages. If I do a non-image search for site:slrobertson.com I get only a couple pages. If I tell Google to not filter very similar pages, I get thousands of pages.
    Could Google be applying some sort of similar filter for the removal of images on pages with similar text?
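    To get a feel for how alike two template-driven pages look to a text comparison, here is a small Python sketch. It is purely illustrative: nothing here is known about how Google's similar-page filter works, the standard library's `difflib.SequenceMatcher` is just a stand-in measure, and the header, footer and page texts are invented.

```python
from difflib import SequenceMatcher


def page_similarity(text_a, text_b):
    """Return a 0..1 similarity ratio between two pages' word sequences."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


# Hypothetical boilerplate shared by every photo page on a site.
header = "Home Galleries Order Prints Contact"
footer = ("All photographs are protected by copyright "
          "and may not be used without permission")

# Two photo pages that differ only in a short image title.
page_1 = f"{header} Atlanta skyline at night {footer}"
page_2 = f"{header} Piedmont Park in autumn {footer}"

# An article page with its own prose and no shared boilerplate.
article = "Notes on choosing a telephoto lens for bird photography in low light"

print("photo page vs photo page:", round(page_similarity(page_1, page_2), 2))
print("photo page vs article:   ", round(page_similarity(page_1, article), 2))
```

    When the only unique text on a page is a four-word title, the shared header and footer dominate the score, which is the effect the guess above depends on.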
     
  13. To disprove the last speculations, note that all the image pages of terragalleria.com are indexed by Google. A search with a string like "site:terragalleria.com" will only return a number of results capped at 1,000, but this does not mean that the other pages are not indexed.
     
  14. I have 2988 images that could be indexed by Google. Only about 300 of the images are indexed when I do a site:my-spot.com on Google Images, and the results shown are capped at about 100 images. After Google dismissed my copyright theory by telling me in an email, "Google's Image index doesn't exclude content based on embedded copyrights," I started to think that, because of some server problems I was having, Google had dropped me into the dead link bucket. But then I noticed that about 17 of the top 20 images under 'rose' were all new! I was in the top 20 (1st page) for 'rose' before this. So... now, a week later, what I really think is that big G just "flushed" the Images database so some fresh photos would come up... Frank http://www.my-spot.com
     
  15. David - Thanks for the feedback about my site. Good comments. But I wonder, if an individual was looking to buy a print or a company was looking for stock images, and they found something that they really liked using Google search, you would think that they would push the "Back" or "Home" button to find the source, eh? I would. Although the purchase option is not included on every one of my pages, it is on some (but your comment is well taken); the Home or Back buttons are there and will take you to a purchase page. I noticed through my tracker that certain images were getting hits from Google but I never got a query about them. I am curious about this now: are many of us getting sales through a random Google search for subjects? Is this a big part of your or others' sales? Marc
     
  16. My visits recently dropped by almost 1,000 per day, and I now realize why: all my images have been removed from Google's image search results as well. I know I am late to the discussion here, but I wonder if anyone has learned or heard anything new on this issue? A quick check shows that Dan Heller's and Scott Robertson's images are still missing, just like mine. Ironic to note that I found this thread by searching Google for "photos missing google image search".
     
  17. The one thing I noticed with all three sites dropped by Google (mine, Dan's and Scott's) is that we all have a sentence at the bottom of every page saying something about the photos being protected by copyright. We just added that line to our pages about a month or so ago, and I wonder if Google drops any photos on pages that it finds with such language. I'm removing the sentence for now to see if it will make a difference.
     
  18. Google actually wrote an email to me. I have copied it in here...
    Hi Frank,
    Thank you for your note. Google's Image index doesn't exclude content based on embedded copyrights. Google follows standard web protocol in not crawling sites with specific instructions in their robots.txt files. We also do not crawl pages that use meta tags to restrict access by robots. For more information on both of these features, please refer to http://www.google.com/webmasters/3.html#B3
    At present, we have no process for manually adding images to our results. We're working continuously to crawl more images to increase the quality and quantity of images in our index. If your images appear on a publicly available webpage, it's possible that we'll add them in the near future.
    Regards,
    The Google Team
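    The robots.txt behaviour mentioned in Google's reply can be tested offline before publishing a file. The Python sketch below uses the standard library's `urllib.robotparser`; the robots.txt contents and the example.com URLs are hypothetical, and the rules shown are only one possible way of keeping an image crawler out of an /images/ folder while leaving ordinary pages crawlable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the image crawler out of /images/,
# and leave everything open to all other robots.
robots_txt = """\
User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

checks = [
    ("Googlebot-Image", "http://www.example.com/images/rose.jpg"),
    ("Googlebot-Image", "http://www.example.com/gallery.html"),
    ("Googlebot", "http://www.example.com/images/rose.jpg"),
]

for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:16} {url} -> {verdict}")
```

    This only tells you how Python's parser reads the rules; a real crawler may interpret edge cases differently, so it is a sanity check rather than a guarantee.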
     
  19. Well, It's been over two weeks and none of my image pages or images have been put back in Google's index. Traffic to my site referred by Google is down more than 50%, but interestingly, traffic from Yahoo is up by at least that amount over the same period of time. In fact, Yahoo traffic has almost made up for the drop in Google traffic these past couple weeks. I have also noticed that Googlebot has crawled literally hundreds of my image pages during this period, but none of them have appeared in the live index. This makes me wonder if my pages are now triggering some new algorithm or filter, like a duplicate content filter - most image pages are substantially the same except for page title and image title. I have a lot of duplicate links (navigation links) on these pages, for instance, but one would think Googlebot is smart enough to figure out that navigation links shouldn't be considered duplicate content (i.e. spam), else most of the web would be penalized. Oh well, I'm still hoping to be picked back up eventually.
    Scott L. Robertson Photography
     
  20. Well, for those still listening on this thread, I have an update about my site's Google index issue. As I said in a previous post, Googlebot has been furiously spidering my site, including my large image pages, over the past few weeks, but until the last couple days, none of those image pages have been re-inserted into the index nor given any PageRank.
    Last week I experimented with one gallery, my Atlanta, Georgia stock photo gallery, primarily by renaming the slideshow photo pages from URLs like /images/usa/georgia/atlanta/slideshow/photo1.htm to URLs like /usa/georgia/photo.usga0001.htm. Every page under the /images directory had been dropped from the index, so this test was intended to see if shortening the URL path and removing the "images" directory name might make a difference. I didn't change the content of the pages other than to remove the prefix "Stock photo of" from the page "title" tags. I thought perhaps the repeated phrase might appear to be spam, though I suspect such page titles are common and not always used to spam the Google index.
    Anyway, this evening I found that the first three image pages of this slideshow sequence, beginning with this photo of the Atlanta skyline at night, have been put back in the index. They all have "Atlanta" in the page title and in the page text, as does the parent gallery page. If the remaining pages don't eventually appear as well, it looks like I may have to give up on Google for now or re-engineer my pages, though it may be futile to keep doing this every time there is an algorithm change with a popular search engine.
    Scott L. Robertson Photography/slrobertson.com
     
  21. Scott, thanks for sharing your thoughts and experiment. It is my understanding that, in an attempt to keep up with constantly changing weblogs, Google now spiders many/most websites far more often than it did in the past looking for changes, but makes visits that affect or establish page ranking far less often. I cannot find the site where I was reading about these changes, but it matches my experience; my new pages show up in search results, but they don't have a page rank yet. Scott, despite the changes you made to your Atlanta page, it does not appear that any of your images (including the Atlanta ones) have been added to the image search portion of Google. Maybe I misunderstood your post, or were they in there briefly?
     
  22. Ron: At one time I had a thousand or more images from my site indexed by Google's image search engine (images.google.com) but all were dropped around the time when all of my "slideshow" pages were dropped from the text index. My recent experiment - shortening the URL of my Atlanta, GA photo gallery slideshow pages - has proved acceptable to Google. All of the image pages in this one slideshow are showing up in SERPS and are cached in the index. I'm also back at the #1 position for a search on 'photo atlanta skyline'. It would be a lot of work to transform my entire site to this new URL scheme, so I'm going to wait a while to see if Google doesn't just pick all of my slideshow pages back up eventually. Who knows, maybe I'd get dropped again after making such a change site-wide!
    Scott Robertson
    www.slrobertson.com
     
  23. I understand Google updated its image index about a week ago - I did some quick checking and see that not only myself, but Dan Heller, Scott Robertson and Frank Holub all have images now showing in the results.
    Ron
    Niebrugge Images - Stock Photos
     
