“Search user uploaded photos” feature not using an up-to-date index?

Discussion in 'Photo.net Site Help' started by duolian, Jun 27, 2002.

  1. The “Search user uploaded photos” feature is a potentially useful one, but I think there is a problem with it. That feature appears not to respond to changes made by users in the information attached to their images.
    I know this because I have some photos that I have changed the caption on, using the “Edit Image Info” feature, to correct misspellings or to add or delete information. When I use the “Search user uploaded photos” to search for these photos using terms that were added to the caption after the photo was first uploaded, it does not “hit” them. However, if I search using terms that were part of the caption but were subsequently removed from it, it does “hit” them. I get these results even when the changes to the captions were made many months ago.
    I assume that, like most search engines, the “Search user uploaded photos” feature is not actually searching in the database per se, but in an index created from the database. With most search engines, that index is periodically rebuilt, so that new contents get included and old contents that have been deleted get removed. What the results described above tell me is that either the index for the “Search user uploaded photos” feature is not being rebuilt properly, or else it is set up so that it is never rebuilt, but only uses the information that was attached to the photo when it was uploaded.
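
To illustrate the symptom described above, here is a minimal sketch of a search that reads from an index snapshot rather than from the live captions. It is not photo.net's actual implementation; the function names, the word-splitting rule, and the sample caption are all hypothetical.

```python
from collections import defaultdict


def build_index(captions):
    """Map each lowercase caption word to the set of photo ids that contain it."""
    index = defaultdict(set)
    for photo_id, caption in captions.items():
        for word in caption.lower().split():
            index[word].add(photo_id)
    return index


def search(index, term):
    """Look the term up in the index only; the captions are never consulted."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical caption as it was when the photo was uploaded.
captions = {1: "Sunset over the harbor"}
index = build_index(captions)  # snapshot taken at upload time

# The owner later edits the caption, but the index is not rebuilt.
captions[1] = "Sunrise over the harbor"

stale_hits_removed_word = search(index, "sunset")   # still found
stale_hits_added_word = search(index, "sunrise")    # not found

# Rebuilding the index from the current captions fixes both results.
index = build_index(captions)
fresh_hits_removed_word = search(index, "sunset")
fresh_hits_added_word = search(index, "sunrise")
```

With the stale snapshot, the removed word still hits and the added word misses, which matches the behavior reported in this post.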
    The “Search user uploaded photos” feature is a particularly important one to have available at photo.net, because there is no substitute for it. I generally don’t use the main photo.net search engine because it’s clunky and inflexible and it is much easier and more productive to use Google and set it to return only pages from the www.photo.net domain. However, Google only indexes static content pages at photo.net, so it does not get any of the uploaded photos and the information (caption, etc.) accompanying them. Thus, photo.net’s “Search user uploaded photos” feature is the only tool available to search in the photo database.
    Can the “Search user uploaded photos” feature be changed so that it uses a relatively current index that reflects changes made by users in the information attached to their images?
     
  2. Dave, I don't know how often those indexes are updated. Certainly, I would expect to see the results of a change in Edit Image Info to be reflected immediately.
     
  3. Brian:
    There are several photos in my portfolio that the search engine does not find when I search using words that are currently part of the caption, but that it does find using words that used to be in the caption but were taken out of the caption months ago. There are other oddities as well -- I have 38 photos in my portfolio, but when I search using my last name, the search engine tells me that there are 32 hits -- and it actually only displays 30 of them.
    Things are actually worse for some other folks -- like you. You have 27 photos in your photo.net portfolio. However, when I search using "Mottershead", the results page tells me there is exactly 1 hit ("Blue Table And Chairs").
    Also, when I search using terms that appear in the captions of some of your photos -- "Lully" ("Rising Storm, Lully"), "checkerboard" ("Sidewalk Checkerboard"), and "artifact" ("Artifact of the Old Economy"), for example -- the search engine does not find the photos. In other words, those captions seem not to have made it into the index.
    Or how about this: try to find the current POW, which is captioned "Rajasthani women", by searching for "Rajasthani". The "Search user uploaded photos" search engine doesn't find it.
    Something is clearly not working correctly with that search engine.
     
  4. I discovered that the indexing of photos was disabled after around 100,000 photos had been indexed. Since we now have around 350,000, that was quite a while ago. I am concerned that indexing all 350,000 will (a) be quite expensive in resources and (b) not be very useful, since a lot of these photos are neither good nor of any great interest.

    I am thinking of indexing only the following photos: (1) anything posted within the last 60 days; (2) any photo with at least 5 ratings averaging 12 (O+A total) or higher.
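
The two proposed criteria can be sketched as a simple filter. This is only an illustration: the `Photo` class and its fields are hypothetical, and it reads "O+A total" as one combined originality plus aesthetics total per rating, which is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Photo:
    posted: datetime
    # One combined total per rating (assumed to be originality + aesthetics).
    rating_totals: list = field(default_factory=list)


def should_index(photo, now, max_age_days=60, min_ratings=5, min_average=12.0):
    """Return True if the photo meets either of the two proposed criteria."""
    # Criterion 1: posted within the last 60 days.
    if now - photo.posted <= timedelta(days=max_age_days):
        return True
    # Criterion 2: at least 5 ratings whose totals average 12 or higher.
    count = len(photo.rating_totals)
    if count >= min_ratings:
        return sum(photo.rating_totals) / count >= min_average
    return False


now = datetime(2002, 7, 1)
recent_unrated = Photo(posted=datetime(2002, 6, 1))
old_well_rated = Photo(posted=datetime(2001, 1, 1), rating_totals=[12, 13, 12, 14, 12])
old_too_few_ratings = Photo(posted=datetime(2001, 1, 1), rating_totals=[14, 14])
old_low_rated = Photo(posted=datetime(2001, 1, 1), rating_totals=[10, 10, 10, 10, 10])
```

Further criteria, such as a minimum number of comments, would fit as additional checks in the same function.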
     
  5. Brian, could you include number of comments > 3 as a criterion? This would weed out most of the junk while keeping in photos that might get a low score because they're controversial.

    Will you use the folder name in the indexing process, and have you considered asking photographers to categorise their work or provide key words? The titles often don't give much idea of the content, and if the photo is obviously of a particular subject there may not seem to be any need to mention this in the caption. Asking for key words at this stage would be a good way of checking who is remaining active in the community.

    Finally the "score" (first column) seems an obscure parameter and a fairly useless one.

    I must say I'm absolutely delighted that you are re-jigging the search process because its present weakness is the main reason I have been spending more time on PhotoSIG recently.

    What I would really like is a special category comprised of: family snaps, flowers, cars, pets and 9/11 - so I could EXCLUDE it.

    Thanks for an otherwise great site.
     
  6. Brian, indexing 400,000 images may be resource intensive on the first run, but after that the index will be updated only for new/changed images. The indexing engine will not need to rebuild the entire index every time. And instead of using a kind of fascist policy to exclude lower rated images, is it such a big deal to index all images and add an option to sort by rating?
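
The idea of updating the index only for new or changed images can be sketched as follows. This is a toy in-memory index, not how Oracle or photo.net's engine works; the class and method names are hypothetical.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy word index that is updated one photo at a time, without a full rebuild."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of photos containing it
        self.indexed_words = {}           # photo id -> words currently indexed

    def update(self, photo_id, caption):
        """Index a new photo, or re-index an edited one, touching only its words."""
        new_words = set(caption.lower().split())
        old_words = self.indexed_words.get(photo_id, set())
        for word in old_words - new_words:  # words removed by the edit
            self.postings[word].discard(photo_id)
        for word in new_words - old_words:  # words added by the edit
            self.postings[word].add(photo_id)
        self.indexed_words[photo_id] = new_words

    def remove(self, photo_id):
        """Drop a deleted photo from every posting list it appears in."""
        for word in self.indexed_words.pop(photo_id, set()):
            self.postings[word].discard(photo_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


idx = IncrementalIndex()
idx.update(1, "old harbor")
idx.update(2, "harbor lights")
idx.update(1, "new harbor")  # caption edit: only photo 1 is re-indexed
```

After the edit, searching for "old" finds nothing, "new" finds photo 1, and "harbor" still finds both photos. The cost of each change depends on the size of one caption, not on the total number of photos.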

    I really do appreciate what you have done for this site, but I think you should make fixing the search button a higher priority.

    Thank you.
     
  7. pmj

    I think it's not just the Oracle performance required to (re)build the index that's an issue here.

    The index must be stored somewhere (i.e., requires disk space), needs to be updated for every insert/edit/delete of a photo (i.e., requires Oracle power several thousand times each day) and will be used (duhh!) by the members (i.e., requires even more Oracle power each day).

    One could argue that a working search index would increase the use of the gallery, resulting in more subscriptions, but I'm not too sure about that.
     
  8. > The index must be stored somewhere (i.e., requires disk space)

    A good point, but even if you allow a generous 1 KB of text to be indexed per image that will add a small fraction to the disk space already used by that image. That would be 400 MB for 400,000 images, but in practice the index will be smaller than that. I guess you can have a rough estimate by looking at how much space is taken by the current outdated index for 100,000 photos.

    > needs to be updated for every insert/edit/delete of a photo (i.e., requires Oracle power several thousand times each day)

    Several thousand per day translates to a few times per minute - no sweat for Oracle engine.

    > and will be used (duhh!) by the members (i.e., requires even more Oracle power each day).

    "featuring 430,853 images" is in the photo.net headline. I suggest adding "(but you can't search them)".

    > One could argue that a working search index would increase the use of the gallery, resulting in more subscriptions, but I'm not too sure about that.

    I agree. It's kinda disappointing when there is a great site like photo.net that doesn't have a good search engine.
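
The rough figures in the post above can be checked with a few lines of arithmetic. The sketch treats 1 KB as 1,000 bytes and uses 5,000 changes per day as a stand-in for "several thousand"; both are assumptions made for illustration.

```python
# Upper-bound estimate of indexed text, using the figures quoted in the post.
images = 400_000
bytes_per_image = 1_000  # "a generous 1 KB", taken here as 1,000 bytes
index_megabytes = images * bytes_per_image / 1_000_000

# Update rate: 5,000 per day is an assumed value for "several thousand".
changes_per_day = 5_000
minutes_per_day = 24 * 60
changes_per_minute = changes_per_day / minutes_per_day

print(f"Index text upper bound: {index_megabytes:.0f} MB")
print(f"Index updates per minute: {changes_per_minute:.1f}")
```

Under these assumptions the text to be indexed comes to 400 MB at most, and the update rate is between three and four changes per minute.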
     
