stevenseelig Posted September 21, 2006

Does anyone know what the mean and standard deviation are for aesthetics and originality ratings on PN? Is it possible for the rater's ID to be attached to the ratings they provide, so the merit of the rater can be judged, or at least to attach their mean and SD for A and O ratings?

Thanks, Steven
dem_photos Posted September 21, 2006

Steven: It's entirely irrelevant, but you might enjoy this: http://www.photo.net/bboard/q-and-a-fetch-msg?msg_id=00Hr8P I seem to remember Brian Mottershead dropping the occasional statistic (he was fond of pointing out that, on average, direct ratings were significantly higher than anonymous ratings), but I don't remember specifics, and I don't know if anyone is keeping track since he left.
byronlawrence Posted September 21, 2006

Hahaha... yeah, that's great stuff. You can look up a rater and see the average of the ratings given and the number of ratings given. Maybe they will do something like that for the whole site: include a statistic next to your rated photo telling you what percentile it lies in, or next to your average ratings (on all your photos) what percentile you are at compared to everyone else's, or an improvement/popularity growth curve. Interesting ideas, for sure.
stevenseelig Posted September 21, 2006 (Author)

My main reason for the initial post is that I have been involved in an FDA-regulated environment, and as part of the quality control process we have used "subjective human ratings of the appearance" of something. Over the last 15 years, it has become quite apparent that every individual rater tracks pretty much along their own average, which may be substantially different from other raters'. Thus, to "know" the meaning of a rating there are two choices: either train everyone to the same standard, or normalize each rater against themselves. The first path on PN would be ridiculous and a terrible idea, because diversity is good. But it would be computationally trivial to normalize each rater to their own pattern and then express how many standard deviations above or below their mean value their rating of a particular picture was. Alternatively, provide an estimate of the percentile for that particular individual... probably a better approach, since mean/SD assumes a normal distribution, while a percentile does not. Then use the percentiles to calculate the controversy index, which would likely be more meaningful.

But beyond the numbers, I believe there is merit to attaching the rater's PN ID, which would be even easier to implement, I think. I would like to understand why someone rates my pictures either high or low... that is the way I learn... so with an ID, I could ask the person the reason behind their rating: was it the fuzziness of the back eye, or the missing tooth in the grin, or that B&W might have been better than color? If PN wants to set itself apart from much of the rest of the garbage on the internet, I think this would be a good start. Just IMHO. Thanks for the link re the controversy index... similar concepts are involved.

Oops... a simple number... for the whole site, for the past month or year or whatever time period, is there an average number for A and O with their respective SDs? Someone at PN must know... help!

Steven
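[The per-rater normalization described above really is computationally trivial. A minimal sketch, assuming each rater's rating history is available as a simple list of numbers (the `normalize` function and the data are hypothetical, not anything PN implements):]

```python
from statistics import mean, stdev
from bisect import bisect_left

def normalize(history, new_rating):
    """Express a rating relative to the rater's own history.

    history: this rater's past ratings (hypothetical example data).
    Returns (z_score, percentile):
      z_score    - SDs above/below the rater's own mean (assumes
                   roughly normal ratings)
      percentile - share of past ratings strictly below the new one,
                   which needs no distributional assumption
    """
    mu = mean(history)
    sigma = stdev(history)
    z = (new_rating - mu) / sigma if sigma else 0.0
    ranked = sorted(history)
    pct = 100.0 * bisect_left(ranked, new_rating) / len(ranked)
    return z, pct
```

[For a rater whose history is `[3, 4, 4, 5, 5, 6]`, a new rating of 6 comes out roughly 1.4 SDs above their mean and above about 83% of their past ratings, which says more than the raw 6 does.]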
byronlawrence Posted September 21, 2006

Well, one problem with normalizing each rater is that too many people only rate pictures they like or feel are good, and skip over other images. If you normalized all these ratings, you would end up with a description suggesting that a photo is not as good as it might be. You might even end up with really good pictures carrying rating descriptions (for lack of a better term) similar to those of obviously poorer quality.
byronlawrence Posted September 21, 2006

...and an addition to my previous post: if you could ensure that a rater rated EVERY photo that popped up in front of them, then you might be able to apply some useful normalization.
stevenseelig Posted September 21, 2006 (Author)

Well, that could be contained within some guideline: rate whether you like it or not. But at least for myself, I try to keep my rating average close to 4, which by necessity means I hand out a few 1s and 2s to compensate for the 6s and 7s... and in fact some pictures do not qualify for anything better than a 1 or 2. I'm not sure you have to force ratings on everything; if someone hands out 6s only on high-quality pictures, then that rater will have to ponder what to do when they find something they really like, because a 7 might not look all that impressive.

Anyway, I think it is an experiment worth trying, if PN would be willing to implement it, and let's see where it goes. Maybe in the final analysis it is a terrible idea, but I think not. I don't know any of the movers and shakers at PN, but hopefully someone does... and perhaps an inquiry can be made.

Knowing individual ratings and the PN IDs attached to those ratings would allow us to focus on certain responses and exclude others based on our individual assessment of the raters... and this strategy would not require a change in the rating system at all, just display of the data. All of this is IMHO.

Steven
wood Posted September 21, 2006

Byron,

Following your last comment, perhaps it's not a bad idea to disable the "skip" (>>) button on the rate queue pages. Every photo would have to be rated before proceeding to the next. This might also help raters develop a more "normal" distribution in their ratings.

I like the idea of normalizing ratings based on the rater's average. Is there a downside to this?
bennyboy Posted September 22, 2006

I think disabling the skip button could be a valid way forward, BUT only if the O & A values are amalgamated into a composite 'rating' value (add O and A, divide by 2, round up or down by convention; not hard to do). Anyhow, I've said it many times before.
stevenseelig Posted September 22, 2006 (Author)

Well, I think I had asked a simple question: what are the actual average values of aesthetics and originality on PN, along with their standard deviations? Sadly, I have not seen the answer to that question. Hoping this is not a voice lost in the woods.

Steven
root Posted September 23, 2006

The site-wide average is well above 4.0 - closer to 5.0 as I recall - for both A and O. The reason, as stated in many threads, is that many people only rate images they like. Therefore, the statistic that you're interested in has no value, especially since some raters, like myself, have changed their ratings behavior for a variety of reasons during their time on this site.
stevenseelig Posted September 23, 2006 (Author)

Carl, thanks for the note. What I have learned from your simple comment is that when I receive a 4.0 rating, that is well below the average and I have much room for improvement. I have also learned that when I rate something a 4.0, within the scheme of things I am giving a poor score rather than an average score. I must wonder how many people on the site would have learned the same thing I did from this simple statistic. I am curious as to why you have changed your rating behavior.

I suspect, but do not know, that if ratings were normalized by individual, then people might want to provide a spectrum of responses. If a person only rates things they like, then their average score will be high, say 5.5, so when they rate something a 5, which on an absolute scale looks pretty good, they are really saying they don't like it as much as other pictures they have seen. Providing a normalized value, like SDs above or below the individual rater's average, would provide insight into the true meaning of the rating. For example, my average scores are around 4.0, so when I give a 5, my intent is to tell the uploader I like their picture; but on an absolute scale it might appear as if I am only giving an average rating. So normalization is a strategy, and perhaps both could be displayed under the pictures (the absolute value and some variant of normalization). I suppose some sort of function could allow raters to reset their values if they decide to change their rating strategy. Having said all that, another solution path might be to simply provide the PN ID of the rater along with their rating. As the uploader, this would allow me to look at the rater's own pictures and decide what level of importance to attach to their rating.

So you understand where I am coming from: I think PN is fantastic and a wonderful resource for photographers from different parts of the world and with different skill sets. I joined for a number of reasons: getting meaningful feedback on my own work was an important one, but I also look through the top-rated pictures pretty frequently in an effort to understand how people perceive good photographs. Critiques and suggestions would likely be more valuable to me than ratings, but I did a small experiment the other day and decided I was going to write a critique on each picture. In about 20-30 minutes I got through 3-5 pictures... so meaningful critiques take a lot of time, but IMHO are much more enriching. I hope some of what I suggest could be incorporated into PN, although I suspect you have heard it all before and for various reasons decided not to implement it.

Regards, Steven
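[The arithmetic in the post above can be made concrete. This is a minimal sketch using an invented common standard deviation of 0.8, just to show how the same absolute 5 reads differently from a generous rater (mean 5.5) and a tougher one (mean 4.0); the function name and all the figures are hypothetical:]

```python
def relative_score(rating, rater_mean, rater_sd):
    """How many standard deviations a rating sits above or below
    the rater's own average (the per-rater normalization proposed
    in this thread)."""
    return (rating - rater_mean) / rater_sd

# Two hypothetical raters both hand out an absolute 5:
generous = relative_score(5, rater_mean=5.5, rater_sd=0.8)  # -0.625: below their norm
tough = relative_score(5, rater_mean=4.0, rater_sd=0.8)     # +1.25: well above their norm
```

[The identical raw number carries opposite meanings once normalized, which is exactly the information the absolute scale hides.]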
root Posted September 23, 2006

Forget trying to interpret numbers, one at a time or collectively.
stevenseelig Posted September 23, 2006 (Author)

OK (although I am not convinced that is a correct conclusion)... how about attaching PN ID numbers to the ratings?

Steven
stevenseelig Posted September 24, 2006 (Author)

Carl, in reflecting further on your position: if the numbers do not mean anything, then using ratings to rank pictures into Top Photos, and displaying photos ranked from high to low, seems terribly paradoxical. If the numbers don't mean anything, then get rid of the Top Photos sections and quit displaying pictures as if some are better or worse based on the ranking. Personally, when I look at the highly ranked pictures, I think they are clearly better, so I guess I am back to the notion that PN needs to work at improving the rating system... and I reject your notion, sorry to say, that the numbers do not mean anything. So PN, what can we do to improve the rating methodologies?

Steven
root Posted September 24, 2006

I didn't say they mean nothing. I said forget trying to interpret them as information that you can use about your own images. The numbers are a reflection of the discrimination of the raters. I suspect that in time, many of them will become bored with overly saturated sunsets, faux filters, and even some of the subjects that are done quite well but are over-represented, like birds.
fhmillard Posted September 24, 2006

Using the metrics suggested assumes that little or no turbulence exists in the rating system's input -- the raters. Since ratings are "subjective", we operate without a reference standard for each category on the rating scale; this could be attained statistically, but I cannot think of any compelling reason to do so, since rating turbulence might be high due to the inclusion of new raters and changes in the rating criteria of current and older raters. I agree that the rater's PN link should be included, because I might want to rate them.
stevenseelig Posted September 24, 2006 (Author)

I don't have any primary data, but PN could gather it and do the experiment in the background to look at performance. While there may be fluctuations (turbulence), my experience is that in large sampling systems, such as PN, those fluctuations rarely cause a major perturbation in the output. With large sampling, a large number of raters would have to shift in the same direction.

Carl, I am very confused by your thoughts: I can't use the ratings to assess my images, but PN can use the information in a useful way? You have now provoked my curiosity. Can you expand on the notion? Or if there is another thread that would answer this, would you be so kind as to provide the link?

Frank... a very simple solution is to attach PN IDs to the ratings. At least in some cases I would spend the effort to look at that rater's information and pictures and make my own assessment of the rater... and in rare cases I might ask for additional thoughts. I think this simple step, outside the complexities of metrics, would provoke dialogue and conversation among PN users. Just IMHO.

Steven
stevenseelig Posted September 24, 2006 (Author)

Carl... if I could learn to type and proofread, perhaps I might make more sense. "You have not provoked my curiosity." should have been "You have NOW provoked my curiosity."

**post has been corrected by moderator**
fhmillard Posted September 24, 2006

Assuming you have not already done this, try resubmitting one of your highly rated photos (high 5s or better) for critique again. Wait about a month between submissions so that the photos do not appear in the critique forum at the same time; you might want to do this several times. Then compare rating distributions. Since PN provides weighted rating distributions, you would be able to (at least):

1. Determine a "reference" rating distribution from all the ratings.
2. Determine how the individual sets of rating distributions differ in shape.
3. Use ANOVA to determine whether each individual rating set could have come from the same set of raters, or whether the same criteria were used for rating -- I do not know how to assume that the same raters used different criteria for each submission. At least, using ANOVA, you would be able to determine whether each submission was significantly different from the total rating set.

You could also do this with a photo of low rating.
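[Step 3 of the experiment above could be run in a few lines. This is only an illustrative sketch: the rating lists are invented, and SciPy's `f_oneway` is one standard way to perform the one-way ANOVA Frank describes, not anything PN provides:]

```python
# One-way ANOVA across repeated submissions of the same photo,
# asking whether the rating distributions differ more than chance
# would suggest. All three lists are hypothetical example data.
from scipy.stats import f_oneway

submission_1 = [5, 6, 5, 6, 5, 7, 6]   # first critique round
submission_2 = [5, 5, 6, 5, 6, 6, 5]   # same photo, a month later
submission_3 = [4, 5, 5, 6, 5, 5, 6]   # a third round

f_stat, p_value = f_oneway(submission_1, submission_2, submission_3)
if p_value < 0.05:
    print("Submissions differ significantly; rating behavior may have shifted")
else:
    print("No significant difference between submissions")
```

[With these made-up numbers the between-round differences are small relative to the spread within each round, so the test would not flag a change; real data from the experiment could of course come out either way.]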
robert_g.1 Posted September 24, 2006

Steven, you are making good scene. Attaching IDs to ratings can only improve the rating system. Or perhaps scrap the whole system and start a new one based on critiques. Either way would be a huge improvement. When there is a negative rating system, people will be turned off, and in time it will lead to the site's demise. I've seen other web sites go down and almost out because of a negative rating system. People get fed up with it.
robert_g.1 Posted September 24, 2006

I need spell check: "sense", not "scene". But I like scene better!
stevenseelig Posted September 26, 2006 (Author)

Frank, have you ever done such an experiment, and if yes, what was the outcome?

Thanks, Steven
fhmillard Posted September 26, 2006

No, not with these data. I just thought you might want to test for yourself.