Frequently, a set of objects has to be evaluated by a panel of assessors, but not every object is assessed by every assessor. A problem facing such panels is how to take into account different standards amongst panel members and varying levels of confidence in their scores. Here, a mathematically-based algorithm is developed to calibrate the scores of such assessors, addressing both of these issues. The algorithm is based on the connectivity of the graph of assessors and objects evaluated, incorporating declared confidences as weights on its edges. If the graph is sufficiently well connected, relative standards can be inferred by comparing how assessors rate objects they assess in common, weighted by the levels of confidence of each assessment. By removing these biases, "true" values are inferred for all the objects. Reliability estimates for the resulting values are obtained. The algorithm is tested in two case studies, one by computer simulation and another based on realistic evaluation data. The process is compared to the simple averaging procedure in widespread use, and to Fisher's additive incomplete block analysis. It is anticipated that the algorithm will prove useful in a wide variety of situations such as evaluation of the quality of research submitted to national assessment exercises; appraisal of grant proposals submitted to funding panels; ranking of job applicants; and judgement of performances on degree courses wherein candidates can choose from lists of options.
- model comparison