Inter-Rater Reliability

The important point here is the necessity for independence of the raters: one rater must be blind to the other's score. Otherwise, we would have a measure of the degree to which one person can accurately copy from another, which tells us much about the raters but little about the subject.

Tests for Inter-Rater Reliability

Example of Poor Inter-Rater Reliability
  • If one rater gives a score of 7 and another a score of 12, we wouldn't be sure which, if either, score is correct (a way of quantifying such disagreement is sketched below).
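To make the example above concrete, disagreement between raters is usually summarized with a reliability coefficient rather than judged by eye. The sketch below computes an intraclass correlation coefficient, ICC(2,1) (two-way random effects, absolute agreement, single rater), for two raters scoring the same subjects; all of the ratings, including the 7-versus-12 pair, are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

# Hypothetical ratings: rows are subjects, columns are the two independent
# raters; the 7-versus-12 pair from the example is the first row.
ratings = np.array([
    [7.0, 12.0],
    [10.0, 11.0],
    [14.0, 15.0],
    [5.0, 9.0],
    [16.0, 14.0],
    [8.0, 8.0],
])

n, k = ratings.shape                 # n subjects, k raters
grand_mean = ratings.mean()
subject_means = ratings.mean(axis=1)
rater_means = ratings.mean(axis=0)

# Two-way ANOVA sums of squares (Shrout & Fleiss decomposition).
ss_total = ((ratings - grand_mean) ** 2).sum()
ss_subjects = k * ((subject_means - grand_mean) ** 2).sum()
ss_raters = n * ((rater_means - grand_mean) ** 2).sum()
ss_error = ss_total - ss_subjects - ss_raters

ms_subjects = ss_subjects / (n - 1)
ms_raters = ss_raters / (k - 1)
ms_error = ss_error / ((n - 1) * (k - 1))

# ICC(2,1): two-way random effects, absolute agreement, single rater.
icc_2_1 = (ms_subjects - ms_error) / (
    ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
)
print(f"ICC(2,1) = {icc_2_1:.2f}")
```

For nominal (categorical) ratings, a chance-corrected index such as Cohen's kappa is the more common choice; Zapf et al. (reference 2) discuss which coefficients and confidence intervals are appropriate in that case.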
What to do with poor inter-rater reliability

How to improve inter-rater reliability:
  • The solution is usually more training of the raters.
  • A common strategy is for the raters to independently evaluate about 10 people who will not be part of the final sample, and then to discuss the items on which they disagreed: were the criteria unclear, ambiguous, or poorly stated? (A sketch of such a per-item agreement check follows this list.)
  • This is repeated with a new group until satisfactory inter-rater reliability is achieved.
  • If the study lasts more than a few months, it is usually a good idea to re-evaluate inter-rater reliability toward the middle to see whether there has been any slippage in the level of agreement.
  • Problems arise with this method if the scale requires interviewing the subject.
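As a rough illustration of the pilot step above, the sketch below checks item-by-item agreement for two raters who have each scored 10 pilot subjects. The 4-item categorical scale, the ratings themselves, and the 80% agreement threshold are all hypothetical; the point is simply to flag the items whose scoring criteria need to be discussed.

```python
import numpy as np

# Hypothetical pilot: two raters independently score 10 subjects on a
# 4-item scale with categories 0-3. Rows are subjects, columns are items.
rater_a = np.array([
    [2, 1, 3, 0], [1, 1, 2, 0], [3, 2, 3, 1], [0, 0, 1, 0], [2, 3, 2, 1],
    [1, 2, 3, 2], [3, 3, 2, 0], [2, 1, 1, 1], [0, 2, 3, 0], [1, 0, 2, 2],
])
rater_b = np.array([
    [2, 2, 3, 0], [1, 1, 1, 0], [3, 2, 3, 2], [0, 1, 1, 0], [2, 3, 2, 1],
    [1, 2, 2, 2], [3, 3, 2, 1], [2, 1, 1, 1], [0, 2, 3, 0], [1, 1, 2, 2],
])

# Assumed cut-off for "satisfactory" agreement; set it to whatever the study requires.
agreement_threshold = 0.80

# Percent agreement per item: the proportion of pilot subjects on whom
# the two raters chose the identical category.
per_item_agreement = (rater_a == rater_b).mean(axis=0)

for item, agreement in enumerate(per_item_agreement, start=1):
    flag = "" if agreement >= agreement_threshold else "  <- discuss the criteria"
    print(f"Item {item}: {agreement:.0%} agreement{flag}")
```

Simple percent agreement is used here only because it is easy to read with 10 pilot subjects; a chance-corrected coefficient can be substituted once the final sample is rated.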

References

1. Streiner DL. Statistics Commentary Series: Commentary #15-Reliability. Journal of Clinical Psychopharmacology. 2016;36(4):305-307. doi:10.1097/JCP.0000000000000517
2. Zapf A, Castell S, Morawietz L, Karch A. Measuring inter-rater reliability for nominal data - which coefficients and confidence intervals are appropriate? BMC Medical Research Methodology. 2016;16:93. doi:10.1186/s12874-016-0200-9

Citation

For attribution, please cite this work as: