Intra-class Correlation Coefficient (ICC)


The intra-class correlation coefficient (ICC) is a modified Pearson correlation coefficient that indexes both the degree of correlation and the agreement between two measurements.

The ICC is the most comprehensive measure of reliability, since it depends both on the level of agreement (like the kappa statistic) and on the correlation between the two measures (like the Pearson correlation coefficient).

It is sensitive, for example, to the extent to which subjects (individuals) keep their ranking order across repeated measurements. Moreover, it may indicate the ability of an experimental method to detect and measure systematic differences between subjects. This ability is limited, since those differences may be more or less masked by individual variation between the repeated measurements (ie, measurement noise).

How to use this test

Dr. Monroe stated that the ICC is the most comprehensive form of reliability. If the ICC is high, then you do not need to examine the other statistical results. If the ICC is low, then the other statistical results should be examined to identify the source of the unreliability.

Models

One-Way Random-Effects Model

  • Either the participants or the evaluators are treated as a random sample
  • This model is not commonly used

In this model, each subject is rated by a different set of raters who were randomly chosen from a larger population of possible raters. In practice, this model is rarely used in clinical reliability analysis, because the majority of reliability studies involve the same set of raters measuring all subjects. An exception is multicenter studies, in which the physical distance between centers prevents the same set of raters from rating all subjects. Under such circumstances, one set of raters may assess a subgroup of subjects in one center and another set of raters may assess a subgroup in another center; hence, the 1-way random-effects model should be used in this case.
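
Using the notation introduced in the Calculation section below, the one-way random-effects model and its population ICC can be sketched as follows (a sketch of the standard formulation, not a quotation from the cited sources). Note that the rater effect is not modeled separately, so any rater contribution is absorbed into the noise term vij:

\[ x_{ij} = \mu + r_i + v_{ij}, \qquad \rho_1 = \frac{\sigma_r^2}{\sigma_r^2 + \sigma_v^2} \]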

Two-Way Random-Effects Model

  • Both the subjects (patients) and the raters (researchers) are treated as random samples from larger populations

Two-Way Mixed-Effects Model

  • Most common model in reliability studies
  • Raters are fixed (not random), while subjects are treated as random (see the model equation sketched below)
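
Both two-way models decompose a score xij in the same way; they differ only in whether the rater bias cj is treated as random (Model 2) or fixed (Model 3). A sketch using the notation from the Calculation section below:

\[ x_{ij} = \mu + r_i + c_j + rc_{ij} + e_{ij}, \qquad v_{ij} = rc_{ij} + e_{ij} \]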

Type

  • Consistency
  • Absolute agreement

Consistency

  • Evaluates the degree of linear relationship between the raters' scores

Absolute Agreement

  • Evaluates how close the raters' scores are to each other
  • Not interested in the linear relationship itself
  • Focuses on whether raters give close or identical ratings (the corresponding population ICC formulas are sketched below)
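
The difference between the two definitions shows up in the population ICCs of the two-way random model (Model 2), sketched here with the variance components defined in the Calculation section below: the rater-bias variance σc² counts as error only under absolute agreement.

\[ \rho_{2C} = \frac{\sigma_r^2}{\sigma_r^2 + \sigma_v^2}, \qquad \rho_{2A} = \frac{\sigma_r^2}{\sigma_r^2 + \sigma_c^2 + \sigma_v^2} \]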

ICC Forms

There are 10 forms of ICC, based on the “Model” (1-way random effects, 2-way random effects, or 2-way mixed effects), the “Type” (single rater/measurement or the mean of k raters/measurements), and the “Definition” of the relationship considered to be important (consistency or absolute agreement).
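
Statistical software typically reports several of these forms at once. As an illustration only, the following is a minimal sketch using the third-party Python package pingouin (assumed installed); the data frame and its column names are invented for the example:

import pandas as pd
import pingouin as pg

# Long-format data: one row per (subject, rater) score; values are invented.
df = pd.DataFrame({
    "subject": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "rater":   ["A", "B"] * 5,
    "score":   [7.0, 8.0, 5.0, 6.0, 9.0, 9.0, 4.0, 5.0, 8.0, 8.0],
})

# pingouin returns a table with ICC1, ICC2, ICC3 (single measures) and
# ICC1k, ICC2k, ICC3k (average measures), covering the forms described above.
icc_table = pg.intraclass_corr(data=df, targets="subject",
                               raters="rater", ratings="score")
print(icc_table[["Type", "Description", "ICC", "CI95%"]])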

Choosing the Right ICC Form

There are 4 guiding questions

  1. Do we have the same set of raters for all subjects?
  2. Do we have a sample of raters randomly selected from a larger population or a specific sample of raters?
  3. Are we interested in the reliability of single rater or the mean value of multiple raters?
  4. Are we concerned with consistency or with absolute agreement?

The first 2 questions guide the “Model” selection, question 3 guides the “Type” selection, and the last question guides the “Definition” selection.
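
The decision logic behind these 4 questions can also be written down programmatically. The sketch below is illustrative only: the function and argument names are invented, and it simply encodes the mapping described above, using the common convention that the one-way model is paired with absolute agreement:

def choose_icc_form(same_raters: bool, raters_random: bool,
                    single_rater: bool, absolute_agreement: bool) -> str:
    # Questions 1-2 -> "Model"
    if not same_raters:
        model = "one-way random effects"
    elif raters_random:
        model = "two-way random effects"
    else:
        model = "two-way mixed effects"
    # Question 3 -> "Type"
    icc_type = "single rater/measurement" if single_rater else "mean of k raters/measurements"
    # Question 4 -> "Definition" (the one-way model is paired with absolute agreement)
    if absolute_agreement or model.startswith("one-way"):
        definition = "absolute agreement"
    else:
        definition = "consistency"
    return f"{model}, {icc_type}, {definition}"

print(choose_icc_form(same_raters=True, raters_random=False,
                      single_rater=True, absolute_agreement=True))
# -> two-way mixed effects, single rater/measurement, absolute agreement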

Calculation

  • Model 1 = one-way random model (no bias)
  • Model 2 = two-way random model (random bias)
  • Model 3 = two-way mixed model (fixed bias)
  • n = number of subjects (targets)
  • k = number of measurements (conditions, raters)
  • i = subject index (i = 1,…, n)
  • j = measurement index (j = 1,…, k)
  • N = number of simulated matrices
  • < … > = average value obtained in simulation
  • μ = population mean of subject’s scores
  • ri = deviation from mean for subject i
  • cj = bias in measurement j
  • vij = “noise” = error in measurement j for subject i (Model 1)
  • eij = error in measurement j for subject i, Model 2 and Model 3
  • rcij = interaction in measurement j for subject i, Model 2 and Model 3
  • vij = “noise” = eij + rcij (Model 2 and Model 3)
  • σr² = variance of ri
  • σc² = variance of cj
  • σe² = variance of eij
  • σrc² = variance of rcij
  • σv² = variance of vij (= σe² + σrc² in Model 2 and Model 3)
  • ρ1 = population ICC, Model 1
  • ρ2A = absolute agreement population ICC, Model 2
  • ρ2C = consistency population ICC, Model 2
  • ρ3A = absolute agreement population ICC, Model 3
  • ρ3C = consistency population ICC, Model 3
  • MST = mean square total
  • MSWS = mean square within subjects
  • MSBS = mean square between subjects
  • MSWM = mean square within measurements
  • MSBM = mean square between measurements
  • MSE = mean square error
  • F = MSBM / MSE = F-value
  • ICC(1) ≡ ICC(1,1) = sample ICC formula, Model 1
  • ICC(A,1) ≡ ICC(2,1) = sample ICC formula, absolute agreement, Model 2 and 3
  • ICC(C,1) ≡ ICC(3,1) = sample ICC formula, consistency, Model 2 and 3

In practice, the modern ICC is calculated from mean squares (ie, estimates of the population variances based on the variability among a given set of measures) obtained through analysis of variance.
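
In terms of these mean squares (with n subjects and k measurements, as defined above), the single-measure sample formulas named in the list can be written out as follows. This is a sketch of the standard definitions, not a quotation from the cited articles:

\[ \mathrm{ICC}(1) = \frac{\mathrm{MSBS} - \mathrm{MSWS}}{\mathrm{MSBS} + (k-1)\,\mathrm{MSWS}} \]

\[ \mathrm{ICC}(C,1) = \frac{\mathrm{MSBS} - \mathrm{MSE}}{\mathrm{MSBS} + (k-1)\,\mathrm{MSE}} \]

\[ \mathrm{ICC}(A,1) = \frac{\mathrm{MSBS} - \mathrm{MSE}}{\mathrm{MSBS} + (k-1)\,\mathrm{MSE} + \frac{k}{n}\left(\mathrm{MSBM} - \mathrm{MSE}\right)} \]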

  • σs² = variance of interest
  • σe² = unwanted variance

ICC = σs² / (σs² + σe²) = (Variance of Interest) / [(Variance of Interest) + (Unwanted Variance)]

An alternative, equivalent formulation:

ICC = σs² / (σs² + σe²) = (Between-subject variance) / [(Between-subject variance) + (Within-subject variance)]

Note

The word “subject” in the previous equation likely refers to “rater,” since this test assesses inter-rater reliability.

What happens when the unwanted variance (σe²) is equal to or larger than the variance of interest (σs², for example the variance between subjects)?

We will use an example with a variance of interest of 5 and an unwanted variance of 6:

ICC = (Variance of Interest) / [(Variance of Interest) + (Unwanted Variance)] = 5 / (5 + 6) = 5/11 ≈ 0.455

As a result, the reliability (ICC) of the method will be poor, with a value of approximately 0.455.
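
To make the ANOVA-based calculation concrete, here is a minimal, illustrative Python/NumPy sketch. The helper name and the example ratings matrix are invented, and it implements the standard single-measure formulas sketched earlier rather than code from the cited articles:

import numpy as np

def icc_from_mean_squares(scores: np.ndarray) -> dict:
    """Single-measure ICCs from an n x k matrix (rows = subjects, columns = raters)."""
    n, k = scores.shape
    grand_mean = scores.mean()
    subject_means = scores.mean(axis=1)   # row means
    rater_means = scores.mean(axis=0)     # column means

    # Sums of squares
    ss_total = ((scores - grand_mean) ** 2).sum()
    ss_bs = k * ((subject_means - grand_mean) ** 2).sum()   # between subjects
    ss_bm = n * ((rater_means - grand_mean) ** 2).sum()     # between measurements
    ss_ws = ss_total - ss_bs                                 # within subjects
    ss_err = ss_total - ss_bs - ss_bm                        # residual error

    # Mean squares
    msbs = ss_bs / (n - 1)
    msws = ss_ws / (n * (k - 1))
    msbm = ss_bm / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return {
        "ICC(1)":   (msbs - msws) / (msbs + (k - 1) * msws),
        "ICC(C,1)": (msbs - mse) / (msbs + (k - 1) * mse),
        "ICC(A,1)": (msbs - mse) / (msbs + (k - 1) * mse + k * (msbm - mse) / n),
    }

# Invented example: 5 subjects rated by 3 raters
ratings = np.array([
    [9.0, 2.0, 5.0],
    [6.0, 1.0, 3.0],
    [8.0, 4.0, 6.0],
    [7.0, 1.0, 2.0],
    [10.0, 5.0, 6.0],
])
print(icc_from_mean_squares(ratings))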

Output

An ICC analysis will generally provide both a single-measures and an average-measures estimate.

  • Single measures: the reliability of the scores from a single rater or measurement
  • Average measures: the reliability of the mean of the scores from k raters or measurements (the two are related as sketched below)
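
For the consistency definition, the average-measures ICC can be derived from the single-measures ICC with a Spearman-Brown-type relationship, where k is the number of raters or measurements being averaged:

\[ \mathrm{ICC}_{k} = \frac{k \cdot \mathrm{ICC}_{1}}{1 + (k-1)\,\mathrm{ICC}_{1}} \]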

Scoring

Values range from 0.00 (not reliable) to 1.00 (perfectly reliable)

“Rule of thumb” quick interpretation of reliability:

  Score        Reliability
  0.00 – 0.20  Poor
  0.21 – 0.40  Fair
  0.41 – 0.60  Moderate
  0.61 – 0.80  Good
  0.81 – 1.00  Excellent

Interpretation
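
As a small illustration, the rule-of-thumb table above can be applied to a computed ICC with a simple helper; the thresholds are copied from the table, and the function name is invented:

def interpret_icc(icc: float) -> str:
    """Map an ICC value to the article's rule-of-thumb reliability label."""
    if icc <= 0.20:
        return "Poor"
    if icc <= 0.40:
        return "Fair"
    if icc <= 0.60:
        return "Moderate"
    if icc <= 0.80:
        return "Good"
    return "Excellent"

print(interpret_icc(0.72))  # -> Good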

References

1. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. Journal of Chiropractic Medicine. 2016;15(2):155-163. doi:10.1016/j.jcm.2016.02.012
2. Liljequist D, Elfving B, Skavberg Roaldsen K. Intraclass correlation - A discussion and demonstration of basic features. PLoS One. 2019;14(7):e0219854. doi:10.1371/journal.pone.0219854
3. Gisev N, Bell JS, Chen TF. Interrater agreement and interrater reliability: Key concepts, approaches, and applications. Research in Social and Administrative Pharmacy. 2013;9(3):330-338. doi:10.1016/j.sapharm.2012.04.004
