# Reliability
Reliability relates to the extent of variability and error inherent in a measurement and can be interpreted as the extent to which measurements can be replicated1. Reliability reflects not only degree of correlation but also agreement between measurements1.
**Reliability is NOT the same as validity.**

> “in order to improve reliability, attempts must be made to remove random error”2
## Significance of Reliability
- One cannot be confident in the results of a scale with poor reliability3.
## Misconceptions

Reliability is sometimes used synonymously with ‘precision,’ ‘agreement,’ and ‘repeatability,’ but these terms mislabel the concept3.
## Non-mathematical explanation

Reliability is made up of 2 parts: Correlation and Agreement.

:::{layout-ncol="2"}

### Correlation
- Correlation refers to the fact that there is a pattern between results.
- For example, results with high correlation would mean that Test 2 is always greater than Test 1,
- or that Test 2 is always slightly smaller than Test 1.
### Agreement

- Agreement quantifies the magnitude of the difference between Test 1 and Test 2.
- It does not indicate whether the results follow a similar pattern, but how much they differ from each other (see the sketch below).
:::
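To make the distinction concrete, here is a minimal Python sketch (the scores and the fixed 5-point offset are hypothetical, chosen only for illustration) showing that two tests can be perfectly correlated while agreeing poorly:

```python
# A minimal sketch contrasting correlation with agreement; the scores and
# the 5-point offset are hypothetical, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)

test1 = rng.normal(50, 10, size=30)  # hypothetical Test 1 scores
test2 = test1 + 5                    # Test 2 is always 5 points higher

# Correlation: the two tests follow exactly the same pattern.
r = np.corrcoef(test1, test2)[0, 1]
print(f"correlation = {r:.2f}")      # 1.00

# Agreement: despite the perfect pattern, every pair differs by 5 points.
print(f"mean difference = {np.mean(test2 - test1):.2f}")  # 5.00
```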
## Calculation
Reliability is the proportion of the total variance (\(\sigma_{Total}^2\)) in scores that is due to differences among people3.

\[ \text{Reliability} = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_e^2} = \frac{\sigma_s^2}{\sigma_{Total}^2} \]
| Symbol | Meaning |
|---|---|
| \(\sigma_s^2\) | Subject variability3,4 |
| \(\sigma_e^2\) | Measurement error3,4 |
| \(\sigma_{Total}^2\) | Total variance3,4 |
- If there is no variability between participants, then \(\sigma_s^2 = 0\) and the reliability is 03.
- The reliability of a scale reflects its ability to differentiate among people; if it cannot, the reliability is 0 and the scale is useless3 (see the sketch below).
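As a worked sketch of the formula above, with hypothetical variance values chosen only to show how the ratio behaves:

```python
# Worked example of the reliability ratio; variance values are hypothetical.
def reliability(sigma_s_sq: float, sigma_e_sq: float) -> float:
    """Proportion of total variance due to differences among subjects."""
    return sigma_s_sq / (sigma_s_sq + sigma_e_sq)

print(reliability(sigma_s_sq=9.0, sigma_e_sq=1.0))  # 0.90: subjects differ a lot
print(reliability(sigma_s_sq=1.0, sigma_e_sq=9.0))  # 0.10: error dominates
print(reliability(sigma_s_sq=0.0, sigma_e_sq=9.0))  # 0.00: no subject variability
```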
- Another implication of \(\sigma_s^2\) is that reliability is not a fixed property of a scale; it is very dependent on the sample in which it is determined3.
- Example: applying a depression scale to ER patients vs. outpatient psychiatric patients (see the numeric sketch below)
  - The ER patients would range from people with very low depression to extremely high depression (suicidally depressed)3.
  - This sample has very high patient variability (\(\sigma_s^2\)) and thus a higher reliability score3.
  - The outpatient psychiatric patients would likely have moderate levels of depression (those with high depression would be admitted to a hospital)3.
  - Thus there will be low patient variability (\(\sigma_s^2\)) and the reliability will be lower3.
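The same point in numbers; the variances below are hypothetical stand-ins for the two samples:

```python
# Hypothetical variances illustrating sample dependence: identical
# measurement error, but different subject spread in each sample.
def reliability(sigma_s_sq: float, sigma_e_sq: float) -> float:
    return sigma_s_sq / (sigma_s_sq + sigma_e_sq)

sigma_e_sq = 4.0              # same measurement error in both settings

er_variability = 25.0         # ER: very low to suicidally depressed
outpatient_variability = 4.0  # outpatients: mostly moderate depression

print(reliability(er_variability, sigma_e_sq))          # ~0.86
print(reliability(outpatient_variability, sigma_e_sq))  # 0.50
```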
## Types of Reliability

### Inter-rater Reliability
- Inter-rater reliability (IRR): Consistency of a measure assessed by multiple raters
### Inter-rater Agreement
Inter-rater agreement (IRA) measures the variation in results of a test when performed by different assessors on the same patient at the same time point.
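As one concrete illustration, chance-corrected agreement between two raters on categorical results is often summarized with Cohen's kappa. This sketch assumes scikit-learn is available; the ratings are hypothetical:

```python
# Hypothetical binary ratings from two assessors on the same 10 patients.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Kappa corrects raw percent agreement (here 80%) for chance agreement.
print(cohen_kappa_score(rater_a, rater_b))  # ~0.58
```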
### Intra-rater Reliability

Intra-rater reliability measures the variation in the results of a single assessor repeating a test across multiple time points.
### Test-Retest Reliability

Test-retest reliability measures the consistency of an instrument's results when it is administered to the same subjects across multiple time points.
### Internal Consistency

Internal consistency measures the extent to which the items within a scale correlate with one another, i.e., whether they all tap the same underlying construct.
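Internal consistency is commonly summarized with Cronbach's alpha; here is a hand-rolled sketch of the standard formula \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \sigma_{item}^2}{\sigma_{Total}^2}\right)\), using simulated (hypothetical) item data:

```python
# Cronbach's alpha from scratch; the simulated item data are hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = respondents, columns = scale items."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(1)
trait = rng.normal(0, 1, size=(100, 1))            # shared construct
items = trait + rng.normal(0, 0.5, size=(100, 4))  # four noisy items
print(cronbach_alpha(items))  # high: items measure the same construct
```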
## Reliability Coefficients

Common reliability statistics include:

- Intraclass correlation coefficient (ICC)
- Cohen's kappa (for categorical ratings)
- Cronbach's alpha (for internal consistency)
## Scoring
Values range from 0.00 (not reliable) to 1.00 (perfectly reliable).

| Score | Reliability |
|---|---|
| 0.00 – 0.20 | Poor |
| 0.21 – 0.40 | Fair |
| 0.41 – 0.60 | Moderate |
| 0.61 – 0.80 | Good |
| 0.81 – 1.00 | Excellent |
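A small helper that maps a coefficient onto the qualitative bands above (the boundaries are taken directly from the table):

```python
# Map a reliability coefficient to the qualitative label in the table above.
def interpret(score: float) -> str:
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.00 and 1.00")
    for upper, label in [(0.20, "Poor"), (0.40, "Fair"), (0.60, "Moderate"),
                         (0.80, "Good"), (1.00, "Excellent")]:
        if score <= upper:
            return label

print(interpret(0.72))  # Good
```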
For reliability measures, the confidence interval defines a range in which the true coefficient lies with a given probability5.

For a result to show that reliability is better than chance at a 95% confidence level, the lower limit of the CI must be above 0.65.
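One common way to obtain such an interval is a bootstrap; in this sketch the data are simulated and the error sizes are hypothetical:

```python
# Bootstrap 95% CI for a test-retest correlation; all data are simulated.
import numpy as np

rng = np.random.default_rng(2)
true_score = rng.normal(50, 10, size=40)        # subjects' true values
test1 = true_score + rng.normal(0, 5, size=40)  # first administration
test2 = true_score + rng.normal(0, 5, size=40)  # second administration

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(test1), size=len(test1))
    boot.append(np.corrcoef(test1[idx], test2[idx])[0, 1])

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% CI for r: [{lo:.2f}, {hi:.2f}]")  # check where the lower limit falls
```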
## Interpretation

- Poor to moderate reliability is not good for clinical decision making, since you are likely to get a different result every time you test your patient, regardless of their status.