Development and validation of a multitask deep learning model for severity grading of hip osteoarthritis features on radiographs

Claudio E. von Schacky, Jae Ho Sohn, Felix Liu, Eugene Ozhinsky, Pia M. Jungmann, Lorenzo Nardo, Magdalena Posadzy, Sarah C. Foreman, Michael C. Nevitt, Thomas M. Link, Valentina Pedoia

Research output: Contribution to journal › Article › peer-review

12 Scopus citations


Background: A multitask deep learning model might be useful in large epidemiologic studies wherein detailed structural assessment of osteoarthritis still relies on expert radiologists’ readings. The potential of such a model in clinical routine should be investigated. Purpose: To develop a multitask deep learning model for grading radiographic hip osteoarthritis features on radiographs and compare its performance to that of attending-level radiologists. Materials and Methods: This retrospective study analyzed hip joints seen on weight-bearing anterior-posterior pelvic radiographs from participants in the Osteoarthritis Initiative (OAI). Participants were recruited from February 2004 to May 2006 for baseline measurements, and follow-up was performed 48 months later. Femoral osteophytes (FOs), acetabular osteophytes (AOs), and joint-space narrowing (JSN) were graded as absent, mild, moderate, or severe according to the Osteoarthritis Research Society International atlas. Subchondral sclerosis and subchondral cysts were graded as present or absent. The participants were split at 80% (n = 3494), 10% (n = 437), and 10% (n = 437) by using split-sample validation into training, validation, and testing sets, respectively. The multitask neural network was based on DenseNet-161, a shared convolutional feature extractor trained with a multitask loss function. Model performance was evaluated in the internal test set from the OAI and in an external test set by using temporal and geographic validation consisting of routine clinical radiographs. Results: A total of 4368 participants (mean age, 61.0 years ± 9.2 [standard deviation]; 2538 women) were evaluated (15 364 hip joints on 7738 weight-bearing anterior-posterior pelvic radiographs).
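The participant-level split described in Materials and Methods can be illustrated with a minimal sketch. It only reproduces the reported set sizes (3494, 437, and 437 of 4368 participants); the function name `split_participants`, the integer placeholder IDs, and the random seed are assumptions for illustration, since the abstract does not describe how participants were assigned to each set.

```python
import random


def split_participants(participant_ids, n_val, n_test, seed=0):
    """Shuffle unique participant IDs and cut them into train/val/test lists.

    Splitting by participant (not by radiograph or hip joint) keeps all
    images of one person in a single set. The seed is a hypothetical choice.
    """
    ids = sorted(set(participant_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# Set sizes as reported in the abstract; the integer IDs are placeholders.
train, val, test = split_participants(range(4368), n_val=437, n_test=437)
sizes = (len(train), len(val), len(test))
fractions = tuple(round(100 * s / 4368, 1) for s in sizes)
print(sizes, fractions)
```

The three sizes sum to the 4368 participants reported in the Results, and the rounded fractions match the stated 80%, 10%, and 10%.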
The accuracy of the model for assessing these five features was 86.7% (1333 of 1538) for FOs, 69.9% (1075 of 1538) for AOs, 81.7% (1257 of 1538) for JSN, 95.8% (1473 of 1538) for subchondral sclerosis, and 97.6% (1501 of 1538) for subchondral cysts in the internal test set, and 82.7% (86 of 104) for FOs, 65.4% (68 of 104) for AOs, 80.8% (84 of 104) for JSN, 88.5% (92 of 104) for subchondral sclerosis, and 91.3% (95 of 104) for subchondral cysts in the external test set. Conclusion: A multitask deep learning model is a feasible approach to reliably assess radiographic features of hip osteoarthritis.
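As a quick arithmetic check, the reported accuracies can be recomputed from the correct/total counts given in the Results. The sketch below is illustrative only: the dictionary layout and the helper name `accuracy_percent` are assumptions, and accuracy is taken as simple percent agreement rounded to one decimal place.

```python
# Correct / total counts exactly as reported in the abstract.
internal = {
    "FOs": (1333, 1538),
    "AOs": (1075, 1538),
    "JSN": (1257, 1538),
    "subchondral sclerosis": (1473, 1538),
    "subchondral cysts": (1501, 1538),
}
external = {
    "FOs": (86, 104),
    "AOs": (68, 104),
    "JSN": (84, 104),
    "subchondral sclerosis": (92, 104),
    "subchondral cysts": (95, 104),
}


def accuracy_percent(correct, total):
    """Percent of hip joints graded correctly, rounded to one decimal."""
    return round(100 * correct / total, 1)


internal_acc = {k: accuracy_percent(*v) for k, v in internal.items()}
external_acc = {k: accuracy_percent(*v) for k, v in external.items()}

for feature in internal:
    print(f"{feature}: internal {internal_acc[feature]}%, "
          f"external {external_acc[feature]}%")
```

Each recomputed value agrees with the percentage stated in the abstract, for example 1333 of 1538 gives 86.7% for FOs in the internal test set.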

Original language: English (US)
Pages (from-to): 139-145
Number of pages: 7
Issue number: 1
State: Published - Jan 1 2020

ASJC Scopus subject areas

  • Radiology, Nuclear Medicine and Imaging

