An international multicenter study to evaluate reproducibility of automated scoring for assessment of Ki67 in breast cancer

David L. Rimm, Samuel C.Y. Leung, Lisa M. McShane, Yalai Bai, Anita L. Bane, John M.S. Bartlett, Jane Bayani, Martin C. Chang, Michelle Dean, Carsten Denkert, Emeka K. Enwere, Chad Galderisi, Abhi Gholap, Judith C. Hugh, Anagha Jadhav, Elizabeth N. Kornaga, Arvydas Laurinavicius, Richard Levenson, Joema Lima, Keith Miller, Liron Pantanowitz, Tammy Piper, Jason Ruan, Malini Srinivasan, Shakeel Virk, Ying Wu, Hua Yang, Daniel F. Hayes, Torsten O. Nielsen, Mitch Dowsett

Research output: Contribution to journal › Article › peer-review

42 Scopus citations


The nuclear proliferation biomarker Ki67 has potential prognostic, predictive, and monitoring roles in breast cancer. Unacceptable between-laboratory variability has limited its clinical value. The International Ki67 in Breast Cancer Working Group investigated whether Ki67 immunohistochemistry can be analytically validated and standardized across laboratories using automated machine-based scoring. Sets of pre-stained core-cut biopsy sections of 30 breast tumors were circulated to 14 laboratories for scanning and automated assessment of the average and maximum percentage of tumor cells positive for Ki67. Seven unique scanners and 10 software platforms were involved in this study. Pre-specified analyses included evaluation of reproducibility between all laboratories (primary) as well as among those using scanners from a single vendor (secondary). The primary reproducibility metric was intraclass correlation coefficient between laboratories, with success considered to be intraclass correlation coefficient >0.80. Intraclass correlation coefficient for automated average scores across 16 operators was 0.83 (95% credible interval: 0.73–0.91) and intraclass correlation coefficient for maximum scores across 10 operators was 0.63 (95% credible interval: 0.44–0.80). For the laboratories using scanners from a single vendor (8 score sets), intraclass correlation coefficient for average automated scores was 0.89 (95% credible interval: 0.81–0.96), which was similar to the intraclass correlation coefficient of 0.87 (95% credible interval: 0.81–0.93) achieved using these same slides in a prior visual-reading reproducibility study. Automated machine assessment of average Ki67 has the potential to achieve between-laboratory reproducibility similar to that for a rigorously standardized pathologist-based visual assessment of Ki67. 
The observed intraclass correlation coefficient was worse for maximum than for average scoring, suggesting that maximum-score methods may be suboptimal for consistent measurement of proliferation. Automated average scoring methods show promise for Ki67 assessment but require further standardization and subsequent clinical validation.
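The study's pre-specified success criterion was an intraclass correlation coefficient above 0.80. As an illustrative sketch only (the study itself used a Bayesian model yielding credible intervals, not this classical estimator), a two-way random-effects ICC(2,1) for absolute agreement can be computed from ANOVA mean squares following Shrout and Fleiss; the data below are synthetic, not from the study:

```python
import numpy as np

def icc_2_1(ratings):
    """Classical two-way random-effects ICC(2,1) (absolute agreement,
    single rater), computed from ANOVA mean squares (Shrout & Fleiss).

    ratings: (n_subjects, k_raters) array, e.g. rows = tumors,
    columns = laboratories, values = percent Ki67-positive cells.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)  # per-subject means
    col_means = ratings.mean(axis=0)  # per-rater means

    # Partition total sum of squares into subject, rater, and residual parts.
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_total = ((ratings - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)            # between-subjects mean square
    msc = ss_cols / (k - 1)            # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1)) # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Synthetic example: 30 "tumors" scored by 3 "laboratories" with a
# constant per-laboratory offset; agreement is high but not perfect.
true_scores = np.arange(1.0, 31.0)[:, None]
ratings = true_scores + np.array([0.0, 1.0, -1.0])
print(round(icc_2_1(ratings), 3))
```

Systematic between-laboratory offsets lower ICC(2,1) because the absolute-agreement form charges rater variance against the estimate, which is why it is a natural choice for a between-laboratory reproducibility metric.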

Original language: English (US)
Journal: Modern Pathology
State: Accepted/In press - Jan 1 2018

ASJC Scopus subject areas

  • Pathology and Forensic Medicine
