Abstract: A Completely Annotated Whole Slide Image Dataset of Canine Breast Cancer to Aid Human Breast Cancer Research
Marc Aubreville, Christof A. Bertram, Taryn A. Donovan, Christian Marzahl, Andreas Maier, Robert Klopfleisch
Technische Hochschule Ingolstadt, Computer Science
Canine mammary carcinoma (CMC) has been used as a model to investigate the tumorigenesis of human breast cancer, and the same histological grading scheme is commonly used to estimate patient outcome for both. One key component of this grading scheme is the density of cells undergoing cell division (mitotic figures, MF). Currently available public datasets on human breast cancer provide annotations for only small subsets of whole slide images (WSIs). We present a novel dataset of 21 WSIs of CMC completely annotated for MF. For this, a pathologist screened all WSIs for potential MF and for structures with a similar appearance. A second expert blindly assigned labels to these candidates, and where the two labels did not match, a third expert assigned the final label. Additionally, we used machine learning to identify previously undetected MF. Finally, we performed representation learning and two-dimensional projection to further increase the consistency of the annotations. Our dataset consists of 13,907 MF and 36,379 hard negatives. Our MF detection approach achieved a mean F1-score of 0.791 on the CMC test set. Testing our algorithms without any further adaptation on a human breast cancer dataset (AMIDA13) yielded a mean F1-score of 0.635. The F1-score increased to 0.696 when using threshold optimization and model selection, and to 0.733 when using transfer learning, in both cases performed on the human tissue training set.
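The labeling rule and the F1-scores reported above can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, a detection is assumed to count as a true positive when it lies within a fixed radius of a not-yet-matched annotated MF (the 25-pixel radius is an illustrative choice), and "threshold optimization" is assumed to mean picking the detector confidence cutoff that maximizes F1 on a training set.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) cell position in pixels


def consensus_label(first: str, second: str, third: Optional[str] = None) -> str:
    """Hypothetical helper: final label for one candidate cell.

    If the first and second (blinded) expert agree, their label is final;
    otherwise the third expert's label decides.
    """
    if first == second:
        return first
    if third is None:
        raise ValueError("labels disagree: a third expert label is required")
    return third


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def count_matches(detections: List[Point], truth: List[Point],
                  radius: float) -> Tuple[int, int, int]:
    """Greedily match each detection to the nearest unmatched ground-truth MF
    within `radius` (assumed matching rule). Returns (TP, FP, FN)."""
    unmatched = list(truth)
    tp = 0
    for det in detections:
        best_i, best_d = None, radius
        for i, gt in enumerate(unmatched):
            d = math.dist(det, gt)
            if d <= best_d:
                best_i, best_d = i, d
        if best_i is not None:
            unmatched.pop(best_i)  # each annotated MF can be matched only once
            tp += 1
    return tp, len(detections) - tp, len(unmatched)


def best_threshold(scored: List[Tuple[Point, float]], truth: List[Point],
                   radius: float, candidates: List[float]) -> Tuple[float, float]:
    """Return (threshold, F1) maximizing F1 over candidate confidence cutoffs."""
    ranked = sorted(scored, key=lambda item: -item[1])  # most confident first
    best = (candidates[0], -1.0)
    for t in candidates:
        kept = [p for p, score in ranked if score >= t]
        f1 = f1_score(*count_matches(kept, truth, radius))
        if f1 > best[1]:
            best = (t, f1)
    return best


# Toy example with made-up coordinates and confidences.
truth = [(100.0, 100.0), (400.0, 250.0)]
scored = [((102.0, 98.0), 0.9), ((300.0, 300.0), 0.6), ((405.0, 252.0), 0.4)]
print(best_threshold(scored, truth, radius=25.0, candidates=[0.3, 0.5, 0.7]))
# prints (0.3, 0.8)
```

In the toy example, the lowest cutoff keeps both true detections plus one false positive (TP = 2, FP = 1, FN = 0), giving F1 = 4/5 = 0.8, which beats the stricter cutoffs. Selecting the cutoff on training data rather than on the test set keeps the reported score an unbiased estimate.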
1. Aubreville M, Bertram CA, Donovan TA, et al. A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research. Scientific Data. 2020;7:417.