Domain-Adaptive Transfer Learning for HPV Lesion Classification in Whole Slide Images: A Patient-Level Pipeline Across the Cytology–Histology Continuum
De Luca, Pasquale; Di Nardo, Emanuel; Marcellino, Livia; Ciaramella, Angelo
2026-01-01
Abstract
The clinical translation of automated HPV detection in Whole Slide Images (WSIs) is challenged by staining variability, sparse viral effects, and the biological continuum between cytology and histology. This work presents a fully automated pipeline for binary patch-level classification of HPV-induced lesions on H&E-stained tissue. The core contribution is a domain-adaptive transfer learning strategy: a ResNet50 backbone is pretrained on the SIPaKMeD cervical cytology dataset rather than ImageNet, then fine-tuned on a target histological cohort. Preprocessing includes adaptive tissue segmentation, blur rejection, and Macenko stain normalization to make the inputs vendor-agnostic. Under a strict Leave-One-Patient-Out cross-validation on 42 diagnostic specimens, the SIPaKMeD-based initialization significantly outperforms the ImageNet baseline, achieving higher AUC-ROC scores and greater stability across folds. This indicates that domain-specific pretraining effectively mitigates data scarcity and class imbalance in digital cervical cancer screening. Under a complementary 5-fold patient-level cross-validation covering all 19 patients of the cohort (133,704 patches, of which 7,181 are HPV-positive, a prevalence of 5.37%), the SIPaKMeD-pretrained model attains a mean test AUC-ROC of 0.694 with a 95% patient-aware bootstrap confidence interval of [0.681, 0.705], consistently above the ImageNet baseline mean of 0.656 obtained in the controlled three-fold ablation.
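The abstract does not specify how the patient-level folds or the patient-aware bootstrap were implemented. As an illustration only, the following minimal sketch shows one common way to do both: folds are built by assigning whole patients to a fold, and the confidence interval is obtained by resampling patients (not patches) with replacement. The function names `patient_folds`, `auc_roc`, and `patient_bootstrap_ci`, the percentile method, and the default of 1000 resamples are assumptions made for this example, not details taken from the paper.

```python
import random
from collections import defaultdict


def patient_folds(patient_ids, k, seed=0):
    """Split patch indices into k folds so that no patient spans two folds."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    fold_of = {p: i % k for i, p in enumerate(patients)}
    splits = []
    for f in range(k):
        test = [i for i, p in enumerate(patient_ids) if fold_of[p] == f]
        train = [i for i, p in enumerate(patient_ids) if fold_of[p] != f]
        splits.append((train, test))
    return splits


def auc_roc(labels, scores):
    """AUC-ROC from the Mann-Whitney U statistic, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None  # AUC is undefined when only one class is present
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def patient_bootstrap_ci(patient_ids, labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for AUC-ROC, resampling whole patients with replacement."""
    by_patient = defaultdict(list)
    for i, p in enumerate(patient_ids):
        by_patient[p].append(i)
    patients = sorted(by_patient)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(patients) for _ in patients]
        idx = [i for p in sample for i in by_patient[p]]
        auc = auc_roc([labels[i] for i in idx], [scores[i] for i in idx])
        if auc is not None:  # skip resamples that contain a single class
            stats.append(auc)
    if not stats:
        raise ValueError("no bootstrap resample contained both classes")
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi
```

Resampling at the patient level keeps all patches of a patient together, so the interval reflects between-patient variability rather than treating the correlated patches of one slide as independent observations.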


