T1: Classifying H&E-stained prostate biopsies into ISUP scores
Objective:
Develop a model to classify H&E-stained prostate biopsy slides into the International Society of Urological Pathology (ISUP) grade groups. These grades reflect prostate cancer aggressiveness and are crucial for treatment decisions.
Patient Population:
Patients with prostate cancer from two cohorts:
- 165 cases from Radboudumc (2012–2017), mainly from screening programs.
- 113 cases from six hospitals collected via a social media call, capturing routine clinical practice diversity.
Imaging Data:
Whole slide images (.tif format) at a resolution of 0.5 microns per pixel. Each slide is accompanied by a binary tissue mask:
- Label 0: Background
- Label 1: Tissue region
Each case contains a single biopsy: the most representative one for that case, selected and cropped by an expert pathologist.
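As an illustration of how the binary tissue mask might be used, the following is a minimal sketch in Python with NumPy. It assumes the mask has already been loaded into a 2D integer array (reading the .tif file is not shown), and the function names `tissue_fraction` and `tissue_bounding_box` are hypothetical helpers, not part of any challenge tooling.

```python
import numpy as np


def tissue_fraction(mask):
    """Return the share of pixels labelled as tissue (label 1)."""
    mask = np.asarray(mask)
    # The mask is expected to contain only 0 (background) and 1 (tissue).
    if not np.isin(np.unique(mask), [0, 1]).all():
        raise ValueError("mask must contain only the labels 0 and 1")
    return float((mask == 1).mean())


def tissue_bounding_box(mask):
    """Return (row_start, row_stop, col_start, col_stop) around the tissue.

    Stop indices are exclusive, so the result can be used directly for
    slicing. Returns None when the mask contains no tissue pixels.
    """
    mask = np.asarray(mask)
    rows = np.flatnonzero(np.any(mask == 1, axis=1))
    cols = np.flatnonzero(np.any(mask == 1, axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


# Toy mask standing in for a real slide mask (assumption: 2D array of 0/1).
toy_mask = np.zeros((4, 6), dtype=np.uint8)
toy_mask[1:3, 2:5] = 1

fraction = tissue_fraction(toy_mask)
bbox = tissue_bounding_box(toy_mask)
print(fraction, bbox)
```

Restricting patch extraction to the bounding box, or to pixels where the mask equals 1, is one way to avoid spending compute on background regions.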
Test Data:
Unlabeled whole slide images and tissue masks in the same format as the training data. Participants must predict the ISUP grade for each slide.
Reference Standard:
- Radboudumc cohort: Graded independently by three expert uropathologists using ISUP 2014 guidelines. Disagreements were resolved in a three-round process including consensus meetings.
- Six-hospital cohort: Clinically graded by contributing pathologists, with biopsy selection overseen by an expert.
Pathologists showed strong agreement:
- Pairwise quadratic kappa: 0.926
- Average vs. majority vote: 0.878 (range: 0.847–0.914)
- Pairwise agreement: average 0.858 (range: 0.777–0.916)
Evaluation Metric:
Cohen’s quadratic weighted kappa between predicted and reference ISUP grades (1–5). This metric accounts for the degree of disagreement between predicted and true grades.
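To make the weighting concrete, here is a minimal sketch of the metric in Python with NumPy. It is an illustration under stated assumptions, not the challenge's official evaluation code: the function name `quadratic_weighted_kappa` and its parameters are hypothetical, and grades are assumed to be integers from 1 to 5. The disagreement weight between grades i and j is taken as (i - j)^2 / (K - 1)^2 with K = 5 grade levels, so an error of one grade costs far less than an error of four grades.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, min_grade=1, max_grade=5):
    """Cohen's kappa with quadratic weights for integer grades."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    if y_true.shape != y_pred.shape or y_true.size == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    both = np.concatenate([y_true, y_pred])
    if both.min() < min_grade or both.max() > max_grade:
        raise ValueError("grades must lie in [min_grade, max_grade]")

    n = max_grade - min_grade + 1

    # Observed confusion matrix: rows are reference grades, columns predictions.
    observed = np.zeros((n, n), dtype=float)
    np.add.at(observed, (y_true - min_grade, y_pred - min_grade), 1.0)

    # Expected matrix if reference and prediction were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    # Quadratic disagreement weights: 0 on the diagonal, 1 at maximum distance.
    grades = np.arange(n)
    weights = (grades[:, None] - grades[None, :]) ** 2 / (n - 1) ** 2

    denominator = (weights * expected).sum()
    if denominator == 0:
        # Both inputs are constant and identical; kappa is undefined here.
        return float("nan")
    return float(1.0 - (weights * observed).sum() / denominator)


reference = [1, 2, 3, 4, 5]
kappa_perfect = quadratic_weighted_kappa(reference, [1, 2, 3, 4, 5])
kappa_near_miss = quadratic_weighted_kappa(reference, [1, 2, 3, 4, 4])
kappa_far_miss = quadratic_weighted_kappa(reference, [1, 2, 3, 4, 1])
print(kappa_perfect, kappa_near_miss, kappa_far_miss)
```

In this toy example a single off-by-one prediction lowers the score much less than a single off-by-four prediction. The result can be cross-checked against scikit-learn's `cohen_kappa_score` with `weights="quadratic"`.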
Relation to Existing Challenges:
Task 1 is partly adapted from the PANDA challenge.
Differences from PANDA:
- We provide pre-computed tissue segmentation masks
- We do not provide pre-computed Gleason pattern segmentation masks
- Not all the data comes from PANDA
- Unlike PANDA, which focused on developing models using a large, locally labeled dataset, this task emphasizes few-shot learning, with training conducted on-platform