T16: Classifying colon histopathology diagnosis
Objective:
The goal of this task is to predict whether the specimen was obtained from 1) biopsy or polypectomy, and whether the pathologist rated the specimen as 2) cancer, 3) high-grade dysplasia (hgd), 4) hyperplastic polyps, 5) low-grade dysplasia (lgd), 6) non-informative (ni), or 7) serrated polyps. Each of these seven properties is binary, and multiple can be present per block.
Patient Population:
The dataset includes 1277 pathology reports (in Dutch) from patients diagnosed with histopathological conditions of the colon, and consists of biopsies collected between January 1, 2000, and December 31, 2009, at Radboudumc. For patients with multiple visits during this time frame, only the first visit was included.
Imaging Data:
Not applicable. The task is based solely on textual data — pathology reports written in Dutch.
Test Data:
Unlabeled pathology reports consistent with the format of the training data. The model's prediction should be a list of seven values (true or false), each indicating the presence (true) or absence (false) of the corresponding category listed above.
Reference Standard:
Binary labels for sample type and each diagnosis category, annotated based on manual review of the reports.
Example: [true,false,false,false,true,false,false]
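To make the example easier to read, the sketch below pairs a seven-value prediction with category names. The names in `CATEGORIES` and the helper `label_prediction` are hypothetical, and the order is an assumption taken from the order in which the objective lists the categories; the task description does not state the official field names, nor whether true in the first position means biopsy or polypectomy.

```python
# Hypothetical category names, assumed to follow the order used in the objective.
CATEGORIES = [
    "biopsy_or_polypectomy",  # sample type; meaning of True is not specified here
    "cancer",
    "hgd",
    "hyperplastic_polyps",
    "lgd",
    "ni",
    "serrated_polyps",
]


def label_prediction(values):
    """Pair a seven-value prediction with category names, checking its shape."""
    if len(values) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} values, got {len(values)}")
    if not all(isinstance(v, bool) for v in values):
        raise TypeError("every value must be a bool")
    return dict(zip(CATEGORIES, values))


# The example from the reference standard, written with Python booleans.
example = [True, False, False, False, True, False, False]
labeled = label_prediction(example)

# Names of the categories flagged as present in this example.
present = [name for name, flag in labeled.items() if flag]
print(present)
```

Under the assumed order, the example marks the sample-type flag and low-grade dysplasia as true, which also illustrates that several properties can be present at once.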
Evaluation Metrics:
Model performance will be evaluated by calculating the AUC for each class separately. The overall performance of the model will be defined as the average of the AUC values across all classes.
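As a rough illustration of this metric, the sketch below computes the AUC for each class column and then averages the results. It is not the official evaluation code: the function names `roc_auc` and `mean_auc` are hypothetical, the AUC is computed with the pairwise (Mann-Whitney) formulation, and the toy data uses two classes rather than seven. How the challenge handles a class with only positive or only negative labels is not stated, so this sketch simply raises an error in that case.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t]
    neg = [s for t, s in zip(y_true, y_score) if not t]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(labels, scores):
    """Return (average AUC, per-class AUCs); rows are reports, columns are classes."""
    n_classes = len(labels[0])
    per_class = [
        roc_auc([row[c] for row in labels], [row[c] for row in scores])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes, per_class


# Toy data: four reports, two classes (the real task has seven).
labels = [[True, False], [False, True], [True, True], [False, False]]
scores = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.6, 0.1]]

overall, per_class = mean_auc(labels, scores)
print(per_class, overall)
```

Because every class contributes equally to the average, a rare category such as cancer weighs as much in the overall score as a common one.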
Relation to Existing Challenges:
- Task 16 is derived from Task015 (Colon histopathology diagnosis) of the DRAGON challenge.
- Unlike the DRAGON challenge, UNICORN allows training on the platform using only a small set of few-shot examples per task.