T12: Predicting histopathology sample origin


Objective:
Develop a model that classifies the anatomical origin of histopathology material from the content of its pathology report. Possible origins are “lung”, “lymph node”, “bronchus”, “liver”, “brain”, “bone”, or “other” when none of these categories applies.

Patient Population:
Patients suspected of having non-small cell lung cancer (NSCLC), with pathology reports collected between January 1, 2016, and December 31, 2022.

Imaging Data:
Not applicable. The task is based solely on textual data: pathology reports written in Dutch.

Test Data:
Unlabeled pathology reports in Dutch, consistent in format with the training set. Participants must predict the categorical tissue origin for each report.

Reference Standard:

  • Each report is labeled with a categorical origin label (lung, lymph node, bronchus, liver, brain, pleural fluid, bone, or other)
  • Labels were manually assigned by student assistants through review of the full report content

Evaluation Metrics:
Model performance will be evaluated using unweighted Cohen’s kappa, which measures the agreement between predicted and reference labels beyond what would be expected by chance.
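
As an illustration of the metric, unweighted Cohen's kappa can be written as (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of reports on which prediction and reference agree, and p_e is the agreement expected by chance from the two label distributions. The sketch below is a minimal pure-Python version and is not the official evaluation code; the function name `cohens_kappa` and the toy label lists are assumptions made for this example only.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Hypothetical helper for illustration; not the challenge's scoring code.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    n = len(reference)
    if n == 0:
        raise ValueError("at least one labelled report is required")

    # Observed agreement: fraction of reports with identical labels.
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: sum over labels of the product of the two
    # marginal label frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    p_e = sum(ref_counts[label] * pred_counts[label] for label in ref_counts) / (n * n)

    if p_e == 1.0:
        # Both sequences contain a single identical label; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Toy example with made-up labels (not challenge data).
reference = ["lung", "lymph node", "lung", "liver", "other", "lung"]
predicted = ["lung", "lung", "lung", "liver", "other", "bone"]
kappa = cohens_kappa(reference, predicted)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.52"
```

In this toy example four of six predictions match (p_o = 24/36) and chance agreement is p_e = 11/36, giving kappa = 13/25 = 0.52. Because the metric is unweighted, every misclassification counts equally regardless of which two categories are confused.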

Relation to Existing Challenges:

  • Task 12 is derived from Task013 (Histopathology Tissue Origin) of the DRAGON Challenge.
  • Unlike DRAGON, which allowed local training with larger labeled datasets, this task is designed for few-shot learning on the platform.

Additional Resources: