T19: Anonymizing reports


Objective:
This task requires identifying and tagging personally identifiable information (PII) within reports. The target PII categories include:

  • Dates
  • Personal identifiers
  • Report identifiers
  • Locations
  • Clinical trial names
  • Times
  • Ages

The goal is to replace each PII entity with its corresponding tag in the original text, producing a fully anonymized version of the report.

Patient Population:
Pathology and radiology reports were sourced from two hospitals:

  • Radboudumc
  • Antoni van Leeuwenhoek Ziekenhuis

The dataset includes 1,307 reports.

Imaging Data:
Not applicable. The task is based solely on textual data: radiology and pathology reports written in Dutch.

Test Data:
Unlabeled reports requiring anonymization. For each report, participants must return a string in which every PII entity is replaced with its corresponding tag (e.g., <DATE>, <NAME>, <HOSPITAL>).
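
The replacement step can be illustrated with a minimal sketch. It assumes a model has already produced character-level spans of the form (start, end, tag); that span format, the function name anonymize, and the example sentence are illustrative assumptions and not part of the task definition.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, tag), with end exclusive,
# given as character offsets into the report text.
Span = Tuple[int, int, str]


def anonymize(text: str, spans: List[Span]) -> str:
    """Replace each character span with its tag, e.g. <DATE>."""
    pieces = []
    cursor = 0
    for start, end, tag in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not handled in this sketch")
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(f"<{tag}>")          # the tag replaces the entity itself
        cursor = end
    pieces.append(text[cursor:])           # remaining text after the last entity
    return "".join(pieces)


# Invented example sentence, not taken from the dataset.
report = "De patiënt is 67 jaar oud en werd gezien op 12-03-2021."
age_start = report.index("67")
date_start = report.index("12-03-2021")
spans = [
    (age_start, age_start + 2, "AGE"),
    (date_start, date_start + 10, "DATE"),
]
print(anonymize(report, spans))
```

Working left to right over sorted spans keeps all text outside the entities unchanged, which matters because the output is expected to be the original report with only the PII replaced.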

Reference Standard:
Reports were annotated semi-automatically using an in-house rule-based system. All annotations were manually verified and corrected by three trained investigators to ensure high-quality ground truth.

Evaluation Metrics:

  • Model predictions are compared against ground truth annotations using the macro F1 score, calculated over all PII categories.
  • A prediction is counted as correct only if the predicted entity exactly matches the ground truth in type and position.
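
As a rough illustration of the metric, the sketch below computes a macro F1 score under exact matching. It assumes entities are represented as (start, end, category) tuples and that categories are taken from the union of gold and predicted entities; the official evaluation code may derive positions and aggregate across reports differently.

```python
from typing import Set, Tuple

# Hypothetical entity format: (start, end, category).
Entity = Tuple[int, int, str]


def macro_f1(gold: Set[Entity], pred: Set[Entity]) -> float:
    """Unweighted mean of per-category F1, with exact type-and-position matching."""
    categories = {entity[2] for entity in gold | pred}
    if not categories:
        return 0.0
    scores = []
    for category in categories:
        g = {e for e in gold if e[2] == category}
        p = {e for e in pred if e[2] == category}
        tp = len(g & p)  # correct only if start, end, and category all match
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Invented example: the second DATE prediction is off by one character.
gold = {(14, 16, "AGE"), (44, 54, "DATE"), (60, 70, "DATE")}
pred = {(14, 16, "AGE"), (44, 54, "DATE"), (60, 69, "DATE")}
print(macro_f1(gold, pred))  # AGE F1 = 1.0, DATE F1 = 0.5, macro F1 = 0.75
```

Because the average is unweighted, a rare category such as clinical trial names contributes as much to the final score as a frequent one such as dates.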

Relation to Existing Challenges:

  • Task 19 is derived from Task025 (Anonymization) of the DRAGON challenge.
  • In contrast to DRAGON, the output format in UNICORN has been updated:

    • Instead of BIO-tagging tokens, participants must output the original text with tagged entities replaced inline (e.g., De patiënt is <AGE> jaar oud...).
    • Public few-shot examples have been updated accordingly.
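
For participants adapting a DRAGON-style token classifier, the following sketch shows one way to turn BIO-tagged tokens into the inline format. The label names (B-AGE, I-DATE, and so on) and the function name bio_to_inline are assumptions for illustration; the actual DRAGON label set may differ. Joining tokens with single spaces does not restore the report's original whitespace, so a real pipeline would more likely map tokens back to character offsets.

```python
from typing import List


def bio_to_inline(tokens: List[str], labels: List[str]) -> str:
    """Collapse each BIO-tagged entity into a single inline tag."""
    out = []
    prev = "O"
    for token, label in zip(tokens, labels):
        if label == "O":
            out.append(token)
        else:
            prefix, _, category = label.partition("-")
            prev_category = prev.partition("-")[2]
            # An I- label continues an entity only if the previous token
            # belonged to an entity of the same category.
            continues = prefix == "I" and prev != "O" and prev_category == category
            if not continues:
                out.append(f"<{category}>")
        prev = label
    # Simplification: tokens are re-joined with single spaces.
    return " ".join(out)


# Invented example with a three-token date entity.
tokens = ["Gezien", "op", "12", "maart", "2021"]
labels = ["O", "O", "B-DATE", "I-DATE", "I-DATE"]
print(bio_to_inline(tokens, labels))  # Gezien op <DATE>
```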

Additional Resources: