T19: Anonymizing report
Objective:
This task requires identifying and tagging personally identifiable information (PII) within reports. The target PII categories include:
- Dates
- Personal identifiers
- Report identifiers
- Locations
- Clinical trial names
- Times
- Ages
The goal is to replace each PII entity with its corresponding tag in the original text, producing a fully anonymized version of the report.
Patient Population:
Pathology and radiology reports were sourced from two hospitals:
- Radboudumc
- Antoni van Leeuwenhoek Ziekenhuis
The dataset includes 1,307 reports.
Imaging Data:
Not applicable. The task is based solely on textual data — radiology and pathology reports written in Dutch.
Test Data:
Unlabeled reports requiring anonymization. Participants must return a string where all PII entities are replaced with their corresponding tags (e.g., <DATE>, <NAME>, <HOSPITAL>).
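The expected output can be illustrated with a minimal sketch. It assumes a model has already located each PII entity as a character span with a category label; the helper names `apply_tags` and `span_of`, the example sentence, and the label names are hypothetical and only serve the illustration.

```python
def apply_tags(text, entities):
    """Replace each (start, end, label) character span in `text` with `<label>`.

    Assumption: spans are non-overlapping character offsets into `text`.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        if start < cursor:
            raise ValueError("overlapping entity spans are not supported")
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(f"<{label}>")        # the tag that replaces the entity
        cursor = end
    pieces.append(text[cursor:])           # remaining text after the last entity
    return "".join(pieces)


def span_of(text, value, label):
    """Hypothetical helper: locate the first occurrence of `value` in `text`."""
    start = text.index(value)
    return (start, start + len(value), label)


# Invented example sentence; real reports are Dutch radiology/pathology reports.
report = "De patiënt is 54 jaar oud en werd gezien op 12-03-2021."
entities = [
    span_of(report, "54", "AGE"),
    span_of(report, "12-03-2021", "DATE"),
]
anonymized = apply_tags(report, entities)
print(anonymized)
```

Building the result from left to right, rather than replacing substrings by value, avoids accidentally tagging a non-PII occurrence of the same text elsewhere in the report.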
Reference Standard:
Reports were annotated semi-automatically using an in-house rule-based system. All annotations were manually verified and corrected by three trained investigators to ensure high-quality ground truth.
Evaluation Metrics:
- Model predictions are compared against the ground-truth annotations using the macro F1 score: F1 is computed per PII category and then averaged with equal weight across all categories.
- A prediction is counted as correct only if the predicted entity exactly matches the ground truth in type and position.
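The metric described above can be sketched as follows. This is not the official evaluation code: it assumes entities are already available as `(start, end, label)` tuples per report, and it averages only over categories that occur in the gold or predicted annotations. The function name `macro_f1` and the example spans are hypothetical.

```python
from collections import defaultdict


def macro_f1(gold_docs, pred_docs):
    """Macro F1 over entity categories with exact type-and-position matching.

    gold_docs / pred_docs: lists with one set of (start, end, label) per report.
    """
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    for gold, pred in zip(gold_docs, pred_docs):
        for entity in pred:
            if entity in gold:          # same start, end, and label
                tp[entity[2]] += 1
            else:
                fp[entity[2]] += 1
        for entity in gold:
            if entity not in pred:      # missed or only partially matched
                fn[entity[2]] += 1

    labels = set(tp) | set(fp) | set(fn)
    scores = []
    for label in sorted(labels):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    # Unweighted mean of the per-category F1 scores.
    return sum(scores) / len(scores) if scores else 0.0


gold = [{(14, 16, "AGE"), (44, 54, "DATE")}]
pred = [{(14, 16, "AGE"), (44, 49, "DATE")}]  # DATE span is too short
score = macro_f1(gold, pred)
print(score)  # AGE scores 1.0, DATE scores 0.0, so the macro average is 0.5
```

Note that the truncated `DATE` prediction counts as both a false positive and a false negative, because partial overlap earns no credit under exact matching.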
Relation to existing Challenges:
- Task 19 is derived from Task025 (Anonymization) of the DRAGON challenge.
- In contrast to DRAGON, the output format in UNICORN has been updated:
  - Instead of BIO-tagging tokens, participants must output the original text with tagged entities replaced inline (e.g., De patiënt is <AGE> jaar oud...).
  - Public few-shot examples have been updated accordingly.
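For readers with an existing BIO-tagging pipeline, the difference between the two output formats can be sketched as a small conversion. The label scheme (`B-DATE`, `I-DATE`, `O`), the function name `bio_to_inline`, and the example tokens are assumptions for illustration. Joining tokens with single spaces is a simplification that does not restore the original whitespace of a report.

```python
def bio_to_inline(tokens, labels):
    """Collapse BIO-labelled tokens into text with one inline tag per entity."""
    out = []
    prev = "O"
    for token, label in zip(tokens, labels):
        if label == "O":
            out.append(token)
        else:
            prefix, _, name = label.partition("-")
            # An I- label continues an entity only if the previous token
            # belonged to an entity of the same category.
            continues = (
                prefix == "I"
                and prev != "O"
                and prev.partition("-")[2] == name
            )
            if not continues:
                out.append(f"<{name}>")
        prev = label
    return " ".join(out)


tokens = ["Gezien", "op", "12", "maart", "2021"]
labels = ["O", "O", "B-DATE", "I-DATE", "I-DATE"]
inline = bio_to_inline(tokens, labels)
print(inline)  # Gezien op <DATE>
```

A pipeline that keeps character offsets for each token could instead splice tags into the original string, which preserves the report's spacing and punctuation exactly.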