Task overview

The 11 vision-only tasks span both the pathology and radiology domains and are grouped into four task types:

  • Classification
    • T1: Classifying HE prostate biopsies into ISUP scores. This task aims to classify prostate biopsies and resections into one of the International Society of Urological Pathology (ISUP) grade groups.
    • T2: Classifying lung nodule malignancy in CT. The main goal is to predict a risk score for a lung nodule candidate, indicating either a low or high risk of malignancy.
    • T4: Predicting slide-level tumor proportion score in NSCLC IHC-stained WSI. This task aims to assess the Tumor Proportion Score (TPS), computed as the number of PD-L1-positive tumor cells divided by the total number of tumor cells in the histology slide. For the purposes of this challenge, the continuous TPS value is discretized into three categories (see the sketch after this task list): TPS < 1%; 1% ≤ TPS < 50%; TPS ≥ 50%.
  • Regression
    • T3: Predicting the time to biochemical recurrence in HE prostatectomies. This task aims to estimate the biochemical recurrence risk of patients undergoing radical prostatectomy surgery.
  • Detection
    • T5: Cell detection of signet ring cells in HE-stained WSI of gastric cancer. This task aims to accurately predict the (x, y) coordinates of signet ring cells in variable-sized ROIs extracted from whole-slide images of gastric cancer.
    • T6: Detecting clinically significant cancer in prostate MRI exams. The main goal of this task is the 3D detection of clinically significant cancerous lesions (ISUP score ≥ 2).
    • T7: Detecting lung nodules in thoracic CT. The main goal of this task is to accurately detect pulmonary nodules in chest CT scans from both clinical routine and screening settings.
    • T8: Cell detection of mitotic figures in breast cancer HE-stained WSIs. The main goal of this task is to accurately predict the (x, y) coordinates of each mitotic figure in an ROI extracted from a whole-slide image.
  • Segmentation
    • T9: Segmenting ROIs in breast cancer HE-stained WSIs. This task aims to segment tumor and stroma tissue in breast cancer histopathological images.
    • T10: Segmenting lesions within ROIs in CT. The goal of this task is to segment 3D masks of lesions in CT scans.
    • T11: Segmenting three anatomical structures in lumbar spine MRI. This task aims to segment the vertebrae, intervertebral discs, and spinal canal in lumbar MRI.
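
As a concrete reading of the T4 thresholds above, the following Python sketch discretizes a TPS value into the three challenge categories; the function and argument names and the returned strings are illustrative, not part of any challenge API.

    def tps_category(n_pdl1_positive_tumor_cells: int, n_tumor_cells: int) -> str:
        # Compute the Tumor Proportion Score as a percentage of all tumor cells.
        tps = 100.0 * n_pdl1_positive_tumor_cells / n_tumor_cells
        if tps < 1.0:
            return "TPS < 1%"
        elif tps < 50.0:
            return "1% <= TPS < 50%"
        return "TPS >= 50%"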

Note: Additional tasks may be introduced in the future to further assess model generalizability across new and evolving medical imaging needs.

Slide-level tasks

For pathology tasks T1, T3, and T4, evaluation is performed at the slide level. Predictions can be made directly from slide-level features, or participants can develop a (patch) feature aggregation method or another technique to obtain slide-level predictions.
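
One of the simplest such aggregation strategies is mean pooling of the patch features into a single slide-level vector. A minimal Python sketch, assuming the patch features are available as a NumPy array (all names here are illustrative):

    import numpy as np

    def aggregate_patch_features(patch_features: np.ndarray) -> np.ndarray:
        # Mean-pool an (n_patches, feature_dim) array into one slide-level vector.
        return patch_features.mean(axis=0)

    # e.g., 1,000 patches, each encoded as a 768-dimensional feature vector
    slide_embedding = aggregate_patch_features(np.random.rand(1000, 768))  # shape: (768,)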

Submission to the UNICORN leaderboard

Submissions to the UNICORN challenge for vision tasks follow a two-step pipeline designed to assess how foundation models can be applied to diverse medical imaging modalities and task types. Participants submit a Docker container that generates features from the input images and, at submission time, specify the adaptation method to be used in the second step to produce predictions across the various task types: classification, regression, detection, and segmentation.

Step 1: Algorithm Docker (Encoder)

Participants first submit a Docker container that includes their pre-trained vision foundation model, which serves as the encoder. This container is responsible for processing medical images (CT, MRI, WSI) and producing task-relevant features that can be used for downstream prediction. The encoder receives both the evaluation data and the task-specific few-shot examples and processes them in a fully automated manner to deliver descriptive features, covering all necessary preprocessing steps such as patch extraction. The encoded features must be saved in one of the following Grand Challenge interfaces:

  • Patch-level neural representation: for detection or segmentation tasks.
  • Image-level neural representation: for classification or regression tasks.

Each representation is saved as a list of floats, corresponding to either patch-level or image-level features.
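
For illustration only, writing an image-level representation could look like the sketch below; the output filename and exact JSON schema here are assumptions, as both are dictated by the Grand Challenge interface rather than by this example.

    import json

    # Hypothetical feature vector; in practice this comes from the encoder.
    image_level_features = [0.12, -0.53, 0.98]

    # Hypothetical output location; the real path and schema are defined
    # by the Grand Challenge interface the container writes to.
    with open("image-neural-representation.json", "w") as f:
        json.dump(image_level_features, f)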

Step 2: Adaptation and Evaluation (Decoder)

The second container, the adaptation and evaluation Docker, generates predictions for the specific downstream tasks using the foundation model features from Step 1.

At submission, participants select an adaptation method from the UNICORN evaluation repository, which provides default, lightweight adaptation methods that leverage few-shot samples to generate predictions. Custom adaptation methods can also be submitted via pull request; upon approval, they are added to the repository and made selectable by all participants. To maintain the challenge's focus on generalizability, only lightweight adapters are allowed: pretrained models and large-scale methods are not permitted in this step. Additionally, all methods must run within the specified time limits for each task.

The evaluation Docker container takes as input the features of the task-specific evaluation data and of the few-shot data. The few-shot features, together with their labels, can be used to guide the adaptation toward task-specific predictions with minimal supervision. The container outputs task-specific predictions (e.g., classification labels, regression values, cell detection coordinates, or segmentation masks) along with evaluation metrics. Participants cannot modify this part of the evaluation Docker.
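
To make the few-shot adaptation concrete, one adapter in the spirit of these lightweight methods could be a k-nearest-neighbour probe fitted on the few-shot features, as sketched below. Whether k-NN is among the repository's defaults is not specified here, and all names and shapes are illustrative.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Illustrative shapes: few-shot features with labels, evaluation features without.
    few_shot_features = np.random.rand(16, 768)         # (n_shots, feature_dim)
    few_shot_labels = np.random.randint(0, 3, size=16)  # one class label per shot
    eval_features = np.random.rand(200, 768)            # (n_eval, feature_dim)

    # Fit the lightweight adapter on the few-shot set and predict on the
    # evaluation set; no pretrained weights or large-scale training involved.
    adapter = KNeighborsClassifier(n_neighbors=3)
    adapter.fit(few_shot_features, few_shot_labels)
    predictions = adapter.predict(eval_features)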