Task overview

The 11 vision-only tasks span both pathology and radiology domains. The tasks are divided into four task types:

  • Classification
    • T1: Classifying HE prostate biopsies into ISUP scores. This task aims to classify prostate biopsies and resections into one of the International Society of Urological Pathology (ISUP) grade groups.
    • T2: Classifying lung nodule malignancy in CT. The main goal is to predict a risk score for a lung nodule candidate, indicating either a low or high risk of malignancy.
  • Regression
    • T3: Predicting the time to biochemical recurrence in HE prostatectomies. This task aims to estimate the biochemical recurrence risk of patients undergoing radical prostatectomy.
    • T4: Predicting slide-level tumor proportion score in NSCLC IHC-stained WSI. This task aims to assess the Tumor Proportion Score (TPS), computed as the number of PD-L1-positive tumor cells divided by the total number of tumor cells in the histology slide, yielding a value between 0 and 100% (see the formula just after this list).
  • Detection
    • T5: Cell detection of signet ring cells in HE-stained WSI of gastric cancer. This task aims to accurately predict the (x,y) coordinates of signet ring cells in WSIs of gastric cancer.
    • T6: Detecting clinically significant cancer in prostate MRI exams. The main goal of this task is the 3D detection of clinically significant cancerous lesions (ISUP score greater than or equal to 2).
    • T7: Detecting lung nodules in thoracic CT. The main goal of this task is to accurately detect pulmonary nodules in chest CT scans from both clinical routine and screening settings.
    • T8: Cell detection of mitotic figures in breast cancer HE-stained WSIs. The main goal of this task is to accurately predict the (x,y) coordinates of each mitotic figure in a whole-slide image.
  • Segmentation
    • T9: Segmenting ROIs in breast cancer HE-stained WSIs. This task aims to segment tumor and stroma tissue in breast cancer histopathological images.
    • T10: Segmenting lesions within ROIs in CT. The goal of this task is to segment 3D masks of lesions in CT scans.
    • T11: Segmenting three anatomical structures in lumbar spine MRI. This task aims to segment the vertebrae, intervertebral discs, and spinal canal in lumbar MRI.
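
For reference, the Tumor Proportion Score from T4 can be written as a formula. This is only a restatement of the definition given above, not an additional challenge specification:

```latex
\mathrm{TPS} = \frac{\#\,\text{PD-L1-positive tumor cells}}{\#\,\text{tumor cells}} \times 100\%
```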

Note: Additional tasks may be introduced in the future to further assess model generalizability across new and evolving medical imaging needs.

Slide-level tasks

For pathology tasks T1, T3, and T4, evaluation is done at the slide level. Predictions can be made directly from slide-level features, or participants can develop a (patch) feature aggregation method or another technique to produce slide-level predictions, as sketched below.
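
As a minimal sketch of the patch-aggregation option, assuming patch embeddings are available as a NumPy array, the example below mean-pools them into one slide-level vector. The array shapes and the pooling choice are illustrative assumptions, not prescribed by the challenge:

```python
import numpy as np

def aggregate_patch_features(patch_features: np.ndarray) -> np.ndarray:
    """Collapse an (n_patches, feature_dim) array of patch embeddings
    into a single slide-level embedding by mean pooling."""
    return patch_features.mean(axis=0)

# Hypothetical shapes: 500 patches with 768-dimensional features.
patch_features = np.random.rand(500, 768).astype(np.float32)
slide_embedding = aggregate_patch_features(patch_features)  # shape: (768,)
```

Any comparably simple pooling (max, attention-free weighted averaging, etc.) would fit the same slot; the point is that a slide-level prediction can be built on top of the patch features without retraining the encoder.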

Submission to the UNICORN leaderboard

Submissions to the UNICORN challenge for vision tasks follow a two-step process designed to assess how foundation models can be applied to diverse medical imaging tasks. Participants submit a Docker container that generates features from the input images and, at submission time, specify the aggregation method to be used in the second step to turn those features into predictions for the different task types: classification, regression, detection, and segmentation.

Step 1: Feature Extraction (Encoder)

Participants first submit a Docker container with their pre-trained vision model, which serves as the encoder. This encoder processes the images in the evaluation dataset, for instance by extracting patches from whole-slide images, and encodes features that capture visual information relevant to the downstream tasks.
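
As a rough illustration of what such an encoder container does internally, the sketch below tiles an image into patches and encodes each one. The tiling parameters, the function names, and the stand-in encoder are illustrative assumptions, not part of the challenge interface:

```python
import numpy as np

# Hypothetical tiling parameter; the actual patching strategy is up to the participant.
PATCH_SIZE = 224

def extract_patches(image: np.ndarray, size: int = PATCH_SIZE):
    """Tile an (H, W, 3) image into non-overlapping square patches."""
    h, w, _ = image.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            yield image[y:y + size, x:x + size]

def encode_image(image: np.ndarray, encoder) -> np.ndarray:
    """Run a pre-trained encoder (any callable mapping a patch to a 1-D
    feature vector) over every patch and stack the results."""
    features = [encoder(patch) for patch in extract_patches(image)]
    return np.stack(features)  # shape: (n_patches, feature_dim)

# Example with a stand-in "encoder" that just averages channels per patch:
dummy_image = np.random.rand(1024, 1024, 3).astype(np.float32)
dummy_encoder = lambda patch: patch.mean(axis=(0, 1))  # (3,) vector
features = encode_image(dummy_image, dummy_encoder)    # shape: (16, 3)
```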

Step 2: Aggregation and Prediction (Decoder)

Once the feature extraction container is submitted, the evaluation process is triggered automatically. At submission time, participants specify the name of the evaluation method they want to use from the publicly available UNICORN evaluation Docker repository on GitHub. This repository includes default lightweight aggregation methods that leverage few-shot samples to generate predictions. Participants can also contribute custom aggregation code through a pull request; after review, approved methods are merged into the repository and become selectable as aggregation techniques. Because the challenge emphasizes the quality and generalizability of the foundation models themselves, only lightweight aggregators are accepted; training more complex models or using large-scale methods is not permitted.
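
To make "lightweight aggregator" concrete, here is a minimal sketch of one plausible few-shot method: a k-nearest-neighbour predictor over slide embeddings. It is not claimed to be one of the repository's actual methods; all names, shapes, and the value of k are illustrative assumptions:

```python
import numpy as np

def knn_predict(shot_feats, shot_labels, query_feats, k=3):
    """Few-shot prediction: score each query slide embedding as the mean
    label of its k most cosine-similar few-shot neighbours."""
    def l2_normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    sims = l2_normalize(query_feats) @ l2_normalize(shot_feats).T
    top_k = np.argsort(-sims, axis=1)[:, :k]   # indices of the k nearest shots
    return shot_labels[top_k].mean(axis=1)

# Hypothetical setup: 5 labelled few-shot slides, 20 unlabelled query slides.
shot_feats = np.random.rand(5, 768)
shot_labels = np.array([0.0, 1.0, 2.0, 1.0, 0.0])  # e.g. grade-like targets
query_feats = np.random.rand(20, 768)
predictions = knn_predict(shot_feats, shot_labels, query_feats)  # shape: (20,)
```

A method like this has no trainable parameters beyond the stored few-shot examples, which is the kind of lightweight aggregation the challenge rules favour: the predictive power has to come from the quality of the submitted encoder's features.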