Task overview
The UNICORN Challenge currently includes one vision-language task: T20, generating a caption from a whole-slide image (WSI).
In this task, participants are asked to generate a descriptive caption for a whole-slide pathology image (WSI) using a pre-trained vision-language model. The task tests a model's ability to interpret visual data and summarize clinical findings, with potential applications in automated diagnostics and pathology report generation. The model should recognize relevant visual features of the slide, such as tissue type and any prominent pathological findings, and produce a clear, concise textual description that resembles the conclusion of a pathology report.
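To make the expected inference step concrete, the sketch below captions a slide from a low-resolution thumbnail. Everything in it is an assumption rather than a challenge requirement: openslide-python for slide reading, a BLIP-style captioning model loaded with transformers from a hypothetical local path /opt/model, and thumbnail-level captioning (a competitive solution would likely tile the slide at higher magnification and aggregate).

```python
# Minimal sketch: caption a WSI from a low-resolution thumbnail.
# Assumptions (not prescribed by the challenge): openslide-python for
# slide reading, a BLIP-style model bundled at a local path, and
# thumbnail-level captioning.
import openslide
from transformers import BlipProcessor, BlipForConditionalGeneration

MODEL_DIR = "/opt/model"  # hypothetical local path baked into the image

processor = BlipProcessor.from_pretrained(MODEL_DIR)
model = BlipForConditionalGeneration.from_pretrained(MODEL_DIR)

def caption_wsi(slide_path: str) -> str:
    # Read a small thumbnail instead of the full multi-gigapixel slide.
    slide = openslide.OpenSlide(slide_path)
    thumbnail = slide.get_thumbnail((1024, 1024)).convert("RGB")
    inputs = processor(images=thumbnail, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```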
Note: This is a zero-shot task; no few-shot examples are provided on the platform.
Submission to the UNICORN leaderboard
Algorithm Docker: Participants must submit a single Docker container that includes a pre-trained vision-language model capable of processing WSIs and generating captions. The container has no internet access, so all necessary dependencies must be bundled in the image to ensure fully offline execution. The model must process the data fully automatically and produce its predictions without manual intervention.
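As a sketch of enforcing offline execution, the snippet below uses the standard Hugging Face environment switches and loads weights from a local directory baked into the image at build time. The /opt/model path is a placeholder, not a challenge convention.

```python
# Sketch of enforcing fully offline execution inside the container.
# Assumption: model weights were copied into the image at build time
# (here, under /opt/model). The env vars below are the standard
# Hugging Face switches that disable network lookups at runtime.
import os

os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import BlipProcessor, BlipForConditionalGeneration

# local_files_only guarantees loading never falls back to the Hub.
processor = BlipProcessor.from_pretrained("/opt/model", local_files_only=True)
model = BlipForConditionalGeneration.from_pretrained("/opt/model", local_files_only=True)
```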
Output: The submitted Docker container must produce a JSON file containing a generated caption for each WSI. The required output interface for this task is NLP Predictions Dataset. To learn more about Grand Challenge (GC) input/output interfaces, visit https://grandchallenge.org/components/interfaces/algorithms/.
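A minimal sketch of writing the output file is shown below. The slug-derived filename /output/nlp-predictions-dataset.json and the record keys are assumptions for illustration; the authoritative path and schema come from the interface definition linked above.

```python
# Sketch of writing predictions in a Grand Challenge-style output file.
# Assumptions: GC algorithm containers conventionally write JSON under
# /output/ with a filename derived from the interface slug; the exact
# filename and record schema for "NLP Predictions Dataset" should be
# taken from the interface page linked above.
import json
from pathlib import Path

OUTPUT_FILE = Path("/output/nlp-predictions-dataset.json")  # assumed name

def write_predictions(case_captions: dict[str, str]) -> None:
    # "case" and "caption" keys are illustrative, not the official schema.
    records = [{"case": case_id, "caption": caption}
               for case_id, caption in case_captions.items()]
    OUTPUT_FILE.parent.mkdir(parents=True, exist_ok=True)
    OUTPUT_FILE.write_text(json.dumps(records, indent=2))
```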
Evaluation: Evaluation is handled by a separate evaluation Docker container provided by the organizers. This container includes the official evaluation metrics and must not be modified by participants. It takes the predictions generated by the algorithm container as input and computes the task-specific metrics.
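Participants do not write this container, but its general shape may help when testing predictions locally. The sketch below is purely illustrative: the input/output paths are assumptions, and a simple token-level F1 stands in for the official captioning metrics, which are not reproduced here.

```python
# Illustrative only: the official evaluation container is provided by
# the organizers and its metrics are not reproduced here. This shows
# the general shape: read predictions and ground truth, score each
# case, write an aggregate metrics file. Paths are assumptions.
import json
from pathlib import Path

def token_f1(pred: str, ref: str) -> float:
    # Multiset overlap of whitespace tokens, combined into an F1 score.
    pred_tokens, ref_tokens = pred.lower().split(), ref.lower().split()
    common = sum(min(pred_tokens.count(t), ref_tokens.count(t))
                 for t in set(pred_tokens))
    if not pred_tokens or not ref_tokens or common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

predictions = json.loads(Path("/input/predictions.json").read_text())      # assumed path
references = json.loads(Path("/opt/ground-truth/refs.json").read_text())   # assumed path
scores = [token_f1(p["caption"], references[p["case"]]) for p in predictions]
Path("/output/metrics.json").write_text(
    json.dumps({"mean_token_f1": sum(scores) / len(scores)})
)
```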