A generalizable 3D framework and model for self-supervised learning in medical imaging

By
Tony Xu
Sepehr Hosseini
Chris Anderson
Anthony Rinaldi
Rahul G. Krishnan
Anne L. Martel
Maged Goubran
November 7, 2025

npj Digital Medicine

At a Glance

Category	Detail
Condition	Medical imaging analysis including detection, diagnosis, and risk profiling
Key Mechanisms	Self-supervised learning (SSL) with 3DINO, a 3D self-distillation method, used to pretrain a Vision Transformer (3DINO-ViT) on a large multimodal 3D dataset
Target Population	Patients undergoing 3D medical imaging across multiple organs and modalities (MRI, CT, PET)
Care Setting	Clinical imaging and diagnostic workflows utilizing 3D medical imaging data
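A 3D Vision Transformer such as 3DINO-ViT processes a volume as a sequence of cubic patch tokens. The minimal NumPy sketch below shows that tokenization. The 96x96x96 input, 16x16x16 patches, and 768-dimensional embedding are illustrative assumptions, not necessarily 3DINO-ViT's actual configuration.

```python
import numpy as np

def patchify_3d(volume: np.ndarray, patch: int) -> np.ndarray:
    """Split a (D, H, W) volume into non-overlapping cubic patches.

    Returns shape (num_patches, patch**3): one flattened patch per row,
    ordered by depth, then height, then width.
    """
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("each spatial dimension must be divisible by the patch size")
    gd, gh, gw = d // patch, h // patch, w // patch
    # (gd, p, gh, p, gw, p) -> (gd, gh, gw, p, p, p) so each patch is contiguous
    blocks = volume.reshape(gd, patch, gh, patch, gw, patch)
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(gd * gh * gw, patch ** 3)

def embed_patches(patches: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Linear patch embedding: (N, patch**3) @ (patch**3, dim) -> (N, dim)."""
    return patches @ weight

rng = np.random.default_rng(0)
vol = rng.standard_normal((96, 96, 96)).astype(np.float32)  # stand-in for a scan
tokens = patchify_3d(vol, patch=16)
print(tokens.shape)  # (216, 4096)
weight = rng.standard_normal((16 ** 3, 768)).astype(np.float32) * 0.01  # hypothetical weights
emb = embed_patches(tokens, weight)
print(emb.shape)  # (216, 768)
```

In practice, the patch embedding is usually written as a 3D convolution whose kernel size and stride both equal the patch size. That convolution computes the same linear map as the reshape-and-multiply above.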

Key Highlights

3DINO-ViT is pretrained on ~100,000 unlabeled 3D medical volumes from over 10 organs and multiple modalities (MRI, CT, PET).
Combines image-level and patch-level SSL objectives to learn salient features for both segmentation and classification tasks.
Outperforms state-of-the-art pretrained models on most evaluation metrics across multiple downstream segmentation and classification benchmarks, and generalizes to out-of-distribution organs and modalities.
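The combination of image-level and patch-level objectives can be illustrated with a DINO-style loss on the [CLS] token plus an iBOT-style loss on masked patch tokens. This NumPy sketch assumes that 3DINO's objectives resemble these formulations, which it adapts to 3D. The temperatures (0.1 for the student, 0.04 for the teacher) and the equal weighting are assumed values. The sketch takes logits as given. In real training, the image-level term compares different augmented views, and the patch-level term compares the student's masked view with the teacher's unmasked view. DINOv2 refinements such as Sinkhorn-Knopp centering and the KoLeo regularizer are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def log_softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def distill_ce(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    """Per-token cross-entropy H(teacher, student) over K prototype logits.

    The teacher distribution is centered (subtract `center`) and sharpened
    (low temperature); in training, gradients flow only through the student.
    """
    targets = softmax((teacher_logits - center) / t_teacher)
    return -(targets * log_softmax(student_logits / t_student)).sum(axis=-1)

def combined_ssl_loss(s_cls, t_cls, s_patch, t_patch, mask,
                      c_cls, c_patch, patch_weight=1.0):
    """Image-level (DINO-style) plus patch-level (iBOT-style) distillation loss.

    s_cls, t_cls:     (B, K) student/teacher logits for the [CLS] token
    s_patch, t_patch: (B, N, K) student/teacher logits for each patch token
    mask:             (B, N) bool, True where the student's input patch was masked
    """
    image_loss = distill_ce(s_cls, t_cls, c_cls).mean()
    per_patch = distill_ce(s_patch, t_patch, c_patch)          # (B, N)
    n_masked = max(int(mask.sum()), 1)
    patch_loss = float((per_patch * mask).sum()) / n_masked    # masked tokens only
    return float(image_loss) + patch_weight * patch_loss

rng = np.random.default_rng(0)
B, N, K = 2, 216, 64
s_cls, t_cls = rng.standard_normal((B, K)), rng.standard_normal((B, K))
s_patch, t_patch = rng.standard_normal((B, N, K)), rng.standard_normal((B, N, K))
mask = rng.random((B, N)) < 0.3  # hypothetical 30% masking ratio
loss = combined_ssl_loss(s_cls, t_cls, s_patch, t_patch, mask,
                         c_cls=np.zeros(K), c_patch=np.zeros(K))
print(f"combined loss: {loss:.3f}")
```

The image-level term pushes the [CLS] embedding toward a global summary that suits classification. The patch-level term forces each token to predict the teacher's view of hidden regions, which favors the dense features that segmentation needs.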

Guideline-Based Recommendations

Diagnosis

Use 3DINO-ViT pretrained weights to initialize models for 3D image-based diagnosis and classification tasks, where they may improve accuracy over training from scratch.
Apply the model to diverse organs and imaging modalities including MRI, CT, and PET for robust feature extraction.
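One common way to apply pretrained weights to classification is a linear probe: the backbone stays frozen, and only a linear classifier is trained on its embeddings. The NumPy sketch below uses synthetic Gaussian features as a hypothetical stand-in for 3DINO-ViT [CLS] embeddings. It does not load the actual model, and the learning rate, epoch count, and L2 penalty are illustrative.

```python
import numpy as np

def train_linear_probe(feats, labels, lr=0.1, epochs=200, l2=1e-4):
    """Binary logistic regression on frozen features via full-batch gradient descent."""
    n, d = feats.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
        err = probs - labels                 # d(BCE)/d(logit) for each sample
        w -= lr * (feats.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b

def predict(feats, w, b):
    return (feats @ w + b > 0).astype(int)

rng = np.random.default_rng(0)
# Synthetic stand-ins for frozen embeddings: two shifted Gaussian clusters.
x0 = rng.standard_normal((100, 32)) - 1.0
x1 = rng.standard_normal((100, 32)) + 1.0
X = np.vstack([x0, x1])
y = np.concatenate([np.zeros(100), np.ones(100)])
w, b = train_linear_probe(X, y)
acc = (predict(X, w, b) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

Full fine-tuning, which updates the backbone as well, is the other common option. A linear probe is cheaper and isolates how much task-relevant information the pretrained features already carry.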

Management

Incorporate the 3DINO framework to reduce reliance on large labeled datasets by leveraging unlabeled 3D medical imaging data.
Use the 3D ViT-Adapter module to enhance segmentation performance by injecting spatial inductive biases.
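Self-distillation needs no labels because its training targets come from a teacher network rather than from annotations. In the DINO family of methods, which 3DINO adapts, the teacher's weights are an exponential moving average (EMA) of the student's. The sketch below assumes that setup. Plain NumPy dictionaries stand in for network parameters, and the cosine momentum schedule starting at 0.996 follows DINO's published default rather than a confirmed 3DINO setting.

```python
import numpy as np

def cosine_momentum(step, total_steps, base=0.996, final=1.0):
    """Teacher momentum rising from `base` at step 0 to `final` at the last step."""
    return final - (final - base) * (np.cos(np.pi * step / total_steps) + 1) / 2

def ema_update(teacher, student, momentum):
    """In-place EMA of parameters: teacher <- m * teacher + (1 - m) * student."""
    for name, s_param in student.items():
        teacher[name] *= momentum
        teacher[name] += (1.0 - momentum) * s_param

# Hypothetical toy parameters standing in for network weights.
student = {"w": np.ones(3)}
teacher = {"w": np.zeros(3)}
for step in range(10):
    # A real loop would first update the student by gradient descent on the SSL loss.
    ema_update(teacher, student, cosine_momentum(step, total_steps=10))
print(teacher["w"])
```

Because the teacher changes slowly, its outputs form stable targets for the student. This lets pretraining use unlabeled scans only.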

Monitoring & Follow-up

Evaluate model performance on segmentation and classification benchmarks relevant to clinical tasks (e.g., BraTS, BTCV, LA-SEG, TDSC-ABUS).
Monitor generalizability on out-of-distribution organs and modalities to ensure robustness.
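Segmentation benchmarks such as BraTS and BTCV are commonly scored with the Dice similarity coefficient, Dice = 2|P ∩ T| / (|P| + |T|), computed per label. The NumPy sketch below implements that formula on integer 3D label maps. How empty labels are scored varies between benchmarks, so the convention used here is an assumption.

```python
import numpy as np

def dice_per_label(pred, target, labels):
    """Dice = 2|P ∩ T| / (|P| + |T|) per label in integer 3D label maps.

    Convention (an assumption; benchmarks differ): if a label is absent from
    both prediction and ground truth, its Dice is reported as 1.0.
    """
    scores = {}
    for lab in labels:
        p, t = pred == lab, target == lab
        denom = int(p.sum()) + int(t.sum())
        inter = int(np.logical_and(p, t).sum())
        scores[lab] = 1.0 if denom == 0 else 2.0 * inter / denom
    return scores

pred = np.zeros((4, 4, 4), dtype=int)
target = np.zeros((4, 4, 4), dtype=int)
pred[:2] = 1      # 32 voxels predicted as label 1
target[1:3] = 1   # 32 ground-truth voxels of label 1; 16 overlap with pred
print(dice_per_label(pred, target, labels=[1, 2]))  # {1: 0.5, 2: 1.0}
```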

Risks

Computational demands for training 3D SSL models can be high; consider resource availability.
Performance may remain limited for rare diseases or for modalities that are scarce in the pretraining data, despite improved generalizability.
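Much of the computational demand in 3D comes from token count: tokens grow with the cube of the input edge length, and naive self-attention memory grows with the square of the token count. The arithmetic sketch below estimates the size of one layer's attention-score tensor. It assumes 16x16x16 patches, 12 heads, 2-byte (fp16) values, and one [CLS] token. These are ViT-Base-style illustrative values, not confirmed 3DINO-ViT settings.

```python
def attention_scores_bytes(shape, patch, heads, bytes_per_el=2, batch=1):
    """Size of one layer's (batch, heads, N, N) attention-score tensor in a 3D ViT.

    N counts one token per cubic patch plus one [CLS] token. This is the naive
    cost; fused or memory-efficient attention kernels avoid materializing it.
    """
    d, h, w = shape
    n_tokens = (d // patch) * (h // patch) * (w // patch) + 1
    return n_tokens, batch * heads * n_tokens * n_tokens * bytes_per_el

for edge in (96, 192):
    n, size = attention_scores_bytes((edge, edge, edge), patch=16, heads=12)
    print(f"{edge}^3 input: {n} tokens, {size / 1e6:.1f} MB per layer")
# 96^3 input: 217 tokens, 1.1 MB per layer
# 192^3 input: 1729 tokens, 71.7 MB per layer
```

Doubling each spatial edge multiplies the token count by about 8 and the naive attention memory by about 64. This is why 3D pretraining typically works on cropped sub-volumes rather than whole scans.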

Patient & Prescribing Data

Patients undergoing 3D medical imaging for various clinical indications across multiple organs and modalities.

Pretrained 3DINO-ViT models can improve diagnostic accuracy and segmentation quality in label-scarce settings, which may support clinical decision-making.

Clinical Best Practices

Leverage large, multimodal unlabeled 3D datasets for self-supervised pretraining to improve downstream task performance.
Employ combined image-level and patch-level SSL objectives to capture comprehensive 3D anatomical context.
Use pretrained 3DINO-ViT weights as initialization for diverse medical imaging tasks to reduce training overhead and improve generalizability.
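Self-supervised pretraining of the DINO family draws several views from each unlabeled scan: a few large "global" crops and several small "local" crops. The student must match the teacher's output on global views, which encourages local-to-global anatomical correspondence. The NumPy sketch below assumes 3DINO uses a similar 3D multi-crop scheme. The crop sizes and counts are hypothetical, and real pipelines also apply random-resized crops, flips, and intensity augmentations.

```python
import numpy as np

def random_crop_3d(volume, size, rng):
    """Return a random cubic sub-volume with edge length `size` from a (D, H, W) array."""
    d0, h0, w0 = (int(rng.integers(0, dim - size + 1)) for dim in volume.shape)
    return volume[d0:d0 + size, h0:h0 + size, w0:w0 + size]

def multi_crop_3d(volume, rng, global_size=96, local_size=48, n_global=2, n_local=8):
    """Sample large 'global' views and small 'local' views of the same scan."""
    global_views = [random_crop_3d(volume, global_size, rng) for _ in range(n_global)]
    local_views = [random_crop_3d(volume, local_size, rng) for _ in range(n_local)]
    return global_views, local_views

rng = np.random.default_rng(0)
scan = rng.standard_normal((128, 128, 128)).astype(np.float32)  # stand-in for an unlabeled volume
g, l = multi_crop_3d(scan, rng)
print(len(g), g[0].shape, len(l), l[0].shape)  # 2 (96, 96, 96) 8 (48, 48, 48)
```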


