Structure-aware multi-task learning with domain generalization for robust vertebrae analysis in spinal CT

By
Jianyang Du
Heng’an Ge
Rui Zhang
Zhenghan Chen
Yuxin Zhang
Yuqi Bai
Honghao Xu
Feng Ding
Yongchao Zhang
Juan Ye
Yihang Yang
Shaoshan Hu
Jingbiao Huang
January 10, 2026

npj Digital Medicine

Overview

VertebraFormer is a multi-task learning framework that performs vertebra segmentation, vertebra numbering, and lesion localization in spinal CT, and is reported to outperform competitive baselines in accuracy and robustness across clinical domains. It combines a Transformer encoder with a dynamic modulation unit to adapt to heterogeneous imaging data, and it is evaluated on the MultiSpine benchmark.

Background

Accurate spinal image analysis is essential for diagnosing musculoskeletal and neurological disorders. Traditional vertebra segmentation methods often lack generalizability across different clinical imaging domains and do not comprehensively address related tasks such as vertebra identification and lesion localization. Multi-task learning frameworks that unify these tasks and adapt to domain variability can improve clinical utility. VertebraFormer addresses these challenges by leveraging advanced deep learning techniques and a curated heterogeneous dataset.

Data Highlights

The MultiSpine benchmark comprises CT volumes from four datasets, including public sources (CTSpine1K, SpineWeb, VerSe 2020) and private institutional cohorts, annotated with vertebra segmentation masks, anatomical labels, and pathology regions. VertebraFormer was evaluated on three tasks—vertebra segmentation, vertebra numbering, and lesion localization—under both in-domain and cross-domain conditions, demonstrating superior performance compared to competitive baselines. Ablation and perturbation analyses confirmed the framework's robustness and efficiency.
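One common way to set up a cross-domain evaluation is to hold out each source in turn and train on the rest. The summary does not state which protocol the authors used, so the sketch below is only an illustration of that general idea. The function name `leave_one_domain_out`, the volume identifiers, and the treatment of the private cohorts as a single domain labeled "private" are all assumptions made for the example.

```python
from collections import defaultdict


def leave_one_domain_out(volumes):
    """Yield (held_out_domain, train_ids, test_ids) for each source domain.

    `volumes` is an iterable of (volume_id, domain) pairs. Every domain is
    held out once; all volumes from the other domains form the training set.
    """
    by_domain = defaultdict(list)
    for volume_id, domain in volumes:
        by_domain[domain].append(volume_id)

    for held_out in sorted(by_domain):
        test_ids = sorted(by_domain[held_out])
        train_ids = sorted(
            vid
            for dom, ids in by_domain.items()
            if dom != held_out
            for vid in ids
        )
        yield held_out, train_ids, test_ids


# Placeholder volume IDs (hypothetical, not real MultiSpine case names).
volumes = [
    ("case_a", "CTSpine1K"),
    ("case_b", "CTSpine1K"),
    ("case_c", "SpineWeb"),
    ("case_d", "VerSe 2020"),
    ("case_e", "private"),
]

splits = list(leave_one_domain_out(volumes))
for held_out, train_ids, test_ids in splits:
    print(held_out, "train:", train_ids, "test:", test_ids)
```

An in-domain evaluation would instead split volumes within each source, so that training and test data share the same scanner and protocol characteristics.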

Key Findings

VertebraFormer integrates a Transformer encoder with task-specific decoders and a dynamic modulation unit to adapt feature representations across imaging domains.
It achieves improved accuracy and robustness in vertebra segmentation, numbering, and lesion localization compared to existing methods.
The MultiSpine benchmark provides a heterogeneous, multi-source dataset enabling comprehensive evaluation of spinal CT analysis methods.
Cross-domain evaluations demonstrate VertebraFormer's strong generalizability to diverse clinical imaging settings.
Ablation studies validate the contribution of each framework component to overall performance and efficiency.
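The summary does not describe how the dynamic modulation unit works internally. One widely used way to adapt shared features to an input's domain is feature-wise scaling and shifting conditioned on a descriptor of that input. The sketch below shows that generic technique in plain Python. It is not the authors' implementation: the function `modulate`, the descriptor, and the weight matrices are hypothetical names introduced for illustration.

```python
import math


def modulate(features, domain_descriptor, w_gamma, w_beta):
    """Scale and shift each feature channel based on a domain descriptor.

    features:          list of C per-channel activations from a shared encoder
    domain_descriptor: list of D floats summarising the input (assumed given)
    w_gamma, w_beta:   C x D weight matrices, stored as lists of rows

    The scale is 1 + tanh(w_gamma . descriptor), so zero weights leave the
    features unchanged; the shift is w_beta . descriptor.
    """
    modulated = []
    for c, x in enumerate(features):
        gamma = 1.0 + math.tanh(
            sum(w * d for w, d in zip(w_gamma[c], domain_descriptor))
        )
        beta = sum(w * d for w, d in zip(w_beta[c], domain_descriptor))
        modulated.append(gamma * x + beta)
    return modulated


# With all-zero weights the unit is an identity mapping.
identity = modulate([1.0, 2.0], [0.5], [[0.0], [0.0]], [[0.0], [0.0]])

# A non-zero shift weight moves the channel by w_beta * descriptor.
shifted = modulate([2.0], [1.0], [[0.0]], [[3.0]])
```

In a multi-task setting, the modulated features would then be passed to each task-specific decoder (segmentation, numbering, lesion localization), so that all three tasks share the same domain-adapted representation.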

Clinical Implications

The VertebraFormer framework offers a clinically applicable tool for comprehensive spinal CT analysis, potentially improving diagnostic accuracy and workflow efficiency. Its domain-generalized design is intended to support deployment across varied clinical environments while maintaining performance, which could facilitate broader adoption in the management of musculoskeletal and neurological disorders.

Conclusion

VertebraFormer represents a significant advancement in spinal CT image analysis by unifying multiple vertebra-related tasks within a domain-adaptive framework, validated on a diverse benchmark. This approach paves the way for more robust and generalizable clinical applications in spine imaging.
