Text-image alignment for ILD imaging: linking CXR evidence to CT quantification

By Jiani Gao, Yijiu Ren, Fengjing Yang, Xuefei Hu, Changbo Sun, Sihua Wang, Chang Chen

February 4, 2026


Integrating Chest X-Ray and CT for Quantitative ILD Assessment

Overview

This report presents ARCTIC-ILD, a multimodal framework that integrates chest X-ray (CXR) evidence with quantitative CT analysis for interstitial lung disease (ILD). The system aims to improve report generation accuracy by coupling visual evidence from CXR with controlled language generation and auditable CT segmentation, addressing key challenges in ILD imaging interpretation.

Background

Interstitial lung disease encompasses a group of disorders, many of which are characterized by progressive fibrosis that can lead to respiratory failure. High-resolution CT (HRCT) serves as the reference standard for detailed assessment. Chest radiography (CXR) is more accessible and is frequently used for initial screening and follow-up, but it lacks the detailed quantification that HRCT provides. Recent advances in vision-language models (VLMs) and large language models (LLMs) have improved radiology report generation, yet challenges remain in ensuring factual consistency, spatial localization, and reproducibility. Integrating CXR findings with quantitative CT analysis through a controllable, auditable framework can improve clinical decision-making in ILD.

Data Highlights

ARCTIC-ILD incorporates a multi-label evidence head trained on CheXbert’s 14 observations to identify four key ILD findings with calibrated probability outputs. The system uses BioViL-T for image encoding and an instruction-tuned text generator constrained by visual evidence. A contrastive image-text matching head enforces cross-modal consistency, while the Terminology2Mask Module enables text-driven CT segmentation with improved spatial coherence. Together, these components address limitations such as prompt drift, semantic-spatial disconnect, and poor reproducibility in ILD imaging interpretation.
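
As a rough illustration of what a multi-label evidence head with calibrated probability outputs could look like, the sketch below applies an independent sigmoid to each finding's logit after dividing by a temperature. The finding names, logit values, and temperature are hypothetical placeholders. The report does not name the four findings or say how calibration is done, so temperature scaling is an assumption here, not the authors' stated method.

```python
import math

# Hypothetical placeholder labels: the report does not name the four findings.
FINDINGS = ["finding_1", "finding_2", "finding_3", "finding_4"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def calibrated_probabilities(logits, temperature=1.0):
    """Map raw per-finding logits to probabilities.

    Each finding gets its own sigmoid (multi-label, not softmax). Logits are
    divided by a temperature that would be fitted on held-out data; using
    temperature scaling for calibration is an assumption of this sketch.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if len(logits) != len(FINDINGS):
        raise ValueError("expected one logit per finding")
    return {name: sigmoid(z / temperature) for name, z in zip(FINDINGS, logits)}


# Made-up logits, for illustration only.
example_logits = [2.0, -1.0, 0.0, 4.0]
probs = calibrated_probabilities(example_logits, temperature=2.0)
```

A temperature above 1 pulls every probability toward 0.5, which is the usual remedy when a head is overconfident. Because the labels are scored independently, several findings can be present at once.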

Key Findings

  • Chest X-ray evidence is extracted using a multi-label evidence head supervised on standardized ILD observations, improving early detection and report accuracy.
  • Controlled report generation is achieved by constraining language models with calibrated visual evidence, reducing factual inconsistencies and prompt drift.
  • Contrastive image-text matching regularizes cross-modal learning and provides sentence-level consistency scores, enhancing report reliability.
  • Text-driven CT segmentation using terminology-guided prompts and low-rank adapters addresses challenges of coarse boundaries and inter-slice discontinuity in fibrosis quantification.
  • The integrated ARCTIC-ILD framework enables auditable, reproducible, and clinically actionable ILD assessments by linking CXR findings to quantitative CT analysis.
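
Two of the mechanisms listed above, constraining generation with calibrated evidence and scoring sentences for image-text consistency, can be sketched in a few lines. This is a minimal illustration under stated assumptions: evidence gating is modeled as a simple probability threshold, and the consistency score as cosine similarity between an image embedding and each sentence embedding. The function names, threshold, and toy vectors are hypothetical and not taken from the report.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def supported_findings(probabilities, threshold=0.5):
    """Keep only findings whose calibrated probability clears the threshold.

    A text generator could then be restricted to mention these findings only
    (assumed gating rule; the report does not describe the exact constraint).
    """
    return [name for name, p in probabilities.items() if p >= threshold]


def sentence_consistency(image_embedding, sentence_embeddings):
    """Score each report sentence embedding against the image embedding."""
    return [cosine_similarity(image_embedding, s) for s in sentence_embeddings]


# Toy values, for illustration only.
probabilities = {"finding_1": 0.82, "finding_2": 0.10, "finding_3": 0.55}
allowed = supported_findings(probabilities, threshold=0.5)

image_embedding = [1.0, 0.0, 1.0]
sentence_embeddings = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
scores = sentence_consistency(image_embedding, sentence_embeddings)
```

In this toy case the first sentence embedding points the same way as the image embedding and scores close to 1, while the second is orthogonal and scores 0. A low score of this kind is what would flag a sentence for audit.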

Clinical Implications

Clinicians can leverage ARCTIC-ILD to obtain more reliable and consistent ILD reports by combining the accessibility of chest X-rays with the detailed quantification of CT scans. This approach supports earlier detection, standardized reporting, and objective monitoring of fibrosis progression, potentially improving patient management and outcomes. The auditable framework also facilitates traceability and quality assurance in radiologic interpretation.
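
Objective monitoring of fibrosis progression usually comes down to comparing a quantitative score across time points. A minimal sketch follows, assuming the score is the percentage of lung voxels labeled fibrotic in a binary segmentation mask. The report does not specify its quantification metric, so the function name, the metric, and the toy masks are all hypothetical.

```python
def fibrosis_extent_percent(lung_mask, fibrosis_mask):
    """Percentage of lung voxels that are also labeled fibrotic.

    Both masks are flat sequences of 0/1 values of equal length, standing in
    for a flattened CT volume. Fibrosis labels outside the lung are ignored.
    """
    if len(lung_mask) != len(fibrosis_mask):
        raise ValueError("masks must have the same number of voxels")
    lung_voxels = sum(1 for v in lung_mask if v)
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    fibrotic = sum(1 for l, f in zip(lung_mask, fibrosis_mask) if l and f)
    return 100.0 * fibrotic / lung_voxels


# Toy 10-voxel "volume": 8 lung voxels, 2 background voxels.
lung = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
baseline_fibrosis = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
follow_up_fibrosis = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]  # index 8 lies outside the lung

baseline_extent = fibrosis_extent_percent(lung, baseline_fibrosis)
follow_up_extent = fibrosis_extent_percent(lung, follow_up_fibrosis)
change = follow_up_extent - baseline_extent  # percentage-point change
```

Restricting the count to voxels inside the lung mask keeps stray segmentation outside the lung from inflating the score, which matters when the two scans are segmented independently.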

Conclusion

ARCTIC-ILD integrates multimodal ILD imaging data through a controllable and auditable framework. By bridging the gap between accessible chest radiography and detailed CT quantification, it aims to enhance diagnostic accuracy and clinical decision support.
