AnatomiX: Anatomy-Aware Grounded Multimodal LLM for Chest X-Ray Interpretation

Anees Hashmi, Numan Saeed, Christoph Lippert

Hasso Plattner Institute, Germany | CVPR 2026 - Findings

AnatomiX teaser illustration

Abstract

Multimodal medical LLMs have made substantial progress in chest X-ray interpretation but still struggle with spatial reasoning and anatomical understanding. AnatomiX uses a two-stage approach: it first identifies anatomical structures and their features, then uses a language model to perform downstream tasks such as phrase grounding, report generation, visual question answering, and image understanding. Extensive experiments demonstrate an improvement of more than 25% on anatomy grounding and other grounded tasks compared to existing approaches.

Key Contributions

Method

Fig 1: Anatomy Perception Module (APM) architecture. The encoder outputs image embeddings; the decoder and feature module output bounding boxes and anatomical tokens. A vector database is used for contrastive retrieval during inference.
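
The retrieval step named in the caption can be illustrated with a minimal sketch. It assumes that retrieval means a nearest-neighbour lookup by cosine similarity between an anatomical token and stored reference vectors; the page does not describe the actual database, embedding size, or similarity function, so `toy_database`, `retrieve`, and the three-dimensional vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve(query, database, top_k=1):
    """Return the top_k (label, score) pairs most similar to the query."""
    scored = [(label, cosine_similarity(query, vec))
              for label, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical reference vectors; real anatomical tokens would be
# high-dimensional outputs of the feature module.
toy_database = {
    "left lung": [0.9, 0.1, 0.0],
    "right lung": [0.1, 0.9, 0.0],
    "cardiac silhouette": [0.0, 0.1, 0.9],
}

query_token = [0.8, 0.2, 0.1]  # hypothetical token for one detected region
best_label, best_score = retrieve(query_token, toy_database)[0]
print(best_label, round(best_score, 3))
```

A production system would typically replace the linear scan with an approximate nearest-neighbour index, but the ranking principle is the same.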

Results

Fig 2: AnatomiX vs. RadVLM in anatomy understanding. Red = model output, green = ground truth. AnatomiX shows superior anatomical recognition, including on flipped images.
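
For intuition on the flipped-image case, the sketch below mirrors a ground-truth box under a horizontal (left-right) flip. This is an assumption: the page does not state which flip is used or how boxes are encoded. The function name `flip_box_horizontal` and the normalized `(x1, y1, x2, y2)` format are hypothetical.

```python
def flip_box_horizontal(box):
    """Mirror a normalized (x1, y1, x2, y2) box across the vertical midline.

    Coordinates are assumed to lie in [0, 1], with x1 < x2 and y1 < y2.
    The x-extent is reflected and swapped so that x1 < x2 still holds;
    the y-extent is unchanged.
    """
    x1, y1, x2, y2 = box
    return (1.0 - x2, y1, 1.0 - x1, y2)


# Hypothetical box on the left side of the image.
original = (0.1, 0.2, 0.4, 0.6)
flipped = flip_box_horizontal(original)
print(flipped)
```

After such a flip, a structure that appeared on the image's left appears on its right, so a model that relies on position alone rather than on anatomy would localize it incorrectly.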

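NLG metrics: BERTScore, ROUGE, METEOR. Clinical metrics: RadGraph-F1, CheXbert-14-F1. Paired values are GD / GC.

| Model           | BERTScore   | ROUGE       | METEOR      | RadGraph-F1 | CheXbert-14-F1 | Phrase Grounding IoU | Phrase Grounding mAP | Anatomy Grounding IoU | Anatomy Grounding mAP |
|-----------------|-------------|-------------|-------------|-------------|----------------|----------------------|----------------------|-----------------------|-----------------------|
| MAIRA-2         | 0.01 / 0.08 | 0.01 / 0.06 | 0.01 / 0.04 | 0.00 / 0.02 | 0.03 / 0.02    | 0.32                 | 0.24                 | 0.35                  | 0.24                  |
| RadVLM          | 0.15 / 0.27 | 0.06 / 0.11 | 0.05 / 0.07 | 0.00 / 0.12 | 0.32 / 0.40    | 0.39                 | 0.30                 | 0.60                  | 0.49                  |
| CheXagent       | 0.49 / 0.56 | 0.43 / 0.44 | 0.29 / 0.37 | 0.40 / 0.39 | 0.40 / 0.61    | 0.33                 | 0.24                 | 0.18                  | 0.09                  |
| AnatomiX (ours) | 0.63 / 0.65 | 0.60 / 0.56 | 0.42 / 0.48 | 0.58 / 0.50 | 0.54 / 0.78    | 0.46                 | 0.35                 | 0.73                  | 0.66                  |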

Table 1: Performance on four grounding tasks. GD = Grounded Diagnosis, GC = Grounded Captioning.
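
The IoU columns report intersection over union between predicted and ground-truth boxes. The sketch below shows the standard definition for axis-aligned boxes; the `(x1, y1, x2, y2)` corner format and the name `box_iou` are assumptions for illustration, and the page does not describe the evaluation code or the thresholds behind mAP.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Hypothetical prediction and ground truth: two 2x2 boxes overlapping in a
# 1x1 square, so IoU = 1 / (4 + 4 - 1) = 1/7.
predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
print(box_iou(predicted, ground_truth))
```

An IoU of 1.0 means the boxes coincide exactly and 0.0 means they do not overlap, so the 0.73 anatomy grounding score in Table 1 corresponds to substantial average overlap between predicted and reference boxes.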

BibTeX

@article{hashmi2026anatomix,
  title={AnatomiX, an Anatomy-Aware Grounded Multimodal Large Language Model for Chest X-Ray Interpretation},
  author={Hashmi, Anees Ur Rehman and Saeed, Numan and Lippert, Christoph},
  journal={arXiv preprint arXiv:2601.03191},
  year={2026}
}