Box It to Bind It: Unified Layout Control and Attribute Binding in Text-to-Image Diffusion Models

Ashkan Taghipour; Morteza Ghahremani; Mohammed Bennamoun; Aref Miri Rekavandi; Hamid Laga; Farid Boussaid

doi:10.1109/TMM.2025.3607759

Journal article

Peer reviewed

IEEE Transactions on Multimedia, Vol. 27, pp. 8393-8407

2025

DOI: https://doi.org/10.1109/TMM.2025.3607759

Keywords: attribute binding; Australia; Diffusion models; Diffusion processes; Electronic mail; Image color analysis; Image synthesis; Layout; layout guidance; Semantics; Text to image; text-to-image generation; Training; training-free

Abstract

While latent diffusion models (LDMs) excel at creating imaginative images, they often lack precision in semantic fidelity and in spatial control over where objects are generated. To address these deficiencies, we introduce the Box-it-to-Bind-it (B2B) module, a novel, training-free approach for improving spatial control and semantic accuracy in text-to-image (T2I) diffusion models. B2B targets three key challenges in T2I: catastrophic neglect, attribute binding, and layout guidance. The process comprises two main steps: (i) object generation, which adjusts the latent encoding to guarantee that each object is generated and to direct it within its specified bounding box, and (ii) attribute binding, which ensures that generated objects adhere to the attributes specified for them in the prompt. B2B is designed as a compatible plug-and-play module for existing T2I models such as Stable Diffusion and GLIGEN, markedly enhancing their performance on these key challenges. We assess our technique on the well-established CompBench and TIFA benchmarks and on the HRS dataset, where B2B not only surpasses methods specialized in either attribute binding or layout guidance but also integrates both capabilities to deliver enhanced overall performance.
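
The two steps named in the abstract can be illustrated with a small, purely hypothetical sketch. This record does not give B2B's actual objectives, so the functions below are assumptions chosen only to show the general idea behind training-free guidance: a score that measures how much of an object token's cross-attention falls inside its bounding box, and a score that measures how well an attribute token's attention overlaps the object's. In a real pipeline such scores would be differentiated with respect to the latent at each denoising step; here they are computed on plain nested lists. All names (`box_mass_ratio`, `layout_loss`, `binding_overlap`) are illustrative and do not come from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]                  # H x W non-negative attention weights
Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1) as fractions of width/height


def box_mass_ratio(attn: Grid, box: Box) -> float:
    """Fraction of total attention mass whose cell centre lies inside the box."""
    h, w = len(attn), len(attn[0])
    x0, y0, x1, y1 = box
    total = sum(sum(row) for row in attn)
    if total == 0:
        return 0.0
    inside = 0.0
    for r in range(h):
        for c in range(w):
            # Cell centre in normalized [0, 1) coordinates.
            cx, cy = (c + 0.5) / w, (r + 0.5) / h
            if x0 <= cx < x1 and y0 <= cy < y1:
                inside += attn[r][c]
    return inside / total


def layout_loss(attn: Grid, box: Box) -> float:
    """Assumed layout energy: zero when all attention mass is inside the box."""
    return (1.0 - box_mass_ratio(attn, box)) ** 2


def _normalize(attn: Grid) -> Grid:
    """Scale a map so its entries sum to 1 (all zeros if the map is empty of mass)."""
    total = sum(sum(row) for row in attn)
    if total == 0:
        return [[0.0 for _ in row] for row in attn]
    return [[v / total for v in row] for row in attn]


def binding_overlap(obj_attn: Grid, attr_attn: Grid) -> float:
    """Assumed binding score in [0, 1]: overlap of the two normalized maps."""
    a, b = _normalize(obj_attn), _normalize(attr_attn)
    return sum(min(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

For a uniform 4 x 4 map and the box (0.0, 0.0, 0.5, 0.5), a quarter of the mass lies inside the box, so `layout_loss` evaluates to 0.5625; a guidance step would aim to push that value toward zero, and to push `binding_overlap` between an attribute token and its object toward one.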

Details

Title: Box It to Bind It: Unified Layout Control and Attribute Binding in Text-to-Image Diffusion Models
Authors/Creators: Ashkan Taghipour - The University of Western Australia
Morteza Ghahremani - Technical University of Munich
Mohammed Bennamoun - The University of Western Australia
Aref Miri Rekavandi - The University of Western Australia
Hamid Laga - Murdoch University, Centre for Biosecurity and One Health
Farid Boussaid - The University of Western Australia
Publication Details: IEEE Transactions on Multimedia, Vol. 27, pp. 8393-8407
Publisher: IEEE
Number of pages: 15
Identifiers: 991005814048007891
Murdoch Affiliation: Centre for Biosecurity and One Health; School of Information Technology; Centre for Healthy Ageing
Language: English
Resource Type: Journal article
