Box It to Bind It: Unified Layout Control and Attribute Binding in Text-to-Image Diffusion Models
Journal article   Peer reviewed

Ashkan Taghipour, Morteza Ghahremani, Mohammed Bennamoun, Aref Miri Rekavandi, Hamid Laga and Farid Boussaid
IEEE Transactions on Multimedia, Early Access
2025

Abstract

attribute binding; Australia; Diffusion models; Diffusion processes; Electronic mail; Image color analysis; Image synthesis; Layout; layout guidance; Semantics; Text to image; text-to-image generation; Training; training-free
While latent diffusion models (LDMs) excel at creating imaginative images, they often lack precision in semantic fidelity and spatial control over where objects are generated. To address these deficiencies, we introduce the Box-it-to-Bind-it (B2B) module, a novel, training-free approach for improving spatial control and semantic accuracy in text-to-image (T2I) diffusion models. B2B targets three key challenges in T2I: catastrophic neglect, attribute binding, and layout guidance. The process encompasses two main steps: (i) object generation, which adjusts the latent encoding to guarantee object generation and directs it within specified bounding boxes, and (ii) attribute binding, which ensures that generated objects adhere to the attributes specified for them in the prompt. B2B is designed as a compatible plug-and-play module for existing T2I models such as Stable Diffusion and GLIGEN, markedly enhancing these models' performance on these key challenges. We assess our technique on the well-established CompBench and TIFA score benchmarks and on the HRS dataset, where B2B not only surpasses methods specialized in either attribute binding or layout guidance but also uniquely excels by integrating these capabilities to deliver enhanced overall performance.
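
The abstract does not give B2B's actual objective, so the following is only a toy illustration of the general idea behind training-free layout guidance: score how much of an attention map falls inside a target bounding box, then nudge the underlying values to raise that score. Everything here is an assumption made for illustration, including the function names (`box_mask`, `layout_loss`, `guidance_step`), the use of a softmax over a small grid as a stand-in for a cross-attention map, and the step size. It is not the authors' method and does not use any diffusion library.

```python
import numpy as np

def box_mask(height, width, box):
    """Binary mask for a box given as (x0, y0, x1, y1) in normalized [0, 1] coordinates."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.float64)
    r0, r1 = int(round(y0 * height)), int(round(y1 * height))
    c0, c1 = int(round(x0 * width)), int(round(x1 * width))
    mask[r0:r1, c0:c1] = 1.0
    return mask

def softmax2d(logits):
    """Toy stand-in for an attention map: a softmax over all grid cells."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def layout_loss(attn, mask, eps=1e-8):
    """Hypothetical loss: 1 minus the fraction of attention mass inside the box."""
    inside = (attn * mask).sum()
    return float(1.0 - inside / (attn.sum() + eps))

def guidance_step(logits, mask, step_size=5.0):
    """One gradient-descent step on layout_loss with respect to the logits.

    For attn = softmax(logits) and frac = sum(attn * mask), the gradient of
    (1 - frac) with respect to the logits is -attn * (mask - frac).
    """
    attn = softmax2d(logits)
    frac = (attn * mask).sum()
    grad = -attn * (mask - frac)
    return logits - step_size * grad
```

In a real T2I pipeline the quantity being updated would be the noisy latent at a denoising step and the attention map would come from the model's cross-attention layers; here the logits play both roles so the sketch stays self-contained.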
