Box It to Bind It: Unified Layout Control and Attribute Binding in T2I Diffusion Models

Ashkan Taghipour; Morteza Ghahremani; Mohammed Bennamoun; Aref Miri Rekavandi; Hamid Laga; Farid Boussaid

doi:10.48550/arxiv.2402.17910

Preprint

Open access

ArXiv.org

Cornell University

2024

Files and links (1)

pdf

Pre-Proof, 8.45 MB

Published (Version of Record), CC BY-NC-ND V4.0, Open Access

Abstract

Computer Science - Computer Vision and Pattern Recognition

While latent diffusion models (LDMs) excel at creating imaginative images, they often lack precision in semantic fidelity and spatial control over where objects are generated. To address these deficiencies, we introduce the Box-it-to-Bind-it (B2B) module - a novel, training-free approach for improving spatial control and semantic accuracy in text-to-image (T2I) diffusion models. B2B targets three key challenges in T2I: catastrophic neglect, attribute binding, and layout guidance. The process encompasses two main steps: i) Object generation, which adjusts the latent encoding to guarantee object generation and directs it within specified bounding boxes, and ii) attribute binding, guaranteeing that generated objects adhere to their specified attributes in the prompt. B2B is designed as a compatible plug-and-play module for existing T2I models, markedly enhancing model performance in addressing the key challenges. We evaluate our technique using the established CompBench and TIFA score benchmarks, demonstrating significant performance improvements compared to existing methods. The source code will be made publicly available at https://github.com/nextaistudio/BoxIt2BindIt.

Details

Title: Box It to Bind It: Unified Layout Control and Attribute Binding in T2I Diffusion Models
Authors/Creators: Ashkan Taghipour; Morteza Ghahremani; Mohammed Bennamoun; Aref Miri Rekavandi; Hamid Laga; Farid Boussaid
Publication Details: ArXiv.org
Publisher: Cornell University; Australia
Number of pages: 13
Identifiers: 991005642270107891
Murdoch Affiliation: Centre for Biosecurity and One Health; School of Information Technology; Centre for Healthy Ageing
Language: English
Resource Type: Preprint
