🎨 AnyStyle: Single-Pass Multimodal Stylization for 3D Gaussian Splatting 🎨

Joanna Kaleta¹,² Bartosz Świrta¹ Kacper Kania¹ Przemysław Spurek³,⁴ Marek Kowalski⁵

¹University of Science and Technology

²Sano Centre for Computational Medicine

³Jagiellonian University

⁴IDEAS NCBR

⁵Microsoft

AnyStyle teaser: AnyStyle accepts any scene and any style input, image or text, and outputs stylized 3D Gaussians in a single forward pass, in under 0.1 s.

Comparison of AnyStyle against prior methods across key properties: among the compared methods, AnyStyle is the only one that is simultaneously multimodal, scene zero-shot, style zero-shot, pose-free, and fast.

Abstract

The growing demand for rapid and scalable 3D asset creation has driven interest in feed-forward 3D reconstruction methods, with 3D Gaussian Splatting (3DGS) emerging as an effective scene representation. While recent approaches have demonstrated pose-free reconstruction from unposed image collections, integrating stylization or appearance control into such pipelines remains underexplored. Existing attempts largely rely on image-based conditioning, which limits both controllability and flexibility. In this work, we introduce AnyStyle, a feed-forward 3D reconstruction and stylization framework that enables pose-free, zero-shot stylization through multimodal conditioning. Our method supports both textual and visual style inputs, allowing users to control the scene appearance using natural language descriptions or reference images. We propose a modular stylization architecture that requires only minimal architectural modifications and can be integrated into existing feed-forward 3D reconstruction backbones. Experiments demonstrate that AnyStyle improves style controllability over prior feed-forward stylization methods while preserving high-quality geometric reconstruction. A user study further confirms that AnyStyle achieves superior stylization quality compared to an existing state-of-the-art approach.

Method

AnyStyle architecture: a style branch is injected into the AnySplat backbone via zero-conv layers.
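The page does not spell out how the zero-conv layers are wired, so the following is only a minimal sketch of the general idea, assuming a ControlNet-style zero-initialized 1x1 convolution whose output is added to the backbone features. All names (`zero_conv_1x1`, `apply_conv_1x1`, `inject_style`), the feature shapes, and the additive fusion are hypothetical and are not taken from the AnyStyle code. The property it illustrates is that, at initialization, the injected branch contributes nothing, so the backbone's output is unchanged until the projection weights are trained.

```python
# Illustrative sketch only. The function names, shapes, and additive fusion
# are assumptions, not the AnyStyle implementation.


def zero_conv_1x1(in_ch, out_ch):
    """Return the weight matrix and bias of a 1x1 convolution, all zeros."""
    weight = [[0.0] * in_ch for _ in range(out_ch)]
    bias = [0.0] * out_ch
    return weight, bias


def apply_conv_1x1(features, weight, bias):
    """Apply a 1x1 convolution (a per-token linear map) to a list of feature vectors."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b for row, b in zip(weight, bias)]
        for token in features
    ]


def inject_style(backbone_features, style_features, weight, bias):
    """Add the projected style features to the backbone features, token by token."""
    residual = apply_conv_1x1(style_features, weight, bias)
    return [
        [f + r for f, r in zip(feat, res)]
        for feat, res in zip(backbone_features, residual)
    ]


# Toy inputs: 2 tokens, 3 backbone channels, 2 style channels (hypothetical sizes).
backbone = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
style = [[0.1, 0.9], [0.7, -0.3]]
weight, bias = zero_conv_1x1(in_ch=2, out_ch=3)

# With zero-initialized weights the style branch has no effect yet.
fused = inject_style(backbone, style, weight, bias)

# Once a weight becomes non-zero (as it would during training), style leaks in.
weight[0][0] = 1.0
fused_after_update = inject_style(backbone, style, weight, bias)
```

In this sketch, `fused` equals `backbone` exactly, while `fused_after_update` differs only in the first channel of each token, which is the channel whose weight was changed.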

Results

Each scene is shown as a reconstruction alongside its image- and text-prompted stylizations.

Image & Text Style Conditioning

Natural Language Prompts

Comparison with Stylos

Citation


  @article{kaleta2026anystyle,
    title     = {AnyStyle: Single-Pass Multimodal Stylization for 3D Gaussian Splatting},
    author    = {Kaleta, Joanna and Świrta, Bartosz and Kania, Kacper and Spurek, Przemysław and Kowalski, Marek},
    journal   = {arXiv preprint arXiv:2602.04043},
    year      = {2026}
  }