LUVE : Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts

¹Nanjing University, ²Meituan, ³Nanyang Technological University

*Equal contribution, †Corresponding author, ‡Project leader

Abstract

Recent advances in video diffusion models have significantly improved visual quality, yet ultra-high-resolution (UHR) video generation remains a formidable challenge because it compounds the difficulties of motion modeling, semantic planning, and detail synthesis. To address these challenges, we propose LUVE, a Latent-cascaded UHR Video generation framework built upon dual frequency Experts. LUVE employs a three-stage architecture: (1) low-resolution motion generation, which synthesizes motion-consistent latents; (2) video latent upsampling, which raises resolution directly in the latent space to reduce memory and computational overhead; and (3) high-resolution content refinement, which integrates low-frequency and high-frequency experts to jointly improve semantic coherence and fine-grained detail. Extensive experiments demonstrate that LUVE achieves superior photorealism and content fidelity in UHR video generation, and comprehensive ablation studies further validate the effectiveness of each component.
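The abstract makes two technical claims that a small calculation can make concrete: upsampling in latent space is cheaper than in pixel space, and the refinement stage combines a low-frequency expert with a high-frequency expert. The sketch below illustrates both ideas under stated assumptions; it is not LUVE's implementation. The VAE factors (8× spatial, 4× temporal, 16 channels), the 81-frame 3840×2160 clip, the moving-average low-pass filter, the band-swapping fusion rule, and the helper names `VaeSpec`, `latent_elems`, `split_bands`, and `fuse_experts` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaeSpec:
    """Hypothetical video-VAE compression factors (assumed, not from LUVE)."""
    spatial: int = 8     # downsampling per spatial axis
    temporal: int = 4    # temporal downsampling applied after the first frame
    channels: int = 16   # latent channels


def pixel_elems(frames: int, height: int, width: int) -> int:
    """Values in an RGB video tensor of shape (3, frames, height, width)."""
    return 3 * frames * height * width


def latent_elems(frames: int, height: int, width: int,
                 vae: VaeSpec = VaeSpec()) -> int:
    """Values in the matching latent tensor.

    Assumes the first frame is encoded on its own and the remaining frames
    are compressed by `vae.temporal`, as in several causal video VAEs.
    """
    t = 1 + (frames - 1) // vae.temporal
    return vae.channels * t * (height // vae.spatial) * (width // vae.spatial)


def box_lowpass(signal: list[float], radius: int) -> list[float]:
    """Moving-average low-pass filter with replicated (clamped) edges."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def split_bands(signal: list[float], radius: int = 2):
    """Split a signal into low- and high-frequency bands that sum back to it."""
    low = box_lowpass(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def fuse_experts(low_expert_out: list[float], high_expert_out: list[float],
                 radius: int = 2) -> list[float]:
    """Hypothetical fusion: low band from one expert, high band from the other."""
    low, _ = split_bands(low_expert_out, radius)
    _, high = split_bands(high_expert_out, radius)
    return [l + h for l, h in zip(low, high)]


if __name__ == "__main__":
    f, h, w = 81, 2160, 3840  # assumed 4K clip length and frame size
    print("pixel values :", pixel_elems(f, h, w))
    print("latent values:", latent_elems(f, h, w))
    print("ratio        :", pixel_elems(f, h, w) / latent_elems(f, h, w))

    x = [0.0, 1.0, 0.0, 1.0, 4.0, 4.0, 4.0, 0.0]
    low, high = split_bands(x)
    # The two bands reconstruct the input, so each expert can own one band.
    assert all(abs(l + hi - s) < 1e-9 for l, hi, s in zip(low, high, x))
```

With these assumed factors, the latent of a 4K clip holds roughly 46× fewer values than its RGB pixels, which is the kind of saving that upsampling directly in latent space targets. The box filter is only the simplest low-pass choice; an FFT-based or Gaussian split would serve the same illustrative purpose.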

2K Visual Results

4K Visual Results

BibTeX

@misc{zhao2026luvelatentcascadedultrahighresolution,
  title={LUVE : Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts},
  author={Chen Zhao and Jiawei Chen and Hongyu Li and Zhuoliang Kang and Shilin Lu and Xiaoming Wei and Kai Zhang and Jian Yang and Ying Tai},
  year={2026},
  eprint={2602.11564},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.11564},
}
