TLDR: Reg-DPO is a new framework for improving video generation quality. It introduces GT-Pair for automatic, high-quality preference data creation without manual annotation. It also uses SFT regularization to stabilize DPO training and incorporates multiple memory optimization techniques, enabling efficient training of large video models. Experiments show it consistently produces superior video quality for both image-to-video and text-to-video tasks.
Video generation, a rapidly evolving field in artificial intelligence, faces significant hurdles when it comes to producing high-quality, realistic, and stable video content. While Direct Preference Optimization (DPO) has emerged as a promising technique to enhance video quality, its application to complex video tasks, especially with large-scale models, has been limited by challenges in data construction, training stability, and substantial memory consumption.
Researchers from ByteDance and Shanghai Jiao Tong University have introduced a novel framework called Reg-DPO, which aims to overcome these limitations. Their work, detailed in the paper “Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation,” presents a systematic approach to making DPO more efficient and effective for video generation.
Addressing Data Challenges with GT-Pair
One of the primary obstacles in DPO training for video generation is the high cost and difficulty of creating high-quality preference data. Traditional methods often rely on human annotation or complex automatic evaluators, which are expensive and time-consuming, especially for videos. To tackle this, the team developed the GT-Pair strategy.
GT-Pair automatically constructs high-quality preference pairs by using real, “ground-truth” videos as positive examples and videos generated by the model itself as negative examples. This innovative approach eliminates the need for any external human or automated annotation, making data construction significantly more efficient and scalable. The real videos inherently possess superior visual quality, temporal consistency, and semantic completeness compared to generated ones, creating a clear distinction for the model to learn from. This method ensures high data quality, low cost, and strong discriminability, leading to more effective training.
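As a concrete illustration, a minimal sketch of this pairing logic might look like the following. The function name `generate_video` and the dictionary layout are assumptions made for illustration, not the authors' actual pipeline.

```python
# Illustrative sketch of GT-Pair construction (not the authors' code).
# `generate_video` stands in for sampling from the current video model.

def build_gt_pairs(samples, generate_video):
    """samples: iterable of (prompt, ground_truth_video) tuples."""
    pairs = []
    for prompt, gt_video in samples:
        generated = generate_video(prompt)   # model output acts as the negative
        pairs.append({
            "prompt": prompt,
            "chosen": gt_video,      # real video: higher quality by construction
            "rejected": generated,
        })
    return pairs
```

Because the chosen side of every pair is a real video, the quality gap between the two sides comes for free, which is what keeps the construction cheap and annotation-free.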
Enhancing Training Stability with Reg-DPO
Standard DPO, while powerful, can suffer from intrinsic instability during training. It primarily focuses on the relative difference between preferred and non-preferred samples, without directly supervising the overall distribution of generated samples. This can lead to rapid convergence, pronounced distribution shifts, and even model collapse, where the generated videos become blurry or contain artifacts.
To mitigate this, the researchers introduced Reg-DPO, which incorporates a Supervised Fine-Tuning (SFT) loss as a regularization term into the DPO objective. This SFT regularization provides an explicit constraint on positive samples, ensuring that the model consistently moves towards generating high-quality outputs. By dynamically weighting this regularization term, Reg-DPO balances preference learning with maintaining distribution consistency, leading to significantly enhanced training stability and improved generation fidelity. Experiments showed that Reg-DPO prevents the performance degradation and visual artifacts seen in vanilla DPO, producing consistently clearer and higher-quality videos.
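Conceptually, the objective combines the standard DPO preference term with a weighted SFT loss on the positive (ground-truth) side, roughly L = L_DPO + λ·L_SFT. The PyTorch sketch below shows this generic form; note that for diffusion-based video models the log-probability terms are typically replaced by per-sample denoising losses, and the paper's dynamic weighting schedule for `lam` is not reproduced here, so treat this as an assumed illustration rather than the exact formulation.

```python
import torch.nn.functional as F

def reg_dpo_loss(logp_chosen, logp_rejected,
                 ref_logp_chosen, ref_logp_rejected,
                 sft_loss_chosen, beta=0.1, lam=1.0):
    """Generic DPO loss with an SFT regularization term (illustrative).

    logp_*          : log-likelihoods of chosen/rejected samples under the trainable model
    ref_logp_*      : the same quantities under the frozen reference model
    sft_loss_chosen : supervised loss (e.g. denoising MSE) on the chosen sample
    lam             : regularization weight (dynamically scheduled in the paper)
    """
    # Preference term: push the policy to rank chosen above rejected,
    # measured relative to the frozen reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    dpo_term = -F.logsigmoid(margin).mean()

    # SFT regularization: explicit supervision toward the positive samples,
    # which counteracts distribution drift and collapse.
    return dpo_term + lam * sft_loss_chosen.mean()
```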
Optimizing Memory for Large Models
Training large video generation models (often exceeding 10 billion parameters) with DPO presents immense memory challenges. Video inputs are multi-frame, and DPO requires a frozen reference model alongside the trainable one, leading to extremely high GPU memory usage and frequent “out-of-memory” errors.
The team implemented a comprehensive memory optimization scheme built on the Fully Sharded Data Parallel (FSDP) framework, combined with several complementary techniques:

- Flash Attention for efficient attention computation
- Context parallelism to distribute work along the sequence dimension
- A fully parallelized pair-computation strategy
- Prompt pre-encoding to reduce runtime overhead
- Model offloading for frozen modules
- Refined computational-graph and memory-reclamation optimizations

Together, these measures achieved nearly three times the effective training capacity of FSDP alone, enabling stable training of ultra-large video models on high-resolution, long-sequence videos.
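As a rough illustration of the FSDP-plus-offloading part of such a recipe, the sketch below wraps a trainable policy model and a frozen reference model with PyTorch FSDP, sharding both and offloading the reference model's parameters to CPU. The `VideoDiT` class, the bf16 policy, and the overall structure are assumptions for illustration; context parallelism, Flash Attention, prompt pre-encoding, and the pair-parallel strategy are not shown.

```python
# Rough sketch: FSDP sharding plus CPU offload for the frozen reference model.
# `VideoDiT` is a placeholder for the actual video backbone.

import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
    CPUOffload,
)

def setup_models(VideoDiT):
    dist.init_process_group("nccl")
    bf16 = MixedPrecision(param_dtype=torch.bfloat16,
                          reduce_dtype=torch.bfloat16,
                          buffer_dtype=torch.bfloat16)

    # Trainable policy model: parameters, gradients, and optimizer state
    # are sharded across all GPUs.
    policy = FSDP(VideoDiT(),
                  sharding_strategy=ShardingStrategy.FULL_SHARD,
                  mixed_precision=bf16)

    # Frozen DPO reference model: no gradients, parameters offloaded to CPU
    # and brought onto the GPU only for its forward pass.
    reference = FSDP(VideoDiT().requires_grad_(False),
                     sharding_strategy=ShardingStrategy.FULL_SHARD,
                     mixed_precision=bf16,
                     cpu_offload=CPUOffload(offload_params=True))
    return policy, reference
```

Offloading only the frozen module keeps the trainable model's throughput largely intact while reclaiming the memory that the second copy of the network would otherwise occupy.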
Superior Performance Across Video Tasks
Extensive experiments were conducted on both Image-to-Video (I2V) and Text-to-Video (T2V) tasks across multiple datasets. The results consistently demonstrated that Reg-DPO, combined with GT-Pair data construction, significantly outperforms existing approaches. Evaluations using both human assessments (GSB, i.e., Good/Same/Bad comparisons) and automated metrics (VBench) confirmed superior video generation quality: better prompt adherence, enhanced visual consistency, fewer near-static videos that exhibit only micro-motion, improved generation stability, and greater physical plausibility.
In conclusion, Reg-DPO offers a robust and efficient framework for advancing video generation. By innovatively addressing data construction, algorithmic stability, and memory optimization, this research paves the way for creating more realistic, stable, and high-quality video content with large-scale generative models.


