T2V Hook Relabeling Pipeline

Using a stronger AI to correct a weaker AI's labeling errors, at a cost human review cannot match

~100x lower cost than human labeling

Label Dimensions

Iterative Rounds

Cost vs. Human: ~100x lower

Adaptive Rate Limit: EWMA


The AI-generated video content industry has a dirty secret: the training data that teaches models about human demographics, scenes, and seasons is itself labeled by humans, and human labelers make systematic mistakes that compound into model bias. By 2025, synthetic avatar video generation was a $500M+ market (Sora, Kling, Pika, HeyGen), and every model in it needs demographically accurate training labels.

The meta-labeling insight: a more capable frontier model (Gemini 2.5 Flash) can audit and correct the output of cheaper, less accurate labeling rounds. Human labeling costs $0.50–2.00 per video. Our Gemini pipeline costs ~$0.01 per video at scale, a 50–200x cost reduction, while also improving label accuracy. At 10,000 videos that is roughly $100 versus $5,000–20,000 for human labeling, and the gap widens linearly from there.
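The cost comparison above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the per-video prices are the figures quoted in the text, while the function name `cost_comparison` and its parameters are hypothetical names chosen for illustration.

```python
def cost_comparison(num_videos: int,
                    human_low: float = 0.50,
                    human_high: float = 2.00,
                    model_cost: float = 0.01) -> dict:
    """Compare total labeling cost (USD) for human vs. model labeling.

    Default prices are the per-video figures quoted in the text.
    """
    return {
        "model_total": num_videos * model_cost,
        "human_total_low": num_videos * human_low,
        "human_total_high": num_videos * human_high,
        # Cost reduction factor at each end of the human price range.
        "ratio_low": human_low / model_cost,
        "ratio_high": human_high / model_cost,
    }


if __name__ == "__main__":
    for key, value in cost_comparison(10_000).items():
        print(f"{key}: {value:,.2f}")
```

The ratio is independent of the number of videos. What grows with volume is the absolute saving, which is why the argument gets stronger past the 10,000-video mark.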

The engineering challenge was rate-limit adaptation at scale. Gemini's API has tokens-per-minute (TPM) quotas that change dynamically based on request patterns. Static throttling either leaves throughput on the table or causes cascading 429s. We implemented EWMA (Exponentially Weighted Moving Average) rate tracking that learns the effective TPM from the last N requests and automatically adjusts concurrency, with no manual tuning and no wasted tokens.
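One way such an EWMA tracker could look is sketched below. It is not the pipeline's actual code: the class name `EwmaRateTracker`, the smoothing factor `alpha`, the `probe` and `backoff` multipliers, and the windowed `observe` interface are all assumptions made for illustration. The idea is to blend each window's observed throughput into a running TPM estimate, nudging the target slightly upward after a clean window and cutting it after a 429.

```python
class EwmaRateTracker:
    """Smooths observed tokens-per-minute (TPM) and derives a concurrency cap."""

    def __init__(self, initial_tpm: float, alpha: float = 0.2,
                 probe: float = 1.1, backoff: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.tpm = float(initial_tpm)  # current estimate of sustainable TPM
        self.alpha = alpha             # weight given to the newest sample
        self.probe = probe             # assumed growth factor after a clean window
        self.backoff = backoff         # assumed cut factor after a throttled window

    def observe(self, window_tokens: int, window_seconds: float,
                throttled: bool) -> float:
        """Fold one measurement window into the estimate and return it."""
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        observed_tpm = window_tokens * 60.0 / window_seconds
        # Aim a little above what worked, or well below what got a 429.
        target = observed_tpm * (self.backoff if throttled else self.probe)
        self.tpm = self.alpha * target + (1.0 - self.alpha) * self.tpm
        return self.tpm

    def concurrency(self, tokens_per_request: float,
                    seconds_per_request: float, max_workers: int = 64) -> int:
        """Number of in-flight requests the current TPM estimate can sustain."""
        per_worker_tpm = tokens_per_request * 60.0 / seconds_per_request
        return max(1, min(max_workers, int(self.tpm // per_worker_tpm)))


if __name__ == "__main__":
    tracker = EwmaRateTracker(initial_tpm=100_000, alpha=0.5)
    tracker.observe(window_tokens=50_000, window_seconds=30.0, throttled=False)
    tracker.observe(window_tokens=50_000, window_seconds=30.0, throttled=True)
    print(tracker.tpm, tracker.concurrency(10_000, 20.0))
```

In an async pipeline, the value returned by `concurrency` could be used to size a semaphore that gates outgoing requests. A larger `alpha` reacts faster to quota changes at the cost of a noisier estimate.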

Gemini's Files API lets the model receive a video URL and stream the content directly without local download. A 60 MB guard rail prevents timeout failures on oversized files. Each of the 3 full-run rounds uses an incremental merge: high-confidence labels from earlier rounds are preserved and only the uncertain ones are re-examined. After the third round, the quality of the label dataset converges.
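The size guard and the incremental merge can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's own code: the `Label` type, the 0.8 confidence threshold, the reading of "60MB" as 60 × 1024 × 1024 bytes, and the rule that a new label replaces an uncertain one only when it is at least as confident are all hypothetical choices.

```python
from typing import Dict, List, NamedTuple

MAX_VIDEO_BYTES = 60 * 1024 * 1024  # assumed interpretation of the 60 MB guard


class Label(NamedTuple):
    value: str
    confidence: float  # assumed to be in [0, 1]


def within_size_limit(size_bytes: int) -> bool:
    """Return True if the video is small enough to submit."""
    return size_bytes <= MAX_VIDEO_BYTES


def uncertain_dimensions(labels: Dict[str, Label],
                         threshold: float = 0.8) -> List[str]:
    """Dimensions whose labels should be re-examined in the next round."""
    return sorted(d for d, lab in labels.items() if lab.confidence < threshold)


def merge_round(previous: Dict[str, Label], current: Dict[str, Label],
                threshold: float = 0.8) -> Dict[str, Label]:
    """Merge a new round into the existing labels for one video.

    Labels at or above the threshold are kept as-is. Uncertain or missing
    labels are replaced when the new round is at least as confident.
    """
    merged = dict(previous)
    for dimension, new in current.items():
        old = previous.get(dimension)
        if old is None:
            merged[dimension] = new
        elif old.confidence < threshold and new.confidence >= old.confidence:
            merged[dimension] = new
    return merged


if __name__ == "__main__":
    round_1 = {"scene": Label("indoor", 0.95), "season": Label("summer", 0.55)}
    round_2 = {"scene": Label("outdoor", 0.90), "season": Label("autumn", 0.85)}
    merged = merge_round(round_1, round_2)
    print(merged, uncertain_dimensions(merged))
```

Because confident labels are never overwritten, each round only has to spend tokens on the dimensions returned by `uncertain_dimensions`, and that set shrinks as the rounds proceed.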

Python

Gemini 2.5 Flash

Async Pipeline

Video AI
