OCR-Auto
100% bbox accuracy across 50 element types: from 65% to 92% over 10 prompt iterations
100% Bbox Accuracy
50 Element Types
92% Label Accuracy
19 Prompt Versions
Most annotation tools require human labelers to classify each element manually. OCR-Auto replaces that entire workflow with an async 4-stage pipeline powered by Qwen VL models. The system identifies 50 distinct element types across code languages, interaction formats, content elements, and edge cases, ranging from hyperlinks to multi-column layouts to watermarks.
The main engineering challenge was reliability at scale. The pipeline addresses it with three layers of fault tolerance (exponential-backoff retries, EWMA-adaptive rate limiting, and a circuit breaker), SHA256 content-addressed caching for deterministic results, and checkpoint recovery for crash resilience. Prompt engineering progressed from V1.0 (65% accuracy) through 10+ versions to V3.9 (92%); V2.0 reached 100% bbox accuracy on validation sets.
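To make the reliability mechanisms concrete, here is a minimal Python sketch of how content-addressed caching, exponential backoff, EWMA smoothing, and a circuit breaker might fit together. It is not OCR-Auto's actual code: every name (`cache_key`, `backoff_delay`, `Ewma`, `CircuitBreaker`, `call_with_retry`) and every default value (base delay, cap, thresholds) is an assumption chosen for illustration, and the EWMA class shows only the smoothing step, not a full rate limiter.

```python
import hashlib
import time


def cache_key(image_bytes: bytes, prompt_version: str) -> str:
    """Content-addressed key: the same input and prompt version give the same key."""
    h = hashlib.sha256()
    h.update(prompt_version.encode("utf-8"))
    h.update(b"\x00")  # separator so the two fields cannot run together
    h.update(image_bytes)
    return h.hexdigest()


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff without jitter: base * 2**attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


class Ewma:
    """Exponentially weighted moving average, e.g. of observed request latency."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.value = None

    def update(self, sample: float) -> float:
        if self.value is None:
            self.value = sample  # first sample seeds the average
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; permits a trial call after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 10.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()


def call_with_retry(fn, breaker: CircuitBreaker, max_attempts: int = 4, sleep=time.sleep):
    """Call `fn`, retrying with backoff; refuse immediately while the breaker is open."""
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise RuntimeError("circuit open")
        try:
            result = fn()
        except Exception:
            breaker.record(False)
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
        else:
            breaker.record(True)
            return result
```

Hashing the prompt version together with the image bytes means a cached result is never reused across prompt revisions, which is one plausible way to keep results deterministic while prompts keep changing. A production version would likely add jitter to the backoff and feed the EWMA value into the request rate, both omitted here for brevity.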