VNProductKIE: A Dataset and Three-Stage Pipeline for Key Product Information Recognition on Vietnamese Packaging Labels
Le-Thanh Tien, Luu-Huu Tri, Ha-Trong Nguyen, Nguyen-Huu An, Dung-Cam Quang
Food waste poses serious environmental and economic concerns, and the problem is often worsened by a lack of accessible product information. Automated extraction from packaging labels offers a promising solution, yet existing datasets do not adequately represent the linguistic and visual diversity found in Vietnamese markets. This paper introduces VNProductKIE, a new dataset of high-resolution images of Vietnamese food and beverage packaging. The images contain both English and Vietnamese text, diacritic-rich script, local date formats, and real-world distortions such as blur, curvature, and clutter. To extract structured information, the paper proposes a three-stage pipeline: (1) a YOLOv11-based detector that locates key regions (e.g., product name, weight, brand, expiration date), (2) a word-level detector that segments individual words within each region, and (3) a VietOCR-based recognizer that transcribes each word. The recognized text is then assembled into structured product metadata. Evaluated on VNProductKIE, the pipeline achieves a recognition accuracy of 98.85%, demonstrating the effectiveness of the proposed approach.
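The data flow through the three stages can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Region`, `extract_product_info`, `detect_regions`, `detect_words`, and `recognize_word` are hypothetical, and the three models are passed in as plain callables rather than tied to any particular YOLO or VietOCR API. The reading order used here (sort word boxes by top edge, then left edge) is also a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Axis-aligned box as (x1, y1, x2, y2) in pixel coordinates (assumed convention).
Box = Tuple[int, int, int, int]


@dataclass
class Region:
    """A key region found by stage 1 (hypothetical container type)."""
    label: str  # e.g. "product_name", "weight", "brand", "expiration_date"
    box: Box


def extract_product_info(
    image: Any,
    detect_regions: Callable[[Any], List[Region]],
    detect_words: Callable[[Any, Box], List[Box]],
    recognize_word: Callable[[Any, Box], str],
) -> Dict[str, str]:
    """Chain the three stages and return a field-name -> text mapping."""
    fields: Dict[str, str] = {}
    for region in detect_regions(image):               # stage 1: key regions
        word_boxes = detect_words(image, region.box)   # stage 2: word boxes
        # Assumed reading order: top-to-bottom, then left-to-right.
        ordered = sorted(word_boxes, key=lambda b: (b[1], b[0]))
        text = " ".join(recognize_word(image, b) for b in ordered)  # stage 3
        # If a label is detected more than once, concatenate the texts.
        fields[region.label] = (
            f"{fields[region.label]} {text}" if region.label in fields else text
        )
    return fields


if __name__ == "__main__":
    # Stub models standing in for the real detectors and recognizer.
    stub_regions = [Region("product_name", (0, 0, 100, 20))]
    stub_words = {(0, 0, 100, 20): [(50, 0, 100, 20), (0, 0, 45, 20)]}
    stub_texts = {(50, 0, 100, 20): "tươi", (0, 0, 45, 20): "Sữa"}
    print(
        extract_product_info(
            None,
            lambda img: stub_regions,
            lambda img, box: stub_words[box],
            lambda img, box: stub_texts[box],
        )
    )
```

Passing the models as callables keeps the sketch independent of any specific library; in practice each callable would wrap a trained model and crop the image to the given box before inference.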
Springer CCIS