Thursday, November 13, 2025

Unlocking Documents in 109 Languages: Baidu’s PaddleOCR-VL (0.9B) Masterfully Parses Multilingual Docs into Structured Data

Ever wished you could convert complex, multilingual documents (dense layouts, small scripts, formulas, charts, and handwriting) into clean, structured Markdown or JSON? Baidu’s PaddlePaddle team has released PaddleOCR-VL (0.9B), a vision-language model designed for end-to-end document parsing of text, tables, formulas, charts, and handwriting, with support for 109 languages.

How does it work, you ask?

PaddleOCR-VL operates in two stages. First, PP-DocLayoutV2 performs page-level layout analysis, identifying and categorizing regions, and predicting reading order. Then, PaddleOCR-VL-0.9B takes over, recognizing elements based on the detected layout. The final outputs are aggregated into easy-to-use Markdown and JSON formats.
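To make the two-stage flow concrete, here is a minimal, self-contained sketch in Python. It is not the PaddleOCR API: `Region`, `analyze_layout`, `recognize`, and `parse_page` are hypothetical names, and both stages are replaced by hard-coded stand-ins so the control flow (layout first, then per-region recognition, then aggregation in reading order) can be followed without any model.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Region:
    label: str         # region category, e.g. "title", "text", "formula"
    bbox: tuple        # (x0, y0, x1, y1) in pixels
    order: int         # reading order predicted by stage 1
    content: str = ""  # filled in by stage 2


def analyze_layout(page):
    """Stage 1 stand-in: return labelled regions with a reading order.

    Hard-coded for illustration; a real layout model would predict these.
    """
    return [
        Region("text", (40, 120, 560, 300), order=1),
        Region("title", (40, 40, 560, 100), order=0),
        Region("formula", (40, 320, 560, 380), order=2),
    ]


def recognize(page, region):
    """Stage 2 stand-in: return recognized content for one region."""
    fake_outputs = {
        "title": "Quarterly Report",
        "text": "Revenue grew.",
        "formula": "E = mc^2",
    }
    return fake_outputs[region.label]


def to_markdown(regions):
    """Render recognized regions as Markdown, one block per region."""
    blocks = []
    for r in regions:
        if r.label == "title":
            blocks.append(f"# {r.content}")
        elif r.label == "formula":
            blocks.append(f"$$ {r.content} $$")
        else:
            blocks.append(r.content)
    return "\n\n".join(blocks)


def parse_page(page):
    """Run layout analysis, recognize each region, then aggregate."""
    regions = sorted(analyze_layout(page), key=lambda r: r.order)
    for r in regions:
        r.content = recognize(page, r)
    markdown = to_markdown(regions)
    as_json = json.dumps([asdict(r) for r in regions])
    return markdown, as_json


if __name__ == "__main__":
    md, js = parse_page(page=None)
    print(md)
    print(js)
```

Keeping layout analysis separate from recognition means the second stage only ever sees one cropped region at a time, and the reading order from the first stage decides how the pieces are stitched together.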

At its core, PaddleOCR-VL-0.9B combines a NaViT-style (Native-resolution ViT) dynamic-resolution vision encoder with the ERNIE-4.5-0.3B language model. This pairing is designed to handle dense, multi-column, mixed text-graphic pages at low latency and memory cost, which makes it practical for real-world deployments.
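The idea behind a native-resolution encoder can be illustrated with simple arithmetic: instead of squashing every input to one fixed square, the image is cut into patches at its own size, so the number of visual tokens follows the image's dimensions and aspect ratio. The sketch below is only an illustration of that idea. The patch size of 14, the 448-pixel fixed side, and the function names are assumptions, not values taken from the PaddleOCR-VL release, and real encoders usually also cap the token budget.

```python
import math


def patch_grid(width, height, patch=14):
    """Patch columns and rows when the image keeps its own size.

    Partial patches at the right and bottom edges are counted as full
    patches (i.e. the image is assumed to be padded up to a multiple of
    the patch size).
    """
    return math.ceil(width / patch), math.ceil(height / patch)


def token_count(width, height, patch=14):
    """Number of visual tokens for a native-resolution image."""
    cols, rows = patch_grid(width, height, patch)
    return cols * rows


def fixed_resize_tokens(side=448, patch=14):
    """Number of tokens when every image is resized to side x side."""
    return (side // patch) ** 2


if __name__ == "__main__":
    # A small crop (e.g. one formula region) versus a full scanned page.
    for w, h in [(280, 140), (1240, 1754)]:
        print((w, h), patch_grid(w, h), token_count(w, h))
    print("fixed 448x448:", fixed_resize_tokens())
```

Under these assumed numbers a small region costs far fewer tokens than a fixed-size resize would, while a full page keeps enough patches for small print to stay legible.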

But how does it perform?

PaddleOCR-VL achieves state-of-the-art results on OmniDocBench v1.5 and competitive or leading scores on v1.0, covering overall quality and sub-tasks. It also shows complementary strength on other benchmarks, including in-house evaluations for handwriting, tables, formulas, and charts.

Why should you care?

This release matters because it pairs a NaViT-style dynamic-resolution visual encoder with the lightweight ERNIE-4.5-0.3B decoder, delivering state-of-the-art page-level document parsing and element-level recognition at practical inference cost. It supports 109 languages, including small scripts and complex page layouts, which makes it a strong option for document intelligence work across many languages.

Ready to dive in?

Check out the technical paper, the model on Hugging Face, and the technical details. For hands-on resources, head over to the GitHub page for tutorials, code, and notebooks. Stay updated by following them on Twitter, joining their 100k+ ML SubReddit, subscribing to their newsletter, or joining them on Telegram.
