Smart OCR solution using Xilinx Ultrascale+ and Vitis AI
The rich, precise high-level semantics embodied in the text helps understand the world around us and build autonomous-capable solutions that can be deployed in a live environment. Therefore, automatic text reading from natural environments, also known as scene text detection/recognition or PhotoOCR, has become an increasingly popular and important research topic in computer vision.
As the written form of human languages evolved, we developed thousands of unique font-families. When we add case (capitals/lower case/uni-case/small caps), skew (italic/roman), proportion (horizontal scale), weight, size-specific (display/text), swash, and serifization (serif/sans in super-families), the number grows in millions, and it makes text identification an exciting discipline for Machine Learning.
Xilinx as a choice for OCR solutions
Today, Xilinx powers 7 out of 10 new developments through its wide variety of powerful platforms and leads the FPGA-based system design trends. MosChip chose Xilinx for implementing this solution because of the integrated Vitis™ AI stack and strong hardware capabilities.
Xilinx Vitis™ is a free and open-source development platform that packages hardware modules as software-callable functions and is compatible with standard development environments, tools, and open-source libraries. It automatically adapts software and algorithms to Xilinx hardware without the need for VHDL or Verilog expertise.
Selecting the right Xilinx Platform
The comprehensive and rich Xilinx toolset and ecosystem make prototyping a very predictable process expedites the development of the solutions to reduce overall development time by up to 70%.
MosChip chose Xilinx Ultrascale+ platform as it offers the best of application processing and FPGA acceleration capabilities. It also provides impressive high-level synthesis capability resulting in 5x system-level performance per watt compared to earlier variants. It supports Xilinx Vitis AI that offers a wide range of capabilities to build AI inferencing using acceleration libraries.
MosChip used Xilinx Vitis AI stack and acceleration utilizing the software to create a hybrid application and implemented LSTM functionality for effective sequence prediction by porting/migrating TensorFlow-lite to ARM. It is running on Processing Side (PS) using the N2Cube Software. Image pre- and post-processing was achieved using HLS through Vivado and Vitis was used for inferencing using CTPN (Connectionist Text Proposal Network). We eventually graduated the solution to real-time scene text detection with video pipeline and improved the model with a robust dataset.
Scene Text Detection
There are many implementations available, and new ones are being researched. Still, a series of grand challenges may still be encountered when detecting and recognizing text in the wild. The difficulties in natural scene mainly stem from three differences when compared to scripts in documents:
- Diversity and Variability are arising from languages, colors, fonts, sizes, orientations, etc.
- Vibrant background on which text is written
- The aspect ratios and layouts of scene text may vary significantly
This type of solution has extensive applicability in various fields requiring real-time text detection on a video stream with higher accuracy and quick recognition. Few of these application areas are:
- Parking validation — Cities and towns are using mobile OCR to validate if cars are parked according to city regulations automatically. Parking inspectors can use a mobile device with OCR to scan license plates of vehicles and check with an online database to see if they are permitted to park.
- Mobile document scanning — A variety of mobile applications allow users to take a photo of a document and convert it to text. This OCR task is more challenging than traditional document scanners because photos have unpredictable image angles, lighting conditions, and text quality.
- Digital asset management – The software helps organize rich media assets such as images, videos, and animations. A key aspect of DAM systems is the search-ability of rich media. By running OCR on uploaded images and video frames, DAM can make rich media searchable and enrich it with meaningful tags.
MosChip team has been working on Xilinx FPGA based solutions that require design and software framework implementation. Our vast experience with Xilinx and understanding of intricacies ensured we took this solution from conceptualization to proof-of-concept within 4 weeks. Using our end-to-end solution building expertise, you can visualize your ideas with the fastest concept realization service on Xilinx Platforms and achieve greatly reduced time-to-market.
About MosChip:
MosChip has 20+ years of experience in Semiconductor, Embedded Systems & Software Design, and Product Engineering services with the strength of 1300+ engineers.
Established in 1999, MosChip has development centers in Hyderabad, Bangalore, Pune, and Ahmedabad (India) and a branch office in Santa Clara, USA. Our embedded expertise involves platform enablement (FPGA/ ASIC/ SoC/ processors), firmware and driver development, BSP and board bring-up, OS porting, middleware integration, product re-engineering and sustenance, device and embedded testing, test automation, IoT, AIML solution design and more. Our semiconductor offerings involve silicon design, verification, validation, and turnkey ASIC services. We are also a TSMC DCA (Design Center Alliance) Partner.
Stay current with the latest MosChip updates via LinkedIn, Twitter, FaceBook, Instagram, and YouTube