Optical Character Recognition or OCR is the challenge of extracting text from images and documents and has been the focus of much research for a long time. There are various open-source software tools like Python-Tesseract and other tools like Amazon’s Rekognition that can “read” the text embedded in an image. Google has its own Tesseract Project as well, which was originally developed by Hewlett-Packard in the 1980s.
So, whether it’s printed documents or the Bus-Stop signboards or handwriting in general, a well-trained OCR system should be able to extract the written content from it and keep a digital copy. This technology is increasingly being used for rapid digitization across various industries, including book scanning (the Gutenberg Project, for instance).
OCR on Seven Segment Displays
OCR systems were quite complex and expensive until a few decades ago, but with steady advancements in computer vision and deep learning, anyone can build their own OCR tools. Having said that, seven-segment displays remains a sector that hasn’t been explored in its entirety in terms of character recognition. Medical equipment, digital watches, car odometers, for instance, show the data using seven-segment displays and extracting the text from it remains a niche sector.
But if we were to go ahead with it, we will have to classify every digit separately and perform detections on images using the size and position of the bounding boxes. Moreover, object detection requires a huge amount of data to perform decently. But there’s a way out.
Since it may not be practical to source thousands of odometer images, we can always swap the digits in a given image to create new training data synthetically. This will create various permutations of a single image.
We can also use XMLTree and OpenCV to develop a data generation pipeline. This will place random numbers over previously labelled positions, thereby generating a whole new image data point. We can also add noise to the image to avoid edge-detection training.
We can implement our model in Python 3.7 since there exists a diverse set of libraries for computer vision and machine learning. OpenCV will obviously be used for feature extraction along with Numpy and XML Element Tree.
For detecting digits, we can mainly depend upon YOLO (You Only Look Once) which is an object detection paradigm. It is a strategy or an algorithm employed, just like R-CNN, and Single Shot Detector, to detect objects faster than its counterparts. YOLOv3 is the latest and the best one out of the YOLO family. We can also improve localization with images that are generated with label reshuffling. A good model should be able to localize the digits and classify them thereafter by ensembling our localization model with a convolutional neural network.
In this article, we talked about what OCR is and how it can possibly be used to extract text from seven-segment displays like car odometers. For your own usage, one can obviously use the existing tools, including the commercially available Bixby Vision and Google Lens. Again, we can use existing tools like the Tesseract or Attention OCR, but training our own model that’s designed specifically to detect digits in seven-segment displays is bound to perform better.