Scan documents via mobile camera – OCR

You might have wondered how to scan a document, or a few pages of a book, and store them as digital documents. Here is how to do it.

You need a good-quality mobile phone camera so that the pictures you take are clear enough to be read.

There are many Android applications that help scan images. Textfairy is an application that enhances the captured picture and extracts text from it. The extracted text was not very reliable, but it was acceptable. Another application is CamScanner. The limitation of these apps is that I couldn't batch-process the pictures after taking them. I found one application, Droid scan, that did support batch operation. I applied it to 88 images and it took a very long time. When I reduced the number of images to 5, it finished within a few minutes, but the output was not correct and some images were distorted.

On Linux, you can use the tesseract tool to extract the text from an image (scanned or photographed). To install tesseract, use the command below, which also installs the English language data for OCR.

sudo apt-get install tesseract-ocr tesseract-ocr-eng
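To confirm that the install worked and that the English data is present, something like the following can help. It is only a sketch: check_ocr_lang is a hypothetical helper name of my own, and it assumes tesseract's --list-langs option prints one language code per line after a header line.

```shell
# check_ocr_lang LANG: succeed only if tesseract is available and the
# language data for LANG (default: eng) is installed.
# The function name is hypothetical, chosen for this example.
check_ocr_lang() {
    lang="${1:-eng}"
    # Bail out early if the tesseract command cannot be found.
    command -v tesseract >/dev/null 2>&1 || {
        echo "tesseract not found" >&2
        return 1
    }
    # Some versions print the language list on stderr, so merge both streams.
    # grep -x matches a whole line, so "eng" does not match a longer code.
    tesseract --list-langs 2>&1 | grep -qx "$lang"
}
```

For example, running check_ocr_lang eng && echo "ready" should print ready once both packages are installed.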

Once it is installed, try it on an image using the following command:

tesseract input.jpg output -l eng

Here input.jpg is the image you want to extract the text from. output is the base name of the output file: tesseract appends .txt to it, so the extracted text is written to output.txt. The -l eng option specifies the language.
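Since batch processing was the weak point of the mobile apps, the same command can be wrapped in a loop to handle a whole folder of pictures. This is a minimal sketch, not a polished tool: ocr_batch is a hypothetical name, and it assumes the pictures are .jpg files sitting in one directory.

```shell
# ocr_batch DIR: run tesseract on every .jpg file in DIR (default: current
# directory) and write NAME.txt next to each NAME.jpg.
# The function name is hypothetical, chosen for this example.
ocr_batch() {
    dir="${1:-.}"
    for img in "$dir"/*.jpg; do
        # If no file matched, the pattern is left as-is; skip it.
        [ -e "$img" ] || continue
        # Strip the .jpg suffix; tesseract adds .txt to the output base.
        base="${img%.jpg}"
        # Report a failed image but keep going with the rest.
        tesseract "$img" "$base" -l eng || echo "failed: $img" >&2
    done
}
```

Calling ocr_batch ~/scans would then leave one text file beside each image in that folder (~/scans is just an example path).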

In my experience, Tesseract is very accurate.
Read more on OCR at https://en.wikipedia.org/wiki/Optical_character_recognition

 
