Open Source OCR Tesseract installation on Ubuntu and use of it

First of all you must have command line expertise to use this open source OCR software

At the beginning we are going to install Tesseract on Ubuntu

Open your terminal and write the following command

root@nur-HP:~#apt-get install tesseract-ocr

It will install OCR on your Ubuntu Operating System. Then install your desire language packages. Remember you do not to install English language package because it already installed with tesseract installation.

Here, I going to install Bangla language package

apt-get install tesseract-ocr-[lang]

root@nur-HP:~#apt-get install tesseract-ocr-ben    (This command will install Bangla language package)

If you like to install All language packages, try the following command

root@nur-HP:~#apt-get install tesseract-ocr-all

Our installation has completed. Now we are going to use it

tesseract [image_path] [file_name]

sample command:

root@nur-HP:~#tesseract /home/nurahammad/Dropbox/ForOCR/IMG_20171201_161244.jpg /home/nurahammad/Desktop/test

If you like to see the result on terminal, try below command

tesseract [image_path] stdout

root@nur-HP:~# tesseract /home/nurahammad/Dropbox/ForOCR/IMG_20171201_161244.jpg stdout

I think it will help you for processing your Repository/Digital Library files

 

About Nur Ahammad

I am a Library Technology professional
This entry was posted in Digital Library, Uncategorized. Bookmark the permalink.

Leave a comment