Two programs are recommended for converting images or PDF files to text:
To extract raw text, you can use tesseract.
Tesseract command-line OCR engine
# ubuntu, debian, pop (the package is named tesseract-ocr here)
sudo apt update -y && sudo apt install tesseract-ocr
# arch, manjaro
sudo pacman -S tesseract
# rhel, fedora, centos
sudo yum update -y || sudo dnf update -y
sudo yum install tesseract -y
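After installing, it can be worth checking that the language data you need is actually present, since OCR fails without it. The sketch below wraps tesseract's `--list-langs` option in a small helper; the name `has_lang` is made up for this example and is not part of tesseract.

```shell
# Hypothetical helper: succeeds if the given language code (e.g. eng) is installed.
has_lang() {
    # --list-langs prints a header line followed by one language code per line;
    # 2>&1 is there because some older releases write the list to stderr.
    tesseract --list-langs 2>&1 | grep -qx "$1"
}
```

For example, `has_lang eng && echo "English data is installed"` should print the message on a working install. If it does not, your distribution probably ships the language data as a separate package.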
To create more advanced documents, you can use gimagereader.
# ubuntu, debian, pop
sudo apt update -y && sudo apt install gimagereader -y
# arch, manjaro
sudo pacman -Syu gimagereader
RHEL, Fedora, and CentOS users can find the RPM here: https://fedora.pkgs.org/30/fedora-updates-x86_64/gimagereader-gtk-3.3.1-1.fc30.x86_64.rpm.html
As an example, we are going to convert the following image into text, first with tesseract and then with gimagereader.
Convert PNG, JPG to TXT in GNU/Linux
tesseract PDF-to-TXT-Document-OCR-in-Linux.png stdout -l eng
This yields:
Optional Licence Elements Along with the basic rights and obligations set out in each CC licence, there are a set of “optional’ licence elements which can be added by the creator of the work. These elements allow the creator to select the different ways they want the public to use their work. The creator can mix and match the elements to produce the CC licence they
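The same command can write to a file instead of the terminal: give tesseract an output base name and it appends `.txt` itself. Building on that, here is a minimal sketch for converting a whole folder of PNG files. The helper name `ocr_dir` and the default language `eng` are assumptions for illustration only.

```shell
# Hypothetical helper: OCR every PNG in a directory into a matching .txt file.
# Usage: ocr_dir [directory] [language]
ocr_dir() (
    dir="${1:-.}"
    lang="${2:-eng}"
    for img in "$dir"/*.png; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -e "$img" ] || continue
        # The second argument is the output base; tesseract adds the .txt suffix.
        tesseract "$img" "${img%.png}" -l "$lang"
    done
)
```

Calling `ocr_dir ~/scans eng` would leave a `page1.txt` next to `page1.png`, and so on. The function body runs in a subshell, so its variables do not leak into your session.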
This is great for block text, but what if we want to keep the formatting?
Then we can use gimagereader (which uses tesseract too!)
Step 1: Open gimagereader-gtk or gimagereader-qt and drag in the file you want to convert.
Step 2: Choose an OCR mode:
Convert Image or PDF to Plaintext using gimagereader
Option 1: Change OCR mode to Plain text and then click Recognize all.
Option 2: To keep the formatting, change OCR mode to hOCR, PDF and then click Recognize all.
Now, export in your desired format!
You can even export the image as a PDF with the recognized text laid over its original position!
To do so, select PDF with invisible text.
Invisible text means the text layer is hidden behind the image, but you can still highlight and copy it in any PDF viewer.
You can also choose a suitable font for your exported PDF.
Now you can highlight the text in the PDF that was generated from the original image!
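If you prefer the terminal, tesseract can also produce a searchable PDF on its own: adding `pdf` after the other arguments selects its PDF output config, which keeps the page image and adds an invisible text layer. This is a rough sketch rather than a replacement for gimagereader's export options. The helper name `ocr_to_pdf` is made up for this example, and it assumes the input file name has an extension.

```shell
# Hypothetical helper: write a searchable PDF next to the source image.
# Usage: ocr_to_pdf image.png [language]
ocr_to_pdf() (
    img="$1"
    lang="${2:-eng}"
    # "${img%.*}" strips the extension to form the output base;
    # the trailing "pdf" config makes tesseract write <base>.pdf.
    tesseract "$img" "${img%.*}" -l "$lang" pdf
)
```

Running `ocr_to_pdf PDF-to-TXT-Document-OCR-in-Linux.png` should create `PDF-to-TXT-Document-OCR-in-Linux.pdf` in the same folder.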
If you have any questions, or you would like us to convert files in bulk for you, send us a message through the contact form with a Dropbox or Google Drive link and we will quote you a price for the time involved!
Let us know if you have any questions in the comments!
Sick.Codes