OCR » Market Overview

I’ve already actively used or tested two free OCR systems: one is Microsoft’s Text Extractor utility from PowerToys, and the other is the OCR tool from the open-source ShareX package.

I tested both on a Serbian government website written in Cyrillic, using screenshots of perfect image quality with very clean fonts. To my surprise, both performed abysmally.

Now it is obvious why both underperformed: they are powered by the same engine, since ShareX uses the Microsoft OCR engine locally, as evidenced by this author comment here.

Tesseract

The current stable version is major version 5, which started with the release 5.0.0 on November 30, 2021.

Classic Tesseract

Tesseract is used from the command line and does not have a built-in GUI, but there are plenty of independent GUI tools listed here.

Installation on native Windows is straightforward and installs the regularly updated build from UB Mannheim:

scoop install tesseract tesseract-languages
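As a rough sketch of what command-line use looks like, the wrapper below prints the text recognized in one image to the terminal. The function name ocr_image and the file name in the usage comment are made up for illustration; the call itself assumes only the standard form of the Tesseract CLI (image, output base, -l languages) and a POSIX-style shell such as Git Bash or WSL.

```shell
# ocr_image IMAGE [LANGS]: print the text recognized in IMAGE.
# LANGS is a Tesseract language spec such as "srp+eng"; it defaults to "eng".
# (ocr_image is a hypothetical helper name, not part of Tesseract.)
ocr_image() {
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found in PATH" >&2
        return 127
    fi
    # "stdout" as the output base prints the text instead of writing a .txt file.
    tesseract "$1" stdout -l "${2:-eng}"
}

# Example (screenshot.png is a placeholder file name):
#   ocr_image screenshot.png srp+eng
```

The languages given after -l must be installed; tesseract --list-langs shows what is available.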

GUI Tesseract

As I am searching for the best tools, I will only mention those that are somewhat interesting.

dpScreenOCR

I became interested in dpScreenOCR, which uses Tesseract, because its results were significantly better. I also downloaded a “larger” model from here, but the results were identical to those I had obtained before.

The installation is completely manual, but the tool is not that bad.

AI OCR - Madness

Check this one: JaidedAI/EasyOCR, a ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc. There is a demo, Jaided AI: EasyOCR demo, and it is insane what this tool can do. Unreal!

OCR in PDFs

These tools can iterate through all PDFs on a disk and invisibly insert an OCR-generated text layer into the original PDF file, creating a searchable PDF. Both tools work on Windows, but they work better on WSL.

OCRmyPDF

OCRmyPDF on Windows

ocrmypdf/OCRmyPDF is a more polished tool, and although it works on native Windows, it can be installed within WSL with just one line: apt install ocrmypdf.

You also need to add languages; the available packages can be listed with the command apt-cache search tesseract-ocr. So, I install what I need:

apt install tesseract-ocr-srp tesseract-ocr-srp-latn tesseract-ocr-eng tesseract-ocr-deu

Although it can do a lot with the PDF, its basic use is:

ocrmypdf -l eng+deu+srp+srp_latn input_scanned.pdf output_searchable.pdf

or, better, specify the exact language:

ocrmypdf -l srp_latn input_scanned.pdf output_searchable.pdf

Choosing the languages is optional, but it is said to speed up recognition and improve its quality.

I’ve tested it and I’m amazed at how well it performs.
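OCRmyPDF processes one file per call, so covering a whole folder takes a small loop. The following is a minimal sketch for bash under WSL. The function name ocr_tree and its arguments are invented for this example, and --skip-text is an OCRmyPDF option that skips pages which already contain text, so such pages are left alone on repeated runs.

```shell
# ocr_tree DIR [LANGS]: run OCRmyPDF in place on every PDF below DIR.
# DIR defaults to the current directory, LANGS to "eng".
# (ocr_tree is a hypothetical helper name, not part of OCRmyPDF.)
ocr_tree() {
    dir="${1:-.}"
    langs="${2:-eng}"
    find "$dir" -type f -iname '*.pdf' | while IFS= read -r pdf; do
        # Same input and output path: the file is rewritten in place.
        # </dev/null keeps ocrmypdf from consuming the file list on stdin.
        ocrmypdf -l "$langs" --skip-text "$pdf" "$pdf" </dev/null \
            || echo "failed: $pdf" >&2
    done
}

# Example (the path is a placeholder):
#   ocr_tree /mnt/c/Users/me/Documents srp_latn
```

Because the files are modified in place, it seems prudent to try this on a copy of the folder first.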

OCRmyPDF on macOS

ocrmypdf/OCRmyPDF (OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched) is perfect and installs with brew install ocrmypdf, while brew install tesseract-lang installs all Tesseract languages.

Typical usage includes --skip-text to skip pages that already contain text.

ocrmypdf -l eng+srp_latn+srp --skip-text file.pdf file.pdf

The Tesseract OCR engine has no ability to detect the language when it is unknown, so the candidate languages have to be listed explicitly with -l.
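Since the language cannot be guessed, one option is to pass every plausible language that is actually installed. The helper below is only a sketch with hypothetical function names (has_lang, pick_langs): it checks each wanted language against the output of tesseract --list-langs and joins the ones it finds with +, which is the form -l expects.

```shell
# has_lang CODE: succeed if the Tesseract language pack CODE is installed.
has_lang() {
    # Older Tesseract versions print the list to stderr, hence 2>&1.
    tesseract --list-langs 2>&1 | grep -Fqx -- "$1"
}

# pick_langs CODE...: print the installed codes joined as "a+b+c".
# Prints an empty line when none of the codes is installed.
pick_langs() {
    picked=""
    for lang in "$@"; do
        if has_lang "$lang"; then
            picked="${picked:+$picked+}$lang"
        fi
    done
    printf '%s\n' "$picked"
}

# Example:
#   ocrmypdf -l "$(pick_langs eng srp_latn srp)" --skip-text file.pdf file.pdf
```

If none of the requested packs is installed the result is empty, so a real script would want to check for that before calling ocrmypdf.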

pdf2pdfocr

pdf2pdfocr can also use the incredibly good CuneiForm engine and even has a “relatively” easy installation on native Windows with the help of Scoop, as explained in the install_windows.txt file.

Interesting Projects and Libraries

Project Naptha and Other Magic

Project Naptha would have been science fiction until a few years ago. It’s almost unbelievable that there’s an excellent pure JavaScript OCR, tesseract.js, complete for all languages, that works well and easily on both the server and the client. There’s a demo on the website and, of course, the legendary Chrome extension.

PHP Library

An interesting one is tesseract-ocr-for-php, a PHP wrapper library.

Not-Tesseract Open-Source OCR

The same Project Naptha author also evaluated the only two open-source alternatives to Tesseract and wrote an interesting comment on the CuneiForm OCR engine, which I know used to be quite good:

GOCR and Ocrad are essentially the only other open source OCR engines (there’s technically also Cuneiform, but the source code is in a really really big zip file from some website in Russian and its also really slow according to benchmarks). And something I didn’t realize until I had peered into the source code is that they are powered by (presumably) painstakingly written rules for each and every detectable glyph and variation. This kind of blew my mind.

Interesting…

Tesseract OCR Software GUI is an open-source GUI front end for the Tesseract OCR engine, with PDF support, at repo A9T9/Free-Ocr-Windows-Desktop.

Still need better text recognition results? Then try these new alternatives:

Online OCR for images and PDF is a really free web-based OCR app
OCR API is a free web API that includes OCR command line examples with cURL.

It appears that using the built-in Windows OCR engine within an application is straightforward, as evidenced by the sample in rostok/cliocr, a simple command-line OCR tool that uses “Windows.Media.Ocr” to extract text directly from the clipboard.

Screen OCR Keyboard Shortcut

The default shortcuts on the letter T are complicated, probably because a lot is bound to the T key, so I need to write them down and explain them:

1. The Win + T shortcut is built-in Windows functionality that cycles through the apps on the taskbar.
2. Text Extractor from PowerToys, accessed through Win + Shift + T, is a basic OCR tool that would probably be completely obsolete if it weren’t the only tool capable of overriding the built-in Windows shortcut. So, you either have to disable all shortcuts or use PowerToys Text Extractor.
3. Always On Top from PowerToys has the default shortcut Win + Ctrl + T and is an extremely useful option for keeping any window on top.
4. PowerToys also has a Keyboard Manager tool whose “Remap a shortcut” option can disable item 1 if you choose “Disabled” as the “Mapped To” option.
5. ShareX cannot redefine Win + Shift + T, or any similar combination, as an OCR shortcut because of item 1.
6. Fn + T on Dell laptops toggles the “Ultra Performance Mode”, which appears to affect only fan speeds and nothing more.
7. In the Registry, Windows shortcut keys can be disabled by setting the value at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer\NoKeyShorts to 1, but a computer restart is necessary after making this change. Note that this completely disables all other Win-key shortcuts, some of which may be quite useful.
8. In the Local Group Policy Editor, i.e., gpedit.msc, navigate to User Configuration > Administrative Templates > Windows Components > File Explorer, then locate the “Turn off Windows key hotkeys” policy and set it to “Enabled”. This should have the same effect as item 7.
9. Neither item 7 nor item 8 worked on my system.

You can find .reg files for item 7 in this article on How to Turn Off Keyboard Shortcuts and Disable Them in Windows: 3 Ways.

So I decided to move the OCR tools from the letter T to the letter O and create shortcuts on O:

Win + Ctrl + O opens the very usable On-Screen Keyboard, so don’t touch that one

Windows Shortcuts: The Lesser Known Ones

Fn + Win + Space is exactly the same as Win + . (Win + Dot) and opens the amazing Windows Emoji Picker
Win + T to cycle through open taskbar programs
Win + B selects the system tray area and then you can use arrows to select items
Win + D shows or hides the desktop
Win + , (Win + Comma) peeks at the desktop, as an alternative to the previous Win + D shortcut
Win + Home Key minimizes all programs except the current one
Win + Ctrl + Left-Right to switch between virtual desktops
Alt + Esc similar to Alt + Tab but switches apps in the order they were opened in and without preview
Ctrl + Esc opens the Start menu.
Ctrl + Shift + Esc opens the task manager directly
Win + 1, 2, 3, 4, ... launches or switches to the program at that position on the taskbar
Win + Alt + 1, 2, 3, 4, ... opens the jump list for the program at that position on the taskbar
Alt + Space opens the window menu (move/size/minimize/maximize), as if you had clicked the top-left corner of the window. The Espanso tool redefines this one to open its search bar
Win + Up/Down/Left/Right moves, maximizes, and restores the current window
Win + Pause/Break will open the system properties window.
Win + Ctrl + Shift + B will reset the graphics driver
Win + PrtScn will take a full-screen screenshot and without question save it into the Screenshots folder.
Win + V for clipboard history

An Unpleasant Teams Keyboard Shortcut

While we’re on the topic of Windows shortcut keys, let me mention which shortcut I disabled:

Win + C is a built-in shortcut that attempts to open Teams Chat or sign you in to Microsoft Teams, and this happens even though I don’t have Teams installed.
Keyboard Manager from PowerToys successfully disables that shortcut: choose “Remap a shortcut” and map Win + C to “Disabled”.

Defkey is absolutely the best website to see which applications typically use certain key shortcuts, especially when you have no idea why you can’t redefine a shortcut.

The official documentation for shortcuts can be found on the Microsoft website in the keyboard shortcuts in Windows article.

Capture2Text can be set up with scoop install capture2text.

dynobo/normcap is a cross-platform OCR powered screen-capture tool, and its description mentions all other alternatives with the same purpose.

Don’t forget that Shottr, which is my favorite tool for macOS, has OCR built in, also based on Tesseract.

macOS Screenshot Tools

schappim/macOCR: Get any text on your screen into your clipboard.

date 03. Feb 2023 | modified 10. Jun 2024

filename: AI » OCR