An OCR engine with Unicode (UTF-8) support that can recognize over 100 languages out of the box.

Documentation

Manual: tesseract.pdf
Vignette: None available.

Maintainer: Jeroen Ooms <jeroen at berkeley.edu>

Author(s): Jeroen Ooms

Install the package and any missing dependencies by running this line in your R console:

install.packages("tesseract")
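Once the package is installed, a first OCR call might look like the sketch below. It assumes the English training data is available on the system and that the image path you pass in exists. The helper name `ocr_file` is hypothetical and not part of the package; it only wraps the package's `tesseract()` and `ocr()` functions.

```r
# Hypothetical convenience wrapper (not part of the tesseract package):
# checks that the package and the input file are present, then runs OCR.
ocr_file <- function(path, language = "eng") {
  if (!requireNamespace("tesseract", quietly = TRUE)) {
    stop("Package 'tesseract' is not installed")
  }
  if (!file.exists(path)) {
    stop("Image file not found: ", path)
  }
  # Create an engine for the requested language, then extract the text.
  engine <- tesseract::tesseract(language)
  tesseract::ocr(path, engine = engine)
}
```

A call such as `cat(ocr_file("scan.png"))` would then print the recognized text, where `scan.png` is a placeholder for your own image file.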

Depends
Imports Rcpp (>= 0.12.10), curl, digest
Suggests magick, pdftools, tiff
Enhances
Linking to Rcpp
Reverse depends
Reverse imports
Reverse suggests
Reverse enhances
Reverse linking to

Package tesseract
Materials
URL https://github.com/ropensci/tesseract
Task Views NaturalLanguageProcessing
Version 1.4
Published 2017-03-21
License MIT + file LICENSE
BugReports https://github.com/ropensci/tesseract/issues
SystemRequirements Tesseract >= 3.03 (libtesseract-dev / tesseract-devel) and Leptonica (libleptonica-dev / leptonica-devel). On Debian, you need to install the English training data separately (tesseract-ocr-eng).
NeedsCompilation yes
Citation
CRAN checks tesseract check results
Package source tesseract_1.4.tar.gz