0.6-0
* Added getLSTMSymbolChoices() to access the alternative characters within each word and
their confidences from an LSTM recognition of the page/image.
0.5-0
* Compatible with tesseract 5.0, thanks to work and a pull request from Wesley Brooks.
* Added level name to class of OCRResults, e.g. WordOCRResults.
* Added OCRResultsConfidences class to OCRResults when Confidence column is included.
0.4-0
* readMultipageTiff() function to read all pages of a TIFF file. Can pass each page to
tesseract() or use as a Pix object directly.
* readImageInfo() function to read metadata from an image file, i.e., format, dimensions,
samples per pixel, and whether it has a color map.
* getAvailableLanguages() function.
* Enhancements related to garbage collection and copying of Pix objects.
* Use UTF-8 encoding on text obtained from OCR.
* Examples of reading Russian and Sanskrit documents.
* Avoid a garbage-collection-related segmentation fault in GetImage().
* dim() method for TesseractBaseAPI to get dimensions of current image.
* Added oem<-() function for TesseractBaseAPI objects.
* getLines()/findLines() now coerce the pix parameter to a Pix, so they can accept a
TesseractBaseAPI object or a path to an image file.
* getLines()/findLines() convert the Pix to 8-bit.
* GetImageDims() works on OCRResults objects.
* R global option "OCRConfidenceColors" controls the range of colors used in plot.OCR for displaying
the confidences.
* lines() method for drawing the object returned by getLines().
* GetFontInfo() function.
0.3-1
* Set the locale for each .Call() invocation to handle tesseract 4.0.
* pixRead() has an addFinalizer logical argument to control whether a finalizer is registered;
this relates to SetImage() and tesseract freeing its image when tesseract itself is garbage collected.
* Can treat the TesseractBaseAPI object returned by tesseract() as a named list to get/set the
tesseract variables, via the names(), $, [[, $<-, [[<-, [, and [<- methods.
0.3-0
* getText(), getConfidences(), getBoxes(), getAlternatives() with methods for image file
name or TessBaseAPI objects (and others internally).
* removed ocr() function
* color the plot of results by confidence.
* removed exposure of ResultIterator and Pix to the end user to avoid potential memory
management issues.
* ReadConfigFile() has a debug = FALSE parameter.
* several functions to convert the image to searchable/selectable PDF, HTML, TSV, OSD.
* Made the code more robust.
0.2-0
* coerce tesseract API object to iterator as necessary
* works with tesseract 3.05dev
* improved configuration