tesseract v5.3.3.20231005
tesseract::ImageThresholder Class Reference

#include <thresholder.h>

Public Member Functions

ImageThresholder ()
 
virtual ~ImageThresholder ()
 
virtual void Clear ()
 Destroy the Pix if there is one, freeing memory.
 
bool IsEmpty () const
 Return true if no image has been set.
 
void SetImage (const unsigned char *imagedata, int width, int height, int bytes_per_pixel, int bytes_per_line)
 
void SetRectangle (int left, int top, int width, int height)
 
virtual void GetImageSizes (int *left, int *top, int *width, int *height, int *imagewidth, int *imageheight)
 
bool IsColor () const
 Return true if the source image is color.
 
bool IsBinary () const
 Returns true if the source image is binary.
 
int GetScaleFactor () const
 
void SetSourceYResolution (int ppi)
 
int GetSourceYResolution () const
 
int GetScaledYResolution () const
 
void SetEstimatedResolution (int ppi)
 
int GetScaledEstimatedResolution () const
 
void SetImage (const Image pix)
 
virtual bool ThresholdToPix (Image *pix)
 Returns false on error.
 
virtual std::tuple< bool, Image, Image, Image > Threshold (TessBaseAPI *api, ThresholdMethod method)
 
virtual Image GetPixRectThresholds ()
 
Image GetPixRect ()
 
virtual Image GetPixRectGrey ()
 

Protected Member Functions

virtual void Init ()
 Common initialization shared between SetImage methods.
 
bool IsFullImage () const
 Return true if we are processing the full image.
 
void OtsuThresholdRectToPix (Image src_pix, Image *out_pix) const
 
void ThresholdRectToPix (Image src_pix, int num_channels, const std::vector< int > &thresholds, const std::vector< int > &hi_values, Image *pix) const
 

Protected Attributes

Image pix_
 
int image_width_
 Width of source pix_.
 
int image_height_
 Height of source pix_.
 
int pix_channels_
 Number of 8-bit channels in pix_.
 
int pix_wpl_
 Words per line of pix_.
 
int scale_
 Scale factor from original image.
 
int yres_
 y pixels/inch in source image.
 
int estimated_res_
 Resolution estimate from text size.
 
int rect_left_
 
int rect_top_
 
int rect_width_
 
int rect_height_
 

Detailed Description

Base class for all tesseract image thresholding classes. Specific classes can add new thresholding methods by overriding ThresholdToPix. Each instance deals with a single image, but the design is intended to be useful for multiple calls to SetRectangle and ThresholdTo* if desired.

Definition at line 45 of file thresholder.h.

Constructor & Destructor Documentation

◆ ImageThresholder()

tesseract::ImageThresholder::ImageThresholder ( )

Definition at line 42 of file thresholder.cpp.

ImageThresholder::ImageThresholder()
    : pix_(nullptr)
    , image_width_(0)
    , image_height_(0)
    , pix_channels_(0)
    , pix_wpl_(0)
    , scale_(1)
    , yres_(300)
    , estimated_res_(300) {
  SetRectangle(0, 0, 0, 0);
}

◆ ~ImageThresholder()

tesseract::ImageThresholder::~ImageThresholder ( )
virtual

Definition at line 54 of file thresholder.cpp.

ImageThresholder::~ImageThresholder() {
  Clear();
}

Member Function Documentation

◆ Clear()

void tesseract::ImageThresholder::Clear ( )
virtual

Destroy the Pix if there is one, freeing memory.

Definition at line 59 of file thresholder.cpp.

void ImageThresholder::Clear() {
  pix_.destroy();
}

◆ GetImageSizes()

void tesseract::ImageThresholder::GetImageSizes ( int *  left,
int *  top,
int *  width,
int *  height,
int *  imagewidth,
int *  imageheight 
)
virtual

Get enough parameters to be able to rebuild bounding boxes in the original image (not just within the rectangle). Left and top are enough with top-down coordinates, but the height of the rectangle and the image are needed for bottom-up.

Definition at line 148 of file thresholder.cpp.

void ImageThresholder::GetImageSizes(int *left, int *top, int *width, int *height,
                                     int *imagewidth, int *imageheight) {
  *left = rect_left_;
  *top = rect_top_;
  *width = rect_width_;
  *height = rect_height_;
  *imagewidth = image_width_;
  *imageheight = image_height_;
}
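
To illustrate why both heights are needed for bottom-up coordinates, the sketch below maps a box given in bottom-up coordinates relative to the rectangle into bottom-up coordinates of the full image. The Box struct and the RectToImageBottomUp helper are hypothetical names used only for this illustration and are not part of Tesseract; the arguments stand in for values returned by GetImageSizes.

```cpp
// Hypothetical illustration only; not Tesseract code.
// A box in bottom-up coordinates: y grows upward from the bottom edge.
struct Box {
  int left;
  int bottom;
  int right;
  int top;
};

// Translate a box from rectangle-relative bottom-up coordinates to
// full-image bottom-up coordinates. rect_left, rect_top, rect_height and
// image_height correspond to the values reported by GetImageSizes.
Box RectToImageBottomUp(const Box &b, int rect_left, int rect_top,
                        int rect_height, int image_height) {
  // Distance from the bottom of the image to the bottom of the rectangle.
  const int y_offset = image_height - (rect_top + rect_height);
  return {b.left + rect_left, b.bottom + y_offset,
          b.right + rect_left, b.top + y_offset};
}
```

With top-down coordinates the same translation would only add rect_left and rect_top, which is why left and top alone suffice in that case.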

◆ GetPixRect()

Image tesseract::ImageThresholder::GetPixRect ( )

Get a clone/copy of the source image rectangle. The returned Pix must be pixDestroyed. This function will be used in the future by the page layout analysis, and the layout analysis that uses it will only be available with Leptonica, so there is no raw equivalent.

Definition at line 351 of file thresholder.cpp.

Image ImageThresholder::GetPixRect() {
  if (IsFullImage()) {
    // Just clone the whole thing.
    return pix_.clone();
  } else {
    // Crop to the given rectangle.
    Box *box = boxCreate(rect_left_, rect_top_, rect_width_, rect_height_);
    Image cropped = pixClipRectangle(pix_, box, nullptr);
    boxDestroy(&box);
    return cropped;
  }
}

◆ GetPixRectGrey()

Image tesseract::ImageThresholder::GetPixRectGrey ( )
virtual

Definition at line 368 of file thresholder.cpp.

Image ImageThresholder::GetPixRectGrey() {
  auto pix = GetPixRect(); // May have to be reduced to grey.
  int depth = pixGetDepth(pix);
  if (depth != 8 || pixGetColormap(pix)) {
    if (depth == 24) {
      auto tmp = pixConvert24To32(pix);
      pix.destroy();
      pix = tmp;
    }
    auto result = pixConvertTo8(pix, false);
    pix.destroy();
    return result;
  }
  return pix;
}

◆ GetPixRectThresholds()

Image tesseract::ImageThresholder::GetPixRectThresholds ( )
virtual

Definition at line 324 of file thresholder.cpp.

Image ImageThresholder::GetPixRectThresholds() {
  if (IsBinary()) {
    return nullptr;
  }
  Image pix_grey = GetPixRectGrey();
  int width = pixGetWidth(pix_grey);
  int height = pixGetHeight(pix_grey);
  std::vector<int> thresholds;
  std::vector<int> hi_values;
  OtsuThreshold(pix_grey, 0, 0, width, height, thresholds, hi_values);
  pix_grey.destroy();
  Image pix_thresholds = pixCreate(width, height, 8);
  int threshold = thresholds[0] > 0 ? thresholds[0] : 128;
  pixSetAllArbitrary(pix_thresholds, threshold);
  return pix_thresholds;
}
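
For readers unfamiliar with Otsu's method, the following is a minimal sketch of the textbook algorithm on a 256-bin grey histogram, followed by the same fallback to 128 that the listing above applies when the computed threshold is not positive. This is not Tesseract's OtsuThreshold from otsuthr.cpp, whose internals are not shown on this page; OtsuFromHistogram and ThresholdOrDefault are hypothetical names.

```cpp
#include <array>

// Textbook Otsu: pick the threshold t that maximizes the between-class
// variance w0 * w1 * (mu0 - mu1)^2, where class 0 holds values <= t.
// Hypothetical helper; not Tesseract's OtsuThreshold.
int OtsuFromHistogram(const std::array<int, 256> &hist) {
  long long total = 0;
  double sum_all = 0.0;
  for (int i = 0; i < 256; ++i) {
    total += hist[i];
    sum_all += static_cast<double>(i) * hist[i];
  }
  long long w0 = 0;       // number of pixels in class 0
  double sum0 = 0.0;      // sum of grey values in class 0
  double best_var = -1.0;
  int best_t = 0;
  for (int t = 0; t < 256; ++t) {
    w0 += hist[t];
    sum0 += static_cast<double>(t) * hist[t];
    const long long w1 = total - w0;
    if (w0 == 0) continue;  // class 0 still empty
    if (w1 == 0) break;     // class 1 empty from here on
    const double mu0 = sum0 / w0;
    const double mu1 = (sum_all - sum0) / w1;
    const double var = static_cast<double>(w0) * w1 * (mu0 - mu1) * (mu0 - mu1);
    if (var > best_var) {
      best_var = var;
      best_t = t;
    }
  }
  return best_t;
}

// Mirrors the fallback in GetPixRectThresholds: a non-positive threshold
// is replaced by the mid-grey value 128.
int ThresholdOrDefault(int t) {
  return t > 0 ? t : 128;
}
```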

◆ GetScaledEstimatedResolution()

int tesseract::ImageThresholder::GetScaledEstimatedResolution ( ) const
inline

Definition at line 115 of file thresholder.h.

int GetScaledEstimatedResolution() const {
  return scale_ * estimated_res_;
}

◆ GetScaledYResolution()

int tesseract::ImageThresholder::GetScaledYResolution ( ) const
inline

Definition at line 102 of file thresholder.h.

int GetScaledYResolution() const {
  return scale_ * yres_;
}

◆ GetScaleFactor()

int tesseract::ImageThresholder::GetScaleFactor ( ) const
inline

Definition at line 88 of file thresholder.h.

int GetScaleFactor() const {
  return scale_;
}

◆ GetSourceYResolution()

int tesseract::ImageThresholder::GetSourceYResolution ( ) const
inline

Definition at line 99 of file thresholder.h.

int GetSourceYResolution() const {
  return yres_;
}

◆ Init()

void tesseract::ImageThresholder::Init ( )
protected virtual

Common initialization shared between SetImage methods.

Definition at line 342 of file thresholder.cpp.

void ImageThresholder::Init() {
  SetRectangle(0, 0, image_width_, image_height_);
}

◆ IsBinary()

bool tesseract::ImageThresholder::IsBinary ( ) const
inline

Returns true if the source image is binary.

Definition at line 84 of file thresholder.h.

bool IsBinary() const {
  return pix_channels_ == 0;
}

◆ IsColor()

bool tesseract::ImageThresholder::IsColor ( ) const
inline

Return true if the source image is color.

Definition at line 79 of file thresholder.h.

bool IsColor() const {
  return pix_channels_ >= 3;
}
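
Both predicates depend on pix_channels_, which SetImage sets to the pixel depth divided by 8, so a 1-bit image yields 0 channels. The sketch below restates that mapping as a standalone function. PixKind and ClassifyDepth are hypothetical names for illustration, and treating every remaining case as grey is an assumption.

```cpp
// Hypothetical illustration only; not Tesseract code.
enum class PixKind { kBinary, kGrey, kColor };

// Classify an image by bit depth the way IsBinary() and IsColor() do.
PixKind ClassifyDepth(int depth) {
  const int channels = depth / 8;  // mirrors pix_channels_ = depth / 8
  if (channels == 0) {
    return PixKind::kBinary;  // IsBinary(): pix_channels_ == 0
  }
  if (channels >= 3) {
    return PixKind::kColor;  // IsColor(): pix_channels_ >= 3
  }
  return PixKind::kGrey;  // assumed: neither binary nor color
}
```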

◆ IsEmpty()

bool tesseract::ImageThresholder::IsEmpty ( ) const

Return true if no image has been set.

Definition at line 64 of file thresholder.cpp.

bool ImageThresholder::IsEmpty() const {
  return pix_ == nullptr;
}

◆ IsFullImage()

bool tesseract::ImageThresholder::IsFullImage ( ) const
inline protected

Return true if we are processing the full image.

Definition at line 165 of file thresholder.h.

bool IsFullImage() const {
  return rect_left_ == 0 && rect_top_ == 0 && rect_width_ == image_width_ &&
         rect_height_ == image_height_;
}

◆ OtsuThresholdRectToPix()

void tesseract::ImageThresholder::OtsuThresholdRectToPix ( Image  src_pix,
Image *  out_pix 
) const
protected

Definition at line 385 of file thresholder.cpp.

void ImageThresholder::OtsuThresholdRectToPix(Image src_pix, Image *out_pix) const {
  std::vector<int> thresholds;
  std::vector<int> hi_values;

  int num_channels = OtsuThreshold(src_pix, rect_left_, rect_top_, rect_width_, rect_height_,
                                   thresholds, hi_values);
  // only use opencl if compiled w/ OpenCL and selected device is opencl
#ifdef USE_OPENCL
  OpenclDevice od;
  if (num_channels == 4 && od.selectedDeviceIsOpenCL() && rect_top_ == 0 && rect_left_ == 0) {
    od.ThresholdRectToPixOCL((unsigned char *)pixGetData(src_pix), num_channels,
                             pixGetWpl(src_pix) * 4, &thresholds[0], &hi_values[0], out_pix /*pix_OCL*/,
                             rect_height_, rect_width_, rect_top_, rect_left_);
  } else {
#endif
    ThresholdRectToPix(src_pix, num_channels, thresholds, hi_values, out_pix);
#ifdef USE_OPENCL
  }
#endif
}

◆ SetEstimatedResolution()

void tesseract::ImageThresholder::SetEstimatedResolution ( int  ppi)
inline

Definition at line 110 of file thresholder.h.

void SetEstimatedResolution(int ppi) {
  estimated_res_ = ppi;
}

◆ SetImage() [1/2]

void tesseract::ImageThresholder::SetImage ( const Image  pix)

Pix vs raw, which to use? Pix is the preferred input for efficiency, since raw buffers are copied. SetImage for Pix clones its input, so the source pix may be pixDestroyed immediately after, but may not go away until after the Thresholder has finished with it.

Definition at line 163 of file thresholder.cpp.

void ImageThresholder::SetImage(const Image pix) {
  if (pix_ != nullptr) {
    pix_.destroy();
  }
  Image src = pix;
  int depth;
  pixGetDimensions(src, &image_width_, &image_height_, &depth);
  // Convert the image as necessary so it is one of binary, plain RGB, or
  // 8 bit with no colormap. Guarantee that we always end up with our own copy,
  // not just a clone of the input.
  if (depth > 1 && depth < 8) {
    pix_ = pixConvertTo8(src, false);
  } else {
    pix_ = src.copy();
  }
  depth = pixGetDepth(pix_);
  pix_channels_ = depth / 8;
  pix_wpl_ = pixGetWpl(pix_);
  scale_ = 1;
  estimated_res_ = yres_ = pixGetYRes(pix_);
  Init();
}

◆ SetImage() [2/2]

void tesseract::ImageThresholder::SetImage ( const unsigned char *  imagedata,
int  width,
int  height,
int  bytes_per_pixel,
int  bytes_per_line 
)

SetImage makes a copy of all the image data, so it may be deleted immediately after this call. Greyscale of 8 and color of 24 or 32 bits per pixel may be given. Palette color images will not work properly and must be converted to 24 bit. Binary images of 1 bit per pixel may also be given but they must be byte packed with the MSB of the first byte being the first pixel, and a one pixel is WHITE. For binary images set bytes_per_pixel=0.

Definition at line 76 of file thresholder.cpp.

void ImageThresholder::SetImage(const unsigned char *imagedata, int width, int height,
                                int bytes_per_pixel, int bytes_per_line) {
  int bpp = bytes_per_pixel * 8;
  if (bpp == 0) {
    bpp = 1;
  }
  Image pix = pixCreate(width, height, bpp == 24 ? 32 : bpp);
  l_uint32 *data = pixGetData(pix);
  int wpl = pixGetWpl(pix);
  switch (bpp) {
    case 1:
      for (int y = 0; y < height; ++y, data += wpl, imagedata += bytes_per_line) {
        for (int x = 0; x < width; ++x) {
          if (imagedata[x / 8] & (0x80 >> (x % 8))) {
            CLEAR_DATA_BIT(data, x);
          } else {
            SET_DATA_BIT(data, x);
          }
        }
      }
      break;

    case 8:
      // Greyscale just copies the bytes in the right order.
      for (int y = 0; y < height; ++y, data += wpl, imagedata += bytes_per_line) {
        for (int x = 0; x < width; ++x) {
          SET_DATA_BYTE(data, x, imagedata[x]);
        }
      }
      break;

    case 24:
      // Put the colors in the correct places in the line buffer.
      for (int y = 0; y < height; ++y, imagedata += bytes_per_line) {
        for (int x = 0; x < width; ++x, ++data) {
          SET_DATA_BYTE(data, COLOR_RED, imagedata[3 * x]);
          SET_DATA_BYTE(data, COLOR_GREEN, imagedata[3 * x + 1]);
          SET_DATA_BYTE(data, COLOR_BLUE, imagedata[3 * x + 2]);
        }
      }
      break;

    case 32:
      // Maintain byte order consistency across different endianness.
      for (int y = 0; y < height; ++y, imagedata += bytes_per_line, data += wpl) {
        for (int x = 0; x < width; ++x) {
          data[x] = (imagedata[x * 4] << 24) | (imagedata[x * 4 + 1] << 16) |
                    (imagedata[x * 4 + 2] << 8) | imagedata[x * 4 + 3];
        }
      }
      break;

    default:
      tprintf("Cannot convert RAW image to Pix with bpp = %d\n", bpp);
  }
  SetImage(pix);
  pix.destroy();
}
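
The two less obvious conversions in this overload are the 1-bit case (MSB-first packing, where a set input bit means white and is stored as a cleared bit) and the 32-bit case (bytes assembled into a word independently of host endianness). The sketch below reproduces both on plain buffers without Leptonica. UnpackBinary and PackPixel32 are hypothetical helpers, and the one-byte-per-pixel output is an assumption made for readability, not the layout Leptonica uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; not Tesseract code.
// Expand a byte-packed 1 bpp image (MSB first, 1 = white) into one byte per
// pixel where 1 means black, matching the inversion done in SetImage.
std::vector<uint8_t> UnpackBinary(const unsigned char *imagedata, int width,
                                  int height, int bytes_per_line) {
  std::vector<uint8_t> black(static_cast<std::size_t>(width) * height, 0);
  for (int y = 0; y < height; ++y, imagedata += bytes_per_line) {
    for (int x = 0; x < width; ++x) {
      // Bit 7 of each byte is the leftmost pixel of that byte.
      const bool white = (imagedata[x / 8] & (0x80 >> (x % 8))) != 0;
      black[static_cast<std::size_t>(y) * width + x] = white ? 0 : 1;
    }
  }
  return black;
}

// Assemble four consecutive bytes into one 32-bit word with the first byte
// in the most significant position, regardless of host byte order.
uint32_t PackPixel32(const unsigned char *p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}
```

The explicit casts to uint32_t in PackPixel32 are a choice of this sketch; they avoid shifting a promoted signed int into its sign bit.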

◆ SetRectangle()

void tesseract::ImageThresholder::SetRectangle ( int  left,
int  top,
int  width,
int  height 
)

Store the coordinates of the rectangle to process for later use. Doesn't actually do any thresholding.

Definition at line 137 of file thresholder.cpp.

void ImageThresholder::SetRectangle(int left, int top, int width, int height) {
  rect_left_ = left;
  rect_top_ = top;
  rect_width_ = width;
  rect_height_ = height;
}

◆ SetSourceYResolution()

void tesseract::ImageThresholder::SetSourceYResolution ( int  ppi)
inline

Definition at line 95 of file thresholder.h.

void SetSourceYResolution(int ppi) {
  yres_ = ppi;
  estimated_res_ = ppi;
}

◆ Threshold()

std::tuple< bool, Image, Image, Image > tesseract::ImageThresholder::Threshold ( TessBaseAPI *  api,
ThresholdMethod  method 
)
virtual

Definition at line 186 of file thresholder.cpp.

std::tuple<bool, Image, Image, Image> ImageThresholder::Threshold(TessBaseAPI *api,
                                                                  ThresholdMethod method) {
  Image pix_binary = nullptr;
  Image pix_thresholds = nullptr;

  if (pix_channels_ == 0) {
    // We have a binary image, but it still has to be copied, as this API
    // allows the caller to modify the output.
    Image original = GetPixRect();
    pix_binary = original.copy();
    original.destroy();
    return std::make_tuple(true, nullptr, pix_binary, nullptr);
  }

  auto pix_grey = GetPixRectGrey();

  int r;

  l_int32 pix_w, pix_h;
  pixGetDimensions(pix_grey, &pix_w, &pix_h, nullptr);

  bool thresholding_debug;
  api->GetBoolVariable("thresholding_debug", &thresholding_debug);
  if (thresholding_debug) {
    tprintf("\nimage width: %d height: %d ppi: %d\n", pix_w, pix_h, yres_);
  }

  if (method == ThresholdMethod::Sauvola) {
    int window_size;
    double window_size_factor;
    api->GetDoubleVariable("thresholding_window_size", &window_size_factor);
    window_size = window_size_factor * yres_;
    window_size = std::max(7, window_size);
    window_size = std::min(pix_w < pix_h ? pix_w - 3 : pix_h - 3, window_size);
    int half_window_size = window_size / 2;

    // factor for image division into tiles; >= 1
    l_int32 nx, ny;
    // tiles size will be approx. 250 x 250 pixels
    nx = std::max(1, (pix_w + 125) / 250);
    ny = std::max(1, (pix_h + 125) / 250);
    auto xrat = pix_w / nx;
    auto yrat = pix_h / ny;
    if (xrat < half_window_size + 2) {
      nx = pix_w / (half_window_size + 2);
    }
    if (yrat < half_window_size + 2) {
      ny = pix_h / (half_window_size + 2);
    }

    double kfactor;
    api->GetDoubleVariable("thresholding_kfactor", &kfactor);
    kfactor = std::max(0.0, kfactor);

    if (thresholding_debug) {
      tprintf("window size: %d kfactor: %.3f nx:%d ny: %d\n", window_size, kfactor, nx, ny);
    }

    r = pixSauvolaBinarizeTiled(pix_grey, half_window_size, kfactor, nx, ny,
                                (PIX**)pix_thresholds,
                                (PIX**)pix_binary);
  } else { // if (method == ThresholdMethod::LeptonicaOtsu)
    int tile_size;
    double tile_size_factor;
    api->GetDoubleVariable("thresholding_tile_size", &tile_size_factor);
    tile_size = tile_size_factor * yres_;
    tile_size = std::max(16, tile_size);

    int smooth_size;
    double smooth_size_factor;
    api->GetDoubleVariable("thresholding_smooth_kernel_size",
                           &smooth_size_factor);
    smooth_size_factor = std::max(0.0, smooth_size_factor);
    smooth_size = smooth_size_factor * yres_;
    int half_smooth_size = smooth_size / 2;

    double score_fraction;
    api->GetDoubleVariable("thresholding_score_fraction", &score_fraction);

    if (thresholding_debug) {
      tprintf("tile size: %d smooth_size: %d score_fraction: %.2f\n", tile_size, smooth_size, score_fraction);
    }

    r = pixOtsuAdaptiveThreshold(pix_grey, tile_size, tile_size,
                                 half_smooth_size, half_smooth_size,
                                 score_fraction,
                                 (PIX**)pix_thresholds,
                                 (PIX**)pix_binary);
  }

  bool ok = (r == 0);
  return std::make_tuple(ok, pix_grey, pix_binary, pix_thresholds);
}
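
The parameter arithmetic in both branches can be followed more easily in isolation. The sketch below extracts it into two standalone functions that take the resolution and the configured factors as plain arguments instead of reading them through TessBaseAPI. The struct and function names are hypothetical, and the factor values used in the example afterwards are arbitrary rather than Tesseract defaults.

```cpp
#include <algorithm>

// Hypothetical illustration only; not Tesseract code.
struct SauvolaParams {
  int window_size;
  int half_window_size;
  int nx;  // number of tiles horizontally
  int ny;  // number of tiles vertically
};

// Mirrors the Sauvola branch: the window scales with resolution, is at least
// 7 and at most the shorter image side minus 3; tiles are roughly 250 x 250
// pixels but never smaller than half_window_size + 2 on a side.
SauvolaParams ComputeSauvolaParams(int pix_w, int pix_h, int yres,
                                   double window_size_factor) {
  int window_size = static_cast<int>(window_size_factor * yres);
  window_size = std::max(7, window_size);
  window_size = std::min(pix_w < pix_h ? pix_w - 3 : pix_h - 3, window_size);
  const int half_window_size = window_size / 2;

  int nx = std::max(1, (pix_w + 125) / 250);
  int ny = std::max(1, (pix_h + 125) / 250);
  if (pix_w / nx < half_window_size + 2) {
    nx = pix_w / (half_window_size + 2);
  }
  if (pix_h / ny < half_window_size + 2) {
    ny = pix_h / (half_window_size + 2);
  }
  return {window_size, half_window_size, nx, ny};
}

struct OtsuParams {
  int tile_size;
  int smooth_size;
  int half_smooth_size;
};

// Mirrors the Otsu branch: the tile size is at least 16 pixels and a
// negative smoothing factor is clamped to zero.
OtsuParams ComputeOtsuParams(int yres, double tile_size_factor,
                             double smooth_size_factor) {
  int tile_size = static_cast<int>(tile_size_factor * yres);
  tile_size = std::max(16, tile_size);
  smooth_size_factor = std::max(0.0, smooth_size_factor);
  const int smooth_size = static_cast<int>(smooth_size_factor * yres);
  return {tile_size, smooth_size, smooth_size / 2};
}
```

For example, a 2000 x 3000 image at 300 ppi with a window factor of 0.5 gives a 150-pixel window and an 8 x 12 tile grid, while a 1000 x 1000 image at 600 ppi with a factor of 1.0 gives a 600-pixel window, so the grid is reduced from 4 x 4 to 3 x 3 to keep each tile larger than half the window.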

◆ ThresholdRectToPix()

void tesseract::ImageThresholder::ThresholdRectToPix ( Image  src_pix,
int  num_channels,
const std::vector< int > &  thresholds,
const std::vector< int > &  hi_values,
Image *  pix 
) const
protected

Threshold the rectangle into the output pix using thresholds/hi_values, taking everything except src_pix from the class. Note that num_channels is the size of the thresholds and hi_values vectors.

Definition at line 410 of file thresholder.cpp.

void ImageThresholder::ThresholdRectToPix(Image src_pix, int num_channels,
                                          const std::vector<int> &thresholds,
                                          const std::vector<int> &hi_values, Image *pix) const {
  *pix = pixCreate(rect_width_, rect_height_, 1);
  uint32_t *pixdata = pixGetData(*pix);
  int wpl = pixGetWpl(*pix);
  int src_wpl = pixGetWpl(src_pix);
  uint32_t *srcdata = pixGetData(src_pix);
  pixSetXRes(*pix, pixGetXRes(src_pix));
  pixSetYRes(*pix, pixGetYRes(src_pix));
  for (int y = 0; y < rect_height_; ++y) {
    const uint32_t *linedata = srcdata + (y + rect_top_) * src_wpl;
    uint32_t *pixline = pixdata + y * wpl;
    for (int x = 0; x < rect_width_; ++x) {
      bool white_result = true;
      for (int ch = 0; ch < num_channels; ++ch) {
        int pixel = GET_DATA_BYTE(linedata, (x + rect_left_) * num_channels + ch);
        if (hi_values[ch] >= 0 && (pixel > thresholds[ch]) == (hi_values[ch] == 0)) {
          white_result = false;
          break;
        }
      }
      if (white_result) {
        CLEAR_DATA_BIT(pixline, x);
      } else {
        SET_DATA_BIT(pixline, x);
      }
    }
  }
}
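
The per-pixel test in the inner loop is compact, so a standalone restatement may help. In the sketch below a pixel is a vector of channel values. A channel with a negative hi_values entry is ignored. With hi_values equal to 0, values above the threshold turn the pixel black. With any other non-negative value, values at or below the threshold turn it black. IsWhitePixel is a hypothetical helper, not a Tesseract function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not Tesseract code.
// Returns true when no participating channel marks the pixel as black,
// using the same condition as the inner loop of ThresholdRectToPix.
bool IsWhitePixel(const std::vector<int> &pixel,
                  const std::vector<int> &thresholds,
                  const std::vector<int> &hi_values) {
  for (std::size_t ch = 0; ch < pixel.size(); ++ch) {
    if (hi_values[ch] < 0) {
      continue;  // channel does not take part in the decision
    }
    const bool above = pixel[ch] > thresholds[ch];
    const bool high_is_black = (hi_values[ch] == 0);
    if (above == high_is_black) {
      return false;  // this channel votes black
    }
  }
  return true;
}
```

A single black vote is enough, which is why the original loop can break as soon as white_result becomes false.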

◆ ThresholdToPix()

bool tesseract::ImageThresholder::ThresholdToPix ( Image *  pix)
virtual

Returns false on error.

Threshold the source image as efficiently as possible to the output Pix. Creates a Pix and sets pix to point to the resulting pointer. Caller must use pixDestroy to free the created Pix. Returns false on error.

Definition at line 285 of file thresholder.cpp.

bool ImageThresholder::ThresholdToPix(Image *pix) {
  if (image_width_ > INT16_MAX || image_height_ > INT16_MAX) {
    tprintf("Image too large: (%d, %d)\n", image_width_, image_height_);
    return false;
  }
  Image original = GetPixRect();
  if (pix_channels_ == 0) {
    // We have a binary image, but it still has to be copied, as this API
    // allows the caller to modify the output.
    *pix = original.copy();
  } else {
    if (pixGetColormap(original)) {
      Image tmp;
      Image without_cmap =
          pixRemoveColormap(original, REMOVE_CMAP_BASED_ON_SRC);
      int depth = pixGetDepth(without_cmap);
      if (depth > 1 && depth < 8) {
        tmp = pixConvertTo8(without_cmap, false);
      } else {
        tmp = without_cmap.copy();
      }
      without_cmap.destroy();
      OtsuThresholdRectToPix(tmp, pix);
      tmp.destroy();
    } else {
      OtsuThresholdRectToPix(pix_, pix);
    }
  }
  original.destroy();
  return true;
}

Member Data Documentation

◆ estimated_res_

int tesseract::ImageThresholder::estimated_res_
protected

Resolution estimate from text size.

Definition at line 192 of file thresholder.h.

◆ image_height_

int tesseract::ImageThresholder::image_height_
protected

Height of source pix_.

Definition at line 186 of file thresholder.h.

◆ image_width_

int tesseract::ImageThresholder::image_width_
protected

Width of source pix_.

Definition at line 185 of file thresholder.h.

◆ pix_

Image tesseract::ImageThresholder::pix_
protected

Clone or other copy of the source Pix. The pix will always be PixDestroy()ed on destruction of the class.

Definition at line 183 of file thresholder.h.

◆ pix_channels_

int tesseract::ImageThresholder::pix_channels_
protected

Number of 8-bit channels in pix_.

Definition at line 187 of file thresholder.h.

◆ pix_wpl_

int tesseract::ImageThresholder::pix_wpl_
protected

Words per line of pix_.

Definition at line 188 of file thresholder.h.

◆ rect_height_

int tesseract::ImageThresholder::rect_height_
protected

Definition at line 196 of file thresholder.h.

◆ rect_left_

int tesseract::ImageThresholder::rect_left_
protected

Definition at line 193 of file thresholder.h.

◆ rect_top_

int tesseract::ImageThresholder::rect_top_
protected

Definition at line 194 of file thresholder.h.

◆ rect_width_

int tesseract::ImageThresholder::rect_width_
protected

Definition at line 195 of file thresholder.h.

◆ scale_

int tesseract::ImageThresholder::scale_
protected

Scale factor from original image.

Definition at line 190 of file thresholder.h.

◆ yres_

int tesseract::ImageThresholder::yres_
protected

y pixels/inch in source image.

Definition at line 191 of file thresholder.h.


The documentation for this class was generated from the following files: