tesseract
v5.3.3.20231005
cjkpitch.h
// File:        cjkpitch.h
// Description: Code to determine fixed pitchness and the pitch if fixed,
//              for CJK text.
// Copyright 2011 Google Inc. All Rights Reserved.
// Author: takenaka@google.com (Hiroshi Takenaka)
// Created:     Mon Jun 27 12:48:35 JST 2011
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//

#ifndef CJKPITCH_H_
#define CJKPITCH_H_

#include "blobbox.h"

namespace tesseract {

// Tests the "fixed-pitchness" of the input text and estimates character
// pitch parameters for it, based on a CJK fixed-pitch layout model.
//
// This function assumes that fixed-pitch CJK text has the following
// characteristics:
//
//  - Most glyphs are designed to fit within the same-sized square
//    (imaginary body). They are also aligned to the center of their
//    imaginary bodies.
//  - The imaginary body is always a regular rectangle.
//  - There may be some extra space between character bodies
//    (tracking).
//  - There may be some extra space after punctuation.
//  - The text is *not* space-delimited. Thus spaces are rare.
//  - A character may consist of multiple unconnected blobs.
//
// The function works in two passes. In pass 1, it looks for "good"
// blobs that have the same pitch on both sides and look like complete
// CJK characters. It then estimates the character pitch for every row,
// based on those good blobs. If not enough good blobs are found for a
// row, the pitch is estimated from other rows with similar character
// height instead.
//
// Pass 2 is an iterative process to fit the blobs into fixed-pitch
// character cells. Once the character pitch has been estimated, blobs
// that are almost as large as the pitch can be considered to be
// complete characters. And once some blobs are known to be complete
// characters, the regions occupied by their neighbors can be
// estimated. And so on.
//
// The process repeats until all ambiguities are resolved. The function
// then makes the final decision about the fixed-pitchness of each row
// and computes pitch and spacing parameters.
//
// (If a row is considered to be proportional, pitch_decision for the
// row is set to PITCH_CORR_PROP and the later phase
// (i.e. Textord::to_spacing()) should determine its spacing
// parameters.)
//
// This function doesn't provide all the information required by
// fixed_pitch_words(), so the rows need to be processed with
// make_prop_words() even if they are fixed-pitch.
void compute_fixed_pitch_cjk(ICOORD page_tr,               // top right
                             TO_BLOCK_LIST *port_blocks);  // input list

} // namespace tesseract

#endif // CJKPITCH_H_
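
The pass 1 idea described in the header comment (find "good" blobs with the same pitch on both sides, then derive a per-row pitch from them) can be illustrated with a small standalone sketch. This is not the code in cjkpitch.cpp: the `BlobBox` struct, the `EstimateRowPitch` function, the 10% pitch tolerance, the 0.7 squareness ratio, the minimum of three good blobs, and the use of a median are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a blob bounding box; not Tesseract's TBOX.
struct BlobBox {
  int left, bottom, right, top;
  int width() const { return right - left; }
  int height() const { return top - bottom; }
  float center_x() const { return (left + right) / 2.0f; }
};

// Estimates the pitch of one row from its "good" blobs, or returns
// std::nullopt if fewer than min_good blobs qualify. Blobs must be sorted
// left to right. All thresholds here are illustrative assumptions.
std::optional<float> EstimateRowPitch(const std::vector<BlobBox> &blobs,
                                      std::size_t min_good = 3,
                                      float tolerance = 0.1f) {
  assert(tolerance >= 0.0f);
  std::vector<float> pitches;
  // Only interior blobs have a neighbor on both sides.
  for (std::size_t i = 1; i + 1 < blobs.size(); ++i) {
    const BlobBox &b = blobs[i];
    float left_pitch = b.center_x() - blobs[i - 1].center_x();
    float right_pitch = blobs[i + 1].center_x() - b.center_x();
    float mean_pitch = (left_pitch + right_pitch) / 2.0f;
    // Same pitch on both sides, within a relative tolerance.
    bool even = std::fabs(left_pitch - right_pitch) <= tolerance * mean_pitch;
    // Roughly square, so it plausibly fills a whole character cell.
    bool square = b.width() >= 0.7f * b.height() &&
                  b.height() >= 0.7f * b.width();
    if (even && square) {
      pitches.push_back(mean_pitch);
    }
  }
  if (pitches.size() < min_good) {
    return std::nullopt;  // Caller would fall back to similar rows.
  }
  // The median tolerates a few outliers, e.g. extra space after punctuation.
  auto mid = pitches.begin() + pitches.size() / 2;
  std::nth_element(pitches.begin(), mid, pitches.end());
  return *mid;
}
```

Returning `std::nullopt` mirrors the fallback described in the comment, where a row without enough good blobs takes its pitch from other rows of similar character height; that fallback itself is not shown here.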
blobbox.h
tesseract
Definition: baseapi.h:39
tesseract::compute_fixed_pitch_cjk
void compute_fixed_pitch_cjk(ICOORD page_tr, TO_BLOCK_LIST *port_blocks)
Definition: cjkpitch.cpp:1103
Generated on Thu Oct 5 2023 22:10:26 for tesseract by Doxygen 1.9.4