Connected Components Labeling on Bitonal Images

Several algorithmic solutions for the optimization of Connected Components Labeling have been proposed in the literature. Among them, one of the most effective is the use of a block-based mask to drastically reduce the number of memory accesses during the labeling procedure. This paper proposes a systematic approach for labeling multiple pixels at once, automatically generating the actions to be performed on the current pixel/block given the mask values. The proposed strategy makes it possible to extend existing techniques for the generation of optimal decision trees to much more complex masks, where the connectivity between pixels inside a block is not guaranteed. A showcase application, consisting of the design of an efficient CCL algorithm for bitonal images, demonstrates the effectiveness of our proposal in terms of speed and memory footprint.


Introduction
The task of labeling connected components, also known as Connected Components Labeling or CCL for short, aims at producing a description of the objects inside binary images, by generating a symbolic image where each pixel of a single connected component (object) is assigned a unique identifier. Objects inside binary images are usually defined according to the pixel neighborhood, which can be either the 4- or the 8-neighborhood for 2D images. The rest of the paper will focus on the 8-neighborhood.
Connected components labeling represents a fundamental pre- and post-processing step for many Computer Vision and Image Processing pipelines [3,6,8,11,12,14,18,25,26,28,29,31]. CCL has an exact output, and therefore different algorithmic solutions are only compared in terms of speed and memory footprint. After the introduction of the task in the Sixties, several proposals were made over the decades to optimize its computational load, both for sequential [5,20,24,32] and parallel architectures [1,23,27,33]. Among the different algorithmic solutions, block-based scan approaches (i.e. labeling a block of 2 × 2 pixels at once) [9,10,17], decision trees [19,32] and state prediction [15,22] (i.e. reusing the information gathered during the previous step when labeling the current pixel/block) proved to be some of the most valuable strategies, especially when combined together [4].
Fig. 1. Example of scan masks: (a) Rosenfeld, (b) Grana, (c) Rosenfeld4. Gray squares identify current pixels to be labeled using information extracted from white pixels. (a) and (b) are very common masks employed in CCL; (c) is an extended version of the Rosenfeld mask that is proposed and analyzed in this paper. In this specific case, the current pixels, i.e. $x_0$, $x_1$, $x_2$, $x_3$, do not necessarily share the same label. Dashes identify meaningless pixels.
Binary images can be efficiently stored with only 1 bit per pixel ("1-bit graphics" format, or bitonal images). This representation is especially useful in embedded systems with limited resources, where memory usage must be reduced to a minimum. In banking, as an example, the bitonal image is the legally recognized standard for electronic check clearing in the United States and many other countries. Working with 1-bit per pixel images (also denoted as 1-bpp or bitonal images) considerably reduces the number of memory accesses; on the other hand, it also requires additional bitwise operations for retrieving single pixel values.
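The extra bitwise work mentioned above can be illustrated with a minimal sketch. It assumes rows packed 8 pixels per byte with the leftmost pixel in the most significant bit; this bit order and the helper names `pack_row`, `get_pixel` and `get_nibble` are assumptions made for illustration, not details taken from the paper.

```python
def pack_row(pixels):
    """Pack a sequence of 0/1 pixels into bytes, MSB first (assumed bit order)."""
    padded = list(pixels) + [0] * (-len(pixels) % 8)  # pad to a whole byte
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for bit in padded[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def get_pixel(row_bytes, col):
    """Read a single pixel: one byte load plus a shift and a mask."""
    return (row_bytes[col >> 3] >> (7 - (col & 7))) & 1


def get_nibble(row_bytes, col):
    """Read 4 consecutive pixels starting at `col` (a multiple of 4)."""
    byte = row_bytes[col >> 3]
    return (byte >> 4) if (col & 4) == 0 else (byte & 0x0F)


row = pack_row([1, 0, 1, 1, 0, 0, 0, 1, 1, 1])
print(row.hex())           # b1c0
print(get_pixel(row, 9))   # 1
print(get_nibble(row, 0))  # 11, i.e. pixels 1 0 1 1
```

Reading four pixels with a single load, as in `get_nibble`, is the kind of access that makes a 4-pixel macroblock attractive on packed data.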
In the context of 1-bpp images, being able to label an entire byte as a single block would guarantee a significant performance improvement, without requiring a conversion of the input into a 1-byte per pixel image. Unfortunately, the assumption that foreground pixels are always connected inside a block does not hold in such a case, and the algorithms proposed in the literature for the automatic generation of binary decision trees are not applicable. This paper introduces a systematic approach for generating all the possible actions associated with a scanning mask, which is employed to design an extremely fast CCL algorithm for bitonal images capable of labeling four consecutive pixels at once. The rest of this paper is organized as follows. Section 2 summarizes the latest contributions on connected components labeling; the proposed strategy is described in Section 3, and the result is evaluated in Section 4. Finally, conclusions are drawn in Section 5.

Related Work
Originally introduced by Rosenfeld and Pfaltz [30], connected components labeling has a very long history, full of different strategies and proposals. Since its first appearance in 1966, many papers have presented algorithms to improve the efficiency of the task. Traditionally, the fastest CCL algorithms employ a two-scan strategy. In the first scan, each pixel is assigned a provisional label determined using a mask of already visited pixels, such as the one in Fig. 1a, and possible equivalences between labels are recorded. Then, a representative label is established for each connected component, and the second scan replaces provisional labels with final ones.
Several strategies have been proposed for the resolution of label equivalences, and the ones most commonly seen in the literature employ some variation of union-find. The union-find data structure, first applied to CCL by Dillencourt et al. [13], provides two convenient procedures to deal with equivalence classes of labels: Find, which retrieves the representative label of an equivalence class, and Union, which merges two equivalence classes into one, ensuring that they share the same representative label.
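As an illustration of the two procedures, the following is a minimal array-based sketch. The class name `UnionFind`, the `new_label` helper, the path-halving step and the choice of the smallest label as representative are assumptions made for this example, not the specific solver used in the cited works.

```python
class UnionFind:
    """Array-based union-find over provisional labels (illustrative sketch)."""

    def __init__(self):
        self.parent = []

    def new_label(self):
        """Create a new equivalence class and return its label."""
        self.parent.append(len(self.parent))
        return len(self.parent) - 1

    def find(self, a):
        """Return the representative label of the class containing `a`."""
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a, b):
        """Merge the classes of `a` and `b`; the smaller root becomes the representative."""
        ra, rb = self.find(a), self.find(b)
        if ra > rb:
            ra, rb = rb, ra
        self.parent[rb] = ra
        return ra


uf = UnionFind()
labels = [uf.new_label() for _ in range(5)]  # labels 0..4
uf.union(3, 4)
uf.union(1, 3)
print(uf.find(4))  # 1
```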
After the introduction of union-find, a significant improvement was provided by Wu et al. in [32]. The authors proved an optimal strategy, based on a manually identified decision tree, to reduce the average number of load/store operations during the first scan of the input image, driven by the Rosenfeld mask (Fig. 1a). The resulting algorithm has been christened Scan Array-based Union Find, or SAUF for short.
In 2010, Grana et al. [17] introduced a major breakthrough, consisting of a 2 × 2 block-based approach (Fig. 1b). The problem was modeled as a command execution metaphor: the values of the pixels in the scanning mask constitute a rule (binary string), which is associated with a set of equivalent actions in an OR-decision table (Fig. 2). Given this decision table, an algorithm can simply read all the pixels inside the mask, identify the rule, and find the action to be performed in the corresponding column. In [19], a dynamic programming approach to convert OR-decision tables into optimal binary decision trees was proposed, in order to minimize the average number of conditions to be checked when choosing the correct action to be performed. The resulting algorithm is denoted as Block-Based Decision Tree, or BBDT. Many improvements have been published since then [21].
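To make the rule/action mapping concrete, the sketch below builds an OR-decision table for the smaller pixel-based Rosenfeld mask (pixels p, q, r above and s to the left of the current pixel x) rather than for the 2 × 2 block mask, purely to keep the example short. The string encoding of actions and the function name `equivalent_actions` are illustrative assumptions.

```python
from itertools import product

NAMES = ("p", "q", "r", "s")
# Pairs of outer pixels that are 8-neighbors of each other in the mask
#   p q r
#   s x
LINKED = {frozenset("pq"), frozenset("qr"), frozenset("ps"), frozenset("qs")}


def equivalent_actions(p, q, r, s, x):
    """Return the set of equivalent actions for one rule (p, q, r, s, x)."""
    if not x:
        return {"nothing"}
    fg = [n for n, v in zip(NAMES, (p, q, r, s)) if v]
    if not fg:
        return {"new label"}
    # Group foreground outer pixels that are already connected to each other.
    groups = []
    for n in fg:
        touching = [g for g in groups
                    if any(frozenset((n, m)) in LINKED for m in g)]
        merged = {n}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    # Any representative of a group yields an equivalent action.
    return {"x = " + " + ".join(sorted(choice)) for choice in product(*groups)}


# The table maps every rule (binary word) to its set of equivalent actions.
table = {rule: equivalent_actions(*rule) for rule in product((0, 1), repeat=5)}
print(sorted(table[(1, 1, 0, 0, 1)]))  # ['x = p', 'x = q']
print(sorted(table[(1, 0, 1, 0, 1)]))  # ['x = p + r']
```

A decision tree built from such a table can pick whichever equivalent action requires the fewest pixel reads.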
In 2014, He et al. [22] demonstrated that a finite state machine can be used to summarize the values of already inspected pixels during the horizontal movement of the mask.
In [15], decision trees and configuration transitions are combined in a decision forest, where each previous pattern makes it possible to "predict" some of the pixel values of the current configuration, thus allowing for automatic code generation. The first scan phase of the algorithm is ruled by a forest of decision trees connected into a single graph, where each tree derives from a reduction of the complete optimal decision tree. Additionally, in [7] Bolelli et al. demonstrated that switching from decision trees to Directed Rooted Acyclic Graphs (DRAGs) allows for a reduction of the machine code footprint, thus lessening the impact on the instruction cache.
Finally, in [4] the authors managed to combine the block-based mask with state prediction and code compression: the resulting algorithm, known as Spaghetti Labeling, was modeled as a Directed Rooted Acyclic Graph with multiple entry points, automatically generated without manual intervention.

Method
In this section, the proposed method for labeling multiple pixels at once is presented. As usual, CCL is performed with two raster scans of the input, here briefly summarized.
The first scan employs a mask moving in discrete steps, which highlights the current pixel(s) to be labeled and their neighborhood, composed of already analyzed and labeled pixels; at each step, the current pixel is assigned a provisional label, and if it connects two or more connected components, their labels are recorded as equivalent by means of some variation of Union-Find [7]. The set of operations carried out for a certain mask position is known as an action, and depends on the values of the pixels inside the mask, which form a binary word known as a command [17].
Then, the second scan simply replaces each provisional label with the chosen representative for the equivalence class, thus completing the task. While the second scan is usually fixed and nearly identical for most algorithms, the first scan is where algorithmic proposals differ the most: here several optimizations can take place, such as the aforementioned decision tree (decide the action without reading the whole command word), block-based strategy (label multiple connected pixels at once), prediction (avoid re-reading neighbor pixels known from the previous step), and compression (reduce machine code size by merging equivalent subtrees of a larger decision tree).
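The two scans summarized above can be sketched for the pixel-based Rosenfeld mask as follows. This is a plain illustrative implementation on a list of rows of 0/1 values, with a simple decision tree in the spirit of SAUF and a smallest-label union-find; it is not the proposed 1-bpp algorithm, and the function name `label_image` is hypothetical.

```python
def label_image(img):
    """Two-scan, 8-connectivity labeling of a list of rows of 0/1 values."""
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # index 0 is reserved for the background

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        lo, hi = min(ra, rb), max(ra, rb)
        parent[hi] = lo  # the smaller label is always the representative
        return lo

    def at(r, c):
        """Provisional label of an already visited pixel, 0 outside the image."""
        return labels[r][c] if 0 <= r and 0 <= c < cols else 0

    # First scan: provisional labels and equivalences.
    for r in range(rows):
        for c in range(cols):
            if not img[r][c]:
                continue
            if at(r - 1, c):                       # q
                lab = at(r - 1, c)
            elif at(r - 1, c + 1):                 # r
                up_right = at(r - 1, c + 1)
                if at(r - 1, c - 1):               # p
                    lab = union(at(r - 1, c - 1), up_right)
                elif at(r, c - 1):                 # s
                    lab = union(at(r, c - 1), up_right)
                else:
                    lab = up_right
            elif at(r - 1, c - 1):                 # p
                lab = at(r - 1, c - 1)
            elif at(r, c - 1):                     # s
                lab = at(r, c - 1)
            else:                                  # new label
                parent.append(len(parent))
                lab = len(parent) - 1
            labels[r][c] = lab

    # Resolve equivalences into consecutive final labels.
    final, next_id = [0] * len(parent), 0
    for lab in range(1, len(parent)):
        if parent[lab] == lab:
            next_id += 1
            final[lab] = next_id
        else:
            final[lab] = final[find(lab)]  # root is smaller, already numbered

    # Second scan: replace provisional labels with final ones.
    return [[final[lab] for lab in row] for row in labels]


print(label_image([[1, 0, 1],
                   [1, 0, 1],
                   [1, 1, 1]]))  # [[1, 0, 1], [1, 0, 1], [1, 1, 1]]
```

In the U-shaped example, the two arms first receive different provisional labels, which are merged when the bottom row is reached.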
The new technique proposed with this work extends the block-based approach, by overcoming the limitation that all pixels to be labeled at once must be connected. In fact, with respect to previous proposals [9,10,17], limited to a block size of at most 2 × 2 pixels, the devised algorithm can be applied to blocks of any shape.
In the following, the term macroblock identifies the pixels of the mask to be labeled during the current step. A macroblock can be divided into disjoint blocks, each of which always contains pixels connected to each other. This ensures that a block can always be assigned only one label.
On the other hand, it is not possible to assume that only one single action is performed on the current macroblock. Taking the mask of Fig. 1c as an example, the macroblock is composed of pixels $x_0$, $x_1$, $x_2$, and $x_3$, which can be divided into two different blocks: $X_0 = \{x_0, x_1\}$ and $X_1 = \{x_2, x_3\}$. In this context, it may happen that, e.g., block $X_0$ requires a new label, while $X_1$ must be assigned the result of a merge (the Union procedure of the union-find data structure) of two different existing label classes.
In the literature, no attempt has been made to systematically generate the set of actions associated with a macroblock-based scan mask. The block-based mask proposed in [17] (Fig. 1b), for example, shares the same actions as the Rosenfeld mask, with the only addition of some merge operations that have no effect in a pixel-based context.
Let us start with some formal definitions. Let $I : L \to \{B, F\}$ be a binary image, i.e., a function defined over a 2-dimensional square lattice $L$, where pixels only assume two possible values, background ($B$) and foreground ($F$), usually represented by integers 0 and 1 respectively.
The 8-neighborhood of pixel $p = (p_r, p_c) \in L$, denoted as $N_8(p)$, is the set of pixels sharing an edge or vertex with $p$. Given $S \subset L$, pixels $p, q \in S$ are connected in $S$, denoted as $p \sim_S q$, if a path of neighbor foreground pixels exists, all belonging to $S$ and leading from $p$ to $q$. Connectivity in $S$ is an equivalence relation, since the properties of reflexivity, symmetry and transitivity hold. Equivalence classes of this relation are called Connected Components (CCs) of $S$. When $S = L$, we omit the subscript and write $p \sim q$, and CCs of $L$ are referred to as just CCs.
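The two verbal definitions above can be written out explicitly, for instance as follows. This is one possible formalization consistent with the prose; the relation symbol $\sim_S$ and the path notation $s_1, \dots, s_n$ are notational assumptions introduced here.

```latex
% 8-neighborhood: pixels sharing an edge or a vertex with p
N_8(p) = \{\, q \in L \;\mid\; \max(|p_r - q_r|,\, |p_c - q_c|) = 1 \,\}

% Connectivity in S: a path of neighboring foreground pixels inside S
p \sim_S q \iff \exists\, s_1, \dots, s_n \in S :\;
    s_1 = p,\; s_n = q,\;
    I(s_i) = F \;\; \forall i \in \{1, \dots, n\},\;
    s_{i+1} \in N_8(s_i) \;\; \forall i \in \{1, \dots, n-1\}
```

With $n = 1$ the path reduces to a single foreground pixel, which gives reflexivity; reversing and concatenating paths gives symmetry and transitivity.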
To better detail the proposed algorithmic solution, we divide the pixels in the mask into two subsets:
- Outer Pixels ($P_O$): pixels inside the mask but outside the macroblock. Outer pixels already have a provisional label, since they have already been analyzed by the algorithm.
- Inner Pixels ($P_I$): pixels inside the macroblock. Inner pixels must be assigned a provisional label in the current step.
In order to proceed with the generation of the action set, the following operations are required for each configuration of the mask:
- Identify the CCs of $P_O$; this set of outer connected components is denoted as $C_O$;
- Identify the connected components of $P_I$; these inner connected components are denoted as $C_I$;
- Identify the connected components for the whole mask, i.e., the CCs of $P_O \cup P_I$; these are denoted as $C_T$;
- For each $t \in C_T$, consider all inner pixels belonging to $t$ (i.e. the set $t \cap P_I$); all these pixels must be assigned the same label $l$. This label is determined by analyzing the set of outer connected components contained in $t$, denoted as $C_O^t$: if $C_O^t$ is empty, $l$ is a new label; if it contains a single CC, $l$ is the label of that CC; otherwise, all CCs in $C_O^t$ must be merged, meaning that their labels are marked as equivalent and $l$ is set to any of them (typically the smallest).
It is important to stress that, in general, $C_T \neq C_O \cup C_I$: the external components are defined without considering connections through pixels currently under examination (the pixels of the macroblock). The same goes for internal components, for which connections due to external pixels are not considered. Those connections are considered only for $C_T$.
An example of action generation is reported in Fig. 3. In Fig. 3a, a mask configuration is shown: the gray squares represent foreground pixels, $x_0$, $x_1$, $x_2$, $x_3$ are the pixels in the "current" macroblock, and dashes identify meaningless pixels. The process starts with the detection of the connected components in the outer part of the mask, i.e. ignoring the current macroblock (Fig. 3b). In this specific example, three different objects are in $C_O$, named 1, 2 and 3 respectively. Then, inner connected components are identified inside the macroblock: $a$ and $b$, as in Fig. 3c. Finally, the global connected components ($C_T$), $x$ and $y$, are depicted in Fig. 3d. In particular, CC $x$ contains both the inner component $a$ and the outer component 1, so from this configuration we can derive the first operation to be performed, $a = 1$, which translates into the action $x_0 = p$. This action means: assign to $x_0$ the same label previously assigned to $p$. Moreover, CC $y$ contains inner component $b$ and outer components 2 and 3. In this case the operation is $b = 2 + 3$: assign to all the pixels of connected component $b$ the result of the merge between components 2 and 3. Translating this operation into an action, we obtain $x_2 x_3 = q_2 + r$, that is, assign to pixels $x_2$ and $x_3$ the merge of the labels previously assigned to pixels $q_2$ and $r$.
To give the reader an additional example, we can consider the same configuration of Fig. 3a and add $q_1$ as foreground. In this case, outer and inner components are the same as in Fig. 3b and Fig. 3c, but component 2 is made of two pixels ($q_1$ and $q_2$) instead of just one. On the other hand, we obtain just one single global component. This causes the algorithm to generate the action $x_0 x_2 x_3 = p + q_1|q_2 + r$: $x_0$, $x_2$, and $x_3$ must be assigned the result of the merge between $p$, one of $q_1$ and $q_2$, and $r$. In fact, the "or" between $q_1$ and $q_2$ is responsible for the generation of two equivalent actions, $x_0 x_2 x_3 = p + q_1 + r$ and $x_0 x_2 x_3 = p + q_2 + r$. Most equivalence cases can be resolved with the block-based approach described in the following; the others are treated during the generation of the optimal decision tree, as explained in [19]. Basically, each global connected component generates one or multiple equivalent actions, responsible for the labeling of all pixels belonging to one or more internal components.
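The procedure and the two examples above can be reproduced with a short sketch. The mask geometry used here (top row p, q0, q1, q2, q3, r; bottom row s, x0, x1, x2, x3) is an assumption inferred from the pixel names in the text, and the function names and the output encoding are hypothetical, not the authors' generator.

```python
# Assumed (row, column) layout of the Rosenfeld4 mask.
OUTER = {"p": (0, -1), "q0": (0, 0), "q1": (0, 1), "q2": (0, 2),
         "q3": (0, 3), "r": (0, 4), "s": (1, -1)}
INNER = {"x0": (1, 0), "x1": (1, 1), "x2": (1, 2), "x3": (1, 3)}
COORDS = {**OUTER, **INNER}


def adjacent(a, b):
    """8-neighborhood test between two distinct named mask pixels."""
    (ar, ac), (br, bc) = COORDS[a], COORDS[b]
    return a != b and max(abs(ar - br), abs(ac - bc)) <= 1


def components(pixels):
    """Connected components of a set of foreground pixel names,
    using only connections through pixels of that set."""
    remaining, result = set(pixels), []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            cur = stack.pop()
            for other in [o for o in remaining if adjacent(cur, o)]:
                remaining.remove(other)
                comp.add(other)
                stack.append(other)
        result.append(frozenset(comp))
    return result


def generate_actions(foreground):
    """For each global component containing inner pixels, return the pair
    (inner pixels to label, outer components to merge)."""
    fg = set(foreground)
    c_o = components(fg & set(OUTER))   # outer components, C_O
    c_t = components(fg)                # global components, C_T
    actions = []
    for t in c_t:
        inner = sorted(t & set(INNER))
        if not inner:
            continue                    # nothing to label in this component
        merged = sorted(sorted(o) for o in c_o if o <= t)
        actions.append((inner, merged))  # empty `merged` means "new label"
    return sorted(actions)


# Configuration of Fig. 3a as described in the text.
print(generate_actions({"p", "q2", "r", "x0", "x2", "x3"}))
# [(['x0'], [['p']]), (['x2', 'x3'], [['q2'], ['r']])]

# Same configuration with q1 added as foreground.
print(generate_actions({"p", "q1", "q2", "r", "x0", "x2", "x3"}))
# [(['x0', 'x2', 'x3'], [['p'], ['q1', 'q2'], ['r']])]
```

Each outer component listed with more than one pixel, such as `['q1', 'q2']`, corresponds to an "or" between equivalent source pixels.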

Reducing Actions with Blocks
The number of actions generated with the proposed approach grows very quickly as the mask size increases, making the generation of the optimal decision tree extremely hard or even impossible. In order to reduce the number of actions and simplify the problem, we introduce a block-based approach. As described above, macroblocks are divided into blocks, and pixel-based actions are replaced with block-based ones, eliminating possible duplicates. This way, many of the previous actions translate into the same one, and can be removed.
As an example, let us consider the following pixel-based actions: $x_0 = q_0$ and $x_1 = q_0|q_1$, the latter actually representing the two equivalent actions $x_1 = q_0$ and $x_1 = q_1$. Since $x_0$ and $x_1$ are always connected, they can be viewed as part of the single block $X_0$, and the same applies to $q_0$ and $q_1$, which are part of the block $Q_0$. Thus all three aforementioned pixel-based actions can be fused into $X_0 = Q_0$, producing the same outcome.
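The fusion of pixel-based actions into block-based ones amounts to renaming pixels to their blocks and removing duplicates, as in the following minimal sketch. Only the two blocks named in the text are mapped; the tuple encoding of actions and the function name `to_block_action` are illustrative assumptions.

```python
# Pixel-to-block mapping for the blocks mentioned in the text.
BLOCK_OF = {"x0": "X0", "x1": "X0", "q0": "Q0", "q1": "Q0"}


def to_block_action(action):
    """Translate a pixel-based action (target, sources) into a block-based one."""
    target, sources = action
    return (BLOCK_OF[target], tuple(sorted({BLOCK_OF[s] for s in sources})))


# x0 = q0, x1 = q0 and x1 = q1, written as (target, sources) pairs.
pixel_actions = [("x0", ("q0",)), ("x1", ("q0",)), ("x1", ("q1",))]

# Using a set removes the duplicates produced by the translation.
block_actions = {to_block_action(a) for a in pixel_actions}
print(block_actions)  # {('X0', ('Q0',))}
```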
As previously described in the literature [17,4], when working with block-based actions the second scan of an algorithm requires a slight overhead to correctly handle blocks, i.e. assigning the block label to all foreground pixels belonging to that block. Obviously, the reduction in the number of actions can be more or less significant depending on the mask features. For the mask of Fig. 1c, the number of actions drops by about 80% (from 413 to 85) when moving to the block-based approach. After generating all the possible actions associated with a scan mask and the corresponding OR-decision table, the algorithm described in [19] can be employed for the generation of an optimal decision tree, which maps the mask configuration to an action, minimizing the average number of load/store operations required.
The described approach has been employed to generate an optimal decision tree for the mask of Fig. 1c, which constitutes the core of a new CCL algorithm, specifically designed for 1-bpp images. Since it shares the general structure of SAUF, but operates on 4-pixel macroblocks, it is referred to as SAUF4 1BPP.

Experimental Results
The performance evaluation of the proposed algorithm has been carried out with an extended version of the YACCLAB benchmark [7,16], an open source C++ framework specifically designed to test CCL algorithms. In order to incorporate a standard well-known implementation of bitonal images, the benchmark has been integrated with Leptonica, an open-source image processing library employed in several projects (e.g. Tesseract OCR by Google). The extended version of the YACCLAB benchmark, including the proposed algorithm implementation, is available at https://github.com/prittt/YACCLAB.
Experimental results discussed in the following were obtained on an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz with Windows 10.0.17134 (64 bit) OS and MSVC 19.15.26730 compiler. Our proposal is evaluated on three datasets: Fingerprints, Medical and Tobacco800, which cover the most common CCL application fields; a full description can be found in [2]. Fig. 4 highlights how the performance of the algorithms is influenced by the different phases they are composed of: memory management, first scan and second scan.
The algorithms selected for comparison are SAUF, BBDT, Spaghetti, and the CCL implementation available in Leptonica. The first three algorithms, mentioned in Section 2, represent the state of the art for 1-byte per pixel images; for a fair comparison, their first scan times also include a conversion of the input to their preferred format. Finally, SAUF 1BPP and SAUF4 1BPP are the 1-bpp algorithms introduced in this paper. The former is a simple adaptation of SAUF, which iterates over the eight pixels stored in each byte; the latter employs a decision tree generated starting from the mask of Fig. 1c, using the action generation algorithm introduced in this paper. All the algorithms employ the classic union-find label solver [32].
The Leptonica algorithm is based on a seedfill approach which, as can be observed, is extremely inefficient when connected components extend vertically (e.g. Fingerprints), causing a series of cache-unfriendly memory accesses. On the other hand, when the images consist of small CCs, Leptonica has performance comparable to SAUF 1BPP.
The main proposal of this work, SAUF4 1BPP, considerably exceeds the performance of Leptonica, with a speedup ranging from 1.13 to 4.81 depending on the dataset, and thus represents the currently most efficient CCL algorithm designed to work on bitonal images. Moreover, SAUF4 1BPP has performance comparable to Spaghetti (the current state of the art for CCL on binary images), when the latter needs a prior conversion of the input. However, SAUF 1BPP only requires about 1/9 of the memory for the input data, making it an excellent choice for use cases where memory size is constrained.

Conclusion
An effective solution to automatically map a connected components labeling scan mask configuration with the actions to be performed has been presented, which

Fig. 3. Example of the proposed action-generation algorithm when applied to the mask configuration depicted in (a). Gray squares identify foreground pixels inside the mask. The pixels to be labeled using the information extracted from all the others are $x_0$, $x_1$, $x_2$, and $x_3$. Together they constitute the inner part of the mask. The remaining pixels are the outer part of the mask. Dashes identify meaningless values.