Challenge24 2013 writeups

Electronic contest - D. Octal CNC

We had been considering an OCR task for Challenge24 for some time. We had to invent our own font to make sure that using existing OCR software would not be a major advantage. The decision was made in favor of a small alphabet: we needed more than 2 symbols (binary) but far fewer than the English alphabet.

The first step was writing the generator. Its output should look handwritten, simulating all sorts of glitches and noise. Characters are built from two primitives: circles and lines. The following chart shows the parameters of the generator and their effect on the primitives.

On a higher level there is a set of document parameters controlling:

This kind of task is easy to solve by eye and hard to solve by software. The bias toward a software solution comes from the huge number of characters in each input. However, if documents are large, a single software failure may render hours of work useless (with a reward of -5 points after the submission). This is why a CRC digit is included at the end of each line: teams can check document integrity before submission.

This task is also an example of how important finding the right balance is: a relatively simple solver can solve most of the inputs with only a few errors per document, and can solve even the hardest inputs with only about one error per line on average. On the EC it's probably cheaper to make the solver detect these errors and to recognize those characters manually than to aim for a perfect solver.

An interesting aspect of the generator is the background noise. Input images are large and we had to make sure contestants could download the problem set in time. Without noise, PNG compression works well, as most of the image consists of large white areas; but with true white noise the resulting files are huge. The workaround for this problem is a special, semi-periodic noise: 3 buffers of random numbers are generated, each 80 +- 2 characters long (random length), and the background noise is these 3 buffers pasted in random order. This helps PNG compression cut back on file size and, as a bonus, the generated noise sometimes looks more like a paper texture.
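As an illustration, here is a minimal sketch of the trick in Python. The buffer count and lengths come from the description above; the character width in pixels and the gray levels are assumed values.

import random

def noise_row(total_len, char_w=16, levels=256):
    # Three random buffers, each 80 +/- 2 characters long
    # (char_w, the pixel width of one character, is an assumption).
    bufs = [[random.randrange(levels)
             for _ in range((80 + random.randint(-2, 2)) * char_w)]
            for _ in range(3)]
    # Paste the buffers in random order until the row is long enough.
    row = []
    while len(row) < total_len:
        row.extend(random.choice(bufs))
    return row[:total_len]

Because only three distinct byte sequences repeat, DEFLATE (used inside PNG) can reference earlier copies instead of storing each occurrence anew.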

Hidden feature: the encoded binary stream is actually 7-bit ASCII text containing various CNC programs (e.g. Gerber) or vector files (e.g. PostScript). Error correction is also possible at this level of the content.

A simple solver

stage 1: noise filter and binarization

Our first working solver is a set of hacks. The first stage is a minimalistic noise filter that decreases the resolution of the input image to 1/4 and decides whether each 4x4 block is black or white using a simple threshold - the result is a 1-bit image.
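A minimal sketch of such a filter, assuming an 8-bit grayscale input as a list of rows; the block threshold and the darkness cutoff are assumed values, not the contest solver's:

def binarize(img, threshold=8):
    # Each 4x4 block of the grayscale image becomes one 1-bit pixel:
    # black (1) if enough of its pixels are dark, white (0) otherwise.
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h - 3, 4):
        row = []
        for bx in range(0, w - 3, 4):
            dark = sum(1 for y in range(by, by + 4)
                         for x in range(bx, bx + 4)
                         if img[y][x] < 128)  # assumed darkness cutoff
            row.append(1 if dark >= threshold else 0)
        out.append(row)
    return out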

stage 2: extracting characters

The next step is determining which pixels belong to the same character. We found three simple methods here: two variants of floodfill, and the hammer. The first is flood-filling the background - this would take a lot of time. The second is taking the first black pixel, running floodfill from there, and tagging each black pixel reached with a unique "island" ID. Both floodfill methods require a threshold or radius, since the first-stage filter leaves some white pixels that cut characters into parts.

An even simpler (and potentially faster) method, the hammer, is just taking each black pixel and checking whether it is close enough to an already tagged island. If it is, it gets the same island ID; if it is not, a new ID is allocated. A corner case here is when two islands get too close to each other (for example at the bottom of a "V" shaped character, assuming top-down scanning). In this case the two islands are merged under a single island ID.
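A sketch of the hammer with island merging; the scan order, the radius value and the union-find bookkeeping are our assumptions, not the contest code:

def hammer(bitmap, radius=2):
    h, w = len(bitmap), len(bitmap[0])
    island = {}   # (x, y) -> island id
    parent = []   # union-find forest for merging islands

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for y in range(h):          # top-down scan
        for x in range(w):
            if not bitmap[y][x]:
                continue
            # Island ids of already tagged pixels close enough.
            near = {find(island[(nx, ny)])
                    for ny in range(max(0, y - radius), y + 1)
                    for nx in range(max(0, x - radius),
                                    min(w, x + radius + 1))
                    if (nx, ny) in island}
            if near:
                root = min(near)
                for i in near:        # two islands met (bottom of a "V"):
                    parent[i] = root  # merge them under one id
                island[(x, y)] = root
            else:
                nid = len(parent)     # allocate a new island id
                island[(x, y)] = nid
                parent.append(nid)
    return {p: find(i) for p, i in island.items()}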

We called them islands, not characters: there's still no guarantee that an island of black pixels is a full character. Especially on the hard inputs, where ink tends to fade faster, there are cases where a character falls apart into two smaller islands. In theory it's possible to fix this in a later stage: once all properly recognized characters are removed from the image, the rest can be combined based on some simple "which island is close to which other island" heuristics. However, it turned out that only a few characters were affected, and we found it cheaper to recognize them manually.

The output of this stage is a set of relatively small buffers, each containing a potential character and some metadata (coordinates of the top-left corner, size of the bounding box). This is how one of them looked:

.#####..........###..
#.....#.......#.#..#.
#....#........#....##
#..#############....#
#..#.#........#....#.
..#..##.......#....#.
.#####.........#####.
..#..................
...#.................
...#.................
...#.................
...#.................
...#.................
..#..................
.#.###...............
#.#..#...............
#..#..#..............
#.....#..............
#....#...............
#.###.#..............
....#................

stage 3: recognition of characters

Our first solver used a very simple method:

For finding the circles, a ring-shaped function is calculated for each pixel of this small buffer. The inner and outer radii of the ring are chosen so that most pixels of a legal circle fall within the ring when the ring is applied to the center of the circle. The function is the sum of radius^2 over all black pixels in the ring.
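In Python, the per-pixel ring function could look like this (the radii are assumed values):

def ring_score(bitmap, cx, cy, rin=4, rout=7):
    # Sum radius^2 over the black pixels whose distance from the
    # candidate center (cx, cy) falls inside the ring [rin, rout].
    h, w = len(bitmap), len(bitmap[0])
    score = 0
    for y in range(max(0, cy - rout), min(h, cy + rout + 1)):
        for x in range(max(0, cx - rout), min(w, cx + rout + 1)):
            if bitmap[y][x]:
                r2 = (x - cx) ** 2 + (y - cy) ** 2
                if rin * rin <= r2 <= rout * rout:
                    score += r2
    return score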

Once calculated for each pixel, the result is a height map which has peaks at the locations that look like circles. If the parameters are chosen carefully, this remains true even if the circle is a bit egg-shaped, doesn't close, varies in line width, is crossed by lines, etc.

However, one circle will result in multiple peaks around its center. To select the three locations that look most like circles, we pick the highest peak, then pillage the hills to zero around that location with the radius of the largest expected circle (MAXR). Repeating this three times yields the centers of the three circles with high probability. There is some minor noise in the LSBs of the actual coordinates, though: if a line crosses the circle from the left, the detected center is slightly tilted to the left, as the ring function sensed more pixels on that side. This does not affect recognition.
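The peak selection with suppression, as a sketch (the height map is a mutable list of rows of ring scores):

def pick_circles(height, maxr, count=3):
    h, w = len(height), len(height[0])
    centers = []
    for _ in range(count):
        # Highest remaining peak...
        cy, cx = max(((y, x) for y in range(h) for x in range(w)),
                     key=lambda p: height[p[0]][p[1]])
        centers.append((cx, cy))
        # ...then "pillage the hills": zero a MAXR-radius disc around it.
        for y in range(max(0, cy - maxr), min(h, cy + maxr + 1)):
            for x in range(max(0, cx - maxr), min(w, cx + maxr + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= maxr * maxr:
                    height[y][x] = 0
    return centers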

After the three circle centers are identified, a very simple method can detect lines between them: travel from one center to another (using some line drawing algorithm) with a rectangle and sum the black pixels within the rectangle. If the number of black pixels is above a threshold, there is a line. Finding a good rectangle size and threshold value is not that hard; the difference is really large even with thin lines between the circles.
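A sketch of the line test; instead of a true rotated rectangle it samples a small square window along the segment and counts covered samples, and the window size and threshold are assumed values:

def has_line(bitmap, a, b, halfwidth=2, threshold=0.6):
    (ax, ay), (bx, by) = a, b
    h, w = len(bitmap), len(bitmap[0])
    steps = max(abs(bx - ax), abs(by - ay), 1)
    hits = 0
    for i in range(steps + 1):
        # Integer point on the segment from a to b.
        x = ax + (bx - ax) * i // steps
        y = ay + (by - ay) * i // steps
        # Count this sample if any black pixel lies in its window.
        if any(bitmap[ny][nx]
               for ny in range(max(0, y - halfwidth),
                               min(h, y + halfwidth + 1))
               for nx in range(max(0, x - halfwidth),
                               min(w, x + halfwidth + 1))):
            hits += 1
    return hits / (steps + 1) >= threshold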

NOTE: this method ignores any line that is not between two circles. Characters 0, 1, 6 and 7 have one such line. We've already used those lines, though: they helped keep parts of the character together in stage 2. Later, for hashing, they wouldn't make any difference.

For the hash, circles need to be sorted. Sorting by x or y coordinates is not very robust because of the +-15 degree rotation: for example, if circles are sorted by y first then x, a slight rotation of digit 2 can yield a different ordering of the top-left and top-right circles. Instead, this solver sets up a 3x3 grid over the buffer and decides which cell a circle center falls in. Since the buffer is created from the bounding box of the character, it's impossible for the center of a circle to be near the edge. Thus the grid is slightly uneven: it is calculated with a border of MAXR. Sorting the circles by grid cell ID guarantees a fixed order that tolerates some rotation.
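A sketch of the cell assignment, assuming the buffer is wider and taller than 2*MAXR (which the bounding-box argument above guarantees):

def sorted_circles(centers, w, h, maxr):
    # The 3x3 grid covers the buffer shrunk by a MAXR border on each side.
    iw, ih = w - 2 * maxr, h - 2 * maxr
    def cell(c):
        cx, cy = c
        col = min(2, max(0, (cx - maxr) * 3 // iw))
        row = min(2, max(0, (cy - maxr) * 3 // ih))
        return row * 3 + col   # cell id, 0..8
    # Sorting by cell id gives a rotation-tolerant fixed order.
    return sorted(centers, key=cell)

The hash can then presumably be built from the three cell IDs plus the line flags between the sorted circle pairs.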

After calculating the hashes, the output is a list of the recognized characters, the unrecognized characters, and the metadata of each (coordinates, size).

stage 4: assembling lines

Take the top-left character (let's call it A) by coordinate - this is the first character of the first line - and remove it from the list. Then select the nearest character (B) that satisfies the conditions for belonging to the same line. If B is found, it is appended to the line buffer and removed from the list, and the search repeats with B as the new A. If B is not found, it's the end of the line - check the CRC and save the line.
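A sketch of this assembly loop; the exact conditions on B are not spelled out above, so the "to the right, at roughly the same height" test below is an assumed reconstruction:

def assemble_lines(chars):
    # chars: list of dicts with x, y (top-left corner) and h (height).
    chars = sorted(chars, key=lambda c: (c["y"], c["x"]))
    lines = []
    while chars:
        a = chars.pop(0)   # top-left remaining character starts a line
        line = [a]
        while True:
            # Assumed conditions: B is to the right of A, at roughly
            # the same height (within one character height).
            cand = [c for c in chars
                    if c["x"] > a["x"] and abs(c["y"] - a["y"]) < a["h"]]
            if not cand:
                break      # end of line: check CRC, save the line
            b = min(cand, key=lambda c: (c["x"] - a["x"]) ** 2
                                        + (c["y"] - a["y"]) ** 2)
            chars.remove(b)
            line.append(b)
            a = b
        lines.append(line)
    return lines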

A more professional solver

The solve.sh script uses ImageMagick's identify and convert to get the resolution and produce a raw image. This resolution and the raw image name are then passed as command line arguments to the solver_a executable, which uses its built-in image processing functions to extract presumably useful data from the image. Finally, the recognition of the symbols is done by projecting them onto several axes and comparing these projections, in a normalized way and with a tolerance, to previously generated golden samples.

High level steps inside the code:

  1. Load the input files
  2. Apply a 9x9 Gaussian filter with a sigma of 2
  3. Compute the histogram and binarize based on it using the althego method with a parameter of -0.1 (step towards dark from the maximum until the value is 10% of the maximum; see the sketch after this list)
  4. Morphological closing (erosion and dilation) with a small kernel to eliminate residual noise, then a dilation with a bigger kernel to fuse together the potentially separated parts of the symbols
  5. Label every closed component uniquely
  6. Compute the area, center of mass and bounding box for each closed component and discard the ones that are too small or too big (3100, 15000); find the lines on the image by crystal growing from the objects, with a radius tolerance of 1.92 and an angle tolerance of pi/7; use the result to assign the remaining objects to lines and to order them inside the lines; get the lowest inertia axis and the roundness of each object; finally compute the object projections onto this axis, onto the perpendicular one, and onto two more rotated by an additional 45 degrees
  7. Do the recognition by comparing the projections in a normalized way with the golden samples: 64 uniformly distributed samples per projection, with the signal taken relative to the maximum of the given projection; the maximum error tolerance is 0.2, the retry limit bending factor is 0.15, and the "errors too near" warning threshold is 0.1
  8. Output the lines read, hopefully as digits 0..7; U means Unknown when no match is close enough, and warnings are generated separately if matches are too close to each other
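The threshold search of step 3 might look like the sketch below. Our reading of the -0.1 parameter (the sign gives the walking direction, the magnitude the 10% cutoff) is an assumption:

def althego_threshold(hist, param=-0.1):
    # Start from the tallest bin (the background peak) and step towards
    # dark until the bin count drops to 10% of the peak's count.
    peak = max(range(len(hist)), key=lambda i: hist[i])
    limit = hist[peak] * abs(param)
    step = -1 if param < 0 else 1   # negative parameter: walk darkwards
    i = peak
    while 0 < i < len(hist) - 1 and hist[i] > limit:
        i += step
    return i   # intensities on the dark side of i are foreground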