Challenge24 2013 writeups

Electronic contest - G. Trains

Generating the input files

We've developed a small C program, about 400 SLoC for mixing the samples for the input wavs. It reads a simple event tab on the standard input and writes raw sound on the standard output. Events are click sounds, rail sound, white noise and various background noises. Clicks and background noises happen at predefined times at a specific fixed volume. Rail sound and noise are rendered with varying volume (using trapezoid functions: linear increase/decrease or constant volume).

Event tabs are generated by an awk script (~170 lines) that first reads the UDoW then reads its standard input for generic settings and trains. Trains are given almost in the output format; the only difference is that we have named the cars and tried to arrange that locomotives are pulling or pushing the trains. The speed of the train is also given for each train.

Listening to and cutting the samples took more time than developing the generator. There are a few tricks only. For example the rail sound is coming from a sample that is very short - recordings won't have clear and loud rail sound without wheel clicks for more than a few seconds. We needed a part that could be looped without introducing extra clicks; while looping the mixer cheats with the length of the sample: sometimes it cuts off some from the end or the start. This helps reducing noticeable periodicity of the sound when listening to the wavs.

solver

stage 1: filter

After ignoring the first 100 bytes (the header, I was too lazy to look at the actual size and the files start with noise anyway) the differences between adjacent samples are calculated, then an integral filter is run on that and the stream is then passed to a low pass filter. Constants tweaked on the example input is good enough for majority of the inputs. The result of this filter is an almost clean hill for each click.

stage 2: splitting by train

Some parameters may vary from train to train. To simplify rest of the solver, the best is to split it into trains by the output of the first stage: a train starts a second before the first click and stops if there are no clicks for some time.

Output of the filter after splitting (first train of the example input):

stage 3: DC shift

Calculate the average of the stream for each train and DC shift down with the average. This effectively kills most of the noise. Unfortunately our solver did this before splitting by train, which is less efficient if there are different click sounds and noises per train.

stage 4: wheel detection

Try to find the top of each hill - there should be a click. The simplest working method we found is this: when samples are going up, increase a counter (up to a predefined max) and when they are going down, decrease the counter (down to a predefined min). When it first reaches top, and the top is above 45% of the height of the last hill detected, it's a click; emit an event with a time stamp and wait until the counter goes back to min before next triggering.

A plot of the wheel count for the same train:

The approximate timing of clicks (output of the wheel detection):

stage 5: wagon and train recognition

We convert the UDoW and the event list into prolog and let prolog figure what train at what speed would produce that series of clicks. Since the filter is written in C, the wheel detection in awk and the glue is in shell, we decided against introducing only one new language (prolog): the converters are written in sed and tcl.

The prolog program tries to match the first few clicks on each wagon while speed is also unknown. Upon match, it assumes the speed it found and matches the rest of the wagons using that value.

Because of the errors introduced by all the filters and the very short period for measurement, measurement of the speed by only one wagon is not precise. Thus the solver matches speed for later wagons with some tolerance. If the tolerance is too high, multiple similar wagons will match; if it's too low, no wagon will match. The prolog program is configured to report all possible solutions for a train with the given tolerance.

There's no single good value for the tolerance, different values work for different trains. A shell wrapper around prolog does a linear search on some tolerance from higher values toward lower values until prolog outputs only one solution.