Challenge24 2013 writeups
Electronic contest - G. Trains
Generating the input files
We've developed a small C program, about 400 SLoC for mixing the samples
for the input wavs. It reads a simple event tab on the standard input
and writes raw sound on the standard output. Events are click sounds,
rail sound, white noise and various background noises. Clicks
and background noises happen at predefined times at a specific
fixed volume. Rail sound and noise are rendered with varying
volume (using trapezoid functions: linear increase/decrease or
constant volume).
Event tabs are generated by an awk script (~170 lines)
that first reads the UDoW then
reads its standard input for generic settings and trains. Trains are
given almost in the output format; the only difference is that we
have named the cars and tried to arrange that locomotives are pulling
or pushing the trains. The speed of the train is also given for each
train.
Listening to and cutting the samples took more time than developing the
generator. There are a few tricks only. For example the rail sound is
coming from a sample that is very short - recordings won't have
clear and loud rail sound without wheel clicks for more than a few seconds.
We needed a part that could be looped without introducing extra clicks; while
looping the mixer cheats with the length of the sample: sometimes it cuts
off some from the end or the start. This helps reducing noticeable periodicity
of the sound when listening to the wavs.
solver
stage 1: filter
After ignoring the first 100 bytes (the header, I was too lazy to look at
the actual size and the files start with noise anyway) the differences
between adjacent samples are calculated, then an integral filter is
run on that and the stream is then passed to a low pass filter. Constants
tweaked on the example input is good enough for majority of the inputs.
The result of this filter is an almost clean hill for each click.
stage 2: splitting by train
Some parameters may vary from train to train. To simplify rest of the solver,
the best is to split it into trains by the output of the first stage:
a train starts a second before the first click and stops if there are no
clicks for some time.
Output of the filter after splitting (first train of the example input):
stage 3: DC shift
Calculate the average of the stream for each train and DC shift down with
the average. This effectively kills most of the noise. Unfortunately our solver
did this before splitting by train, which is less efficient if there are
different click sounds and noises per train.
stage 4: wheel detection
Try to find the top of each hill - there should be a click. The simplest
working method we found is this: when samples are going up, increase
a counter (up to a predefined max) and when they are going down, decrease
the counter (down to a predefined min). When it first reaches top, and
the top is above 45% of the height of the last hill detected, it's a click;
emit an event with a time stamp and wait until the counter goes back to min
before next triggering.
A plot of the wheel count for the same train:
The approximate timing of clicks (output of the wheel detection):
1.48989
1.60354
1.7061
2.38451
2.48576
2.60848
3.09878
3.16574
4.40243
4.48476
4.71147
4.77825
6.01333
6.08537
6.31447
6.38798
7.62107
7.69687
7.92211
7.99288
9.23007
9.30315
stage 5: wagon and train recognition
We convert the UDoW and the event list into prolog and let prolog
figure what train at what speed would produce that series of clicks. Since
the filter is written in C, the wheel detection in awk and the glue is in shell,
we decided against introducing only one new language (prolog): the converters
are written in sed and tcl.
The prolog program tries to match the first few clicks on each wagon while
speed is also unknown. Upon match, it assumes the speed it found and
matches the rest of the wagons using that value.
Because of the errors introduced by all the filters and the very short
period for measurement, measurement of the speed by only one wagon is
not precise. Thus the solver matches speed for later wagons with some
tolerance. If the tolerance is too high, multiple similar wagons will
match; if it's too low, no wagon will match. The prolog program is
configured to report all possible solutions for a train with the given
tolerance.
There's no single good value for the tolerance, different values work
for different trains. A shell wrapper around prolog does a linear search
on some tolerance from higher values toward lower values until prolog
outputs only one solution.