Using libfawk on untrusted user input

TL;DR: don't. The main goal of libfawk is to deliver the smallest usable implementation of the language core. Being small is generally incompatible with resisting attacks, because mitigation usually takes extra code, often non-portable code such as obtaining a secure random number.

That said, if you are still going to use libfawk on untrusted user input, you should be aware of the following considerations.

single-c version

Always use the single-c-file version with #include. This setup allows you to override built-in defaults using #define or other techniques. The rest of this document depends on it.

Fixed seed of sip hash

libfawk uses a very simple hash function for arrays. If your script fills arrays from user input, an attacker may craft input that deliberately causes hash collisions. This ruins the performance of the hash table code and may be enough for a DoS attack. See also: hash collision DoS attacks. The attack depends on the attacker being able to figure out the hash seed.

Solution: the host app should generate a random number that the attacker cannot guess and overwrite the value of the global variable unsigned libfawk_hash_seed with it.
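A minimal sketch of obtaining such a seed follows. It assumes a host platform that provides /dev/urandom, which is exactly the kind of non-portable code libfawk itself avoids. The function name host_random_seed is hypothetical and not part of libfawk; only the variable libfawk_hash_seed comes from the library.

```c
#include <stdio.h>

/* Hypothetical host-side helper, not part of libfawk.
   Fills *seed with unpredictable bytes read from /dev/urandom (assumed to
   exist on the host platform). Returns 0 on success, -1 on failure. */
int host_random_seed(unsigned *seed)
{
	FILE *f = fopen("/dev/urandom", "rb");
	size_t got;

	if (f == NULL)
		return -1;
	got = fread(seed, sizeof(*seed), 1, f);
	fclose(f);
	return (got == 1) ? 0 : -1;
}

/* Intended use in the host app, with the single-c version #included:

	unsigned seed;
	if (host_random_seed(&seed) == 0)
		libfawk_hash_seed = seed;
	else
		... refuse to run scripts on untrusted input ...
*/
```

The seed is presumably best set before any script array is created, so that no hash table is built with the default value; this ordering is an assumption worth checking against the libfawk source.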

Heap usage

Libfawk uses fawk_malloc()/fawk_calloc()/fawk_realloc()/fawk_free() to manage runtime dynamic memory. An extra first argument is always the script context (fawk_ctx_t*).

These are all macros that can be overridden by the code that #includes the single-source version. The context pointer can be used to account for allocations on a per-script basis.

There is only one exception: a token stack within the byacc-generated parser that uses plain realloc(). However, this is not affected by user input, only by script source.
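The per-script accounting mentioned above could be sketched as follows. Everything in this block is hypothetical host-side code: the mem_budget_t type, the budget_* functions and the way a budget is looked up from a fawk_ctx_t* are assumptions, not libfawk API. Each block carries a small header so that free and realloc know how many bytes to give back to the budget.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-script memory budget; not part of libfawk. */
typedef struct {
	size_t used;   /* payload bytes currently allocated */
	size_t limit;  /* maximum payload bytes allowed */
} mem_budget_t;

/* Header placed in front of every block; the union keeps the payload
   that follows it suitably aligned. */
typedef union {
	size_t size;
	long double align_ld;
	void *align_ptr;
} mem_hdr_t;

void *budget_malloc(mem_budget_t *b, size_t size)
{
	mem_hdr_t *h;

	if (size > b->limit - b->used)
		return NULL; /* would exceed the budget */
	h = malloc(sizeof(mem_hdr_t) + size);
	if (h == NULL)
		return NULL;
	h->size = size;
	b->used += size;
	return h + 1;
}

void budget_free(mem_budget_t *b, void *ptr)
{
	mem_hdr_t *h;

	if (ptr == NULL)
		return;
	h = (mem_hdr_t *)ptr - 1;
	b->used -= h->size;
	free(h);
}

void *budget_realloc(mem_budget_t *b, void *ptr, size_t size)
{
	mem_hdr_t *h, *nh;
	size_t old;

	if (ptr == NULL)
		return budget_malloc(b, size);
	h = (mem_hdr_t *)ptr - 1;
	old = h->size;
	if (size > old && size - old > b->limit - b->used)
		return NULL; /* growth would exceed the budget; old block stays valid */
	nh = realloc(h, sizeof(mem_hdr_t) + size);
	if (nh == NULL)
		return NULL;
	nh->size = size;
	b->used = b->used - old + size;
	return nh + 1;
}

/* Possible wiring (assumption: budget_of() is a host function that maps a
   fawk_ctx_t* to its budget):

	#define fawk_malloc(ctx, size)       budget_malloc(budget_of(ctx), (size))
	#define fawk_realloc(ctx, ptr, size) budget_realloc(budget_of(ctx), (ptr), (size))
	#define fawk_free(ctx, ptr)          budget_free(budget_of(ctx), (ptr))
*/
```

A fawk_calloc() replacement would follow the same pattern with zeroed memory. Whether the macros need to be defined before the #include or redefined after it should be checked against the libfawk source.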

CPU usage

If the complexity of calculations, e.g. the number of loop iterations, depends on user input, an attacker may trick the script into running for a very long time, potentially blocking the host application from dealing with other tasks, or at least taking up too much CPU.

Solution #1: make sure the script limits its own CPU usage; do input sanity checks, refuse input that would cause overly long calculations, or abort operations if certain limits are reached.

Solution #2: when executing a fawk script, specify a reasonable run limit so the VM aborts the script if it tries to execute too many instructions. However, figuring out the optimal limit might be hard.
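The idea behind a run limit can be illustrated with a toy interpreter loop. This is not libfawk code and does not show libfawk's actual execute call or return values, which should be looked up in the library's headers; toy_vm_t, toy_run and the result codes are all hypothetical. Each loop iteration stands in for one VM instruction.

```c
/* Hypothetical result codes of the toy VM; not libfawk's. */
typedef enum {
	RUN_FINISHED, /* the "script" ran to completion */
	RUN_LIMIT     /* the step budget ran out first */
} run_result_t;

/* Toy VM state: the "script" counts from counter up to target. */
typedef struct {
	unsigned long counter;
	unsigned long target;
} toy_vm_t;

/* Execute at most max_steps "instructions". The state is kept in vm, so
   the host can decide to abort or to call again and continue. */
run_result_t toy_run(toy_vm_t *vm, unsigned long max_steps)
{
	unsigned long steps = 0;

	while (vm->counter < vm->target) {
		if (steps >= max_steps)
			return RUN_LIMIT;
		vm->counter++; /* one "instruction" */
		steps++;
	}
	return RUN_FINISHED;
}
```

With a target that depends on user input, the host stays in control: toy_run() returns after at most max_steps iterations no matter how large the target is.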

Stack usage

If the script implements a recursive algorithm that depends on user input, an attacker may trick the script into recursing a lot. Libfawk uses plain C function calls for implementing fawk function calls, which means each function call in fawk has a footprint on the host application's C stack. In the worst case this may grow or even exhaust the stack of the host app.

Solution #1: make sure the script limits its own recursion; do input sanity checks, refuse input that would cause too deep recursion, or abort operations if certain limits are reached.
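As an illustration of such a sanity check, the sketch below rejects input whose parenthesis nesting is deeper than a chosen limit. It assumes a hypothetical script that recurses once per nested parenthesis; the function name and the limit are made up for the example. It is written in C as a host-side pre-check, but the same iterative scan could be done inside the script before it starts recursing.

```c
/* Hypothetical pre-check, not part of libfawk.
   Returns 1 if the parentheses in s are balanced and never nest deeper
   than max_depth, 0 otherwise. The scan is iterative, so checking the
   input does not itself consume stack. */
int nesting_within_limit(const char *s, int max_depth)
{
	int depth = 0;

	for (; *s != '\0'; s++) {
		if (*s == '(') {
			if (++depth > max_depth)
				return 0; /* too deep: refuse the input */
		}
		else if (*s == ')') {
			if (--depth < 0)
				return 0; /* closing without opening */
		}
	}
	return depth == 0;
}
```

Input that fails the check is refused before any recursive processing begins, so the recursion depth is bounded by max_depth regardless of what the attacker sends.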

Solution #2: when executing a fawk script, specify a reasonable run limit so the VM aborts the script if it tries to execute too many instructions. This indirectly limits recursion, as entering and leaving functions costs many instructions for managing parameters. However, figuring out the optimal limit might be hard.