Introduction

The Function Cacher project (https://gitlab.com/philipp.friese/function-cacher) provides a way to cache the result of a (e.g. computation or IO-heavy) function to a provided storage location, typically on disk.

In comparison to Python’s integrated functools.cache, this approach provides caching outside of a programs lifetime. The (originally) intended use is caching results of long-running database queries during rapid-prototyping data analysis.

The cacher provided in this project can be used as follows:

from functionCacher.Cacher import Cacher
cacher_instance = Cacher()

@cacher_instance.cache
def myfunc(arg1, arg2):
   return [arg1, arg2]

The function cacher caches parameter-based; calling myfunc twice with the same arguments will lead to a cache hit on the second invocation. This is also true for invocations across program restarts (e.g. because your analysis script crashed).

Features

  • cache function results to disk

  • cache compression (LZMA, ZSTD)

  • HMAC-verification of cache file (do not forget to set a proper hmac_key during Cacher initialisation!)

  • cache decorator

  • automatic cache ID generation, based on

    • function parameter

    • function name

    • function source code

Design Considerations

The function cacher assumes that functions are pure in the sense that \(\forall x, y: x = y \Longrightarrow f(x) = f(y)\). Conversely, if the result of a function \(f\) changes unrelated to its provided parameter (for example a database query aggregating new results due to inserts), then the cacher will not catch this!

In addition, the function cacher pays no attention to the number of cached entries, as opposed to e.g. @functools.lru_cache.

The results of a cached function are stored to an initially provided cache path on the file system. This allows the function cacher to store results even across script invocations. The main benefit of this approach (as opposed to caching to RAM) is that caches are retained even on script crash, which may be very useful during rapid-prototyping involving heavy computation or database queries.

Implementation sketch

The function cacher is designed around the Pickle module of Python. HMAC is used to verify cache integrity. On function invocation, the function call is intercepted, its parameters and the source code of the calling function are hashed and compared to a list of cache files. If a matching cache file is found and its age does not exceed the specified threshold (default: 24h), then the result is read from cache and returned. If not, then the calling function is invoked, the result written to disk, and returned to the caller.

Background

The function cacher has been developed for about four years and was open-sourced in September 2023. The original usage was a data analytics scenario involving large database queries and Python-based data exploration. In order to reduce prototyping/ developing time and reduce load on the (back-then under-powered database), the function cacher was developed.

License

This project is licensed under the MIT licence. See LICENSE file.