When a legacy web system becomes noticeably slower, it is important to react before a critical service failure occurs. Studio theYANG introduces reverse proxy caching on Apache, one of the easiest and most effective ways of relieving the load on such a system. Here’s why and how.
Working with Legacy Systems…
There are typically four layers of caching, from backend to frontend: database caching, application caching, reverse proxy caching and browser caching. The sad fact of legacy systems is that we don’t have many choices. Application-layer caching is best avoided, because it risks the system’s functionality, which must always come before speed. Browser caching won’t help when the load is high and a growing number of users are accessing the system.
Database caching can sometimes be considered, when the legacy system is built on a regular SQL database rather than a legacy one such as an old version of ZODB. This leaves us with a single option, reverse proxy caching, which relies on the ubiquitous HTTP protocol and avoids anything that could endanger the system’s functionality.
But please also note that this kind of caching is not a permanent solution for a legacy system, as there is always the risk of increased load, a swelling database or cyber attacks. A professional should be consulted for advice and support during the transition to a new system.
First, Identify the Bottleneck
Before actually setting up caching, it is necessary to understand which requests slow the system down the most. In the Apache configuration, it is easy to log the time taken to serve each request by adding the format variables `%T` and/or `%D`, which report the duration in seconds and microseconds respectively. To parse the Apache log files, there is a great Python library, `apache-log-parser` (`pip install apache-log-parser`), that lets you copy-paste the log format string from the Apache config into Python code and works out of the box.
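A minimal parsing sketch could look like the following, assuming the classic combined log format extended with `%D` and a log path typical of Debian-based systems; copy the exact format string from your own configuration, and inspect the parsed dictionary if the derived key names differ in your version of the library.

```
# Sketch: parse an access log whose LogFormat is the combined format plus %D.
# The format string and log path below are assumptions -- adapt them.
import apache_log_parser  # pip install apache-log-parser

line_parser = apache_log_parser.make_parser(
    '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %D'
)

with open("/var/log/apache2/access.log") as f:
    for line in f:
        entry = line_parser(line)
        # 'request_first_line' comes from "%r"; 'time_us' is derived from %D
        print(entry["request_first_line"], entry["time_us"])
```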
It is not unusual to find the bottleneck in static assets such as CSS, JavaScript or even images. Depending on how these assets are processed internally, a legacy system may not serve them directly the way a modern web backend would, but instead run them through request-response logic hidden somewhere in the application.
More often, though, the performance bottleneck is caused by a few specific requests, roughly following the 80/20 rule. A typical scenario is a GET endpoint serving mobile apps that is called so frequently that it ties up the server.
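To see which endpoints dominate, a rough aggregation of the same log is often enough. The sketch below assumes the log format from the previous snippet, with `%D` as the last field, and ranks paths by cumulative response time; the log path and field positions are assumptions to adapt.

```
# Sketch: rank request paths by total time spent serving them (microseconds),
# assuming %D is the last field of each log line (combined format plus %D).
from collections import defaultdict
from urllib.parse import urlparse

totals = defaultdict(int)   # path -> cumulative microseconds
counts = defaultdict(int)   # path -> number of requests

with open("/var/log/apache2/access.log") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 8 or not parts[-1].isdigit():
            continue
        path = urlparse(parts[6]).path   # request URL, query string stripped
        totals[path] += int(parts[-1])
        counts[path] += 1

for path, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{total / 1000:10.0f} ms  {counts[path]:6d} hits  {path}")
```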
Check Prerequisites
For Apache, there is a list of rules that determine whether a response can be cached. Most prominently, the request has to be a GET and return a success or redirect status code. But GET requests accept query parameters, which are a special case for Apache… It is necessary to check the documentation, as always. If a slow request happens to be a POST, as often happens in legacy systems, see if there is a way to work around the problem, or take a minor risk and change it to a GET.
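A quick sanity check on a candidate endpoint can be done from Python before touching the Apache configuration. The sketch below uses a hypothetical URL; what matters is that the request is a GET, the status code is cacheable and the response carries some freshness information.

```
# Sketch: check that a slow endpoint answers a GET with a cacheable status
# and inspect its caching-related headers. The URL is a placeholder.
import urllib.request

url = "http://legacy.example.com/api/slow-endpoint?page=1"
with urllib.request.urlopen(url) as resp:
    print("status:", resp.status)   # should be 2xx or a cacheable 3xx
    for name in ("Cache-Control", "Expires", "ETag", "Last-Modified"):
        print(f"{name}: {resp.headers.get(name)}")
```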
And don’t forget we are working on a legacy system: the Apache version may not support certain caching features! For example, mod_cache’s `CacheIgnoreQueryString` directive only exists since Apache 2.2.6, and this information doesn’t even appear in the 2.4 documentation. Yes, anything is possible in the most neglected, reverse-proxy part of the system, where even the patch number matters. Be vigilant.
Configure Apache
This step is straightforward, and made easier by the fact that Apache reports syntax errors when you try to reload it. The performance of disk versus memory caching shouldn’t differ much in our context, as caching itself is the main win here. Disk caching is a good starting point because it lets you monitor the size of the cached content and gives an idea of whether the caching is working at all.
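As a starting point, a disk cache on Apache 2.4 could be configured roughly as follows, assuming mod_cache and mod_cache_disk are enabled (on 2.2 the disk provider is mod_disk_cache instead); the cache root, the expiry and the decision to cache the whole site are placeholders to adapt.

```
# Sketch of a disk cache configuration for Apache 2.4; adapt paths and values.
<IfModule mod_cache.c>
    # CacheQuickHandler is 2.4 only; it serves cache hits very early
    CacheQuickHandler on
    CacheEnable disk "/"
    # Avoid storing per-user Set-Cookie headers in the cache
    CacheIgnoreHeaders Set-Cookie
    # Fallback lifetime (seconds) when the backend declares none
    CacheDefaultExpire 600

    <IfModule mod_cache_disk.c>
        CacheRoot "/var/cache/apache2/mod_cache_disk"
        CacheDirLevels 2
        CacheDirLength 1
    </IfModule>
</IfModule>
```

With disk caching, a simple `du -sh` on the cache root shows whether entries are accumulating, and Apache’s `htcacheclean` utility keeps the cache size under control.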
Ensure HTTP Caching Headers
The last step before actual testing is to ensure the correct HTTP headers. Apache, like any reverse proxy, is purely an HTTP server and follows the HTTP standards very strictly. The idea behind the caching we are talking about is that the reverse proxy behaves like a browser with respect to caching, so we have to check `ETag`, `Last-Modified`, `Expires`, `max-age` and so on. Apache refuses to cache a response if it cannot infer how long it should keep it. Therefore, a small risk sometimes has to be taken to put this information in place.
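If the legacy backend does not emit this information and cannot easily be changed, the reverse proxy itself can supply it. The sketch below assumes mod_expires and mod_headers are enabled; the `/api/` location and the ten-minute lifetime are placeholders, and forcing a lifetime this way is exactly the kind of small, deliberate risk mentioned above.

```
# Sketch: declare freshness for a slow endpoint so mod_cache agrees to store it.
# The location and lifetime are placeholders to adapt.
<Location "/api/">
    ExpiresActive On
    ExpiresDefault "access plus 10 minutes"
    Header merge Cache-Control "public"
</Location>
```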
At this point, we should expect a more responsive system, at least for a while. If you need consultation on this topic, please don’t hesitate to contact us.
