The process of reducing the size of data is known as “data compression,” and it is a deep field of study on its own: many people have spent their entire careers working on algorithms, techniques, and optimizations to improve compression ratios, speed, and memory requirements of various compressors. Needless to say, a full discussion on this topic is out of our scope, but it is still important to understand, at a high level, how compression works and the techniques we have at our disposal to reduce the size of various assets required by our pages.
To illustrate the core principles of these techniques in action, let’s consider how we can go about optimizing a simple text message format that we’ll invent just for this example:
# Below is a secret message, which consists of a set of headers in
# key-value format followed by a newline and the encrypted message.
format: secret-cipher
date: 04/04/14
AAAZZBBBBEEEMMM EEETTTAAA
-
Messages may contain arbitrary annotations, which are indicated by the “#” prefix. Annotations do not affect the meaning or any other behavior of the message.
-
Messages may contain “headers” which are key-value pairs (separated by “:”) and have to appear at the beginning at the message.
-
Messages carry text payloads.
What could we do reduce the size of the above message, which is currently 200 characters long?
-
Well, the comment is interesting, but we know that it doesn’t actually affect the meaning of the message, so we eliminate it when we’re transmitting the message.
-
There are probably some clever techniques we could use to encode headers in an efficient manner – e.g. we don’t know if all messages always have “format” and “date”, but if they did, we could convert those to short integer IDs and just send those! That said, we’re not sure if that’s the case, so we’ll leave it alone for now.
-
The payload is text only, and while we don’t know what the contents of it really are (apparently, it’s using a “secret-message”), just looking at the text seems to show that there is a lot of redundancy in it. Perhaps, instead of sending repeated letters, we can just count the number of repeated letters and encode them more efficiently?
-
E.g. “AAA” becomes “3A” - or, sequence of three A’s.
Combining our techniques, we arrive at the following result:
format: secret-cipher
date: 04/04/14
3A2Z4B3E3M 3E3T3A
The new message is 56 characters long, which means we managed to compress our original message by an impressive 72% - not bad, all things considered, and we’re only getting started!
Of course, you may be wondering, this is all great, but how does this help us optimize our web pages? Surely we’re not going to try to invent our compression algorithms, are we? The answer is no, we won’t, but as you will see, we will use the exact same techniques and way of thinking when optimizing various resources on our pages: preprocessing, context-specific optimizations, and different algorithms for different content.
Minification: preprocessing & context-specific optimizations