Transparent Decompression

If you have used Firefox on Android, you will be quite impressed to know it loads compressed code on demand into memory. In order to do this however a custom linker was required, which would handle a SIGSEGV and then retrieve compressed data from disk and then decompress it in the area reserved for it.

For the past few days, I have been working on getting the filesystem to do that for me transparently. What this means is that now we just SIGSEGV, and load uncompressed data directly. No messing around in userspace and complex error handling. I am almost ready to post v1 of the patchset.

The first step was to pick a compression scheme, which would lend itself easily to seeking and the other good properties we all like. Since one of our primary targets is firefox, we decided to use the same compression scheme as that. However, the implementation needs to not be tied to a compression type.

The seekable zip format is just zlib chunked into 16k chunks, each compressed individually (using zlib), and finally a custom header which has all of this information stored. There is a tool available to generate these here. Just build and use the szip tool to make compressed files.

The big challenge was how to modify the filesystem to handle these cases. For v1, complexity is evil. I have just modified the read path, where we do nothing different with an uncompressed file (except add another level of indirection). The fun starts once we know a file is compressed. First we check if we already have the header information. If not, retrieve it from disk and initialize it. Once it is setup, we then correct the offsets to point to the right chunk, retrieve the data from disk, uncompress and then copy into the userspace buffers. Userspace is not aware anything happened underneath.

The implementation is still tied to the szip format in this version, but it is just a question of code refactoring and designing interfaces which would make it compression scheme agnostic.

The bigger issue imo is that we are too high in the stack. Ideally one would want to be decompressing soon after getting data off the disk. One possible way is to map compressed and uncompressed data to different pages and expose only the uncompressed data. I will tackle these issues in the next version.

Using volatile ranges

Now that we have a kernel which can do volatile ranges, the next question is how to actually use this feature.

WARNING: THE INTERFACE HERE IS STILL BEING DESIGNED, SO THINGS MIGHT CHANGE IN THE FUTURE

The patchset provides a system call interface. So one could probably make vrange.h

#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define SYS_vrange 314
#define VRANGE_VOLATILE 0
#define VRANGE_NOVOLATILE 1

static int vrange(unsigned long start, size_t length, int mode, int *purged)
{
return syscall(SYS_vrange, start, length, mode, purged);
}

int mvolatile(void *addr, size_t length)
{
return vrange((long) addr, length, VRANGE_VOLATILE, 0);
}

int mnovolatile(void *addr, size_t length, int *purged)
{
return vrange((long)addr, length, VRANGE_NOVOLATILE, purged);
}

Now that we have some code which can be used, let’s dissect this code fragment. Since we don’t have a vrange syscall in glibc yet, we have to make a vrange function which will call the syscall in a vrange enabled kernel. This system call takes 4 arguments. The first argument is the address where the volatile range starts. The next argument is the length that should be marked. The third argument is the key. It lets the kernel know whether to mark that range as VOLATILE or NOVOLATILE. The final argument is relevant only when we try to mark a range as NOVOLATILE. It lets the user know if a range had been purged while it was marked volatile.

When a range is marked a volatile, it enters what I like calling the schrodinger state. A user doesn’t know whether it is there or not, till it either marks it as non volatile or tries to access it. This leads to two ways of using this feature. One as we discussed the last time was to mark it as non volatile before use, and fixing up data if it had been purged. The second approach, which was suggested by Taras talks about sending a signal when accessing data which has been purged. Within the signal handler then one would mark it as non volatile, fix it up and let the application continue.

In order to get an idea about how to use these methods, take a look at the test cases in the git tree.

Volatile Ranges

John Stultz and Minchan Kim have been working on the volatile ranges feature in the Linux Kernel for sometime. One of my first tasks is to help them, so that the feature goes into the kernel.

This feature is very useful, especially in memory constrained devices. But this feature is also very useful on desktops as well. One use case that I think is really cool is that of Firefox image cache. This is how Taras explained it to me. Imagine you have a lot of tabs open, with sports illustrated open in each one of them. Everytime we switch tabs, all the decompressed + decoded images in the tab being switched from are dropped, while we decode and draw the images in the new tab. If we don’t do that memory use will just skyrocket.

With the use of volatile ranges, Firefox could consume less memory with the ability to safely use more memory when available. The kernel knows that it can get rid of the image cache if there is memory pressure, and in any other case it will let Firefox retain the caches and that should translate to faster tab switching.

The code might now look something like this,

switch_tab(prev, next) {
clean_up(prev);
setup(next);
}

clean_up(tab) {
vrange(tab_image_data, size, VOLATILE, NULL);
}

setup(tab) {
int ret;
vrange(tab_image_data, size, NON_VOLATILE, &ret);
if (ret == SUCCESS)
return;
else
decode_image_data(tab);
return
}

But now we have the additional overhead of a system call. Not everyone is happy with that, which brings us to another possible way of using volatile ranges. When a page is in schrodinger’s state (it might or it might not exist), if you try to access it, and it doesn’t exist, instead of faulting, instead you get a SIGBUS. Conceivably one could have a signal handler which will mark stuff as non-volatile, fix it up and then let the world continue on. This is a tricky feature to use, but it is very powerful. However that is a topic for another blog post

How do I help out?

In case you wish to help, here are the steps to follow

1. Build the kernel. The version that I am using at time of writing is at available here

2. Get the test cases. I have a git repository here

These test cases assume you have memcg mounted at /sys/fs/cgroup/memory/

Once you have the vranges kernel installed and booted, and the test cases built, run setup.sh which at some point in time will ask for sudo from you, and run all the test cases.

Let me know how it went.