May 8, 2021
The use-case might be a little specific but I’ll describe it nonetheless, hoping it may help someone. A while ago I read Practical Forensic Imaging by Bruce Nikkel and tried out the
sfsimage script that he presented therein.
The idea is as simple as it is brilliant: combine open-source disk recovery tools like
dc3dd with the great compression and usability of readonly
squashfs filesystems. The imaged disk is piped directly into a virtual file inside the archive and some metadata of the acquisition is added alongside it to create a “forensic image”. This image can be mounted and inspected without elevated privileges using
squashfuse – all the while usually only taking a fraction of the original disk size.
Yesterday I migrated one of my small virtual servers at Hetzner to a new one – no big deal in itself. But before I finally deleted the server, I wanted to have a complete “backup image” – just in case I forgot to migrate something. Sure, I could just archive the entire filesystem with
zip. But a ZIP can’t be sent back over SSH on-the-fly and a TAR is not really browseable without first decompressing the entire thing – meh. Furthermore, neither would allow me to restore the entire machine simply by writing out the disk image.
You could of course always use a “reverse” SSHFS mount and just not bother with piping your archives through standard output. Courtesy of boltblog:
dpipe /usr/lib/ssh/sftp-server = ssh vps.example.com sshfs \ :/path/on/local/machine /tmp/mnt -o slave
Either way, there’s so many different ways you could generate some sort of useful data on standard output. So the obvious solution was to just image the entire disk with
dd and mount it in a loopback device if I should need it later. Luckily Hetzner allows you to reboot into a rescue environment, where your disk is completely quiescent.
Speaking in pseudofiles #
I took a look at
sfsimage to see how the tool writes disk image files and it turns out that
mksquashfs has a concept of pseudofiles to create block and character devices in the archive without elevated privileges. It also allows you to use the output of a command as a file’s contents. In order to image a disk you would use
dc3dd and omit the
of= argument so it writes to standard output. If you simply used
cat, the file would contain whatever you pipe into the
mksquashfs command. A simple example that writes the current date to a file called
now inside the archive might look like this:
date | mksquashfs /dev/null archive.sqsh -p "now f 644 0 0 cat"
/dev/nullthere? This will end up writing a character special file called
nullin the archive. To avoid that, you can use an empty directory; this is exactly what
On-the-fly checksumming #
At this point I wanted to add a signed checksum of the raw disk image, so that I could theoretically verify that the image has not been altered later. I don’t really have any particular reason for that; it’s more a case of “just because I can”.
Preferably, you would do that without re-reading the entire disk image a second time, which would effectively double the time required. You cannot simply use a second pseudofile either, because the first command “gobbles” up the entire input until EOF. The solution is to split the input stream with
tee and use a process substitution, which will write the checksum to a temporary file. That file can then be signed and quietly appended to the squashfs archive. (You could probably split into additional file descriptors and avoid the temporary file but I haven’t tried that yet.)
sfsimage again, there is already an example of using “plain old
md5sum all in a single pipe. However, at this point the somewhat inflexible configuration and usage of
sudo thorughout the script (it assumes that you want to image a block device after all …) was annoying me and I wanted it simpler. So without much further ado, this is the simplified core in just a couple of lines:
ssh email@example.com dd if=/dev/sda |\ tee >(sha256sum | sed 's/-/disk.img/' >/tmp/disk.img.sha256) |\ mksquashfs /tmp/emptydir archive.sqsh -comp zstd -p "disk.img f 644 0 0 cat" \ && mksquashfs /tmp/disk.img.sha256 archive.sqsh -quiet
It does add some overhead but I believe that is mostly due to
mksquashfs contending for CPU cycles. Depending on your bandwidth you might also want to compress the stream remotely and decompress it again locally before archival; using the
-C option of
ssh is utterly useless though because it uses the same algorithm as
gzip – which actually made the transfer slower in my tests. Instead use
ssh firstname.lastname@example.org "dd if=... | zstd" | zstd -d |\ ...
Signing the checksum #
At this point it is trivial to just sign the checksum file however you like before you append the
disk.img.sig to the archive. I like
signify as a lightweight alternative to GPG – but you do you. Signify can’t directly sign the disk image because it limits the length of the message that you can sign.
Wrap it up #
In the end I wrote myself a small replacement script for
sfsimage that did just this one thing: checksum and write whatever it receives on standard input into a new squashfs archive. Hence, I’ve called the result
squashpipe and saved it to my dotfiles:
... | squashpipe archive.sqsh
The script handles
-n to specify the filename inside the archive,
-m to adjust the file mode and
-s to optionally sign the checksum with a
signify key as described above:
signify -Gn -p ~/key.pub -s ~/key.sec ... | squashpipe -n data -s ~/key.sec archive.sqsh
The checksum contains a BSD-style tag that
signify can verify directly. Mount the archive and verify the signature and checksum easily with:
mkdir mnt/ && squashfuse archive.sqsh mnt/ && mnt/ signify -Cx data.sig -p ~/key.pub