I'm modifying a number of initramfs archives from different Linux distros; normally only one file changes in each.

I would like to automate the process without switching to the root user to extract all files from the initramfs image and pack them again.

First I tried to generate a list of files for gen_init_cpio without extracting the whole initramfs archive, i.e. by parsing the output of cpio -tvn < initrd.img (similar to ls -l output) with a script that converts all permissions to octal and rearranges the output into the format gen_init_cpio expects, like:

dir /dev 755 0 0
nod /dev/console 644 0 0 c 5 1
slink /bin/sh busybox 777 0 0
file /bin/busybox initramfs/busybox 755 0 0
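The permission conversion mentioned above is the fiddly part of such a script. A minimal sketch, assuming the listing uses ls -l style mode strings such as drwxr-xr-x; the helper name mode_to_octal is hypothetical, and parsing the remaining columns (owner, device numbers, symlink targets) is not covered:

```shell
# Hypothetical helper: convert an ls -l style mode string to octal.
# Handles the rwx bits plus setuid/setgid/sticky (s, S, t, T).
# The body runs in a subshell so its variables do not leak.
mode_to_octal() (
    perms=${1#?}              # drop the leading file-type character
    special=0
    digits=""
    weight=4                  # 4 = setuid, 2 = setgid, 1 = sticky
    for i in 1 2 3; do
        rest=${perms#???}
        chunk=${perms%"$rest"}    # first three characters
        value=0
        case $chunk in r??) value=$((value + 4)) ;; esac
        case $chunk in ?w?) value=$((value + 2)) ;; esac
        case $chunk in
            ??x)    value=$((value + 1)) ;;
            ??[st]) value=$((value + 1)); special=$((special + weight)) ;;
            ??[ST]) special=$((special + weight)) ;;
        esac
        digits=$digits$value
        weight=$((weight / 2))
        perms=$rest
    done
    if [ "$special" -gt 0 ]; then digits=$special$digits; fi
    printf '%s\n' "$digits"
)

# Example: build a gen_init_cpio "dir" line for a root-owned directory.
printf 'dir /dev %s 0 0\n' "$(mode_to_octal drwxr-xr-x)"
```

The last line prints dir /dev 755 0 0, matching the first entry of the list above.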

This involves some replacements, and the script may be hard for me to write, so I've found a better way and I'm asking how safe and portable it is:

In some distros the initramfs file consists of concatenated parts, and apparently the kernel parses the whole file, extracting every part even when the parts are packed on a 1-byte boundary, so there is no need to pad each part to a multiple of 512 bytes. I thought this 'feature' could be useful for avoiding recreating the archive when modifying files inside it. Indeed it works, at least for Debian and CloneZilla.

For example, if we have modified the /init file from the initrd.gz of Debian 8.2.0, we can append it to the initrd.gz image with:

$ echo ./init | cpio -H newc -o | gzip >> initrd.gz

so initrd.gz has two concatenated archives, the original and its modifications. Let's see the result of binwalk:

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             gzip compressed data, maximum compression, has original file name: "initrd", from Unix, last modified: Tue Sep  1 09:33:08 2015
6299939       0x602123        gzip compressed data, from Unix, last modified: Tue Nov 17 16:06:13 2015
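A userspace sanity check of the same idea, as a small sketch: gzip itself treats concatenated members as a single stream, so decompressing the appended file yields the contents of both parts back to back. The file name demo.gz is a placeholder, and this only shows that the file is well-formed gzip; it says nothing about how the kernel's own decompressors handle member boundaries.

```shell
# Build a two-member gzip file the same way the append above does.
tmp=$(mktemp -d)
printf 'first\n'  | gzip >  "$tmp/demo.gz"
printf 'second\n' | gzip >> "$tmp/demo.gz"

# gzip -t checks the integrity of every member in the file.
gzip -t "$tmp/demo.gz" && echo "all members OK"

# Decompressing emits both payloads, one after the other.
gzip -dc "$tmp/demo.gz"
```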

It works perfectly. But is it reliable? What restrictions apply when appending data to initramfs files? Is it safe to append without padding the original archive to a multiple of 512 bytes? Since which kernel version has this feature been supported?

3 Answers

12

It's very reliable and supported by all kernel versions that support initrd, AFAIK. It's a feature of the cpio archives that initramfs images are made up of. cpio just keeps on extracting its input... we might know the file is two cpio archives one after the other, but cpio just sees it as a single input stream.

Debian advises use of exactly this method (appending another cpio to the initramfs) to add binary-blob firmware to their installer initramfs. For example:

DebianInstaller / NetbootFirmware | Debian Wiki

Initramfs is essentially a concatenation of gzipped cpio archives which are extracted into a ramdisk and used as an early userspace by the Linux kernel. Debian Installer's initrd.gz is in fact a single gzipped cpio archive containing all the files the installer needs at boot time. By simply appending another gzipped cpio archive - containing the firmware files we are missing - we get the show on the road!

4
  • Thank you @cas! I think the initramfs extraction code in the kernel is cleverer than that, because it works with more than plain cpio archives. We can have gzipped and xz-compressed cpio archives combined with an uncompressed cpio archive, all inside the same initrd file. For example, TAILS includes the GenuineIntel.bin blob in a cpio archive at the beginning, followed by another cpio archive compressed with xz. The Linux kernel seems to parse the whole file, recognizing everything that has been concatenated. When a stream ends, it starts format detection again for the next stream! Commented Nov 18, 2015 at 5:25
  • I read the Debian wiki page on netboot firmware that you posted, but I don't understand the rationale behind shipping an initrd concatenated this way "from the factory". Why are the Intel blobs packed separately from (before) the compressed initramfs and not inside the archive containing the full initramfs tree? Must this blob be the first file to be expanded? In that case (TAILS), they could arrange the file list for cpio so that this file comes before the rest... I can't find a Debian initrd file that uses this feature of concatenating two or more gzipped cpio archives as their wiki states. Thanks Commented Nov 18, 2015 at 5:44
  • Debian doesn't do this themselves. As you say, they don't need to. This page contains instructions for users who want to add non-free firmware (e.g. for their NICs) to the initrd. Commented Nov 18, 2015 at 5:56
  • OK, good! To customize; exactly what I'm doing... TAILS and another distro which I don't remember do that. Maybe to obfuscate... Commented Nov 18, 2015 at 6:05
3

No, the existing initramfs needs to be padded in order to reliably continue parsing arbitrary archives appended to compressed archives.

Especially xz and "legacy frame format" lz4 compression are tricky and will fail in 3 out of 4 cases: specifically, whenever the byte count of the preceding archive is not divisible by 4. This generally goes unnoticed, as it is of no concern when placing a raw format=newc cpio archive in front of a single compressed archive, because the uncompressed form is always aligned.
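As a worked check of the divisible-by-4 condition, here is a small sketch that takes the offset 6299939 from the binwalk output in the question as the size of the first archive. The helper name pad_needed is hypothetical.

```shell
# Hypothetical helper: zero bytes needed to round a size up to a multiple of 4.
pad_needed() { echo $(( (4 - $1 % 4) % 4 )); }

# 6299939 % 4 = 3, so the first archive ends 1 byte short of a 4-byte boundary.
pad_needed 6299939    # prints 1
```

So the example from the question falls into the "3 out of 4" unaligned cases.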

While in theory the initramfs format is not specified beyond being a simple concatenation of (optionally compressed) archives, the padding can still be needed when the decompression routine (in some cases, by design) cannot tell where one archive ends and the next one starts. Some edge cases were improved in Linux release 5.14, others appear difficult if not impossible to unambiguously detect in the kernel. If additional data follows compressed archives, messages such as these indicate it was disregarded:

Initramfs unpacking failed: Decoding failed
Initramfs unpacking failed: invalid magic at start of compressed archive
Initramfs unpacking failed: broken padding

If compression was only applied to the last archive, these messages are harmless - nothing further was parsed, but there was nothing left to be parsed anyway.

1
  • I second this post: something is problematic with naive initrd.gz appends. More clearly, it does not work. What I did as a workaround: 1) I uncompressed the initrd.gz, 2) I appended to the cpio file with cpio -H newc -o -O initrd --append, 3) I re-compressed it with gzip. This solution works reliably, but it cannot delete, and possibly cannot modify, files in the original gzip. Commented Nov 13, 2023 at 22:43
1

As @anx mentioned, the initrd members need to be padded to a four-byte boundary. As a result, to properly pad an initrd, the minimum sequence of commands is:

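# Requires bash (process substitution) and GNU stat.
size(){ stat --printf %s "$1"; }                         # file size in bytes
pad(){ dd if=/dev/zero bs=1 count="$1" 2>/dev/null; }     # emit $1 zero bytes, silence dd statistics

oldsize=$(size old.initrd)
to_pad=$(((oldsize+3) / 4 * 4 - oldsize))                 # bytes needed to reach the next multiple of 4
cat old.initrd <(pad "$to_pad") new_content.cpio > new.initrd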

(new_content.cpio can be other formats as well.)

That said, the "legacy frame format" lz4 relies on the EOF to tell where it ends. The linked commit from Linux 5.14 allows it to also be detected by the presence of at least 4 bytes of zero padding, so it is possibly more appropriate to err in the direction of more padding (allowed by the initramfs format document):

to_pad=$(((oldsize+3) / 4 * 4 - oldsize + 4))

Because the kernel still expects the legacy lz4 format in decompress_unlz4.c, you still need to keep this in mind.
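Both padding variants can be wrapped in one reusable function. This is only a sketch: the name append_padded and its argument order are made up for illustration, and it uses wc -c instead of GNU stat so that it also runs under plain sh.

```shell
# Hypothetical helper: append_padded OLD NEW OUT [EXTRA]
# Writes OLD, zero padding up to the next 4-byte boundary plus EXTRA
# additional zero bytes, then NEW, into OUT. Runs in a subshell.
append_padded() (
    old=$1 new=$2 out=$3 extra=${4:-0}
    oldsize=$(wc -c < "$old")
    to_pad=$(( (4 - oldsize % 4) % 4 + extra ))
    {
        cat "$old"
        dd if=/dev/zero bs=1 count="$to_pad" 2>/dev/null
        cat "$new"
    } > "$out"
)

# Usage, with the file names from the commands above:
#   append_padded old.initrd new_content.cpio new.initrd      # align only
#   append_padded old.initrd new_content.cpio new.initrd 4    # align, then 4 more zero bytes
```

With EXTRA set to 4 the padding equals the second to_pad formula above, i.e. the variant intended for legacy lz4.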


I should point out that the answer by anx does not clearly differentiate two things: the compressed size being divisible by 4 and the possibility of detecting the end of a compressed stream.

  • xz can produce compressed sizes not divisible by 4 but allows for detecting the end without EOF. The same applies to lz4 new frame format and zstd -- really, almost every modern format.
  • lz4 legacy frame format can produce compressed sizes not divisible by 4 and cannot detect the end without EOF. The Linux 5.14 patch tacks on the possibility of detecting the end without EOF using a sequence of 4 zero bytes parsed as B.CSize in the legacy format.
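To find out which of these formats a given member of an existing image uses, one can look at the magic bytes at its start. The sketch below is a userspace approximation only: the function name detect_format is hypothetical, it covers just the formats named in this thread, and the kernel's own detection may differ in detail.

```shell
# Hypothetical helper: detect_format FILE [OFFSET]
# Prints a guess at the format that starts at byte OFFSET (default 0),
# based on well-known magic numbers. Runs in a subshell.
detect_format() (
    file=$1 offset=${2:-0}
    magic=$(tail -c "+$((offset + 1))" "$file" | head -c 6 | od -An -tx1 | tr -d ' \n')
    case $magic in
        1f8b*)        echo gzip ;;
        fd377a585a00) echo xz ;;
        02214c18*)    echo lz4-legacy ;;
        28b52ffd*)    echo zstd ;;
        303730373031) echo cpio-newc ;;   # ASCII "070701"
        *)            echo unknown ;;
    esac
)

# Usage, with the second offset from the binwalk output in the question:
#   detect_format initrd.gz 6299939
```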
