Detect I/O failures on Tails partition(s) (!1427)

hefee requested to merge 5856-detect-io-failures into stable Feb 27, 2024

Example logs of errors caused by hardware problems when trying to create a Persistent Storage:

kernel: sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=31s
kernel: sd 6:0:0:0: [sdb] tag#0 Sense Key : Unit Attention [current]
kernel: sd 6:0:0:0: [sdb] tag#0 Add. Sense: Not ready to ready change, medium may have changed
kernel: sd 6:0:0:0: [sdb] tag#0 CDB: Write(10) 2a 00 04 cc 80 00 00 00 40 00
kernel: I/O error, dev sdb, sector 80510976 op 0x1:(WRITE) flags 0x800 phys_seg 8 prio class 2
kernel: Buffer I/O error on dev dm-0, logical block 7962624, lost async page write

(from !1427 (comment 228649))
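
The lines that name a device and a location are the ones a detector can key on. Below is a minimal sketch of pulling the device, sector and operation out of the two "I/O error" lines above. It assumes the message wording shown in these logs, which may differ between kernel versions. The regular expressions and the parse_io_error function are illustrative and are not taken from the MR.

```python
import re
from typing import Optional

# Illustrative patterns modelled on the kernel messages quoted above.
BLOCK_IO_ERROR = re.compile(
    r"I/O error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+) "
    r"op 0x[0-9a-f]+:\((?P<op>[A-Z_]+)\)"
)
BUFFER_IO_ERROR = re.compile(
    r"Buffer I/O error on dev (?P<dev>[^,\s]+), logical block (?P<block>\d+)"
)


def parse_io_error(line: str) -> Optional[dict]:
    """Return the device and location named in a kernel I/O error line, or None."""
    m = BLOCK_IO_ERROR.search(line)
    if m:
        return {"dev": m["dev"], "sector": int(m["sector"]), "op": m["op"]}
    m = BUFFER_IO_ERROR.search(line)
    if m:
        return {"dev": m["dev"], "block": int(m["block"])}
    # SCSI sense lines, for example, carry no device/sector pair in this form.
    return None
```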

Feb 21 12:03:13 amnesia kernel: JBD2: I/O error when updating journal superblock for dm-0-8.
Feb 21 12:03:13 amnesia kernel: EXT4-fs error (device dm-0): ext4_journal_check_start:83: comm python3: Detected aborted journal
Feb 21 12:03:13 amnesia kernel: sd 10:0:0:0: [sdc] tag#0 device offline or changed
Feb 21 12:03:13 amnesia kernel: I/O error, dev sdc, sector 16809984 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 2
Feb 21 12:03:13 amnesia kernel: Buffer I/O error on dev dm-0, logical block 0, lost sync page write
Feb 21 12:03:13 amnesia kernel: EXT4-fs (dm-0): I/O error while writing superblock
Feb 21 12:03:13 amnesia kernel: EXT4-fs (dm-0): Remounting filesystem read-only

(from #5856 (comment 228671))
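
In this log the same failure appears under both the physical disk (sdc) and the device-mapper device on top of it (dm-0). Below is a rough sketch of collecting every device that a batch of kernel lines mentions and intersecting it with the set of Tails devices. The patterns, the function names and the example device set are assumptions made for illustration. The JBD2 line's dm-0-8 form is deliberately not matched by these patterns.

```python
import re
from typing import Iterable, Set

# Illustrative patterns for the device-naming forms seen in the log above.
DEVICE_PATTERNS = [
    re.compile(r"\bdev ([^,\s]+)"),        # "I/O error, dev sdc," / "on dev dm-0,"
    re.compile(r"\(device ([^)\s]+)\)"),   # "EXT4-fs error (device dm-0)"
    re.compile(r"EXT4-fs \(([^)\s]+)\)"),  # "EXT4-fs (dm-0): ..."
    re.compile(r"\[([sv]d[a-z]+)\]"),      # "sd 10:0:0:0: [sdc] ..."
]


def devices_mentioned(lines: Iterable[str]) -> Set[str]:
    """Collect every device name the given kernel lines refer to."""
    found = set()
    for line in lines:
        for pattern in DEVICE_PATTERNS:
            found.update(pattern.findall(line))
    return found


def affected_tails_devices(lines: Iterable[str], tails_devices: Iterable[str]) -> Set[str]:
    """Devices that appear in the log AND are in the (hypothetical) Tails set."""
    return devices_mentioned(lines) & set(tails_devices)
```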

kernel: critical target error, dev sdc, sector 21266432 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 2
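
The kernel block layer reports the sector in these messages in 512-byte units, counted from the start of the whole disk. That value can therefore be turned into a byte offset and compared against partition boundaries to see which partition was being written. Below is a sketch that assumes the partition start and size values exposed in /sys/class/block/<partition>/start and .../size, which are also in 512-byte sectors. The helper names and the partition layout used to exercise it are made up.

```python
from typing import Dict, Optional, Tuple

SECTOR_SIZE = 512  # the block layer counts sectors in 512-byte units


def sector_to_offset(sector: int) -> int:
    """Byte offset on the disk of a sector number from a kernel error line."""
    return sector * SECTOR_SIZE


def partition_for_sector(sector: int, partitions: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the partition containing sector, or None if it is outside all of them.

    partitions maps a partition name to (start, size), both in 512-byte
    sectors, e.g. as read from /sys/class/block/<name>/start and .../size.
    """
    for name, (start, size) in partitions.items():
        if start <= sector < start + size:
            return name
    return None
```

For the line above, sector 21266432 is byte offset 10888413184, i.e. 10.140625 GiB into sdc.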

Run the following Python snippet as root to inject a fake I/O error into the journal (use only one of the three journal.send() calls). The comment above each call gives the expected behavior:

from systemd import journal
# Does NOT trigger _update_patterns; creates /var/lib/live/tails.disk.ioerrors
journal.send("SQUASHFS error: A fake error.", SYSLOG_IDENTIFIER="kernel", PRIORITY=journal.LOG_ERR)
# Triggers _update_patterns; does NOT create /var/lib/live/tails.disk.ioerrors
journal.send("A fake I/O error.", SYSLOG_IDENTIFIER="kernel", PRIORITY=journal.LOG_ERR)
# Triggers _update_patterns; creates /var/lib/live/tails.disk.ioerrors
journal.send("EXT4-fs error (device dm-0)", SYSLOG_IDENTIFIER="kernel", PRIORITY=journal.LOG_ERR)
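
Below is a toy model that reproduces the three expected outcomes listed in the snippet's comments. It makes explicit the split between patterns that always apply and device-specific patterns that get refreshed. This is a hypothetical reconstruction, not the MR's code: the pattern lists, the IOErrorDetector class and the list_tails_devices callback are assumptions. Only the name _update_patterns and the expected outcomes come from the snippet. A True result stands for "create /var/lib/live/tails.disk.ioerrors".

```python
import re
from typing import Callable, List

# HYPOTHETICAL toy model, not the MR's implementation.

# Messages assumed to always concern the Tails boot medium (its SquashFS
# root), so they are flagged without any device lookup.
STATIC_PATTERNS = [re.compile(r"SQUASHFS error")]

# Generic keywords assumed to prompt a refresh of the device-specific patterns.
TRIGGER_PATTERN = re.compile(r"I/O error|EXT4-fs error")


class IOErrorDetector:
    def __init__(self, list_tails_devices: Callable[[], List[str]]):
        # Hypothetical callback returning the current Tails device names
        # (boot disk, its partitions, and the unlocked tps as dm-N).
        self._list_tails_devices = list_tails_devices
        self._device_patterns = []
        self.updates = 0  # counts _update_patterns calls, for testing

    def _update_patterns(self) -> None:
        """Rebuild one whole-word pattern per current Tails device name."""
        self.updates += 1
        self._device_patterns = [
            re.compile(rf"\b{re.escape(dev)}\b")
            for dev in self._list_tails_devices()
        ]

    def is_tails_io_error(self, message: str) -> bool:
        """True if the message should lead to creating the ioerrors flag file."""
        if any(p.search(message) for p in STATIC_PATTERNS):
            return True
        if TRIGGER_PATTERN.search(message):
            self._update_patterns()
            return any(p.search(message) for p in self._device_patterns)
        return False
```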

Manual tests

Start the ISO image in a VM and trigger fake I/O errors.
Check that boot_device is correct.
Start the USB image in a VM and trigger fake I/O errors.
Check that boot_device is correct.
Start the USB image with tps (Persistent Storage) activated in a VM and trigger fake I/O errors.
Check that boot_device is correct.
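
One way to cross-check boot_device by hand is to see which block device backs the live medium mount. The sketch below rests on two assumptions: the boot medium is mounted at /run/live/medium (live-boot's usual mount point), and the partition suffix can be stripped by name alone. mounted_device and parent_disk are hypothetical helpers that take the contents of /proc/mounts.

```python
import re
from typing import Optional

# Assumption: live-boot mounts the boot medium here.
LIVE_MEDIUM = "/run/live/medium"


def mounted_device(mounts_text: str, mount_point: str = LIVE_MEDIUM) -> Optional[str]:
    """Return the source device of mount_point, given /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return fields[0]
    return None


def parent_disk(device: str) -> str:
    """Strip a partition suffix by name: /dev/sdb1 -> sdb, /dev/nvme0n1p2 -> nvme0n1.

    Names this heuristic does not recognise (e.g. sr0 for an optical
    drive) are returned unchanged.
    """
    name = device.rsplit("/", 1)[-1]
    m = re.fullmatch(r"([sv]d[a-z]+)\d*|(nvme\d+n\d+|mmcblk\d+)(?:p\d+)?", name)
    if m:
        return m.group(1) or m.group(2)
    return name
```

On a running system this would be called as mounted_device(open("/proc/mounts").read()). When the ISO is booted in a VM, the source is typically an optical drive such as /dev/sr0, which parent_disk leaves as-is.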

Skipped tests, as we don't have any faulty hardware

Test with real faulty USB sticks.
Check that boot_device is correct.
Test with real faulty ISOs.
Check that boot_device is correct.

Tests that would be nice, but are out of scope

The approach boyska mentioned in !1427 (comment 229941) would give us a way to test this, but I think it is more a case for #15451.

Start from a USB stick without tps; an I/O error happens while creating the tps.
Check whether we detect it (unclear whether UDisks already knows the device name).
Boot with a tps created earlier; an I/O error happens while mounting the tps.
Check whether the error is detected (unclear whether UDisks already knows the device name).

Closes #5856

Edited Mar 27, 2024 by hefee
