On Tue, Sep 28, 2021 at 11:22 AM Peter Robinson pbrobinson@gmail.com wrote:
On Mon, Sep 20, 2021 at 9:24 PM stan via test test@lists.fedoraproject.org wrote:
On Mon, 20 Sep 2021 08:39:38 -0600 Chris Murphy lists@colorremedies.com wrote:
Could you file a bug and include: e2fsprogs version; kernel version for initial use (installation); kernel version the problem appears with, if different; and complete dmesg.
Once filed, post the URL here. Thanks
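For anyone gathering the same details, here is a minimal sketch of how they could be collected in one go. It assumes a Fedora system where `rpm` and `dmesg` are on the PATH; the helper names `run` and `collect_report` are made up for illustration. Note that `platform.release()` only reports the currently running kernel, so the installation-time kernel still has to be looked up separately, and `dmesg` may need root on systems that restrict the kernel log.

```python
import platform
import shutil
import subprocess


def run(cmd):
    """Return the command's output, or a short note if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return f"({cmd[0]} not found)"
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"({cmd[0]} failed: {exc})"
    # Fall back to stderr so a permission error from dmesg is still visible.
    return (proc.stdout or proc.stderr).strip()


def collect_report():
    """Gather the items requested for the bug report (hypothetical helper)."""
    return {
        "kernel": platform.release(),                  # running kernel, like `uname -r`
        "e2fsprogs": run(["rpm", "-q", "e2fsprogs"]),  # installed package version
        "dmesg": run(["dmesg"]),                       # kernel log for the current boot
    }
```

Calling `collect_report()` returns a dict whose values can be pasted into the bug. It only covers the current boot, so it should be run during the boot in which the problem shows up.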
I installed from a F35 netinstall image, and just continued with Rawhide after branching. Because I think the disk is failing, I am reluctant to mess around with it. Since I won't be able to get the data you want, I cloned (rsync) the problem OS install into a partition on another disk. It has been running for several hours with no problems. I've run updates without a problem. I'll keep monitoring, but I think this points strongly to the disk as the issue.
I was surprised that I was able to read from the problem disk, but it performed flawlessly. It seems to indicate that writes are problematic, but not reads. When I get a replacement, and have it working, I'll try booting the problem OS again.
No bugzilla because so far it seems to be a false alarm.
For reference, my primary laptop's SSD has died. I hadn't seen any issues with it, but this morning the ext4 FS was RO, and on reboot it couldn't find any OS. Not sure if it's the NVMe or related to this; it was F35 upgraded from F34 and hadn't exhibited any issues prior. I've not had time to investigate.
Offhand I'm not seeing any suspicious issues reported upstream on linux-ext4@. Does anyone have a complete dmesg for the boot during which the fs went read only?
SSDs manifest a wide range of failure modes. It's fairly common for the early failure warning to be transient corruption, either zeros or garbage, when reading a sector. And then eventually the whole drive will either report zeros or garbage, or just vanish from the bus. Whether Btrfs or ext4, you're going to experience grief. But on Btrfs it'll tend to complain much earlier, just because it's also checksumming data, and data is a much larger target for corruption than the file system itself.
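To illustrate why data checksumming surfaces this kind of corruption early, here is a toy sketch. It is not how Btrfs is implemented: the `ChecksummedStore` class and `flip_bit` helper are hypothetical, and `zlib.crc32` stands in for the crc32c that Btrfs uses by default. The idea is only that a checksum recorded at write time is verified on every read, so a damaged data block raises an error instead of being handed back silently.

```python
import zlib

BLOCK_SIZE = 4096


def checksum(block: bytes) -> int:
    # Stand-in checksum; Btrfs defaults to crc32c, which zlib does not provide.
    return zlib.crc32(block)


class ChecksummedStore:
    """Toy block store that records a checksum for every data block."""

    def __init__(self):
        self.blocks = {}  # block number -> contents as the "drive" returns them
        self.sums = {}    # block number -> checksum recorded at write time

    def write(self, number: int, block: bytes) -> None:
        self.blocks[number] = block
        self.sums[number] = checksum(block)

    def read(self, number: int) -> bytes:
        block = self.blocks[number]
        if checksum(block) != self.sums[number]:
            # Complain instead of returning corrupted data.
            raise IOError(f"checksum mismatch on block {number}")
        return block


def flip_bit(block: bytes, bit: int) -> bytes:
    """Simulate a drive returning a block with one bit damaged."""
    damaged = bytearray(block)
    damaged[bit // 8] ^= 1 << (bit % 8)
    return bytes(damaged)
```

Writing a block, replacing `store.blocks[0]` with `flip_bit(...)` of itself, and then calling `store.read(0)` raises the mismatch error. A store that only checksums its own metadata would return the damaged data without complaint.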