Need help with btrfs.


George R Goffe

21 Feb 2021

9:15 p.m.

Hi,

I have converted a large ext4 filesystem to btrfs and by accident deleted an important user's home directory. The btrfs documentation seems to imply that the directory can be recovered without reverting the filesystem to ext4.

What I've read seems to be telling me to mount the subvolume ext2_saved and then mount the image file residing in that directory.

I am VERY new to btrfs and am seeing some troubling output (see below).

Here's what I've done and what I'm seeing. Could someone help me with this please?

Best regards,

George...

fc35-bash 5.1 ~# df /export/home
/dev/sda8 1735405656 1722305136 9896620 100% /export/home
fc35-bash 5.1 ~# btrfs subvolume list /export/home/
ID 256 gen 27196 top level 5 path ext2_saved
fc35-bash 5.1 ~# mount -t btrfs -o subvol=ext2_saved /dev/sda8 /ext2_saved
fc35-bash 5.1 ~# ls -al /ext2_saved
total 1699142444
drwxr-xr-x 1 root root 10 Feb 16 06:07 .
dr-xr-xr-x. 81 root root 12288 Feb 21 12:51 ..
-r-------- 1 root root 1777055391744 Feb 16 06:07 image
fc35-bash 5.1 ~# mount -t ext4 -o loop,ro /ext2_saved/image /ext4
mount: /ext4: can't read superblock on /dev/loop0.
fc35-bash 5.1 ~# mount -t ext4 -o loop,ro /export/home/ext2_saved/image /ext4
mount: /ext4: can't read superblock on /dev/loop0.


Samuel Sieb

21 Feb

9:21 p.m.

On 2/21/21 1:15 PM, George R Goffe via test wrote:

...

I have converted a large ext4 filesystem to btrfs and by accident deleted an important user's home directory. The btrfs documentation seems to imply that the directory can be recovered without reverting the filesystem to ext4.

First, you should never try converting important data without a backup...

...

What I've read seems to be telling me to mount the subvolume ext2_saved and then mount the image file residing in that directory.

I am VERY new to btrfs and am seeing some troubling output (see below).

Here's what I've done and what I'm seeing. Could someone help me with this please?

ls -al /ext2_saved
total 1699142444
drwxr-xr-x 1 root root 10 Feb 16 06:07 .
dr-xr-xr-x. 81 root root 12288 Feb 21 12:51 ..
-r-------- 1 root root 1777055391744 Feb 16 06:07 image
fc35-bash 5.1 ~# mount -t ext4 -o loop,ro /ext2_saved/image /ext4
mount: /ext4: can't read superblock on /dev/loop0.

I don't know much about btrfs, but what does "file /ext2_saved/image" say?

Chris Murphy

10:56 p.m.

On Sun, Feb 21, 2021 at 2:21 PM Samuel Sieb samuel@sieb.net wrote:

...

...
fc35-bash 5.1 ~# mount -t ext4 -o loop,ro /ext2_saved/image /ext4
mount: /ext4: can't read superblock on /dev/loop0.

I don't know much about btrfs, but what does "file /ext2_saved/image" say?

Good idea.

# file /mnt/btrfs/ext2_saved/image
/mnt/btrfs/ext2_saved/image: Linux rev 1.0 ext4 filesystem data, UUID=09955941-c894-4a65-b262-7418d71b5b7a (extents) (64bit) (large files) (huge files)

And with it on /dev/loop0:

# blkid /dev/loop0
/dev/loop0: UUID="09955941-c894-4a65-b262-7418d71b5b7a" BLOCK_SIZE="4096" TYPE="ext4"

-- Chris Murphy

Chris Murphy

9:26 p.m.

On Sun, Feb 21, 2021 at 2:16 PM George R Goffe via test test@lists.fedoraproject.org wrote:

...

Hi,

I have converted a large ext4 filesystem to btrfs and by accident deleted an important user's home directory. The btrfs documentation seems to imply that the directory can be recovered without reverting the filesystem to ext4.

Let's establish whether you deleted the file before or after conversion. Exactly when it was deleted makes a big difference to what the recovery strategy is.

Once I know that I can answer more.

-- Chris Murphy

George R Goffe

10:51 p.m.

Chris,

Thanks for responding.

I'm pretty sure it was after conversion.

George...


Chris Murphy

9:34 p.m.

On Sun, Feb 21, 2021 at 2:16 PM George R Goffe via test test@lists.fedoraproject.org wrote:

...

fc35-bash 5.1 ~# mount -t ext4 -o loop,ro /ext2_saved/image /ext4
mount: /ext4: can't read superblock on /dev/loop0.

Hmm. That's unexpected though. I'm going to try it in a VM. It should work.

-- Chris Murphy

Chris Murphy

9:48 p.m.

On Sun, Feb 21, 2021 at 2:34 PM Chris Murphy lists@colorremedies.com wrote:

...

On Sun, Feb 21, 2021 at 2:16 PM George R Goffe via test test@lists.fedoraproject.org wrote:

...
fc35-bash 5.1 ~# mount -t ext4 -o loop,ro /ext2_saved/image /ext4
mount: /ext4: can't read superblock on /dev/loop0.

Hmm. That's unexpected though. I'm going to try it in a VM. It should work.

Works for me. But I'm confused by your paths, i.e. are you really creating /ext2_saved and /ext4 in the root directory?

Assume root user:

mount /dev/vdb /mnt/btrfs
mount -t ext4 -o ro,loop /mnt/btrfs/ext2_saved/image /mnt/ext4

That works for me. But you could alternatively try:

mount /dev/vdb /mnt/btrfs
losetup -r /dev/loop0 /mnt/btrfs/ext2_saved/image
blkid /dev/loop0

What file system does libblkid think it is? Could be ext2, ext3, or ext4, in which case mount -t ext4 exclusively mounts ext4 and will otherwise fail.

-- Chris Murphy

George R Goffe

11:16 p.m.

Chris,

/ext2_saved and /ext4 are in / but only temporarily.

I have "inlined" my responses prefixed with '#' below (is this ok to do?)

George...

On Sunday, February 21, 2021, 1:48:52 PM PST, Chris Murphy lists@colorremedies.com wrote:

On Sun, Feb 21, 2021 at 2:34 PM Chris Murphy lists@colorremedies.com wrote:

...

On Sun, Feb 21, 2021 at 2:16 PM George R Goffe via test test@lists.fedoraproject.org wrote:

...
fc35-bash 5.1 ~# mount -t ext4 -o loop,ro /ext2_saved/image /ext4
mount: /ext4: can't read superblock on /dev/loop0.

Hmm. That's unexpected though. I'm going to try it in a VM. It should work.

Works for me. But I'm confused by your paths, i.e. are you really creating /ext2_saved and /ext4 in the root directory?

Assume root user:

mount /dev/vdb /mnt/btrfs # on this system here vdb is sda8. /mnt/btrfs is /export/home

mount -t ext4 -o ro,loop /mnt/btrfs/ext2_saved/image /mnt/ext4
# result is unchanged:
# mount -t ext4 -o ro,loop /export/home/ext2_saved/image /mnt/ext4
# mount: /mnt/ext4: can't read superblock on /dev/loop0.

That works for me. But you could alternatively try:

mount /dev/vdb /mnt/btrfs
losetup -r /dev/loop0 /mnt/btrfs/ext2_saved/image
# losetup
# NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                     DIO LOG-SEC
# /dev/loop0         0      0         0  1 /export/home/ext2_saved/image   0     512

blkid /dev/loop0
# this command responds with nothing
# blkid --version responds with: blkid from util-linux 2.36.2 (libblkid 2.36.2, 12-Feb-2020)

What file system does libblkid think it is? Could be ext2, ext3, or ext4, in which case mount -t ext4 exclusively mounts ext4 and will otherwise fail.

# the file command responds with:
# file /export/home/ext2_saved/image
# /export/home/ext2_saved/image: ERROR: cannot read `/export/home/ext2_saved/image' (Input/output error)

-- Chris Murphy

Chris Murphy

22 Feb

midnight

On Sun, Feb 21, 2021 at 4:16 PM George R Goffe grgoffe@yahoo.com wrote:

...

...
On Sunday, February 21, 2021, 1:48:52 PM PST, Chris Murphy lists@colorremedies.com wrote:

...

That works for me. But you could alternatively try:

mount /dev/vdb /mnt/btrfs
losetup -r /dev/loop0 /mnt/btrfs/ext2_saved/image
# losetup
# NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                     DIO LOG-SEC
# /dev/loop0         0      0         0  1 /export/home/ext2_saved/image   0     512

blkid /dev/loop0 # this command responds with nothing

That happens if there's no signature or when running the command as an unprivileged user.

...

# the file command responds with:
# file /export/home/ext2_saved/image
# /export/home/ext2_saved/image: ERROR: cannot read `/export/home/ext2_saved/image' (Input/output error)

That's not good. What do you see in dmesg at the time of this error?

What about: e2fsck -fvn /dev/loop0

If this file is somehow damaged, I'm skeptical that rolling back the conversion to ext4 will succeed. But that's not the only option...

It's a bit involved, but it's possible to use device-mapper to create an overlay block device, i.e. keep this ext2_saved/image file read-only and redirect writes to a sparse file, which makes any change reversible. Then use e2fsck -b to specify a backup superblock for the repair.

Sooo :D this guide is designed to make it easy to set up many such overlays for RAID recovery. You could either use it as-is, or deconstruct it for just the single device you have, where your device is loop0 since it's an image file you're trying to recover.

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Mak...
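
The overlay idea described above can be sketched as a dry run that only prints the commands rather than running them. This is a minimal sketch under stated assumptions, not the linked guide's procedure: the overlay file path, its 4G size, the second loop device, and the device-mapper name are all hypothetical, and the sector count is the loop0 capacity that appears in the kernel log elsewhere in this thread (blockdev --getsz /dev/loop0 would report it on a live system).

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Hypothetical names (not from the thread): overlay_file, overlay_loop, dm_name.
image_loop=/dev/loop0                    # read-only loop device backed by ext2_saved/image
overlay_file=/var/tmp/image-overlay.cow  # sparse file that would receive all writes
overlay_loop=/dev/loop1                  # loop device assumed to back the sparse file
dm_name=image_overlay

# Build a device-mapper snapshot table: reads come from the origin,
# writes are redirected to the copy-on-write device (P = persistent, 8-sector chunks).
snapshot_table() {
    sectors=$1
    origin=$2
    cow=$3
    echo "0 $sectors snapshot $origin $cow P 8"
}

sectors=3470811312   # loop0 size in 512-byte sectors, taken from the kernel log

cat <<EOF
truncate -s 4G $overlay_file
losetup $overlay_loop $overlay_file
dmsetup create $dm_name --table "$(snapshot_table "$sectors" "$image_loop" "$overlay_loop")"
e2fsck -fvn /dev/mapper/$dm_name
EOF
```

The last printed command is a check-only run; a repair attempt with e2fsck -b and a backup superblock number would target the same /dev/mapper device, so that any writes land in the sparse file and the image stays untouched.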

-- Chris Murphy

George R Goffe

8:39 a.m.

Chris,

I tried some commands and captured what /var/log/messages "said" about each. I'm somewhat alarmed that these errors somehow crept into the image. There were NO outages or interruptions during the conversion. smartctl doesn't "say" anything about drive problems either... hence the alarm.

Regards,

George...

from file command on /export/home/ext2_saved/image:

Feb 21 18:50:29 fc35 kernel: BTRFS warning (device sda8): csum failed root 256 ino 257 off 0 csum 0xe05b8b2e expected csum 0xc90f1f63 mirror 1
Feb 21 18:50:29 fc35 kernel: BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 149, gen 0

from the mount -t ext4 -o ro,loop /export/home/ext2_saved/image /mnt/ext4 command:

Feb 21 18:55:38 fc35 kernel: BTRFS warning (device sda8): csum failed root 256 ino 257 off 0 csum 0xe05b8b2e expected csum 0xc90f1f63 mirror 1
Feb 21 18:55:38 fc35 kernel: BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 150, gen 0
Feb 21 18:55:38 fc35 kernel: blk_update_request: I/O error, dev loop0, sector 2 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Feb 21 18:55:38 fc35 kernel: EXT4-fs (loop0): unable to read superblock

from losetup -d /dev/loop0 and losetup -r /dev/loop0 /mnt/btrfs/ext2_saved/image

Feb 21 19:34:00 fc35 kernel: loop0: detected capacity change from 3470811312 to 0
Feb 21 19:34:00 fc35 kernel: BTRFS warning (device sda8): csum failed root 256 ino 257 off 0 csum 0xe05b8b2e expected csum 0xc90f1f63 mirror 1
Feb 21 19:34:00 fc35 kernel: BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 151, gen 0
Feb 21 19:34:00 fc35 kernel: BTRFS warning (device sda8): csum failed root 256 ino 257 off 0 csum 0xe05b8b2e expected csum 0xc90f1f63 mirror 1
Feb 21 19:34:00 fc35 kernel: BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 152, gen 0
Feb 21 19:34:00 fc35 kernel: blk_update_request: I/O error, dev loop0, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Feb 21 19:34:00 fc35 kernel: BTRFS warning (device sda8): csum failed root 256 ino 257 off 0 csum 0xe05b8b2e expected csum 0xc90f1f63 mirror 1
Feb 21 19:34:00 fc35 kernel: BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 153, gen 0
Feb 21 19:34:00 fc35 kernel: blk_update_request: I/O error, dev loop0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Feb 21 19:34:00 fc35 kernel: Buffer I/O error on dev loop0, logical block 0, async page read
Feb 21 19:34:00 fc35 kernel: BTRFS warning (device sda8): csum failed root 256 ino 257 off 0 csum 0xe05b8b2e expected csum 0xc90f1f63 mirror 1
Feb 21 19:34:00 fc35 kernel: BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 154, gen 0
Feb 21 19:34:00 fc35 kernel: blk_update_request: I/O error, dev loop0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Feb 21 19:34:00 fc35 kernel: Buffer I/O error on dev loop0, logical block 0, async page read
Feb 21 19:34:00 fc35 kernel: BTRFS warning (device sda8): csum failed root 256 ino 257 off 0 csum 0xe05b8b2e expected csum 0xc90f1f63 mirror 1
Feb 21 19:34:00 fc35 kernel: BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 155, gen 0
Feb 21 19:34:00 fc35 kernel: blk_update_request: I/O error, dev loop0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Feb 21 19:34:00 fc35 kernel: Buffer I/O error on dev loop0, logical block 0, async page read


Chris Murphy

9:58 p.m.

On Mon, Feb 22, 2021 at 1:39 AM George R Goffe grgoffe@yahoo.com wrote:

...

Feb 21 18:55:38 fc35 kernel: BTRFS warning (device sda8): csum failed root 256 ino 257 off 0 csum 0xe05b8b2e expected csum 0xc90f1f63 mirror 1
Feb 21 18:55:38 fc35 kernel: BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 150, gen 0
Feb 21 18:55:38 fc35 kernel: blk_update_request: I/O error, dev loop0, sector 2 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Feb 21 18:55:38 fc35 kernel: EXT4-fs (loop0): unable to read superblock

Btrfs says the block doesn't match its checksum. The checksum is itself checksummed, and we don't have an error for that, so we can consider this error reasonably reliable. On a checksum mismatch, Btrfs issues an I/O error for that block; it doesn't hand over corrupt data. That results in the 3rd line, which in turn results in the 4th line, because the primary ext4 superblock starts 1024 bytes into the image, i.e. at 512-byte sector 2.

All the errors you reported are related to the same basic problem which is that btrfs thinks this block is bad. This is not the end of the story though because there's a bunch of backup supers on ext4.
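
The sector arithmetic behind that explanation, and where the backup superblocks would be expected, can be sketched in a few lines of shell. The primary superblock offset of 1024 bytes is standard for ext2/3/4; everything about the backups is an assumption here (4 KiB ext4 blocks, 32768 blocks per group, and the sparse_super layout that keeps backups in group 1 and in groups that are powers of 3, 5, and 7), since the thread does not show how the original file system was created. The 4 KiB Btrfs block size is likewise assumed.

```shell
#!/bin/sh
sector_size=512
sb_offset=1024      # the primary ext2/3/4 superblock starts 1024 bytes in
btrfs_block=4096    # assumed Btrfs data block size
failed_off=0        # "off 0" in the csum failure message

sb_sector=$((sb_offset / sector_size))
first_sector=$((failed_off / sector_size))
last_sector=$(((failed_off + btrfs_block) / sector_size - 1))
echo "primary superblock: sector $sb_sector"
echo "failed Btrfs block covers sectors $first_sector-$last_sector"

# Assumed layout: 32768 blocks per group, backups in groups 1, 3, 5, 7, 9, ...
blocks_per_group=32768
backup_block() {
    echo $(($1 * blocks_per_group))
}
for group in 1 3 5 7 9; do
    echo "backup superblock candidate: ext4 block $(backup_block "$group")"
done
```

If those assumptions hold, e2fsck -b 32768 would point e2fsck at the first backup; the real locations depend on how the file system was made.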

[snip all other error messages, since they're the same issue]

We should find out if there's more widespread corruption. The basic command to scrub that particular Btrfs file system is:

sudo btrfs scrub start /mnt

The command scrubs the whole file system, verifying all metadata and data against their checksums. If this were a default Fedora installation where / and /home/ are on the same Btrfs file system, you could scrub either / or /home/, but there's no point in doing both.

The results will be dumped into kernel messages. Note that when the scrub finds corrupt files, these messages include the entire path to the file, including the filename. This might be a privacy concern, so check the filenames before posting.

A normal scrub without errors has kernel messages:

[20520.216852] BTRFS info (device sda5): scrub: started on devid 1
[20520.965950] BTRFS info (device sda5): scrub: finished on devid 1 with status: 0

All errors will be between these two messages. Yours will have at least one error for the "/mnt/ext2_saved/image" that we already know about. What I want to know is whether there are more problems before moving on.
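
One way to pull just that window out of a saved kernel log is a small awk filter. This is a sketch, not part of the original advice: the function name scrub_window is made up, and the two "unrelated" sample lines are invented padding around the clean-scrub messages quoted above.

```shell
#!/bin/sh
# Print only the kernel messages from a scrub's "started" line
# through its "finished" line, inclusive.
scrub_window() {
    awk '/BTRFS info .*scrub: started/  { show = 1 }
         show                            { print }
         /BTRFS info .*scrub: finished/ { show = 0 }'
}

# Sample input: a clean scrub surrounded by hypothetical unrelated messages.
sample='[20519.000000] unrelated message before the scrub
[20520.216852] BTRFS info (device sda5): scrub: started on devid 1
[20520.965950] BTRFS info (device sda5): scrub: finished on devid 1 with status: 0
[20521.000000] unrelated message after the scrub'

printf '%s\n' "$sample" | scrub_window
```

On a live system the input would come from something like dmesg or journalctl -k instead of the sample. Other kernel messages logged while the scrub runs are kept too, so the output still needs a read-through for file paths before posting.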

-- Chris Murphy

Matthew Miller

23 Feb

3:03 p.m.

On Mon, Feb 22, 2021 at 02:58:00PM -0700, Chris Murphy wrote:

...

We should find out if there's more widespread corruption. The basic command to scrub that particular Btrfs file system is:

sudo btrfs scrub start /mnt

It's kind of unfortunate that we also have a command (in the distro since 2007) called just "scrub" which will destroy all of your data. :-/

Anyway... do we have a timer which does this automatically on Fedora Linux systems, or are there plans to add one? Upstream wiki https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-scrub says "The user is supposed to run it manually or via a periodic system service. The recommended period is a month but could be less."

-- Matthew Miller mattdm@fedoraproject.org Fedora Project Leader

Neal Gompa

3:18 p.m.

On Tue, Feb 23, 2021 at 10:03 AM Matthew Miller mattdm@fedoraproject.org wrote:

...

On Mon, Feb 22, 2021 at 02:58:00PM -0700, Chris Murphy wrote:

...
We should find out if there's more widespread corruption. The basic command to scrub that particular Btrfs file system is:

sudo btrfs scrub start /mnt

It's kind of unfortunate that we also have a command (in the distro since 2007) called just "scrub" which will destroy all of your data. :-/

Not going to lie, it took three tries to read this to understand what was being said here. :)

...

Anyway... do we have a timer which does this automatically on Fedora Linux systems, or are there plans to add one? Upstream wiki https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-scrub says "The user is supposed to run it manually or via a periodic system service. The recommended period is a month but could be less."

We *do* have btrfsmaintenance[1] which provides what you're asking for. However, we don't install it by default or have presets set up for the timers. There were arguments for and against shipping and enabling them by default[2].

If we want to ship these, we can easily do so.

[1]: https://src.fedoraproject.org/rpms/btrfsmaintenance
[2]: https://pagure.io/fedora-btrfs/project/issue/16

-- 真実はいつも一つ！/ Always, there's only one truth!

Matthew Miller

3:58 p.m.

On Tue, Feb 23, 2021 at 10:18:13AM -0500, Neal Gompa wrote:

...

...
...
sudo btrfs scrub start /mnt

It's kind of unfortunate that we also have a command (in the distro since 2007) called just "scrub" which will destroy all of your data. :-/

Not going to lie, it took three tries to read this to understand what was being said here. :)

Sorry, just... don't accidentally leave "btrfs" out of the above command. :)

...

We *do* have btrfsmaintenance[1] which provides what you're asking for. However, we don't install it by default or have presets set up for the timers. There were arguments for and against shipping and enabling them by default[2].

Ah, thanks.

-- Matthew Miller mattdm@fedoraproject.org Fedora Project Leader

pmkellly＠frontier.com

9:25 p.m.

On 2/23/21 10:58, Matthew Miller wrote:

...

On Tue, Feb 23, 2021 at 10:18:13AM -0500, Neal Gompa wrote:

...
...
...
sudo btrfs scrub start /mnt

It's kind of unfortunate that we also have a command (in the distro since 2007) called just "scrub" which will destroy all of your data. :-/

Not going to lie, it took three tries to read this to understand what was being said here. :)

Sorry, just... don't accidentally leave "btrfs" out of the above command. :)

...
We *do* have btrfsmaintenance[1] which provides what you're asking for. However, we don't install it by default or have presets set up for the timers. There were arguments for and against shipping and enabling them by default[2].

Ah, thanks.

I recall a long and detailed discussion on this list before F33 was released concerning what disk maintenance would be required with BTRFS. As I recall, the final word was along the lines that running scrub and the other BTRFS utilities wouldn't be necessary, since it was being set up so that maintenance shouldn't be needed. There was also some hesitancy to call for running scrub because, depending on how often it's run, scrub can be hard on SSDs (they wear out faster).

Hmmm... Now that seems to be changing. I guess we better revisit the BTRFS maintenance issue again. The first part is: Was this a surprise one-off due to operator error or similar? Do we have a problem and BTRFS maintenance will be required?

Have a Great Day!

Pat (tablepc)

Richard Shaw

3 Mar

5:56 p.m.

On Tue, Feb 23, 2021 at 3:26 PM pmkellly@frontier.com pmkellly@frontier.com wrote:

...

I recall a long and detailed discussion on this list before F33 was released concerning what disk maintenance would be required with BTRFS. As I recall, the final word was along the lines that running scrub and the other BTRFS utilities wouldn't be necessary, since it was being set up so that maintenance shouldn't be needed. There was also some hesitancy to call for running scrub because, depending on how often it's run, scrub can be hard on SSDs (they wear out faster).

From what I can tell scrubbing only reads data and compares to the stored checksum. Why would that wear out a SSD?

Thanks, Richard

Matthew Miller

6:05 p.m.

On Wed, Mar 03, 2021 at 11:56:58AM -0600, Richard Shaw wrote:

...

From what I can tell scrubbing only reads data and compares to the stored checksum. Why would that wear out a SSD?

If you have atimes enabled, reading a file also makes a metadata write. But I don't think it's that big a deal on modern drives.

-- Matthew Miller mattdm@fedoraproject.org Fedora Project Leader

Samuel Sieb

8:13 p.m.

On 3/3/21 10:05 AM, Matthew Miller wrote:

...

On Wed, Mar 03, 2021 at 11:56:58AM -0600, Richard Shaw wrote:

...
From what I can tell scrubbing only reads data and compares to the stored checksum. Why would that wear out a SSD?

If you have atimes enabled, reading a file also makes a metadata write. But I don't think it's that big a deal on modern drives.

It depends on how the scrubbing works. I would have expected it to be reading data at the filesystem level, not actually opening and reading every file. That seems like a really bad thing to me, resetting the atimes on every file.

Chris Murphy

11:57 p.m.

On Wed, Mar 3, 2021 at 1:14 PM Samuel Sieb samuel@sieb.net wrote:

...

On 3/3/21 10:05 AM, Matthew Miller wrote:

...
On Wed, Mar 03, 2021 at 11:56:58AM -0600, Richard Shaw wrote:

...
From what I can tell scrubbing only reads data and compares to the stored checksum. Why would that wear out a SSD?

If you have atimes enabled, reading a file also makes a metadata write. But I don't think it's that big a deal on modern drives.

It depends on how the scrubbing works. I would have expected it to be reading data at the filesystem level, not actually opening and reading every file. That seems like a really bad thing to me, resetting the atimes on every file.

It works at the block level. A block is read, checksum calculated and compared to the previously recorded checksum for the block. It doesn't know what it's reading, not even whether it's compressed or not. It just becomes a stream of blocks without respect to the file or subvolume. If there's a mismatch, then it does a lookup to find out the owner: what subvolume/snapshot and filename/inode, what offset, and so on - which is then how it figures out where the good copy is (if any) and does self-healing. It pretty much runs at device max read capability.
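
The read, checksum, compare loop described above can be mimicked on an ordinary file. This is a conceptual sketch only, not how the kernel implements scrub: cksum (a POSIX CRC) stands in for the Btrfs checksum, a temporary file stands in for the device, and the function names block_sums and scrub_file are made up for the illustration.

```shell
#!/bin/sh
block_size=4096

# Print "<block index> <checksum>" for every block of a file.
block_sums() {
    file=$1
    size=$(wc -c < "$file")
    blocks=$(( (size + block_size - 1) / block_size ))
    i=0
    while [ "$i" -lt "$blocks" ]; do
        sum=$(dd if="$file" bs="$block_size" skip="$i" count=1 2>/dev/null | cksum | cut -d' ' -f1)
        echo "$i $sum"
        i=$((i + 1))
    done
}

# Recompute each block's checksum and compare it with the recorded one.
scrub_file() {
    file=$1
    recorded=$2
    block_sums "$file" | while read -r idx sum; do
        want=$(awk -v i="$idx" '$1 == i { print $2 }' "$recorded")
        [ "$sum" = "$want" ] || echo "checksum mismatch at offset $((idx * block_size))"
    done
}

# Demo: three zero-filled blocks, checksums recorded, then one byte of block 1 altered.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs="$block_size" count=3 2>/dev/null
block_sums "$tmp" > "$tmp.sums"
printf 'X' | dd of="$tmp" bs=1 seek=4096 conv=notrunc 2>/dev/null
result=$(scrub_file "$tmp" "$tmp.sums")
echo "$result"
rm -f "$tmp" "$tmp.sums"
```

The loop never looks at file names or contents, only at block offsets, which mirrors the point that the owner is looked up only after a mismatch is found.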

Related: 'btrfs replace' command utilizes this scrub facility to live replace a drive, and is the preferred way to replace; as compared to 'btrfs device add' followed by 'btrfs device remove'. The replace method in effect creates a temporary virtual/hidden mirror, and does a scrub to make the replacement the same as the source. File system writes during replace go to both devices, and it's expected to be crash safe, resuming automatically at next mount time.

-- Chris Murphy

Matthew Miller

4 Mar

12:33 a.m.

On Wed, Mar 03, 2021 at 04:57:28PM -0700, Chris Murphy wrote:

...

It works at the block level. A block is read, checksum calculated and compared to the previously recorded checksum for the block. It doesn't know what it's reading, not even whether it's compressed or not. It just becomes a stream of blocks without respect to the file or subvolume. If there's a mismatch, then it does a lookup to find out the owner: what subvolume/snapshot and filename/inode, what offset, and so on - which is then how it figures out where the good copy is (if any) and does self-healing. It pretty much runs at device max read capability.

So given this, even with atimes there's only a write in the case where there's a mismatch, right?

-- Matthew Miller mattdm@fedoraproject.org Fedora Project Leader

Chris Murphy

3:25 a.m.

On Wed, Mar 3, 2021 at 5:34 PM Matthew Miller mattdm@fedoraproject.org wrote:

...

On Wed, Mar 03, 2021 at 04:57:28PM -0700, Chris Murphy wrote:

...
It works at the block level. A block is read, checksum calculated and compared to the previously recorded checksum for the block. It doesn't know what it's reading, not even whether it's compressed or not. It just becomes a stream of blocks without respect to the file or subvolume. If there's a mismatch, then it does a lookup to find out the owner: what subvolume/snapshot and filename/inode, what offset, and so on - which is then how it figures out where the good copy is (if any) and does self-healing. It pretty much runs at device max read capability.

So given this, even with atimes there's only a write in the case where there's a mismatch, right?

If there's a mismatch, and no redundancy, there's no fixup. Therefore no write.

If there's a mismatch, and there's redundancy of some kind, a fixup is possible. That involves finding the good copy and overwriting the bad one. But this too is at a block level, and atime is not touched. Btrfs scrub works within the file system; it's not user space, so no file access happens at all.

-- Chris Murphy

Matthew Miller

12:34 a.m.

On Wed, Mar 03, 2021 at 12:13:33PM -0800, Samuel Sieb wrote:

...

It depends on how the scrubbing works. I would have expected it to be reading data at the filesystem level, not actually opening and reading every file. That seems like a really bad thing to me, resetting the atimes on every file.

I mean.... full-disk backup utilities do it all the time.

-- Matthew Miller mattdm@fedoraproject.org Fedora Project Leader

Samuel Sieb

1:40 a.m.

On 3/3/21 4:34 PM, Matthew Miller wrote:

...

On Wed, Mar 03, 2021 at 12:13:33PM -0800, Samuel Sieb wrote:

...
It depends on how the scrubbing works. I would have expected it to be reading data at the filesystem level, not actually opening and reading every file. That seems like a really bad thing to me, resetting the atimes on every file.

I mean.... full-disk backup utilities do it all the time.

There are a few different types of full-disk backup, but if it's file-based and the atimes are modified, that's the intent of the backup process. The atimes are used to determine which files to back up. A scrub is not a backup and shouldn't modify the atimes.

Matthew Miller

2:50 p.m.

On Wed, Mar 03, 2021 at 05:40:19PM -0800, Samuel Sieb wrote:

...

There are a few different types of full-disk backup, but if it's file-based and the atimes are modified, that's the intent of the backup process. The atimes are used to determine which files to back up. A scrub is not a backup and shouldn't modify the atimes.

I can't imagine why you would use atime instead of ctime for backups.

-- Matthew Miller mattdm@fedoraproject.org Fedora Project Leader

Neal Gompa

2:53 p.m.

On Thu, Mar 4, 2021 at 9:50 AM Matthew Miller mattdm@fedoraproject.org wrote:

...

On Wed, Mar 03, 2021 at 05:40:19PM -0800, Samuel Sieb wrote:

...
There are a few different types of full-disk backup, but if it's file-based and the atimes are modified, that's the intent of the backup process. The atimes are used to determine which files to back up. A scrub is not a backup and shouldn't modify the atimes.

I can't imagine why you would use atime instead of ctime for backups.

Also, this seems really easy to game. Why not maintain a checksum record instead? That makes it possible to maintain more efficient deltas. Using atimes/mtimes is just asking for trouble.

-- 真実はいつも一つ！/ Always, there's only one truth!

Samuel Sieb

7:52 p.m.

On 3/4/21 6:50 AM, Matthew Miller wrote:

...

On Wed, Mar 03, 2021 at 05:40:19PM -0800, Samuel Sieb wrote:

...
There are a few different types of full-disk backup, but if it's file-based and the atimes are modified, that's the intent of the backup process. The atimes are used to determine which files to back up. A scrub is not a backup and shouldn't modify the atimes.

I can't imagine why you would use atime instead of ctime for backups.

Oh, wow, I just realized that I was confusing the atime with the A (archive) bit from DOS... It would be more likely the mtime that would be used by a backup program. But you still don't want the atime getting reset arbitrarily. Anyway, Chris confirmed that scrub doesn't change any of that.

George R Goffe

3 Mar

5:34 p.m.

Chris,

Here's the information you requested.

I'm wondering just how this happened. One of the messages refers to "beyond end of device". I'm alarmed.

Regards,

George...
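
The numbers in the "beyond end of device" message in the log below can be checked with plain shell arithmetic. This is only a sketch of the unit conversion, assuming the want and limit values are counts of 512-byte sectors; it does not diagnose the cause.

```shell
#!/bin/sh
sector_size=512
want=3470811376            # "want=" from the kernel message
limit=3470811312           # "limit=" from the kernel message
image_size=1777055391744   # size of ext2_saved/image in the ls output of the first message

overshoot_sectors=$((want - limit))
overshoot_bytes=$((overshoot_sectors * sector_size))
device_bytes=$((limit * sector_size))

echo "request overshoots the device by $overshoot_sectors sectors ($overshoot_bytes bytes)"
echo "device size: $device_bytes bytes"
if [ "$device_bytes" -eq "$image_size" ]; then
    echo "device size equals the size of the saved image file"
fi
```

So the limit matches both the partition size and the image file size, and the rejected request ends 32 KiB past it.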

[ 2017.474378] BTRFS info (device sda8): scrub: started on devid 1
[ 2101.646773] BTRFS warning (device sda8): checksum error at logical 1780042522624 on dev /dev/sda8, physical 6250299392, root 256, inode 257, offset 786432, length 4096, links 1 (path: image)
[ 2101.646813] BTRFS warning (device sda8): checksum error at logical 1780042653696 on dev /dev/sda8, physical 6250430464, root 256, inode 257, offset 917504, length 4096, links 1 (path: image)
[ 2101.646824] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780042522624 on dev /dev/sda8
[ 2101.646823] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 163, gen 0
[ 2101.646842] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780041998336 on dev /dev/sda8
[ 2101.646846] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 164, gen 0
[ 2101.646858] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780042653696 on dev /dev/sda8
[ 2101.646862] BTRFS warning (device sda8): checksum error at logical 1780042129408 on dev /dev/sda8, physical 6249906176, root 256, inode 257, offset 393216, length 4096, links 1 (path: image)
[ 2101.646894] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 165, gen 0
[ 2101.646908] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780042129408 on dev /dev/sda8
[ 2101.646931] BTRFS warning (device sda8): checksum error at logical 1780042391552 on dev /dev/sda8, physical 6250168320, root 256, inode 257, offset 655360, length 4096, links 1 (path: image)
[ 2101.646941] BTRFS warning (device sda8): checksum error at logical 1780041736192 on dev /dev/sda8, physical 6249512960, root 256, inode 257, offset 0, length 4096, links 1 (path: image)
[ 2101.646954] BTRFS warning (device sda8): checksum error at logical 1780042260480 on dev /dev/sda8, physical 6250037248, root 256, inode 257, offset 524288, length 4096, links 1 (path: image)
[ 2101.646960] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 166, gen 0
[ 2101.646964] BTRFS warning (device sda8): checksum error at logical 1780041867264 on dev /dev/sda8, physical 6249644032, root 256, inode 257, offset 131072, length 4096, links 1 (path: image)
[ 2101.646969] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780041736192 on dev /dev/sda8
[ 2101.646968] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 167, gen 0
[ 2101.646978] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780042391552 on dev /dev/sda8
[ 2101.646986] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 168, gen 0
[ 2101.646995] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 169, gen 0
[ 2101.647000] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780042260480 on dev /dev/sda8
[ 2101.647010] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780041867264 on dev /dev/sda8
[ 2101.648001] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 170, gen 0
[ 2101.648017] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780042133504 on dev /dev/sda8
[ 2101.648028] BTRFS warning (device sda8): checksum error at logical 1780042526720 on dev /dev/sda8, physical 6250303488, root 256, inode 257, offset 790528, length 4096, links 1 (path: image)
[ 2101.648051] BTRFS warning (device sda8): checksum error at logical 1780042002432 on dev /dev/sda8, physical 6249779200, root 256, inode 257, offset 266240, length 4096, links 1 (path: image)
[ 2101.648067] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 171, gen 0
[ 2101.648081] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 172, gen 0
[ 2101.648089] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780042526720 on dev /dev/sda8
[ 6419.353481] perf: interrupt took too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 8850.583857] perf: interrupt took too long (3152 > 3148), lowering kernel.perf_event_max_sample_rate to 63000
[12805.242553] perf: interrupt took too long (3954 > 3940), lowering kernel.perf_event_max_sample_rate to 50000
[17077.326338] perf: interrupt took too long (4943 > 4942), lowering kernel.perf_event_max_sample_rate to 40000
[34090.956936] perf: interrupt took too long (6207 > 6178), lowering kernel.perf_event_max_sample_rate to 32000
[36365.549230] BTRFS error (device sda8): scrub: tree block 1777055424512 spanning stripes, ignored. logical=1777055367168
[36365.549262] attempt to access beyond end of device sda8: rw=0, want=3470811376, limit=3470811312
[36370.664946] BTRFS info (device sda8): scrub: finished on devid 1 with status: 0
wr 0, rd 0, flush 0, corrupt 170, gen 0 [ 2101.648017] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780042133504 on dev /dev/sda8 [ 2101.648028] BTRFS warning (device sda8): checksum error at logical 1780042526720 on dev /dev/sda8, physical 6250303488, root 256, inode 257, offset 790528, length 4096, links 1 (path: image) [ 2101.648051] BTRFS warning (device sda8): checksum error at logical 1780042002432 on dev /dev/sda8, physical 6249779200, root 256, inode 257, offset 266240, length 4096, links 1 (path: image) [ 2101.648067] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 171, gen 0 [ 2101.648081] BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 172, gen 0 [ 2101.648089] BTRFS error (device sda8): unable to fixup (regular) error at logical 1780042526720 on dev /dev/sda8 [ 6419.353481] perf: interrupt took too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 [ 8850.583857] perf: interrupt took too long (3152 > 3148), lowering kernel.perf_event_max_sample_rate to 63000 [12805.242553] perf: interrupt took too long (3954 > 3940), lowering kernel.perf_event_max_sample_rate to 50000 [17077.326338] perf: interrupt took too long (4943 > 4942), lowering kernel.perf_event_max_sample_rate to 40000 [34090.956936] perf: interrupt took too long (6207 > 6178), lowering kernel.perf_event_max_sample_rate to 32000 [36365.549230] BTRFS error (device sda8): scrub: tree block 1777055424512 spanning stripes, ignored. logical=1777055367168 [36365.549262] attempt to access beyond end of device sda8: rw=0, want=3470811376, limit=3470811312 [36370.664946] BTRFS info (device sda8): scrub: finished on devid 1 with status: 0

On Monday, February 22, 2021, 1:58:18 PM PST, Chris Murphy lists@colorremedies.com wrote:

On Mon, Feb 22, 2021 at 1:39 AM George R Goffe grgoffe@yahoo.com wrote:

...

Feb 21 18:55:38 fc35 kernel: BTRFS warning (device sda8): csum failed root 256 ino 257 off 0 csum 0xe05b8b2e expected csum 0xc90f1f63 mirror 1
Feb 21 18:55:38 fc35 kernel: BTRFS error (device sda8): bdev /dev/sda8 errs: wr 0, rd 0, flush 0, corrupt 150, gen 0
Feb 21 18:55:38 fc35 kernel: blk_update_request: I/O error, dev loop0, sector 2 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Feb 21 18:55:38 fc35 kernel: EXT4-fs (loop0): unable to read superblock

[snip all other error messages, since they're the same issue]

We should find out if there's more widespread corruption. The basic command to scrub that particular Btrfs file system is:

sudo btrfs scrub start /mnt

A normal scrub without errors has kernel messages:

[20520.216852] BTRFS info (device sda5): scrub: started on devid 1
[20520.965950] BTRFS info (device sda5): scrub: finished on devid 1 with status: 0
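To compare a scrub's kernel messages against the clean pattern above, something like the following could be used. It is only a sketch: `scrub_summary` is a hypothetical helper name, and it assumes the kernel log arrives on stdin one message per line (for example from `dmesg`). Note that the log posted in this thread ends with "status: 0" even though it contains checksum errors, so the status line alone is not treated as proof of a clean scrub.

```shell
#!/bin/sh
# Hypothetical helper: summarize scrub-related BTRFS kernel messages from stdin.
# Assumes one kernel message per line, as printed by dmesg or journalctl -k.
scrub_summary() {
    awk '
        /BTRFS info .*scrub: started/        { started++ }
        /BTRFS info .*scrub: finished/       { finished++; status = $NF }
        /BTRFS warning .*checksum error at/  { csum++ }
        /BTRFS error .*unable to fixup/      { unfixed++ }
        END {
            # Counters that were never incremented print as 0.
            printf "started=%d finished=%d status=%s checksum_errors=%d unfixable=%d\n",
                   started, finished, (finished ? status : "n/a"), csum, unfixed
        }'
}
```

Typical use would be `dmesg | scrub_summary` after the scrub has finished; reading the kernel log may require root on some systems.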

on.

-- Chris Murphy

Chris Murphy

10:57 p.m.

On Wed, Mar 3, 2021 at 10:34 AM George R Goffe grgoffe@yahoo.com wrote:

...

Chris,

Here's the information you requested.

I'm wondering just how this happened. One of the messages refers to "beyond end of device". I'm alarmed.

Don't panic.

But do keep your backups fresh, while you have the chance.

What I can tell you is Btrfs isn't changing the data or the checksums on disk. They've changed since the checksums were computed at btrfs-convert time. While it is possible there's a bug that can explain this, it's not one I'm aware of.

Can you file a bug for this? Component is btrfs-progs. You can delete the prefilled form and just include the following info:

- kernel and btrfs-progs versions at the time of btrfs-convert
- before conversion, did you run 'e2fsck -fvy' (or equivalent)?
- after conversion, did you run 'btrfs balance' (don't do it now)?
- have there been any partition modifications after conversion?
- output from the following:

btrfs insp dump-s /dev/sdX
fdisk -l /dev/sdX
smartctl -x /dev/sdX

- complete dmesg from this scrub you just did, please attach as a file, no trimming; if you've rebooted since that scrub you can get a copy back from the journal using 'journalctl -b-1 -k -o short-monotonic --no-hostname > dmesg.txt' where the -b value is -1 for previous boot, -2 for the one before that, etc.
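The command output requested above could be gathered into one directory with a small wrapper like this. It is a sketch, not part of the original request: `collect_report`, the `DRY_RUN` switch and the output file names are all made up for illustration, and passing the partition and the whole disk as separate arguments is my assumption. The versions it records are the ones currently installed, which may differ from those in use at btrfs-convert time, and the yes/no questions still have to be answered by hand.

```shell
#!/bin/sh
# Hypothetical wrapper around the commands requested for the bug report.
# Usage: collect_report PARTITION DISK OUTDIR   (e.g. /dev/sda8 /dev/sda ./report)
# With DRY_RUN=1 the commands are only printed, not executed.
collect_report() {
    part=$1 disk=$2 out=$3

    run() {
        # $1 is the output file name, the rest is the command line.
        file=$1; shift
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$* > $out/$file"
        else
            "$@" > "$out/$file" 2>&1
        fi
    }

    [ "${DRY_RUN:-0}" = 1 ] || mkdir -p "$out"

    run uname.txt      uname -r
    run progs.txt      btrfs --version
    run dump-super.txt btrfs inspect-internal dump-super "$part"
    run fdisk.txt      fdisk -l "$disk"
    run smartctl.txt   smartctl -x "$disk"
    # -b -1 is the previous boot; if there was no reboot since the scrub,
    # plain dmesg output is what was asked for instead.
    run dmesg.txt      journalctl -b -1 -k -o short-monotonic --no-hostname
}
```

Running `DRY_RUN=1 collect_report /dev/sda8 /dev/sda ./report` first shows what would be executed; the real run needs root for most of these commands.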

...

[ 2101.646773] BTRFS warning (device sda8): checksum error at logical 1780042522624 on dev /dev/sda8, physical 6250299392, root 256, inode 257, offset 786432, length 4096, links 1 (path: image)

All of the 'checksum error' messages are about the same file: ext2_saved/image, the file used for rollback. This is surely not a coincidence, I just don't know why there'd be checksum errors only with this one file.
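One way to confirm that every checksum error points at the same file is to tally the `(path: ...)` suffix of those messages. A minimal sketch, assuming one kernel message per line on stdin; `checksum_errors_by_path` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: count scrub checksum errors per file path.
# Reads kernel messages from stdin and prints "COUNT PATH" lines.
checksum_errors_by_path() {
    awk '
        /checksum error at logical/ && match($0, /\(path: [^)]*\)/) {
            # Strip the leading "(path: " (7 chars) and the trailing ")".
            p = substr($0, RSTART + 7, RLENGTH - 8)
            n[p]++
        }
        END { for (p in n) printf "%d %s\n", n[p], p }
    ' | sort -k2
}
```

If `dmesg | checksum_errors_by_path` prints a single line, all reported checksum errors belong to one file.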

The 'unable to fixup' errors match up with each checksum error and indicate there isn't a mirror copy for self-healing. Basically the first error is "this is bad" and the second error is "I can't fix it".
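The pairing can be checked by matching the two message types on their logical address. This is a sketch with a hypothetical function name, again assuming one kernel message per line on stdin. These messages may be rate limited by the kernel, so an unmatched address in the output is not necessarily meaningful on its own.

```shell
#!/bin/sh
# Hypothetical helper: list logical addresses that have a "checksum error"
# message but no matching "unable to fixup" message in the given log.
unpaired_checksum_errors() {
    awk '
        {
            # Pick up the number following the word "logical", if any.
            addr = ""
            for (i = 1; i < NF; i++) if ($i == "logical") addr = $(i + 1)
        }
        addr != "" && /checksum error at logical/ { csum[addr] = 1 }
        addr != "" && /unable to fixup/           { fixup[addr] = 1 }
        END { for (a in csum) if (!(a in fixup)) print a }
    ' | sort -n
}
```

An empty result from `dmesg | unpaired_checksum_errors` means every logged checksum error also has an "unable to fixup" line.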

...

[36365.549230] BTRFS error (device sda8): scrub: tree block 1777055424512 spanning stripes, ignored. logical=1777055367168
[36365.549262] attempt to access beyond end of device sda8: rw=0, want=3470811376, limit=3470811312

I'm not sure about this yet - need more info.
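For scale, the numbers in the "beyond end of device" message can be converted to bytes. This is plain arithmetic, not a diagnosis: it assumes the block layer reports `want` and `limit` in 512-byte sectors, and `beyond_end_report` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: convert the want/limit values of an
# "attempt to access beyond end of device" message into bytes.
beyond_end_report() {
    want=$1 limit=$2
    sector=512   # assumption: the message counts 512-byte sectors
    echo "overshoot: $(( want - limit )) sectors, $(( (want - limit) * sector )) bytes"
    echo "device size: $(( limit * sector )) bytes"
    echo "request end: $(( want * sector )) bytes"
}

beyond_end_report 3470811376 3470811312
# overshoot: 64 sectors, 32768 bytes
# device size: 1777055391744 bytes
# request end: 1777055424512 bytes
```

Under that assumption the request ran 32 KiB past the end of sda8. The device size comes out equal to the size of the image file in the original `ls -al` output, and the request end equals the tree block number in the scrub error just before it; what that means is left to the bug report.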

-- Chris Murphy

Chris Murphy

4 Mar 4 Mar

8:48 a.m.

George,

You've found a bug. I've got a reproducer.

https://github.com/kdave/btrfs-progs/issues/349

Both the ext4 rollback file, ext2_saved/image, and the converted btrfs file system are OK. The corruption is bogus, so the bug is that somehow checksums on a handful of specific blocks (not random) are computed incorrectly. Btrfs integrity checking sees these blocks as corrupt, refuses to hand over those blocks when trying to mount the ext4 file system, and therefore mount fails because the ext4 superblock can't be read.

The rollback will still work. But of course that means you lose the btrfs file system and everything on it - it goes back to the state it was in before conversion.

Another idea is a workaround that I've tested that's probably a lot more convenient. First, you need a 5.11 or newer kernel.

mount -o ro,rescue=all /dev/sda8 /mnt/btrfs
mount -o ro,loop /mnt/btrfs/ext2_saved/image /mnt/oldext4

That's it. You'll be able to inspect /mnt/oldext4 for the files you've accidentally deleted. Of course you can't copy files to /mnt/btrfs because it's read-only.
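The workaround could be wrapped with a kernel version check so it fails early on a kernel older than 5.11. This is a sketch only: `kernel_at_least` and `mount_old_ext4` are hypothetical names, the device and mount points are the ones used above, and it must run as root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel release is at least MAJOR.MINOR.
# $1 = required version (e.g. 5.11), $2 = release string (default: uname -r)
kernel_at_least() {
    need=$1
    have=${2:-$(uname -r)}
    need_major=${need%%.*}
    need_minor=${need#*.}; need_minor=${need_minor%%[!0-9]*}
    have_major=${have%%.*}
    have_minor=${have#*.}; have_minor=${have_minor%%[!0-9]*}
    [ "$have_major" -gt "$need_major" ] ||
        { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

# Hypothetical wrapper: mount the btrfs file system read-only with rescue=all,
# then loop-mount the saved ext4 image read-only.
mount_old_ext4() {
    kernel_at_least 5.11 || {
        echo "this workaround needs a 5.11 or newer kernel" >&2
        return 1
    }
    mkdir -p /mnt/btrfs /mnt/oldext4
    mount -o ro,rescue=all /dev/sda8 /mnt/btrfs &&
        mount -o ro,loop /mnt/btrfs/ext2_saved/image /mnt/oldext4
}
```

After `mount_old_ext4` succeeds, the deleted files can be copied from /mnt/oldext4 to any location outside /mnt/btrfs, since both mounts are read-only.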

-- Chris Murphy
