On Tue, Feb 23, 2021 at 2:26 PM pmkellly@frontier.com wrote:
I recall a long and detailed discussion on this list before F33 was released concerning what disk maintenance would be required with BTRFS. As I recall, the final word was along the lines that running Scrub and the other BTRFS utilities wouldn't be necessary since it was being set up so maintenance shouldn't be needed.
Correct. Most of the time, for most users, what's provided is self-maintaining in Btrfs kernel code. If there's necessary maintenance not provided by the kernel, then it should be scheduled, e.g. a systemd timer kicks off a service unit.
There was also some hesitancy to call for running scrub because, depending on how often it's run, Scrub can be hard on SSDs (they wear out faster).
Scrub is mostly a read-only operation involving the verification of checksums for all file system metadata and file data. There's no concern on wear, you could run it every day if you wanted to.
But other considerations are how long it will take, how much CPU is used, and will it slow down the computer until it completes?
Hmmm... Now that seems to be changing. I guess we better revisit the BTRFS maintenance issue again. The first part is: Was this a surprise one-off due to operator error or similar? Or do we have a problem where BTRFS maintenance will be required?
Scrub is not going to fix George's file system. I want to know if there's more corruption than this one block, because I want to know the big picture. The specific corruptions can provide clues for why they are happening.
If this is the only bad block, there is a way to fix it with e2fs tools. But getting a copy of this superblock before repairing it might give a clue what stepped on it after btrfs-convert.
-- Chris Murphy
On 2/24/21 20:48, Chris Murphy wrote:
On Tue, Feb 23, 2021 at 2:26 PM pmkellly@frontier.com wrote:
I recall a long and detailed discussion on this list before F33 was released concerning what disk maintenance would be required with BTRFS. As I recall, the final word was along the lines that running Scrub and the other BTRFS utilities wouldn't be necessary since it was being set up so maintenance shouldn't be needed.
Correct. Most of the time, for most users, what's provided is self-maintaining in Btrfs kernel code. If there's necessary maintenance not provided by the kernel, then it should be scheduled, e.g. a systemd timer kicks off a service unit.
Out of curiosity I ran scrub on the four machines I have handy here. I always do clean installs so the btrfs doesn't have a lot of time on it; just since F33 was released.
Scrub gives no indication that it's running other than the PID. Nor does it indicate when it's complete; so I had to monitor the PID to know when it was done. Then I had to run:
sudo btrfs scrub status -dR /mnt
to find the results. Do you know if anyone has some code that runs scrub and gets the status and reports it after scrub is complete?
None of the four machines showed any problem.
Running scrub and getting the status might be good for people who do upgrade instead of doing clean installs. Maybe even a before and after upgrade might be revealing. Perhaps a special test day for machines that have been running the prior version for 6 months.
There was also some hesitancy to call for running scrub because, depending on how often it's run, Scrub can be hard on SSDs (they wear out faster).
Scrub is mostly a read-only operation involving the verification of checksums for all file system metadata and file data. There's no concern on wear, you could run it every day if you wanted to.
I thought sure that there was one of the btrfs utilities that was hard on SSDs if run regularly. Please refresh my memory.
But other considerations are how long it will take, how much CPU is used, and will it slow down the computer until it completes?
Yes, hence the need to know when it's done. Progress indication might be good too. One of the machines I ran it on yesterday was an old Core Duo. Running scrub wasn't noticeable, but the FS content is small on that machine, so the run time was seconds. On an AMD machine with 8 cores and a big (not huge) amount of FS content, it took minutes, maybe 3.
Have a Great Day!
Pat (tablepc)
On Thu, Feb 25, 2021 at 09:25:04AM -0500, pmkellly@frontier.com wrote:
Scrub gives no indication that it's running other than the PID. Nor does it indicate when it's complete; so I had to monitor the PID to know when it was done. Then I had to run:
sudo btrfs scrub status -dR /mnt
to find the results. Do you know if anyone has some code that runs scrub and gets the status and reports it after scrub is complete?
Well, you get `BTRFS info (device nvme0n1p3): scrub: finished on devid 1 with status: 0` in the logs. So these things could be completely separate; one timer that runs the scrub, and one that looks for success or errors in the logs.
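The "look for success or errors in the logs" half could be sketched roughly as below. This is only an illustration, not something anyone in this thread has deployed. The function name `check_scrub_log` is made up, and the only log format taken from this thread is the `scrub: finished on devid N with status: 0` line. Treating any other `scrub: ... with status: ...` line as a problem is an assumption about what the kernel prints when a scrub does not complete cleanly.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log lines on stdin and summarize scrub results.
# Returns non-zero if any scrub line reports something other than
# "finished ... with status: 0".
check_scrub_log() {
    local line rc=0
    while IFS= read -r line; do
        case $line in
            # The success line quoted in this thread.
            *"scrub: finished on devid "*" with status: 0")
                echo "OK: ${line##*scrub: }" ;;
            # Assumption: any other scrub line carrying a status means trouble.
            *"scrub: "*" on devid "*" with status: "*)
                echo "PROBLEM: ${line##*scrub: }"
                rc=1 ;;
        esac
    done
    return "$rc"
}

# Example (not run here): check the kernel messages from the current boot.
#   journalctl -k -b --no-pager | check_scrub_log
```

A second timer could run something like the commented example and send a notification when the function returns non-zero. Note that this only tells you whether the scrub completed; the error counts themselves still come from 'btrfs scrub status'.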
On Thu, Feb 25, 2021 at 7:25 AM pmkellly@frontier.com wrote:
Out of curiosity I ran scrub on the four machines I have handy here. I always do clean installs so the btrfs doesn't have a lot of time on it; just since F33 was released.
Scrub is pretty performant regardless of file system age.
Scrub gives no indication that it's running other than the PID. Nor does it indicate when it's complete; so I had to monitor the PID to know when it was done. Then I had to run:
sudo btrfs scrub status -dR /mnt
to find the results. Do you know if anyone has some code that runs scrub and gets the status and reports it after scrub is complete?
You can use 'btrfs scrub status /mnt' to see status of scrub in-progress or the results of the most recent scrub.
An alternative is 'btrfs scrub start -BdR' which will not background the scrub, and will give a detailed report upon completion.
Most things Btrfs are implemented in the kernel, so you'll see most messages related to Btrfs in dmesg.
None of the four machines showed any problem.
That is the usual case. I have file systems going back years without showing checksum errors.
Running scrub and getting the status might be good for people who do upgrade instead of doing clean installs. Maybe even a before and after upgrade might be revealing. Perhaps a special test day for machines that have been running the prior version for 6 months.
Checksum errors are almost always hardware related.
I thought sure that there was one of the btrfs utilities that was hard on SSDs if run regularly. Please refresh my memory.
Perhaps 'btrfs balance' - which should really only be used with filters to limit the rewriting of the file system for a specific goal. There's no reason to run it just to run it, as it does mean reading every block and writing them elsewhere (into free space).
But other considerations are how long it will take, how much CPU is used, and will it slow down the computer until it completes?
Yes hence the need to know when it's done. Progress indication might be good too.
dmesg will tell you when it's done as will 'btrfs scrub status'.
I suspect upstream would accept an enhancement for 'btrfs scrub start -B' (don't background) such that it has a progress indicator similar to btrfs-convert or another implementation.
On 2/25/21 15:24, Chris Murphy wrote:
An alternative is 'btrfs scrub start -BdR' which will not background the scrub, and will give a detailed report upon completion.
Well, I think I tried all the combinations of scrub status with and without -B, -BdR, -dR to try and get status to report while scrub was running. I was trying to make a combined command (with the &&) that included both the scrub start and the status. I was just going to make it an alias. I couldn't get it to work. scrub would run, but I never got the status. At the end I just made a little script:
#!/bin/bash
sudo btrfs scrub start -BdR /mnt
sudo btrfs scrub status -dR /mnt
then I made an alias to the script. Now when I type the alias, scrub runs and when it's done I get the results. I don't get any progress, but that's better than what I was doing. I don't plan to run it often, but now I can run it easily and at the end know what the results are.
Related: We don't seem to be using the drive dismount test in the QA Basic tests any more. At least I didn't see it there. Is that no longer necessary with btrfs?
Have a Great Day!
Pat (tablepc)
On Fri, Feb 26, 2021 at 12:53 PM pmkellly@frontier.com wrote:
On 2/25/21 15:24, Chris Murphy wrote:
An alternative is 'btrfs scrub start -BdR' which will not background the scrub, and will give a detailed report upon completion.
Well, I think I tried all the combinations of scrub status
Subcommand start, not status.
I was trying to make a combined command (with the &&) that included both the scrub start and the status.
Start it, and then use watch with status?
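For an interactive terminal, 'sudo btrfs scrub start /mnt' followed by 'watch sudo btrfs scrub status /mnt' is probably enough. If something script-friendly is wanted instead, a polling loop along these lines might do. It is a rough sketch under a few assumptions: the function name `scrub_with_progress` and the `BTRFS` override variable are invented for illustration, and the loop assumes that 'btrfs scrub status' prints a `Status: running` line while the scrub is in progress, which may differ between btrfs-progs versions.

```shell
#!/bin/bash
# Hypothetical helper: start a scrub in the background, then print its status
# every few seconds until it is no longer reported as running.
# Usage: scrub_with_progress MOUNTPOINT [INTERVAL_SECONDS]
scrub_with_progress() {
    local mnt=$1 interval=${2:-5} out
    # BTRFS is a made-up override so the command can be swapped out;
    # it is left unquoted below on purpose so "sudo btrfs" splits into two words.
    local btrfs=${BTRFS:-sudo btrfs}

    # Without -B, "scrub start" returns right away and the scrub keeps running.
    $btrfs scrub start "$mnt" || return 1

    while :; do
        out=$($btrfs scrub status "$mnt") || return 1
        printf '%s\n\n' "$out"
        # Assumption: the status output contains "Status: running" while in progress.
        printf '%s\n' "$out" | grep -Eq '^Status:[[:space:]]+running' || break
        sleep "$interval"
    done
}

# Example (not run here):
#   scrub_with_progress /mnt 10
```

The last block it prints is the final status, so there is no need for a separate status call afterwards.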
Related: We don't seem to be using the drive dismount test in the QA Basic tests any more. At least I didn't see it there. Is that no longer necessary with btrfs?
QA:Testcase_base_reboot_unmount ? It's listed 7 times on the current summary page.
On 2/26/21 14:53, pmkellly@frontier.com wrote:
On 2/25/21 15:24, Chris Murphy wrote:
An alternative is 'btrfs scrub start -BdR' which will not background the scrub, and will give a detailed report upon completion.
Well, I think I tried all the combinations of scrub status with and without -B, -BdR, -dR to try and get status to report while scrub was running. I was trying to make a combined command (with the &&) that included both the scrub start and the status. I was just going to make it an alias. I couldn't get it to work. scrub would run, but I never got the status. At the end I just made a little script:
#!/bin/bash
sudo btrfs scrub start -BdR /mnt
sudo btrfs scrub status -dR /mnt
then I made an alias to the script. Now when I type the alias, scrub runs and when it's done I get the results. I don't get any progress, but that's better than what I was doing. I don't plan to run it often, but now I can run it easily and at the end know what the results are.
Related: We don't seem to be using the drive dismount test in the QA Basic tests any more. At least I didn't see it there. Is that no longer necessary with btrfs?
Okay, today I noticed that I was getting two reports at the end of scrub. Then it occurred to me that I hadn't tried:
sudo btrfs scrub start -BdR /mnt
by itself. Sorry, I'll chalk it up to distractions. It works fine. Thank you.