On 10/4/19 11:06 AM, Paolo Valente wrote:
Hi, I'm Paolo, the main developer of the BFQ I/O scheduler.
Hi Paolo!
The switch to the BFQ I/O scheduler in Fedora paves the way for up to a ~10X throughput boost and up to a ~400X latency reduction. This performance improvement concerns I/O workloads generated by multiple containers that share common storage devices. More generally, it also concerns workloads generated by multiple groups, VMs, or entities of any kind.
The reason for these apparently impressive numbers is that all other solutions for controlling I/O severely underutilize the speed of storage devices, typically using only 10–20% of their capacity.
If so, why have you probably never been warned about such an impressive waste of resources? Because it is extremely difficult to guarantee bandwidths and latencies on a loaded drive. So the most common way to avoid starvation, or very high latencies, has always been to keep storage devices underutilized. When an underutilized device is hit by the I/O of some container/group/VM, it is likely to serve that I/O very quickly, because it is unlikely to be already busy serving other I/O. If the I/O demand grows, one simply adds more drives so as to keep utilization low. And when this stops scaling, one buys faster drives.
More clever solutions do exist, based on I/O throttling. But, depending on the workload, these solutions may end up forcibly lowering utilization to about the same values as the approach above.
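For context, the throttling such solutions rely on is typically expressed through the kernel's cgroup v2 I/O controller. A minimal config sketch of a per-group write-bandwidth cap follows; the cgroup name `mygroup`, the device number `8:16`, and the 10 MiB/s limit are purely illustrative, and the commands require root and a cgroup v2 hierarchy with the `io` controller enabled:

```shell
# Enable the io controller for child groups (cgroup v2, needs root).
echo "+io" > /sys/fs/cgroup/cgroup.subtree_control

# Create a group and cap its write bandwidth on device 8:16
# (major:minor, illustrative) to 10 MiB/s. Reads stay unlimited.
mkdir -p /sys/fs/cgroup/mygroup
echo "8:16 wbps=10485760" > /sys/fs/cgroup/mygroup/io.max
```

Note how the cap is static: it bounds the group's I/O rate regardless of whether the device is otherwise idle, which is exactly how throttling can leave utilization low.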
In contrast, BFQ is smart enough to keep drives highly utilized with every workload. So, using, e.g., only one drive, BFQ satisfies an I/O demand that would require from 5 to 10 drives with the other solutions.
If you want to take advantage of this performance boost in Fedora CoreOS, I'm willing to help at every step.
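For anyone who wants to check or try this on a running system, the active scheduler is exposed per device through the standard sysfs interface. A hedged sketch (device names such as `sda` vary per machine, and switching requires root and a kernel with BFQ available):

```shell
# List the scheduler choices for every block device; the active
# scheduler is the one shown in [brackets].
for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] || continue
  printf '%s: %s\n' "$f" "$(cat "$f")"
done

# Switching at runtime (root only; 'sda' is illustrative):
# echo bfq > /sys/block/sda/queue/scheduler
```

This sets the scheduler only until reboot; a persistent default is normally applied via a udev rule or a kernel parameter.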
It looks like the original request for this was made to Fedora in [1] and applied to F31+.
Fedora CoreOS uses the same systemd as Fedora, so unless we explicitly decide against it we'll be using what Fedora does. I don't see any reason to differ here.
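Concretely, the mechanism behind the Fedora change linked below is a udev rule that sets the scheduler attribute when a block device appears. A sketch of what such a rule looks like; the filename and the exact `KERNEL` device match are assumptions modeled on the Fedora/systemd approach, not copied from the shipped rule:

```
# /etc/udev/rules.d/60-block-scheduler.rules (filename illustrative)
# Set BFQ as the default I/O scheduler for matching block devices.
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd*[!0-9]|sr*|mmcblk*[0-9]", \
  ATTR{queue/scheduler}="bfq"
```

Because the rule lives in udev, it applies automatically at boot and on hotplug, with no per-boot scripting needed.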
Paolo, does that match your understanding?
Dusty
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1738828
[2] https://github.com/systemd/systemd/pull/13321#issuecomment-522700152