Hi, I'm Paolo, the main developer of the BFQ I/O scheduler.
The switch to the BFQ I/O scheduler by Fedora paves the way for up to a ~10X throughput boost and up to a ~400X latency reduction. This performance improvement concerns I/O workloads generated by multiple containers that share common storage devices. More generally, it also applies to workloads generated by multiple groups, VMs, or entities of any kind.
The reason for these apparently impressive numbers is that all other solutions for controlling I/O severely underutilize the speed of storage devices (typically only 10 to 20% of a device's speed is used).
If so, why have you probably never been warned about such an impressive waste of resources? Because it is extremely difficult to guarantee bandwidths and latencies on a loaded drive. So the most common solution for avoiding starvation, or very high latencies, has always been to keep storage devices underutilized. When an underutilized device is hit by the I/O of some container/group/VM, it is likely to serve this I/O very quickly, because it is unlikely to be already busy serving other I/O. If the I/O demand grows, then one simply adds more drives, so as to keep utilization low. And when this stops scaling, one buys faster drives.
Cleverer solutions do exist, based on I/O throttling. But, depending on the workload, they may forcibly lower utilization to roughly the same values reached with the approach above.
In contrast, BFQ is smart enough to keep drives highly utilized with every workload. So, using, e.g., only one drive, BFQ can satisfy an I/O demand that would require 5 to 10 drives with the other solutions.
If you want to take advantage of this performance boost in Fedora CoreOS, I'm willing to help at every step.
Thanks, Paolo
On 10/4/19 11:06 AM, Paolo Valente wrote:
Hi Paolo!
It looks like the original request for this was made to Fedora in [1] and applied to F31+.
Fedora CoreOS uses the same systemd from Fedora so unless we explicitly decide against it we'll be using what Fedora does. I don't see any reason to differ here.
Paolo, does that match your understanding?
Dusty
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1738828 [2] https://github.com/systemd/systemd/pull/13321#issuecomment-522700152
On Oct 4, 2019, at 5:32 PM, Dusty Mabe dusty@dustymabe.com wrote:
Hi
Yep. The issue I wanted to address with this topic is that few people may know about the 10X throughput they can get, with BFQ, for container workloads. And even once they know it, they may still not know how to enable this boost (fortunately, it is extremely easy). So I'm saying mainly "hey, here I am to help!" :)
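For anyone curious, a minimal sketch of how the scheduler can be inspected and switched at runtime on a typical Linux system (the device name "sda" below is only an example; the write requires root):

```shell
#!/bin/sh
# List the available I/O schedulers for each block device.
# The scheduler shown in [brackets] is the one currently in use.
for f in /sys/block/*/queue/scheduler; do
    [ -e "$f" ] || continue
    printf '%s: %s\n' "${f%/queue/scheduler}" "$(cat "$f")"
done

# To switch a device (here "sda", as an example) to BFQ at runtime, as root:
#   echo bfq > /sys/block/sda/queue/scheduler
# On Fedora 31+ this is done for you by a udev rule shipped with systemd,
# so no manual step should be needed there.
```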
Thanks, Paolo
On 10/4/19 1:07 PM, Paolo Valente wrote:
Just checking this one point: On Fedora and Fedora CoreOS (assuming we don't change any defaults) users won't need to do anything to "enable this boost". Correct?
So if you boot up Fedora (or Fedora CoreOS) in F31+ you'll get it by default. No action needed.
Dusty
On Oct 4, 2019, at 7:21 PM, Dusty Mabe dusty@dustymabe.com wrote:
Thank you very much for this useful question; it helped me realize that I didn't explain the main problem at all, sorry.
The answer to your question is yes and no. The 'yes' is because BFQ is already there, as you rightly point out.
The 'no' is the tricky part. The main issue is that BFQ cannot make a disk reach a higher throughput than that requested by the workload. Let me give a simple example. If the only process doing I/O on a disk does a read of 1 MB every second, then the maximum possible throughput that can be reached by the disk is 1 MB/s. A bad solution for controlling I/O may cause throughput to be below 1 MB/s, but no solution could go above 1 MB/s, simply because no more than that is being requested.
So, if a user/sysadmin keeps disk bandwidths underutilized, because this is their long-standing practice for guaranteeing bandwidth and latency, and because they are not aware of what they can now do with BFQ, then nothing changes for them, even if now I/O is scheduled by BFQ.
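As an aside, one way to check whether a drive is being kept underutilized is the %util column of iostat, from the sysstat package (the sketch below is illustrative; it is not part of the original discussion):

```shell
#!/bin/sh
# Sample extended device statistics; the %util column shows the percentage
# of time the device was busy. Persistently low values on a loaded system
# suggest the drive is being kept underutilized. (For devices that serve
# requests in parallel, such as NVMe SSDs, %util can understate headroom.)
if command -v iostat >/dev/null 2>&1; then
    iostat -x 1 2    # two one-second samples; the second reflects current load
else
    echo "iostat not found: install the sysstat package"
fi
```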
I hope my concern is clearer now.
The goal of this topic is to spread the word, and offer help.
Thanks, Paolo
On 10/4/19 1:48 PM, Paolo Valente wrote:
My interpretation of what you're saying is:
"BFQ is there now, but people will need to change their applications/workloads to really take advantage of it."
Is that correct?
Thanks to you!
On Oct 4, 2019, at 8:17 PM, Dusty Mabe dusty@dustymabe.com wrote:
Yes. More precisely, this holds true for professionals (or power users) who configure and size their storage resources so as to keep them underutilized.
Now that this message seems clear, let me complete the picture. There is one last set of professionals who would see no benefit: those who use throttling-based solutions to control I/O. BFQ cannot boost throughput for them, because throughput is forcibly choked by throttling. The solution is simply to stop using throttling.
I think this is more or less all.
Thank you again for your questions, Paolo