I get the feeling that some of the bugs we are seeing is because we have enabled CONFIG_PREEMPT_BKL. I remember looking at the code when it came out and thinking it was too scary to enable, so I never did in my own vanilla kernels.
On Wed, 2007-03-21 at 15:01 -0400, Chuck Ebbert wrote:
I get the feeling that some of the bugs we are seeing is because we have enabled CONFIG_PREEMPT_BKL. I remember looking at the code when it came out and thinking it was too scary to enable, so I never did in my own vanilla kernels.
Well, it's likely to remain around upstream so surely it's better to fix bugs and feed them back upstream than ignore this, it'll just be painful later on IMO :-)
Jon.
On Wed, Mar 21, 2007 at 03:04:02PM -0400, Jon Masters wrote:
On Wed, 2007-03-21 at 15:01 -0400, Chuck Ebbert wrote:
I get the feeling that some of the bugs we are seeing is because we have enabled CONFIG_PREEMPT_BKL. I remember looking at the code when it came out and thinking it was too scary to enable, so I never did in my own vanilla kernels.
Well, it's likely to remain around upstream so surely it's better to fix bugs and feed them back upstream than ignore this, it'll just be painful later on IMO :-)
A few times, I've disabled an option like this in an update as an experiment just to see if "weirdo" bugs 'go away' or not. And then see if it comes back when I reenable it in a further update.
It can be useful to pinpoint problems this way, but yes, the bug wants fixing ultimately.
Dave
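The on/off experiment Dave describes comes down to one line of the kernel .config flipped between two updates. A minimal sketch, assuming the option under suspicion is CONFIG_PREEMPT_BKL; the "update N" labels in the comments are illustrative and do not come from the thread:

```
# Update N: option switched off as an experiment
# CONFIG_PREEMPT_BKL is not set

# Update N+1: option switched back on to see whether the bug returns
CONFIG_PREEMPT_BKL=y
```

If the "weirdo" reports stop after the first build and resume after the second, that points at the option or at code that only misbehaves with it enabled. It does not say which of the two is at fault.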
Jon Masters wrote:
On Wed, 2007-03-21 at 15:01 -0400, Chuck Ebbert wrote:
I get the feeling that some of the bugs we are seeing is because we have enabled CONFIG_PREEMPT_BKL. I remember looking at the code when it came out and thinking it was too scary to enable, so I never did in my own vanilla kernels.
Well, it's likely to remain around upstream so surely it's better to fix bugs and feed them back upstream than ignore this, it'll just be painful later on IMO :-)
Well yeah, but it's optional. We don't enable CONFIG_PREEMPT and that's been there for a long time...
On Wed, Mar 21, 2007 at 03:10:32PM -0400, Chuck Ebbert wrote:
Jon Masters wrote:
On Wed, 2007-03-21 at 15:01 -0400, Chuck Ebbert wrote:
I get the feeling that some of the bugs we are seeing is because we have enabled CONFIG_PREEMPT_BKL. I remember looking at the code when it came out and thinking it was too scary to enable, so I never did in my own vanilla kernels.
Well, it's likely to remain around upstream so surely it's better to fix bugs and feed them back upstream than ignore this, it'll just be painful later on IMO :-)
Well yeah, but it's optional. We don't enable CONFIG_PREEMPT and that's been there for a long time...
Which bugs in particular were you thinking might be caused by this? There's not that much generic code left under BKL these days anyway is there? And as for drivers.. just ioctls ?
Dave
On Wed, 21 Mar 2007 15:13:42 -0400, Dave Jones <davej@redhat.com> wrote:
Which bugs in particular were you thinking might be caused by this? There's not that much generic code left under BKL these days anyway is there? And as for drivers.. just ioctls ?
Last time I looked there was a ton of code in fs/* which required BKL. Drivers were written not to assume BKL for years, but due to general ineptitude of driver writers this wasn't really enforced.
-- Pete
FWIW, I have taken to CONFIG_PREEMPT=y in my hacking kernels because it exposed on my clunky test machines bugs that were otherwise reproduced only on big honking machines with lots of parallelism. I haven't experienced a bug that wasn't there at all with CONFIG_PREEMPT=n, only ones that were hard to reproduce in hacker's conditions but more likely in production load conditions. So, out of sight, out of mind, sure. But out of sight, lurking to bite you in the ass later when you really aren't in the mood, also damn likely.
Roland
On Wed, Mar 21, 2007 at 12:27:17PM -0700, Roland McGrath wrote:
FWIW, I have taken to CONFIG_PREEMPT=y in my hacking kernels because it exposed on my clunky test machines bugs that were otherwise reproduced only on big honking machines with lots of parallelism. I haven't experienced a bug that wasn't there at all with CONFIG_PREEMPT=n, only ones that were hard to reproduce in hacker's conditions but more likely in production load conditions. So, out of sight, out of mind, sure. But out of sight, lurking to bite you in the ass later when you really aren't in the mood, also damn likely.
It's crossed my mind to turn it on occasionally in rawhide just to shake out those hard to find bugs. The thing that's put me off has been that well, we've got enough bugs already without needing to find more right now.
Dave
* Dave Jones <davej@redhat.com> wrote:
It's crossed my mind to turn it on occasionally in rawhide just to shake out those hard to find bugs. The thing that's put me off has been that well, we've got enough bugs already without needing to find more right now.
it's on by default in -rt and i haven't had a bug triggered by it for quite a long time. We had many SMP races triggered by PREEMPT_RT.
part of the reason for PREEMPT_BKL's reliability is that when i did it i also wrote debugging infrastructure to catch bugs that would only trigger on PREEMPT_BKL=y: the smp_processor_id() debugging code of CONFIG_DEBUG_PREEMPT. (and that infrastructure, like lockdep, catches bugs before they actually happen, by flagging a buggy codepath the first time it ever executes)
Ingo
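To get the checking Ingo mentions, the debug option has to be enabled in the same build as PREEMPT_BKL. A rough sketch of the relevant .config lines follows. CONFIG_DEBUG_KERNEL is an assumption here (DEBUG_PREEMPT has normally lived under the kernel hacking menu), so check lib/Kconfig.debug in your own tree for the exact dependencies:

```
CONFIG_PREEMPT_BKL=y
# Assumed prerequisite for the debug options; verify against your tree
CONFIG_DEBUG_KERNEL=y
# Enables the smp_processor_id() checks mentioned above
CONFIG_DEBUG_PREEMPT=y
```

With this combination a questionable smp_processor_id() call should be reported the first time the codepath runs, rather than only when a preemption happens to land in the wrong place.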
On Wed, 21 Mar 2007 12:27:17 -0700 (PDT), Roland McGrath <roland@redhat.com> wrote:
FWIW, I have taken to CONFIG_PREEMPT=y in my hacking kernels because it exposed on my clunky test machines bugs that were otherwise reproduced only on big honking machines with lots of parallelism. [...]
It helps with that, but I just don't trust it to work at all times. It's a really kludgy code from MontaVista, developed with embedded devices in mind. It's a certified miracle that it boots on SMP at all.
That said, I love to foist it onto others. It really helps to flush mb() from drivers. Before, it always was a hassle to persuade driver writers that they must not do it; not worth the trouble. Now you just turn the preempt on and voila, the box crashes. Heck, I had preempt find a bug in ub once (guess what... used mb() there too -- nobody is perfect).
But running it on a production box is pure madness, IMHO.
-- Pete
* Pete Zaitcev <zaitcev@redhat.com> wrote:
FWIW, I have taken to CONFIG_PREEMPT=y in my hacking kernels because it exposed on my clunky test machines bugs that were otherwise reproduced only on big honking machines with lots of parallelism. [...]
It helps with that, but I just don't trust it to work at all times. It's a really kludgy code from MontaVista, developed with embedded devices in mind. It's a certified miracle that it boots on SMP at all.
actually, the current code works pretty well - and has been brought to the extreme via PREEMPT_RT. It boots fine on SMP and elsewhere, and it finds us tons of bugs in every kernel release.
Ingo
On Sat, 24 Mar 2007 17:56:17 +0100, Ingo Molnar <mingo@elte.hu> wrote:
FWIW, I have taken to CONFIG_PREEMPT=y in my hacking kernels because it exposed on my clunky test machines bugs that were otherwise reproduced only on big honking machines with lots of parallelism. [...]
It helps with that, but I just don't trust it to work at all times. It's a really kludgy code from MontaVista, developed with embedded devices in mind. It's a certified miracle that it boots on SMP at all.
actually, the current code works pretty well - and has been brought to the extreme via PREEMPT_RT. It boots fine on SMP and elsewhere, and it finds us tons of bugs in every kernel release.
If you say so, I take it back.
-- Pete
* Pete Zaitcev <zaitcev@redhat.com> wrote:
actually, the current code works pretty well - and has been brought to the extreme via PREEMPT_RT. It boots fine on SMP and elsewhere, and it finds us tons of bugs in every kernel release.
If you say so, I take it back.
CONFIG_PREEMPT alone is probably not worth having: it doesn't bring enough to the table, has overhead and doesn't satisfy the low-latency users. PREEMPT_RT is much nicer - but quite a bit more intrusive =B-)
Ingo
On Wed, 2007-03-21 at 15:10 -0400, Chuck Ebbert wrote:
Jon Masters wrote:
On Wed, 2007-03-21 at 15:01 -0400, Chuck Ebbert wrote:
I get the feeling that some of the bugs we are seeing is because we have enabled CONFIG_PREEMPT_BKL. I remember looking at the code when it came out and thinking it was too scary to enable, so I never did in my own vanilla kernels.
Well, it's likely to remain around upstream so surely it's better to fix bugs and feed them back upstream than ignore this, it'll just be painful later on IMO :-)
Well yeah, but it's optional. We don't enable CONFIG_PREEMPT and that's been there for a long time...
I'd enable CONFIG_PREEMPT myself...but that's me :-)
Jon.
* Chuck Ebbert <cebbert@redhat.com> wrote:
Well, it's likely to remain around upstream so surely it's better to fix bugs and feed them back upstream than ignore this, it'll just be painful later on IMO :-)
Well yeah, but it's optional. We don't enable CONFIG_PREEMPT and that's been there for a long time...
but CONFIG_PREEMPT has nontrivial overhead and impact. PREEMPT_BKL on the other hand has no overhead - in fact it helps on SMP, quite a bit at times.
Ingo
* Chuck Ebbert <cebbert@redhat.com> wrote:
I get the feeling that some of the bugs we are seeing is because we have enabled CONFIG_PREEMPT_BKL. I remember looking at the code when it came out and thinking it was too scary to enable, so I never did in my own vanilla kernels.
yes, we should keep it enabled - it's the default upstream and we havent had PREEMPT_BKL related bugs for a really long time.
in fact i had more !PREEMPT_BKL bugs than PREEMPT_BKL bugs. (BKL spinlock recursion for example)
Ingo
* Ingo Molnar <mingo@elte.hu> wrote:
I get the feeling that some of the bugs we are seeing is because we have enabled CONFIG_PREEMPT_BKL. I remember looking at the code when it came out and thinking it was too scary to enable, so I never did in my own vanilla kernels.
yes, we should keep it enabled - it's the default upstream and we havent had PREEMPT_BKL related bugs for a really long time.
in fact i had more !PREEMPT_BKL bugs than PREEMPT_BKL bugs. (BKL spinlock recursion for example)
well, PREEMPT_BKL is not the default - my points remain nevertheless.
Ingo
* Ingo Molnar <mingo@elte.hu> wrote:
yes, we should keep it enabled - it's the default upstream and we havent had PREEMPT_BKL related bugs for a really long time.
in fact i had more !PREEMPT_BKL bugs than PREEMPT_BKL bugs. (BKL spinlock recursion for example)
well, PREEMPT_BKL is not the default - my points remain nevertheless.
let me double-correct myself: PREEMPT_BKL is indeed default enabled on SMP:
config PREEMPT_BKL
	bool "Preempt The Big Kernel Lock"
	depends on SMP || PREEMPT
	default y
	help
so basically everyone who tests SMP kernels has it.
Ingo
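Read together, the `depends on SMP || PREEMPT` and `default y` lines mean an SMP build picks up the option unless someone explicitly answers no, even with full preemption left off. A sketch of the .config lines such a build would be expected to contain (the combination is illustrative and not copied from any shipped Fedora config):

```
CONFIG_SMP=y
# CONFIG_PREEMPT is not set
# Offered because SMP=y satisfies the dependency; set by "default y"
CONFIG_PREEMPT_BKL=y
```

On a uniprocessor build without CONFIG_PREEMPT neither side of the dependency holds, so the option would not be offered at all.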
kernel@lists.fedoraproject.org