Re: Server critical path definition proposal

20 Oct 2022


      On Thu, 2022-10-20 at 07:35 +0200, Martin Pitt wrote:
...
Hello Adam,
Adam Williamson [2022-10-19  8:31 -0700]:
...
On Wed, 2022-10-19 at 07:13 +0200, Martin Pitt wrote:
...
Hello Adam,
Adam Williamson [2022-10-18 10:37 -0700]:
...
that would define a critical path for Server. That would mean updates
containing any of those packages or their dependencies would be
'critical path' updates, meaning they have higher requirements to be
pushed stable (+2 karma or 14 days waiting, as opposed to +1 karma or 7
days waiting for regular updates)
What is the reason for increasing the waiting period, what would it achieve?
It's essentially a byproduct. Please see the devel@ discussion I
linked. We do not currently have any other good way to gate updates on
the openQA tests. We can invent one but it's substantially more work
than just adding things to the critpath.
The devel@ thread does not mention/justify the inceased waiting period at all.
So it's just intrinsic to the "critical path" definition? Because all of its
other properties totally make sense with any other explicit wait time.
Ah, yes, sorry if that wasn't clear. That is essentially the entire
original practical purpose of critpath: packages in the critpath are
subject to increased requirements. Exactly what the requirements for
critpath and non-critpath packages *are* has changed several times over
the years, but as of now, AIUI, non-critpath updates can be pushed
stable if they have +1 karma or have been in updates-testing for a
week. critpath updates can be pushed stable if they have +2 karma or
have been in updates-testing for two weeks.
Note, if you set e.g. +3 autopush on your update, that means it will be
*automatically* pushed when it reaches +3 with no -1s, but you can
still *manually* push it as long as it reaches the requirements above.
The minimum requirements act as floors for autopush; you can't set an
autopush of 1 for a critpath update. Well, you shouldn't be able to. I
think we've had bugs with that before...
I came along later and kinda piggybacked on the critical path as the
set of packages to run openQA updates tests on, simply because I needed
*some* way to reduce the set from "all updates" and it was handy and
seemed kind appropriate.
...
...
I'm 75% done with the "do it by adding things to critpath" approach because
nobody objected to that idea on devel@ , so it would've been great to have
this feedback in *that* discussion :(
Sorry, I'm not on devel@ (email overflow, plus I was on PTO when that
happened).
...
The issues you raise are all reasonable, but kind of not something new:
we've never had an official way to override the critical path
requirements if we think a package has good enough CI, so if you want
to call that a problem, it's been a problem forever.
Yes, it is. It seems utterly hard to get Fedora to do a "tests gate by default"
policy, as pretty much everything is opt-in. Opt-in gating is essentially a
complete waste of time IMO and just makes everyone involved grumpy.
Note, gating acts in a kind of "required, but not sufficient" way
regarding stable push. An update passing gating doesn't mean it can be
pushed stable: it has to pass gating *and* meet the karma/wait time
requirement.
Aside from that: so, as for having a "gate on all tests" kind of
thing...until fairly recently we couldn't really do that. The tricky
bit is knowing what "all tests" are for any given Thing. But since:
https://pagure.io/fedora-infrastructure/issue/8989#comment-802213
happened, I guess we could - theoretically, at least - do that. We
would need to make sure both Fedora CI and openQA are submitting queued
"results" for every test they schedule (openQA does not do this
currently, partly because I'd forgotten about this whole thing until
reading this mail). Then I guess it would be possible to somehow teach
greenwave about this; we'd have to invent a new kind of "all tests"
requirement type for greenwave and then teach greenwave that when it
sees that requirement type, it needs to query resultsdb for "queued"
results to figure out what tests are actually scheduled for the thing
in question. There's still a kinda small race condition there - gating
status would I guess be "go" for a small window after the Thing was
created but before the first test system managed to create a "queued"
result. I guess this requirement type could assume there must be at
least *one* test run on the Thing to avoid that issue? Anyway, yeah,
it's an engineering challenge that I guess just nobody really got
around to yet.
It does seem like a tricky thing in a way too, because the whole gating
system is designed to be fairly open: we have two main test systems ATM
(openQA and Fedora CI), but in theory anyone else can set one up and
submit test results that would be considered by greenwave; all it needs
is credentials to submit results to resultsdb. And we could add new
tests that run on many things ("generic" tests) to Fedora CI or openQA
at any time, really. So to use this you'd have to have a high degree of
trust that all the tests added by anyone running a 'production' test
system for Fedora would be sensible to gate your update on. Or be
prepared to issue a lot of waivers, I guess.
...
...
We can leave cockpit out of the definition if you like, but that would
mean it won't be gated by the distribution-wide gating policy. As I
mentioned in the devel@ discussion, an alternative is to configure
gating in the package repos - see
https://gating-greenwave.readthedocs.io/en/latest/policies.html#remoterule-c...

but this requires the package maintainer(s) to maintain the gating

policy.
Fully agreed, let's not do this. The current form of gating.yml is ... not
useful and just a pain.
...
It also has the disadvantage that we don't resolve dependencies like we do
for the critpath. If we put cockpit in the critpath definition, then cockpit
*and all its dependencies* are added to the critpath, and thus will have the
tests run and be gated on them. So if some dependency of cockpit breaks
cockpit, we'll find out, and the update will be gated.
Ah, that is a good thing! The thread only loosely mentioned this:
...
...
Ideally, we should have reverse dependency updates to trigger FreeIPA tests
and give their results a visibility but don't block on failures.
(FWIW, that is not my definition of "ideally"). If this is the case, I'm in.
We can still do the "increase Karma" workaround, if the waiting period is not
the primary reason why you want to do this, but just a byproduct of using
critical-path. Then that would feel much less like an abuse.
Yeah, my point here is definitely not to apply the increased
requirements, that's just a byproduct.
On the idea of having test systems submit feedback: I thought about
this, and the more I think about it, the less it seems like a hack and
the more it seems, well, appropriate. Why isn't it just...the right
thing to do? openQA exercises cockpit pretty extensively. Your tests
also do. Assuming your CI system does actually test the Fedora update
in a correct Fedora context, like openQA does...it seems totally valid
to me that both our test systems should submit a +1 to Bodhi if the
tests pass. Why not? They're testing the software like a human would
(probably better). If I was doing manual testing I might just open up
Cockpit, click around a bit, and say yeah, that looks good and give it
a +1; if that's what we're expecting of humans, what's wrong with
having an automated test system that does things rather more thoroughly
give +1s?
So, if we do that...your system is 1, my system is 1, that makes 2,
which is...all the feedback the updates need to be pushed stable. ;)
-- 
Adam Williamson
Fedora QA
IRC: adamw | Twitter: adamw_ha
https://www.happyassassin.net

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

Re: Server critical path definition proposal