Time to talk about https://www.fedoraproject.org/wiki/Fedora_36_Final_Release_Criteria#Default_... again!
Lots of desktop-app-related blockers this time around, and last time too. I think we're hitting a symptom-of-our-success problem here: increasing popularity and reviews noting how polished everything is make us very much want to build on that. So I understand why this is here, including the expanded "all installed applications" Workstation criteria.
But I think we might be using the wrong tool for some of this polish, and I think we also need to give ourselves some escape hatches.
By way of concrete example, the Photos application is meant to be a photo organizer, so "album picker duplicates fields, preventing photo organization" https://bugzilla.redhat.com/show_bug.cgi?id=2081291 is easy to classify as failing the basic functionality test currently. But, it's really painful for that to be blocking the release.
So, ideas for discussion:
1. I know we have had GNOME Test days, but this stuff didn't come up. Presumably, it would have if someone had happened to look at GNOME Photos. Can we formally go through https://www.fedoraproject.org/wiki/QA:Testcase_desktop_app_basic and https://www.fedoraproject.org/wiki/QA:Testcase_desktop_app_basic_others much, much earlier (probably when the GNOME pre-release is available), either as part of a test day or some other formal thing? Or is there something else going on here that I'm not looking closely enough to see?
(Same for KDE and any other theoretical release-blocking desktops!)
2. "Basic functionality" seems scoped too broadly currently. I propose, for the release criteria, we change this to: A) "the app doesn't crash on launch", and B) "the app's behavior does not seem immediately embarrassing with a few minutes of playing around with it".
As a barometer for "embarrassing", you can imagine me trying to explain the issue to a tech reporter, and weigh how awkward I will feel saying "this is fine" vs. how I will feel explaining that we delayed the whole release for that same issue.
3. Problems found which are not regressions should not be blockers. We're just hurting ourselves when we make this our forcing function to get something fixed.
Some of these could be Prioritized Bugs, but I don't want to overload that too much.
I propose that the teams responsible for blocking desktop deliverables keep their own prioritized lists of this kind of problem that the team agrees should be fixed for a good user experience. Not just add to the general queues of bugs or tickets, but specific lists of "application experience issues".
These lists, of course, could include problems which also don't qualify for point #2 but which seem important. Like the Photos app issues.
(This could also extend to "desktop experience issues" rather than just "application". Or for that matter there could be a similar mechanism for non-desktop blocking deliverables.)
I could be convinced either way on having these in the teams' issue trackers in pagure or wherever _or_ having it as more targets in the blockerbugs app. I tend towards the latter: I think it might help with the problem where blockers feel like the only obvious way to get bugs tracked and fixed. But either way, there should be lists!
4. Desktop application problems discovered at the last minute should not be blockers. If the problem is really going to impact a lot of people, it should have been discovered in the beta. (Exception for _new_ regressions, of course.) By the time we're in final freeze, this ends up being hero work for everyone.
I don't know how to phrase this in a way that doesn't make Adam sad with me, but maybe something like: Desktop application blockers discovered during the final freeze are automatically waived unless the relevant Spin or Edition team decides otherwise.
That lets us still block if something is really bad and just happened to slip through.
5. Okay, and... bigger: we should aim for more approaches which let us decouple as much as possible from the Release. (My grand hope is that we can release every deliverable on its own schedule, but I also understand the _highly aspirational_ nature of that idea. But...) What if we could just easily ship GNOME Photos from GNOME 41 until a fix is found in the updated one?
On Tue, 2022-05-03 at 18:13 -0400, Matthew Miller wrote:
Time to talk about https://www.fedoraproject.org/wiki/Fedora_36_Final_Release_Criteria#Default_... again!
Lots of desktop-app-related blockers this time around, and last time too. I think we're hitting a symptom-of-our-success problem here: increasing popularity and reviews noting how polished everything is make us very much want to build on that. So I understand why this is here, including the expanded "all installed applications" Workstation criteria.
To be clear, it's not exactly that the Workstation x86_64 requirement is expanded, but that the other requirements are reduced. Up until a couple of years ago, the requirement was "all installed applications" for all release-blocking desktops, on all release-blocking arches. We narrowed it down to being "all installed applications" for Workstation on x86_64, and just the specified list of apps for other cases (KDE on any arch, Workstation on aarch64).
But I think we might be using the wrong tool for some of this polish, and I think we also need to give ourselves some escape hatches.
By way of concrete example, the Photos application is meant to be a photo organizer, so "album picker duplicates fields, preventing photo organization" https://bugzilla.redhat.com/show_bug.cgi?id=2081291 is easy to classify as failing the basic functionality test currently. But, it's really painful for that to be blocking the release.
So, ideas for discussion:
- I know we have had GNOME Test days, but this stuff didn't come up. Presumably, it would have if someone had happened to look at GNOME Photos. Can we formally go through https://www.fedoraproject.org/wiki/QA:Testcase_desktop_app_basic and https://www.fedoraproject.org/wiki/QA:Testcase_desktop_app_basic_others much, much earlier (probably when the GNOME pre-release is available), either as part of a test day or some other formal thing? Or is there something else going on here that I'm not looking closely enough to see?
Well, kinda, yes. The thing going on is that we *did* go through that test at several earlier points:
https://openqa.fedoraproject.org/testcase_stats/36/Desktop/QA_Testcase_deskt...
but these bugs weren't discovered at that time. This is likely because of your idea 2: "basic functionality" is a bit up for debate. You can take an extremely minimalist approach to this (run the app, click a couple of buttons, say it's OK if nothing explodes and no babies are eaten) or a slightly more maximalist one (run the app, and actually try and do something useful with it). In this case, before Final, when we ran this test we mostly did the minimalist thing. At Final RC stage, we went a bit more maximalist.
(Same for KDE and any other theoretical release-blocking desktops!)
"Basic functionality" seems scoped too broadly currently. I propose, for the release criteria, we change this to: A) "the app doesn't crash on launch", and B) "the app's behavior does not seem immediately embarrassing with a few minutes of playing around with it".
As a barometer for "embarrassing", you can imagine me trying to explain the issue to a tech reporter, and weigh how awkward I will feel saying "this is fine" vs. how I will feel explaining that we delayed the whole release for that same issue.
I agree with the sentiment but I'm not sure about the phrasing. It's extremely subjective, and I don't think subjective criteria work very well. It also wouldn't necessarily "solve the problem": the bugs we discovered this time really are pretty embarrassing, honestly. Imagine doing a keynote showing off the sleek default apps included in GNOME, running them, and trying to do...well...actually anything at all useful with them. It wouldn't go very well.
- Problems found which are not regressions should not be blockers. We're just hurting ourselves when we make this our forcing function to get something fixed.
This is one of those things that sounds great until it doesn't. I can't quite recall any specifics, but I definitely think there have been cases recently where we've had strong support for a bug that is not a regression to be a blocker. Sometimes a bug is just really bad but we didn't see it before; even if you can make a wonk-y argument that there's no point making it a blocker because if we do, it just means the previous release stays as the "current" one for longer and it has the same bug, in practice it's hard to hold that line when now there's a bug report that everyone can see that says how badly this thing is broken.
It also, again, wouldn't have solved this problem, because most of these bugs *are* regressions, IIRC. The Photos bugs weren't in F35, it worked better there.
I propose that the teams responsible for blocking desktop deliverables keep their own prioritized lists of this kind of problem that the team agrees should be fixed for a good user experience. Not just add to the general queues of bugs or tickets, but specific lists of "application experience issues".
I am just gonna leave this here for context, and [snip] to:
Desktop application problems discovered at the last minute should not be blockers. If the problem is really going to impact a lot of people, it should have been discovered in the beta. (Exception for _new_ regressions, of course.) By the time we're in final freeze, this ends up being hero work for everyone.
I don't know how to phrase this in a way that doesn't make Adam sad with me, but maybe something like: Desktop application blockers discovered during the final freeze are automatically waived unless the relevant Spin or Edition team decides otherwise.
You're right that this makes me sad. I don't think it's a good approach. I think it's an attempt to solve a problem that I would maybe look at differently, and which we're currently discussing in a ticket:
https://pagure.io/fedora-workstation/issue/304
For me, the big question your mail never quite arrived at is, *why* did these bugs show up in Fedora 36 Final RCs at all? They really should not have done. They are bugs in applications that are, supposedly, core parts of upstream GNOME. They appear in 42.0 releases of those applications - i.e. in releases of those applications that are, according to upstream's versioning scheme, stable releases for public consumption.
Stable releases of core components of a major desktop should never contain bugs like "deleting contacts sometimes doesn't work" or "you can't add photos to an album in the Photos application because the dialog where you're supposed to do it is completely broken and the list entries multiply like rabbits who've been dosed up on viagra". Distribution validation testing is not *for* finding bugs like this.
The reason we're all instinctively feeling that something is Not Right here is that something *is* Not Right, but the big thing that's Not Right is the upstream GNOME release process. It's not (IMHO) any part of the Fedora process - there are things we could tighten up there, but I see those as subsidiary problems.
It's Not Right that GNOME can ship a 42.0 release containing entirely broken applications. That should not be happening. It's not something we should have to design our Fedora distribution validation testing process to fix.
The criterion and the test case were written with the unspoken, but IMHO reasonable, assumption that in general we could trust desktops to provide us with more-or-less-working software. They were never written with the intent of finding this kind of bug. The scenario I was envisioning when we wrote them was "oh, we accidentally packaged a broken development version of app X" or "we're still including random third-party app Y in the Workstation edition but it's not been maintained for years and doesn't work any more, let's throw it out". I was never envisaging having to deal with "GNOME shipped us completely broken applications in a stable release". I don't think our goal should be to design a release validation process that deals with that, because *that shouldn't happen*.
- Okay, and... bigger: we should aim for more approaches which let us decouple as much as possible from the Release. (My grand hope is that we can release every deliverable on its own schedule, but I also understand the _highly aspirational_ nature of that idea. But...) What if we could just easily ship GNOME Photos from GNOME 41 until a fix is found in the updated one?
I mean, we maybe could. I dunno if we tried that yet. There's not necessarily anything in the current rules/policies that precludes this, AFAIR.
On Wed, May 4, 2022 at 2:15 AM Adam Williamson <adamwill@fedoraproject.org> wrote:
- I know we have had GNOME Test days, but this stuff didn't come up. Presumably, it would have if someone had happened to look at GNOME Photos. Can we formally go through https://www.fedoraproject.org/wiki/QA:Testcase_desktop_app_basic and https://www.fedoraproject.org/wiki/QA:Testcase_desktop_app_basic_others much, much earlier (probably when the GNOME pre-release is available), either as part of a test day or some other formal thing? Or is there something else going on here that I'm not looking closely enough to see?
Well, kinda, yes. The thing going on is that we *did* go through that test at several earlier points:
https://openqa.fedoraproject.org/testcase_stats/36/Desktop/QA_Testcase_deskt...
but these bugs weren't discovered at that time. This is likely because of your idea 2: "basic functionality" is a bit up for debate. You can take an extremely minimalist approach to this (run the app, click a couple of buttons, say it's OK if nothing explodes and no babies are eaten) or a slightly more maximalist one (run the app, and actually try and do something useful with it). In this case, before Final, when we ran this test we mostly did the minimalist thing. At Final RC stage, we went a bit more maximalist.
I wouldn't call that maximalist; that would mean testing everything possible. A *realistic* approach is more appropriate, I think. To "do something useful with it" is a great description. What good is it if the app can start and survive a few button clicks, but you can't do useful tasks with it? The tasks it was actually designed for. What's the point of shipping a photo organizer where you can't create and organize albums? This is the realistic scenario, and ideally the thing we would always try to test. Of course, people don't always have time to do that, and sometimes only certain pathways are broken while others work.
Discovering bugs in the realistic scenarios requires time and also some luck. At the same time, bugs in realistic scenarios will very likely prevent actual users from actually using that app, even if just certain pathways are broken. Even if some of those bugs seem trivial to discover, they are not. If you edit any field other than the email address in gnome-contacts, you won't see contacts duplication. So as QA, you only have a certain chance to spot it. In the real world with regular usage, however, sooner or later you'll definitely edit someone's email address and then you'll see that contact doubled. (This is just a nice example; this particular bug was not accepted as a blocker, as it is not ground-breaking.)
On Tue, May 03, 2022 at 05:14:53PM -0700, Adam Williamson wrote:
but these bugs weren't discovered at that time. This is likely because of your idea 2: "basic functionality" is a bit up for debate. You can take an extremely minimalist approach to this (run the app, click a couple of buttons, say it's OK if nothing explodes and no babies are eaten) or a slightly more maximalist one (run the app, and actually try and do something useful with it). In this case, before Final, when we ran this test we mostly did the minimalist thing. At Final RC stage, we went a bit more maximalist.
Yeah, the tension: we're moving towards more polish, but the available time to fix gets smaller. And it's not like fixing the things that are at that "more maximalist" end takes less time.
As a barometer for "embarrassing", you can imagine me trying to explain the issue to a tech reporter, and weigh how awkward I will feel saying "this is fine" vs. how I will feel explaining that we delayed the whole release for that same issue.
I agree with the sentiment but I'm not sure about the phrasing. It's extremely subjective, and I don't think subjective criteria work very well. It also wouldn't necessarily "solve the problem": the bugs we
Yesssss, sorry. That was meant to be a sentiment-based way to hopefully convey what I'm trying to say, not a suggested phrasing.
discovered this time really are pretty embarrassing, honestly. Imagine doing a keynote showing off the sleek default apps included in GNOME, running them, and trying to do...well...actually anything at all useful with them. It wouldn't go very well.
This is why I don't do live demos. :)
- Problems found which are not regressions should not be blockers. We're just hurting ourselves when we make this our forcing function to get something fixed.
This is one of those things that sounds great until it doesn't. I can't quite recall any specifics, but I definitely think there have been cases recently where we've had strong support for a bug that is not a regression to be a blocker. Sometimes a bug is just really bad but we didn't see it before; even if you can make a wonk-y argument that there's no point making it a blocker because if we do, it just means the previous release stays as the "current" one for longer and it has the same bug, in practice it's hard to hold that line when now there's a bug report that everyone can see that says how badly this thing is broken.
I feel like I can defend that line pretty comfortably. Send 'em to me. :)
In seriousness, I think we look to blocking the release because it's such a big signal. The train comes to a stop and everyone can see. But it's a big signal _because it's painful_, and pain isn't the best motivator. And while that pain does sometimes land on people who can do something, even if that _would_ help, it's indiscriminate.
In the metaphor, people can't get to where they are going, the tracks are blocked, etc., etc., and none of that has anything to do with the issue to fix. (I'm tempted to keep running with the train metaphor, but reluctantly will stop.)
This is the idea of Prioritized Bugs, and I think it's relatively successful. It's another way to make sure we're focusing on serious problems. Making that more visible could help.
I don't know how to phrase this in a way that doesn't make Adam sad with me, but maybe something like: Desktop application blockers discovered during the final freeze are automatically waived unless the relevant Spin or Edition team decides otherwise.
You're right that this makes me sad. I don't think it's a good approach. I think it's an attempt to solve a problem that I would maybe look at differently, and which we're currently discussing in a ticket: https://pagure.io/fedora-workstation/issue/304
For me, the big question your mail never quite arrived at is, *why* did these bugs show up in Fedora 36 Final RCs at all? They really should not have done. They are bugs in applications that are, supposedly, core parts of upstream GNOME. They appear in 42.0 releases of those applications - i.e. in releases of those applications that are, according to upstream's versioning scheme, stable releases for public consumption.
Stable releases of core components of a major desktop should never contain bugs like "deleting contacts sometimes doesn't work" or "you can't add photos to an album in the Photos application because the dialog where you're supposed to do it is completely broken and the list entries multiply like rabbits who've been dosed up on viagra". Distribution validation testing is not *for* finding bugs like this.
The reason we're all instinctively feeling that something is Not Right here is that something *is* Not Right, but the big thing that's Not Right is the upstream GNOME release process. It's not (IMHO) any part of the Fedora process - there are things we could tighten up there, but I see those as subsidiary problems.
It's Not Right that GNOME can ship a 42.0 release containing entirely broken applications. That should not be happening. It's not something we should have to design our Fedora distribution validation testing process to fix.
So, yeah, I think there's something to this. There were a lot of last-minute-feeling kinds of changes, not just to applications but also with, like, the dark-mode desktop background stuff.
Shouldn't it be rare that we're discovering general "this doesn't work" issues _that aren't Fedora Linux specific_ in Fedora QA?
Also, I think some of the time-tension (from the start of this message) stems from an assumption that isn't working out: "this app might be rough at beta, but will be more polished by the time we're doing final validation, because upstream is working on that".
Maybe this is something we can bring to GUADEC to work together to improve. (Virtual at the end of July.)
The criterion and the test case were written with the unspoken, but IMHO reasonable, assumption that in general we could trust desktops to provide us with more-or-less-working software. They were never written with the intent of finding this kind of bug. The scenario I was envisioning when we wrote them was "oh, we accidentally packaged a broken development version of app X" or "we're still including random third-party app Y in the Workstation edition but it's not been maintained for years and doesn't work any more, let's throw it out". I was never envisaging having to deal with "GNOME shipped us completely broken applications in a stable release". I don't think our goal should be to design a release validation process that deals with that, because *that shouldn't happen*.
This all makes sense.
I still think my suggestion might help even then, for the same reason I gave to Kamil: it flips the default pressure, moves it away from QA, away from the threat of stopping the whole train and motivation-through-guilt.
- Okay, and... bigger: we should aim for more approaches which let us decouple as much as possible from the Release. (My grand hope is that we can release every deliverable on its own schedule, but I also understand the _highly aspirational_ nature of that idea. But...) What if we could just easily ship GNOME Photos from GNOME 41 until a fix is found in the updated one?
I mean, we maybe could. I dunno if we tried that yet. There's not necessarily anything in the current rules/policies that precludes this, AFAIR.
For small things, it might just be a matter of shipping an old RPM. Maybe with the dreaded Epoch. But in a lot of cases it probably means we are gonna need a bigger [container].
On Wed, May 4, 2022 at 12:13 AM Matthew Miller <mattdm@fedoraproject.org> wrote:
Time to talk about https://www.fedoraproject.org/wiki/Fedora_36_Final_Release_Criteria#Default_... again!
Lots of desktop-app-related blockers this time around, and last time too. I think we're hitting a symptom-of-our-success problem here: increasing popularity and reviews noting how polished everything is make us very much want to build on that. So I understand why this is here, including the expanded "all installed applications" Workstation criteria.
As Adam already noted, it is actually cut down. We used to have higher standards in this area (and we lowered them because we couldn't keep them, especially when KDE ships a bazillion preinstalled apps).
So, ideas for discussion:
- I know we have had GNOME Test days, but this stuff didn't come up. Presumably, it would have if someone had happened to look at GNOME Photos. Can we formally go through https://www.fedoraproject.org/wiki/QA:Testcase_desktop_app_basic and https://www.fedoraproject.org/wiki/QA:Testcase_desktop_app_basic_others much, much earlier (probably when the GNOME pre-release is available), either as part of a test day or some other formal thing? Or is there something else going on here that I'm not looking closely enough to see?
We could have a GNOME Apps Test Day, or perhaps a GNOME Low Profile Apps Test Day. Separating that from GNOME DE basics + settings etc would perhaps motivate people to focus more on those apps and spend more time with them.
- "Basic functionality" seems scoped too broadly currently. I propose, for the release criteria, we change this to: A) "the app doesn't crash on launch", and B) "the app's behavior does not seem immediately embarrassing with a few minutes of playing around with it".
If you only want to block on app launch and close, let's be honest about it and call it that. The basic functionality requirement can stay for those high-profile apps listed explicitly in the criterion, and the other apps would only be required to launch and close without crashing.
However, that's a *massive* step down in quality. Do we really want that?
Your B) is too vague for me. If we ship with our current photos/contacts/etc bugs, I'll feel embarrassed.
If needed, we can try to define "basic functionality" more clearly. For example:
a) We can say that the tested feature must be in line with the primary goal of the application. For Nautilus, that would probably be managing local files, but not connecting to remote filesystems, or a functional bookmarking system. For Photos, that would be local albums organization (and perhaps viewing remote ones), but not exporting photo thumbnails. For Cheese, that would be recording from your camera, but not applying effects.
b) We can say that functionality which is only available through app menus (as opposed to user-facing buttons) is not basic.
c) We can say that bugs which only occur if you modify default app settings do not qualify.
We can make many clarifications like these, and perhaps they would sometimes help us settle our arguments. At the same time, they can also burn us. And we'd have to decide whether we want to apply the same standards to both the high-profile apps listed in the criterion and the low-profile (all the rest) apps, or if we want to have different standards.
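To make the three clarifications above a bit more concrete, here is a rough sketch of how they could combine into a single yes/no check. This is only an illustration: the `BugReport` fields and the `violates_basic_functionality` function are hypothetical names, nothing like this exists in the blockerbugs app, and a real decision would still need human judgment on what an app's "primary goal" is.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    """Hypothetical summary of a proposed blocker; all fields are assumptions."""
    affects_primary_goal: bool        # (a) the broken feature serves the app's primary goal
    reachable_only_via_menu: bool     # (b) the feature is only available through app menus
    needs_non_default_settings: bool  # (c) the bug only occurs with modified default settings


def violates_basic_functionality(bug: BugReport) -> bool:
    """Return True if the bug would count against "basic functionality"
    under clarifications (a), (b) and (c)."""
    if not bug.affects_primary_goal:
        return False  # (a) not in line with the app's primary goal
    if bug.reachable_only_via_menu:
        return False  # (b) menu-only functionality is not considered basic
    if bug.needs_non_default_settings:
        return False  # (c) only happens with non-default settings
    return True


# Example from (a): local album organization is a primary goal of Photos.
# Assumes the feature is reachable from a user-facing button with default settings.
album_organization_broken = BugReport(True, False, False)

# Example from (a): applying effects is not a primary goal of Cheese.
cheese_effects_broken = BugReport(False, False, False)

print(violates_basic_functionality(album_organization_broken))
print(violates_basic_functionality(cheese_effects_broken))
```

Under these assumptions, a broken album organizer would still count against the criterion while broken Cheese effects would not.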
But, if we keep at least some reduced "basic functionality" requirement for those low-profile apps, I don't think that would help the current situation. A photo organizer which can't organize photos doesn't meet that criterion, whichever way you look at it. A contacts app which duplicates contacts on edit, crashes when you add a new contact quickly, and fails to delete contacts more often than not... doesn't block the release already (however weird that sounds), so again no change.
So as I see it, we can update the basic functionality description, but as long as it affects low-profile apps, these situations will keep happening. Or we remove it completely (at least for low-profile apps), but then accept a massive drop in quality.
3. Problems found which are not regressions should not be blockers. We're just hurting ourselves when we make this our forcing function to get something fixed.
Such a broad rule is a really bad idea. You're probably thinking "this bug was already there, and very few people complained, so why block our next release on it?". Yes, if we include a safeguard "and very few people complained", the proposal starts to sound more reasonable. But imagine that there was a massive disaster in our last release - e.g. a Nautilus data-corruption bug slipped through our fingers, or Anaconda ate hard drives in certain cases, stuff like that. We only found out after the release, when we received a flood of angry bug reports of people leaving Fedora for good, and by your definition... we are **prevented** from blocking our next release on that. That's surely not a good idea?
I think I understand where you're going with this, but it can't be that broad.
I propose that the teams responsible for blocking desktop deliverables keep their own prioritized lists of this kind of problem that the team agrees should be fixed for a good user experience. Not just add to the general queues of bugs or tickets, but specific lists of "application experience issues".
These lists, of course, could include problems which also don't qualify for point #2 but which seem important. Like the Photos app issues.
(This could also extend to "desktop experience issues" rather than just "application". Or for that matter there could be a similar mechanism for non-desktop blocking deliverables.)
I could be convinced either way on having these in the teams' issue trackers in pagure or wherever _or_ having it as more targets in the blockerbugs app. I tend towards the latter: I think it might help with the problem where blockers feel like the only obvious way to get bugs tracked and fixed. But either way, there should be lists!
Well, if there are lists which those teams actually follow and maintain, that would of course be a good thing. A Bugzilla tracker would be the easiest implementation. I'm a bit worried that those teams will get flooded with an avalanche of reports, but that's not really my problem to solve. I'm more than happy to tag e.g. Workstation bugs against the Workstation Bugzilla tracker, to make them aware.
- Desktop application problems discovered during at the last minute should not be blockers.
We already have a rule for last minute blockers, and it applies to everything, not just desktop: https://www.fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Exceptional_ca... I don't understand this sentence.
If the problem is really going to impact a lot of people, it should have been discovered in the beta.
But that's the point. The Photos bug is unlikely to affect many people, because its usage is probably very low. And that's why it wasn't discovered before. Because testers don't use it regularly, and if you spend one minute with it, it might seem "OK" to you, depending on which buttons you click. I wrote my thoughts about this problem in detail here: https://pagure.io/fedora-workstation/issue/304#comment-795150
At the same time, I disagree with the intention of this change. Only Beta-related things get proper testing around Beta, because it's *Beta*, and therefore we focus on *Beta* stuff. Even then, we often miss Beta-blocking bugs and discover them before Final. We can't test Final-related stuff (not deeply, and sometimes not at all), because we simply don't have time for it during Beta. Not to mention that Beta-related bugs sometimes preclude actually testing Final-related things. We can only start properly testing Final once Beta is out. Also, let's not forget that GNOME completely changes around Beta with a new major update. I simply don't see how this "should have been discovered in the beta" could happen in the real world.
(Exception for _new_ regressions, of course.) I don't know how to phrase this in a way that doesn't make Adam sad with me, but maybe something like: Desktop application blockers discovered during the final freeze are automatically waived
I'm sad as well :sad panda:. This would effectively say "we don't care about Desktop". We would ship it horribly broken. Again, if Nautilus eats your documents, it's not a blocker, just because it was discovered after Beta. Why such a broad statement? And why do you single out desktop apps in particular? And why all of them instead of some subset, like the low-profile ones?
And now a bit more technical. How do you want to decide if the bug was present before Beta, but discovered after Beta, or if it appeared after Beta? The nightly composes are only stored on our servers for 2 weeks (!). I already hit this issue this cycle when I could technically figure out when a certain regression started, but I wasn't able to, because the older composes have been already wiped. Another note, this "proof delivery" will double the load on QA, because not just we need to find the bug, we would now also need to prove that it wasn't there before a certain compose. For every proposed blocker.
unless the relevant Spin or Edition team decides otherwise.
All the power to the working groups. I'd happily let them decide about all bugs related to their Edition, and maintain their own release criteria and everything, because that would be way less work for QA :-) But I don't suppose they'd jump in joy about this.
So unless they really want all this responsibility, perhaps it could work the other way around - accepted blockers could be waived by the decision of the relevant team. It's their product after all. Of course, there should be some systematic approach, so that we don't waste much time discussing blockers which are then getting waived. And if there are many waivers in a certain area, the related criteria should be adjusted to reflect that. But in general, I think this approach could work fine.
- Okay, and... bigger: we should aim for more approaches which let us decouple as much as possible from the Release. (My grand hope is that we can release every deliverable on its own schedule, but I also understand the _highly aspirational_ nature of that idea. But...) What if we could just easily ship GNOME Photos from GNOME 41 until a fix is found in the updated one?
The answer could be Flatpak. And honestly in our criteria there's nothing preventing Workstation WG doing just that right now. But technically we're not ready yet, I guess. If this can be done with RPMs as well, let's go ahead. Ubuntu does that all the time.
After reading all these proposed changes, I have to ask - what is the main motive for proposing them? Is it to release F36 already? Is it to prevent future Fedoras from delaying as much as F36? We used to be OK with release slips. Is desktop importance lower than before? Is it the frustration from finding trivial bugs in trivial apps so close to the final release? Something else?
Because depending on what we want to achieve, perhaps there is a better way. With the current proposed changes, I believe that the end result would be less-delayed Fedora with more broken desktop apps. The reason for introducing the basic functionality criterion in the past was, iirc, the bad PR we got in reviews when desktop apps often broke quickly after the reviewer tried to use them. It seems we'd be heading back in that direction with this proposal.
Instead of shipping broken apps, what if we had a conversation about which is better - shipping broken apps or not shipping those apps at all? And I mean this question honestly. Is it better to ship something we *know* is in a broken state, and hopefully issue an update later, or is it better to yank it from the default install (provided it's not crucial for the desktop)? Or delay the release? I'd start with defining our priorities in this way. Is it unthinkable to have a plan like "apps from group X can only delay the release for at most Y weeks, otherwise we'll not ship them by default"? It would also make us re-evaluate whether we really need to ship everything we currently have, including unmaintained apps without any developers, or half-baked apps with just a slight community maintenance, etc. I don't mean to be derogatory to some of those gnome apps. But if those apps are problematic, is the right approach to lower the quality bar for all apps included, or should we rather make some adjustments just for the problematic set?
I understand that the Workstation team wants to have some basic functionality set present on the desktop, and that this is a painful topic for them. And perhaps shipping those apps broken will be decided to be the best option. But it seems we're discussing something completely different here instead.
(I'd also be glad if we could put the toolkit and desktop environment wars behind us, and simply ship the best in class app (let's say the best photo organizer available) with our desktop, whatever the toolkit. It would have an existing userbase, more maintenance and QA, and GNOME folks could focus on great integration (looking close to native, online accounts integration, etc) instead of writing everything from scratch. That would also avoid some of the issues we see. But I don't believe that will happen any time soon).
Kamil QA
On Wed, May 04, 2022 at 02:46:16PM +0200, Kamil Paral wrote:
popularity and reviews noting how polished everything is makes us very much want to build on that. So I understand why this is here, including the expanded "all installed applications" Workstation criteria.
As Adam already noted, it is actually cut down. We used to have higher standards in this area (and we lowered them because we couldn't keep them, especially when KDE ships a bazillion of preinstalled apps).
Okay, fair. I didn't mean "expanded" in a temporal sense, just the "expanded universe" one. But that wasn't clear.
We could have a GNOME Apps Test Day, or perhaps a GNOME Low Profile Apps Test Day. Separating that from GNOME DE basics + settings etc would perhaps motivate people to focus more on those apps and spend more time with them.
I think that would help. More on this coming in my (will write after this one!) reply to Adam's message.
- "Basic functionality" seems scoped too broadly currently. I propose, for the release criteria, we change this to: A) "the app doesn't crash on launch", and B) "the app's behavior does not seem immediately embarrassing with a few minutes of playing around with it".
If you only want to block on app launch and close, let's be honest about it and call it that. The basic functionality requirement can stay for those high-profile apps listed explicitly in the criterion, and the other apps would only be required to launch and close without crashing.
I would be fine with calling it something else.
However, that's a *massive* step down in quality. Do we really want that?
Of course not. But....
Your B) is too vague for me. If we ship with our current photos/contacts/etc bugs, I'll feel embarrassed.
I guess another way to put it is: blocking the release causes huge problems. It means people don't get updates they were looking for. It messes up the schedules of people counting on us. It derails publicity momentum. It means other parts of the project can't get the stuff _they_ have worked so hard on out to users. And more. Of course, shipping something that's broken *also* has negative effects (I don't think I need to argue for that, but can if people want!). Because there are going to be far more things that don't work perfectly than we possibly have resources to fix, it's always a balance.
So it's always a matter of "which is worse?". Right now, we mostly push that out and only (officially) consider it at basically the last minute. I think that for some things, it'd be better to consider that factor earlier.
Does that make sense? I'm not sure of the best way to do that in practice. And, I know we've _always_ worked hard on that balance with the "hybrid" approach to the schedule — I don't mean to discount that.
[things snipped but not because I didn't read them... just trying to not repeat too much...]
- Problems found which are not regressions should not be blockers. We're just hurting ourselves when we make this our forcing function to get something fixed.
Such a broad rule is a really bad idea. You're probably thinking "this bug was already there, and very few people complained, so why block our next release on it?". Yes, if we include a safeguard "and very few people complained", the proposal starts to sound more reasonable. But imagine that there was a massive disaster in our last release - e.g. a Nautilus data-corruption bug slipped through our fingers, or Anaconda ate hard drives in certain cases, stuff like that. We only found out after the release, when we received a flood of angry bug reports of people leaving Fedora for good, and by your definition... we are **prevented** from blocking our next release on that. That's surely not a good idea?
I'm not convinced! If it is at the high end of catastrophic, to reset ourselves we might do something entirely outside of the normal process (like when we did the whole-year cycle for F20). And if it's not that bad — but still important — how does blocking the release for an issue that's already out there _help_? That data-eating Anaconda would still be on the Get Fedora page. Rogue Nautilus still out there.
Obviously we should fix it, but why is _blocking_ the tool? What benefit does it have that other approaches to prioritizing a fix wouldn't?
trackers in pagure or wherever _or_ having it as more targets in the blockerbugs app. I tend towards the latter: I think it might help with the problem where blockers feel like the only obvious way to get bugs tracked and fixed. But either way, there should be lists!
Well, if there are lists which those teams actually follow and maintain, that would of course be a good thing. Bugzilla tracker would be the easiest implementation. I'm a bit worried that those teams will get flooded with an avalanche of reports, but that's not really my problem to solve. I'm more than happy to tag e.g. Workstation bugs against the Workstation bugzilla tracker, to make them aware.
I hope that the teams would be interested. As always, the main problem is triage. I think the fact that we _do_ have a well-defined (and aggressive) process for blocker bug triage is one of the reasons a lot of things end up there.
- Desktop application problems discovered at the last minute should not be blockers.
[much more snipped -- I think the responses below cover what I'd say inline here.]
If the problem is really going to impact a lot of people, it should have been discovered in the beta.
But that's the point. The Photos bug is unlikely to affect many people, because its usage is probably very low. And that's why it wasn't discovered before. Because testers don't use it regularly, and if you spend one minute with it, it might seem "OK" to you, depending on which buttons you click. I wrote my thoughts about this problem in detail here: https://pagure.io/fedora-workstation/issue/304#comment-795150
There's a _lot_ of software with a _lot_ of bugs. When we block for not-so-frequently used Photos, it means that bugfixes and improvements in other software with more users are held up.
At the same time, I disagree with the intention of this change. Only Beta-related things get proper testing around Beta, because it's *Beta*, and therefore we focus on *Beta* stuff. Even then, we often miss Beta-blocking bugs and discover them before Final. We can't test (not deeply, and sometimes at all) Final-related stuff, because we simply don't have time for it during Beta. Not to mention that Beta-related bugs sometimes preclude actually testing Final-related things. We can only start properly testing Final once Beta is out. Also, let's not forget that GNOME completely changes around Beta with a new major update. I simply don't see how this "should have been discovered in the beta" could happen in the real world.
There's a tension, right? At one end, the rough-to-polish direction that we're going towards the release. But coming from the other side: less and less time to fix problems as we get closer. It's hard to reconcile. Pushing the release back really only creates the _illusion_ of more time (unless we're willing to go back to slipping the schedule constantly around the calendar, which I'm really really sure we should not).
(Exception for _new_ regressions, of course.) I don't know how to phrase this in a way that doesn't make Adam sad with me, but maybe something like: Desktop application blockers discovered during the final freeze are automatically waived
I'm sad as well :sad panda:. This would effectively say "we don't care about Desktop". We would ship it horribly broken. Again, if Nautilus eats your documents, it's not a blocker, just because it was discovered after Beta. Why such a broad statement? And why do you single out desktop apps in particular? And why all of them instead of some subset, like the low-profile ones?
What we care about and what we can do are different. I care about desktop.
But anyway, you split off a crucial part of my suggestion:
Desktop application blockers discovered during the final freeze are automatically waived _unless the relevant Spin or Edition team decides otherwise_.
There's a handle to pull if escalating is warranted.
And now a bit more technical. How do you want to decide if the bug was present before Beta, but discovered after Beta, or if it appeared after Beta? The nightly composes are only stored on our servers for 2 weeks (!).
I probably shouldn't use the US Patent system as a basis for anything, but... what the heck. This is why that changed to date of filing. No complicated proof, just date proposed as a blocker. (And again, if it turns out to have been discovered earlier and _should_ be escalated, it could be.)
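To make the "date proposed" idea concrete, here is a rough sketch of the default rule as I read it from this thread. Everything in it is illustrative: the `ProposedBlocker` type, the `is_waived` function, and the field names are made up for this example and are not part of the blockerbugs app or any existing Fedora tooling. How a bug gets flagged as a new regression is deliberately left outside the sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ProposedBlocker:
    """Hypothetical record for a proposed desktop-application blocker."""
    proposed_on: date                # date the bug was proposed as a blocker
    is_new_regression: bool = False  # the "new regressions" exception
    team_escalated: bool = False     # Spin/Edition team "decides otherwise"


def is_waived(bug: ProposedBlocker, final_freeze_start: date) -> bool:
    """Return True if the proposal is waived by default under the sketch rule."""
    if bug.proposed_on < final_freeze_start:
        # Proposed before the final freeze: normal blocker review applies.
        return False
    if bug.is_new_regression or bug.team_escalated:
        # Either exception keeps the bug in normal blocker review.
        return False
    return True
```

The only evidence needed is the date on the blocker proposal, so nobody has to dig through old composes, and the escalation flag is the handle for the relevant team to pull.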
[...]
unless the relevant Spin or Edition team decides otherwise.
All the power to the working groups. I'd happily let them decide about all bugs related to their Edition, and maintain their own release criteria and everything, because that would be way less work for QA :-) But I don't suppose they'd jump in joy about this.
I'd like to push more work there, yeah, and have QA covering the base, the tooling and processes and standards, and acting as consultants. But also, the original plan (which I still believe in!) was to make sure there was a QA team representative on every working group, and presumably that person would be connected with both the WG and the team on decisions like this.
So unless they really want all this responsibility, perhaps it could work the other way around - accepted blockers could be waived by the decision of the relevant team. It's their product after all. Of course, there should be some systematic approach, so that we don't waste much time discussing blockers which are then getting waived. And if there are many waivers in a certain area, the related criteria should be adjusted to reflect that. But in general, I think this approach could work fine.
There are two reasons I like the direction I proposed ('fail open' rather than 'fail safe').
First, it moves the decision earlier; right now, waiving a blocker is very late in the process — that means if it's not waived, it's definitely too late to do anything about it. (Which might particularly be an issue if the people who could implement a fix assume that it will be waived.)
More crucially, it moves the default _pressure_ from "this will ruin the schedule for everyone else so you better fix it" to "oh no! there's a bug in our Edition and we should fix it before the release is out".
- Okay, and... bigger: we should aim for more approaches which let us decouple as much as possible from the Release. (My grand hope is that we can release every deliverable on its own schedule, but I also understand the _highly aspirational_ nature of that idea. But...) What if we could just easily ship GNOME Photos from GNOME 41 until a fix is found in the updated one?
The answer could be Flatpak. And honestly in our criteria there's nothing preventing Workstation WG doing just that right now. But technically we're not ready yet, I guess. If this can be done with RPMs as well, let's go ahead. Ubuntu does that all the time.
I don't think we can easily do it at the rpm-alone level. Too much is integrated. [Mumbles something incomprehensible about modularity, stares into distance.] But I think Flatpak is definitely worth exploring to help here (plus OCI/Podman containers for non-desktop apps).
After reading all these proposed changes, I have to ask - what is the main motive for proposing them? Is it to release F36 already? Is it to prevent future Fedoras from delaying as much as F36? We used to be OK with release slips. Is desktop importance lower than before? Is it the frustration from finding trivial bugs in trivial apps so close to the final release? Something else?
In order: too late for that; yes; I know, and it was bad; absolutely not; yes; and: generally looking at how to improve things for the Edition teams (particularly desktop, obviously), QA, the release as a whole including other Editions and spins impacted, and users.
Because depending on what we want to achieve, perhaps there is a better way. With the current proposed changes, I believe that the end result would be less-delayed Fedora with more broken desktop apps. The reason for introducing the basic functionality criterion in the past was, iirc, the bad PR we got in reviews when desktop apps often broke quickly after the reviewer tried to use them. It seems we'd be heading back in that direction with this proposal.
Instead of shipping broken apps, what if we had a conversation about which is better - shipping broken apps or not shipping those apps at all? And I mean this question honestly. Is it better to ship something we *know* is
Do you mean disable the app so it isn't included in the repo? (If so, what does that do to people who upgrade?) Or at least hidden from Software?
It makes sense to scope the release criteria to "on the media", because we have to draw a line somewhere. But shuffling things across the line by moving them from the default install doesn't _really_ improve quality from a user perspective. I don't have Photos installed on this system, but if I type "photo" in the GNOME Shell search, I'm given the suggestion to install it.
in a broken state, and hopefully issue an update later, or is it better to yank it from the default install (provided it's not crucial for the desktop)? Or delay the release? I'd start with defining our priorities in this way. Is it unthinkable to have a plan like "apps from group X can only delay the release for at most Y weeks, otherwise we'll not ship them by default"? It would also make us re-evaluate whether we really need to ship everything we currently have, including unmaintained apps without any developers, or half-baked apps with just a slight community maintenance, etc. I don't mean to be derogatory to some of those gnome apps. But if those apps are problematic, is the right approach to lower the quality bar for all apps included, or should we rather make some adjustments just for the problematic set?
I do like in general where you're going with this. This is one of the things I wanted to do with the "Rings" proposal — make it more clear what we're able to focus on and what we're not. All of that other software could be available in some way, but clearly separate.