New subject: A modest proposal: Pungi 4 compose process (what we call composes, when we do them, what information we need about them)

20 Feb 2016


Hi folks!
So I've been working lately on revamping the release validation process
for Pungi 4 composes. I've made quite a bit of progress, but I'm now
kind of stuck, because we don't know how the full release cycle is
actually going to work with Pungi 4 composes. There are some questions
that haven't been answered:
* What will the compose IDs be for anything other than Rawhide snapshots?
* When do we compose what kinds of composes?
* What information do we need about the composes and where?
I've asked if there are any plans about this a few times, and the
answer has always been "not yet". So I figured instead of sitting
around waiting, I'd think the issues through and come up with a
proposal!
# tl;dr
(LATER) OK, this email got really long, so here's my tl;dr summary.
Proposed compose ID scheme: (RELEASE)-(DATE).(INDEX).(TYPE), e.g.
24-20160401.0.s (types are SNAPSHOT, CANDIDATE, POSTRELEASE).
Alternatively: type is stored as a separate bit of metadata instead of
/ as well as in the compose ID.
Proposed additional metadata in PDC: 'nominated' (bool, whether a
SNAPSHOT was nominated for manual validation testing or a CANDIDATE
nominated as an RC), 'release' (one of a set of consts, 'ALPHA',
'BETA', 'FINAL', 'POSTRELEASE', indicating that a CANDIDATE was
released as that milestone).
Rules: all 'override' packages go in all composes, we never build more
than one compose type for one release at a time, we switch from
SNAPSHOT to CANDIDATE when all blockers are addressed and back to
SNAPSHOT after the milestone release, we switch to POSTRELEASE after
Final.
Location: we can either have PDC be the canonical store of location
information, or just have some kind of engine (whether it's still my
'fedfind' or something else) which can work out where to find a given
compose based on all the metadata mentioned in this proposal.
# Full proposal in Epic AdamW Styleee
As a pre-note: I'm really only concerning myself with the "Official
Release Process Composes" here, the composes we consider part of the
(still) more-or-less monolithic 'release cycle'. I didn't try to
think of a design that accounts for separate release cycles per
image or product or 'variant' or whatever (because jeez, this is hard
enough) and I didn't include any possible side/alternative composes
done for testing or whatever. At this point in time I don't care what
anyone wants to call or do with those, so long as I can ask PDC for
a list of the 'official' composes and just get those.
Here's my sort of sample of an imaginary Fedora 24 release cycle with
Pungi 4 composes:
DATE            SNAPSHOT            CANDIDATE                       POSTRELEASE     MILESTONE
2016-02-28      Rawhide-20160228.0.s
== BRANCH POINT ==
2016-03-01      24-20160301.0.s
                24-20160301.1.s
2016-03-02      24-20160302.0.s
== ALPHA FREEZE ==
2016-03-03      24-20160303.0.s
2016-03-08      24-20160308.0.s
== ALL BLOCKERS ADDRESSED: SWITCH TO RCs ==
2016-03-08                          24-20160308.1.c (24 Alpha RC1)
2016-03-08                          24-20160308.2.c (24 Alpha RC2)                  = 24 Alpha
2016-03-09      24-20160309.0.s
== ALPHA UPDATES PUSHED STABLE ==
2016-03-10      24-20160310.0.s
== BETA FREEZE ==
== ALL BLOCKERS ADDRESSED ==
2016-04-09                          24-20160409.0.c (24 Beta RC1)                   = 24 Beta
2016-04-11      24-20160411.0.s
== FINAL FREEZE ==
== ALL BLOCKERS ADDRESSED ==
2016-05-10                          24-20160510.0.c (24 Final RC1)
2016-05-11                          24-20160511.0.c (24 Final RC2)                  = 24 Final
== FINAL RELEASE ==
2016-05-25                                                          24-20160525.0.p
2016-05-26                                                          24-20160526.0.p = 24 Postrelease ("2 Week" or whatever)
Obviously, that's just extracts to highlight the interesting points. I
mapped this out a few different ways, but this is the one I liked best.
The basic ideas here are pretty simple. The naming scheme for composes
is:
(RELEASE)-(DATE).(INDEX).(TYPE)
The compose 'types' are SNAPSHOT, CANDIDATE, and POSTRELEASE. Their
shortenings for the compose IDs are 's', 'c' and 'p'. (These don't
sort "correctly" alphabetically, but that shouldn't be a problem). This
is similar to the scheme currently used for snapshots, but with a type
identifier after the index number (I don't know if '.n.' in the current
IDs is supposed to indicate "nightly" or "number" or what, but if we
want to indicate the type in the compose ID, it makes much more sense
to have it after the index than before).
Importantly, the compose IDs for a given release sort into their
release order. The only potential issue is if we have more than 9 of a
compose type on a day. To deal with that we could just make the index
two digits instead of one, or it's relatively easy to do a numeric sort
instead (just pull out the date and index fields and compare them as
numbers rather than as strings).
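To make the ID scheme and the sorting concern concrete, here is a minimal sketch of a parser and a sort key. The regular expression and the names (COMPOSE_ID_RE, ComposeId, parse_compose_id, release_order_key) are illustrative assumptions, not part of Pungi, productmd or PDC. It sorts on a (date, index) pair rather than on all digits concatenated, since concatenation would put 20160308.10 after 20160309.0.

```python
import re
from dataclasses import dataclass

# Assumed pattern for the proposed (RELEASE)-(DATE).(INDEX).(TYPE) scheme.
COMPOSE_ID_RE = re.compile(
    r"^(?P<release>Rawhide|\d+)-(?P<date>\d{8})\.(?P<index>\d+)\.(?P<type>[scp])$"
)


@dataclass(frozen=True)
class ComposeId:
    release: str
    date: str
    index: int
    type: str


def parse_compose_id(cid: str) -> ComposeId:
    """Split a compose ID into its four fields, or raise ValueError."""
    m = COMPOSE_ID_RE.match(cid)
    if m is None:
        raise ValueError(f"not a valid compose ID: {cid!r}")
    return ComposeId(m["release"], m["date"], int(m["index"]), m["type"])


def release_order_key(cid: str) -> tuple:
    """Sort key for composes of one release: date first, then numeric index."""
    parsed = parse_compose_id(cid)
    return (parsed.date, parsed.index)


ids = ["24-20160309.0.s", "24-20160308.10.c", "24-20160308.2.c", "24-20160308.0.s"]
ordered = sorted(ids, key=release_order_key)
print(ordered)
```

A plain string sort of the same list would place the hypothetical 24-20160308.10.c before 24-20160308.2.c, which is exactly the "more than 9 in a day" problem.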
Note that we don't really *need* to indicate the compose 'type' in the
ID. We could instead just have it in the compose metadata. I don't care
strongly either way, though I think it's maybe slightly more convenient
to have it in the ID. Note it should probably be specified separately
at least in PDC even if it's also indicated in the compose ID; it's
going to be important to be able to say "I want to find all release
'foo' composes of type 'bar'".
Rawhide is a release (we do not assign release numbers to Rawhide
composes). This is something Dennis and I agree on, and we convinced the
Pungi / productmd folks of it.
We do not ever do two types of compose simultaneously; we're not doing
SNAPSHOT composes while we do CANDIDATE composes. At first I kinda
envisaged this happening, but I don't think it's *necessary*, and it
makes ordering difficult if it happens. We *always* increment the index
when doing another compose on the same day, even if we're switching type
(note we go from 20160308.0.s to 20160308.1.c).
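The "always increment the index, even across types" rule can be sketched as a small helper. This is only an illustration under the proposed scheme; next_compose_id and its parameters are hypothetical names, and the list of existing IDs stands in for whatever the compose tooling would actually consult.

```python
def next_compose_id(release: str, date: str, compose_type: str, existing: list) -> str:
    """Return the ID for the next compose of `release` on `date`.

    The index is shared across compose types, so a CANDIDATE built after a
    SNAPSHOT on the same day gets the next index rather than restarting at 0.
    """
    prefix = f"{release}-{date}."
    # Collect the indices already used for this release on this day.
    used = [
        int(cid[len(prefix):].split(".")[0])
        for cid in existing
        if cid.startswith(prefix)
    ]
    index = max(used) + 1 if used else 0
    return f"{prefix}{index}.{compose_type}"


existing = ["24-20160303.0.s", "24-20160308.0.s"]
print(next_compose_id("24", "20160308", "c", existing))
```

With the example data this reproduces the switch from 24-20160308.0.s to 24-20160308.1.c shown in the mockup.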
We switch from 'SNAPSHOT' to 'CANDIDATE' composes for each milestone
when all blocker bugs are addressed, just as we do now. We then switch
back to 'SNAPSHOT' composes after the release of the milestone.
We switch to 'POSTRELEASE' composes after final release, of course. This
is my attempt to include the current "2 Week Atomic" system in the
process, and I suspect we're only likely to have more desire for "post
release" composes in future.
This is not visible in the mockup, but: the only difference between
'SNAPSHOT' and 'CANDIDATE' composes besides the identifier is that
'CANDIDATE' composes have any switches that we currently flip for RCs
applied. If there aren't actually any such settings besides the ones
specific to Final (where we disable all the "this is a prerelease"
warnings), we could potentially even only have 'SNAPSHOT' composes up
until Final (Alpha and Beta could simply be blessed 'SNAPSHOT's).
Also not visible in the mockup: "compose override" packages are *always
included in all types of compose*. This is the concept Dennis and I
came up with for handling blocker / freeze exception fixes; it's just a
more formal version of the current process, really, whereby we mark
packages that should be pulled into composes. At present these are only
pulled into TCs and RCs, they never appear in the old-style "nightly
composes". I believe we should *always* pull them in; it makes the
system a good deal simpler.
Another important topic is what data we need to store somewhere to aid
things that need to interact with the compose process. I'm assuming we
are going to store all necessary metadata beyond what can be a part of
the compose metadata itself in PDC. I think we can make what PDC needs
to store quite minimal. It only needs a couple of extra attributes
beyond the compose ID and the 'type' (which should be a searchable
attribute in PDC even if it's indicated in the compose ID):
'nominated' - bool
'release' - const (from 'ALPHA', 'BETA', 'FINAL', 'POSTRELEASE')
'nominated' is definitely needed to indicate that a snapshot compose was
"nominated" for manual release validation testing. This is something we
already do; at present the wiki is the canonical source of information
on what composes have been nominated for testing, but I think this is
silly. It should be in PDC.
There's also another issue we could use 'nominated' to answer. That is:
when exactly do we build 'CANDIDATES'? Do we follow the current process
and build them only on manual request, meaning that effectively every
'CANDIDATE' is equivalent to a current RC? Or do we build a 'CANDIDATE',
say, *every time the "compose overrides" set changes*, and then
'nominate' RCs from the larger set of CANDIDATEs? If we want to do that,
then the 'nominated' attribute for CANDIDATE composes would indicate
which were selected as RCs.
'release' indicates that a CANDIDATE (or POSTRELEASE) compose was
promoted as a public 'release'. This is something the compose metadata
cannot possibly reflect, since we do not know it when the compose is
created. PDC is the logical source of such information. The set of valid
values for this attribute can be made as large as we want if we start
doing stuff like staggering "releases" for different variants, or doing
different types of post-stable "release" than "two week atomic".
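As a rough picture of the per-compose record this implies, here is a sketch using a dataclass and two enums. It is not PDC's actual data model; ComposeType, Release and ComposeRecord are assumed names, and the consistency check between the ID suffix and the stored type is just one possible way to handle having the type in both places.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ComposeType(Enum):
    SNAPSHOT = "s"
    CANDIDATE = "c"
    POSTRELEASE = "p"


class Release(Enum):
    ALPHA = "ALPHA"
    BETA = "BETA"
    FINAL = "FINAL"
    POSTRELEASE = "POSTRELEASE"


@dataclass
class ComposeRecord:
    compose_id: str
    type: ComposeType
    nominated: bool = False             # picked for manual validation / as an RC
    release: Optional[Release] = None   # set later, if the compose is released

    def __post_init__(self):
        # If the type is also encoded in the ID, the two should agree.
        if not self.compose_id.endswith("." + self.type.value):
            raise ValueError(
                f"{self.compose_id} does not end in .{self.type.value}"
            )


alpha = ComposeRecord("24-20160308.2.c", ComposeType.CANDIDATE,
                      nominated=True, release=Release.ALPHA)
print(alpha)
```

Keeping 'release' optional reflects the point that it cannot be known when the compose is created and is only filled in afterwards.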
That's pretty much the entire system. I had thought about things like
storing compose "identifiers" like RC2, RC3 etc. in the compose metadata
or in PDC directly, and stuff like requiring PDC to construct and store
"sequences" of releases. But with this design, I don't *think* any of
that is necessary. I believe the constraints specified in the proposal
and the information in the compose IDs and the extra PDC fields are
actually sufficient for all the tooling purposes I can think of. The idea
is that tools can simply query PDC for groups of composes and apply
logic to construct certain ideas.
For instance: say we decided we're going to build CANDIDATEs for every
change to "compose overrides", and we now want to "nominate" an RC. We
can just ask PDC for all CANDIDATEs for the current release which have
been "nominated" so far, and it's trivial to produce the sequence of RC
names from that and determine what ours should be. (To spot the
milestone changes you just look for the composes which also have the
'release' attribute). So the releng tool to stage the CANDIDATE as an RC
and the QA tool to create wiki pages can easily produce a nice "RC name"
for humans, if we want to do that.
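The RC-naming logic described here can be sketched in a few lines. This assumes the list of CANDIDATE records has already been fetched from PDC and sorted into compose order; the dict keys, the rc_names function and the un-nominated 24-20160408.0.c entry are all made up for the example.

```python
MILESTONES = ["Alpha", "Beta", "Final"]


def rc_names(release, candidates):
    """Map nominated CANDIDATE compose IDs to human-friendly RC names.

    `candidates` is a list of dicts with 'compose_id', 'nominated' and
    'release' keys, already sorted into compose order.
    """
    names = {}
    milestone = 0  # index into MILESTONES
    rc = 0         # RC counter within the current milestone
    for compose in candidates:
        if milestone >= len(MILESTONES):
            break  # nothing left to name after Final
        if not compose["nominated"]:
            continue
        rc += 1
        names[compose["compose_id"]] = f"{release} {MILESTONES[milestone]} RC{rc}"
        if compose["release"] is not None:
            # This RC was released: later RCs belong to the next milestone.
            milestone += 1
            rc = 0
    return names


candidates = [
    {"compose_id": "24-20160308.1.c", "nominated": True, "release": None},
    {"compose_id": "24-20160308.2.c", "nominated": True, "release": "ALPHA"},
    {"compose_id": "24-20160408.0.c", "nominated": False, "release": None},
    {"compose_id": "24-20160409.0.c", "nominated": True, "release": "BETA"},
    {"compose_id": "24-20160510.0.c", "nominated": True, "release": None},
]
names = rc_names("24", candidates)
print(names)
```

The name for a newly nominated compose is then just whatever this function yields for it once it is appended to the list with nominated set.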
You can use a similar approach for doing various "previous release"
comparisons or data analysis across a series of composes; all useful
series can, I think, be derived from the attributes suggested above.
Whether we want to use "alternative" names like '24 Final RC2' at all,
or just always use the real compose IDs, is a question, but not one we
need to settle here.
Similarly tools like fedfind that want to let the user do stuff like
"find me Fedora 24" can translate easily - just ask PDC for the
Fedora 24 compose with the 'release' attribute "FINAL".
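A "find me Fedora 24" lookup along these lines might look like the sketch below. The in-memory list stands in for a PDC query (no real PDC API is used here), and find_released_compose and the dict keys are hypothetical.

```python
def find_released_compose(composes, release, milestone):
    """Return the ID of the compose of `release` released as `milestone`.

    `composes` is a list of dicts with 'compose_id' and 'release' keys.
    Returns None if no compose matches.
    """
    matches = [
        c["compose_id"]
        for c in composes
        if c["compose_id"].split("-", 1)[0] == release
        and c.get("release") == milestone
    ]
    if len(matches) > 1:
        # Only one compose should ever be released as a given milestone.
        raise LookupError(f"several composes released as {release} {milestone}")
    return matches[0] if matches else None


composes = [
    {"compose_id": "24-20160308.2.c", "release": "ALPHA"},
    {"compose_id": "24-20160409.0.c", "release": "BETA"},
    {"compose_id": "24-20160510.0.c", "release": None},
    {"compose_id": "24-20160511.0.c", "release": "FINAL"},
]
print(find_released_compose(composes, "24", "FINAL"))
```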
The final question is the question of location. I don't know for sure
what the plans are here, but my guess is this:
1. All composes will land on one server first of all - kojipkgs or
wherever - in a location that can be determined based on their compose
ID.
2. SNAPSHOT composes likely will never be found anywhere else.
3. CANDIDATE composes may be staged to two other places: alt (as RCs)
and the public mirrors (as releases).
4. POSTRELEASE composes may be staged to one other place: the public
mirrors (if blessed as 'releases', whatever we mean by that at any given
time). I guess we may wind up having different public locations for
different types of release.
I would *like* it if we built the necessary bits such that whenever a
release is staged somewhere, that information is transmitted to PDC, so
that you can always just ask PDC "what is the canonical location of
this compose at present?" and it would give you an answer something like
"/fedora/linux/releases/24 in the mirror tree", or an absolute location
for alt or kojipkgs. But we *could*, I guess, just have tools keep doing
what fedfind does for this: know "the rules" about where to go and find
a compose with particular attributes. That's not the ickiest bit of
fedfind, and if I have to keep maintaining it (or we have to build some
extra service that knows those rules and gives answers on request), it's
not the end of the world.
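For comparison, the "tools know the rules" option could be sketched like this, following the guessed staging rules in the numbered list above. The location labels are placeholders rather than real paths, possible_locations is a made-up name, and treating 'nominated' as the marker for "staged to alt as an RC" is an assumption.

```python
def possible_locations(compose_type, nominated=False, release=None):
    """List the places a compose might be found, per the guessed rules.

    Labels ('kojipkgs', 'alt', 'mirrors') are placeholders, not real paths.
    """
    # Every compose lands on the initial compose server first.
    locations = ["kojipkgs"]
    if compose_type == "CANDIDATE":
        if nominated:
            locations.append("alt")      # staged as an RC
        if release is not None:
            locations.append("mirrors")  # staged as a milestone release
    elif compose_type == "POSTRELEASE" and release is not None:
        locations.append("mirrors")      # blessed as a post-release
    return locations


print(possible_locations("CANDIDATE", nominated=True, release="FINAL"))
```

If PDC stored the canonical location instead, a function like this would shrink to a single lookup, which is the main argument for having staging report back to PDC.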
Whew, well that wound up longer than I expected, but I think the core
of the proposal is quite simple! Thoughts?
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net