Hi, folks. So, our current set of storage validation tests is just a grab-bag of stuff we've held over from oldUI, added and patched piecemeal; it's not very coherent or consistent, and it doesn't come close to exercising all of the storage functionality in the installer. We wind up sort of inventing test plans, particularly for custom partitioning, as we go along, with the consequence that we're not sure what we're going to test, what's really important to test when, and so on.
I've made a few abortive tries at re-doing the storage tests and basically given up because it's just a hideous thing to try and cover, but I thought while I'm still on a momentum roll from F20 and remember some of the issues that came up during F20 validation, I'd take another cut at it.
Here's what I came up with:
https://fedoraproject.org/wiki/User:Adamwill/Draft_storage_matrix
The proposal is to separate out storage testing from the 'installation testing' matrix as its own matrix, because I think we can get further with flexible columns, and it's such a big area the installation matrix gets pretty unwieldy with all the storage tests stuffed into it.
Remember, this is *extremely* rough. It may be that we don't think my organizational approach here is right at all and we should tear it up and start over. But I thought I'd throw it out there for comments. Most of the tests, obviously, don't exist yet; they'd be fairly trivial to write. I think the hard part of the problem here is deciding what we want to test, and how to organize it.
It's an extremely difficult area; there are so many different variables to storage configuration that it's utterly impractical to try and cover every possible permutation even assuming the user uses the interface perfectly. What I've tried to do is come up with something which would exercise most of the areas we really seem to want to care about, without being utterly unwieldy.
It is still very large, that's probably the first thing you'd notice about it. I doubt we could run this on every build. What I think would be a nice complement is a somewhat improved version of testcase_stats - http://testdays.qa.fedoraproject.org/testcase_stats/ - which could track both dimensions of each matrix, so we could try to strategically ensure every spot on the table is covered at least once per milestone or once per release, say.
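As a sketch of what that kind of two-dimensional tracking could look like (the (test, column, compose) input format and all names here are invented for illustration, not the real testcase_stats code):

```python
# Rough sketch: find matrix cells whose most recent result predates the
# current milestone. The tuple format and all names are made up for
# illustration; the real testcase_stats works from wiki page histories.
from collections import defaultdict

def stale_cells(results, current_milestone):
    """results: iterable of (test, column, compose) tuples,
    e.g. ("guided_empty", "x86 BIOS", "Beta TC1")."""
    seen = defaultdict(list)
    for test, column, compose in results:
        seen[(test, column)].append(compose)
    return sorted(cell for cell, composes in seen.items()
                  if not any(c.startswith(current_milestone) for c in composes))

runs = [("guided_empty", "x86 BIOS", "Beta TC1"),
        ("guided_empty", "ARM", "Alpha RC2")]
print(stale_cells(runs, "Beta"))  # -> [('guided_empty', 'ARM')]
```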
Here's a quick key to the names of the tests I've made up which may not be self-explanatory; yell if you need explanations of any of the others:
-----
guided_multi_empty - the 'multi' means multiple disks, this is checking we correctly autopart to multiple disks.
In the custom tests, 'auto' means 'using the autopart mechanism inside custom partitioning' - the blue text that lets you automatically create a set of volumes.
existing_retain_home - this is the classic 'install over an existing Fedora install, retain the /home partition' trick.
existing_precreated - this would be a test for running the installer with a set of empty volumes that you just wanted to assign mount points to.
add_to_container - add a new volume to free space within an existing container.
assign_* tests - these are for specifying that a given partition or container must be on a particular disk.
help_text - just checks the 'help' screen works.
small_disks - this would be a test where you check anaconda correctly refuses to install to a disk that's too small.
cancel_encryption - this would be to check what happens if you cancel out of the 'enter a passphrase' dialog.
shrink_maximum - try to shrink a partition to the largest possible size (right-hand end of the slider)
shrink_minimum - shrink a partition as small as possible (left-hand end of the slider)
shrink_no_size_change - set action for a partition to 'shrink' but don't actually change its size
shrink_unusual_sizes - this would be for the bug we discovered in f20, test the shrinker handles partitions with 'weird' sizes
shrink_change_action - this is just for changing the 'action' in shrink a bunch of times and making sure it doesn't explode
multiple_trips - run through Installation Destination more than once
custom_resize_no_size - just clear the 'desired size' field for a partition and hit 'update settings'
custom_resize_invalid_unit - try entering something like '30 ZX' in the desired size field
custom_resize_return - change the desired size and then change it back to the original value
custom_invalid_mount_point - try entering a mount point name that is not allowed (invalid characters, spaces or something)
custom_invalid_filesystem - try putting a system partition on vfat or bios boot or swap or something
----
Please, absolutely all comments, suggestions, alternative proposals, flames etc etc welcome. I'm sure we could improve this proposal or do better, but I think we have to try and do _something_ better than our current tests. And I think drawing up a table like this, if nothing else, illustrates what a hard task this is...
On Fri, 2013-12-13 at 21:04 -0800, Adam Williamson wrote:
It is still very large, that's probably the first thing you'd notice about it.
I counted, for fun: even excluding the 'sanity checks', it contains 101 tests.
On Dec 13, 2013, at 10:07 PM, Adam Williamson awilliam@redhat.com wrote:
I counted, for fun: even excluding the 'sanity checks', it contains 101 tests.
In my opinion, every one of those tests requires a feature owner. If no one volunteers, if a hand off isn't made, the functionality for the feature represented by the sanity check shall be removed from the next version of Fedora.
Chris Murphy
On Dec 13, 2013, at 11:44 PM, Chris Murphy lists@colorremedies.com wrote:
In my opinion, every one of those tests requires a feature owner. If no one volunteers, if a hand off isn't made, the functionality for the feature represented by the sanity check shall be removed from the next version of Fedora.
I note that only two are final release level. How is it that so much instability and change still exists after Beta, with so many anaconda blockers, when only two out of a significant pile of tests are final-release-level tests?
Is it possible to freeze the installer from anything approaching new functionality after alpha? Consequences?
Chris Murphy
On Sat, 2013-12-14 at 00:06 -0700, Chris Murphy wrote:
In my opinion, every one of those tests requires a feature owner. If no one volunteers, if a hand off isn't made, the functionality for the feature represented by the sanity check shall be removed from the next version of Fedora.
Do you mean someone who is responsible for development of the feature, or testing it?
Right now I'm simply trying to figure out a vaguely practical approach for testing what we can of the installer's storage functions. That's really all I'm shooting for. This is one possible approach, there are many others. I mean, prior to newUI, we placed a _much_ lower emphasis on custom partitioning.
I note that only two are final release level. How is it that so much instability and change still exists after Beta, with so many anaconda blockers, when only two out of a significant pile of tests are final-release-level tests?
Oh, sorry, I forgot to note: I gave up on the release levels after the first table or so, I figured we could work those out later. A lot more would be final.
Is it possible to freeze the installer from anything approaching new functionality after alpha? Consequences?
I believe viking_ice has proposed something like this before:
https://www.redhat.com/archives/anaconda-devel-list/2013-October/msg00005.ht...
On Dec 14, 2013, at 12:48 AM, Adam Williamson awilliam@redhat.com wrote:
Do you mean someone who is responsible for development of the feature, or testing it?
Testing it.
Right now I'm simply trying to figure out a vaguely practical approach for testing what we can of the installer's storage functions. That's really all I'm shooting for.
I understand that. I'm suggesting an approach that ties functionality retention to community interest. If we can't recruit even temp "QA people" to adopt a test case, then maybe the community doesn't really value the functions indicated by those test cases.
For example, the iSCSI test case. It seemed like pretty much no one in QA really cared about that functionality, as they didn't depend on or use it themselves. It was just a test case to them. So where are the people who actually want that function to work?
Another example is LVM Thin P. I want to know where the feature's owners have been this whole time, and how it is they didn't test RC1 to see if their own feature, given prime real estate in the installer, was working. On that basis alone I'd say LVM Thin P should be pulled due to lack of community interest, including apparently lack of interest by the feature's own owners.
This is one possible approach, there are many others. I mean, prior to newUI, we placed a _much_ lower emphasis on custom partitioning.
Yes, well you know how I feel about manual partitioning. The test cases are necessary to qualify the function works sufficiently well for release. But to execute the test cases requires either people or some magical automation.
Chris Murphy
On Sat, 2013-12-14 at 10:56 -0700, Chris Murphy wrote:
I understand that. I'm suggesting an approach that ties functionality retention to community interest. If we can't recruit even temp "QA people" to adopt a test case, then maybe the community doesn't really value the functions indicated by those test cases.
Personally I don't think that approach really works out; what are we going to do, have a big database of who 'owns' what test at any given time? You have to file paperwork to keep tests in the matrix? Who's going to own the process of tracking who owns which tests?
I think tests with difficult requirements are something we have to deal with, but I don't think having 'feature owners' is the way to go, personally.
For example the iSCSI test case. It seemed pretty much no one in QA really cared about that functionality, as they didn't depend on or use it themselves. It was just a test case to them. So where are the people who actually want that function to work?
I was perfectly willing to test this one, actually, only it turns out my iSCSI target has some kind of weird issue that others don't. So I can't, practically, at the moment.
I can see how your approach kind of feels like it makes sense if you think of every little attribute of the installer as a 'feature', but that itself doesn't quite work, for me. I mean, 'install into free space' isn't really a 'feature', or at least I don't see that it gets us anywhere to think of it as one. What does it mean to be the 'feature owner' for the 'install onto an empty disk' 'feature'?
Another example is LVM Thin P. I want to know where the feature's owners have been this whole time, and how it is they didn't test RC1 to see if their own feature, given prime real estate in the installer, was working. On that basis alone I'd say LVM Thin P should be pulled due to lack of community interest, including apparently lack of interest by the feature's own owners.
I'm honestly willing to cut them some slack here, given that RC1 existed for *one whole day* before Go/No-Go.
One thing I should probably unpack explicitly here is that I'm worried that the way we've done releases the last few cycles is becoming the New Normal, especially for people who've mostly been involved with validation in the last few releases. I don't think it's a good thing at all that we've got 'accustomed' to spinning the RC we wind up releasing about 16 hours before we sign off on its release; we only really started doing that a lot around F18, and it is not at all the optimal way to do things. It is much better if we have the RC we're planning to ship built for, like, 3-4 days, to give us a chance to find issues in it that aren't immediately screamingly obvious from the validation matrices. Building the release image late on Wednesday then doing go/no-go on Thursday is absolutely not how we really _want_ to be doing things. People outside QA aren't reading the test@ list 24/7 ready to jump on a new RC, I don't think it's entirely realistic to expect every Change owner to catch the final RC in a 16-hour window and test their Change.
thinp support as a feature is actually owned by dlehman - https://fedoraproject.org/wiki/Changes/InstallerLVMThinProvisioningSupport - who I expect was taking a well-earned break during the very short window we were testing RC1.
This is one possible approach, there are many others. I mean, prior to newUI, we placed a _much_ lower emphasis on custom partitioning.
Yes, well you know how I feel about manual partitioning. The test cases are necessary to qualify the function works sufficiently well for release. But to execute the test cases requires either people or some magical automation.
The anaconda team is currently working on integrating CI into their development process, and may come to us for help with that later in the process (to help write test cases). I think that will help quite a lot; we could probably do much better 'sanity checking' with automated tests to stress obvious corner cases like setting things to invalid values, repeatedly changing values and so on. I would expect we would also be able to cover a lot of the cases in that matrix at least in non-UI form. I think there are always likely to be bugs in the UI stuff that only become apparent when you run the full interactive installer and click on stuff, and I personally am fairly sceptical about the practicality of automated interactive testing, so I think we'll need to maintain some kind of set of manual interactive partitioning tests for the foreseeable future. But I'm definitely hoping this CI project works out, as I think it will take a lot of the load, outside of interactive-specific bugs, off of us.
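As a sketch of what such a non-UI sanity check could look like (parse_size here is a made-up stand-in, not the real storage library API; the cases mirror the custom_resize_* tests from the draft):

```python
# Parametrized checks that throw invalid values at storage code directly,
# no installer UI needed. parse_size is a hypothetical stand-in.
import pytest

def parse_size(text):
    """Stand-in size parser: accepts e.g. '30 GB', rejects junk."""
    value, _, unit = text.strip().partition(" ")
    units = {"MB": 1, "GB": 1024, "TB": 1024 ** 2}
    if not value or unit not in units:
        raise ValueError("invalid size: %r" % text)
    return float(value) * units[unit]

@pytest.mark.parametrize("bad", ["", "  ", "30 ZX", "GB 30"])
def test_invalid_sizes_rejected(bad):
    # covers custom_resize_no_size / custom_resize_invalid_unit style cases
    with pytest.raises(ValueError):
        parse_size(bad)

def test_resize_return_trip():
    # custom_resize_return: change a value and change it back, same result
    assert parse_size("30 GB") == parse_size("30 GB")
```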
I've made a few abortive tries at re-doing the storage tests and basically given up because it's just a hideous thing to try and cover, but I thought while I'm still on a momentum roll from F20 and remember some of the issues that came up during F20 validation, I'd take another cut at it.
Here's what I came up with:
https://fedoraproject.org/wiki/User:Adamwill/Draft_storage_matrix
When I first saw this, my reaction was "Holy cow!"
After a while... I think it's doable. But... we need to take a completely different approach.
As you say, there is no way we could test this on a regular basis (every TC). But we could try a different approach, one that I have been considering for a long time. You mention testcase_stats could help if it tracked all dimensions. That's quite hard to do, and even harder to visualize afterwards, but we could achieve the same as part of our editing process without needing any tools at all. Instead of just providing {{result|pass|kparal}} in the fields and then wiping the matrix clean on every new TC, we could input something like {{result|pass|Beta TC1 kparal}} and leave the matrix on a separate static page (no cleanups).
Once in a while it could be helpful to prune old results. Let's say I have the following in a single cell: {{result|pass|Alpha RC2 adamw}} {{result|pass|Beta TC1 kparal}} {{result|pass|Beta TC3 mkrizek}}
Now if you want to put in yet another pass from Beta RC1, you might just wipe out the previous results (or leave the most recent one), because they add no further value. Many of your proposed test cases are so specific that we can be fairly sure a single walk-through guarantees it works correctly (this doesn't apply to all of them, of course). We could add some guidelines to the wiki page, or just prune the matrix from time to time if it becomes bloated (however, can you imagine _that_ matrix bloated with results? I can't, except in some very specific, often-tested test cases. And people probably won't bother to add yet another pass once there are enough results already).
With this approach, we clearly see what was tested and when it was last tested. It also encourages people to test blank spaces, which is not the case for our usual matrices - without the help of testcase_stats and some serious investigation (which I suppose hardly anyone does), people just blindly pick test cases and complete them, often wasting their time on areas which were more than sufficiently tested in many previous composes. This "timestamping" approach works much better in that respect. I'm a bit afraid of clutter (some cells just growing too big), but I think it should be manageable as described above.
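As a sketch of that pruning guideline, using the example cell above (a throwaway script that naively assumes entries appear oldest-first; real cells would need smarter compose comparison):

```python
# Keep only the most recent {{result}} entries in one matrix cell.
import re

RESULT = re.compile(r"\{\{result\|(\w+)\|([^}]*)\}\}")

def prune_cell(cell, keep=1):
    """Keep only the last `keep` entries (assumed to be the newest)."""
    entries = ["{{result|%s|%s}}" % m.groups() for m in RESULT.finditer(cell)]
    return " ".join(entries[-keep:])

cell = ("{{result|pass|Alpha RC2 adamw}} "
        "{{result|pass|Beta TC1 kparal}} "
        "{{result|pass|Beta TC3 mkrizek}}")
print(prune_cell(cell))  # -> {{result|pass|Beta TC3 mkrizek}}
```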
What do you think?
A few extra notes:
1. Let's get rid of QA:Testcase_install_to_SATA_device and QA:Testcase_install_to_PATA_device. Seriously, when was the last time we encountered a bug in either the SATA or PATA driver? We need to get rid of all unnecessary cruft. People would notice very soon if it was not working; we don't need a test case for that. For the same reason we don't need a test case for monitors or keyboards or whatever - those are things automatically spotted if broken.
2. Current Go/No-Go requirements state that all QA matrices must be filled out. If we used this new "timestamping" approach combined with extremely detailed installation matrices (as you proposed), we would not be able to satisfy that. But we would be able to say that e.g. this feature worked reliably two weeks ago. Also, since there are so many combinations in the matrix (and this is just a first draft), it's very likely that there will be a lot of blank spaces. We can't simply test it all; this is just a tool to make us more efficient and to better track what was done. So our Go/No-Go requirement will likely need to be adjusted if we go this route.
Btw, thanks for moving all of this forward. Creating such proposals is time-consuming and anything-but-fun work.
On Fri, 2013-12-13 at 21:04 -0800, Adam Williamson wrote:
Here's what I came up with:
https://fedoraproject.org/wiki/User:Adamwill/Draft_storage_matrix
So I think now's a good time to get back to this: we now know, I think, that for F21 we will have no filesystem type choice on the guided path. We do have two different defaults for the Workstation and Server products, though.
I need to read back over all the feedback this got when I first put it up, but for now, I looked back over the draft, and I think one small change is all it actually needs, but the implications change:
https://fedoraproject.org/wiki/User:Adamwill/Draft_storage_matrix
What I've changed is quite simple. In the Guided Installation matrix, the result columns used to be for different filesystems. Now they're for different platforms - x86 BIOS, x86 UEFI, and ARM. The fact that we don't have to worry about the filesystem choice makes this rather nicer, I think.
We would probably want to duplicate the guided matrices for Workstation and Server.
I think it might be viable right now to replace the current storage tests in https://fedoraproject.org/wiki/QA:Fedora_20_Install_Results_Template with the simpler ones from the draft (for F21) - we'd have to think about the precise layout, but I think it possibly works as a basic idea.
What does everyone think? Thanks!
On 03/13/2014 07:50 PM, Adam Williamson wrote:
I think it might be viable right now to replace the current storage tests in https://fedoraproject.org/wiki/QA:Fedora_20_Install_Results_Template with the simpler ones from the draft (for F21) - we'd have to think about the precise layout, but I think it possibly works as a basic idea.
What does everyone think? Thanks!
Will the non-blocking SoaS test cases be included? [1]
[1] http://wiki.sugarlabs.org/go/Fedora/Sugar_test_cases
Tom Gilliard satellit on IRC
On Thu, 2014-03-13 at 23:36 -0700, Thomas Gilliard (satellit) wrote:
Will the non-blocking SoaS test cases be included? [1]
This is just about storage testing, for now.
On Fri, 2014-03-14 at 04:47 -0400, Kamil Paral wrote:
the result columns used to be for different filesystems. Now they're for different platforms - x86 BIOS, x86 UEFI, and ARM.
This is a bit unclear to me.
Does "x86 BIOS" mean "x86 BIOS _or_ x86_64 BIOS", and "x86 UEFI" mean "x86_64 UEFI"?
I picked x86 to be "bitness-independent", the point being x86 not ARM. We could have x32 BIOS, x64 BIOS, x64 UEFI, but I was trying to keep the numbers down. Have we ever found a case where storage behaved differently between x32 and x64? I can't recall one.
I picked x86 to be "bitness-independent", the point being x86 not ARM.
x86 is unfortunately often used to refer to 32b arch only. But I see no other term which could be used to mean x86(_32)+x86_64, so let's leave it like that.
We could have x32 BIOS, x64 BIOS, x64 UEFI, but I was trying to keep the numbers down. Have we ever found a case where storage behaved differently between x32 and x64? I can't recall one.
I can't either, and even if I did, I don't think it would justify the result number explosion. Storage is storage, arch is usually completely irrelevant.
While we're at it, why do we have both i686 and x86_64 in "Device tests"? A single results column for x86 should be enough. Same reasoning.
On Monday, March 17, 2014, 4:14:50 AM, Kamil Paral wrote:
I can't either, and even if I did, I don't think it would justify the result number explosion. Storage is storage, arch is usually completely irrelevant.
While we're at it, why do we have both i686 and x86_64 in "Device tests"? A single results column for x86 should be enough. Same reasoning.
In the past, some filesystems have had issues handling 64-bit inodes on 32-bit architectures. User data is too important to assume that these will no longer occur.
On Mon, 2014-03-17 at 10:25 -0400, Al Dunsmuir wrote:
In the past, some filesystems have had issues handling 64-bit inodes on 32-bit architectures. User data is too important to assume that these will no longer occur.
Device tests are not filesystem tests, though.
Could you provide some references to these issues?
With either set of tests, though, I don't see that any 'user data' is involved: in each case the only partitions we're creating or touching are new ones with no user data involved. Even if one of the filesystems we create might suffer from a bug further down the line, I don't think any of these tests would catch it, would they?
On Mar 17, 2014, at 10:13 AM, Adam Williamson awilliam@redhat.com wrote:
With either set of tests, though, I don't see that any 'user data' is involved: in each case the only partitions we're creating or touching are new ones with no user data involved. Even if one of the filesystems we create might suffer from a bug further down the line, I don't think any of these tests would catch it, would they?
Maybe, but I think it's a really small surface area. I've got an i686 case where mkfs.blah permits the creation of a >16TB volume, but mounting it then fails in the case of ext4 and XFS. I'm told the mount failure is correct behavior (the jury is still out on whether Btrfs permitting this is a bug or not). But probably the mkfs shouldn't be allowed.
So if the installer permits 16+TB storage to be created, mkfs works, mount fails, and the installer blows up. Thing is, when I try to do this in the installer it sometimes permits me to create such large storage and other times not; I haven't figured out a pattern, so I gave up on anaconda 20 and need to redo this testing on 21.
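As a sketch of the non-installer half of that reproduction (paths and sizes are illustrative, root is required, and whether mount actually refuses on i686 is the very thing under test; the sparse file avoids needing a real 17TB disk):

```python
# Sketch of the mkfs-succeeds/mount-fails case on i686, using a sparse
# backing file. Illustrative only; commands are the stock util-linux,
# e2fsprogs and mount tools.
import subprocess

def run(*cmd):
    print("+", " ".join(cmd))
    return subprocess.run(cmd, capture_output=True, text=True)

run("truncate", "-s", "17T", "/tmp/big.img")
loop = run("losetup", "--find", "--show", "/tmp/big.img").stdout.strip()
run("mkfs.ext4", "-F", loop)          # reportedly allowed even on i686
mnt = run("mount", loop, "/mnt")      # reportedly refused on i686
print(mnt.returncode, mnt.stderr.strip())
```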
And then some of the storage devs seem to think any user creating such large storage on memory-limited i686 hardware is crazy; yet I'm thinking, this is made easy by cheap i686 hardware and cheap 4+TB drives. So it's crazy, but it's a trap!
So, maybe, I don't know.
Chris Murphy
On Monday, March 17, 2014, 12:13:38 PM, Adam Williamson wrote:
On Mon, 2014-03-17 at 10:25 -0400, Al Dunsmuir wrote:
On Monday, March 17, 2014, 4:14:50 AM, Kamil Paral wrote:
I can't either, and even if I did, I don't think it would justify the result number explosion. Storage is storage, arch is usually completely irrelevant.
When we're at it, why do we have both i686 and x86_64 at "Device tests"? A single results column for x86 should be enough. Same reasoning.
In the past, some filesystems have had issues handling 64-bit inodes in 32-bit architectures. User data is too important to make an assumption that these no longer will occur.
Device tests are not filesystem tests, though.
Could you provide some references to these issues?
XFS has had bugs related to this which showed up with large disks and large numbers of files.
Recent example: NFS + large XFS fs sometimes fails uncached lookups for client when inode number >2^32 on 32-bit computers https://bugzilla.redhat.com/show_bug.cgi?id=1003546
Application code has had some problems too. For example, Adobe code (not that we really care about closed-source packages from outside Fedora).
Bug: Adobe Reader 64-bit inode problem http://forums.adobe.com/message/4721987
With either set of tests, though, I don't see that any 'user data' is involved: in each case the only partitions we're creating or touching are new ones with no user data involved. Even if one of the filesystems we create might suffer from a bug further down the line, I don't think any of these tests would catch it, would they?
Likely not a significant risk, but it seems useful to me to explicitly identify the architecture that has been tested, so if a problem does arise in the future, that information is freely available.
Al
On Mar 13, 2014, at 8:50 PM, Adam Williamson awilliam@redhat.com wrote:
What does everyone think? Thanks!
Device tests:
- PATA? They aren't made anymore; do we really need to distinguish between SATA and PATA? Is there a case where it worked on one but not the other? I'd think we'd sooner want SATA vs SAS, at least they use a different driver.
- I'm not sure how to do a one-line categorization of PCI Express storage. But it seems like we ought to have one for now, since there are products. And then figure out how it all relates to SATA Express and NVMe, and whether those will need separate device tests or if they're collapsible.
-------------
Volume type tests: Guided installation
Alpha QA:Testcase_partitioning_guided_empty tests blank drive without partition map, should confirm whether MBR/GPT is used in the right case
Alpha QA:Testcase_partitioning_guided_delete_all tests delete all button and ability to make a populated disk "empty"
Beta QA:Testcase_partitioning_guided_delete_partial tests delete button, makes populated-disk have "free space"
Beta QA:Testcase_partitioning_guided_encrypted_empty Does this need to be empty? Or can it be "encrypted_any"? Seems like the target could be empty, delete_all, delete_partial, or freespace. The code path is the same, in that it must create a partition, encrypt it, make the dmcrypt device a PV, add it to a VG, make LVs from that. Therefore I think the encryption outcome is tested if we do any other test also.
Beta QA:Testcase_partitioning_guided_multi_empty What is this?
Beta QA:Testcase_partitioning_guided_free_space Partially populated drive with freespace, so this is an existing partition map with at least one entry, and also sufficient free space for a Fedora installation (could be existing Linux, OS X, Windows, with free space already set aside prior to arriving in anaconda).
---------
Custom partitioning
- encrypted_empty_auto vs encrypted_empty_manual; doesn't seem to test a different code path, because we don't have the inheritance of the encrypted checkbox now that the Installation Options dialog is vanquished.
- Where we do have a meaningful difference in custom partitioning encryption is that it's possible to encrypt a whole PV/VG vs. encrypting individual LVs (see the sketch after this list). And implicitly we'd want to make sure the user can't encrypt both at the same time (a bug that I think got fixed in F20 but was present in F19). This is an LVM/LVMthinp test only, though. Nothing else permits double encryption.
- What is QA:Testcase_partitioning_custom_existing_precreated? Layout created elsewhere and this tests the ability of the installer to use that without making changes? Basically assigning mount points to existing? Needs a RAID column I think, if we're going to test the anaconda supported "create raid elsewhere" and use it in anaconda workflow.
- Seems like in general we need more RAID tests. I don't see a hardware raid test. Or any explicit software raid0, raid1, raid10, raid4 (a.k.a. nutcase), raid5 or raid6 tests. Should there be a separate software raid matrix section? And should the matrix show only what we want to "support/test"? Or only those we'd block on? Or all possible checkboxed options, and subjectively list some of them as "bonus" release level, rather than alpha/beta/final?
- ext2 and ext3. I think these should be bonus, or even removed from the test matrix. Yes, they're visible in the installer (which I disagree with), therefore they should work, and if they don't I'd consider it a blocker bug. But how many filesystem choices do we need on Linux? And how many do we need to test? It's appealing to the smörgåsbord installer mentality, which I think is in the realm of "OCD, please go hire a shrink". Anything ext2/3 can do, ext4 can do better, and if it can't it's a bug.
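As a sketch of the two encryption placements from the PV/VG-vs-LV item above, written out as their equivalent manual command stacks (device and VG names are invented, and this is not how anaconda does it internally):

```python
# Two LUKS placements for LVM, as manual command sequences. Sketch only.
import subprocess

def sh(*cmd):
    subprocess.run(cmd, check=True)

def encrypt_whole_pv(part="/dev/vdb1"):
    # partition -> LUKS -> PV -> VG -> LVs: one encryption layer below LVM
    sh("cryptsetup", "luksFormat", part)
    sh("cryptsetup", "luksOpen", part, "cryptpv")
    sh("pvcreate", "/dev/mapper/cryptpv")
    sh("vgcreate", "fedora", "/dev/mapper/cryptpv")
    sh("lvcreate", "-L", "10G", "-n", "root", "fedora")

def encrypt_single_lv(part="/dev/vdb1"):
    # partition -> PV -> VG -> LV -> LUKS: encryption per logical volume
    sh("pvcreate", part)
    sh("vgcreate", "fedora", part)
    sh("lvcreate", "-L", "10G", "-n", "root", "fedora")
    sh("cryptsetup", "luksFormat", "/dev/fedora/root")

# The double-encryption bug is the installer letting you stack both at once.
```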
Chris Murphy
On Monday, March 17, 2014, 2:47:56 PM, Chris Murphy wrote:
Device tests:
-PATA? They aren't made anymore, do we really need to distinguish between SATA and PATA? Is there a case where it worked on one but not the other? I'd think we'd sooner want SATA vs SAS, at least they use a different driver.
On ARM the majority will be MMC, with some SATA that may involve USB. On PPC (secondary arch) SCSI is still quite important.
On Mar 17, 2014, at 1:12 PM, Al Dunsmuir al.dunsmuir@sympatico.ca wrote:
On ARM the majority will be MMC, with some SATA that may involve USB. On PPC (secondary arch) SCSI is still quite important.
Right, so does it make sense to have these as separate categories:
SATA/PATA
SCSI/SAS
MMC/SDCard
USB
PCIe SSD (which may morph into NVMe)
?
I guess my main point is that I'm not seeing the point of separate PATA/SATA device tests, and second, there are other devices not in the list that maybe ought to be in the list.
Chris Murphy
On Mon, 2014-03-17 at 12:47 -0600, Chris Murphy wrote:
Device tests:
-PATA? They aren't made anymore, do we really need to distinguish between SATA and PATA? Is there a case where it worked on one but not the other?
Not that I can recall, at least in the last few releases.
I'd think we'd sooner want SATA vs SAS, at least they use a different driver.
Yeah, it may be time to ditch this one.
- I'm not sure how to do a one-line categorization of PCI Express storage. But it seems like we ought to have one for now, since there are products. And then figure out how it all relates to SATA Express and NVMe, and whether those will need separate device tests or if they're collapsible.
Patches welcome ;)
We also have SDHCI storage to consider these days, I guess, and not just on ARM: https://bugzilla.redhat.com/show_bug.cgi?id=1063556
Volume type tests: Guided installation Alpha QA:Testcase_partitioning_guided_empty tests blank drive without partition map, should confirm whether MBR/GPT is used in the right case
Yup.
Alpha QA:Testcase_partitioning_guided_delete_all tests delete all button and ability to make a populated disk "empty"
Yup.
Beta QA:Testcase_partitioning_guided_delete_partial tests delete button, makes populated-disk have "free space"
Yup (and ensure the rest of the drive isn't touched).
Beta QA:Testcase_partitioning_guided_encrypted_empty Does this need to be empty? Or can it be "encrypted_any"? Seems like the target could be empty, delete_all, delete_partial, or freespace. The code path is the same, in that it must create a partition, encrypt it, make the dmcrypt device a PV, add it to a VG, make LVs from that. Therefore I think the encryption outcome is tested if we do any other test also.
Yeah, that's probably right.
Beta QA:Testcase_partitioning_guided_multi_empty What is this?
More than one disk. (We *could* have multiple tests covering various scenarios here, but I was trying to keep things relatively compact.)
Beta QA:Testcase_partitioning_guided_free_space Partially populated drive with freespace, so this is an existing partition map with at least one entry, and also sufficient free space for a Fedora installation (could be existing Linux, OS X, Windows, with free space already set aside prior to arriving in anaconda).
Yup.
Custom partitioning
- encrypted_empty_auto vs encrypted_empty_manual; doesn't seem to test a different code path, because we don't have the inheritance of the encrypted checkbox now that the Installation Options dialog is vanquished.
Er, you're probably right. I'd have to go back and check the context.
- Where we do have a meaningful difference in custom partitioning encryption is that it's possible to encrypt a whole PV/VG vs. encrypting individual LVs. And implicitly we'd want to make sure the user can't encrypt both at the same time (a bug that I think got fixed in F20 but was present in F19). This is an LVM/LVMthinp test only, though. Nothing else permits double encryption.
Good point, I'll see what can be tweaked there.
- What is QA:Testcase_partitioning_custom_existing_precreated? Layout created elsewhere and this tests the ability of the installer to use that without making changes? Basically assigning mount points to existing?
Yeah, I think that's what I was thinking of.
Needs a RAID column I think, if we're going to test the anaconda supported "create raid elsewhere" and use it in anaconda workflow.
Thanks.
- Seems like in general we need more RAID tests. I don't see a hardware raid test.
I can't recall whether I dropped this intentionally or inadvertently, I'll try and check. But, of course, HW raid and BIOS RAID are really rather different cases from software RAID.
Or any explicit software raid0, raid1, raid10, raid4 (a.k.a. nutcase), raid5 or raid6 tests. Should there be a separate software raid matrix section? And should the matrix show only what we want to "support/test"? Or only those we'd block on? Or all possible checkboxed options, and subjectively list some of them as "bonus" release level, rather than alpha/beta/final?
We certainly need to cover SW RAID in the custom testing, you're right, it's an obvious miss. Not sure of the best way to approach it offhand. If you'd like to draft something up that'd be great, or else I'll try and do it.
- ext2 and ext3. I think these should be bonus, or even removed from the test matrix.
Note the current draft does not explicitly specify a blocker/bonus separation, but the position we've been working towards is going back to guided being blocker and custom being bonus. But I think we probably realistically want a three-tier separation - Guided is the most important, then a small subset of Custom we really care about, then the rest of custom as very much optional-extra. But that's just not in the current draft at all; I'd agree that if we go for the three-way split ext2/ext3 would definitely be in the 'optional-extra' section.
On Mar 17, 2014, at 1:33 PM, Adam Williamson awilliam@redhat.com wrote:
Beta QA:Testcase_partitioning_guided_multi_empty What is this?
More than one disk. (We *could* have multiple tests covering various scenarios here, but I was trying to keep things relatively compact.)
Patch: rename Testcase_partitioning_guided_multi_empty to Testcase_partitioning_guided_multidev_empty.
Basically, pick two devices and see if anything blows up during install or first boot.
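As a sketch of a throwaway setup for running that (the ISO path and disk sizes are placeholders; adjust for your own setup):

```python
# Throwaway two-blank-disk VM for the multidev test.
import subprocess

def two_disk_install_vm(iso="/path/to/boot.iso"):
    subprocess.run([
        "virt-install", "--name", "storage-multidev",
        "--ram", "2048",
        "--disk", "size=10",   # first empty disk
        "--disk", "size=10",   # second empty disk
        "--cdrom", iso,
    ], check=True)

two_disk_install_vm()
```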
Needs a RAID column I think, if we're going to test the anaconda supported "create raid elsewhere" and use it in anaconda workflow.
Thanks.
We can make that a bonus column, *shrug*; it's not obviously supported in the installer, but the anaconda team has said it's supposed to work.
I can't recall whether I dropped this intentionally or inadvertently, I'll try and check. But, of course, HW raid and BIOS RAID are really rather different cases from software RAID.
Mmm, well I'm not sure what the failure vectors are for HW and BIOS RAID. The hwraid case should just look and behave like an ordinary single device. The firmware RAID case starts out the same way at boot time, but then becomes a variation of software raid, as it's implemented by mdadm, the only difference being on-disk metadata format.
Looks like in Rawhide's installer, firmware RAID is listed in specialized disks, which is different from hardware RAID, I think.
Anyway, I see why they're tested separately.
We certainly need to cover SW RAID in the custom testing, you're right, it's an obvious miss. Not sure of the best way to approach it offhand. If you'd like to draft something up that'd be great, or else I'll try and do it.
I think any raid layout is a small population of the user base. But I also think there's broad benefit to resiliently bootable raid1, so it makes sense for us to care about /boot, rootfs, and /home on raid1, and hopefully refine it so that one day it'll work better on UEFI than it does now. And then expand scope as resources permit.
Everything else I think is totally esoteric. Ideology-wise, I think since it's offered in the installer it ought to work. But I also don't want to test esoteric stuff when basic, broadly useful stuff needs attention. I think the iSCSI/SAN stuff is way more useful than enabling installation-time creation of, or install to, software RAID other than level 1.
Anyone know if we can boot off a glusterfs volume? Random question…
I'm not going to be much use for anything but occasional emails, taking pot shots, etc. for the next 3-4 weeks: crashed into a tree while skiing Friday, have a week to prepare 3 presentations for Libre Graphics Meeting, a wedding, travel to/fr Germany for LGM. Plus recovery. And mostly I'm emailing now because I'm procrastinating.
Chris Murphy
On Mon, 2014-03-17 at 14:58 -0600, Chris Murphy wrote:
Mmm, well I'm not sure what the failure vectors are for HW and BIOS RAID. The hwraid case should just look and behave like an ordinary single device.
Yeah. The possible failure cases here, really, are 'the driver's bust' and 'the driver got left out of the initramfs', I think.
I've put it back in for now, but we could probably live without it. It's kind of a PITA to test. (Though it was much *more* of a PITA before I figured out one of the SATA cables hooked up to my HW RAID controller was busted.)
The firmware RAID case starts out the same way at boot time, but then becomes a variation of software raid, as it's implemented by mdadm, the only difference being on-disk metadata format.
Yes. This one can break in quite a few ways, and frequently does. It's an important one to test.
Actually not all firmware RAID is implemented by mdadm; only Intel fwraid. Other forms of fwraid are implemented by dmraid (still). They're less common than they once were, but still around.
Looks like in Rawhide's installer Firmware RAID is listed in specialized disks, which is different than hardware raid I think.
IIRC, it shows up there but it usually *also* shows up as a 'regular' disk too, but I'd have to check again. I test it every cycle and then promptly forget the details.
Anyway, I see why they're tested separately.
Yeah, completely different cases. I don't think hardware RAID has actually seen a failure since I joined RH, but it's at least possible that it could - though really, just having a single 'hardware RAID' checkbox on the installation validation test matrix isn't a very sensible approach to testing it; it's a bit like having a 'Graphics Card' line in the same matrix. One graphics card works? OK, I guess we're good! :)
I think any raid layout is a small population of the user base. But I also think there's broad benefit to resiliently bootable raid1, so it makes sense for us to care about /boot, rootfs, and /home on raid1, and hopefully refine it so that one day it'll work better on UEFI than it does now. And then expand scope as resources permit.
Yeah, straightforward two-disk RAID-0 and RAID-1 are probably the most obvious places to start.
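As a sketch, the mdadm equivalent of those simplest cases (member partitions are assumed to already exist and be identically sized):

```python
# Build a simple two-disk software RAID array; device names are made up.
import subprocess

def make_raid(level, members=("/dev/vdb1", "/dev/vdc1"), dev="/dev/md0"):
    subprocess.run(["mdadm", "--create", dev,
                    "--level=%d" % level,
                    "--raid-devices=%d" % len(members),
                    *members], check=True)

make_raid(1)      # two-disk RAID-1, the resilient-boot case
# make_raid(0)    # same recipe for RAID-0 (on different member devices)
```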
On Mon, 2014-03-17 at 12:33 -0700, Adam Williamson wrote:
- encrypted_empty_auto vs encrypted_empty_manual; doesn't seem to test a different code path, because we don't have the inheritance of the encrypted checkbox now that the Installation Options dialog is vanquished.
Er, you're probably right. I'd have to go back and check the context.
Well, I went back and looked at this. IO is gone, but there's still an "Encrypt my partitions" checkbox on the "disk selection" screen (first screen of the spoke, before you get to the custom part interface) - it's been moved there from IO. So there is a path to cover here: select 'encrypt my partitions' on the disk selection screen, then use 'create them automatically' on custom partitioning, and check the result is encrypted. The current set of custom part tests looks like a bit of a grab bag, really, but I had to start somewhere...
On Thu, 2014-03-13 at 19:50 -0700, Adam Williamson wrote:
I need to read back over all the feedback this got when I first put it up, but for now, I looked back over the draft, and I think one small change is all it actually needs, but the implications change:
https://fedoraproject.org/wiki/User:Adamwill/Draft_storage_matrix
What does everyone think? Thanks!
Thanks for all the feedback to this post! I've gone through the feedback so far, made a few tweaks, and done some work to clean up the 'device tests' - I created some templates for those test cases and edited the test cases to use the templates where possible, and be as close as possible where they have to diverge. I created the 'virtio device' test, too: https://fedoraproject.org/wiki/QA:Testcase_install_to_virtio_device .
Still got more work to do on this - haven't taken care of RAID yet - just reporting progress. Help welcome!
On Wed, 2014-03-26 at 16:35 -0700, Adam Williamson wrote:
Still got more work to do on this - haven't taken care of RAID yet - just reporting progress. Help welcome!
I have now created the 'guided' test cases, using a set of templates in order to try and retain consistency. I've converted the matching existing test cases into redirects to the new-style ones. I also did a bit of rearranging of the new matrices based on things that came up while writing the test cases.
At this point I think we're pretty close to covering everything that needs to be covered at least for the 'pre-Alpha' testing we're thinking of doing, and Alpha itself. I'll probably draft up a modified validation matrix with this new-style storage stuff included somehow or other tomorrow, and we can see how that looks.
Thanks, folks!