Let's talk about zram.
I just got around to figuring out what zram is. I see it's automatically set up when Anaconda sets up btrfs.
First, a bit of background: all the PCs (Fedora WS) I maintain are bought with lots of RAM; 8GB is the minimum. I don't think I've ever seen swap move off zero, and no one has reported any sort of slowdown. Yet we do lots of memory-intensive things. Of course, all this is with ext4.
Now I see on my test machine (btrfs) that about 24% (2GB) of the 8GB of memory (according to Monitor) (4GB according to Disks) is dedicated to zram0. I imagine the difference in size reported is due to the compression used by zram. However, there are two issues:
First, I can't think of any good reason why I should need or want to have any of my RAM dedicated to swapping or other things that speed up btrfs. We've been very happy with swap being on disk, since it's never been seen to move off zero.
Second, a few years back, I looked over some of the compression algorithms. The probability of loss or corruption was low, but non-zero. I'm sure they have improved, but I'll bet those probabilities are still non-zero. I was taught back in my early years at school that unnecessary risk is foolish risk. So I've always avoided using compression wherever I had an option.
If this is necessary to make btrfs work or to get reasonable performance from it, that is a strike against btrfs.
From this I conclude that to run btrfs on these machines and get the performance we're used to, I have to up the minimum RAM load to 12GB. RAM modules have a non-zero cost. Given the above, this cost must be assigned to btrfs and is borne by Fedora users.
I'm hoping someone can tell me I'm wrong and why, or that there's a way to opt out of zram without a performance hit when using btrfs.
Have a Great Day!
Pat (tablepc)
On 7/20/20 7:23 AM, pmkellly@frontier.com wrote:
I just got around to figuring out what zram is. I see it's automatically set up when Anaconda sets up btrfs.
It shouldn't have anything to do with btrfs. It's one of the Fedora changes that turns it on by default. Look in the devel list archives for some huge threads on this.
First, a bit of background: all the PCs (Fedora WS) I maintain are bought with lots of RAM; 8GB is the minimum. I don't think I've ever seen swap move off zero, and no one has reported any sort of slowdown. Yet we do lots of memory-intensive things. Of course, all this is with ext4.
That is surprising. Usually at least something ends up in the swap.
Now I see on my test machine (btrfs) that about 24% (2GB) of the 8GB of memory (according to Monitor) (4GB according to Disks) is dedicated to zram0. I imagine the difference in size reported is due to the compression used by zram. However, there are two issues:
Careful what you're using to measure with. The only accurate measurement is "zramctl". That will tell you exactly how much RAM the zram device is using.
First, I can't think of any good reason why I should need or want to have any of my RAM dedicated to swapping or other things that speed up btrfs. We've been very happy with swap being on disk, since it's never been seen to move off zero.
You really need to read more about zram. There is no dedicated space for it. It only uses RAM as swap is needed. And it uses less RAM than the pages that are getting swapped out, so there's a net reduction in RAM usage without hitting the disk.
Second, a few years back, I looked over some of the compression algorithms. The probability of loss or corruption was low, but non-zero. I'm sure they have improved, but I'll bet those probabilities are still non-zero. I was taught back in my early years at school that unnecessary risk is foolish risk. So I've always avoided using compression wherever I had an option.
These compression algorithms have been tested very hard and are used everywhere. I've never seen mention of any corruption issues.
I'm hoping someone can tell me I'm wrong and why, or that there's a way to opt out of zram without a performance hit when using btrfs.
If you really want to disable zram (there's no reason to), I believe the simplest method is "touch /etc/systemd/zram-generator.conf". And it's nothing to do with btrfs.
On 7/20/20 13:31, Samuel Sieb wrote:
First, a bit of background: all the PCs (Fedora WS) I maintain are bought with lots of RAM; 8GB is the minimum. I don't think I've ever seen swap move off zero, and no one has reported any sort of slowdown. Yet we do lots of memory-intensive things. Of course, all this is with ext4.
That is surprising. Usually at least something ends up in the swap.
I'm a convert from that proprietary "we know what's best for you" OS. One of the major things that annoyed me about it was that it hoarded all the RAM and swapped everything. One day, while talking with a colleague, he suggested that I should switch to Linux and recommended Fedora. I got a copy of Fedora 16 WS and loaded it. The fact that swapping was non-existent and Fedora didn't hoard the RAM was one of the things that helped me decide to switch everything here to Fedora.
On these machines, Disks says that swap lives at /dev/fedora_localhost-live/swap00 and shows no usage. Monitor shows 0 bytes used for swap. I've never seen these move off zero. I haven't changed any of the disk parameters. They are all at their default values as installed.
Careful what you're using to measure with. The only accurate measurement is "zramctl". That will tell you exactly how much RAM the zram device is using.
Thanks
You really need to read more about zram. There is no dedicated space for it. It only uses RAM as swap is needed. And it uses less RAM than the pages that are getting swapped out, so there's a net reduction in RAM usage without hitting the disk.
cmurf explained that to me yesterday, but thanks.
These compression algorithms have been tested very hard and are used everywhere. I've never seen mention of any corruption issues.
I must say that all of my compression experience has been with the algorithms used to compress images. I won't bore you with the details but we wrote software to build various image files with certain characteristics in pristine form. We did not use the standard test images that are sometimes used. The test files we used were structured to see how good a job the algorithms could do preserving data. Then we saved and opened them using the various standardized algorithms used for the associated file types and analyzed the results. The results were not impressive. We concluded that the results were fine for images. If some pixel values change, the average user will not notice it; so it's not critical. However there are many other kinds of data where such changes would be critical. Now I know the algorithms used for images are different from those used for general file compression on disk, but still, I try to minimize risk. Since I'm never short of disk space I prefer not to use compression. I was very excited and pleased when I found out that btrfs check-sums files. However now I understand that it is a patch to make up for the compression. It seems like a zero sum gain to me.
If you really want to disable zram (there's no reason to), I believe the simplest method is "touch /etc/systemd/zram-generator.conf". And it's nothing to do with btrfs.
Yes. I understand now that zram is not part of btrfs. It's just that it showed up with btrfs and was yet another thing that raised a lot of questions. Now that I understand how zram works, I won't be shutting it off. I do want to have a swap space, and though it doesn't seem to get used, I want it just in case...
Thanks and Have a Great Day!
Pat (tablepc)
On Tue, Jul 21, 2020 at 7:36 AM pmkellly@frontier.com <pmkellly@frontier.com> wrote:
I must say that all of my compression experience has been with the algorithms used to compress images. I won't bore you with the details but we wrote software to build various image files with certain characteristics in pristine form. We did not use the standard test images that are sometimes used. The test files we used were structured to see how good a job the algorithms could do preserving data. Then we saved and opened them using the various standardized algorithms used for the associated file types and analyzed the results. The results were not impressive. We concluded that the results were fine for images. If some pixel values change, the average user will not notice it; so it's not critical. However there are many other kinds of data where such changes would be critical. Now I know the algorithms used for images are different from those used for general file compression on disk, but still, I try to minimize risk.
Yeah, lossy algorithms are common in imaging. There are many kinds that unquestionably do not produce identical encoding to the original once decompressed. The algorithms being used by Btrfs are all lossless compression, and in fact those are also commonly used in imaging: LZO and ZLIB (ZIP, i.e. deflate) - and in that case you can compress and decompress images unlimited times and always get back identical RGB encodings to the original. Short of memory or other hardware error.
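If you want to convince yourself, a few lines of Python against the standard zlib module (the same DEFLATE family as the Btrfs zlib option) show the round-trip property. This is just an illustration, not anything Btrfs-specific:

    import os
    import zlib

    original = os.urandom(64 * 1024) + bytes(64 * 1024)   # random bytes plus a long run of zeros
    compressed = zlib.compress(original, 6)                # DEFLATE, the same family as Btrfs zlib
    restored = zlib.decompress(compressed)

    assert restored == original   # lossless: identical bits come back, every time
    print(len(original), "->", len(compressed), "bytes")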
Since I'm never short of disk space I prefer not to use compression. I was very excited and pleased when I found out that btrfs check-sums files. However now I understand that it is a patch to make up for the compression. It seems like a zero sum gain to me.
I'm not sure what you mean.
Btrfs has always had checksumming from day 0. It was integral to the design, before the compression algorithms landed. It is to make up for the fact hardware sometimes lies or gets confused, anywhere in the storage stack. The default for metadata (the fs itself) and data (file contents) is crc32c, it is possible to disable it for data but not possible to disable it for metadata. Compression only ever applies to data. It's not applied to metadata. Checksumming has intrinsic value regardless of compression.
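To make the checksumming idea concrete, here's a tiny sketch in Python. It uses plain CRC-32 from the standard zlib module rather than the crc32c Btrfs actually uses (different polynomial, same principle): store a checksum with the block, recompute it on read, and treat a mismatch as corruption.

    import zlib

    block = bytearray(b"some file contents " * 200)
    stored_csum = zlib.crc32(block)       # CRC-32 here for illustration; Btrfs uses crc32c

    block[1234] ^= 0x01                   # simulate one flipped bit anywhere in the storage stack

    if zlib.crc32(block) != stored_csum:
        print("checksum mismatch: corruption detected, so return EIO or read another copy")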
On 7/21/20 13:11, Chris Murphy wrote:
Yeah, lossy algorithms are common in imaging. There are many kinds that unquestionably do not produce identical encoding to the original once decompressed. The algorithms being used by Btrfs are all lossless compression, and in fact those are also commonly used in imaging: LZO and ZLIB (ZIP, i.e. deflate) - and in that case you can compress and decompress images unlimited times and always get back identical RGB encodings to the original. Short of memory or other hardware error.
At the risk of sounding skeptical, I've heard the word "lossless" applied to lots of algorithms and devices where I didn't think it was an appropriate usage. As an approximate example, when we were doing that testing we were hoping to find something in the neighborhood of a 10^-6 probability of a single byte error in a certain structure / size file when exercised a certain number of times. Sorry for being so vague. Is there any statistical data on these algorithms that is publicly available? The only ones I've ever seen (not a large population since I've been a compression avoider) that approach lossless don't compress much and only take out strings of the same byte value.
Since I'm never short of disk space I prefer not to use compression. I was very excited and pleased when I found out that btrfs check-sums files. However now I understand that it is a patch to make up for the compression. It seems like a zero sum gain to me.
I'm not sure what you mean.
Btrfs has always had checksumming from day 0. It was integral to the design, before the compression algorithms landed. It is to make up for the fact hardware sometimes lies or gets confused, anywhere in the storage stack. The default for metadata (the fs itself) and data (file contents) is crc32c, it is possible to disable it for data but not possible to disable it for metadata. Compression only ever applies to data. It's not applied to metadata. Checksumming has intrinsic value regardless of compression.
Sorry, I have no knowledge of the history of btrfs; so please forgive me when I say or ask silly things.
I know about check-summing and use it manually on files that are important. Ya I know about the hardware too. I'm an electrical engineer. If reliability really matters for a design one of the first things I look for when considering any new chip is to see if the manufacturer has any credible reliability data.
The problem here is that anything to do with PCs or servers is largely driven by cost and there always has to be a new, better, more exciting model tomorrow. That environment produces very little in the way of parts with long histories with good proven reliability data. That's why I was originally so happy about check-summing being automatic with btrfs.
What's considered the metadata? Path to file, file name, file header, file footer, data layout?
Oh I just noticed crc32c. That's acceptable.
Sorry for going on so much.
Thanks and Have a Great Day!
Pat (tablepc)
On Tue, Jul 21, 2020 at 3:36 PM pmkellly@frontier.com <pmkellly@frontier.com> wrote:
On 7/21/20 13:11, Chris Murphy wrote:
Yeah, lossy algorithms are common in imaging. There are many kinds that unquestionably do not produce identical encoding to the original once decompressed. The algorithms being used by Btrfs are all lossless compression, and in fact those are also commonly used in imaging: LZO and ZLIB (ZIP, i.e. deflate) - and in that case you can compress and decompress images unlimited times and always get back identical RGB encodings to the original. Short of memory or other hardware error.
At the risk of sounding skeptical, I've heard the word "lossless" applied to lots of algorithms and devices where I didn't think it was an appropriate usage. As an approximate example, when we were doing that testing we were hoping to find something in the neighborhood of a 10^-6 probability of a single byte error in a certain structure / size file when exercised a certain number of times. Sorry for being so vague. Is there any statistical data on these algorithms that is publicly available? The only ones I've ever seen (not a large population since I've been a compression avoider) that approach lossless don't compress much and only take out strings of the same byte value.
A very simple example is run length encoding. https://en.wikipedia.org/wiki/Run-length_encoding
That is variable depending on the source, but quite a lot of human-produced material has a metric F ton of zeros in it, so it turns out we get a lot of compressibility. This is used by the current zram default algorithm, as well as lzo which handles the more complex data. This is typically a 3-to-1, upwards of 4-to-1, compression ratio in my testing, with a conservative 2-to-1 stated in the proposal.
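For anyone who hasn't seen it spelled out, here's a toy run-length encoder in Python. It's only a sketch of the idea, not the actual lzo-rle code zram uses, but it shows why pages that are mostly zeros shrink so dramatically:

    def rle_encode(data: bytes):
        # (byte value, run length) pairs; runs capped at 255 so each fits in one byte
        out = []
        i = 0
        while i < len(data):
            run = 1
            while i + run < len(data) and data[i + run] == data[i] and run < 255:
                run += 1
            out.append((data[i], run))
            i += run
        return out

    def rle_decode(pairs):
        return b"".join(bytes([value]) * run for value, run in pairs)

    page = b"\x00" * 4000 + b"live data" * 10   # a mostly-zero, roughly 4KiB page
    encoded = rle_encode(page)
    assert rle_decode(encoded) == page           # the round trip is exact
    print(len(page), "bytes in,", len(encoded) * 2, "bytes encoded")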
zstd is substantially more complex than lzo, or zlib, and produces similar compression ratios to zlib but at a fraction of the CPU requirement. You can compress and decompress things all day long for weeks and months and years and 100% of the time get back identical data bit for bit. That's the point of them. I can't really explain the math but zstd is free open source software, so it is possible to inspect it.
https://github.com/facebook/zstd
JPEG compression, on the other hand, is intentionally lossy. It is a guarantee that you do not get back original data. It can still be used in high end imaging (and it is) but this is predicated on reducing the number of times the image goes through JPEG compression - or else you end up with obvious artifacts that degrade the image. All of the lossy algorithms involve, in a sense, a kind of data loss. That's the point of them, is that there's so much extraneous information in imaging that quite a lot of it can just be tossed. But this also assumes the final destination is some kind of shitty output: displays, printers, printing presses. That sort of thing. So the loss isn't actually realized, if you do it correctly anyway. Trouble is, quite a lot of people do take JPEG, modify them, and then JPEG them again. Which is known as "doing it wrong" - you need to go back to the original image to make that modification, and then JPEG it. If you don't have the original well then you're making other bad choices :) Or maybe someone else is.
I'm off hand not aware of any lossy compression algorithms that claim to be lossless. The original JPEG is lossy. JPEG2000 has both lossy and lossless variants, but while it's produced by the same organization, the encoding is entirely different. Anyway, short of hardware defects, you can compress and decompress data or images using lossless compression billions of times until the heat death of the universe and get identical bits out. It's the same as 2+2=4 and 4=2+2. Same exact information on both sides of the encoding. Anything else is a hardware error, sunspots, cosmic rays, someone made a mistake in testing, etc.
Sorry, I have no knowledge of the history of btrfs; so please forgive me when I say or ask silly things.
I don't think asking questions is silly or a problem at all. It's the jumping to conclusions that gave me the frowny face. :-)
What's considered the metadata? Path to file, file name, file header, file footer, data layout?
In a file system context, the metadata is the file system itself. The data is the "payload" of the file, the file contents, the stuff you actually care about. I mean, you might also care about some of the metadata: file name, creation/modification date, but that's probably incidental to the data. The metadata includes the size of the data, whether or not it's compressed, its checksum, the inode, owner, group, posix permissions, selinux label, etc.
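A quick way to see the split on any Linux box is a couple of lines of Python: stat() returns metadata the filesystem keeps about the file, while reading the file returns the data payload (the part compression would apply to). The path below is just an example.

    import os

    path = "/etc/hostname"                     # any existing file will do
    st = os.stat(path)                         # metadata: what the filesystem records about the file
    data = open(path, "rb").read()             # data: the payload itself

    print("inode:", st.st_ino, "size:", st.st_size, "uid:", st.st_uid, "mtime:", st.st_mtime)
    print("payload starts with:", data[:16])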
Oh I just noticed crc32c. That's acceptable.
This is the default. It's acceptable for detecting incidental sources of corruption. Since kernel 5.5 there's also xxhash64, which is about as fast as crc32c, sometimes faster on some hardware. And for cryptographic hashing Btrfs offers blake2b (SHA3 runner up) and sha256. These are mkfs time options.
On 7/21/20 18:06, Chris Murphy wrote:
On Tue, Jul 21, 2020 at 3:36 PM pmkellly@frontier.com <pmkellly@frontier.com> wrote:
The only ones I've ever seen (not a large population since I've been a compression avoider) that approach lossless don't compress much and only take out strings of the same byte value.
A very simple example is run length encoding. https://en.wikipedia.org/wiki/Run-length_encoding
That's what I meant by "take out strings of the same byte value". I had just forgotten the name.
That is variable depending on the source, but quite a lot of human-produced material has a metric F ton of zeros in it, so it turns out we get a lot of compressibility. This is used by the current zram default algorithm, as well as lzo which handles the more complex data. This is typically a 3-to-1, upwards of 4-to-1, compression ratio in my testing, with a conservative 2-to-1 stated in the proposal.
Wow... I would never have guessed that the ratios had gotten so high. I think I know why. Last holiday time I gave a show and tell to a group, and part of it was titled File Bloat. I started with a simple text file that had a size of 475 bytes. Then I showed them the very same text saved as various other file types with very simple formatting, and in the worst case (.odt) the file was 22KB with the exact same text. I showed them printed and on-screen examples of each, and they were amazed at how little they got at the expense of their storage space. It didn't occur to me at the time, but I'm guessing that file bloat is a major source of the compression ratios you mentioned above.
zstd is substantially more complex than lzo, or zlib, and produces similar compression ratios to zlib but at a fraction of the CPU requirement. You can compress and decompress things all day long for weeks and months and years and 100% of the time get back identical data bit for bit. That's the point of them. I can't really explain the math but zstd is free open source software, so it is possible to inspect it.
I've never written a compression algorithm, so I don't know anything about their innards, but I think I'll take a peek.
JPEG compression, on the other hand, is intentionally lossy.
I'll try to restrain myself from saying much about that particular blight on the land. I've talked to lots of folks about it; even walked some through it with examples. Most just don't care. It doesn't seem to matter how bad the picture gets; as long as they can recognize at all what the picture is of, it's okay. The main concern seems to be being able to save LOTS of pictures.
I am happy that my still camera has the ability to save pictures as RAW or uncompressed TIFF. I thought it was a sad day when they added compression to TIFF.
I'm off hand not aware of any lossy compression algorithms that claim
Back when I was working on those compression analyses there were some very hot debates going on about lossless claims. I don't recall which ones or if they went to court, but I do recall reading some articles about it. As I recall it was in the late '80s.
Anyway, short of hardware defects, you can compress and decompress data or images using lossless compression billions of times until the heat death of the universe and get identical bits out. It's the same as 2+2=4 and 4=2+2. Same exact information on both sides of the encoding. Anything else is a hardware error, sunspots, cosmic rays, someone made a mistake in testing, etc.
I really like your description. I see now that if the compression is just removing same-value byte strings, it really can be truly lossless. As someone who has had to deal with it, I'll say that the extra intense radiation from sunspots really does matter, and cosmic rays are ignored only at peril.
I don't think asking questions is silly or a problem at all. It's the jumping to conclusions that gave me the frowny face. :-)
I try not to, but I have a lot of "buy in" to Fedora. When I read that there's a change coming and it's on a topic I've had some bad experience with... I apologize for jumping.
What's considered the metadata? Path to file, file name, file header, file footer, data layout?
In a file system context, the metadata is the file system itself. The data is the "payload" of the file, the file contents, the stuff you actually care about. I mean, you might also care about some of the metadata: file name, creation/modification date, but that's probably incidental to the data. The metadata includes the size of the data, whether or not it's compressed, its checksum, the inode, owner, group, posix permissions, selinux label, etc.
Thanks, I appreciate the help. Though I've written lots of software, this is my first time being involved with the innards of an OS. In my prior experience there either wasn't an OS or even an IDE, just my hex code, or I got to take the OS for granted and wrote my code in an IDE. Some years back I worked on an AI project for a few years. We had lots of discussions about data and metadata. Like what the terms should mean and what items belong in each category. One sort of profound conclusion we reached is that AI won't happen (in the Star Trek sense) until object-oriented programming really is. And to get really object-oriented we must give up the von Neumann model for computers. One of the main things that means is that we must stop addressing things by where they are and start addressing them by what they are. I think I read once that there was a prototype built in hardware someplace and they were just getting started with testing the hardware. Probably abandoned by now. First research machines are always very expensive and no one ever wants to invest for the long term.
Oh I just noticed crc32c. That's acceptable.
This is the default. It's acceptable for detecting incidental sources of corruption. Since kernel 5.5 there's also xxhash64, which is about as fast as crc32c, sometimes faster on some hardware. And for cryptographic hashing Btrfs offers blake2b (SHA3 runner up) and sha256. These are mkfs time options.
My experience with crc32 was in a hardware implementation. We needed good data integrity and the memory controller chip we chose had crc32 built in. I forget how many bits we added to each word to save the check bits, but I think it was four. I can imagine the uproarious laughter that would result if someone at a gathering of PC folks suggested that new memory modules should include extra bits to support hardware crc.
Have a Great Day!
Pat (tablepc)
On 7/22/20 7:54 AM, pmkellly@frontier.com wrote:
I really like your description. I see now that if the compression is just removing same-value byte strings, it really can be truly lossless. As someone who has had to deal with it, I'll say that the extra intense radiation from sunspots really does matter, and cosmic rays are ignored only at peril.
Most compression algorithms are far more than "removing same value byte strings". Check out "huffman encoding" for example. I've never had a lossless compression program corrupt my data. Given your extreme mistrust of compression algorithms, it seems that you don't realize how much compressed data you deal with on a regular basis. tar.gz, gzip, zip, rpm, initramfs, kernel, web pages, etc.
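If you're curious what Huffman coding looks like, here's a toy-sized Python sketch (illustration only; real compressors like DEFLATE combine Huffman coding with other stages). Frequent bytes get short bit strings, rare bytes get longer ones, and the round trip is exact:

    import heapq
    from collections import Counter

    def huffman_code(data: bytes):
        heap = [[count, [sym, ""]] for sym, count in Counter(data).items()]
        heapq.heapify(heap)
        if len(heap) == 1:                        # degenerate case: only one distinct byte value
            return {heap[0][1][0]: "0"}
        while len(heap) > 1:
            lo = heapq.heappop(heap)
            hi = heapq.heappop(heap)
            for pair in lo[1:]:
                pair[1] = "0" + pair[1]
            for pair in hi[1:]:
                pair[1] = "1" + pair[1]
            heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
        return {sym: code for sym, code in heap[0][1:]}

    data = b"an ordinary english sentence has very skewed byte frequencies indeed"
    codes = huffman_code(data)
    bits = "".join(codes[b] for b in data)

    # decode by walking the bit string with the prefix-free code table
    inverse = {code: sym for sym, code in codes.items()}
    decoded, cur = bytearray(), ""
    for bit in bits:
        cur += bit
        if cur in inverse:
            decoded.append(inverse[cur])
            cur = ""

    assert bytes(decoded) == data                 # nothing lost
    print(len(data) * 8, "bits raw ->", len(bits), "bits Huffman-coded")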
My experience with crc32 was in a hardware implementation. We needed good data integrity and the memory controller chip we chose had crc32 built in. I forget how many bits we added to each word to save the check bits, but I think it was four. I can imagine the uproarious laughter that would result if someone at a gathering of PC folks suggested that new memory modules should include extra bits to support hardware crc.
You mean like ECC RAM?
On 7/22/20 12:41, Samuel Sieb wrote:
Most compression algorithms are far more than "removing same value byte strings". Check out "huffman encoding" for example. I've never had a lossless compression program corrupt my data. Given your extreme mistrust of compression algorithms, it seems that you don't realize how much compressed data you deal with on a regular basis. tar.gz, gzip, zip, rpm, initramfs, kernel, web pages, etc.
I know there are a variety of compression algorithms. I know they are used in many places, and I have seen that there are many algorithms used in applications where they can be trusted or changes tolerated at very low cost. My experience with them has led me to trust them in those applications. I also know from experience that there are applications of compression where the trustworthiness needs to be at least questioned. If this is extreme, well, all I can say is I don't mean to be extreme.
My concern with btrfs was that it uses compression on my data. It could cause a lot of work if bits started changing in our files. My concern with btrfs has been addressed and I am now fine with it. Though I certainly know there are other problems that can cause bits to change, I just try to minimize risk where I can. I was trained early on in school not to take things for granted. I try to be data-driven. Since I don't have time to become an expert in compression and have some bad history with it, I became concerned with this general application, but that's over now.
My experience with crc32 was in a hardware implementation. We needed good data integrity and the memory controller chip we chose had crc32 built in. I forget how many bits we added to each word to save the check bits, but I think it was four. I can imagine the uproarious laughter that would result if someone at a gathering of PC folks suggested that new memory modules should include extra bits to support hardware crc.
You mean like ECC RAM?
The hardware I mentioned was way back when PCs were quite primitive, let alone any ECC memory modules being available for them. The application had nothing to do with a PC. There was a lot going on with computing before PCs became popular. The only programmers were EEs that designed microprocessors into their boards. The mainframe folks just laughed at us, and there weren't any folks graduating from school prepared to program micros. They all wanted to program mainframes.
I know ECC RAM is widely used in servers, but I've never owned or seen a desktop that had ECC capability, let alone included it as the default configuration. To be fair though, my users and I use older rehabbed Lenovo machines because they are cheap and they work very well for us.
Have a Great Day!
Pat (tablepc)
On Wed, Jul 22, 2020 at 10:42 AM Samuel Sieb <samuel@sieb.net> wrote:
You mean like ECC RAM?
https://bugzilla.redhat.com/show_bug.cgi?id=1857996
Reads like an alien autopsy. (Gory details, and nothing looks familiar.) But you'll totally understand the conclusion.
On Wed, Jul 22, 2020 at 8:54 AM pmkellly@frontier.com <pmkellly@frontier.com> wrote:
On 7/21/20 18:06, Chris Murphy wrote:
That is variable depending on the source, but quite a lot of human-produced material has a metric F ton of zeros in it, so it turns out we get a lot of compressibility. This is used by the current zram default algorithm, as well as lzo which handles the more complex data. This is typically a 3-to-1, upwards of 4-to-1, compression ratio in my testing, with a conservative 2-to-1 stated in the proposal.
Wow... I would never have guessed that the ratios had gotten so high. I think I know why. Last holiday time I gave a show and tell to a group, and part of it was titled File Bloat. I started with a simple text file that had a size of 475 bytes. Then I showed them the very same text saved as various other file types with very simple formatting, and in the worst case (.odt) the file was 22KB with the exact same text. I showed them printed and on-screen examples of each, and they were amazed at how little they got at the expense of their storage space. It didn't occur to me at the time, but I'm guessing that file bloat is a major source of the compression ratios you mentioned above.
I'm not super familiar with the math involved, but I've read from zstd and xz source materials that sophisticated algorithms depend on having more data available to do a good job of compressing. And that's why for small data sets they have the option to build a dictionary using a training mode on a data set, which gives the algorithm something of a head start with knowing about redundancies in that data set.
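As a rough illustration of dictionary training (this uses the third-party python-zstandard bindings, and the exact call names are from my reading of its docs, so treat it as a sketch, not a reference): train a dictionary on a pile of small, similar records, then compare compressing one record with and without it.

    import zstandard   # third-party: pip install zstandard

    # lots of small, structurally similar records (the case dictionaries are meant for)
    samples = [('{"user": %d, "name": "user-%d", "status": "ok", "note": "%s"}'
                % (i, i, "routine heartbeat message " * 8)).encode() for i in range(1000)]

    dictionary = zstandard.train_dictionary(4 * 1024, samples)

    plain = zstandard.ZstdCompressor()
    with_dict = zstandard.ZstdCompressor(dict_data=dictionary)

    record = samples[0]
    print(len(plain.compress(record)), "bytes compressed without a dictionary")
    print(len(with_dict.compress(record)), "bytes compressed with the trained dictionary")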
JPEG compression, on the other hand, is intentionally lossy.
I'll try to restrain myself from saying much about that particular blight on the land. I've talked to lots of folks about it; even walked some through it with examples. Most just don't care. It doesn't seem to matter how bad the picture gets; as long as they can recognize at all what the picture is of, it's okay. The main concern seems to be being able to save LOTS of pictures.
I am happy that my still camera has the ability to save pictures as RAW or uncompressed TIFF. I thought it was a sad day when they added compression to TIFF.
It depends on the compression. TIFF supports arbitrary compression (it just gets recorded in a tag, so what's supported is app specific), but commonly supported algorithms for TIFF are JPEG, ZIP (Deflate), and LZW. The first is lossy. The second two are lossless. You can decompress and recompress a billion times back and forth between ZIP and LZW and always get back identical bits to the original. (Bugs and hardware anomalies excluded, because those can also hit uncompressed data.)
I'm off hand not aware of any lossy compression algorithms that claim
Back when I was working on those compression analyses there were some very hot debates going on about lossless claims. I don't recall which ones or if they went to court, but I do recall reading some articles about it. As I recall it was in the late '80s.
[Virtually, Visually, Effectively, Essentially] lossless? Yes, I take a very dim view of this.
Anyway, short of hardware defects, you can compress and decompress data or images using lossless compression billions of times until the heat death of the universe and get identical bits out. It's the same as 2+2=4 and 4=2+2. Same exact information on both sides of the encoding. Anything else is a hardware error, sunspots, cosmic rays, someone made a mistake in testing, etc.
I really like your description. I see now that if the compression is just removing same-value byte strings, it really can be truly lossless. As someone who has had to deal with it, I'll say that the extra intense radiation from sunspots really does matter, and cosmic rays are ignored only at peril.
Oh for sure. On all counts.
I don't think asking questions is silly or a problem at all. It's the jumping to conclusions that gave me the frowny face. :-)
I try not to, but I have a lot of "buy in" to Fedora. When I read that there's a change coming and it's on a topic I've had some bad experience with... I apologize for jumping.
Remain sceptical! I do not want Fedora users bitten for any reason, but we know this is going to happen because they already do get bitten from time to time. It's just that we're used to that pattern. And the big changes including Btrfs come with a certain amount of "exchanging problems that we know for problems that we don't know." So we have to learn them. Fortunately the Btrfs change owners have been using it for a long time and are familiar with where the bodies are buried. That is not exactly the best descriptive material for a Fedora Magazine article. :D
But for testers, they're necessarily going to get hammered a bit harder with changes, no matter how transparent they're intended to be for regular users, because testers want to understand the problem well enough to know that it is a problem, whether it may be a blocker, etc.
So yeah, it's reasonable to be sceptical of the change and to be critical if something is really obviously not transparent.
What's considered the metadata? Path to file, file name, file header, file footer, data layout?
In a file system context, the metadata is the file system itself. The data is the "payload" of the file, the file contents, the stuff you actually care about. I mean, you might also care about some of the metadata: file name, creation/modification date, but that's probably incidental to the data. The metadata includes the size of the data, whether or not it's compressed, its checksum, the inode, owner, group, posix permissions, selinux label, etc.
Thanks, I appreciate the help. Though I've written lots of software, this is my first time being involved with the innards of an OS. In my prior experience there either wasn't an OS or even an IDE, just my hex code, or I got to take the OS for granted and wrote my code in an IDE. Some years back I worked on an AI project for a few years. We had lots of discussions about data and metadata. Like what the terms should mean and what items belong in each category. One sort of profound conclusion we reached is that AI won't happen (in the Star Trek sense) until object-oriented programming really is. And to get really object-oriented we must give up the von Neumann model for computers. One of the main things that means is that we must stop addressing things by where they are and start addressing them by what they are. I think I read once that there was a prototype built in hardware someplace and they were just getting started with testing the hardware. Probably abandoned by now. First research machines are always very expensive and no one ever wants to invest for the long term.
Oh I just noticed crc32c. That's acceptable.
This is the default. It's acceptable for detecting incidental sources of corruption. Since kernel 5.5 there's also xxhash64, which is about as fast as crc32c, sometimes faster on some hardware. And for cryptographic hashing Btrfs offers blake2b (SHA3 runner up) and sha256. These are mkfs time options.
My experience with crc32 was in a hardware implementation. We needed good data integrity and the memory controller chip we chose had crc32 built in. I forget how many bits we added to each word to save the check bits, but I think it was four. I can imagine the uproarious laughter that would result if someone at a gathering of PC folks suggested that new memory modules should include extra bits to support hardware crc.
On Btrfs, it's 4 bytes of crc32c per 4KiB data block. And at least by default it's 4 bytes of crc32c per 16KiB metadata node/leaf block. Max metadata block size is 64KiB. Computationally it's negligible latency, even without hardware acceleration support. In some workloads it can show up in IO latency.
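The space cost is also tiny; this is just simple arithmetic on the numbers above (nothing Btrfs-specific, just division):

    data_block = 4 * 1024        # bytes per data block
    metadata_node = 16 * 1024    # default metadata node size
    csum = 4                     # bytes of crc32c per block / node

    print("data checksum overhead:     %.4f%%" % (100 * csum / data_block))      # ~0.0977%
    print("metadata checksum overhead: %.4f%%" % (100 * csum / metadata_node))   # ~0.0244%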
On Mon, Jul 20, 2020 at 8:23 AM pmkellly@frontier.com <pmkellly@frontier.com> wrote:
Let's talk about zram.
I just got around to figuring out what zram is. I see it's automatically set up when Anaconda sets up btrfs.
It's set up by zram-generator and zram-generator-defaults being present, and they're installed by default (or should be) on every installation image: netinstall, DVD, Live. And whether you do an automatic or custom installation, regardless of file system. If you do a custom installation and create a disk-based swap partition, you will have two swaps, and the zram-based one will be used with a higher priority.
First, a bit of background: all the PCs (Fedora WS) I maintain are bought with lots of RAM; 8GB is the minimum. I don't think I've ever seen swap move off zero, and no one has reported any sort of slowdown. Yet we do lots of memory-intensive things. Of course, all this is with ext4.
Are you changing /proc/sys/vm/swappiness to reduce or avoid swapping?
In the typical case, some swap is used. These are evicted anonymous pages, and it's more efficient for the kernel to do that for stale anonymous pages rather than reclaim (ejecting file pages from memory).
How effective this is really does depend on the workload, but happily it works well the vast majority of the time. And I'm still looking for workloads that do poorly (I'm sure we'll find some eventually).
Now I see on my test machine (btrfs) that about 24% (2GB) of the 8GB of memory (according to Monitor) (4GB according to Disks) is dedicated to zram0. I imagine the difference in size reported is due to the compression used by zram.
Yes, but it's also a bit misleading because it's dynamically allocated. To see how much memory is actually used you need to look at zramctl output. I'm not actually seeing anywhere in System Monitor where it suggests how much RAM is being used by the zram device.
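Under the hood zramctl is reading sysfs; if you want to poke at it directly, something like the following works. The column meanings are from my reading of the kernel's zram documentation for /sys/block/zram0/mm_stat, so double-check them against your kernel before trusting the numbers.

    fields = open("/sys/block/zram0/mm_stat").read().split()

    orig_data_size = int(fields[0])    # uncompressed bytes currently stored in zram0
    compr_data_size = int(fields[1])   # those same bytes after compression
    mem_used_total = int(fields[2])    # actual RAM the device is consuming right now

    print("stored:", orig_data_size, "compressed:", compr_data_size, "RAM used:", mem_used_total)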
First, I can't think of any good reason why I should need or want to have any of my RAM dedicated to swapping or other things that speed up btrfs.
It's not Btrfs specific or related.
Second, a few years back, I looked over some of the compression algorithms. The probability of loss or corruption was low, but non-zero.
True, but if you have corruption resulting from memory problems, that's bad no matter how much gets corrupted. The memory must be replaced, or you have to figure out the exact location of the bad RAM and set up a kernel exclusion memory map as a boot parameter.
I'm sure they have improved, but I'll bet those probabilities are still non-zero. I was taught back in my early years at school that unnecessary risk is foolish risk. So I've always avoided using compression wherever I had an option.
The compression itself isn't going to cause corruption. What happens is more data is effectively corrupted, if there's corruption, due to the compression. But you don't want corruption happening in the first place, no matter whether there's compression.
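You can see that amplification effect with a couple of lines of Python against the standard zlib module (a sketch of the general point, not what zram itself does): flip one bit in the compressed stream and the whole extent is affected, whereas the same flip in uncompressed data would have damaged a single byte.

    import zlib

    original = b"important record " * 200
    blob = bytearray(zlib.compress(original))
    blob[len(blob) // 2] ^= 0x01          # one flipped bit, inside the compressed stream

    try:
        result = zlib.decompress(bytes(blob))
        print("decoded, but differs from the original:", result != original)
    except zlib.error as err:
        print("the whole compressed extent is lost to one bad bit:", err)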
If this is necessary to make btrfs work or to get reasonable performance from it, that is a strike against btrfs.
If a workload is going to be slower on btrfs, it'll still be slower whether swaponzram is enabled or not. But if you don't want to use swaponzram, merely 'touch /etc/systemd/zram-generator.conf' and that will permanently disable it.