On Jul 7, 2023, at 7:09 AM, Michael Catanzaro
<mcatanzaro(a)redhat.com> wrote:
On Thu, Jul 6 2023 at 09:27:47 PM +0200, Florian Weimer
<fweimer(a)redhat.com> wrote:
> What about packages which already collect metrics and report them
> somewhere (not necessarily to Red Hat)? Would these packages need to
> change under this proposal? If not, how do we explain this to our
> users?
No, packages that are already collecting their own metrics separately
would not be affected.
I’d almost prefer we work out a policy where anything of the sort is disabled by default,
with a distro-wide standard bcond so that it isn’t even compiled in as an option. (No, I
don’t quite know how that could be worded sensibly as a policy… but it’s where I think I’d
prefer to start from).
Even well intentioned things can be problematic.
Did you know that “lshw” does a DNS query?
Not only that, it’s a DNS query that goes not to anywhere the distro points to, but to
somewhere out on the internet.
By running “lshw” you’ve now told a DNS server how many machines / people you have running
“lshw” within some amount of time.
You’ve also now complicated the ability to go “I allow access to the packaging
repositories for security updates, the one two or three endpoints my application needs to
talk to, and if any of these machines EVER tries to do any other network activity, page
people immediately as that can only mean something is wrong”. This *really* isn’t an
unreasonable thing for people to do. In fact, I really, really, REALLY want to make it easy
for people to do this (and not start paging people just because someone diagnosing a
problem typed “lshw” or something).
For lshw specifically, this is fixed in c9s and Fedora, and upstream now has an option to
build with this feature disabled:
- https://gitlab.com/redhat/centos-stream/rpms/lshw/-/merge_requests/3
- https://bugzilla.redhat.com/show_bug.cgi?id=2098463
- https://src.fedoraproject.org/rpms/lshw/pull-request/1
- https://github.com/lyonel/lshw/pull/86
Now, this example is obviously not that extreme or anything. It’s arguably less
information than what’s in your average `curl http://foo` request.
But the burden we put on our users is to evaluate each of these for themselves, in their
deployment and security context: whether they are okay with a third party having that
information, and whether they understand exactly what is being done, and what *could* be
done with it. It sounds like a lot of work.
An example of this is the countme feature
https://docs.fedoraproject.org/en-US/fedora-coreos/counting/ /
https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/dnf-counting/ /
https://lwn.net/Articles/776327/ which is on by default in Fedora (on my at-home
personal Fedora machines too). I made a personal decision for my own machines, but when
looking at it in the context of building the next (now current) version of Amazon Linux, I
was faced with a choice: do we go through a process of independently working out what our
customers’ thoughts would be on this feature, setting up our own infrastructure around it,
and deciding how we’d communicate about it, all while ensuring that it meets the security
and privacy bars we want to uphold… or do we just not enable it and spend that time on
other things? We chose to spend the time on other things, as setting this up was not
critical for us.
But what was fantastic about this was that Fedora was very very very clear about the
change, how it worked, the effort that went into it, etc. It was also easy to flip on/off,
and it really lived in just one place: a place we would *have* to modify anyway when we
started building our own distro.
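
To make the "just in one place" point concrete, here is a minimal sketch, assuming the
switch in question is the `countme` repo option in dnf-style `.repo` files. The function
name and the sample repo text are hypothetical and only for illustration; this is not
what Fedora or Amazon Linux actually ship or run.

```python
import configparser

# Hypothetical sample in the style of a dnf .repo file; not copied from any distro.
SAMPLE_REPO_TEXT = """
[fedora]
name=Fedora $releasever - $basearch
metalink=https://mirrors.fedoraproject.org/metalink?repo=fedora-$releasever&arch=$basearch
enabled=1
countme=1

[example-extra]
name=Hypothetical extra repo
baseurl=https://example.invalid/repo
enabled=1
"""


def repos_with_countme(repo_text: str) -> list[str]:
    """Return the repo ids in repo_text that have countme switched on."""
    # interpolation=None keeps '%' and '$releasever' style values literal.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(repo_text)
    return [
        repo_id
        for repo_id in parser.sections()
        # A repo without the option is treated as not counting.
        if parser.getboolean(repo_id, "countme", fallback=False)
    ]


if __name__ == "__main__":
    for repo_id in repos_with_countme(SAMPLE_REPO_TEXT):
        print(f"countme enabled in [{repo_id}]")
```

Under that assumption, a downstream distro that rewrites its repo definitions anyway can
audit or flip the setting in the same pass.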