"User tracking" is generally contentious in free-software communities—even if the "tracking" is not really intended to do so. It is often distributions that have the most interest in counting their users, but Linux users tend to be more privacy conscious than users of more mainstream desktop operating systems. The Fedora project recently discussed how to count its users and ways to preserve their privacy while doing so.
Ben Cotton brought up the topic in the context of a proposal for Fedora 30. Instead of the current method of counting unique IP addresses that request updates from the DNF mirrors, which is an unreliable estimator of Fedora usage, the proposal would create a universally unique identifier (UUID) for each installed system that would be sent with DNF mirror-list requests. It explicitly calls out privacy concerns: "We don't want to track; just count."
The proposal outlines the kind of information that the project would like to count, including the version of Fedora, the Fedora variant (or spin), and the architecture of the machine. It would also be useful to have some way to distinguish long-lived installations from one-off test systems in virtual machines. Currently, variants cannot be distinguished and the unique IP counting method both undercounts systems behind network address translation (NAT) and overcounts systems that change IP addresses frequently. The UUID is similar to what openSUSE uses, so "this is ground already traveled".
Using the machine ID (stored in /etc/machine-id) as the UUID is not part of the plan, since it may be used in other ways that would facilitate tracking. So some kind of random UUID would be generated for this purpose. But, as Lennart Poettering pointed out, sending a UUID makes tracking possible even if the project doesn't want to do that tracking. Essentially, users would need to trust that the project isn't doing the tracking because it says it isn't. While he was skeptical that Fedora really wanted to use a UUID that way, he did suggest using an application-specific machine ID, like those calculated by sd_id128_get_machine_app_specific(). That way, Fedora would be using an existing mechanism that generates a UUID using the machine ID and an ID specific to the counting application.
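The idea behind an application-specific machine ID can be illustrated with a short sketch. It is not the systemd implementation; it only shows the general technique of running the application's ID through a keyed hash that uses the machine ID as the key. The choice of HMAC-SHA256, the function name `app_specific_id()`, and all of the ID values below are assumptions made for illustration.

```python
import hashlib
import hmac
import uuid


def app_specific_id(machine_id: bytes, app_id: bytes) -> uuid.UUID:
    """Derive a per-application ID from a 16-byte machine ID.

    The machine ID is the HMAC key and the application ID is the message,
    so the result cannot be turned back into the machine ID without
    breaking the hash.
    """
    if len(machine_id) != 16 or len(app_id) != 16:
        raise ValueError("both IDs must be 128 bits (16 bytes)")
    digest = hmac.new(machine_id, app_id, hashlib.sha256).digest()
    # Keep the first 128 bits and set the version/variant bits so the
    # result is a well-formed (version 4 style) UUID.
    return uuid.UUID(bytes=digest[:16], version=4)


# Hypothetical values; a real machine ID would come from /etc/machine-id.
MACHINE_ID = bytes.fromhex("0123456789abcdef0123456789abcdef")
COUNTING_APP = uuid.UUID("6f1d2c3b-4a59-4e87-9b6a-0c1d2e3f4a5b").bytes
OTHER_APP = uuid.UUID("a1b2c3d4-e5f6-4789-8abc-def012345678").bytes
```

The same machine always produces the same ID for the counting application, but a different application on that machine gets an unrelated value, so the two cannot be correlated through the ID alone.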
Poettering also mentioned that Ubuntu counts installations via NTP, which might be an option if Fedora wanted to run its own NTP servers. Both Ubuntu and Fedora configure their systems to regularly ping the NTP servers. Another possibility would be to send a "countme" flag once a day as part of the captive-portal and connectivity detection that is already installed with Fedora, but that did not sit well with Kevin Kofler. He called the existing NetworkManager-config-connectivity-fedora package "spyware" and does not install it on his systems.
Fedora project leader Matthew Miller (who is also the owner of the feature proposal) said that the connectivity check could be used, but it would only count a subset of desktops and not other types of installations, such as server, cloud, or container. In addition, setting up NTP servers would be much more work than hosting a UUID-counting service, he said.
Miller said that the intention is to rotate the logs "fairly frequently", but that is not really visible to users, so there is still a trust factor present. But Tom Gundersen suggested another approach:
You could move the rotation to the client by hashing the UUID with a timestamp of sufficiently coarse granularity (a week?) before submitting it.
Then you make sure that all UUIDs submitted by a given machine during a given time window are the same, but UUIDs submitted in different windows are not related, and you don't have to trust the server to respect your privacy.
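A minimal sketch of that client-side rotation follows, assuming SHA-256 as the hash and the ISO week as the "sufficiently coarse" window; neither choice comes from the discussion, and `windowed_id()` and `install_uuid` are hypothetical names.

```python
import datetime
import hashlib


def windowed_id(install_uuid: str, when: datetime.date) -> str:
    """Hash the local UUID together with the current ISO week.

    Every request made in the same week yields the same value, while values
    from different weeks cannot be linked without knowing install_uuid,
    which never leaves the machine.
    """
    iso_year, iso_week, _ = when.isocalendar()
    window = f"{iso_year}-W{iso_week:02d}"
    digest = hashlib.sha256(f"{install_uuid}:{window}".encode()).hexdigest()
    return digest[:32]  # 128 bits, the same size as a UUID
```

The server can still deduplicate requests within one week, but it has nothing to rotate or delete in order to prevent linking one week to the next.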
That approach would "make sense", Poettering said, though he still advocated using NTP or the "HTTP ping" that is done as part of the captive-portal detection. Others, such as Bruno Wolff III, are worried that even if the UUIDs are changed frequently, users still have to trust Fedora (or someone who gained access to the logs) not to correlate UUIDs, IP addresses, and other information to track users that way. Beyond that, Nicolas Mailhot is concerned about interaction with the EU General Data Protection Regulation (GDPR); that requires a shift in thinking about how data can be misused:
That's what the GDPR is about. It's *your* responsibility as data collector to think about how data could be used, it's *your* problem to protect it, it's *your* problem if it's misused, you can not make it available on a platter for others to do evil things with and claim it's those people's problem.
Wolff also pointed out that attackers may try to send UUIDs that are unexpected. Those could be generated to try to attack the system in some way or they could simply be strings containing profanity or other "not safe for work" (NSFW) content. He wants to ensure that the actual UUID strings don't end up in reports or require review by humans. Even ensuring that the strings are valid hexadecimal doesn't preclude inventive usage that could embarrass the project or offend people. Beyond that, UUIDs could be changed more frequently to try to inflate the statistics.
As these privacy and other problems with the UUID scheme were being discussed, Poettering came up with a scheme that alleviated most of the problems that were identified. He proposed that a "countme" flag simply be added to a single mirror-list query each week. The sum of all such queries over a week's time should provide an accurate estimate of the number of Fedora systems. That way, UUIDs need not be stored, which removes much of the concern—data that is not stored cannot be misused.
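The once-a-week flag might look roughly like the sketch below. It is an illustration of the idea, not DNF's code: the `repo` and `arch` parameter names, the way the last-counted week is remembered, and the helper names are all assumptions.

```python
import datetime
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import urlencode

Week = Tuple[int, int]  # (ISO year, ISO week number)


def iso_week(day: datetime.date) -> Week:
    year, week, _ = day.isocalendar()
    return (year, week)


def mirrorlist_query(
    base_params: Dict[str, str],
    today: datetime.date,
    last_counted: Optional[Week],
) -> Tuple[str, Week]:
    """Build a query string, adding countme=1 at most once per ISO week.

    Returns the query string and the week to remember locally; nothing
    that identifies the machine is added to the request.
    """
    this_week = iso_week(today)
    params = dict(base_params)
    if last_counted != this_week:
        params["countme"] = "1"
    return urlencode(params), this_week


def weekly_total(queries: Iterable[str]) -> int:
    """Server side: the estimate is simply the number of flagged queries."""
    return sum(1 for q in queries if "countme=1" in q.split("&"))
```

The only state lives on the client (the last week in which it was counted), and the server needs nothing more than a tally of flagged requests.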
Poettering followed up by noting that avoiding even the appearance of tracking will likely result in fewer users disabling the counting mechanism. Miller was enthusiastic about the idea; he suggested that since there would be no UUID associated with the information, the "countme" flag could increment once per week, which would give some additional information about the longevity of systems—without providing much information that could be used for tracking.
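One way to picture the incrementing flag is as a week counter derived from the date the system was first counted. That derivation is an assumption (the discussion only says the value would increment once per week), and `countme_value()` and `tally_by_age()` are hypothetical names.

```python
import datetime
from collections import Counter
from typing import Iterable


def countme_value(first_counted: datetime.date, today: datetime.date) -> int:
    """Return 1 in the first week, 2 in the second, and so on."""
    if today < first_counted:
        raise ValueError("today is before the first count")
    return (today - first_counted).days // 7 + 1


def tally_by_age(values: Iterable[int]) -> Counter:
    """Server side: count how many systems reported each age value."""
    return Counter(values)
```

A week dominated by reports of 1 would suggest many short-lived test installations, while large values indicate long-lived systems. Many machines share each value, so the number reveals little about any single one.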
It would not even necessarily require that every machine report, Roberto Ragusa suggested. Machines could decide whether to report based on some property of their machine ID (e.g., being evenly divisible by 1000) or by combining the machine ID and the date so that the set of counted systems would change over time. Then the counts could simply be multiplied by whatever is used as the modulus to provide the actual estimate.
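The sampling idea can be sketched as follows, assuming the machine ID and a period string (such as a date) are hashed with SHA-256 before the modulus is applied. The hash step and the function names are assumptions; the suggestion itself only describes the modulus test and the multiplication.

```python
import hashlib


def should_report(machine_id: str, period: str, modulus: int) -> bool:
    """Decide locally whether this machine reports during this period.

    Hashing the ID together with the period means a different 1-in-modulus
    subset of machines is selected each time the period changes.
    """
    digest = hashlib.sha256(f"{machine_id}:{period}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus == 0


def estimate_total(reports_received: int, modulus: int) -> int:
    """Scale the number of reports back up to an estimate of all systems."""
    return reports_received * modulus
```

With a modulus of 1000, for example, 7 received reports would yield an estimate of 7000 systems. Larger moduli send fewer reports at the cost of a noisier estimate.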
Overall, there were few complaints about the simpler counting mechanism. Miller has updated the proposal using Poettering's method; it should be posted to the mailing list soon, once he receives some feedback from the DNF developers. It seems likely that Fedora 30 will have the feature when it is released, which is currently scheduled for the end of April.
We have looked at other user-counting initiatives and proposals along the way. In 2010, there was a proposal to add UUID tracking to Yum, but Fedora has been trying to figure out how to unobtrusively count users for longer than that. A 2006 scheme involving a tracking image was proposed for Fedora Core 7. More recently, the Django web-framework project discussed adding analytics that would report to Google servers, which was not popular with Debian (at least).
There is a certain amount of tension between the needs of a distribution or software project and the needs of users—especially when it comes to privacy issues. Being able to show the existence of more project users will generally lead to a higher profile and potentially more funding for development and other activities. Counting variants can also help projects make better decisions about where to allocate their scarce resources. But many users do not want to be tracked, though they may be willing to be counted. This Fedora proposal seems like it finds a reasonable balance by reusing an existing mechanism without adding something that could be tracked. It will be interesting to see what Fedora finds once it rolls out this counting feature to users.