Add the ability to distinguish unique installations to anonymous usage report #4077

Open
oleiade opened this issue Nov 27, 2024 · 2 comments

@oleiade
Member

oleiade commented Nov 27, 2024

Feature Description

Problem Definition

k6 currently collects anonymous usage information as part of its opt-out usage report (--no-usage-report). This report helps us understand k6 usage patterns to improve the tool and guide development decisions.

To better support the k6 development process, we would like to measure the number of active installations of k6 over time.

This requires the ability to track when a given installation of k6 (on a machine that has not opted out of the usage report) was last used. Since each usage report already includes a timestamp, the only additional functionality needed is a mechanism to distinguish one installation from another.

Considerations

The identifier introduced to enable this functionality would:
• Be anonymous.
• Be stored locally on the machine running k6.
• Be included in the usage report only if telemetry has not been opted out.

This identifier:
• Will not contain any personally identifiable information (PII) or system-specific data (e.g., username, hostname, IP address, etc.).
• Will comply with GDPR and other relevant privacy laws by being designed to avoid user identification and to remain strictly anonymous.

Risks

  • Storing a random identifier locally exposes it to tampering, and we should not trust data that the user can modify. As such, having k6 reproduce or verify the identifier before submitting the report would be a nice-to-have.

Why This Matters

Having a reliable measure of active installations will:
• Allow us to make more informed decisions about features and improvements.
• Help us better understand k6’s reach and growth while respecting user privacy.

Suggested Solution (optional)

Inspiration & References

ID generation

  • machineid
  • In a previous role, I was exposed to a similar need, and we used a system fingerprinting mechanism that created a hash of the user's system. We had the ability to verify this fingerprint, but the hash itself was cryptographic and thus non-reversible.
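As a rough sketch of the fingerprinting idea, the identifier could be a keyed hash (HMAC-SHA256) of a machine identifier: non-reversible, yet reproducible on the same machine. Everything named here is an assumption made for illustration: `appID`, `installationID`, and `verifyInstallationID` are hypothetical, and how the raw machine identifier is obtained from the operating system is left out of scope. Because k6 is open source, `appID` would be public, so this gives reproducibility rather than protection against forged values.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// appID is a hypothetical application-specific constant. Mixing it in means
// the same machine yields a different identifier for a different application.
const appID = "k6-usage-report"

// fingerprint computes HMAC-SHA256 over appID, keyed by the raw machine
// identifier. The raw identifier itself never leaves this function.
func fingerprint(rawMachineID string) []byte {
	mac := hmac.New(sha256.New, []byte(rawMachineID))
	mac.Write([]byte(appID))
	return mac.Sum(nil)
}

// installationID returns the hex-encoded fingerprint. rawMachineID is assumed
// to be provided by the caller (for example, read from the operating system).
func installationID(rawMachineID string) string {
	return hex.EncodeToString(fingerprint(rawMachineID))
}

// verifyInstallationID recomputes the fingerprint locally and compares it to
// the candidate in constant time. Malformed candidates are rejected.
func verifyInstallationID(rawMachineID, candidate string) bool {
	got, err := hex.DecodeString(candidate)
	if err != nil {
		return false
	}
	return hmac.Equal(fingerprint(rawMachineID), got)
}
```

Under this scheme k6 would not need to trust a stored value: it could recompute the identifier on each run, or check a stored one with `verifyInstallationID` before submitting.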

Proposed solution(s)

TODO

Already existing or connected issues / PRs (optional)

#4038

@oleiade
Member Author

oleiade commented Nov 27, 2024

For context, the Alloy project uses a UUID they call a "seed". This seed is saved on disk on the user system as a "seed file".
See https://github.com/grafana/alloy/blob/cc383c1edf988fd4763582c86a2e4b85bcc0f055/internal/alloyseed/alloyseed.go.

cc @joanlopez

@joanlopez
Contributor

The most challenging part I see here is what you, @oleiade, included in the risks section, especially since this is an open-source project, which makes it harder to keep any secret hidden.

However, I'm not quite sure it is really worth it, because as of now we're not doing anything to prevent fake data at the report level, and I see this as just a subcase of that.

Do you have any particular idea on how to solve this?
