Rewrite ETIP as a GitHub repository #133

Open
pnu-s opened this issue Feb 1, 2022 · 17 comments
Labels
question Further information is requested

Comments

@pnu-s
Member

pnu-s commented Feb 1, 2022

After a lot of thinking, and after experiencing various pains with the current version of ETIP (as well as seeing many contributors run into the same pains), I'm wondering whether we could/should completely revamp ETIP in a different way.

Hear me out: what about switching to a GitHub repository with a specific file for each tracker? I'd expect Markdown or JSON format (preferably Markdown, to ease the review).

Name of the repository could be: https://github.com/Exodus-Privacy/trackers

Advantages:

  • Each new contribution (tracker creation, tracker modification, tracker validation) can be discussed within the PR (see "Add possibility to discuss trackers" #106)
  • Each new contribution (tracker creation, tracker modification, tracker validation) can be validated within the PR
  • Everything about a tracker (its signature, name, etc.) can be discussed within an issue
  • Every modification is visible through an associated PR
  • Every past modification is visible through git commits (better transparency)
  • We can have tracker validation within GitHub Actions (check for collisions, check that a tracker is complete when it is validated, etc.)
  • We could easily give administration rights to people outside the Exodus Privacy organization
  • This removes the need for everyone to create an account
  • This removes the need to maintain the Django app (dependencies, etc.)
  • This puts the focus on what matters here: the tracker details
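As a rough illustration of the GitHub Actions idea above, a check could run over the tracker files on every PR. The sketch is hypothetical: it assumes one JSON file per tracker under a "trackers/" directory, with "name", "code_signature", "network_signature", "website" and an optional "validated" flag, none of which is a settled format. It only catches exact duplicate code signatures; detecting regexes that overlap without being identical is a harder problem left out here.

```python
"""Hypothetical CI check for a repository holding one JSON file per tracker.

Assumed layout (not an existing Exodus convention): trackers/<slug>.json,
each with "name", "code_signature", "network_signature", "website" and an
optional boolean "validated".
"""
import json
import sys
from pathlib import Path

# Fields a validated tracker is assumed to need (illustrative choice).
REQUIRED_WHEN_VALIDATED = ("name", "code_signature", "network_signature", "website")


def check_trackers(trackers):
    """Return a list of human-readable problems; an empty list means the check passes."""
    problems = []
    seen_signatures = {}  # code_signature -> first tracker slug that used it
    for slug, tracker in sorted(trackers.items()):
        if tracker.get("validated"):
            for field in REQUIRED_WHEN_VALIDATED:
                if not tracker.get(field):
                    problems.append(f"{slug}: validated but '{field}' is empty")
        sig = tracker.get("code_signature")
        if sig:
            if sig in seen_signatures:
                problems.append(f"{slug}: code_signature collides with {seen_signatures[sig]}")
            else:
                seen_signatures[sig] = slug
    return problems


def main(directory="trackers"):
    """Load every trackers/*.json file, print problems, return an exit status."""
    trackers = {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in Path(directory).glob("*.json")
    }
    problems = check_trackers(trackers)
    for problem in problems:
        print(problem, file=sys.stderr)
    # A non-zero status is what makes a CI job fail.
    return 1 if problems else 0
```

A workflow step would call main() and pass its return value to sys.exit(), so that any reported problem fails the PR check.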

Drawbacks:

  • This requires a significant amount of work (writing scripts mostly) but I'm up for it if we go that way
  • Probably harder to search for a specific tracker
  • Everyone is required to have a GitHub account to contribute
  • Each contribution needs to be reviewed before being merged (is it a drawback?)

This is a major change so I would love to hear your opinions @U039b @jawz101 @eighthave @blaueente @IzzySoft (feel free to tag any potentially interested person)

pnu-s added the question (Further information is requested) label Feb 1, 2022
pnu-s self-assigned this Feb 1, 2022
@jfoucry

jfoucry commented Feb 1, 2022

What about the existing database of trackers in ETIP?
Would we need to create a script that opens a PR for each tracker in the database?

@pnu-s
Member Author

pnu-s commented Feb 1, 2022

Not to create a PR, but indeed we would need to create a script to migrate our existing trackers to the new format.

That goes into this point:

> This requires a significant amount of work (writing scripts mostly) but I'm up for it if we go that way

@eighthave
Contributor

eighthave commented Feb 1, 2022 via email

@FestplattenSchnitzel

This sounds like a very good idea! Without much knowledge about the existing setup, I'd guess this change would make the data a lot easier for machines to read and thus enable other projects (like F-Droid) to use it as well.

> Each new contribution (tracker creation, tracker modification, tracker validation) can be validated within the PR

If you use YAML or JSON you can use a JSON schema for validation.

> Everyone is required to have a GitHub account to contribute

It'd be great to see you on federated Gitea when it's there.

Speaking of submissions: Datenanfragen.de (https://www.datenanfragen.de/, English version https://www.datarequests.org) provides a web form [0] that creates a PR on GitHub with a JSON file (see [1] for an example).

[0] German: https://www.datenanfragen.de/suggest/#!type=new&for=cdb
[0] English: https://www.datarequests.org/suggest/#!type=new&for=cdb
[1] datenanfragen/data#1646

@jawz101

jawz101 commented Feb 3, 2022

I agree. A change at this point should serve multiple purposes: being easier to administer as well as easier to consume into Exodus. I don't know what bottleneck caused the 200 tracker signatures to pile up, but if it is a delay in moving them from ETIP into Exodus's own machine-readable formats, tracker signature finders can certainly step up and write things in a format that is more consumable and needs less manual handling. Or if we need to set up test environments and see how test APKs handle the signatures, I'm fine with that.

I just want to get the current backlog whittled down and let that process dictate how a new system could introduce improvement. My only concern with Github is then another bottleneck is introduced because pull requests get sat on and people get caught up in a back & forth discussion about a tracker signature instead of having someone with the interest to implement new tracker definitions. I wouldn't think the implementer should immediately add every tracker as they are submitted. Rather, wait for 20 or so to pile up and then do the same operation to implement several at a time. If the problem is submitters leave fields blank or the regex isn't correct, then we need required fields and a note that it needs to be in a particular format for the implementers to integrate it. Just a note by the field. It doesn't need to be some fancy syntax checker.
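In the spirit of a light check rather than a fancy syntax checker, something like the sketch below would catch the most common regex slips before a submission reaches an implementer. It assumes signatures are regular expressions in Python's re dialect; rejecting patterns that match the empty string is an illustrative rule, not an existing ETIP one.

```python
import re


def signature_problems(signature):
    """Return problems with one signature string; an empty list means it looks usable.

    Assumes signatures are regular expressions understood by Python's re module.
    """
    if not signature or not signature.strip():
        return ["signature is empty"]
    try:
        pattern = re.compile(signature)
    except re.error as exc:
        return [f"not a valid regular expression: {exc}"]
    problems = []
    # A pattern that matches the empty string (e.g. a stray trailing "|")
    # would match every class name or domain, which is almost certainly a typo.
    if pattern.search("") is not None:
        problems.append("signature matches the empty string")
    return problems
```

For example, "com.example.|" compiles fine but is reported because its empty alternative matches everything, while "com.example.(" is reported as an invalid expression.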

@pnu-s
Member Author

pnu-s commented Feb 3, 2022

Thanks for your inputs @jawz101 !

I share your concern about reducing the current backlog. I just added a couple of (very) minor changes to ETIP to ease the review, and this week we had a meeting within the organization to try to get more (volunteer) people involved in this task.

Actually, moving the trackers from ETIP to exodus is probably the only thing which works really well (and it's automated so requires very little human time).

What I miss the most in the current version of ETIP is:

  • seeing when a new tracker is added
  • being able to discuss with the submitter in case of something unclear

What happens for most trackers currently in the backlog:

  1. The tracker profile is fine, it just needs some people to review it -> that does happen, but as I said we are trying to engage more people in this review task
  2. The tracker profile does not match our definition of a tracker: a webchat SDK, a game development SDK, an identifier generation SDK, etc. -> currently we don't really know how to handle those
  3. The tracker profile is fine but the signature does not match any report in exodus (0 matches) -> that happens a lot, and it's very hard for us to validate the signature when there are 0 matches

I would say that case 3 is the most common, then 2, then 1.
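Case 3 boils down to a count that comes out as zero. As a simplified, hypothetical sketch (assuming each εxodus report can be reduced to the list of class names found in the APK, and that code signatures are regular expressions searched against those names), that count could look like:

```python
import re


def count_matching_reports(code_signature, reports):
    """Count reports whose class list contains at least one match for the signature.

    `reports` is a hypothetical mapping of report id -> iterable of class names
    such as "com.example.sdk.Tracker".
    """
    pattern = re.compile(code_signature)
    return sum(
        1
        for classes in reports.values()
        if any(pattern.search(name) for name in classes)
    )
```

A result of 0 is exactly the situation described in case 3: nothing in the existing reports confirms or refutes the signature.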

I'm thinking that moving to a code repository would ease the discussion between submitters and reviewers, and keep a huge backlog like the current one from building up. But I could be wrong; this won't solve all of our issues.

And yes, we probably need to tackle the backlog before moving to a new system.

@jawz101

jawz101 commented Feb 3, 2022

Perhaps something that indicates "needs more information" if there is a question about whether it is indeed a tracker. I still like seeing that a signature is in there even if it does not fit the definition of a tracker, because it would likely come up again. I mainly look for technical documentation, if it is publicly accessible, which tells me that the SDK must have been in some application somewhere at least at one point. Since there is no convenient way to upload unknown APKs directly from the phone, a cumbersome part of the submission process is having to go to the Exodus site with a package name in mind and upload it. And with the library only covering 80,000 or so apps, that leaves a large chunk unchecked.

But yeah, the ETIP website did seem like a lot of effort to invest compared with using something like GitHub. Though if it functions on the backend with a database, that has its own conveniences.

@eighthave
Contributor

I have some time to work on this, so I started sketching it out. Here is a first stab at a YAML conversion; it definitely needs work, but it is a good place to continue the conversation:
https://github.com/eighthave/etip/tree/yaml-conversion/trackers

@pnu-s did you have time to work on this at all? If you have code for getting the data out of the database, I'm happy to work on getting it nicely outputted to YAML. I've been working from the JSON from https://reports.exodus-privacy.eu.org/api/trackers
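For reference, a minimal conversion from the API JSON to one YAML file per tracker could look like the sketch below. It assumes the payload has the shape {"trackers": {"<id>": {...}}}, and it writes every value with json.dumps, relying on JSON being, for practical purposes, a subset of YAML 1.2, so no YAML library is needed. The file-naming rule (slug of the tracker name, falling back to the id) is a hypothetical choice.

```python
import json
import re
from pathlib import Path


def to_yaml(tracker):
    """Render a tracker dict as block-style YAML, one top-level key per line.

    Scalars and list items are written with json.dumps: a JSON string is also
    a valid YAML double-quoted scalar, and JSON numbers/booleans/null are valid YAML.
    """
    lines = []
    for key, value in tracker.items():
        if isinstance(value, list):
            if not value:
                lines.append(f"{key}: []")
                continue
            lines.append(f"{key}:")
            lines.extend(f"  - {json.dumps(item, ensure_ascii=False)}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    return "\n".join(lines) + "\n"


def slugify(name):
    """Lowercase the name and collapse anything non-alphanumeric into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def convert(api_json, out_dir):
    """Write one YAML file per tracker and return the written paths.

    Assumes the API payload looks like {"trackers": {"<id>": {...}, ...}}.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for tracker_id, tracker in api_json["trackers"].items():
        path = out / f"{slugify(tracker.get('name', '')) or tracker_id}.yml"
        path.write_text(to_yaml(tracker), encoding="utf-8")
        written.append(path)
    return written
```

Writing the files this way keeps the output diff-friendly (one key per line), which matters if each later change is reviewed as a PR.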

@eighthave
Contributor

I was just working with @Miriam-cpu / mobilsicher.de and we thought that we could standardize on a data format here that would work for:

  • Exodus' tracker lists
  • F-Droid's proprietary libs lists
  • Mobil Sicher's third party network services list.

I think we can clearly use the same core data fields and structures, and additional project-specific fields can be added as needed without conflicting with these core fields. This works well when the base data structure is a dictionary. The only notable difference I can think of between these lists is that Exodus' and F-Droid's network_signatures lists mark the problematic domains, while Mobil Sicher's third-party networks list needs to list the "good" domains; any other domain found would then be considered "third party".
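To make the denylist/allowlist difference concrete, here is a hypothetical sketch of the two checks. It assumes network signatures are regular expressions searched in a hostname, and that an allowlist entry also covers its subdomains; neither rule is taken from the actual Exodus, F-Droid or Mobil Sicher code.

```python
import re


def flagged_by_denylist(domain, network_signatures):
    """Exodus/F-Droid style: a domain is flagged if it matches a known-bad signature."""
    return any(re.search(sig, domain) for sig in network_signatures)


def third_party_by_allowlist(domain, allowed_domains):
    """Mobil Sicher style: any domain not on the "good" list (or a subdomain of
    an entry) counts as third party."""
    return not any(
        domain == allowed or domain.endswith("." + allowed)
        for allowed in allowed_domains
    )
```

The two approaches fail in opposite directions: a denylist misses trackers nobody has catalogued yet, while an allowlist reports every unlisted domain, harmless or not.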

@pnu-s
Member Author

pnu-s commented Jun 6, 2022

@eighthave Thanks for the work you've put into this!

To be honest, we have focused our recent ETIP efforts on adding new features to its current form, for instance to make it more explicit why some trackers are not accepted into εxodus yet (which is our main issue at the moment).

Rewriting ETIP would be costly for us, and I'm not entirely convinced that we would gain more than we would lose in terms of ease of use and features. That can obviously still be discussed and is not a final decision, but we decided to keep investing in ETIP's current form.

That being said, we are obviously open to discussing the data format for trackers, as well as changes to the ETIP UI, the JSON export format or the εxodus JSON API response.

@eighthave
Contributor

Can you point me to the new ETIP work? I couldn't find anything.

I'm still convinced that managing the ETIP/Exodus process via files and pull requests will make it easier to follow the work, and to contribute to it. Millions of people are familiar with the git workflow at this point, so that alone makes it easier for people to follow. I have time to work on building this out, and we're going to do it anyway for the F-Droid.org proprietary libs list, and probably also for the mobilsicher.de third-party list.

@pnu-s
Member Author

pnu-s commented Jun 9, 2022

> Can you point me to the new ETIP work? I couldn't find anything.

What I meant is that we added a couple of new features, such as the number of matches in exodus and the new badge for each tracker, which make it easy to see why a tracker has not been added to εxodus yet.

> I'm still convinced that managing the ETIP/Exodus process via files and pull requests will make it easier to follow the work

I have mixed feelings about this, mostly because we would lose all the efforts we have made to the current form of ETIP (such as the automated integration of trackers from ETIP to εxodus, which would need to be rewritten).

But I obviously see some benefits (otherwise I would not have created this issue in the first place 😄)

> we're going to do it anyway for the F-Droid.org proprietary libs list, and probably also the mobilsicher.de third-party list

What do you imagine here?
Do you think we could have a unique repository managed by multiple organizations?

@eighthave
Contributor

eighthave commented Jun 9, 2022

> > Can you point me to the new ETIP work? I couldn't find anything.

> What I meant is that we added a couple of new features, such as the number of matches in exodus and the new badge for each tracker, which easily show why a tracker is not added to εxodus yet

Where can I see that?

> > I'm still convinced that managing the ETIP/Exodus process via files and pull requests will make it easier to follow the work

> I have mixed feelings about this, mostly because we would lose all the efforts we have made to the current form of ETIP (such as the automated integration of trackers from ETIP to εxodus, which would need to be rewritten).

> But I obviously see some benefits (otherwise I would not have created this issue in the first place 😄)

If you point me to the code that does that integration, I can look and see if I can handle the porting.

> > we're going to do it anyway for the F-Droid.org proprietary libs list, and probably also the mobilsicher.de third-party list

> What do you imagine here? Do you think we could have a unique repository managed by multiple organizations?

I think it is possible, as long as we can find agreement on how it should be maintained. I'm talking with mobilsicher.de and @IzzySoft about how to make this happen. mobilsicher.de currently maintains their own list, and @IzzySoft's library list is here in JSON Lines format:
https://gitlab.com/IzzyOnDroid/repo/-/blob/master/lib/libinfo.txt
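JSON Lines puts one JSON object per line, which makes such a list easy to read and merge record by record. A minimal reader is sketched below; the "id" key used for indexing in the usage example is only an illustrative assumption, not the documented libinfo.txt schema.

```python
import json


def read_json_lines(text):
    """Parse JSON Lines: one JSON value per line, blank lines ignored."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Report the offending line so a contributor can find it quickly.
            raise ValueError(f"line {lineno}: {exc.msg}") from exc
    return records


def index_by(records, key):
    """Build a lookup table keyed on one field; records lacking it are skipped."""
    return {record[key]: record for record in records if key in record}
```

For instance, index_by(read_json_lines(text), "id") would give a dictionary that can be compared field by field with entries from another list.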

@eighthave
Contributor

I just put together some examples to start thinking about this more:
https://gitlab.com/eighthave/proprietary-libs-list/-/tree/main/profiles

I don't yet see a clear logic in how the libraries are grouped. I think ETIP groups them more or less by "product", as defined by the companies that release them. Now that I've gone through this more, I think the fdroid scanner would need things to be grouped by Anti-Feature. So basically, each profile would include:

anti_features:
  - NonFreeDep
  - Tracking

Then a match on any of the code_signatures: entries would mean that something with all of those Anti-Features was found. Otherwise, we'd need some mapping of signature to Anti-Feature.
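A hypothetical sketch of how a scanner could apply profiles shaped like the YAML above: a hit on any signature attributes all of that profile's Anti-Features, and several profiles may hit the same app. The profile names and signatures are made up for illustration and are not taken from fdroid-suss or ETIP.

```python
import re

# Hypothetical profiles: each carries a list of Anti-Features and a list of
# code signatures (regular expressions searched in class names).
PROFILES = {
    "example-analytics": {
        "anti_features": ["NonFreeDep", "Tracking"],
        "code_signatures": [r"com\.example\.analytics"],
    },
    "example-maps": {
        "anti_features": ["NonFreeDep"],
        "code_signatures": [r"com\.example\.maps"],
    },
}


def scan(class_names, profiles=PROFILES):
    """Return (names of matching profiles, union of their Anti-Features).

    Any matching signature attributes *all* of the profile's Anti-Features,
    which is why profiles would need to be grouped by Anti-Feature.
    """
    matched = []
    anti_features = set()
    for name, profile in profiles.items():
        patterns = [re.compile(sig) for sig in profile["code_signatures"]]
        if any(p.search(cls) for p in patterns for cls in class_names):
            matched.append(name)
            anti_features.update(profile["anti_features"])
    return matched, anti_features
```

Because the result is a union, a library that happens to match two overlapping profiles simply contributes the same Anti-Feature twice, which collapses to one entry in the set.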

@eighthave
Copy link
Contributor

eighthave commented Jun 10, 2022

After sleeping on this, I think we can actually leave the grouping pretty open because it should be fine if multiple profiles match a given library every now and then. These profiles are ultimately about showing info to a human, so multiple hits for a single library should be fine.

@eighthave
Copy link
Contributor

You can now see the first version of F-Droid's signature profiles rewritten as a git repo of YAML files. We call it "suss": https://gitlab.com/fdroid/fdroid-suss

@eighthave
Contributor

Here's more on F-Droid's work on a YAML/git setup for signature profiles:


5 participants