De-dupe across different bridges #800

Open
snarfed opened this issue Jan 23, 2024 · 6 comments

Comments

@snarfed
Owner

snarfed commented Jan 23, 2024

(related to #348, #543, gugray/rss-parrot#29; also posted to SocialHub)

One problem we're starting to see is multiple bridged accounts for the same user or site. Eg my web site https://snarfed.org/ is bridged by BF as @[email protected], but it's also bridged by RSS Parrot as @[email protected].

Neither bridge is really doing anything wrong, but still, it'd be nice to prevent this. One approach would be for users to indicate their "preferred" bridge to a given protocol. When bridge A sees a user for the first time, it would determine whether that user is already using (and prefers) a different bridge B. If so, bridge A would ignore it and not bridge it.

Existing approaches:

These approaches usually don't have a way to explicitly tag an identity as a bridge, but they often do have at least a URI scheme or MIME type.

Example of FEP-fffd: say bridge A translates between AP and ATProto. FEP-fffd says that ATProto is a well-known alternative protocol that’s identified by the did: URI scheme. Bridge A would fetch the actor, see if it has a rel=alternate or rel=canonical link pointing to a did:, and if it does, bridge A would ignore it.

(mediaType and URI schemes aren’t a great way to identify other protocols - eg did: and https: URIs are used by more than one protocol - but that’s a separate concern, and probably manageable. It’s also better than depending on each bridge to hard-code other bridges’ domains or other identities, detect those in links, and constantly keep those lists updated.)
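As a rough illustration of the FEP-fffd check described above, here is a minimal Python sketch. It assumes the actor has already been fetched and parsed into a dict, and that proxy links appear as `Link` objects in the actor's `url` property with `rel` and `href` fields. That layout is an assumption about how FEP-fffd links are serialized, and the function name `should_skip_bridging` is hypothetical.

```python
def _as_list(value):
    """Normalize a JSON value that may be missing, a single item, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def should_skip_bridging(actor, scheme, rels=("alternate", "canonical")):
    """Return True if the actor links to an identity in the target protocol.

    `actor` is an already-parsed ActivityPub actor (a dict). `scheme` is the
    URI prefix that identifies the target protocol, eg "did:".
    Assumption: proxy links are Link objects inside the `url` property.
    """
    for link in _as_list(actor.get("url")):
        if not isinstance(link, dict):
            continue  # plain string URLs carry no rel, so they can't match
        link_rels = _as_list(link.get("rel"))
        href = link.get("href") or ""
        if any(rel in rels for rel in link_rels) and href.startswith(scheme):
            return True
    return False


# Hypothetical actor that already has a native identity behind a did: URI.
actor = {
    "type": "Person",
    "id": "https://example.com/users/alice",
    "url": [
        "https://example.com/@alice",
        {"type": "Link", "rel": "canonical", "href": "did:plc:example"},
    ],
}
print(should_skip_bridging(actor, "did:"))
```

A bridge into ATProto could call this once when it first sees an actor and decline to bridge when it returns True. It inherits the scheme ambiguity noted above, since a did: URI does not by itself prove the identity lives in ATProto.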

The part I wonder about is beyond ActivityPub. FEP-fffd is great for AP actors, but bridges also need to look at users in other protocols to determine whether to bridge them. You could look at their corresponding AP actor, but you’d need to find it first. I’ve taken one stab at mapping ids and handles between protocols, but there’s obviously no standard, so you don’t necessarily know where to look.

Maybe more importantly, bridges bridge a number of different protocols. I’d love a way to do this across protocols, so that eg an ATProto <=> Nostr bridge could de-dupe bridged accounts and satisfy this same use case.

Should we use each of the approaches above individually? Should we look at user-provided profile links and detect bridged accounts that way? Are there other alternatives I’m missing?

@snarfed
Owner Author

snarfed commented Jan 23, 2024

The ATProto equivalent might be alsoKnownAs in the DID document. Discussion: bluesky-social/atproto#2075

@snarfed
Owner Author

snarfed commented Jan 23, 2024

Current tentative plan: use each protocol's native mechanism, and identify protocols by URI scheme and/or MIME type. Not ideal, but it's a start.
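To make "identify protocols by URI scheme and/or MIME type" a bit more concrete, here is a hedged Python sketch. The lookup tables are illustrative assumptions, not an agreed registry, and `guess_protocols` is a hypothetical helper. It returns a set of candidates rather than a single answer because, as noted earlier in the thread, schemes like did: and https: are shared by more than one protocol.

```python
# Illustrative tables only: these mappings are assumptions for the example,
# not something defined in this thread or in any spec.
SCHEME_HINTS = {
    "at": {"ATProto"},
    "did": {"ATProto"},               # ambiguous: other systems use did: too
    "nostr": {"Nostr"},
    "https": {"ActivityPub", "web"},  # ambiguous: almost everything uses https:
}
MEDIA_TYPE_HINTS = {
    "application/activity+json": {"ActivityPub"},
    "application/rss+xml": {"RSS"},
    "application/atom+xml": {"Atom"},
}


def guess_protocols(uri, media_type=None):
    """Return the set of protocols a link might belong to (possibly empty)."""
    scheme = uri.partition(":")[0].lower()
    candidates = set(SCHEME_HINTS.get(scheme, set()))
    if media_type:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        essence = media_type.split(";")[0].strip().lower()
        by_type = MEDIA_TYPE_HINTS.get(essence)
        if by_type:
            narrowed = candidates & by_type
            # Prefer the overlap; if there is none, trust the media type.
            candidates = narrowed if narrowed else set(by_type)
    return candidates


print(guess_protocols("at://did:plc:example/app.bsky.actor.profile/self"))
print(guess_protocols("https://example.com/users/alice",
                      "application/activity+json; charset=utf-8"))
```

Keeping ambiguous schemes as multi-element sets lets a caller decide how conservative to be, for example skipping bridging only when the candidate set contains exactly the target protocol.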

@ar-nelson

FEP-fffd only discusses merging proxy objects at the client level (for example, in a Mastodon frontend), but it also makes sense to refuse to bridge proxy objects so that proxies-of-proxies aren't created unnecessarily. This isn't discussed in the FEP, but it should be; maybe it needs an update to address this.

Using media type and protocol, rather than an explicit protocol identifier, was a change I made intentionally during the revision process. The original draft of FEP-fffd had protocol identifiers, like atproto, but that seemed too consensus-dependent: how would implementers agree on whether it's at or atproto or bsky or bluesky? I could define a few well-known protocols, but what about new ones? URL schemes at least have a fixed, unambiguous meaning: even if another platform also uses did:, the same URL will necessarily be the same DID, so it makes sense to treat it as a proxy object for the same entity.

@chrismessina

chrismessina commented May 5, 2024

Just to provide some real world examples of how this can get confusing... here are some examples of bridged users following me from Mastodon on Bluesky:

Screenshots

I already follow these folks on their original networks, and maybe early adopters are going to experience this kind of redundant following because we each maintain several accounts across many networks, but this kind of account proliferation can create spam, abuse, and impersonation concerns if such bridged accounts can't somehow be identified as "remote accounts" of some kind.

Being notified of this kind of follow isn't necessarily a problem and can likely be explained through clear and effective UI text, but additional behaviors warrant consideration:

  • Should I follow these accounts back?
  • If I do, does that mean that I might mention or otherwise try to interact with (DM, etc) bridged accounts?
    • Will the bridged account owner receive notifications of those interactions?
  • What happens if I report these accounts? Will instance or server operators/support personnel have enough context to not block these kinds of accounts?
  • Should "remote viewing"/bridged accounts receive a special status on federated servers so they can be marked differently in UIs, so that users don't interact with them with the same default expectations they would have when interacting with any other normal user account?

Open to discussing further, but as we now have these examples showing up in the wild, we should anticipate that additional confusion may arise.

@shiribailem

I feel like deduplication is less a bridge side and more a user side concern.

I would suggest against any effort to lock feeds into specific bridges just because that also removes the positive impacts of redundancy and user choice.

What I would suggest however is maybe working towards an AP format for clearly identifying sources on bridged / mirrored content.

For example, maybe a metadata tag like "relay-from" that contains a standard id for the source network (think ISO country codes, but for networks, so like "AP", "AT", "RSS", "FB", "IG", etc), the account handle (in the case of something like RSS it would be the URL), and then other details that vary by protocol and bridge (for instance, breaking out the source server).

This would mean all the various instance and client software could check these things. They can deduplicate within their account as the software would then be able to know that these two accounts are just bridging the same source.

Additionally user software could potentially very easily block by the original user account or server and not just the bridged handle.
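To show how client or instance software might use such a tag, here is a small Python sketch. Everything in it is hypothetical: no "relay-from" property exists today, and the `network` and `handle` field names, the account dicts, and `group_by_source` are invented purely for illustration.

```python
from collections import defaultdict


def group_by_source(accounts):
    """Group bridged accounts that claim the same original source.

    Each account is a dict with an "id" and, if it is bridged, a hypothetical
    "relay-from" dict holding "network" and "handle". Returns only the groups
    with more than one account, ie the duplicates.
    """
    groups = defaultdict(list)
    for account in accounts:
        relay = account.get("relay-from")
        if not relay:
            continue  # native account, nothing to de-dupe
        key = (relay["network"].upper(), relay["handle"])
        groups[key].append(account["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up accounts: two bridges mirroring the same feed, plus one native user.
accounts = [
    {"id": "https://bridge-a.example/users/1",
     "relay-from": {"network": "RSS", "handle": "https://example.com/feed"}},
    {"id": "https://bridge-b.example/users/9",
     "relay-from": {"network": "rss", "handle": "https://example.com/feed"}},
    {"id": "https://social.example/users/carol"},
]
print(group_by_source(accounts))
```

The same key could drive blocking by original account or server, since every bridged copy of one source would share it.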

@Tamschi
Collaborator

Tamschi commented Nov 18, 2024

Bridgy Fed does do that already, at least for Bluesky users:

  • In the WebFinger response, the bsky.app profile URL (and apparently also any profile bio links?) are served in the "aliases" array.
  • In the Application (bridge actor) object, the at://did:… URI is served in the "alsoKnownAs" array.
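A consumer could in principle detect this with something like the Python sketch below. It assumes the WebFinger response and the actor object have already been fetched and parsed into dicts, and only reads the two arrays named above. The helper names and the sample identifiers are hypothetical.

```python
def known_identities(webfinger, actor):
    """Collect the identities a bridged account advertises.

    Reads the "aliases" array of a parsed WebFinger response and the
    "alsoKnownAs" property of a parsed actor object.
    """
    identities = set(webfinger.get("aliases") or [])
    also_known_as = actor.get("alsoKnownAs") or []
    if isinstance(also_known_as, str):
        also_known_as = [also_known_as]  # tolerate a single string value
    identities.update(also_known_as)
    return identities


def same_source(identities_a, identities_b):
    """Two accounts likely mirror the same user if any identity overlaps."""
    return bool(identities_a & identities_b)


# Made-up data shaped like the two responses described above.
webfinger = {"subject": "acct:alice.example@bridge.example",
             "aliases": ["https://bsky.app/profile/alice.example"]}
actor = {"type": "Application",
         "alsoKnownAs": ["at://did:plc:example"]}

bridged = known_identities(webfinger, actor)
other = {"at://did:plc:example"}  # identities advertised by another account
print(same_source(bridged, other))
```

Exact string comparison is the simplest possible match; a real implementation would probably need to normalize URLs and handles first.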

The main issue is that nothing else detects this right now… and it may not be the most feasible for bridges to handle this, compared to interacting native ActivityPub users' instances, but that may be a lack of imagination on my part.
