I might not understand the situation well enough, but why should an actor in one server be able to add resources to a collection on another server? Wouldn’t it be more polite (and respectful of data control) to offer the resource and then let the moderator of the other instance decide whether to add it to their collection or not?
Just to add a little context: we are trying to create a fully federated system where you can do any task regardless of the server your user resides on. We are trying to build a collaborative tool.
Your solution was the first thing we thought of; however, it suffers from the same problem.
Moderator A lives in Server A and Moderator B lives in Server B and the community lives in Server C. Each moderator can create an activity in their respective servers to accept two different resources. This generates two different activities which will be received by Server C creating a conflict (11 resources).
I think the problem is that the data and the logic are spread among different servers and none of them has the “full truth”, which makes this kind of business logic very difficult to implement.
Don’t store the collection locally in B and C as a source of truth. Treat it as a cache. The server A is the sole authority, so prefer storing an IRI of the collection instead on B and C. Dereference and cache as needed.
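To make the cache idea concrete, here is a minimal sketch. The `CollectionCache` class, the `fetch` callable and the 60-second freshness window are assumptions made for illustration; none of them come from ActivityPub, and a real `fetch` would dereference the IRI over HTTP.

```python
import time
from typing import Callable, Dict, Tuple


class CollectionCache:
    """Holds IRIs plus short-lived copies; the owning server stays authoritative."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # hypothetical: dereferences an IRI
        self._ttl = ttl_seconds    # assumed freshness window
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, iri: str) -> dict:
        now = self._clock()
        cached = self._entries.get(iri)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]       # still fresh, no round trip
        fresh = self._fetch(iri)   # ask the authority again
        self._entries[iri] = (now, fresh)
        return fresh

    def invalidate(self, iri: str) -> None:
        """Drop the copy, for example after an Add, Remove or Reject arrives."""
        self._entries.pop(iri, None)
```

Servers B and C would keep only the collection IRI in their own records and call `get` when they need the contents.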
Server A could reply with a Reject to one of B or C’s Activities (whichever one is violating the arbitrary constraint). The constraint shouldn’t be solely enforced client side (B or C) but should also be enforced at A. But note these arbitrary limitations could get out of sync or mismatched between servers and clients. This is where a Vocabulary extension that makes use of RDF constraints could help but won’t stop bad actors who disregard specs anyway.
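A rough sketch of the authority-side check, assuming the 10-resource limit used as the example in this thread. The class name, the `owner` field and the shape of the reply are illustrative assumptions, and the code ignores concurrency: a real server would need a lock or a database transaction around the check.

```python
from dataclasses import dataclass, field
from typing import List

MAX_RESOURCES = 10  # the limit used as the running example in this thread


@dataclass
class AuthoritativeCollection:
    iri: str
    owner: str  # actor on server A that answers on behalf of the collection
    items: List[str] = field(default_factory=list)

    def handle_add(self, activity: dict) -> dict:
        """Apply the Add if the constraint allows it, and wrap it in Accept or Reject."""
        is_add = activity.get("type") == "Add" and activity.get("target") == self.iri
        if is_add and len(self.items) < MAX_RESOURCES:
            self.items.append(activity["object"])
            verdict = "Accept"
        else:
            verdict = "Reject"
        return {
            "@context": "https://www.w3.org/ns/activitystreams",
            "type": verdict,
            "actor": self.owner,
            "object": activity,
        }
```

With nine items already stored, the first of two competing Adds gets an Accept and the second a Reject, whichever order they happen to arrive in.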
Yeah, I agree with you about treating the external information as a cache.
I don’t know if I understood the proposed solution. So when server A receives a message which violates a constraint, it should generate and federate a new Activity “Rejecting” the received activity. It could work… but I see two problems:
Eventual inconsistency: for a period of time many servers will have an invalid resource. The servers can send a notification to the users with the new resource, only to realize later that it was rejected.
Some servers can receive the events in reversed order: the Reject first and the rejected activity later. This can be tricky to handle.
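One possible way to cope with the reversed order is to remember rejected ids even when the rejected activity has not been seen yet. This is only a sketch: the `Inbox` class is hypothetical, and it skips checking that the Reject really comes from the authoritative server, which a real implementation must do.

```python
from typing import Dict, Set


class Inbox:
    """Tolerates a Reject arriving before the activity it rejects."""

    def __init__(self) -> None:
        self.visible: Dict[str, dict] = {}
        self.rejected_ids: Set[str] = set()

    def receive(self, activity: dict) -> None:
        if activity.get("type") == "Reject":
            target = activity["object"]
            target_id = target["id"] if isinstance(target, dict) else target
            self.rejected_ids.add(target_id)   # remember it even if not seen yet
            self.visible.pop(target_id, None)  # retract it if already shown
            return
        if activity["id"] in self.rejected_ids:
            return                             # late arrival of a rejected activity
        self.visible[activity["id"]] = activity
```

This handles the inbox but not the first problem: a notification that was already sent before the Reject arrived cannot be taken back this way.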
Yeah, I was thinking about synchronizing access to A. This sync would be completely out of the scope of ActivityPub and would have to be implemented specifically for this particular application. So server B makes a request to server A to create a resource, and it is Server A that generates and federates the activity. The first part isn’t ActivityPub; the second one is.
Also it’s not out of scope of ActivityPub: this use case is explicitly handled in the C2S (Social Protocol) part of the specification. So Server B is the client telling Server A to create a resource via ActivityPub C2S, and then Server A federates it with ActivityPub S2S.
The first point is valid for any cache-based system. It’s the cache invalidation problem. The specific behavior you mention is only a problem for an implementation that manages that problem in an undesired way, so build the solution to manage cache invalidation in a desired way or don’t cache the object at all.
The second criticism I think is because we aren’t aligned with how ActivityPub is behaving and which servers are federating which Activities to whom. In my mental model, B and C are ActivityPub clients using C2S to talk to A, so they can get individual Reject wrapped replies in response to their C2S messages. Server A just federates the successful Activities. Maybe it needs to be more complex than that, but it should be a good foundation.
I disagree here. The AP specs are clear in this aspect:
ActivityPub provides two layers:
A server to server federation protocol (so decentralized websites can share information)
A client to server protocol (so users, including real-world users, bots, and other automated processes, can communicate with ActivityPub using their accounts on servers, from a phone or desktop or web application or whatever)
The situation here is that user B is sending a new activity to server B; this is C2S. Later, if server B communicates with server A, that will be S2S.
It is not about the cache. It is about server Z receiving an activity because it has users who are recipients of this activity. Server Z adds this activity to the user’s inbox, but it also sends an email or a push notification to notify the user about this new activity. After 5 seconds it receives the “Reject” activity. Server Z can remove this activity from the user’s inbox, but it is difficult to handle the email or push notification situation.
I think seeing stale data (i.e. a resource is shown in the list because of the cache; it was a resource but is not anymore) is very different from seeing something that never happened (an invalid resource that was never accepted).
I can see your point and I think this can be part of the solution, but I’m afraid to break some part of the AP specs, specifically the security checks. AFAIK the ID of the activities and the objects created by an Actor should share the same server for security reasons. So, if the final activity has an ID from Server A, the object of the activity (the resource) has an ID from Server A, but the actor of the activity is from Server B and the rest of the fediverse accepts this activity as a valid one, it would be very easy to spoof any user activity with a malicious server.
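A small sketch of the kind of origin check described above, as the poster understands it; whether and how a given implementation enforces this is not confirmed here. The function names are made up, the check is only meant for a Create with an embedded object, and URL normalization (default ports, for example) is left out.

```python
from urllib.parse import urlsplit


def origin(iri: str) -> tuple:
    """Scheme, host and port of an IRI, used as a coarse 'same server' test."""
    parts = urlsplit(iri)
    return (parts.scheme, parts.hostname, parts.port)


def create_ids_share_origin(activity: dict) -> bool:
    """True when the activity id, the actor id and the embedded object id share one origin."""
    ids = [activity["id"], activity["actor"]]
    obj = activity.get("object")
    if isinstance(obj, dict) and "id" in obj:
        ids.append(obj["id"])
    return len({origin(i) for i in ids}) == 1
```

An activity whose id and object live on Server A while its actor lives on Server B fails this check, which is exactly the spoofing concern raised above.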
So my “best” option at the moment is:
User B sends the activity to Server B.
Server B makes some checks, for example that the user is valid on the server.
Server B knows the collection is from Server A, so it makes a custom request (custom protocol, I don’t know how you would do this part in AP) to create the resource. When a server sends an activity to another server, it is informing it that an activity happened, not asking for permission.
Server A receives the request and it makes the checks.
a) If everything is ok it creates the activity and the resource in the name of user B.
Server A returns the activity to the server B.
Server A federates the activity to the followers of the collection which receives the update.
Server B federates the activity to the followers of user B.
Server B returns the new generated activity to user B.
It could also happen:
b) Server A returns an error to Server B because of the 10 resources limit
Server B undoes any related operation in the database about this activity.
Server B returns the error to user B.
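The flow above, seen from Server B, might look roughly like this. Everything here is an assumption for illustration: `ask_server_a` stands for the custom, non-ActivityPub request, `federate_to_followers` for normal S2S delivery, and `pending` for whatever local records Server B writes before it hears back.

```python
from typing import Callable, Dict, Set


class LimitReached(Exception):
    """Raised by the hypothetical remote call when Server A refuses the resource."""


def create_remote_resource(
    user_id: str,
    resource: dict,
    known_users: Set[str],
    pending: Dict[str, dict],
    ask_server_a: Callable[[str, dict], dict],
    federate_to_followers: Callable[[dict], None],
) -> dict:
    if user_id not in known_users:                  # local checks on Server B
        return {"ok": False, "error": "unknown user"}
    pending[resource["name"]] = resource            # local bookkeeping before the remote call
    try:
        activity = ask_server_a(user_id, resource)  # custom request; Server A runs its own checks
    except (LimitReached, TimeoutError) as exc:
        pending.pop(resource["name"], None)         # case b: undo the local changes
        return {"ok": False, "error": str(exc)}
    pending.pop(resource["name"], None)
    federate_to_followers(activity)                 # case a: deliver to user B's followers
    return {"ok": True, "activity": activity}       # same-request answer for user B
```

The user gets either the new activity or the error in the same request, which is the property this option is after.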
I appreciate the time you’re using in this issue. I think it is a very interesting discussion. Thank you.
I need to solve a very similar problem for a project I have coming up, so I’m happy to bounce these ideas around. It’s been stuff I’ve been thinking on for a long time, since probably the fall of last year.
I’m not sure what you’re disagreeing with: The client to server protocol doesn’t forbid two random servers talking C2S to each other, if that’s the relationship they want to take on. It also doesn’t prevent two servers from talking both C2S and S2S to each other, if they have different kinds of relationships between different pairs between their actors. C2S and S2S are two tools that aren’t mutually exclusive.
I don’t think I am doing well communicating how this plays out, so I apologize in advance for the coming verbosity:
User B on server B is leveraging server B, which is acting as an ActivityPub C2S client, to talk to server A, which is initially acting as an ActivityPub C2S server. User C on server C is likewise analogous to the B’s. If server B and server C submit C2S-Activity B and C2S-Activity C that conflict at the same time, server A gets to be the authority that brokers which is the source of truth. The C2S spec grants server A full authority over processing the side effects of these Activities but has some recommendations of expected behaviors. So server A is within its rights to pick one (let’s say B) and not the other one (let’s say C).
I won’t address how server A informs server C of the failure now. I think it is solvable though.
Now that server A has accepted a proposed C2S-Activity-B from the actor represented by user B on server B, it can convert the C2S-Activity-B to an S2S-Activity-A and then federate it to whoever needs to receive it. Which could very well also include the actor represented by user B on server B!
Server A is simple, centralized, and authoritative and doesn’t federate rejected-C2S-Activity-C which could cause the race-condition push notification problems.
You bring up a great point, which is:
This I view as a separate but now-we’re-getting-somewhere kind of problem, the original one about separation of business logic and ownership is solved nicely, and we can tackle the next rough edge. So I am happy you’ve brought this up. I hope you don’t mind if I now digress from your original topic to this problem.
The root of the new problem is: how does one get server A to be the linked-data authority for an Activity, but the end-user authority is delegated from server B?
The key insight I’ve mused to myself is the bolding of the above Activity: it is Server A who has the actor that is federating an S2S-Activity-A which is owned by Server A. So it is server A’s actor federating server A’s Activity. The C2S has already solved half the problem: the linked-data authority being server A. So now the question becomes a familiar one to the fediverse: how to delegate authorization that server A is acting on behalf of server B?
I personally would not re-invent the wheel for this problem, and instead will attempt to leverage what better minds have created: OAuth 2.
The first time user B on server B does a C2S-Activity-B to server A, the user B can go through the OAuth delegation flow to grant server A a delegated token for user B. Once this is done, the following flow is then ready to take place (which is not standard ActivityPub but generally preserves both existing C2S and S2S functionality, and only inserts a new step in between; then again anything security is non-standard and cutting edge):
Server A gets a C2S-Activity-B from user B
Server A processes the C2S-Activity-B to create S2S-Activity-A, but does not yet federate it
Server A looks up delegation token for user B (if it wasn’t granted one, fail this process)
Server A contacts Server B at an arbitrary endpoint, providing the not-yet-federated S2S-Activity-A, signature of user A for it, user B, and the delegation token for user B
Server B can verify that (and should fail if any of these are false):
The signature for user A matches the S2S-Activity-A payload
The delegation token for user B hasn’t been revoked and is the correct one
Server B then signs S2S-Activity-A (omitting user A’s credentials) with user B’s credentials. It responds to Server A with this signed S2S-Activity-A.
Server A now holds an S2S-Activity-A whose linked-data authority is tied to an actor A on server A but whose end-user-identity is user B on server B without knowing user B’s private credentials!
Server A then federates this doubly-signed S2S-Activity-A, and others can verify both the linked-data authority/integrity (actor A on server A’s signature is valid) and the end-user delegation authorization (user B’s signature is valid).
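The co-signing handshake can be sketched as follows. Big caveat: HMAC with per-actor secrets stands in for real public-key signatures purely so the example runs on the standard library. With real asymmetric keys, verification would use each actor's published public key and nobody would share secrets. The `KEYS` table, the function names and the token store are all hypothetical.

```python
import hashlib
import hmac
import json

# Stand-in key material. Real deployments would use asymmetric key pairs.
KEYS = {
    "https://a.example/actor": b"secret-a",
    "https://b.example/users/b": b"secret-b",
}


def canonical(activity: dict) -> bytes:
    """Deterministic serialization so both servers sign the same bytes."""
    return json.dumps(activity, sort_keys=True, separators=(",", ":")).encode()


def sign(signer: str, activity: dict) -> str:
    return hmac.new(KEYS[signer], canonical(activity), hashlib.sha256).hexdigest()


def verify(signer: str, activity: dict, signature: str) -> bool:
    return hmac.compare_digest(sign(signer, activity), signature)


def server_b_cosign(activity: dict, actor_a: str, sig_a: str,
                    user_b: str, token: str, valid_tokens: dict) -> str:
    """Server B's checks before lending user B's signature."""
    if not verify(actor_a, activity, sig_a):
        raise PermissionError("signature for actor A does not match payload")
    if valid_tokens.get(user_b) != token:
        raise PermissionError("delegation token revoked or incorrect")
    return sign(user_b, activity)


def fully_verified(activity: dict, actor_a: str, sig_a: str,
                   user_b: str, sig_b: str) -> bool:
    """What a receiving server would check on the doubly-signed activity."""
    return verify(actor_a, activity, sig_a) and verify(user_b, activity, sig_b)
```

Server A would call its own `sign`, send the result to Server B's endpoint, and federate the activity only once it holds both signatures.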
This isn’t the simplest flow, and is the current idea I’ve been stuck on for a while, so I’m all ears on feedback. I’m sure there’s alternate and simpler flows using OAuth 2 (ex: I’m not familiar with OpenID so I could be reinventing the wheel), so someone else experienced in this can probably tear apart the above proposal.
Another problem is that neither LD-Sigs (too linked-data-y) nor OCAP-LD (these signatures aren’t for capabilities on the Activity being federated) feel like they perfectly nail this use case, though are in the right ballpark.
Also, for the general record: I’m beginning to believe that the core of every ActivityPub problem is a security problem.
I understood the whole process, it took me a while
I see your point about the server working as a client, I think it is possible but I’m not sure if it is a “good interpretation” of the specs. I also see the double signed activity to guarantee the security and to avoid spoofing attacks. It is really complex but it seems like it could work.
I’m not denying that all of this would be technically possible. However, my current thought is “Is it worth it?” I mean the “value / effort” equation. We are trying to follow a standard as closely as possible so other applications can interact with our users. We found some limitations in the protocol for our use cases, so following the same philosophy of the standard, we extend (in fact only you the protocol to admit new cases, but it isn’t the standard after all. Then, if a developer wanted to integrate “some” of our custom messages with their application, they would have to implement all the non-standard stuff we are doing (and this stuff is complex, in my opinion).
Right now, I’m checking how our app could integrate with Mastodon. Mastodon is much “simpler” than our app in business logic. I’m still investigating how to do it, but it seems like we have to change the behaviour of our app because to show a note we need a custom context. If we receive a “Note” without a context it won’t be shown in the app. If we receive a “Note” with a context whose type is not a “MoodleNet:Community” or “MoodleNet:Collection” it won’t be shown in the app. It also seems like we have to implement webfinger protocol to allow Mastodon users to follow MoodleNet actors.
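The display rule described above could be written roughly like this, assuming “context” means the ActivityStreams `context` property carrying an embedded object with a `type`. The function name is made up, and treating a bare-IRI context as “not displayable” is a simplification.

```python
DISPLAYABLE_CONTEXT_TYPES = {"MoodleNet:Community", "MoodleNet:Collection"}


def should_display(note: dict) -> bool:
    """Apply the rule from the post: a Note needs a Community or Collection context."""
    if note.get("type") != "Note":
        return False
    context = note.get("context")
    if not isinstance(context, dict):
        return False  # missing context, or a bare IRI that was not dereferenced
    types = context.get("type")
    if isinstance(types, str):
        types = [types]
    return any(t in DISPLAYABLE_CONTEXT_TYPES for t in types or [])
```

A plain Note arriving from Mastodon carries no such context, so under this rule it would never be shown.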
So my point is, even if Mastodon and MoodleNet speak the same protocol, we find some difficulties integrating with each other, and “Note” is the simplest entity I can imagine! So if we are finding difficulties connecting two apps with similar and simple functionality (about conversation, I mean), I cannot imagine integrating complex stuff with custom and non-standard parts. I feel like it is very unlikely to happen.
Of course, I’m talking about from a “Product” perspective. Anyone is free to experiment, to learn and to create new things that can become new “standard”.
When I originally posted my question, I was expecting to find an easy and fast solution that maybe I missed, but it seems like the only solution is adding more complexity to the project.
Yep, I don’t think there’s going to be an easy, clean win. You’re right about the federation interoperability problem.
Strategically, the good news is that if you can solve it for the general use case, it’s a compelling feature for existing software to adopt because it will enable interoperability with a whole new class of applications (beginning with both yours and mine) for the entire Fediverse. If you instead look for a quick solution for your specific use case, targeting one particular piece of software for interoperability, then the changes you ask of that software as your feature requirements keep growing may not be compelling enough for its developers to implement just to maintain the compatibility you’re targeting.
I’ve been thinking on this more and I think the best and simplest route is to do the typical S2S federation of events (Add in this case) and design the implementation to handle the asynchronous update of an external resource. That means not designing in the failing cases, such as too-optimistic push notifications, and avoiding local caching entirely, then revisiting these design choices later if they wind up being problematic.
I think that’s the path I may follow, as it requires no new functionality.
Server B cannot modify the Activity created by A: no metadata or other contextual information available to B can be used to create the Activity A due to the signing
This cannot be mitigated, at all
I think your intention w/ 3 requires synchronous processing (or is async reply B->A OK?)
Discovering how the id needs to be generated in B is a problem
Without discovery, opens a new dimension for interoperability incompatibility
Without discovery, others wanting to support this feature must interop-by-convention versus interop-by-general-spec
Is not a clean addition to either AP S2S or C2S
Could easily be mistaken as an extension of S2S, but receiving Activity A via the normal “inbox” versus this method does NOT mean they should be processed by the same business logic.
Is not AP’s C2S protocol but could also easily be mistaken as such, since the id needs to be preserved but it doesn’t belong to the originating server.
This adds a “third concept” to side effect logic, in addition to the existing C2S and S2S ones.
The other approach I outlined:
Server B can modify the Activity A, adding contextual information and metadata
A is still given the opportunity to introspect before signing to make sure B is not proposing to leak data (ex: applying a whitelist of properties)
Can either be synchronously processed or asynchronously processed by B
B can still tell A it’s “done” via S2S
Leverages the existing S2S specification as-is
Only puts new side-effects into the C2S concept without introducing a “third concept”
Is a clean superset addition to the C2S protocol
Due to using C2S, A doesn’t care about B’s id generation at all
Client must have a generic key-signing endpoint
Can be reused for future innovations and is not specifically tied to this concept
The superset is adding an intermediary step of getting the final activity signed before doing outgoing federation delivery
Side effect: Also a giant step in the direction of having C2S client keys reside on the client (currently, ActivityPub requires C2S client keys to be stored on the server)
Not sure how big this is, but a pretty cool property to have.
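The introspection step mentioned in the list above (A checking what B changed before signing) might be sketched like this. The allowlist contents and the function name are invented for illustration; which properties A would actually permit is an open design question.

```python
ALLOWED_ADDITIONS = {"context", "tag", "summary"}  # hypothetical allowlist chosen by server A

_MISSING = object()  # sentinel: property absent from the original activity


def rejected_changes(original: dict, proposed: dict) -> set:
    """Properties that B added, changed or removed outside A's allowlist."""
    changed = {k for k in proposed if proposed[k] != original.get(k, _MISSING)}
    removed = {k for k in original if k not in proposed}
    return (changed - ALLOWED_ADDITIONS) | removed
```

Server A would sign only when the returned set is empty, and could otherwise answer with the offending property names.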
With the downsides:
Requires two roundtrips (A->B C2S, B->A->B signing, B->A S2S)
Is more complex to explain
But is still pretty straightforward to implement: an intermediary step to C2S.
ActivityPub C2S is modified to require the client supporting a key-signing endpoint, which eliminates certain kinds of clients.
Non-problem if one only cares about other clients being servers and not browsers.
Non-problem if the server supports backwards compatibility (as this solution permits)
Discovering the key-signing endpoint is a problem
However, mitigated by leveraging prior solution with the endpoints property on actors.
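Discovery through the actor's `endpoints` property could look like the sketch below. The key name `signingEndpoint` is a made-up placeholder for whatever an extension would define; `sharedInbox` is shown only as an example of an entry that already lives in `endpoints`.

```python
from typing import Optional

SIGNING_ENDPOINT_KEY = "signingEndpoint"  # hypothetical extension name


def find_signing_endpoint(actor: dict) -> Optional[str]:
    """Return the advertised signing endpoint, or None if the actor has none."""
    endpoints = actor.get("endpoints")
    if isinstance(endpoints, dict):
        return endpoints.get(SIGNING_ENDPOINT_KEY)
    return None  # absent, or given as a link that was not dereferenced


example_actor = {
    "id": "https://b.example/users/b",
    "type": "Person",
    "endpoints": {
        "sharedInbox": "https://b.example/inbox",
        SIGNING_ENDPOINT_KEY: "https://b.example/sign",
    },
}
```

A `None` result is where the backwards-compatible path would kick in: the server falls back to plain C2S handling without the extra signing step.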
I tend to value:
Solutions that promise adoption
Solutions that build on other solutions
Solutions that permit backwards compatibility
Solutions that are rigorously defined (I feel this could be hammered out in a several-page doc as an extension to the C2S spec)
Solutions that have future promise (C2S keys living client side instead of server side? Maybe it’ll be a big deal)
Let machines do the work
2 roundtrips for me is not a problem versus 1: I’m not trying to optimize for network congestion
From my value system and my biased perspective, I’m in favor of what I outlined earlier.
To address some of your comparisons:
I view it the opposite: It is dangerous to mix existing normal-federated-S2S-side-effect Activities coming from Mastodon, Pleroma, PixelFed, etc into the flow you proposed. I view your proposal as creating a third category of side effects, in addition to the AP-defined “C2S” and “S2S” side effects.
I am also contrarian. I don’t view any of these as complicated to implement: C2S when S2S is already done, adding a signing endpoint, and calling a signing endpoint.
I am not sure how
In my opinion this is an insidious bug waiting to happen. The presence of data at a URI on server B is not equivalent to server B having fully processed the data. Another implementation can come along and optimistically host the data in advance of its async queue finishing processing and delivering (via S2S) the activity. It is better to wait for the S2S delivery (in either solution) and assume it hasn’t occurred until then. (Note: ActivityPub C2S has an idempotency problem in general, neither solution here is solving it)
I’m not sure what “being atomic” is solving, honestly. [As a tangent, most implementations are asynchronously processing side effects or, failing that, are merely asynchronously processing S2S delivery instead of failing a whole request due to an unreachable peer]. Feedback on a personal level: things like this have been giving me the vibe that you are very much fixated on a notification problem and are constantly adding an implicit requirement of “I need to be guaranteed that this happened (maybe synchronously)”. I addressed this above and I’ll repeat it here: simply have an outgoing notification work just like the incoming notifications: leverage the incoming S2S activities to mark completion. Maybe it requires some additional application logic to do the bookkeeping properly since the user experience is a full-page spinner instead of a notification dot, but forcing these kinds of details onto the Fediverse is not going to go well. Other devs that want to interop will look at that detail as a restriction, not a freedom, and that will harm the chances of adoption. And I fear that hard work will be just for one small corner of the Fediverse. I intend this in good faith; it’s coming from a guy whose AP implementation is scantly used compared to the big players.
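The bookkeeping suggested above, where an incoming S2S activity marks a pending operation as complete, might look something like this. How a request is matched to the activity that comes back is an assumption here: the sketch uses actor, target and the object's `url`, and a real application may need something different.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]


def correlation_key(activity: dict) -> Optional[Key]:
    """Hypothetical way to match a request with the activity federated back later."""
    try:
        return (activity["actor"], activity["target"], activity["object"]["url"])
    except (KeyError, TypeError):
        return None  # not an activity this bookkeeping cares about


class PendingOperations:
    def __init__(self) -> None:
        self.status: Dict[Key, str] = {}

    def submitted(self, activity: dict) -> None:
        key = correlation_key(activity)
        if key is not None:
            self.status[key] = "pending"

    def on_inbox(self, activity: dict) -> None:
        """Called for every activity delivered via S2S."""
        if activity.get("type") == "Reject":
            inner, outcome = activity.get("object"), "rejected"
        else:
            inner, outcome = activity, "done"
        key = correlation_key(inner) if isinstance(inner, dict) else None
        if key is not None and self.status.get(key) == "pending":
            self.status[key] = outcome
```

The UI can then poll or subscribe to `status` and swap the spinner for a result once the entry leaves the pending state.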
My last proposal focused on the “user experience” and the simplification of the process. I didn’t explain it in much detail, but I’ll try to do it now.
You’re right. When a client wants to create a new resource in an external server, I want to give her a response, even if it is an error, in the same request. I think this is basic for a good user experience. With only one interaction between servers it is much easier to provide this functionality.
User B does the request to server B, in the same process the server B makes the request to server A. If there is a timeout or an error, it just returns the error to user B. If the response is the signed activity, we can ensure the process will be finished.
However, in your flow AFAIK, server B should wait for a new request from server A. This means the request of server A and the request of user B will be in different processes. Communication between different processes is always harder than within a single process.
I don’t know if I’m giving too much importance to the complexity of your workflow, or I didn’t understand correctly, or maybe you didn’t think about all the implications:
Should each server store the temporal data between the requests?
What would happen if there is a timeout?
How to handle the specific business logic? Should the server A reserve the resource slot in the first request or should it wait for the second request?
What should the response be if the server receives a new activity with fields the server doesn’t want?
Not having to deal with these kinds of issues is what I mean when I say:
Easier to recover
Simpler to implement
Because you don’t have to deal with intermediate states, with more than one request flow, with more timeouts, etc.
Anyway, I think it is a lot of work in our project–any of the two solutions–for so little reward:
I want you to deliver a good user experience too. I agree that one request and response to the user’s client is easy and has all the benefits you describe.
On the other hand, I think applying a specific application’s client-to-server REST API to a generic server-to-server protocol (an evolution of a federation protocol) is a mistake. Example done right: when a user clicks “Follow” in Pleroma, the client browser app doesn’t hang and wait for a confirm/deny from the server in order to do it all in one synchronous round trip: it progresses whenever the server asynchronously receives an “Accept”. It could be a synchronous UI-to-server call with an asynchronous backend, but for that use case it wouldn’t make sense to stare at a UI spinner. This probably doesn’t apply to your use case though. I won’t claim asynchronous processing is easy but lots of languages, frameworks, and apps make it tractable. I also don’t know the state of your specific application so I don’t have any concrete suggestions.
I think we’ve brainstormed your original generic question to come up with general purpose answers. If none of them work for how you want your specific application to function because of speed-of-development, incompatibility with existing implementation, and a synchronous requirement, that’s fine. I can’t argue against those.
I will be trying some of our general ideas out in the future, and I hope others will adopt them.
Edit: Reflecting more, I guess I’m of the opinion it is actually harder and more error prone to do things synchronously in a dynamic, asynchronous, federating world.