ForgeFed - Project hosting federation


#1

Hi people! I’d like to share with you the work I’ve been doing related to project hosting. I haven’t written about it anywhere yet, so I guess this is also sort of an announcement. And you’re welcome to spread the word :slight_smile: Since there’s no ForgeFed category at the moment, I’m posting in the ActivityPub category.

Some history

A long time ago, the software freedom movement started. Then more ideas followed:

  • “Free software, free tools” - this is the idea that you don’t just make free software, you also use free software in that process. Many, many years ago this was difficult, but we’ve had free software development tools for a very long time. Sadly, some popular tools are still proprietary and limited, such as GitHub.
  • Software freedom when using software over the network - this is the idea that if you use some software as a service that runs on another computer, you still depend on that software, and your freedoms with respect to it still matter. The AGPL license was created for this: it lets you write and publish free software service applications while requiring that derived works are not only free software but also offer their source to the people who use them over the network, so that even then, users are using free software. The AGPL seems common in the free-sharing part of the community, but companies and profit-first entities have been slow to adopt this idea. For example, GitLab CE is free software under a “permissive” license, and there’s the proprietary GitLab EE. The popular gitlab.com server runs the latter, proprietary version.
  • Decentralization: Instead of one entity running a service, there may be multiple instances of it, so that users don’t depend on a single provider, and the different instances may communicate with each other. This is something we barely have. GitLab, githu8, etc. do allow logging in with accounts on other services, but you still get a separate user account, and there’s no transparent remote collaboration support or anything close to it.
  • Federation: Different servers, possibly running different software, communicate with each other, forming a single network where users on different instances can transparently communicate and collaborate. We have this for some social networking and blogging applications, in the network we call the Fediverse, where servers communicate using OStatus and ActivityPub. There’s also plain old email, in which people send and receive messages across servers. But we don’t have project hosting support in the Fediverse.

A few years ago, I started writing a project hosting platform called Vervis, and I always wanted it to federate! In 2017 there was a series of discussions in the Peers Community about possibly replacing NotABug’s Gogs instance with a new federated forge system. That resulted in the NotABug-2.0 design document, which outlines the motivation for and the essential features of such a system, without specifying any implementation details.

ForgeFed is a protocol and specification in development, for federating project and version control repository hosting. It started as a repository on githu8 a few days after githu8 announced its sale. It gained traction very quickly, and a work group formed soon afterward. To be maximally accessible, a mailing list was started for discussions, followed by a sister repository at NotABug.org (which runs the free software Gogs). ForgeFed was chosen as the project name, CC0 as the license of the work, and ActivityPub as the base protocol for federation. (Note: unfortunately the last vote on licensing was unclear, and even more unfortunately, the githu8 repo was given a no-derivatives license before anyone voted. I think we need another vote to make a final decision; either way, all my own work has been CC0, and I must say I prefer consensus-based decisions over majority-wins votes.)

During the last few months I’ve been researching and implementing federation in Vervis, based on ActivityPub and related technologies. And things are going really well!

Since all that discussion, what’s been happening with ForgeFed? I’m here to tell you :slight_smile:

Current status

  • Vervis has a somewhat confusing and not-exactly-shiny UI, but it’s been self-hosting its own code and issues etc. and I’m implementing federation on it to build the ForgeFed protocol
  • It’s based on how ActivityPub is used on the Fediverse, with some changes and extras
  • Soon I’ll start writing a specification of how federation works
  • Forming a standard vocabulary will require collaboration among the existing forges, such as GitLab, Gitea and so on, and I suppose also the proprietary ones like githu8 if they decide to participate
  • I hope my specification draft and my implementation in Vervis will attract interest and collaboration from forges, developers and communities, both established and new, so that we can refine the draft and the concepts into a polished protocol and get it implemented in forges and turn all the many many lonely islands into a single big federated network.

How to participate

Soon there will be 2-3 Vervis instances running my federation demo, and I’ll announce it and invite everyone to test it.

Vervis doesn’t have a good UI, and there’s no client program of any kind. Any help with that would be highly appreciated, but there’s something much more important: Motivated people willing to work with me and the community on implementing ForgeFed in other forges! Such as GitLab, Gitea, Gogs, Pagure and so on.

Places for discussion:

  • The Vervis ticket tracker, here
  • The ForgeFed development repo at NotABug.org, here
  • The SocialHub forum, here
  • Freenode IRC channels #peers and #vervis
  • All the usual federation related IRC channels on Freenode (such as #feneas) and on W3C IRC (such as #social)
  • The Matrix rooms corresponding to these IRC channels :slight_smile:

There’s also the mailing list, but I suggest we use SocialHub instead and keep the mailing list just for the archive.

Technically there’s also the repo on githu8, but I’d like to recommend and ask you not to use it (githu8 is proprietary and centralized, among other reasons, both technical and political I’m afraid), and switch all new work and discussion to the resources I linked above.

Important clarification

My work isn’t anything officially voted or chosen, and I don’t represent anyone except for myself. But I’ve been working on project federation and I hope we can all work together in the community and make it happen!!! :slight_smile:


#2

someone opened an issue on github earlier today asking about progress - maybe i could just point them to this post


#3

really excited to see progress on this! I was just posting about the lack of activity in the mailing list on mastodon the other day.


#4

as for the mailing list, the discussions really did run their course there - there were few stones left unturned other than that pesky licensing issue - it was pretty clear even years ago, what the feature set would need to be

mostly what needed to happen next was to get some reference implementation ready in order to see what is entailed by passing AP messages back and forth, then to codify the vocabulary spec - fr33domlover is getting confident that vervis is nearly ready to be the needed reference implementation, and wanted to plant some early seeds, in order to collect a bit of wider attention than what initially came from github, especially in the “fediverse” zone

the only point of real urgency being that the UI of vervis is still very unpolished in appearance, and we would not want that to dissuade anyone from the interesting technical nature of the initial demo; hence the embedded request for designers to get involved now, before the initial demo


#6

Whoa, very cool! How will the protocol account for Git’s inherently federated nature? I guess what I mean is, will git changes pushed to one node get propagated to the repositories on other nodes, and if so how will the protocol handle merge conflicts?


#7

@jdormit, excellent question!!!

There are 2 different aspects here:

  1. Federation of resource manipulation and access
  2. Decentralization/distribution of content storage

These 2 things are related, but basically the approaches for them are independent. Right now I’m focusing on the 1st part. The 2nd part is about how to store content: Locally on the server’s DB? Or in some distributed hash table (DHT)? Blockchain? IPFS? I2P? Dat? Torrent? There is room for research but for now I’m just assuming each server stores its repos and issues and MRs etc. locally on the filesystem and database, the regular common way.

So if you create a repo, it lives on one server: Your home server on which you created it. You can add team members to your repo and give them push access, including people from other instances. When those people git push to your repo on your home instance, your server determines whether they have push permission by securely exchanging information (such as the SSH public key, team member role permissions etc.) as necessary with the instance of the person pushing.
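To make the push check a bit more concrete, here is a minimal Python sketch under stated assumptions: the role names, `ROLE_PERMISSIONS`, `Repo`, `fetch_remote_ssh_keys` and `may_push` are all hypothetical, and the remote lookup is simulated with a dictionary instead of a real, authenticated request to the pusher’s home instance. It only shows the order of the checks: team role first, then whether the presented SSH key is one the pusher’s own instance vouches for.

```python
from dataclasses import dataclass, field

# Hypothetical roles and the permissions they grant (not defined by ForgeFed).
ROLE_PERMISSIONS = {
    "maintainer": {"push", "admin"},
    "developer": {"push"},
    "reporter": set(),
}

@dataclass
class Repo:
    # actor URI (possibly on another instance) -> role in this repo's team
    members: dict = field(default_factory=dict)

def fetch_remote_ssh_keys(actor_uri, remote_instances):
    """Stand-in for asking the pusher's home instance which SSH public keys
    belong to that actor. A real server would make an authenticated request;
    here `remote_instances` is a plain dict that simulates the network."""
    return remote_instances.get(actor_uri, set())

def may_push(repo, actor_uri, presented_key, remote_instances):
    role = repo.members.get(actor_uri)
    if role is None or "push" not in ROLE_PERMISSIONS.get(role, set()):
        return False  # not a team member, or a role without push access
    # Only accept a key that the actor's own instance vouches for.
    return presented_key in fetch_remote_ssh_keys(actor_uri, remote_instances)
```

A design note: keeping the key lookup on the pusher’s home instance means the repo’s server never has to manage accounts or keys for remote users itself.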

When I publish the initial spec draft I’ll explain there exactly how my initial suggestion and implementation for this works :slight_smile:


#8

Hey fr33domlover,

Thanks for the update! This is exciting. I would be interested in the detailed specification you come up with for ForgeFed:

  1. The types and properties in the ForgeFed ActivityStreams Vocabulary extension
  2. Any side effect behaviors for the ActivityPub Federation API (S2S) and Social API (C2S)

I am especially interested in the first one because it would allow me to create an OWL definition of the ForgeFed ActivityStreams extension (like the ActivityStreams one here or the example fake extension here). This definition should be sufficient to be the JSON-LD object returned for the ForgeFed @context IRI. It would also allow the go-fed project to autogenerate the ForgeFed types, which would allow Go projects like Gogs and Gitea to readily adopt ActivityPub via go-fed if they so choose.

Looking forward to the draft!


#9

Hi @_cj!

The vocabulary keeps changing, and it’s too early to write it (I just add terms in my code whenever I need them), but I’ve been having the following idea. Let me know what you think.

I feel like JSON-LD is useful in software written in JS but very painful for almost everyone else. I’ve been considering not using any JSON-LD context features in ForgeFed; if anyone wants compacted JSON-LD, they can do the compaction themselves.

When I say not use context features I mean:

  • Properties are full URIs, not terms or compact IRIs
  • No language maps, no index maps, no reverse properties
  • Perhaps also no usage of AS2 terms anywhere that is not required by the AP spec, use full URIs whenever possible

I have written a JSON-LD library in Haskell. It’s not finished, but it’s been a horrible experience, and I really, really wish to steer us away from JSON-LD as much as possible.

This is just a thought, not a decision. What do you think? :slight_smile:

By the way, parsing and writing JSON ActivityPub objects, even with full URIs and so on, is a trivial task in Haskell and I imagine in most programming languages. I don’t think anyone should have to deal with JSON-LD if it’s just pain. JS developers who want JSON-LD can use the JS implementation and everyone else can just use plain JSON, or RDF triples.
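To illustrate that plain JSON is enough when every property is a full URI, here is a minimal Python sketch using only the standard `json` module. The `https://forgefed.example/ns#` namespace, the `Ticket` and `isResolved` terms, and `parse_ticket` are hypothetical placeholders, since the vocabulary had not been written yet; `id` and `type` stay as the usual short AS2 terms.

```python
import json

# Namespace IRIs used directly as property keys.
AS = "https://www.w3.org/ns/activitystreams#"
FF = "https://forgefed.example/ns#"  # hypothetical, not a real namespace

RAW = """{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://a.example/tickets/7",
  "type": "https://forgefed.example/ns#Ticket",
  "https://www.w3.org/ns/activitystreams#name": "Crash on push",
  "https://forgefed.example/ns#isResolved": false
}"""

def parse_ticket(text):
    """Read a ticket with plain JSON: no context processing, no compaction."""
    obj = json.loads(text)
    if obj.get("type") != FF + "Ticket":
        raise ValueError("not a ticket: %r" % obj.get("type"))
    return {
        "id": obj["id"],
        "title": obj.get(AS + "name", ""),
        "resolved": bool(obj.get(FF + "isResolved", False)),
    }
```

The lookups are ordinary dictionary accesses with string keys, which is the same amount of work in any language with a JSON parser.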


#10

@fr33domlover Thanks for all your work on this. Is this your current ForgeFed implementation?

I’m getting ahead of where things are currently, but it might be nice to have a test suite like AP has for implementers. https://test.activitypub.rocks/


#11

@fr33domlover I agree with you 100%. I think you’re misunderstanding my proposal. I’m not advocating for bringing JSON-LD to life and forcing folks to do the linked data paradigm.

I’m simply suggesting something like:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "ff": "https://example.com/forgefed#"
    }
  ],
  "id": "https://example.com/some/id/123",
  "type": "ff:ForgeFedType",
  "ff:ForgeFedProperty": 42
}

Here the ForgeFed base URI is aliased to a short prefix in @context, as a convention. This:

  • Keeps properties and types human-readable without digesting long IRIs
  • Avoids name collisions with terms in the original AS spec
  • Allows other impls to ignore JSON-LD and pattern-match on the well-known and legal <vocab alias>:<vocab ontology member> form, without colliding with types/properties in other specs
  • Remains JSON-LD compatible in spirit, but doesn’t require others to do any JSON-LD things
  • Can still host the ontology definition at https://example.com/forgefed

For that final bullet point, I am merely suggesting that the hosted definition be an OWL ontology, so I can rapidly ingest the spec as data and let impls do the go-fed non-linked-data thing more quickly. That’s it; I’m not suggesting any additional linked data stuff. As someone who is now on their third code generator around this linked data stuff, I am fully aware of the pain of dealing with linked data without being a linked data solution.
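As a rough illustration of the pattern-matching point, this Python sketch expands `prefix:term` keys and `type` values using only the prefix entries found in `@context`, without running a JSON-LD processor. The function name `expand_prefixed` is hypothetical, and it deliberately ignores everything else JSON-LD allows (nested contexts, expanded term definitions, remote context documents). It also assumes the prefix IRI ends in `#` or `/`, since the term name is simply appended to it.

```python
def expand_prefixed(obj):
    """Expand "prefix:term" keys and type values using the simple prefix
    entries of @context. This is not a JSON-LD processor."""
    prefixes = {}
    ctx = obj.get("@context", [])
    if not isinstance(ctx, list):
        ctx = [ctx]
    for entry in ctx:
        if isinstance(entry, dict):  # skip context URLs such as the AS2 one
            for name, iri in entry.items():
                if isinstance(iri, str):
                    prefixes[name] = iri

    def expand(term):
        prefix, sep, rest = term.partition(":")
        # "https://..." is already a full IRI, so never treat it as a prefix.
        if sep and prefix in prefixes and not rest.startswith("//"):
            return prefixes[prefix] + rest
        return term

    out = {}
    for key, value in obj.items():
        if key == "@context":
            continue
        if key == "type":
            value = expand(value) if isinstance(value, str) else [expand(v) for v in value]
        out[expand(key)] = value  # unprefixed keys like "id" pass through
    return out
```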


#12

Hey @jmwright,

That’s just one of the source files :slight_smile: There’s no clear documentation of the protocol in 1 place yet. I’ll start writing one soon! If you look at the log of that repo though, you can see where I’ve been making changes.

A test suite is a wonderful idea! It’s indeed early, but I’d love to have such a test suite in place when the time comes :slight_smile:

@_cj, oh I see! Yeah, a minimal context item with a prefix to use with all ForgeFed terms would be a really nice solution. When I make a list of properties and classes etc. I’ll announce and we can create an OWL ontology from it :slight_smile:


#13

@fr33domlover The ForgeFed protocol will be an extension of ActivityPub/ActivityStreams, correct? So if someone starts with an AP implementation, the ForgeFed parts can be added later?


#14

@jmwright Yes, indeed! It’s an extension. Currently, some of my changes are compatibility-breaking, and I’d like to suggest that these changes are discussed on the fediverse and perhaps adopted by the various fediverse server applications (most notably Mastodon). But otherwise, yeah it’s an extension and you can safely start with plain AP and later add ForgeFed support.

@_cj I forgot to refer to the side effects point! There are going to be some new mechanisms not present in AP itself, such as possibly adding a new activity type, or otherwise a new kind of behavior that doesn’t work the regular way you self-publish a Note on the Fediverse. The new behaviors are quite simple, I just mean to say that they do exist. For example, whenever you wish to do some editing of resources on another server, for which you’ve been granted explicit permission, your instance will have to keep a capability grant for you and send it along with your activities whenever needed. Also some activities will be instructions for posting content to another server, where an actor on that other server becomes the “owner” of the content. It sounds like a Create, but behaves more like when you request to follow someone and need to wait for them to Accept. Idk yet if this will be a new activity type, but either way it’s new behavior. I’ll document the precise behavior soon :slight_smile:
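A minimal sketch of the two behaviors described above, under loud assumptions: the post leaves open whether this becomes a new activity type, so using the existing ActivityStreams `Offer` type, the `capability` property name, and `make_offer`/`ReceivingActor` are all hypothetical choices for illustration. The sender attaches a capability it was granted earlier; the receiving actor checks it, stores the content as its own, and answers with `Accept` (or `Reject`), much like a follow request.

```python
import uuid

def make_offer(actor, target, obj, capability=None):
    """Build an activity asking `target` to take ownership of `obj`.
    "Offer" and the "capability" property are assumptions of this sketch."""
    activity = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"{actor}/activities/{uuid.uuid4()}",
        "type": "Offer",
        "actor": actor,
        "target": target,
        "object": obj,
    }
    if capability is not None:
        activity["capability"] = capability
    return activity

class ReceivingActor:
    """Hypothetical remote actor (e.g. a project) that owns offered content."""

    def __init__(self, uri, granted):
        self.uri = uri
        self.granted = granted  # capability URI -> actor it was granted to
        self.owned = []

    def handle(self, activity):
        cap = activity.get("capability")
        if cap is None or self.granted.get(cap) != activity["actor"]:
            return {"type": "Reject", "actor": self.uri, "object": activity["id"]}
        # The receiver, not the sender, creates and owns the stored copy.
        stored_id = f"{self.uri}/items/{len(self.owned) + 1}"
        self.owned.append({**activity["object"], "id": stored_id,
                           "attributedTo": activity["actor"]})
        return {"type": "Accept", "actor": self.uri,
                "object": activity["id"], "result": stored_id}
```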


#15

Looking forward to following this work as it matures :v: Thanks for pushing forwards.

Personally, I hope the spec will focus purely on the transport side, just like ActivityPub proper does. Storage etc. is really more of an implementation concern for the platform implementing federation. ActivityPub just defines how servers interact with each other (server-to-server) and how clients interact with servers (client-to-server), but not what the server or client looks like internally.


#16

@fr33domlover Will ForgeFed support the sharing of projects/repos across instances, the same way PeerTube allows one instance to display the videos of other instances it’s federated with? As a user I might want to be able to browse all of the projects local to that instance, but I might also want to browse all the projects available on all federated instances too.


#17

@jmwright, before I directly answer, I’d like to observe the various reasons that content is duplicated across servers and across clients. There are at least the following reasons:

  1. Content storage distribution: Instead of storing some data in one place, and depending on that place to always be available to serve the data, we store the same data in multiple places, and that way, it remains available even if some servers go down.
  2. Content network access stability: If we store something on one server and it suddenly gets very popular, that server may fail to handle the load of requests. So we distribute the access mechanism, letting content be shared and downloaded from multiple peer hosts in parallel, so that when many people download, many people also upload, and as a network we can handle high loads by sharing the work of delivering the content.
  3. Caching: To save time and network delivery, when we download something from another server, we may store it locally on our database, and next time use our copy instead of having to contact that remote server every single time. If the content we’re accessing changes rarely, or never changes, then we save time and reduce network load (at the cost of using more DB hard drive space) by caching remote content, holding local copies of it.
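Reason 3 is the easiest to sketch. The following Python class keeps local copies of remote objects and refetches them only after a time-to-live expires; the `RemoteCache` name, the TTL policy, and the injected `fetch` function are assumptions, not anything ForgeFed specifies.

```python
import time

class RemoteCache:
    """Keep local copies of remote objects for a limited time."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self.fetch = fetch        # function: uri -> object (the network call)
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so tests can control time
        self.entries = {}         # uri -> (stored_at, object)

    def get(self, uri):
        now = self.clock()
        hit = self.entries.get(uri)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]         # fresh local copy: no network request
        obj = self.fetch(uri)
        self.entries[uri] = (now, obj)
        return obj
```

Content that never changes, such as a Git object addressed by its hash, could use `ttl_seconds=float("inf")`, trading disk space for never contacting the remote server again.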

About which of these 3 uses (or maybe some other use I haven’t thought of) are you asking?

I’ll answer about all 3 :slight_smile:

  1. At least for now, my focus is only on the transmission of events across servers. In other words, I’m generally assuming each repo has a single HTTP(S) URI on one server, and that’s where everyone goes to clone and push to it. Duplicating content across servers for resilience is interesting to me and I hope to explore it, but the most basic ForgeFed protocol doesn’t touch this aspect yet.
  2. Same thing, for now I’m assuming no content duplication mechanisms.
  3. In Mastodon and other microblogging servers and fediverse projects, you can see a federated timeline, containing all the messages the server can see. I suppose in ForgeFed servers, you can see the same thing: A stream of all the events the server sees, such as tickets and merge requests etc. opened and closed and updated, including some that occur on other servers.
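For point 3, a federated timeline is essentially a merge of local and remote events ordered by time. A minimal Python sketch, assuming a hypothetical `Event` record whose `published` field is a uniform UTC ISO 8601 string (so plain string sorting matches chronological order); `timeline` is likewise a hypothetical helper:

```python
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass(frozen=True)
class Event:
    published: str  # UTC ISO 8601, e.g. "2019-03-01T10:00:00Z"
    actor: str      # URI of the actor who did something
    summary: str    # e.g. "opened ticket #3"

def timeline(events, local_host=None):
    """Newest first. With local_host set, keep only that instance's events."""
    selected = [
        e for e in events
        if local_host is None or urlparse(e.actor).hostname == local_host
    ]
    return sorted(selected, key=lambda e: e.published, reverse=True)
```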

When using a client program, whether it’s in a web browser or a native GUI or a terminal-based client, you can browse content stored on your server, and you can browse content stored on other servers. Where exactly physically that content comes from - your home instance or other instances - it doesn’t matter to your user experience and user interface. It’s an optimization that clients and servers do. For example maybe your client program downloads some video by HTTP, or maybe it downloads it using torrents, or WebTorrent, or maybe it gets it from IPFS or I2P or Dat or TahoeLAFS or idk what else. The point is, you can still browse everything, it’s just a question of how things work behind the scenes.

So, for now, things will be duplicated only for caching, just like it generally happens on the Fediverse. Stuff like what Peertube does, i.e. use distributed content sharing mechanisms, is interesting to do with repos and projects, but it’s a future research thing that I’m not exploring yet. I think @cwebber is working on it, actually, exploring distributed storage for the fediverse, and I’m curious what he comes up with and happy to adopt his ideas and use them for protecting repos and projects against the problems that may arise with single-location storage.

Did I answer your question? :slight_smile:


#18

I think so, thanks.

So my instance would pull repo/project listings from another instance’s VCS server rather than getting that information through ForgeFed activities/objects (whatever the correct terminology is)? If that’s correct, I think I understand. ForgeFed would be for things like issues, comments, merge requests, etc instead of the repo listings themselves, unless maybe a user you were following created a new repo. Then you might get a notification for that.


#19

@jmwright, ForgeFed is for all aspects of project and repo collaboration :slight_smile: What do you mean by repo/project listings?


#20

@fr33domlover Let's say I have two servers (“instances” in Mastodon terms), Server A and Server B, and each has their own local projects.

Server A

  • Project 1
  • Project 2

Server B

  • Project 3
  • Project 4

If the two servers are federated with one another, I would expect to see the following listings on each server when requesting the federated projects.

Server A

  • Project 1
  • Project 2
  • Project 3
  • Project 4

Server B

  • Project 1
  • Project 2
  • Project 3
  • Project 4

At this point I’m not concerned about where the data is stored, just that I get a view of all the projects available across both of the servers, whether I’m on Server A or Server B. There would probably be a “Local” filter to show just my instance’s projects.
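One way to picture the requested listing is a per-server directory of known projects with an optional local filter. In this Python sketch, `ProjectDirectory` and its `learn` hook are hypothetical; it assumes a server only lists remote projects it has already heard about through federation, so Server A would show Project 3 and 4 only after learning of them.

```python
from urllib.parse import urlparse

class ProjectDirectory:
    """Projects a server knows about: its own plus remote ones it has seen."""

    def __init__(self, local_host):
        self.local_host = local_host
        self.projects = {}  # project URI -> display name

    def learn(self, project_uri, name):
        # Called for local projects and whenever a federated activity
        # mentions a remote project (hypothetical hook).
        self.projects[project_uri] = name

    def listing(self, local_only=False):
        return sorted(
            name for uri, name in self.projects.items()
            if not local_only or urlparse(uri).hostname == self.local_host
        )
```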

Will the sharing of repo listings be part of the ForgeFed spec, or will the git (or whatever) server need to be interrogated directly?

I hope that’s making some sense.


#21

@jmwright, oh I see! As usual, let’s break it up into parts, the features involved in that federated project view.

Let’s start with how the social/blog/toot Fediverse works. There is user search, but as far as I’m aware, at least on Mastodon (which is a big part of the network) there is no global user search by local username (only search for users that your server is aware of, on instances it’s aware of, which can be really good but technically it’s not full global search). Also, there is no post search: You can filter your timeline by tag, but you can’t get a list of all known public posts containing a given word. And that’s an intentional choice, putting the focus on the social interactions and not on data collection or curation.

Here are some possible features related to listing projects and repositories, and my thoughts about them:

  • Listing the local public projects hosted on a given server: This should probably be available just like it already is available these days on forge websites such as GitLab and so on.
  • Listing the public projects known to a given server: This can work exactly the same way federated timeline view already works on the Fediverse. It doesn’t show you the whole network, only the part that your server sees.
  • Listing all public projects on the whole network, and doing project search over the whole network: As far as I’m aware, global search and global indexing like this is something we don’t have at this point. But I suppose it could be useful, and I’d like to look into it. One way to do this is to have a single server that collects the project lists from all other servers, but that’s centralized. Another way is to use a distributed mechanism, perhaps some form of DHT, to collaboratively maintain a global index of projects. I haven’t gotten to implementing or defining this sort of thing yet, but I definitely want to look into it, and feedback is welcome :slight_smile:
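Purely as an illustration of the DHT idea in the last bullet, here is a Python sketch of a keyword index sharded across servers with consistent hashing. Consistent hashing is a standard DHT building block that I am choosing here; the post does not specify it. `KeywordIndex`, `ring_position`, and the server URLs are hypothetical, and all shards live in one process instead of on separate servers. Each keyword is hashed onto a ring of servers, so publishing and searching a word both contact only the one server responsible for it.

```python
import bisect
import hashlib

def ring_position(value):
    """Map a string to a 64-bit position on the hash ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

class KeywordIndex:
    def __init__(self, servers):
        # Each server sits at the ring position of its own URL.
        self.ring = sorted((ring_position(s), s) for s in servers)
        self.points = [p for p, _ in self.ring]
        # Simulated shards: server -> {keyword -> set of project URIs}.
        self.shards = {s: {} for s in servers}

    def owner(self, keyword):
        """First server clockwise from the keyword's position (wrapping)."""
        i = bisect.bisect_right(self.points, ring_position(keyword)) % len(self.ring)
        return self.ring[i][1]

    def publish(self, project_uri, name):
        for word in set(name.lower().split()):
            self.shards[self.owner(word)].setdefault(word, set()).add(project_uri)

    def search(self, word):
        word = word.lower()
        return sorted(self.shards[self.owner(word)].get(word, set()))
```

When a server joins or leaves the ring, only the keywords between it and its neighbor change owner, which is the property that makes this scheme attractive for a network of independently run instances.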