Python JSON-LD modules?


#1

Calling all Pythonistas. I’m in the testing phase of delivering and receiving payloads for the ActivityPub support in my federation library. After that it’s mostly implementing objects and such.

Q: are there any JSON-LD libraries that work well, or shall I just go the way I was initially going to, i.e. “it’s all JSON”? :slight_smile:


#2

I did the “it’s all JSON” thing initially for Funkwhale, but I don’t think it’s a robust way of dealing with the issue and it can lead to problems in the long term.

Let’s take a concrete example. These are four valid representations of an activity in JSON-LD (I’ve deliberately omitted the contexts):

{
  "type": "Delete",
  "object": "https://domain/object1"
}
{
  "type": "Delete",
  "object": ["https://domain/object1", "https://domain/object2"]
}
{
  "type": "Delete",
  "object": [{"id": "https://domain/object1"}, {"id": "https://domain/object2"}]
}
{
  "type": "https://www.w3.org/ns/activitystreams#Delete",
  "object": [{"id": "https://domain/object1"}, {"id": "https://domain/object2"}]
}

Basically any attribute can be a string, a boolean, an object or a list of those, and if you want to be spec compliant, you cannot assume every piece of software will use the same form. And the attribute names and types themselves can be full URLs, or simple strings that are resolved against the context. You don’t want to deal with that by hand (unless you have a lot of free time available :wink:).
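To give an idea of what “by hand” means, here is a rough sketch that normalizes only the object attribute across the first three shapes above. The helper name object_ids is made up for illustration, and it does nothing about the fourth case, where the type is a full URL:

```python
def object_ids(activity):
    """Return the list of object IDs, whatever shape "object" has.

    Hypothetical helper for illustration only; it covers the string,
    list-of-strings and list-of-objects shapes shown above.
    """
    raw = activity.get("object", [])
    if not isinstance(raw, list):
        raw = [raw]  # a single value becomes a one-element list
    ids = []
    for item in raw:
        if isinstance(item, str):
            ids.append(item)  # bare URL
        elif isinstance(item, dict) and "id" in item:
            ids.append(item["id"])  # embedded object: keep only its id
    return ids


print(object_ids({"type": "Delete", "object": "https://domain/object1"}))
```

You would need something similar for every attribute you read, plus handling for prefixed and full-URL names, which is where a real processor starts to pay off.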

Fortunately, you can use https://github.com/digitalbazaar/pyld which is a JSON-LD processor that can help you deal with that. This is what we now do in Funkwhale (see the related Merge Request):

Expanding

When we receive an activity, we call pyld.jsonld.expand(payload) on that activity to expand it (cf. the spec). Given a payload such as

{
  "@context": ["https://www.w3.org/ns/activitystreams"],
  "type": "Delete",
  "object": "https://domain/object1",
  "summary": "There was a typo"
}

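This will return the following payload (pyld returns a list of expanded nodes, and the context itself is dropped once it has been applied):

[
  {
    "@type": ["https://www.w3.org/ns/activitystreams#Delete"],
    "https://www.w3.org/ns/activitystreams#object": [{"@id": "https://domain/object1"}],
    "https://www.w3.org/ns/activitystreams#summary": [{"@value": "There was a typo"}]
  }
]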

This gives you predictability on what kind of payload you will be manipulating, because all the documents from my first example would expand to the same structure: full URLs for the type and attribute names, and a list of objects carrying an @id.
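A minimal sketch of that step, assuming PyLD is installed (pip install PyLD). jsonld.expand is the real pyld call; the wrapper names expand_activity and single_node are made up for this example. Note that expanding a payload that references the ActivityStreams context makes pyld fetch that context over HTTP unless a custom document loader is set up.

```python
def expand_activity(payload):
    """Expand a JSON-LD payload with pyld and return the resulting list."""
    # Imported lazily so that single_node() below is usable without pyld.
    from pyld import jsonld

    return jsonld.expand(payload)


def single_node(expanded):
    """Return the only top-level node of an expanded document.

    Hypothetical helper: an activity is expected to expand to a list
    containing exactly one node; anything else is rejected.
    """
    if not isinstance(expanded, list) or len(expanded) != 1:
        raise ValueError("expected exactly one top-level node")
    return expanded[0]
```

Typical use would be node = single_node(expand_activity(payload)), after which every key in node is either a JSON-LD keyword such as @type or a full URL.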

Collecting the attributes you need

Most of the time, for the purpose of your application, you’ll only need a subset of the attributes available in the document, like the type, actor and object ID. So the next step is to collect the attributes you need, by referencing the full attribute names (payload["https://www.w3.org/ns/activitystreams#actor"], instead of payload['actor']). Since attribute names and types are expanded too, based on the context, this will handle cases where someone uses {"@context": {"as": "https://www.w3.org/ns/activitystreams#"}, "@type": "as:Delete"}.
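As a sketch, collecting a few attributes from one expanded node could look like this. The helper names (get_ids, get_values, collect) and the shape of the returned dict are assumptions made for the example, not Funkwhale’s actual code:

```python
AS = "https://www.w3.org/ns/activitystreams#"


def get_ids(node, prop):
    """Return every @id found under the expanded ActivityStreams property."""
    return [v["@id"] for v in node.get(AS + prop, []) if isinstance(v, dict) and "@id" in v]


def get_values(node, prop):
    """Return every literal @value found under the expanded property."""
    return [v["@value"] for v in node.get(AS + prop, []) if isinstance(v, dict) and "@value" in v]


def collect(node):
    """Keep only the attributes this (hypothetical) application cares about."""
    return {
        "types": list(node.get("@type", [])),
        "actor": get_ids(node, "actor"),
        "object": get_ids(node, "object"),
        "summary": get_values(node, "summary"),
    }


node = {
    "@type": [AS + "Delete"],
    AS + "actor": [{"@id": "https://domain/@bob"}],
    AS + "object": [{"@id": "https://domain/object1"}],
    AS + "summary": [{"@value": "There was a typo"}],
}
print(collect(node))
```

Everything comes back as a list, so the caller decides explicitly whether zero, one or many values are acceptable.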

Dereferencing

Depending on your application behaviour you may also need to dereference the content of the activities. For instance, if you receive:

{
  "type": "Create",
  "actor": "http://domain/@bob",
  "object": {
    "id": "https://domain/object1",
    "type": "Audio",
    "attributedTo": {
      "id": "http://musicians.com/@alice",
      "summary": "I make pop music!"
    }
  }
}

You may be tempted to persist the artist data directly in your database (for example in an “Artist” table). However, you don’t have any guarantee that the artist representation sent by Bob actually matches reality. If you fetch http://musicians.com/@alice, you may end up with something completely different, like:

{
  "id": "http://musicians.com/@alice",
  "summary": "I make **electro** music!"
}

That’s the kind of poisoning that could occur if you trust blindly that nested objects represented in the payload are correct.

A way to avoid that is to dereference (I think it’s the proper term) the objects you want to reuse in your application. If you know your app will need to store whatever data is in attributedTo, then you only grab the id of the object provided in attributedTo, ignore the other attributes, and do an HTTP request on that ID to retrieve the real, valid payload from its origin. This one you can trust (unless, of course, it has nested objects you may need to dereference too :wink:).
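A rough sketch of that idea using only the standard library. The function names are hypothetical, and the fetcher is passed in as a parameter so it can be swapped for a cached or authenticated one. The Accept header values are the media types ActivityPub defines for fetching objects.

```python
import json
import urllib.request

ACCEPT = (
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams", '
    "application/activity+json"
)


def fetch_json(url):
    """Fetch a remote object as JSON (no signing, caching or retries here)."""
    request = urllib.request.Request(url, headers={"Accept": ACCEPT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def dereference(value, fetch=fetch_json):
    """Ignore embedded attributes and reload the object from its own ID."""
    if isinstance(value, str):
        object_id = value
    else:
        object_id = value.get("id") or value.get("@id")
    if not object_id:
        raise ValueError("nothing to dereference: no id found")
    return fetch(object_id)
```

With the payload above, dereference(payload["object"]["attributedTo"]) would discard the summary sent by Bob and return whatever musicians.com serves for Alice.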

Cleaning

At this point, you have relevant attribute/values, and dereferenced everything, so you can clean what you have, ensure it matches your internal formats/expectations, and proceed with execution :slight_smile:
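What “cleaning” looks like depends entirely on your application. As one possible sketch, assume the collected data for a Delete activity is a dict with "types", "actor" and "object" keys, each holding a list of strings (a made-up internal shape), and that you want a small, strictly validated object out of it:

```python
from dataclasses import dataclass
from urllib.parse import urlparse

AS_DELETE = "https://www.w3.org/ns/activitystreams#Delete"


@dataclass(frozen=True)
class CleanDelete:
    """Hypothetical internal representation of a validated Delete."""
    actor: str
    object_ids: tuple


def is_http_url(value):
    """True for strings that look like absolute http(s) URLs."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def clean_delete(collected):
    """Validate collected attributes and return a CleanDelete, or raise."""
    if AS_DELETE not in collected.get("types", []):
        raise ValueError("not a Delete activity")
    actors = collected.get("actor", [])
    if len(actors) != 1 or not is_http_url(actors[0]):
        raise ValueError("expected exactly one actor URL")
    objects = collected.get("object", [])
    if not objects or not all(is_http_url(o) for o in objects):
        raise ValueError("expected at least one object URL")
    return CleanDelete(actor=actors[0], object_ids=tuple(objects))
```

Rejecting anything unexpected at this stage means the rest of the code only ever sees one well-defined shape.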

Conclusion

If you do that, you’re likely to end up with a more robust and secure implementation. To be fair, this landed only a few days ago in Funkwhale, and we relied on pure JSON parsing before that, so you can also experiment with that and switch to real JSON-LD support at a later point. Other people may have different ways of doing this.

If you’re interested, this is the module containing most of those functions in Funkwhale: https://dev.funkwhale.audio/funkwhale/funkwhale/blob/develop/api/funkwhale_api/federation/jsonld.py and the related tests: https://dev.funkwhale.audio/funkwhale/funkwhale/blob/develop/api/tests/federation/test_jsonld.py

One thing we do is intercept requests for common contexts (because pyld will try to fetch those) and serve a cached version, to avoid unnecessary HTTP requests.
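This is not necessarily how Funkwhale does it, but one way to get that effect is pyld’s pluggable document loader. The sketch below wraps any loader in an in-memory cache; make_caching_loader and install_caching_loader are made-up names, while jsonld.set_document_loader and jsonld.requests_document_loader are pyld functions. Preloaded entries are assumed to use pyld’s remote document shape (a dict with "contextUrl", "documentUrl" and "document" keys).

```python
def make_caching_loader(fallback, preloaded=None):
    """Wrap a pyld-style loader so each URL is fetched at most once.

    `preloaded` maps URLs to remote documents served without any request.
    """
    cache = dict(preloaded or {})

    def loader(url, options=None):
        if url not in cache:
            # Only hit the network (via the fallback) on a cache miss.
            cache[url] = fallback(url, options or {})
        return cache[url]

    return loader


def install_caching_loader(preloaded=None):
    """Make pyld use the caching loader for every context it resolves."""
    from pyld import jsonld  # third-party: pip install PyLD

    fallback = jsonld.requests_document_loader()
    jsonld.set_document_loader(make_caching_loader(fallback, preloaded))
```

Shipping a local copy of the ActivityStreams context in preloaded also keeps expansion working when w3.org is slow or unreachable.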


#3

Thanks for the awesome reply @eliotberriot <3 Really helpful.

I think I’ll try the JSON-LD way. I had started to have concerns about going JSON-only, with the growing ecosystem creating platforms faster than I would likely be able to keep up with the small tweaks required here and there. JSON-LD is a bit of a nightmare tbh, but what can we do: it’s there, and people will create payloads that are incompatible when read as pure JSON but are valid, equivalent representations as JSON-LD, like you highlighted.

My library is actually BSD licensed, so I’m going to have to be careful with copying code. I hope copying ideas will not get me lawsuits :smiley:

“One thing we do is intercept requests for common contexts (because pyld will try to fetch those) and serve a cached version, to avoid unnecessary HTTP requests.”

Thanks, good tip :+1:

Again, thanks for the post.