Another example of how the flexibility in #ActivityPub is so high it makes it hard to work with: parser edition. Again, this is me thinking out loud while I figure out how to solve problems in my own project.

For this, we'll look at Collection and OrderedCollection.

Collection is an object to wrap, as one might expect, collections of objects.

It's designed to allow paging through collections, which is a hard problem in and of itself, using a forward and backwards cursor.

1/

In addition to all of the properties on Object (why does my collection have a physical location?), Collection has the following five fields:

totalItems | current | first | last | items

current, first, and last are generally supposed to refer to collection pages.

Collection pages are a type of Collection that also have the following fields:

partOf | next | prev
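
To make the shape concrete, here is a rough sketch of those fields as Python dataclasses. The class layout, the snake_case names, and the `Ref` alias are my own assumptions for illustration, not anything the spec hands you; the point is mostly how loose the types have to be.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Union

# Hypothetical alias: a reference may arrive as a bare URI string or as an
# embedded JSON object (a full object or a Link), so the type stays loose.
Ref = Union[str, dict]


@dataclass
class Collection:
    total_items: Optional[int] = None   # totalItems
    current: Optional[Ref] = None
    first: Optional[Ref] = None
    last: Optional[Ref] = None
    items: list = field(default_factory=list)


@dataclass
class CollectionPage(Collection):
    # A page is itself a Collection, so it inherits all five fields above.
    part_of: Optional[Ref] = None       # partOf
    next: Optional[Ref] = None
    prev: Optional[Ref] = None
```

Note that every field ends up optional, and `items` is a list of `Any` in practice.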

2/

#ActivityPub

OrderedCollection is similar, but because "items" is defined as being "unordered" in the JSON-LD @context they need to override that with a different name. So they introduce orderedItems and put an index on the page (which I have opinions about, but that's a story for another time).

We've already gone into how we have problems with the fact that next and prev can be objects, Link objects (which are different), or URIs. That isn't what I want to focus on here.

3/

#ActivityPub

Instead, I want to ask the following question:

Given a typical JSON parser in most mainstream languages, where we construct the representation of the object declaratively (throw a dart: Jackson, Gson, Yojson, literally anything based on JSON Schema or OpenAPI, etc.), how much work should be expected in order to extract the items from the (potentially paginated) collection and put them in whatever the local equivalent of a List or a Set is?

4/

#ActivityPub

So first of all, the Collection itself is completely untyped. I can't hand a collection to you and expect you to know what's inside of it.

What can be inside is any mixture of:

* URI references
* Objects, which can be heterogeneous
* Link objects

So we're probably extracting it into a List<Object> in our first pass. Some modern serializers know how to work around this, but self-describing types are relatively rare, and mostly these sorts of tools prefer a little more type information up front.
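
A minimal sketch of what that second pass over the List<Object> might look like, in Python. The function name `classify` and the `LINK_TYPES` set are hypothetical, and treating only "Link" and "Mention" as link types is a simplification on my part; extension vocabularies could add more.

```python
from typing import Any

# Assumption: only the core Link type and its Mention subtype count as links.
LINK_TYPES = {"Link", "Mention"}


def classify(item: Any) -> str:
    """Label one collection entry as 'uri', 'link', or 'object'."""
    if isinstance(item, str):
        return "uri"
    if isinstance(item, dict):
        types = item.get("type", [])
        if isinstance(types, str):
            # "type" may be a single string or a list of strings.
            types = [types]
        return "link" if LINK_TYPES.intersection(types) else "object"
    raise TypeError(f"unexpected collection entry: {item!r}")
```

Even this tiny function is logic that a declarative mapping usually can't express on its own.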

5/

#ActivityPub

But that's not what we're focusing on here.

Here instead you need to first check the _items_ to see if it is populated. If it isn't then you need to check the _current page_ and see if this collection is a paged collection.

If items is populated you are done, except wait, in an ordered collection it can be orderedItems.

This means that, instead of having OrderedCollection be a subtype of Collection (OrderedCollection <: Collection), in most of these tools it needs a oneof

6/

#ActivityPub

Or separate classes entirely with different definitions mapping to the same values.

Not the end of the world, but a touch annoying, and not something you can do trivially in one throw of the parser. You're adding a fair bit of manual work to do this in each case.
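
For a sense of the manual work, here is a hedged Python sketch of the items-or-orderedItems check done by hand on the raw JSON. The name `direct_items` is hypothetical, and wrapping a bare value in a list is a defensive assumption about single-entry collections, not something the thread above establishes.

```python
from typing import Optional


def direct_items(obj: dict) -> Optional[list]:
    """Return the entries carried directly on obj, or None if it has none."""
    for key in ("items", "orderedItems"):
        if key in obj:
            value = obj[key]
            # Assumption: a single entry may arrive as a bare value, not a list.
            return value if isinstance(value, list) else [value]
    return None
```

Returning None rather than an empty list keeps "no items here, go look at the pages" distinct from "this collection is empty".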

But that current object…

It can be an object or a URI or a Link object. Once I follow it I get a collection page…

But a collection page is also a type of collection.

7/

#ActivityPub

So the collection page object can also be… paginated?

It's a B-tree?

Also because OrderedCollection is defined as a type of Collection, you could presumably have your Collection object return OrderedCollectionPages, or even a mix of Collection and OrderedCollectionPages.

Which means you're back to checking for orderedItems after all, even when you think you have a "collection" object, as soon as you deserialize the page.

Back up to the Collection itself.

8/

#ActivityPub

Remember we're working in a declarative system, for the most part.

So how do we tell it "sometimes this object may have its items in items, sometimes it may have it in current.items, sometimes it may have it in resolve(current).items, and sometimes it may have it in resolve(current.href).items"?

Oh, items might be a set of objects, a set of URIs, or a set of Links.

None of the major systems are built for this sort of thing, and building in that logic is a _lot_ of work.
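
Written imperatively, those four cases come out roughly like the Python sketch below. Everything here is an assumption for illustration: `find_items` and `own_items` are made-up names, `resolve` is whatever function you pass in to fetch and parse a URI, and using the presence of `href` to spot a Link is a heuristic. It also only looks one level deep and never follows next.

```python
from typing import Callable, Optional


def own_items(obj: dict) -> Optional[list]:
    """Entries stored directly on obj under items or orderedItems, else None."""
    for key in ("items", "orderedItems"):
        if key in obj:
            return obj[key]
    return None


def find_items(collection: dict, resolve: Callable[[str], dict]) -> list:
    items = own_items(collection)
    if items is not None:                 # items
        return items
    current = collection.get("current")
    if current is None:
        return []
    if isinstance(current, str):          # resolve(current).items
        page = resolve(current)
    elif "href" in current:               # resolve(current.href).items
        page = resolve(current["href"])
    else:                                 # current.items
        page = current
    return own_items(page) or []
```

Four branches before we've even looked at what the entries are, and each one wants its own test.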

9/

#ActivityPub

You can't even generally generate the code.

If it were consistent:

If next/prev and current/first/last were always URIs

If the information was always on a page, perhaps if the collection was of type _CollectionPage_ rather than having CollectionPage be of type _Collection_.

If it disallowed Link objects.

Then I could easily build a pagination system around it where:

var page = loads(json)

var next = resolve(page.next)

var items = page.items

and we're done.
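
Fleshed out in Python, that wished-for version might look like the sketch below. It only works under the hypothetical rules listed above (next is always a URI, items always live on a page, no Link objects). `iter_items` and `fetch` are made-up names, with `fetch` standing in for whatever returns the JSON text for a URI; the `seen` set is a loop guard I added.

```python
import json
from typing import Any, Callable, Iterator


def iter_items(first_uri: str, fetch: Callable[[str], str]) -> Iterator[Any]:
    """Yield every entry by following next from page to page."""
    uri = first_uri
    seen = set()  # guard against a server whose pages link in a cycle
    while uri is not None and uri not in seen:
        seen.add(uri)
        page = json.loads(fetch(uri))
        yield from page.get("items", [])
        uri = page.get("next")
```

The parser stays a parser, and resolving stays a separate, injectable step.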

10/

#ActivityPub

Instead, I now need to teach my JSON parser how to navigate the object and how to resolve a URI, which breaks the abstraction layer, or I need to do something rather clever (clever being used here as a negative epithet) and complicated to get it to navigate all of these aspects.

Oh, and then I need test cases.

So.

So.

Many test cases.

11/11

#ActivityPub
