Introducing sGraph: social graph protocol

March 4, 2023 (1y ago) · @kgoryunov

Back

Today’s multitude of different social platforms puts an unnecessary burden on users who are currently forced to maintain (e.g. follow, invite, block, reward their followers or subscribers) their own networks-within-networks. These sub-networks could be represented as graphs.

sGraph was originally intended to alleviate this burden by introducing a standard protocol for storing and searching social graphs. However, sGraph also might be used for other use cases like on-chain reputation.

Protocol design

sGraph is a permissionless open social graph that enables any relation or event to become a shared record across the web.

Figure 1: Identities and relations
Figure 1: Identities and relations

The protocol was designed to enable transferability of data associated with identities, i.e. to enable internet users to plug their existing social (or any other) network data into other platforms supporting sGraph. 

For example: Alice follows Bob on Platform X and Bob flagged Charlie on Platform Y. Instead of storing those relations exclusively on the platforms, sGraph allows sharing these relations on-chain so that both Platform X and Platform Y, or any other apps or users may benefit from that data.

Read more about basic components of social networks and vectors of transformation.

While various different graph technologies and protocols exist, there isn't a single technology which could combine them together into a single universal public registry that is:

  1. Permissionless – anyone can become an sGraph data provider.
  2. Trustless – the users (data consumers) decide for themselves which data providers they trust.
  3. Composite primitives – anyone can create their own types or subtypes of relations and events (follow, subscribe, flag, new content object, notification, etc).
  4. Space efficient – sGraph is able to store billions of records at a very low cost.

Data model

Figure 2: Address-to-address and address-to-hash types of records
Figure 2: Address-to-address and address-to-hash types of records

sGraph supports two types of records: 

  1. Address-to-address: used for relations (following, subscription, block, etc)
  2. Address-to-hash: used for events (user created a post, started campaign, etc)
Figure 3: Data model of sGraph records
Figure 3: Data model of sGraph records

Space efficiency

In order to consume as little storage space as possible and make blockchain writes almost free, sGraph uses Concurrent Merkle Trees [1] proposed in late 2022. Instead of actual data (relations, etc), only hashes are stored in the registry (“on-chain”).

Merkle trees 

A merkle tree is a hash based data structure that encodes data into a tree. The tree has nodes that are hashes of its children and each leaf node is a hash of the data.

The binary Merkle tree of depth 3 in the diagram below has nodes labeled as Xi, where i represents the respective node’s index in a level-order traversal of the binary tree.

Figure 4: A depth 3 Merkle tree. The Merkle proof for the leaf node in cyan *X10* would consist of the orange nodes *X11, X4, X3*
Figure 4: A depth 3 Merkle tree. The Merkle proof for the leaf node in cyan X10 would consist of the orange nodes X11, X4, X3

Concurrent Merkle trees problem

Complications arise when considering concurrent writes (i.e. leaf replacements) to the same tree. There can be multiple requests to modify leaf nodes of a tree that use the same root node as reference. Every time a leaf node is modified, all in-flight proofs that use the same root node will be invalid. 

Figure 5: The cyan path and the orange path modify the same tree, but once one of the changes is locked in, the proof to the other change will be invalid.
Figure 5: The cyan path and the orange path modify the same tree, but once one of the changes is locked in, the proof to the other change will be invalid. 

Synchronizer

In order to read the data it has to be indexed efficiently and off-chain. To do this, one must run the synchronizer – a dedicated binary that maintains a local copy of all the relations.

sGraph synchronizer is an optional binary that runs in a given environment (example: Docker image included) and saves the actual relations to one of the supported databases (MongoDB, PostgreSQL, SQLite, etc)

Figure 6: Synchronizer node and sGraph architecture
Figure 6: Synchronizer node and sGraph architecture

Advantages of using synchronizer

  1. Low latency access. RPC requests are slow. By storing relations in an on-premise database the synchronizer is able to provide fast and reliable access to relations.
  2. Fewer RPC calls. Allows for additional savings when choosing an RPC provider.
  3. More control over reliability and scale, since data is stored in your database.

Implementation

sGraph protocol is available on GitHub. Protocol is in the development phase, if you would like to contribute to the protocol, please reach out at sgraph.io

References

  1. Jarry Xiao, Noah Gundotra, Austin Adams, Anatoly Yakovenko. (2022). Compressing Digital Assets with Concurrent Merkle Trees. 
  2. Timofey Barinov, Danil Karetnikov (2023). sGraph whitepaper. 

P.S. If you enjoyed this essay, share your thoughts on Twitter and tag me!