Introducing sGraph: social graph protocol
Today’s multitude of different social platforms puts an unnecessary burden on users who are currently forced to maintain (e.g. follow, invite, block, reward their followers or subscribers) their own networks-within-networks. These sub-networks could be represented as graphs.
sGraph was originally intended to alleviate this burden by introducing a standard protocol for storing and searching social graphs. However, sGraph also might be used for other use cases like on-chain reputation.
sGraph is a permissionless open social graph that enables any relation or event to become a shared record across the web.
The protocol was designed to enable transferability of data associated with identities, i.e. to enable internet users to plug their existing social (or any other) network data into other platforms supporting sGraph.
For example: Alice follows Bob on Platform X and Bob flagged Charlie on Platform Y. Instead of storing those relations exclusively on the platforms, sGraph allows sharing these relations on-chain so that both Platform X and Platform Y, or any other apps or users may benefit from that data.
While various different graph technologies and protocols exist, there isn't a single technology which could combine them together into a single universal public registry that is:
- Permissionless – anyone can become an sGraph data provider.
- Trustless – the users (data consumers) decide for themselves which data providers they trust.
- Composite primitives – anyone can create their own types or subtypes of relations and events (follow, subscribe, flag, new content object, notification, etc).
- Space efficient – sGraph is able to store billions of records at a very low cost.
sGraph supports two types of records:
- Address-to-address: used for relations (following, subscription, block, etc)
- Address-to-hash: used for events (user created a post, started campaign, etc)
In order to consume as little storage space as possible and make blockchain writes almost free, sGraph uses Concurrent Merkle Trees  proposed in late 2022. Instead of actual data (relations, etc), only hashes are stored in the registry (“on-chain”).
A merkle tree is a hash based data structure that encodes data into a tree. The tree has nodes that are hashes of its children and each leaf node is a hash of the data.
The binary Merkle tree of depth 3 in the diagram below has nodes labeled as Xi, where i represents the respective node’s index in a level-order traversal of the binary tree.
Concurrent Merkle trees problem
Complications arise when considering concurrent writes (i.e. leaf replacements) to the same tree. There can be multiple requests to modify leaf nodes of a tree that use the same root node as reference. Every time a leaf node is modified, all in-flight proofs that use the same root node will be invalid.
In order to read the data it has to be indexed efficiently and off-chain. To do this, one must run the synchronizer – a dedicated binary that maintains a local copy of all the relations.
sGraph synchronizer is an optional binary that runs in a given environment (example: Docker image included) and saves the actual relations to one of the supported databases (MongoDB, PostgreSQL, SQLite, etc)
Advantages of using synchronizer
- Low latency access. RPC requests are slow. By storing relations in an on-premise database the synchronizer is able to provide fast and reliable access to relations.
- Fewer RPC calls. Allows for additional savings when choosing an RPC provider.
- More control over reliability and scale, since data is stored in your database.
- Jarry Xiao, Noah Gundotra, Austin Adams, Anatoly Yakovenko. (2022). Compressing Digital Assets with Concurrent Merkle Trees.
- Timofey Barinov, Danil Karetnikov (2023). sGraph whitepaper.
P.S. If you enjoyed this essay, share your thoughts on Twitter and tag me!© Kirill GoryunovNewsletter