Thinking About Practical Web3.0 and GNUNet as Infrastructure

The title is gonna make people reading this from Gemini mad. I saw that coming a mile away. But hear me out.

I just came back from the g0v hackathon, and decentralization and Web3 were a huge topic there. Heck, even the Ministry of Digital Affairs joined the discussion. That got me thinking. What can Web3 really do better than existing architectures? What is the value proposition? That led me back to my recent dive into GNUNet and to rethinking its capabilities.

Speaking generally, GNUNet is like Tor. It's another kind of darknet. But to be very specific, GNUNet is special. It's not just another anonymization layer for TCP. GNUNet comes with a lot of decentralized subsystems that one can take advantage of to build applications - an all-in-one package. GNUNet has its own distributed hash table, file sharing, network messaging, etc. I want to put forward an idea of how we developers can use GNUNet to build decentralized applications, under all the practical limitations we face today. Be aware that I'll be using Web3 and decentralized services interchangeably. I understand the difference between the Web and the Internet. But everyone uses them the same these days.

To me this post is an intellectual exercise. To clear up my mind and hone my vision of what "Web3" could be, and what's just hype and BS. I want to identify what value we developers could bring through such a complex technology, in a practical manner. I also think the decentralized web/services need not be related to cryptocurrency. The following ideas are popular among Web3 supporters:

  • Decentralized
  • Nodes/users hold their own information/state
  • By design, bad actors cannot affect the system, or only minimally
  • User state can be cached and fetched from other nodes. The original source need not be online all the time.

Older Internet users might ask "wait, I've heard these properties somewhere before". And you are right. The early Internet did exactly that! We started out storing everything on mainframes. Mainframes are designed to be iron solid and don't easily go down. But then, after home computing took off, people started hosting their sites on their own systems. Your machine getting rekt by lightning does not take other sites down too. Likewise, some black hat on your ISP's network does not necessarily mean you lose Internet access. Pages can also be cached by proxies in case either of the above situations happens. Is the Internet already Web 3.0?

No. Yes, the Internet is a decentralized system and meets the criteria. But it lacks two core ideas that most Web3 advocates either ignore or don't know they need - servicification and availability. The Internet is not just a service provided by ISPs. It's infrastructure. But look at Web3 projects today: they are services. DeFi, NFT exchanges, etc. They are (more or less) usable by ordinary people. You don't need a degree in computer science to use them. Second, like the big iron, services now are designed to be as available as possible. Multi-region data centers, availability zones, load balancers, you name it. But that all depends on the project owner wanting to keep the service running. Let's say I enjoy watching videos on YouTube. Google can delete YouTube from their network completely and no one can do anything about it.

Disclaimer: I hate NFTs. Heck, a PGP signature on a blockchain is likely better.

I think those are the two things the plain Internet is not. The plain existence of the Internet does not help muggles a bit. Nor does self-hosting content solve the latter problem. It helps. But if I decide to delete this blog post, it's still (likely) gone forever. And these are the two major benefits people are pushing in favor of Web3. I like the idea too. But how to achieve them is another story. In order to not depend on a single vendor keeping a service up and running, users of a service have to depend on themselves. In other words, P2P. Yet, this introduces a heck ton of problems:

  • To trust or not to trust peers
  • CAP theorem
  • Cross version communication
  • Latency
  • Selfish peers

We just traded one stinky problem for many very yucky and research-worthy problems. Some are technical and some are political in nature. Everyone wants free services. But the nature of decentralization means everyone has to pay a little bit. Maybe not in terms of money, but bandwidth, computing power, storage. Cryptocurrency is really a miracle in this regard. By design, Proof of Work makes everyone's selfishness aligned. Being as selfish as possible is exactly what the system wants. It also provides a very good incentive to keep transaction validation going: mining rewards. But that leads to the ton of problems we have seen up to now.

After much thought, I re-categorized what Web3 ought to be.

  • No single point of failure
  • High availability of content
  • Anti-censorship/surveillance/trust

Funny how we ended up at the same premise as the early Internet again. You see, all 3 points I listed at the beginning are merely means to ends. I believe the definition of "services" has changed with time. The Internet is not a service anymore but infrastructure. People started seeing services like Google and Netflix as a single thing and thus a single point of failure. Your Google account gets deleted? Poof, your entire life also goes away. Decentralization is also just a tool to reduce trust and failure, a very powerful one if done right. Instead of adding buzzwords like "P2P", "decentralization" and "DAO" into projects, we should focus on the end goal. What's the problem such an architecture can help solve? Is it a step up, or is it decentralization for the sake of decentralization?

An (idea of a) practical Web3 application

First we shall identify what a server-client architecture does not do well, where even federated systems like Matrix and Mastodon fall short, and how a decentralized architecture is supposed to solve it. To do that, we first need to identify the weaknesses of federation protocols. In my opinion they are the following:

  • Users have to trust the server (E2EE can be done, but that's only practical for messaging apps)
  • Servers can go down but important data needs to be kept

A toy idea: how about a clone of thetagang.com with broker API integration? thetagang.com is a free and ad-free trade logger that also allows users to share their option trades. I use it both to record my trades (alternatives cost $15~30 a month) and to see what others are doing. This is very valuable, as in some cases people may know something you don't. TradeStation, the broker I use, has very strict limitations and a weird API key pattern. In short, their API allows you to host a service tied to your account. You need to be a partner in order to have other accounts sign up to your service. This is in contrast with GitHub and crypto exchange APIs, where the API key alone allows access to your data. Anyway, we need a self-hosted, decentralized design. Everyone can host their own node that reads their trades through their own key. Then trades could be shared via DHT, peer-to-peer networking and so on. The user does not need to trust a service provider with their account. They have everything on their system. And like thetagang.com, shared trades and comments are, well, shared.

The hard part would be to design a scalable and decentralized post and comment system with O(log N) access to recent data. Normally there would be an SQL database of shared trades with an index on the timestamp, and maybe Redis to cache results. This is much harder in a decentralized environment. The first and most basic solution is for all nodes to publish trades to the DHT under a predefined key. Then everyone listens on that key for new data. This has two downsides. First, it DoSes the node that happens to handle all the trades. Secondly, data disappears once it times out. We can make it slightly more scalable by using hash(SYMBOL + salt) mod N as the key and listening on all possible keys. However, this costs linearly more listen operations as N increases.
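
A minimal sketch of that sharding scheme, assuming a put/listen style DHT. The shard count, the salt and the printouts standing in for the actual DHT calls are all made up for illustration:

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>

// Number of shards (N). More shards spread load, but every node
// interested in all trades must listen on all of them.
constexpr uint64_t kNumShards = 16;

// Derive the DHT key shard for a ticker symbol.
uint64_t shardFor(const std::string& symbol, const std::string& salt)
{
    // A real implementation would use a cryptographic hash;
    // std::hash is only for illustration.
    return std::hash<std::string>{}(symbol + salt) % kNumShards;
}

int main()
{
    const std::string salt = "trades-v1"; // hypothetical namespace salt
    // Publishing: a trade on SPY goes under its shard key.
    std::cout << "publish under shard " << shardFor("SPY", salt) << "\n";
    // Subscribing: a node that wants everything must listen on every
    // shard, which is why the cost grows linearly with kNumShards.
    for (uint64_t shard = 0; shard < kNumShards; shard++)
        std::cout << "listen on shard " << shard << "\n";
}
```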

A better solution, I think, is for every node to periodically publish their trades as a file on a file sharing protocol, then publish the URI for retrieval onto a DHT; the volume should be a lot lower than publishing individual trades. The node also maintains an updatable/appendable file on the file share. Every time a new trade record is published, its URI is appended to that file. Ideally, upon a trade record URI being published to the DHT, other nodes should try to retrieve the record soon. Thus records are kept in a distributed manner. Then even if said node goes down and the DHT has lost the URI, other users can still retrieve the record from other nodes. For even better data loss protection, each node's URI list could also contain references to other nodes' URI lists. This way, discovering nodes doesn't necessarily depend on the DHT.
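
Here is a rough sketch of what those structures could look like. Everything in it is hypothetical - the field names, the URI format and the publish flow are my guesses at one possible design, not GNUNet's API:

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// One published batch of trades, addressed by its file-share URI.
struct TradeRecordRef {
    std::string uri;       // e.g. a file-share URI pointing at the trade file
    int64_t     published; // unix timestamp, for ordering
};

// The appendable file each node maintains on the file share.
struct UriList {
    std::string owner;                   // the publishing node's identity
    std::vector<TradeRecordRef> records; // grows as trades are published
    // References to other nodes' URI lists, so node discovery can work
    // peer-to-peer even after the DHT entries have expired.
    std::vector<std::string> peerLists;
};

// Publishing flow: share the new trade file, append its URI to our list,
// re-publish the updated list, then announce the URI on the DHT.
void publishTrade(UriList& mine, const std::string& uri, int64_t now)
{
    mine.records.push_back({uri, now});
    // (re-share `mine` on the file share and put `uri` on the DHT here)
    std::cout << "announced " << uri << ", list size "
              << mine.records.size() << "\n";
}

int main()
{
    // Hypothetical URIs for illustration only.
    UriList mine{"node-alice", {}, {"gnunet://fs/...bob-list"}};
    publishTrade(mine, "gnunet://fs/...trade-2022-11-01", 1667260800);
}
```

The key design point is peerLists: it turns the lists themselves into a small web of references, so a fresh node can crawl from any known list instead of relying on DHT lookups alone.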

It solves both existing problems: trade journals being too expensive, and broker API integration being impossible for non-commercial projects. It should also provide high availability for shared past trades. And there's no single point of failure; the overall system allows nodes to join or leave, as there's little to no node onboarding. That checks almost all the marks proposed above.

The main problem would be someone spamming the DHT with fake trades. This could be mitigated by some proof of work, just so spamming is not as cheap.
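
For example, a hashcash-style scheme: the publisher must find a nonce whose hash has enough leading zero bits, and everyone else verifies with a single hash. The sketch below uses std::hash purely for brevity; a real deployment would want a proper cryptographic hash like SHA-256:

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// Count leading zero bits of a 64-bit hash value.
int leadingZeroBits(uint64_t h)
{
    int n = 0;
    for (uint64_t mask = 1ULL << 63; mask && !(h & mask); mask >>= 1)
        n++;
    return n;
}

// Find a nonce such that hash(payload + nonce) has `difficulty`
// leading zero bits. The publisher pays this cost once per record.
uint64_t mineNonce(const std::string& payload, int difficulty)
{
    std::hash<std::string> hasher;
    for (uint64_t nonce = 0;; nonce++) {
        uint64_t h = hasher(payload + std::to_string(nonce));
        if (leadingZeroBits(h) >= difficulty)
            return nonce;
    }
}

// Verification is a single hash, so honest peers check cheaply.
bool verify(const std::string& payload, uint64_t nonce, int difficulty)
{
    uint64_t h = std::hash<std::string>{}(payload + std::to_string(nonce));
    return leadingZeroBits(h) >= difficulty;
}

int main()
{
    const std::string trade = "SPY sell put 370 x1"; // example record
    uint64_t nonce = mineNonce(trade, 16); // ~65k hashes on average
    std::cout << "nonce " << nonce << " valid: "
              << verify(trade, nonce, 16) << "\n";
}
```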

How GNUNet can fit

After my previous experience with GNUNet, I think it's in a good position to support decentralized applications. AGPL is a pain to deal with, but it shouldn't matter if the application is open source and distributed. GNUNet has its R⁵N DHT and file sharing protocols that enable building such applications. According to GNUNet, the R⁵N DHT is more resistant to Sybil attacks. And unlike IPFS, GNUNet file sharing supports versioning, which makes the list-appending scheme possible.

GNUNet also has very strong, built-in support for identity (Egos) and validation, which I haven't spent time looking into yet. However, I can say the same is harder to achieve with other decentralized stacks. Take libp2p, the library built by the IPFS foundation: it also supports DHTs and p2p streams, but file sharing has to be done over IPFS, and it has no good built-in identity support.

With all that said, GNUNet is difficult to develop for, in part because its API is GNU-style C, which many people hate, including me. That's why I'm working on a C++ wrapper to handle lifetimes and state automatically. It's not publicly available yet. But I'll announce it in a few months.

Enough rambling. Going back to hacking.
