Interview With Jens Finkhaeuser - Founder and CEO at the Interpeer Project

Shauli Zacks, Content Editor
Updated on: October 20, 2024

In a recent interview with SafetyDetectives, Jens Finkhaeuser, founder and CEO of the Interpeer Project, shared the visionary journey behind his ambitious endeavor to rebuild the Internet stack from the ground up. With a background rooted in peer-to-peer networking from his time at Joost—an innovative video streaming startup—Finkhaeuser witnessed firsthand both the potential and pitfalls of the web’s current architecture. His passion for reimagining the Internet has led to the Interpeer Project, which focuses on creating a decentralized and resilient digital communication framework to address privacy, sovereignty, and censorship issues that plague today’s centralized systems.

Finkhaeuser’s story began with a fascination for peer-to-peer technology, shaped during the mid-2000s, and evolved through encounters with blockchain, web3, and various funding opportunities from the European Commission and Internet Society Foundation. At the core of the Interpeer Project is a drive to shift from server-dominated communication to a human-centric, peer-to-peer model that emphasizes privacy and efficiency. In the interview, Finkhaeuser discussed the technical and societal challenges his team is tackling, including group communication protocols, conflict-free replicated data types (CRDTs), and the broader implications of a more resilient Internet stack for global communities, especially in times of crisis.

Could you share the origin story behind the Interpeer Project? What inspired you to embark on this mission of reimagining the Internet stack?

The origin of the Interpeer Project lies way back in 2006 and is a little tenuously related to the project as it exists now. In that year I joined a mysterious stealth startup codenamed “The Venice Project” that wanted to bring high quality, full-content video streaming to the Internet — via a peer-to-peer network!

It’s hard to imagine these days how powerful that vision was. YouTube had launched only a year prior, and by mid-2006 was still limiting videos to two minutes in length, and the resolution was worse than you find in meme GIFs these days.

I joined the company's peer-to-peer engineering team. The idea was to utilize the bandwidth available from multiple peers in parallel to make video streaming cost-effective and efficient enough for the quality goals. The founders, Janus Friis and Niklas Zennström, had previously disrupted markets with peer-to-peer approaches in KaZaa and Skype, so this venture seemed promising! We re-launched as Joost to great fanfare, and delivered full-length TV episodes and movies at roughly digital TV quality. It was amazing!

But we also wanted to deliver live video broadcasts, and realized the limitations of the stack we were working with. So for that use case, we went back to the drawing board, until in 2008 we streamed March Madness in a public beta. It went well — not perfect, but really well, and we learned a lot from that.

Unfortunately, as startup stories often go, shortly thereafter Joost was no more. Netflix effectively took over the market, and the rest is history.

I left with an intense desire to put the lessons we learned into practice, but both due to the rather spectacular rise and fall of Joost, and because bandwidth costs dropped by an order of magnitude, not many people saw sense in investing in this kind of tech. So I pursued other things. As a hobby, I still drafted architectures and protocols, but there was no time to put into implementation.

Fast forward to 2019, and blockchain and web3 had made p2p interesting again. I tried bringing my experience into that space, but quickly realized it was of little interest. The specific requirements of high bandwidth and low latency that video streaming brings require more control over lower levels of the networking stack than folk in that space were interested in developing.

However, merely having an interest in this kind of technology again finally allowed me to dream of Open Source funding, and this brought me to the European Commission's Next Generation Internet initiative, which funds all manner of R&D in future Internet technologies. I applied for and received a grant via NLnet Foundation, and started fleshing out the ideas I had accumulated over the years. Another grant from the Internet Society Foundation helped me explore the overall architecture in more detail.

It had become apparent that there were significant shortcomings in the web stack regarding human rights issues, and I wanted to address them. So I started analyzing HTTP and its REST architecture to understand if there was a design flaw that caused these problems. I further looked into other proposed architectures that had evolved out of peer-to-peer file sharing over the years, or were aimed at very specific issues such as deep space communications, always with two questions in mind: what are they doing to avoid the issues with the web? Are they built in a way that makes live video broadcast feasible?

The questions and their answers have dramatically shifted my understanding of what needs to be built since I sent that first grant application. But as a result, there is now also a consistent picture of an architecture we should be building, and that's what we're continuing to work on.

I say “we” now, because although I started as a single developer, over the years — always depending on funding status — I have hired other engineers. I am also continually seeking collaboration with similarly focused projects, such as Librecast (also NGI funded). I’ve also had the benefit of having domain experts review the work of the project, and bring their experience to bear. As a result, I have high confidence in the direction we’re going.

But R&D funding is always project based, and implementing this architecture exceeds the constraints of a single project’s funding; it requires a more continuous revenue stream and stable team. For that reason, I have also founded a non-profit organization dedicated to the pursuit of developing this and related technologies. I can collect donations, hire permanent staff, and so forth, all with that aim.

What key problems with the current web architecture motivated you to pursue a more human-centric, decentralized approach?

The practical starting point to that question is the recently published RFC 9620. It outlines various scenarios and mitigation approaches that engineers should consider when they design networking protocols in order to minimize harmful effects on the people using them.

The key problems with the web are quite easy to understand, but as one builds on the other, it's best to discuss them in order.

We tend to focus on domain names on the web, with the assumption that one domain serves a single web site or app. But the fundamental addressable unit on the web is neither of those; it is a resource. We address it with a URL, which does contain a domain name, but also a path, and the combination of both indicates a specific resource.

Resources are what we act upon. The HTTP protocol defines actions (methods) for retrieving a resource, deleting one, and creating one. This effectively implements three out of four basic functions in the Create/Read/Update/Delete (CRUD) model of storage management. It's not a perfect mapping, but good enough for our discussion.

And here we see that there is no equivalent to the Update function in HTTP.

That’s not entirely truthful, I have to admit. The POST method does have similar semantics to the Update function. But what payload one should transmit via this method is undefined.

This omission exists with the other methods as well, to be fair. But to PUT a resource creates or overwrites data with client-provided values. To GET a resource returns whatever was previously PUT there. To delete a resource, well… there is no semantic ambiguity in any of those actions. By contrast, the POST method is maximally ambiguous.

The first and immediate effect of this is that there is no way to point your client application at a different server. Your client must send exactly the parameters in a POST request that the server expects. This effectively captures the resource: perhaps three of the CRUD functions would be portable to other servers, but not the fourth.
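
As a rough illustration of that asymmetry (the URL and payload below are hypothetical, and this is plain Python with the requests library, not Interpeer code), consider how it looks from a client's point of view:

```python
# A hedged sketch: the URL and payload are hypothetical; it only illustrates
# the asymmetry between the HTTP verbs, nothing more.
import requests

resource = "https://server-a.example/notes/42"

# PUT, GET and DELETE have well-defined semantics: any compliant server will
# store, return, or remove exactly the representation we supply or name.
requests.put(resource, data=b"hello world")
body = requests.get(resource).content
requests.delete(resource)

# POST, the closest thing to "Update", has no defined payload. What this
# request must contain is decided entirely by server-a's application code,
# so the very same client call cannot simply be pointed at server-b.example.
requests.post(resource, json={"op": "append", "text": "!"})
```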

Due to this form of capture and the focus on domains/services, it is the more obvious choice to authenticate users for the entire domain rather than individual resources. At that point, the choice of server not only captures the resource’s data, but also the user.

Again, this is primarily because the POST method's payload is server-defined. In HTTP, this effective restriction is turned into an architectural principle described by REST, which stands for REpresentational State Transfer. The key point here is that a resource is never guaranteed to be transmitted as-is; we merely transmit representations of it. To be fair, the goal here was flexibility of implementation, not undermining data sovereignty. But that's still what we get.

The second effect is that when one wishes to use the web for human communication and collaboration needs, it is not possible to communicate with other humans using a different server or service. That is, unless developers deliberately spend effort on breaking out of those silos. The Fediverse approach to managing this across multiple disparate, federated services fills me with hope in this respect. But the slow progress in developing this approach beyond the most basic functions is also due to the fact that the concept of federating actions is not a great fit with the underlying, server oriented stratum.

A third effect, which is independent of the second, is that if representations of resources must be readable by the service, there is also no way to protect human-to-human communications from observation unless the service explicitly implements this. There is no built-in protection of the human right to privacy of communications.

Between this effect and the second, we have the basis for the growth of the platforms we have observed in the past decades, and the surveillance capitalism that is made possible by the sheer amount of data platforms gather. This has also manoeuvred us into a position where regulating the tech giants that own these platforms is extremely difficult, examples such as the Twitter/X ban in Brazil notwithstanding. We can call this the fourth effect, if you’d like.

The summary of this is that a minor technical cause — deliberately leaving something undefined for flexibility — has turned the web from a fantastic means for connecting people across the globe into a human rights nightmare that can actively endanger lives. After all, if a server operator is (theoretically) privy to private conversations, so is a malicious regime that the server operator is beholden to, and all that can imply.

I do not wish to come across here as overly critical of the giants on whose shoulders we all stand. For one thing, I think that the web has accomplished amazing things for the first part of its existence! Even if I only take my own experience as a measure, it has put me in contact with peers all over the world without whom I would never have reached where I am now.

And I think it’s also often overlooked that the creators of the web did not want to stop there, but were planning to add some notion of data sovereignty to the stack. An indication of this is the definition of the URI alongside the URL. URIs are identifiers for resources that were meant to be dynamically resolved to a locator. Unfortunately, such a resolution mechanism never emerged, with the result that developers today sometimes use the term URI, but always assume it can be used to access a resource directly.

To address the last part of your question, I don’t think I can say that the decentralized approach — though I prefer the term “distributed” here — is deliberately chosen. It’s more something that emerges both from my background, and the data sovereignty issue that sits at the center of the web’s problems. Solving data sovereignty and privacy concerns implies that servers have the two main functions of storing and making available data that is otherwise opaque to them; it is anti-REST. In that sense, they are indistinguishable from a node in a peer-to-peer network.

The upside to approaching the architecture from a peer-to-peer point of view is that it also allows for optimizations. If you’re on the couch at home with your partner and send them a meme… does that communication really need to exit your home network via your ISP to the cloud, only to make the return trip back? Probably not. Removing the special role of servers in the architecture makes space for finding smarter solutions, which can also enhance privacy, such as in this example.

The Interpeer Project is focused on long-term R&D. Can you share insights into the major technical challenges your team is tackling?

I’ve outlined many of the problems and concerns we’re trying to address together. I also mentioned how the project evolved from merely looking at peer-to-peer video broadcast, towards the scope it currently has. Aside from some technological innovations, I think the major result of the work to date is to have gone through the analysis and comparison work, and to arrive at a coherent architecture for the stack as a whole.

The major problem now is to validate that architecture, which requires implementing the layers that are currently still missing. There exist prototypes, notes on lessons learned, outlines of functionality required, and so forth — but not every layer has a fully working implementation yet. As people who write software will appreciate, it is often in the writing that you discover which of your concepts are wrong. So this work goes hand in hand with learning, and adapting those concepts. It must eventually result in specifications for compatible implementations. The specification processes of the various standards bodies are involved enough that going through them provides its own set of challenges.

There is another point to consider: with the Internet, or the web stack that builds upon it, folk are used to the fairly simple seven layers of the ISO/OSI model in which to locate certain functionality. The fundamental notion is that layers should be exchangeable; it's the interface between them that remains stable. But the interface in the ISO/OSI model is very simple: the most complex parts are concerned with chunking up streams of bytes into individual messages and re-assembling them.

In our architecture, we treat group communications around resources as the core component. This is a generalization of the web’s approach, by the way, to effectively treat a resource as the anchor point for human communications. We more formally define the concept of a resource as the focal point for communications of a group of participants.

Group communications are very different from the point-to-point communications we see most of the time. One of several key questions is: how do you secure this? That question not only means that group-based key exchange for encrypting content is required, but that we also have to face a more abstract question: how is group membership managed? There are plenty of easy answers to choose from here, but in order to make the stack stand the test of time, we have to focus on the common primitives underlying all those approaches.

If we have these primitives, at which layer are they enforced? At the top of the stack, humans need to make decisions about group membership. But are we going to pass around data through the entire stack and wait for people to decide whether it is acceptable? It would be highly inefficient! So we need to pass those decisions from the top of the stack down as far as is necessary to make efficient choices, prevent denial-of-service scenarios, etc.
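
To sketch what passing such decisions down could look like (the names and interfaces below are purely illustrative, not Interpeer APIs), imagine the application handing a membership policy to a lower layer once, so that traffic from non-members is dropped before any expensive work happens:

```python
# Hypothetical sketch: an application-level membership decision is pushed
# down the stack so a lower layer can enforce it cheaply. Illustrative only.
from typing import Protocol


class MembershipPolicy(Protocol):
    def is_member(self, peer_id: bytes) -> bool: ...


class AllowList:
    """Top of the stack: humans decide who belongs to the group."""

    def __init__(self, members: set[bytes]) -> None:
        self.members = members

    def is_member(self, peer_id: bytes) -> bool:
        return peer_id in self.members


class TransportLayer:
    """Lower layer: enforces the decision before doing any expensive work."""

    def __init__(self, policy: MembershipPolicy) -> None:
        self.policy = policy

    def on_packet(self, peer_id: bytes, payload: bytes) -> None:
        if not self.policy.is_member(peer_id):
            return  # dropped early: no decryption, no upcall, no DoS amplification
        print("delivered", len(payload), "bytes from", peer_id.decode())


transport = TransportLayer(AllowList({b"alice", b"bob"}))
transport.on_packet(b"alice", b"hello")    # delivered
transport.on_packet(b"mallory", b"spam")   # silently dropped at the lower layer
```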

The result of this is that the simple interface between layers as we know it from ISO/OSI is not going to cut it. All implementation questions aside, the most important aspect of this work is to define those interfaces well. That also means not overloading developers with choices, but allowing them to onboard gradually. That alone is an immense task!

The focus on long-term R&D is not because it’s impossible to move faster. It’s because in order to get all of this right, we need to engage with communities of practitioners that have entirely different concerns in the present. It will take time, because human consensus building always does.

Peer-to-peer and conflict-free replicated data types (CRDTs) are part of your solutions. Could you explain their importance in creating secure, decentralized communication?

I hinted above that I prefer "distributed" to "decentralized" to describe the architecture, and that goes straight back to the memorandum on distributed networks that Paul Baran authored in 1964 for the RAND Corporation. It formed the foundation for IP networking, and so the modern Internet.

In it, he distinguishes between different architectures, starting with centralized approaches in which nodes effectively form a star pattern, all connecting to some center. In a decentralized approach, such as the web, nodes connect to one of many such centers, which themselves are connected. A galaxy of stars, if you wish. Finally he describes the distributed approach, which is more of a mesh in which all nodes connect to more than one other node. The argument he makes is that in this way, the loss of a single node permits other nodes to route around the failure.
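
As a toy illustration of that argument (nothing more than a breadth-first search over a made-up mesh, not any actual protocol), removing a node simply forces traffic onto another path:

```python
# Toy mesh in which every inner node connects to more than one neighbour.
from collections import deque

mesh = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}


def route(graph, src, dst, failed=frozenset()):
    """Breadth-first search that simply ignores failed nodes."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in graph[node] - failed:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route left


print(route(mesh, "A", "E"))                # e.g. ['A', 'B', 'D', 'E']
print(route(mesh, "A", "E", failed={"B"}))  # routes around B: ['A', 'C', 'D', 'E']
```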

Baran at the time was concerned with e.g. scenarios of war, in which communications nodes may be wiped out by missile attacks. But in the development of the Internet Protocol, this thinking was generalized to include any kind of failure, even temporary ones. And at this IP level at least, the Internet is still mostly following this distributed principle.

I said before that the peer-to-peer approach is in some sense one that emerged. But if you go back to the roots of the Internet, it is also easily observable that its resilience stems from a distributed design. I think that resilience must remain a goal also for the future, even if scenarios have shifted from warfare to more mundane questions of whether or not you have connectivity in a train tunnel. Well, that is the privileged Western view, at least. In other parts of the world, resilient communications are still very much a question of survival. We should never forget that.

In summary, there are plenty of reasons pointing towards a peer-to-peer approach.

The use of CRDTs relates to this point of view, but the question one might ask is a little different: assuming I am cut off from the general Internet, how much can I still do? This question has led a number of people in the direction of so-called “offline first” technologies or application designs (they also go by other names). CRDTs provide a bridge between software that is designed to be used offline only, and software that is designed to be used always online. They treat changes to data as update logs, if you will. Even if logs are transmitted with immense delay or intermittency in communications, a consistent view on both communication ends can be established eventually by replaying them.

Furthermore, CRDTs take into account that multiple authors might create changes in parallel that have to be merged. This fits very naturally with the group communications model around resources: the communication can take the form of sending updates to a shared resource to the other group members.
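
A minimal sketch in Python, using a grow-only set as about the simplest CRDT there is (illustrative only, not our actual data model), shows why replicas converge no matter when or in which order updates arrive:

```python
# Minimal state-based CRDT sketch: a grow-only set (G-Set). Not Interpeer's
# data model, just the smallest example of conflict-free merging.
class GSet:
    def __init__(self):
        self.items = set()

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        # Union is commutative, associative and idempotent, so the order
        # (and repetition) of merges never matters.
        self.items |= other.items


# Two authors edit a shared resource while disconnected from each other.
alice, bob = GSet(), GSet()
alice.add("paragraph-1")
bob.add("paragraph-2")

# Once connectivity returns, merging in any order yields the same state.
alice.merge(bob)
bob.merge(alice)
assert alice.items == bob.items == {"paragraph-1", "paragraph-2"}
```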

But we don’t really mandate the use of CRDTs in the stack. It is the layer closest to the application, and it is easy to get started with this when your mental model is concerned with structured data. In other cases, it may be more effective to omit this layer, for example when the group is mostly about broadcasting from a single source to multiple recipients.

To return to your question, the upshot of both approaches is resiliency in the face of disruptions in communication. The solution each provides is complementary to the other, however.

How does the decentralization model proposed by Interpeer address privacy concerns more effectively than current centralized web systems?

I think for the most part I already replied to this, but it bears repeating: on its own, the web cannot provide privacy because communications are always observable by the server operator by design.

The approach we’re taking is to cut out the server’s observation options, in several complementary ways: for one, we design the protocols so that there is no need for servers to observe anything, by (effectively) including the Update part of the CRUD family.

For another, we treat all communications as fundamentally end-to-end encrypted. It’s not clear that this must always be enabled, but it must be the default from which users may choose to opt out.
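
As an illustration of that principle (using the PyNaCl bindings to libsodium rather than our own protocol), end-to-end encryption simply means that any node in between only ever relays ciphertext:

```python
# Illustration only: end-to-end encryption with PyNaCl (libsodium bindings),
# not the Interpeer protocol itself.
from nacl.public import Box, PrivateKey

# Each participant holds their own key pair; only public keys are exchanged.
alice_key = PrivateKey.generate()
bob_key = PrivateKey.generate()

# Alice encrypts directly for Bob; a relaying node sees only ciphertext.
sending_box = Box(alice_key, bob_key.public_key)
ciphertext = sending_box.encrypt(b"meet at the usual place")

# Bob decrypts with his private key and Alice's public key.
receiving_box = Box(bob_key, alice_key.public_key)
assert receiving_box.decrypt(ciphertext) == b"meet at the usual place"
```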

The effect of both is that we also remove dependence on any particular node, which enables such things as the above mentioned optimizations for local networks. But it also allows routing around censorship, when a single node is compromised. Other routing approaches could e.g. leverage the Tor network, or implement similar functions in other layers of the stack.

There is a lot of room for what I consider “fun research questions” here. But it all starts with removing power from server operators back to the humans using the tech.

What practical use cases do you foresee emerging from the human-centric internet model you’re developing?

That question is one I am asked occasionally, and one I struggle to answer — not because there is no answer, but because the correct and complete answer is "any".

Of course, that reply rarely satisfies the person asking.

In a sense, this is why I started answering your first question with a rather elaborate dive into history. I started with a focus on high bandwidth, low latency use cases such as live video broadcast. I included high bandwidth, ultra high latency use cases such as deep space communications — consider that a signal round trip to Mars is measured in tens of minutes. Between them, there are also low bandwidth use cases, such as remote locations connected by a LoRaWAN link, where latency or intermittency may depend on power derived from solar panels, i.e. the time of day and weather conditions.

I deliberately do not speak about use cases here, but classes of constraints on use cases, if you’d like.

Because there is only a single use case, which comes in infinite flavours: allow humans to safely communicate and collaborate, in whichever way they wish, within all combinations of the above constraints as best as those constraints allow.

That’s what we mean by human centric: to serve individual human beings with a high variety of needs and resources.

What does success look like for the Interpeer Project in 10 years? How do you foresee the internet changing?

Let's start with the second part first: I foresee the Internet changing mostly for the worse before it gets better. We're currently at the beginning of a phase in which spicy autocomplete (so-called artificial "intelligence") floods our distributed knowledge base with garbage. Many of the web's resources have become less useful faster than they originally improved during the web's rise.

Meanwhile, the AI hype in particular has led to an explosion in data center growth. With so much compute and storage so well connected to wherever end users' needs may be expressed, there is no urgent business need to work on more resilient options.

One fairly obvious development is going to be that the solutions developed in the Global North will be increasingly less applicable in the Global South. That is because growth in compute resources and network capabilities there is slower, and not as well regulated as here, with the effect that you can already see the emergence of competing ecosystems within the same region.

For example, default routing between one part of a single city and another may cross continental divides only to make it back, merely because the operators are not peering anywhere near that city. Geopolitical instability can surely contribute to this.

There is not much business incentive to bridge this divide. Instead, it appears as if global tech players use it to carve out isolated fiefdoms. The idea of the Internet as a network-of-networks is not as present as we might like.

There is a gold rush underway, and gold rushes rarely end well for the majority of people.

Now I'm writing this on a night when Storm Kirk sweeps over France and heads into central Europe. Hurricane Milton is heading for Florida, mere weeks after Helene wreaked havoc there. And more storms are forming.

I'm particularly sensitive to France and Miami here, because some of my peers are monitoring data centres there, wondering whether services they host in those locations will remain unaffected.

Again, the Global South tends to be the worst affected by those climate change events. But we’re already seeing their impact on the parts of the Internet in the Global North. We can be all but certain that intermittency of Internet services is going to increase around the globe, even if only in one area or another at a time.

So my predictions for the Internet are that the Global North/South divide will grow, and that disruptions in parts of the Internet will increase. And that is not even counting deliberate disruption as a form of warfare.

It seems like the right time to focus on a more resilient approach for a potential future Internet stack.

With regards to the project itself, projection is difficult. But let’s speculate a little: from an implementation perspective, 2-3 years with a small but stable team of engineers would be enough to test our assumptions and provide a stack that generally does what it promises.

The slower part will be to engage the wider Internet community in contributing their perspectives.

Spinning this business-like thinking further, a key point to that would be to find a narrower answer to your question about use cases: focus on engaging with a specific community with their specific issues, and demonstrate the value of the stack there.

This creates a split focus for the team: it would be too early to stop the R&D part, but it will be necessary to provide support to onboard interested parties to our stack. In this perspective, we would require an additional business development team as well as a customer support team, on top of the original R&D folk.

After that first community is content, it would then be time to build upon this success and expand into adjacent communities, until word of mouth can take on the bulk of the marketing effort.

The reason I went through this just now is that startups with significantly simpler technologies or products can possibly reach those adjacent communities within a ten-year time frame, provided there are no major pivots required.

But we’re not looking at some app or another, but at a wholly different technology stack.

I think that you can’t apply this measure of success here. You cannot ask how much of the dominant tech stack is replaced with ours in ten years time. I suspect the answer to that would be “a fractional amount”.

But do we need to do this? Is hyper-scaling really the only way to think about network technology? We’re trying to make the Internet better for humans. Any amount of improvement in the part of human existence that depends on the Internet will be a resounding success! If we can get resilience and disruption tolerance to places that are hit worst by the climate crisis, that’d be fantastic! If we can help people keep their communications channels in conflict zones, it’s a win! If we can help people keep their conversations away from surveillance capitalist exploitation, we can celebrate!

I'm not saying this because I want to aim low, quite the contrary. You cannot be aiming low if your vision is to literally change the (Internet) world. But the goal isn't tech for tech's sake, or scale for scale's sake. It's always about people.

In ten years time, I wish to have made it possible for people to have safer, more resilient collaboration and communications on the Internet, powered by our stack.


About the Author

Shauli Zacks is a content editor at SafetyDetectives.

He has worked in the tech industry for over a decade as a writer and journalist. Shauli has interviewed executives from more than 350 companies to hear their stories, advice, and insights on industry trends. As a writer, he has conducted in-depth reviews and comparisons of VPNs, antivirus software, and parental control apps, offering advice both online and offline on which apps are best based on users' needs.

Shauli began his career as a journalist for his college newspaper, breaking stories about sports and campus news. After a brief stint in the online gaming industry, he joined a high-tech company and discovered his passion for online security. Leveraging his journalistic training, he researched not only his company’s software but also its competitors, gaining a unique perspective on what truly sets products apart.

He joined SafetyDetectives during the COVID years, finding that it allows him to combine his professional passions without being confined to focusing on a single product. This role provides him with the flexibility and freedom he craves, while helping others stay safe online.
