Filecoin | As an Engineer

We go over Filecoin from a developer's perspective. We talk about some of the superpowers Filecoin has, take a look at the ecosystem, and clear up some confusion about the relationship with IPFS.

Patrick Collins
9 min read · Feb 8, 2023
What is filecoin?
Filecoin, but for devs

You can also watch our video on it here:

Introduction

Filecoin is a decentralized storage solution with contract-based data persistence. And it just dropped an absolute bomb of a release by adding smart contract functionality.

Summary of how it differs from Ethereum

Instead of being a blockchain that stores any data it’s given and grows so quickly in size that the SSD in my Intel NUC catches fire, the Filecoin chain itself only stores agreements or “deals” between storage providers and people wanting to store data, or renters.

So the amount of data you as a node operator need to store on the Filecoin chain is much less.

The below image shows what happens when you just “crank up the stats” on a blockchain and force everyone to buy 10,000 TBs of storage.

Image from YouTube Video

On Filecoin, stored data additionally has an expiration. And it’s this combination of only certain nodes having to store data and data expiration dates that allows the network to scale with a lot of data much better than traditional blockchains.

Already, we have 2 major differences between Filecoin and a more “traditional” chain like Ethereum.

  • Data has an expiration
  • Not all nodes store all data

If you tried to store the entire internet on Ethereum it would cost you all the money in the world and every node operator would immediately be out of business. And it would be impossible to get all the money in the world anyways because FTX still has a good chunk of it.

Node operators that identify as “storage providers” promise to store data, and that promise is recorded on the chain. If a provider doesn’t store the data, they are hit with a financial penalty, or slashed. You’ll read about proof-of-replication and proof-of-spacetime in the docs, which you can think of as proof of stake combined with challenging storage providers to prove they have the data.

Maybe still a little confused? Ok, let’s back up.

How much data can fit on a blockchain?

How big is the internet?

It’s estimated that as of right now, the internet has 5 million terabytes of data. An Ethereum full node, for comparison, is around 1 TB.

So if we want to move to a decentralized internet, we just need to put 5 million TBs onto Eth.

Storing data on-chain
Seems a little pricy

Looks like the estimate I get from popping all the data into remix is a little out of my budget.

Also, the ETH hard fork Spurious Dragon says a contract can only be about 24 KB (24,576 bytes of code) anyways.
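For a rough sense of where an estimate like that comes from, here is a back-of-the-envelope sketch. It assumes roughly 20,000 gas to write a fresh 32-byte storage slot, and the 30 gwei gas price is a number I picked for illustration, not something from a live network. Transaction overhead and block gas limits are ignored.

```python
# Back-of-the-envelope cost of writing raw data into Ethereum contract storage.
# Assumptions (illustrative only): ~20,000 gas per fresh 32-byte storage slot
# and a 30 gwei gas price. Transaction overhead and gas limits are ignored.

GAS_PER_SLOT = 20_000      # approximate cost of writing a new 32-byte slot
BYTES_PER_SLOT = 32
GWEI_PER_ETH = 10**9
TB = 10**12                # bytes in a (decimal) terabyte


def storage_cost_eth(num_bytes: int, gas_price_gwei: int) -> float:
    """Return the approximate ETH cost of storing num_bytes in contract storage."""
    slots = -(-num_bytes // BYTES_PER_SLOT)  # ceiling division
    total_gas = slots * GAS_PER_SLOT
    return total_gas * gas_price_gwei / GWEI_PER_ETH


one_kb = storage_cost_eth(1_000, gas_price_gwei=30)
one_tb = storage_cost_eth(TB, gas_price_gwei=30)
internet = storage_cost_eth(5_000_000 * TB, gas_price_gwei=30)  # 5 million TB

print(f"1 KB:         {one_kb:.4f} ETH")
print(f"1 TB:         {one_tb:,.0f} ETH")
print(f"5 million TB: {internet:.3e} ETH")
```

Under these assumptions 1 KB comes out to about 0.0192 ETH and a single terabyte to about 18.75 million ETH, so the whole-internet number is far more ETH than exists.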

Not only would storing all this data be crazy expensive on a traditional blockchain but:

  • Decentrality is threatened
  • Old data becomes baggage (State Bloat)

When a traditional blockchain gets more data, every other node needs to add every piece of data. So if the blockchain grows too big too fast, it can be difficult for other node operators to keep up with the expense of the size, and they drop off, meaning fewer and fewer nodes securing the network.

And if the chain keeps every piece of data that ever gets deployed, in 50 years you’ll have a blockchain where maybe only 5% of the chain is actually used and the rest is historical garbage.

State Bloat Example

So to have decentralized storage, we need a different model than what we’ve seen for Ethereum, Solana, Polygon, or really any traditional chain. Otherwise, we will just put way too much data that no one cares about on-chain, and all the node operators have to lug it around.

Filecoin

Filecoin is a decentralized storage solution. Instead of storing every single piece of data ever on-chain, its chain only stores agreements (deals) between storage providers and what I’m calling renters. They also have smart contracts now, but more on that later. The renter pays the storage provider in FIL, and the storage provider stores the data. The kicker is that not every Filecoin node operator stores the data. The renter picks the specific nodes they want to store it.

Yes, you heard that right: Filecoin is a 2 sided marketplace, with a chain in the middle to keep track of all the deals.
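To make "the chain only tracks deals" concrete, here is a toy sketch of a deal ledger. The `Deal` fields and the `DealLedger` class are my own simplified, hypothetical names. Real Filecoin deals carry more fields (collateral, labels, and so on), and this is not how Lotus stores them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deal:
    """Simplified, hypothetical deal record. The payload bytes are NOT in here."""
    client: str           # the renter paying for storage
    provider: str         # the storage provider holding the data
    piece_cid: str        # content identifier of the data, not the data itself
    piece_size: int       # size of the stored data in bytes
    start_epoch: int      # when the deal starts
    end_epoch: int        # when the deal expires
    price_per_epoch: int  # what the renter pays per epoch


class DealLedger:
    """Toy stand-in for the chain: it records deals, never the data."""

    def __init__(self) -> None:
        self.deals: list[Deal] = []

    def publish(self, deal: Deal) -> int:
        if deal.end_epoch <= deal.start_epoch:
            raise ValueError("a deal must end after it starts")
        self.deals.append(deal)
        return len(self.deals) - 1  # the deal ID

    def active_deals(self, epoch: int) -> list[Deal]:
        return [d for d in self.deals if d.start_epoch <= epoch < d.end_epoch]

    def total_cost(self, deal_id: int) -> int:
        deal = self.deals[deal_id]
        return deal.price_per_epoch * (deal.end_epoch - deal.start_epoch)


ledger = DealLedger()
deal_id = ledger.publish(
    Deal("renter-1", "provider-A", "example-cid", 32 * 2**30, 100, 1100, 5)
)
print(ledger.total_cost(deal_id))     # price for the whole deal
print(len(ledger.active_deals(500)))  # the deal is live at epoch 500
print(len(ledger.active_deals(1100))) # and gone once it expires
```

Notice that the record for 32 GiB of data is the same handful of fields as the record for 1 MiB would be. That is the whole trick: the ledger grows with the number of deals, not with the size of the data.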

Filecoin is a 2 sided marketplace
A Filecoin storage provider marketplace on plus.fil.org

So this avoids the threat against decentrality because the chain only stores deal information, and it’s easy for anyone to join the network and rent out a piece of their storage.

And it avoids the state bloat because each deal/piece of data has an expiration, so you avoid the problem of having to store data forever.

Which intuitively makes sense. You wouldn’t tell your friend “hey dude, pay me once and I’ll let you crash on my couch anytime you want for all of eternity.”
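Here is a small sketch of why expiration matters for state bloat. It compares a chain that keeps every write forever with a network where each write expires after a fixed deal length. The numbers (100 TB written per year, 2-year deals, no renewals) are made up for illustration.

```python
def append_only_size(writes_per_year: int, years: int) -> list[int]:
    """Data held at the end of each year when nothing is ever deleted."""
    total = 0
    sizes = []
    for _ in range(years):
        total += writes_per_year
        sizes.append(total)
    return sizes


def expiring_size(writes_per_year: int, years: int, deal_years: int) -> list[int]:
    """Data held at the end of each year when every write expires after deal_years."""
    sizes = []
    for year in range(1, years + 1):
        live_years = min(year, deal_years)  # only the most recent writes are still live
        sizes.append(writes_per_year * live_years)
    return sizes


forever = append_only_size(writes_per_year=100, years=50)
expiring = expiring_size(writes_per_year=100, years=50, deal_years=2)

print(f"append-only after 50 years: {forever[-1]} TB")
print(f"expiring after 50 years:    {expiring[-1]} TB")
```

With these made-up numbers the append-only chain ends at 5,000 TB while the expiring one levels off at 200 TB. Renewed deals would push the second number up, but only for data someone is still paying for.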

Now once a deal has been made, where a storage provider promises a renter to store data, how do we guarantee that the data will stay?

Proof-Of-Storage

Filecoin’s Lotus is similar to Ethereum’s Geth
Filecoin Lotus node symbol

Filecoin uses proof systems called proof-of-replication and proof-of-spacetime. At a very high level, the combination of these two means:

  1. Storage providers stake FIL as an economic incentive to behave.
  2. They are periodically (daily) challenged to provide cryptographic proof that they still have the data stored.
  3. If they can’t provide proof, they are slashed.

So in this regard, I think of the system as a “Proof of stake with data availability challenges”.

I’m glossing over the system a lot but this is my analogy at a high level.

This is known as “contract-based data persistence” for decentralized storage solutions, since the data persists due to the threat of providers getting slashed under a contract they signed. And the storage providers are paid FIL for storing the data.
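The stake, challenge, slash loop can be sketched as a toy. To be loud about it: this is NOT proof-of-replication or proof-of-spacetime, which work over sealed sectors with much heavier cryptography. It is a plain Merkle spot check that I am swapping in to show the shape of the idea, and `Provider`, `run_challenge`, and the penalty amount are all hypothetical.

```python
import hashlib
import secrets
from dataclasses import dataclass


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(chunks: list[bytes]) -> list[list[bytes]]:
    """Return every level of a Merkle tree, leaves first. Needs 2^n chunks."""
    level = [sha256(chunk) for chunk in chunks]
    levels = [level]
    while len(level) > 1:
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(chunks: list[bytes], index: int) -> tuple[bytes, list[bytes]]:
    """Provider side: return the challenged chunk plus its sibling hashes."""
    path = []
    position = index
    for level in merkle_levels(chunks)[:-1]:
        path.append(level[position ^ 1])  # sibling of the current node
        position //= 2
    return chunks[index], path


def verify(root: bytes, index: int, chunk: bytes, path: list[bytes]) -> bool:
    """Verifier side: recompute the root from one chunk and its siblings."""
    node = sha256(chunk)
    position = index
    for sibling in path:
        node = sha256(node + sibling) if position % 2 == 0 else sha256(sibling + node)
        position //= 2
    return node == root


@dataclass
class Provider:
    chunks: list[bytes]  # what the provider actually has on disk
    stake: int           # collateral that can be slashed


def run_challenge(provider: Provider, root: bytes, penalty: int) -> bool:
    """Challenge a random chunk; slash the stake if the proof does not check out."""
    index = secrets.randbelow(len(provider.chunks))
    chunk, path = prove(provider.chunks, index)
    passed = verify(root, index, chunk, path)
    if not passed:
        provider.stake = max(0, provider.stake - penalty)
    return passed


data = [bytes([i + 1]) * 64 for i in range(8)]
root = merkle_levels(data)[-1][0]  # the only thing the verifier keeps

honest = Provider(chunks=list(data), stake=100)
lazy = Provider(chunks=[b"\x00" * 64] * 8, stake=100)  # threw the data away

print(run_challenge(honest, root, penalty=10), honest.stake)
print(run_challenge(lazy, root, penalty=10), lazy.stake)
```

The honest provider passes and keeps its stake, while the lazy one fails whichever chunk gets picked and loses part of it. The verifier only ever holds a 32-byte root, which mirrors how the chain can check storage without storing the data.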

Anyways, a few notes on the kinds of data:

  1. Data isn’t encrypted but sealed. Meaning it’s not good for storing private data, and access to data is paywalled.
  2. There are size requirements — meaning you can’t just store a 1 byte file.

This was Filecoin in a nutshell, until recently when they launched the FVM, the Filecoin Virtual Machine, which allows people to not just store data, but also run smart contracts that can interact with that data. On top of the FVM they have another virtual machine called the FEVM, so all your Solidity, Vyper, and other EVM code can be deployed to the FEVM, which sends it to the FVM, which stores it on the Filecoin chain in exactly the same location the deals are stored.

The FVM

Filecoin has smart contracts
Filecoin FVM

My initial reaction when I saw this was “WOW! I’m going to deploy a smart contract that stores a ton of data on the Filecoin marketplace!”.

Then I learned you can’t do that yet.

“Oh, ok. Well, now I’ll write Solidity code that will access massive amounts of data!”

Then I learned you can’t do that yet either.

Filecoin vs traditional blockchain
The Data is stored outside the chain in Filecoin, so it’s hard for smart contracts to access

A lot of the FVM integrations for storing and retrieving data don’t work quite yet. What you can do is verify that a deal exists or check its status, since that data is stored on-chain. To be fair, this is really cool, because the FVM has all the functionality of ETH, plus the ability to verify and potentially build decentralized marketplaces around the existence of data.

Which… Sort of blew my mind.

You can follow the Filecoin docs under the smart contract section to deploy a smart contract to the Filecoin network the exact same way you would on any EVM-compatible chain.

Ok, so you’re up to speed with me so far right?

  1. Filecoin is a decentralized storage solution
  2. It uses a 2 sided marketplace with incentive models to make sure data is stored
  3. This model allows it to scale to a ton of data
  4. You can launch smart contracts on it as well

Clarifications

But let’s clarify a few things you’ll run into when going through the docs. These are some things that confused me, and I’m going to un-confuse you right now.

The Almost Economic Incentive Layer for IPFS

So you might have seen headlines like this when reading about Filecoin.

Filecoin is the incentive layer for IPFS

However, this isn’t exactly true. Storing data on a Filecoin node does nothing with IPFS.

Nope.

Nothing.

If you store data on Filecoin, you can additionally independently store it on IPFS, but it’s not the default. A project working on making data stored on Filecoin also stored on IPFS (as a true economic incentive model) is Estuary.

This also highlights how Filecoin is an L1, and how we will likely see more L2 projects like this show up.

This confusion shows up because there are a lot of mentions in the docs about how the two technologies work together, and the fact that they are both built by Protocol Labs.

What is Glif?

Glif: An Alchemy/Infura for Filecoin. They also have a few other features still in beta.

What is Saturn?

If Filecoin is cold storage, Saturn is hot storage for data. They launched more recently, and I’m excited for their future.

What is a Lotus node?

The Go implementation of a Filecoin node, similar to how Geth is the Go implementation of an Ethereum node.

Actually using it


Now, what does this actually look and feel like?

I recommend going through the tutorial in the docs if you’re interested in getting started. After doing this a few times and running a production Filecoin node for about a year myself, my biggest piece of advice to you is to be patient. A lot of the steps in getting started can take a while. So don’t get discouraged.

  1. For smart contracts, it’s just like any EVM chain. You grab an RPC URL and you’re good to go.
  2. Storing and retrieving data, though? If you’re looking to do it yourself, you’ll want to buckle up, and don’t get discouraged.
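For the smart contract side, "grab an RPC URL" looks roughly like the sketch below, using only the Python standard library. The endpoint in `RPC_URL` is an assumption on my part (a public Glif endpoint), so swap in whatever RPC URL you actually use. The helper names are mine, and nothing here touches the network until you call `fetch_chain_id()` yourself.

```python
import json
import urllib.request

# Assumption: a public Filecoin JSON-RPC endpoint. Replace with your own RPC URL.
RPC_URL = "https://api.node.glif.io/rpc/v1"


def build_request(method: str, params: list, request_id: int = 1) -> bytes:
    """Encode a JSON-RPC 2.0 request body."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return json.dumps(payload).encode("utf-8")


def parse_quantity(response_body: bytes) -> int:
    """Decode a hex-encoded quantity, such as the result of eth_chainId."""
    response = json.loads(response_body)
    if "error" in response:
        raise RuntimeError(f"RPC error: {response['error']}")
    return int(response["result"], 16)


def call(method: str, params: list, url: str = RPC_URL) -> bytes:
    """Send one JSON-RPC call and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=build_request(method, params),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def fetch_chain_id(url: str = RPC_URL) -> int:
    """Ask the node which chain it is on, same as on any EVM chain."""
    return parse_quantity(call("eth_chainId", [], url))
```

Calling `fetch_chain_id()` against a Filecoin mainnet endpoint should give you 314. The point is that `eth_chainId` is the same method you would call on Ethereum, so your existing EVM tooling should mostly just work once it has the URL.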

The Lotus node — storing data

Now when you get to the Lotus docs, you’ll be hit with the decision of running a:

  • Full node
  • Lite node
  • Archive node
  • Storage provider node (which can be some combination of the above)

For starters, just run a full node. If you want to make money storing data, run a storage node, but you’ll need to buy a LOT more hardware.

I highly recommend not running a Filecoin node in the cloud. It’s unlikely you will make more money than you will be charged.

To run a full node you’ll need at least 32 GB of RAM and (according to the docs) “enough SSD to cover a full node”. Which, of course, changes every second. As of recording, I’ve found that 6 TB is more than you need to get started, and in practice it means you won’t have to prune the chain very often.

Then, you’ll run through the docs to download and install Lotus, and you’ll start the node with the lotus daemon command.

Once your node is up, you’ll run:

lotus sync wait

To wait for the chain to be synced. In practice, if you get an error saying it couldn’t find your API, you can just ignore that and wait.
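If you would rather script the "ignore it and wait" part, here is a minimal retry wrapper. It assumes the command exits with a non-zero code when the node's API isn't reachable yet, which I haven't verified for every Lotus version. The function name and the retry numbers are mine.

```python
import subprocess
import time


def run_until_success(command: list[str], retries: int = 30, delay_seconds: float = 10.0) -> bool:
    """Re-run command until it exits with code 0 or the retries run out."""
    for attempt in range(1, retries + 1):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        print(f"attempt {attempt} failed: {result.stderr.strip()}")
        if attempt < retries:
            time.sleep(delay_seconds)  # give the node time to come up
    return False
```

You would call it as `run_until_success(["lotus", "sync", "wait"])`. It returns `True` once the command finally succeeds, and `False` if it gives up, which is your cue to go check whether the node is actually running.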

But once synced, you can import any file to your node with:

lotus client import mydata.txt

It’ll give you an output like:

Import 3, Root bafykb...
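If you are scripting this, you will want the import ID and the root CID out of that line. A small sketch, assuming the output keeps the "Import N, Root CID" shape shown above. The CID in the example is a made-up placeholder, not a real one.

```python
import re

# Matches lines shaped like: Import <number>, Root <cid>
IMPORT_LINE = re.compile(r"Import (\d+), Root (\S+)")


def parse_import_output(output: str) -> tuple[int, str]:
    """Pull the import ID and root CID out of the import command's output."""
    match = IMPORT_LINE.search(output)
    if match is None:
        raise ValueError(f"unexpected import output: {output!r}")
    return int(match.group(1)), match.group(2)


# Placeholder CID for illustration only.
import_id, root_cid = parse_import_output("Import 3, Root bafyexamplecid")
print(import_id, root_cid)
```

Hang on to the root CID. It is how you refer to your data when you set up the deal, and again later when you want to retrieve it.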

Once completed, it’s time to set up a deal. Right now, the recommended marketplace is plus.fil.org, where you can select the storage providers you’d like to work with. Come back to your command line and run:

lotus client deal

And it will take you through a step-by-step process of sending your data to those storage providers. You’ll use the miner IDs from the marketplace to select them.

I’m looking forward to when this process is a little easier to do from a decentralized point of view.

And with that, you now have the basics of the Filecoin network. Let me know if I missed anything!

Follow me for more!

Twitter: https://twitter.com/patrickalphac

Medium: https://medium.com/@patrickalphac

TikTok: https://www.tiktok.com/@patrickalphac

YouTube: https://youtube.com/@PatrickAlphaC
