## OverView

Symbol | N/A |
---|---|

Concept |
A MASSIVE AMOUNT OF STORAGE SITS UNUSED IN DATA CENTERS AND HARD DRIVES AROUND THE WORLD. |

## Whitepaper

#### Abstract

The internet is in the middle of a revolution: centralized proprietary services are being replaced with
decentralized open ones; trusted parties replaced with verifiable computation; brittle location addresses
replaced with resilient content addresses; inefficient monolithic services replaced with peer-to-peer algorithmic
markets. Bitcoin, Ethereum, and other blockchain networks have proven the utility of decentralized
transaction ledgers. These public ledgers process sophisticated smart contract applications and
transact crypto-assets worth tens of billions of dollars. These systems are the first instances of internetwide
Open Services, where participants form a decentralized network providing useful services for pay,
with no central management or trusted parties. IPFS has proven the utility of content-addressing by
decentralizing the web itself, serving billions of files used across a global peer-to-peer network. It liberates
data from silos, survives network partitions, works offline, routes around censorship, and gives
permanence to digital information.

Filecoin is a decentralized storage network that turns cloud storage into an algorithmic market. The
market runs on a blockchain with a native protocol token (also called “Filecoin”), which miners earn
by providing storage to clients. Conversely, clients spend Filecoin hiring miners to store or distribute
data. As with Bitcoin, Filecoin miners compete to mine blocks with sizable rewards, but Filecoin mining
power is proportional to active storage, which directly provides a useful service to clients (unlike Bitcoin
mining, whose usefulness is limited to maintaining blockchain consensus). This creates a powerful incentive
for miners to amass as much storage as they can, and rent it out to clients. The protocol weaves
these amassed resources into a self-healing storage network that anybody in the world can rely on. The
network achieves robustness by replicating and dispersing content, while automatically detecting and
repairing replica failures. Clients can select replication parameters to protect against different threat
models. The protocol’s cloud storage network also provides security, as content is encrypted end-to-end
at the client, while storage providers do not have access to decryption keys. Filecoin works as an incentive
layer on top of IPFS [1], which can provide storage infrastructure for any data. It is especially useful
for decentralizing data, building and running distributed applications, and implementing smart contracts.

This work:
(a) Introduces the Filecoin Network, gives an overview of the protocol, and walks through several
components in detail.

(b) Formalizes decentralized storage network (DSN) schemes and their properties, then constructs Filecoin
as a DSN.

(c) Introduces a novel class of proof-of-storage schemes called proof-of-replication, which allows proving
that any replica of data is stored in physically independent storage.
(d) Introduces a novel useful-work consensus based on sequential proofs-of-replication and storage as a
measure of power.

(e) Formalizes verifiable markets and constructs two markets, a Storage Market and a Retrieval Market,
which govern how data is written to and read from Filecoin, respectively.

(f) Discusses use cases, connections to other systems, and how to use the protocol.

Note: Filecoin is a work in progress. Active research is under way, and new versions of this paper will
appear at https://filecoin.io. For comments and suggestions, contact us at research@filecoin.io.

#### 1 Introduction

Filecoin is a protocol token whose blockchain runs on a novel proof, called Proof-of-Spacetime, where blocks
are created by miners that are storing data. Filecoin protocol provides a data storage and retrieval service
via a network of independent storage providers that does not rely on a single coordinator, where: (1) clients
pay to store and retrieve data, (2) Storage Miners earn tokens by offering storage (3) Retrieval Miners earn
tokens by serving data.

1.1 Elementary Components

The Filecoin protocol builds upon four novel components.
1. Decentralized Storage Network (DSN): We provide an abstraction for network of independent
storage providers to offer storage and retrieval services (in Section 2). Later, we present the Filecoin
protocol as an incentivized, auditable and verifiable DSN construction (in Section 4).

2. Novel Proofs-of-Storage: We present two novel Proofs-of-Storage (in Section 3): (1) Proof-ofReplication
allows storage providers to prove that data has been replicated to its own uniquely dedicated
physical storage. Enforcing unique physical copies enables a verifier to check that a prover is not
deduplicating multiple copies of the data into the same storage space; (2) Proof-of-Spacetime allows
storage providers to prove they have stored some data throughout a specified amount of time.

3. Verifiable Markets: We model storage requests and retrieval requests as orders in two decentralized
verifiable markets operated by the Filecoin network (in Section 5). Verifiable markets ensure that
payments are performed when a service has been correctly provided. We present the Storage Market
and the Retrieval Market where miners and clients can respectively submit storage and retrieval orders.

4. Useful Proof-of-Work: We show how to construct a useful Proof-of-Work based on Proof-ofSpacetime
that can be used in consensus protocols. Miners do not need to spend wasteful computation
to mine blocks, but instead must store data in the network.

#### 1.2 Protocol Overview

• The Filecoin protocol is a Decentralized Storage Network construction built on a blockchain and with
a native token. Clients spend tokens for storing and retrieving data and miners earn tokens by storing
and serving data.

• The Filecoin DSN handle storage and retrieval requests respectively via two verifiable markets: the
Storage Market and the Retrieval Market. Clients and miners set the prices for the services requested
and offered and submit their orders to the markets.

• The markets are operated by the Filecoin network which employs Proof-of-Spacetime and Proof-ofReplication
to guarantee that miners have correctly stored the data they committed to store.

• Finally, miners can participate in the creations of new blocks for the underlining blockchain. The
influence of a miner over the next block is proportional to the amount of their storage currently in use
in the network.

A sketch of the Filecoin protocol, using nomenclature defined later within the paper, is shown in Figure 1
accompanied with an illustration in Figure 2.

#### 1.3 Paper organization

The remainder of this paper is organized as follows. We present our definition of and requirements for a
theoretical DSNscheme in Section 2. In Section 3 we motivate, define, and present our Proof-of-Replication
and Proof-of-Spacetime protocols, used within Filecoin to cryptographically verify that data is continuously

stored in accordance with deals made. Section 4 describes the concrete instantiation of the Filecoin DSN,
describing data structures, protocols, and the interactions between participants. Section 5 defines and describes
the concept of Verifiable Markets, as well as their implementations, the Storage Market and Retrieval
Market. Section 6 motivates and describes the use of the Proof-of-Spacetime protocol for demonstrating and
evaluating a miner’s contribution to the network, which is necessary to extend the blockchain and assign
the block reward. Section 7 provides a brief description of Smart Contracts within the Filecoin We conclude
with a discussion of future work in Section 8

#### 2 Definition of a Decentralized Storage Network

We introduce the notion of a Decentralized Storage Network (DSN) scheme. DSNs aggregate storage offered
by multiple independent storage providers and self-coordinate to provide data storage and data retrieval to
clients. Coordination is decentralized and does not require trusted parties: the secure operation of theses
systems is achieved through protocols that coordinate and verify operations carried out by individual parties.
DSNs can employ different strategies for coordination, including Byzantine Agreement, gossip protocols, or
CRDTs, depending on the requirements of the system. Later, in Section 4, we provide a construction for
the Filecoin DSN.

Definition 2.1. A DSN scheme Π is a tuple of protocols run by storage providers and clients:

(Put, Get, Manage)

• Put(data) → key: Clients execute the Put protocol to store data under a unique identifier key.

• Get(key) → data: Clients execute the Get protocol to retrieve data that is currently stored using key.

• Manage(): The network of participants coordinates via the Manage protocol to: control the available
storage, audit the service offered by providers and repair possible faults. The Manage protocol is run
by storage providers often in conjunction with clients or a network of auditors1
.

A DSN scheme Π must guarantee data integrity and retrievability as well as tolerate management and storage
faults defined in the following sections.

2.1 Fault tolerance

2.1.1 Management faults

We define management faults to be byzantine faults caused by participants in the Manage protocol. A DSN
scheme relies on the fault tolerance of its underlining Manage protocol. Violations on the faults tolerance
assumptions for management faults can compromise liveness and safety of the system.
For example, consider a DSN scheme Π, where the Manage protocol requires Byzantine Agreement (BA)
to audit storage providers. In such protocol, the network receives proofs of storage from storage providers
and runs BA to agree on the validity of these proofs. If the BA tolerates up to f faults out of n total
nodes, then our DSN can tolerate f < n/2 faulty nodes. On violations of these assumptions, audits can be
compromised.

2.1.2 Storage faults
We define storage faults to be byzantine faults that prevent clients from retrieving the data: i.e. Storage
Miners lose their pieces, Retrieval Miners stop serving pieces. A successful Put execution is (f, m)-tolerant
if it results in its input data being stored in m independent storage providers (out of n total) and it can
tolerate up to f byzantine providers. The parameters f and m depend on protocol implementation; protocol
designers can fix f and m or leave the choice to the user, extending Put(data) into Put(data, f, m). A Get
execution on stored data is successful if there are fewer than f faulty storage providers.

For example, consider a simple scheme, where the Put protocol is designed such that each storage provider
stores all of the data. In this scheme m = n and f = m − 1. Is it always f = m − 1? No, some schemes can
be designed using erasure coding, where each storage providers store a special portion of the data, such that
x out of m storage providers are required to retrieve the data; in this case f = m − x.

2.2 Properties

We describe the two required properties for a DSN scheme and then present additional properties required
by the Filecoin DSN.
In the case where the Manage protocol relies on a blockchain, we consider the miners as auditors, since they verify and
coordinate storage providers
2.2.1 Data Integrity

This property requires that no bounded adversary A can convince clients to accept altered or falsified data
at the end of a Get execution.
Definition 2.2. A DSN scheme Π provides data integrity if: for any successful Put execution for some data
d under key k, there is no computationally-bounded adversary A that can convince a client to accept d
0
, for
d
0 6= d at the end of a Get execution for identifier k.

2.2.2 Retrievability

This property captures the requirement that, given our fault-tolerance assumptions of Π, if some data has
been successfully stored in Π and storage providers continue to follow the protocol, then clients can eventually
retrieve the data.
Definition 2.3. A DSN scheme Π provides retrievability if: for any successful Put execution for data under
key, there exists a successful Get execution for key for which a client retrieves data.

2.2.3 Other Properties

DSNs can provide other properties specific to their application. We define three key properties required by
the Filecoin DSN: public verifiability, auditability, and incentive-compatibility.
Definition 2.4. A DSN scheme Π is publicly verifiable if: for each successful Put, the network of storage
providers can generate a proof that the data is currently being stored. The Proof-of-Storage must convince
any efficient verifier, which only knows key and does not have access to data.
Definition 2.5. A DSN scheme Π is auditable, if it generates a verifiable trace of operation that can be
checked in the future to confirm storage was indeed stored for the right duration of time.
Definition 2.6. A DSN scheme Π is incentive-compatible, if: storage providers are rewarded for successfully
offering storage and retrieval service, or penalized for misbehaving, such that the storage providers’ dominant
strategy is to store data.

#### 3 Proof-of-Replication and Proof-of-Spacetime

In the Filecoin protocol, storage providers must convince their clients that they stored the data they were
paid to store; in practice, storage providers will generate Proofs-of-Storage (PoS) that the blockchain network
(or the clients themselves) verifies.

In this section we motivate, present and outline implementations for the Proof-of-Replication (PoRep) and
Proof-of-Spacetime (PoSt) schemes used in Filecoin.

3.1 Motivation

Proofs-of-Storage (PoS) schemes such as Provable Data Possession (PDP) [2] and Proof-of-Retrievability
(PoR) [3, 4] schemes allow a user (i.e. the verifier V) who outsources data D to a server (i.e. the prover P) to
repeatedly check if the server is still storing D. The user can verify the integrity of the data outsourced to a
server in a very efficient way, more efficiently than downloading the data. The server generates probabilistic
proofs of possession by sampling a random set of blocks and transmits a small constant amount of data in
a challenge/response protocol with the user.

PDP and PoR schemes only guarantee that a prover had possession of some data at the time of the challenge/response.
In Filecoin, we need stronger guarantees to prevent three types of attacks that malicious
miners could exploit to get rewarded for storage they are not providing: Sybil attack, outsourcing attacks,
generation attacks.

• Sybil Attacks: Malicious miners could pretend to store (and get paid for) more copies than the ones
physically stored by creating multiple Sybil identities, but storing the data only once.

• Outsourcing Attacks: Malicious miners could commit to store more data than the amount they can
physically store, relying on quickly fetching data from other storage providers.

• Generation Attacks: Malicious miners could claim to be storing a large amount of data which they
are instead efficiently generating on-demand using a small program. If the program is smaller than
the purportedly stored data, this inflates the malicious miner’s likelihood of winning a block reward in
Filecoin, which is proportional to the miner’s storage currently in use.

3.2 Proof-of-Replication

Proof-of-Replication (PoRep) is a novel Proof-of-Storage which allows a server (i.e. the prover P) to convince
a user (i.e. the verifier V) that some data D has been replicated to its own uniquely dedicated physical
storage. Our scheme is an interactive protocol, where the prover P: (a) commits to store n distinct replicas
(physically independent copies) of some data D, and then (b) convinces the verifier V, that P is indeed
storing each of the replicas via a challenge/response protocol. To the best of our knowledge, PoRep improves
on PoR and PDP schemes, preventing Sybil Attacks, Outsourcing Attacks, and Generation Attacks.
Note. For a formal definition, a description of its properties, and an in-depth study of Proof-of-Replication,
we refer the reader to [5].

Definition 3.1. (Proof-of-Replication) A PoRep scheme enables an efficient prover P to convince a verifier
V that P is storing a replica R, a physical independent copy of some data D, unique to P. A PoRep protocol
is characterized by a tuple of polynomial-time algorithms:
(Setup, Prove, Verify)

• PoRep.Setup(1λ
, D) → R, SP , SV , where SP and SV are scheme-specific setup variables for P and V, λ
is a security parameter. PoRep.Setup is used to generate a replica R, and give P and V the necessary
information to run PoRep.Prove and PoRep.Verify. Some schemes may require the prover or interaction
with a third party to compute PoRep.Setup.
10
• PoRep.Prove(SP , R, c) → π
c
, where c is a random challenge issued by a verifier V, and π
c
is a proof
that a prover has access to R a specific replica of D. PoRep.Prove is run by P to produce a π
c
for V.
• PoRep.Verify(SV , c, πc
) → {0, 1}, which checks whether a proof is correct. PoRep.Verify is run by V and
convinces V whether P has been storing R.
3.3 Proof-of-Spacetime
Proof-of-Storage schemes allow a user to check if a storage provider is storing the outsourced data at the time
of the challenge. How can we use PoS schemes to prove that some data was being stored throughout a period
of time? A natural answer to this question is to require the user to repeatedly (e.g. every minute) send
challenges to the storage provider. However, the communication complexity required in each interaction can
be the bottleneck in systems such as Filecoin, where storage providers are required to submit their proofs to
the blockchain network.
To address this question, we introduce a new proof, Proof-of-Spacetime, where a verifier can check if a prover
is storing her/his outsourced data for a range of time. The intuition is to require the prover to (1) generate
sequential Proofs-of-Storage (in our case Proof-of-Replication), as a way to determine time (2) recursively
compose the executions to generate a short proof.
Definition 3.2. (Proof-of-Spacetime) A PoSt scheme enables an efficient prover P to convince a verifier
V that P is storing some data D for some time t. A PoSt is characterized by a tuple of polynomial-time
algorithms:
(Setup, Prove, Verify)
• PoSt.Setup(1λ
, D) → SP , SV , where SP and SV are scheme-specific setup variables for P and V, λ is a
security parameter. PoSt.Setup is used to give P and V the necessary information to run PoSt.Prove
and PoSt.Verify. Some schemes may require the prover or interaction with a third party to compute
PoSt.Setup.
• PoSt.Prove(SP , D, c, t) → π
c
, where c is a random challenge issued by a verifier V, and π
c
is a proof
that a prover has access to D for some time t. PoSt.Prove is run by P to produce a π
c
for V.
• PoSt.Verify(SV , c, t, πc
) → {0, 1}, which checks whether a proof is correct. PoSt.Verify is run by V and
convinces V whether P has been storing D for some time t.

3.4 Practical PoRep and PoSt

We are interested in practical PoRep and PoSt constructions that can be deployed in existing systems and do
not rely on trusted parties or hardware. We give a construction for PoRep (see Seal-based Proof-of-Replication
in [5]) that requires a very slow sequential computation Seal to be performed during Setup to generate a
replica. The protocol sketches for PoRep and PoSt are presented in Figure 4 and the underlying mechanism
of the proving step in PoSt is illustrated in Figure 3.

3.4.1 Cryptographic building blocks

Collision-resistant hashing. We use a collision resistant hash function CRH : {0, 1}
∗ → {0, 1}
O(λ)
. We
also use a collision resistant hash function MerkleCRH, which divides a string in multiple parts, construct a
binary tree and recursively apply CRH and outputs the root.
zk-SNARKs. Our practical implementations of PoRep and PoSt rely on zero-knowledge Succinct Noninteractive
ARguments of Knowledge (zk-SNARKs) [6, 7, 8]. Because zk-SNARKs are succinct, proofs are
very short and easy to verify. More formally, let L be an NP language and C be a decision circuit for L.
A trusted party conducts a one-time setup phase that results in two public keys: a proving key pk and a
verification key vk. The proving key pk enables any (untrusted) prover to generate a proof π attesting that
11
x ∈ L for an instance x of her choice. The non-interactive proof π is both zero-knowledge and proof-ofknowledge.
Anyone can use the verification key vk to verify the proof π; in particular zk-SNARK proofs are
publicly verifiable: anyone can verify π, without interacting with the prover that generated π. The proof π
has constant size and can be verified in time that is linear in |x|.
A zk-SNARK for circuit satisfiability is a triple of polynomial-time algorithms
(KeyGen, Prove, Verify)
• KeyGen(1λ
, C) → (pk, vk). On input security parameter λ and a circuit C, KeyGen probabilistically
samples pk and vk. Both keys are published as public parameters and can be used to prove/verify
membership in LC .
• Prove(pk, x, w) → π. On input pk and input x and witness for the NP-statement w, the prover Prove
outputs a non-interactive proof π for the statement x ∈ LC .
• Verify(vk, x, π) → {0, 1}. On input vk, an input x, and a proof π, the verifier Verify outputs 1 if
x ∈ LC .
We refer the interested reader to [6, 7, 8] for formal presentation and implementation of zk-SNARK systems.
Generally these systems require the KeyGen operation to be run by a trusted party; novel work on Scalable
Computational Integrity and Privacy (SCIP) systems [9] shows a promising direction to avoid this initial
step, hence the above trust assumption.

3.4.2 Seal operation

The role of the Seal operation is to (1) force replicas to be physically independent copies by requiring provers
to store a pseudo-random permutation of D unique to their public key, such that committing to store n
replicas results in dedicating disk space for n independent replicas (hence n times the storage size of a
replica) and (2) to force the generation of the replica during PoRep.Setup to take substantially longer than
the time expected for responding to a challenge. For a more formal definition of the Seal operation see [5].
The above operation can be realized with Sealτ
AES−256, and τ such that Sealτ
AES−256 takes 10-100x longer than
the honest challenge-prove-verify sequence. Note that it is important to choose τ such that running Sealτ
BC
is distinguishably more expensive than running Prove with random access to R

#### References

[1] Juan Benet. IPFS - Content Addressed, Versioned, P2P File System. 2014.

[2] Giuseppe Ateniese, Randal Burns, Reza Curtmola, Joseph Herring, Lea Kissner, Zachary Peterson, and
Dawn Song. Provable data possession at untrusted stores. In Proceedings of the 14th ACM conference
on Computer and communications security, pages 598–609. Acm, 2007.

[3] Ari Juels and Burton S Kaliski Jr. Pors: Proofs of retrievability for large files. In Proceedings of the

14th ACM conference on Computer and communications security, pages 584–597. Acm, 2007.

[4] Hovav Shacham and Brent Waters. Compact proofs of retrievability. In International Conference on
the Theory and Application of Cryptology and Information Security, pages 90–107. Springer, 2008.

[5] Protocol Labs. Technical Report: Proof-of-Replication. 2017.

[6] Rosario Gennaro, Craig Gentry, Bryan Parno, and Mariana Raykova. Quadratic span programs and
succinct nizks without pcps. In Annual International Conference on the Theory and Applications of
Cryptographic Techniques, pages 626–645. Springer, 2013.

[7] Nir Bitansky, Alessandro Chiesa, and Yuval Ishai. Succinct non-interactive arguments via linear interactive
proofs. Springer, 2013.

[8] Eli Ben-Sasson, Alessandro Chiesa, Daniel Genkin, Eran Tromer, and Madars Virza. Snarks for c:
Verifying program executions succinctly and in zero knowledge. In Advances in Cryptology–CRYPTO
2013, pages 90–108. Springer, 2013.

[9] Eli Ben-Sasson, Iddo Bentov, Alessandro Chiesa, Ariel Gabizon, Daniel Genkin, Matan Hamilis,
Evgenya Pergament, Michael Riabzev, Mark Silberstein, Eran Tromer, et al. Computational integrity
with a public random string from quasi-linear pcps. In Annual International Conference on the Theory
and Applications of Cryptographic Techniques, pages 551–579. Springer, 2017.
[10] Henning Pagnia and Felix C G¨artner. On the impossibility of fair exchange without a trusted third party.
Technical report, Technical Report TUD-BS-1999-02, Darmstadt University of Technology, Department
of Computer Science, Darmstadt, Germany, 1999.

[11] Joseph Poon and Thaddeus Dryja. The bitcoin lightning network: Scalable off-chain instant payments.
2015.

[12] Andrew Miller, Iddo Bentov, Ranjit Kumaresan, and Patrick McCorry. Sprites: Payment channels that
go faster than lightning. arXiv preprint arXiv:1702.05812, 2017.

[13] Protocol Labs. Technical Report: Power Fault Tolerance. 2017.

[14] Protocol Labs. Technical Report: Expected Consensus. 2017.

[15] Iddo Bentov, Charles Lee, Alex Mizrahi, and Meni Rosenfeld. Proof of activity: Extending bitcoin’s
proof of work via proof of stake [extended abstract] y. ACM SIGMETRICS Performance Evaluation
Review, 42(3):34–37, 2014.

[16] Iddo Bentov, Rafael Pass, and Elaine Shi. Snow white: Provably secure proofs of stake. 2016.

[17] Silvio Micali. Algorand: The efficient and democratic ledger. arXiv preprint arXiv:1607.01341, 2016.

[18] Vitalik Buterin. Ethereum

[19] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008.

[20] Eli Ben Sasson, Alessandro Chiesa, Christina Garman, Matthew Green, Ian Miers, Eran Tromer, and
Madars Virza. Zerocash: Decentralized anonymous payments from bitcoin. In Security and Privacy
(SP), 2014 IEEE Symposium on, pages 459–474. IEEE, 2014.