Fluence is a decentralized database built on a p2p network of independent peers. It provides
the features of traditional databases, such as querying and filtering data, with the addition of total
encryption and flexible data access management. Fluence is relatively fast, scalable, fault
tolerant, and censorship resistant by design.
Fluence organizes nodes into clusters, each responsible for a particular Dataset. Each cluster keeps its own blockchain to maintain consensus about all operations and rewards. In addition, a set of nodes called Arbiters looks after each cluster to verify its operations. The storage economy is implemented with storage contracts, which all parties sign and must comply with. Rewards for nodes are defined by the contract conditions after the nodes provide proof of retrievability every time tick. For a node, there is no way to spend rewards other than by withdrawal.
The project implements end-to-end encryption for both the NoSQL database and the B-Tree indices, allowing range queries and search by key. Each request is processed by a set of nodes that have to reach consensus about the results. Additionally, we use proxy re-encryption technology to let encrypted data be shared with other parties without exposing encryption keys.
Fluence relies on IPFS's libp2p for networking. It is based on the S/Kademlia approach: every
node and every resource (a Dataset in our case) has an ID from the same ID space. We place
the database on the nodes whose IDs are closest to the Dataset ID, to achieve a more uniform
distribution and faster allocation.
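The placement rule above can be illustrated with a short sketch. The 160-bit SHA-1 IDs and the function names are assumptions for illustration (classic Kademlia conventions); the actual libp2p parameters may differ:

```python
import hashlib

def make_id(name: str) -> int:
    # Derive a 160-bit ID from an arbitrary identifier (SHA-1, as in classic Kademlia).
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def closest_nodes(dataset_id: int, node_ids: list, k: int) -> list:
    # XOR distance: the smaller the XOR of two IDs, the "closer" they are.
    return sorted(node_ids, key=lambda n: n ^ dataset_id)[:k]

nodes = [make_id(f"node-{i}") for i in range(20)]
cluster = closest_nodes(make_id("my-dataset"), nodes, k=7)
```

Because hashing spreads IDs uniformly over the ID space, picking the k XOR-closest nodes gives every Dataset a pseudo-random, evenly distributed cluster.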
The keyspace is the only shared state in the whole Fluence Network, so there is no need to keep open connections or continually exchange data between nodes. However, each Dataset forms its own subnetwork, which becomes fully connected while it operates. This subnetwork consists of Cluster Nodes, which store Dataset replicas and perform operations on them, and Arbitrage Nodes, whose role is to verify the Cluster's state.
Each node in the Fluence Network has several roles: it participates in some Clusters, storing and
managing the corresponding Datasets and Blockchains, and it arbiters some other Clusters, helping
them keep their Blockchains valid and consistent. Each node should also be ready to accept
incoming connections from Clients and perform outgoing communication.
In general, a node consists of three parts: the Network Layer (described in the previous part), the Dataset Management System, and the Blockchain.
A Dataset consists of one or more indexed columns and unindexed rows. All data is encrypted
on the client side, so the node knows nothing about its contents.
Rows are stored as raw byte arrays in a key-value store; the structure of a row's contents and the number of columns are hidden from the node.
For ordered indices, the B-Tree data structure is used. We reuse the ZeroDB approach to search through the encrypted index. The Client can apply Order Preserving Encryption when preparing indices; in this case, querying can be done without a round trip, since a node can compare encrypted values and select rows by ranges of indexed values.
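A toy order-preserving mapping makes the idea concrete. The linear function below is only a stand-in for a real OPE scheme and is not secure; it shows why a node can serve range queries over ciphertexts without any round trip to the Client:

```python
# Toy order-preserving "encryption": enc(x) = a*x + b with secret a > 0, b.
# NOT secure; real OPE schemes are far more involved. Constants are illustrative.
SECRET_A, SECRET_B = 1000003, 42

def ope_encrypt(value: int) -> int:
    return SECRET_A * value + SECRET_B

# Client side: index values are encrypted before upload.
index = sorted(ope_encrypt(v) for v in [5, 17, 3, 99, 42])

# Node side: a range query arrives already encrypted; the node just compares
# ciphertexts, never seeing the plaintext values.
lo, hi = ope_encrypt(10), ope_encrypt(50)
matches = [c for c in index if lo <= c <= hi]
```

Since `enc(x) < enc(y)` exactly when `x < y`, the node's comparisons on ciphertexts select the same rows the plaintext range `[10, 50]` would.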
The Dataset is fully replicated among all Cluster Nodes. To enable replication and recovery, all write operations on the Dataset are saved to a journal.
After a Dataset state change, Cluster Nodes must reach consensus on the new state's hash and store a new block in the Cluster Blockchain. This block contains proofs that, after performing the given writes, all nodes have Dataset files with the same hash.
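The journal-and-hash mechanism can be sketched as follows (a minimal model with assumed names; real state serialization would of course be binary and encrypted):

```python
import hashlib, json

def apply_writes(state: dict, journal: list) -> dict:
    # Replay the write journal against a replica's key-value state.
    for key, value in journal:
        state[key] = value.hex()
    return state

def state_hash(state: dict) -> str:
    # Canonical serialization so every replica hashes identical bytes.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

journal = [("row1", b"alice"), ("row2", b"bob")]
replica_a = apply_writes({}, journal)
replica_b = apply_writes({}, journal)
consensus_reached = state_hash(replica_a) == state_hash(replica_b)
```

Two replicas that applied the same journal produce the same hash, so agreeing on a single hash value is enough evidence that every node performed the writes identically.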
The Cluster Blockchain is used to safely store Dataset meta information: the contract, functional
token transactions, the proxy re-encryption key for data sharing, and witnesses of proofs of
retrievability. Blocks are built and added by Cluster Nodes and validated by Arbiters.
Each node checks a block's content for validity. Consider a Proof of Retrievability block: every time tick (1 hour by default), all Cluster Nodes must prove that they are alive and hold a copy of the latest Dataset to query over. After each tick, one or more Arbiters ask each node for a part of the data and the corresponding salted hash, to prove that the node still has the data. The Arbiter then adds the results of its check to the block and signs it with its private key.
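The salted-hash challenge can be sketched like this (function names and the chunk-selection strategy are assumptions for illustration):

```python
import hashlib, os

def por_response(chunk: bytes, salt: bytes) -> str:
    # The node proves possession by hashing the challenged chunk with a fresh
    # salt; it cannot precompute the answer without actually holding the data.
    return hashlib.sha256(salt + chunk).hexdigest()

# Arbiter side: pick a chunk and a random salt for this tick's challenge.
dataset = b"encrypted dataset bytes..." * 100
offset, length = 64, 32
salt = os.urandom(16)
expected = por_response(dataset[offset:offset + length], salt)

# Node side: answers the challenge from its own replica.
answer = por_response(dataset[offset:offset + length], salt)
node_passes = answer == expected
```

Because the salt changes every tick, a node cannot pass by caching old answers; it must keep the data itself.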
Each block carries all Cluster Node signatures and some Arbiter signatures. It contains a link to the previous block's hash and exposes its own hash as the block ID, so at any moment in time it is hard to fake. That is how Fluence creates cheap and fast internal sources of trust.
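The hash-chaining itself (leaving signatures aside) is the standard blockchain construction; a minimal sketch, with illustrative field names:

```python
import hashlib, json

def make_block(prev_hash: str, payload: dict) -> dict:
    body = {"prev": prev_hash, "payload": payload}
    # The block ID is the hash of the block body, which includes the parent link.
    body["id"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return body

genesis = make_block("0" * 64, {"contract": "storage terms"})
block1 = make_block(genesis["id"], {"por": "witnesses"})

def chain_valid(chain: list) -> bool:
    # Recompute every hash and check each link against the previous block's ID.
    for prev, cur in zip(chain, chain[1:]):
        body = {"prev": cur["prev"], "payload": cur["payload"]}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if cur["prev"] != prev["id"] or cur["id"] != recomputed:
            return False
    return True
```

Tampering with any historical payload changes that block's hash and breaks every later link, which is what makes old blocks hard to fake.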
Client API Overview
To operate with Fluence, a Client should install the Fluence service. It performs all low-level
operations and offers a simple unencrypted API as a local service.
For each Dataset, the Client API manages key pairs in a private key store. The Client should back it up in some cold, dark place.
The Client API provides a simple JSON, MongoDB-like gateway to all of the Client's data. You can think of each Dataset as a MongoDB collection.
If the data grows too big, the Client API handles Transparent Sharding: a new Dataset is allocated on another Cluster in the Fluence Network, and the Client can operate on both just as if they were located in a single place.
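A session with such a gateway might look like the sketch below. The class and method names are illustrative, not the actual Fluence client API; an in-memory stand-in is used so the example is self-contained:

```python
# Minimal in-memory stand-in for the MongoDB-like gateway; names are assumptions.
class Dataset:
    def __init__(self):
        self._rows = []

    def insert(self, doc: dict) -> None:
        self._rows.append(doc)

    def find(self, query: dict) -> list:
        # Match documents whose fields equal every key/value pair in the query,
        # in the spirit of MongoDB's find({field: value}).
        return [d for d in self._rows if all(d.get(k) == v for k, v in query.items())]

users = Dataset()
users.insert({"name": "alice", "age": 30})
users.insert({"name": "bob", "age": 25})
adults = users.find({"age": 30})
```

In the real system, the local service would transparently encrypt documents and route each call to the right Cluster (or Clusters, under Transparent Sharding).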
Functional Token I/O
To fuel the Fluence Network, we introduce the Fluence Functional Token (FFT) – an internal token
used to handle transactions between Clients and node owners. This token is not tradable and not
transferable; it is issued and burned by the Fluence Gateway in exchange for Tradable Tokens (FLU).
FFT aims to provide fast and cheap transactions within a Dataset Cluster. When FFT is issued on an external blockchain (via the Ethereum contract of our Gateway), the issuance is tracked by the corresponding Cluster, and the incoming transaction is placed on the Cluster Blockchain.
In order to store a Dataset or perform any operation on it, the Client is required to pay with FFT tokens. During payment, tokens are transferred to the node's account in the Cluster Blockchain.
Once a node owner wants to withdraw FFT and get tradable tokens, they create a burn transaction, which is seen and verified by the Gateway, and tokens are issued to an Ethereum address.
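The FFT life cycle described above can be modeled as a toy ledger (class and method names are illustrative, not the Gateway's actual interface):

```python
# Toy ledger sketching the FFT life cycle: issued by the Gateway in exchange
# for FLU, spent inside the Cluster, burned on withdrawal.
class FFTLedger:
    def __init__(self):
        self.balances = {}

    def issue(self, account: str, amount: int) -> None:
        # FLU -> FFT, recorded on the Cluster Blockchain.
        self.balances[account] = self.balances.get(account, 0) + amount

    def pay(self, client: str, node: str, amount: int) -> None:
        assert self.balances.get(client, 0) >= amount, "insufficient funds"
        self.balances[client] -= amount
        self.balances[node] = self.balances.get(node, 0) + amount

    def burn(self, account: str, amount: int) -> int:
        # FFT -> FLU: returns the amount the Gateway re-issues as tradable tokens.
        assert self.balances.get(account, 0) >= amount, "insufficient funds"
        self.balances[account] -= amount
        return amount

ledger = FFTLedger()
ledger.issue("client", 100)
ledger.pay("client", "node-1", 40)
withdrawn = ledger.burn("node-1", 40)
```

Note that `burn` is the only exit for FFT, matching the rule that nodes cannot spend rewards other than by withdrawal.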
A Storage Contract is an agreement between a Client and a Cluster, and it is stored in the Cluster
Blockchain. The contract is signed by all parties, including Arbiters, at the beginning of the
collaboration; it specifies a time period, data allocation size, read price, etc.
An example of a contract object:
● Contract ID (public key)
● Client ID (public key)
● Valid until (default: one month)
● Nodes, Arbiters
● Replication size
● Node requirements
● Allocation in GB
● Max response time
● Finance conditions
● Gas price
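The contract object above can be sketched as a typed record. Field names follow the list, while the concrete types and default values are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class StorageContract:
    contract_id: str                # public key
    client_id: str                  # public key
    valid_until_days: int = 30      # default: one month
    nodes: list = field(default_factory=list)
    arbiters: list = field(default_factory=list)
    replication_size: int = 7
    node_requirements: dict = field(default_factory=dict)
    allocation_gb: int = 1
    max_response_time_ms: int = 1000
    finance_conditions: dict = field(default_factory=dict)
    gas_price: int = 1

contract = StorageContract(contract_id="pk-contract", client_id="pk-client")
```

In the real system such a record would be serialized, signed by all parties (Client, Cluster Nodes, Arbiters), and stored in the Cluster Blockchain.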
The Contract is funded by a Client deposit and is active while it has funds. The Client can terminate the Contract at any time. If there are enough tokens on the Contract at the moment its period ends, it is prolonged automatically.
Nodes receive rewards both for storage and for performing operations on data. To receive rewards for storing, every hour each node in the Cluster should perform a proof of retrievability by putting the proof hash in the Cluster Blockchain and getting verifications from Arbiters. Nodes actually receive funds for a Contract only when its period ends. If a node is offline for too long during the Contract period, it is punished either by not receiving tokens for the hours when it is offline or, in the worst case, by not receiving tokens for the Contract at all and being removed from the Contract in favor of other, more stable nodes. This decision is made by Arbiters and is regulated by the SLA.
Another way to earn tokens is by performing operations on data. For each operation, its complexity is evaluated using a formula known to all parties. Each "write" operation must be executed and signed by all nodes of the Cluster, so all nodes are rewarded. For read operations, the Client can choose between stronger data integrity guarantees (more signatures confirming that different nodes performed the same query and got the same result) and speed (even with just one signature).
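A sketch of the reward split for a write operation. The concrete pricing formula (`complexity * gas_price`) and the even split are assumptions; the text only says the formula is known to all parties:

```python
# Hypothetical pricing: cost = complexity * gas_price. A write is executed and
# signed by every Cluster Node, so the cost is split across the whole Cluster.
def operation_cost(complexity: int, gas_price: int) -> int:
    return complexity * gas_price

def reward_write(cost: int, cluster_size: int) -> list:
    share, rest = divmod(cost, cluster_size)
    # Hand out any indivisible remainder one token at a time to the first nodes.
    return [share + (1 if i < rest else 0) for i in range(cluster_size)]

cost = operation_cost(complexity=70, gas_price=1)
rewards = reward_write(cost, cluster_size=7)
```

For a read, the same cost would instead be split only among the nodes whose signatures the Client requested, which is where the integrity-versus-speed (and cost) trade-off comes from.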
Sharing Data Overview
In addition to enabling storage of structured sensitive data, the Fluence Network also lets Clients
share their data with known third parties and get rewarded.
Dataset Sharing is organized with a Sharing Contract, which is placed on the Cluster Blockchain. By default, the Sharing Contract sets a zero profit – zero loss policy: the same gas price for operations as in the main contract, with no additions. The receiver must fund the Sharing Contract before querying it, and then each query is billed just as a Client's request is.
However, the Client can set an increased operation price for the Sharing Contract. In this case, on every read, the payment is divided between the nodes and the Client.
Proxy Re-Encryption Key
All data in the Dataset is encrypted with the Client's private key, so merely sharing the Contract is not
enough to access the data. Along with the Sharing Contract, the Client provides a Proxy Re-Encryption Key,
which is derived from the Client's private key and the receiver's public key.
Once the receiver asks Cluster Nodes for data, the data is re-encrypted using the Proxy key. The receiver can then decrypt the data with their private key. No data is disclosed to third parties.
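The key flow can be illustrated with a toy scheme based on modular multiplication. This is emphatically NOT a real proxy re-encryption construction (real PRE schemes rely on public-key machinery such as ElGamal or pairings); it only shows the essential property that the node transforms ciphertexts without ever seeing plaintext or private keys:

```python
# Toy stand-in for proxy re-encryption; all parameters are illustrative.
P = 2**61 - 1  # a Mersenne prime used as the modulus

def encrypt(msg: int, key: int) -> int:
    return (msg * key) % P

def decrypt(ct: int, key: int) -> int:
    return (ct * pow(key, -1, P)) % P

def re_key(from_key: int, to_key: int) -> int:
    # Derived by the Client; never reveals either key on its own.
    return (pow(from_key, -1, P) * to_key) % P

def re_encrypt(ct: int, rk: int) -> int:
    # Executed by the node: it only handles ciphertext and the re-key.
    return (ct * rk) % P

client_key, receiver_key = 123456789, 987654321
ct = encrypt(42, client_key)                       # stored on the Cluster
ct2 = re_encrypt(ct, re_key(client_key, receiver_key))
plain = decrypt(ct2, receiver_key)                 # receiver uses only its own key
```

The point of the sketch: `ct2` decrypts under the receiver's key even though the node never saw `42`, `client_key`, or `receiver_key`.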
To provide fine-grained access control, the Client can encrypt the data with a tree of private keys, even with a dedicated key for every row. In this case, at the price of more complex key management, the Client can share just the required amount of data.
The described system has a few attack vectors. Some of them are typical for all distributed
systems; others are database and storage specific. We describe just a part of the attacks;
the rest require further investigation.
A malicious node has multiple ways to attack system security. Because we do not use PoW, there is no work a node must perform to generate a block. The node can speed up time, pretend that the next time tick for a new block has come, and start issuing a message with a new block.
Fluence has a two-level protection architecture against such attacks. First, the nodes in the same cluster should decline this node's messages, because they want to receive the reward for the whole contract and avoid being punished. However, the Cluster is limited in size (usually about seven nodes) and could collude to cheat. The Cluster can act like a single malicious node by skipping each other's proofs of retrievability and speeding up time for blocks.
If that happens, the Arbiters, whose number is much greater than the Cluster's, throw the malicious nodes out of the Cluster and choose substitutes from among themselves. The chance of cheating is determined by the number of Arbiters for each Cluster. Because Arbiters only store the Cluster Blockchain, not the data, it is very easy to scale their number without harming the network.
There is no motivation for Arbiters to cheat, because most of the network would notice it, and they would lose the reward for block verification.
Regulators may try to block or isolate nodes that store unwanted content. Due to the Cluster's limited size, it is possible, for example, to ban nodes by IP in a particular country. However, the Client has all the tools to run recovery mode for the Cluster and substitute the Cluster's nodes with Arbiters. Because the number of Arbiters is rather large, the regulator would have to ban new nodes every time.
Denial of Remove
Some node may deny "remove data" requests from the Client and keep the data on its
drive. This behavior brings no benefit, since the data is encrypted. If such a node tries to participate
in requests and responses in the Cluster, it will be ignored by the other nodes, because it has a different
database version and cannot perform correct database requests.
A node may also keep a re-encryption key that the owner asked to revoke. However, this brings no benefit to a data buyer, because the node's responses will be ignored by the Cluster consensus.
A data buyer may try to set up their own node to get into the Cluster and recover access to the data. The probability of this is minimal, because to be accepted by the Cluster a node should get an ID that is close to the Cluster's by Kademlia distance.