Nanocoin

Nanocoin is a PoC cryptocurrency written in Haskell. It is under development to illustrate that the foundations of DLTs (Distributed Ledger Techonology) and aid in dispelling the air of mystery about what a cryptocurrency is. The project attempts to present the necessary cryptographic components and distributed systems knowledge necessary to implement a cryptocurrency.

Video of Presentation at Boston Haskell

Project goals:

Well-documented.
Low-dependencies.
Using classic proof-of-work.
Simple P2P Protocol using UDP chatter.
Support chain reconfiguration.
Basic ECDSA Block/Transaction signatures & validation
Transfer Transactions
In-memory.

Running a Node

Install the Stack build system:

$ stack setup
$ stack install nanocoin
$ nanocoin

Running nanocoin will spin up a Node with an RPC server running on localhost:3000 and a P2P server communicating with basic UDP Multicast on port 8001.

You can specify which port to run the RPC server on, and from which directory to load a node's public/private ECC key pair. If you do not supply a KEYS_DIR, the node will generate a random key pair with which to issue transactions and mine block.

Usage: nanocoin [-p|--rpc-port RPC_PORT] [-k|--keys KEYS_DIR]

Enabling Multicast

Nanocoin uses a UDP multicast gossip protocol for P2P networking. Before running the program, make sure your localhost network interface has multicast enabled. The simplest way to do this on Unix-like systems is:

$ sudo ifconfig lo multicast

Otherwise, the program will fail with addMembership: failed (Unknown error -1) error from the Multicast module when attempting to add the node to the multicast group.

Usually, lo is the name of the local loopback (localhost) network interface on modern Linux machines, but could differ depending on your machine and/or OS. To check what your localhost network interface is, type ifconfig; The interface prefixed by lo, e.g. lo0 is usually the interface of interest. Simply replace lo with the name of your localhost network interface in the command above.

RPC Interface

Nanocoin's RPC interface is implemented via an HTTP web server that serves as both a command and query entry points. Simply type in localhost:XXXX/<cmd-or-query> to interact with the node1:

Queries

/address:

View the address of the current node (derived from the nodes public key)

/blocks:

View the blocks on the block chain, including their transactions.

/mempool:

View the current collected transactions that have not yet been included in a
block on the network.

/ledger:

View the current state of the ledger, representative of all the transactions
of all the blocks applied in order to result in a cumulative ledger state.

Commands

/mineBlock:

Attempt to mine a block containing the transactions currently in the node's mempool.
This will fail if there are no transactions in the mempool.

/transfer/:toAddress/:amount

Issues a `Transfer` transaction to the network, transferring the specified 
`amount` of Nanocoin from this node's account to another node's account 
designated by `toAddress`. If you try to transfer more Nanocoin than you
have, the transaction will be rejected during the block mining process and
purged from all nodes' mem-pools.

Overview of Basic Cryptocurrency Components

This document serves as a brief overview of blockchain components and how to implement a cryptocurrency from scratch. Basic blockchain knowledge is assumed, but brief overviews of the basic components are included.

Cryptography

There are many cryptographic components that come together to make block chains and cryptocurrency implementations more secure. Nanocoin uses Haskell's most notable cryptography library known as cryptonite as it provides all the cryptographic functions and datatypes relevant to a cryptocurrency implementation.

Hash Functions

Hash functions are pure, one-way functions which, for every input, produce a unique fixed-length output which reveals no information about the input. Strong hash functions produce output that can be thought of as a digital fingerprint of the input data. The most important thing to note is that good hash functions obfuscate the original input, and the most minor change in the input results in a dramatically different output.

Example:

ghci> import Data.ByteString
ghci> import Crypto.Hash (hashWith)
ghci> hashWith SHA3_256 ("1234" :: ByteString)
1d6442ddcfd9db1ff81df77cbefcd5afcc8c7ca952ab3101ede17a84b866d3f3
ghci> hashWith SHA3_256 ("12345" :: ByteString)
7d4e3eec80026719639ed4dba68916eb94c7a49a053e05c8f9578fe4e5a3d7ea

Hashes are usually displayed in base16 format for brevity

Since hash functions can be thought of a digital fingerprints of data, it is often an effective and efficient way for two parties to confirm that they possess the same data. Instead of both parties having to share their original copy of the data they wish to compare, they can each share with each other the hash of the data and compare a much smaller piece of data (assuming they use the same hash function).

Collisions:

Since hash algorithms produce a fixed sized output, i.e. an infinite number of inputs map onto a finite number of outputs, there is a chance that two inputs produce the same output. However, the chance of collisions for modern popular hashing algorithms are negligible, and when collision attacks become viable attack vectors, academia and industry are usually prepared. the former with a more secure hashing algorithm and the latter with the adoption of the new algorithm.

SHA3_256

Sha3_256 (Secure Hash Algorithm 3) is one of the more popular hashing algorithms, in which every input is converted into a 256 bit output. Its implementation is very complex, but yields a fast and secure algorithm with which collisions will not feasibly happen given todays computing power.

Finite Fields

A Finite Field GF(p) can be described as a cyclic group with a prime order, or Z mod p (the set of integers Z modulus a prime number p), closed over addition (+) and multiplication (*) operations. The result of each operation is mod p.

`+`	0	1	2
0	0	1	2
1	1	2	0
2	2	0	1

`*`	1	2
0	0	0
1	1	2
2	2	1

TODO: elaborate...

Visit this link for a more in depth dive into Finite Fields mathematics.

ECC

Elliptic curve cryptography (ECC) is an approach to public-key cryptography based on the algebraic structure of elliptic curves over finite fields with very large prime orders. This version of public key cryptography provides the benefit of equal security with the public and private keys needing fewer bits than tradition cryptographic schemes based on the discrete log problem.

Elliptic Curves

In the cryptonite library, a Curve and Point are defined as follows:

-- | Define either a binary curve or a prime curve.
data Curve = | CurveFP  CurvePrime  -- ^ 𝔽p

-- | Define an elliptic curve in 𝔽p.
-- The first parameter is the Prime Number.
data CurvePrime = CurvePrime Integer CurveCommon

-- | Define common parameters in a curve definition
-- of the form: y^2 = x^3 + ax + b.
data CurveCommon = CurveCommon
  { ecc_a :: Integer -- ^ curve parameter a
  , ecc_b :: Integer -- ^ curve parameter b
  , ecc_g :: Point   -- ^ base (generator) point
  , ecc_n :: Integer -- ^ order of G
  , ecc_h :: Integer -- ^ cofactor
  }

-- | Define a point on a curve.
data Point = Point Integer Integer
           | PointO -- ^ Point at Infinity

Visit this link for a more in depth dive into the math behind elliptic curves.

Public/Private Key Pairs

ECC defines public keys to be a point on the Elliptic Curve, and a private key as a secret number k within the order of the curve (e.g. if the curve's order is 17, the private key will be a number n ∈ {0,.. ,17}). Public keys are derived from private keys by multiplying the generator point G by the private scalar k.

This scheme yields a result similar to the discrete log problem, but instead of it being difficult to recover a secret exponent x in the equation g^x mod p, it is difficult to recover the secret scalar k in the equation Gk mod p.

In the cryptonite library, these keys are defined with the following data structures:

Publie Key:

-- | ECDSA Public Key.
data PublicKey = PublicKey
  { public_curve :: Curve
  , public_q     :: PublicPoint
  }

Private Key:

type PrivateNumber = Integer

-- | ECDSA Private Key.
data PrivateKey = PrivateKey
  { private_curve :: Curve
  , private_d     :: PrivateNumber
  }

Digital Signature Algorithm (ECDSA)

To sign a piece of data is to provide the two resulting integers from the ECDSA (computed through a hash function). Inputs for the signing function are:

The data as a string of bytes, msg.
The elliptic curve private-key, d.

The signing algorithm will be described in terms of the curve Secp256k1, where p is the secp256k1 prime, and G is it’s respective generator point:

Hash the document byte stream such that z = H(msg)
Generate a random value k ∈ {1,..,p−1}
Compute (x,y) = kG
Compute r = x mod p, if r = 0 go back to step 1
Compute s = (z + rd) / k , if s = 0 go back to step 1

The resulting (r,s) pair is the signature.

To verify the signature resulting from the signing algorithm the public key corresponding to the private key used in the signing algorihthm is used. Inputs to the verification algorithm are:

The data as a string of bytes, msg
The signature, (r,s)
The EC public key Q

The output of the verification algorithm is simply a boolean indicating whether the signature provide is indeed a valid signature of the given data correspoding to the private key with which it was signed. The algorithm is as follows:

Compute z = H(msg)
Compute t = (z mod p) / s
Compute u = (r mod p) / s
Let (x,y) = tG + uQ
Verify that r = x mod p

To sign and verify a piece of data using cryptonite:

ghci> import Crypto.Number.Hash (SHA3_256)
ghci> import Crypto.PubKey.ECC.ECDSA (sign, verify)
ghci> import Crypto.PubKey.ECC.Generate (generate)
ghci> import Crypto.PubKey.ECC.Types (getCurveByName, SEC_p256k1)
ghci> let msg = "hello world" :: ByteString
ghci> let secp256k1 = getCurveByName SEC_p256k1
ghci> (pubKey, privKey) <- generate secp256k1
ghci> sig <- sign privKey SHA3_256 msg
ghci> verify SHA3_256 pubKey sig msg
True

Merkle Trees

TODO: Implementation and docs. Reference: https://github.com/adjoint-io/merkle-tree

Distributed Ledgers (A.K.A. Blockchain)

Distributed Ledgers are replicated state machines that are kept in sync via P2P discovery and consensus protocols. The state that each node in the network (i.e. instance of this state machine) keeps track of is a Ledger, a data structure that keeps track of transactions on the network. Each node's ledger is only modified when a block of transactions is broadcast to the network. Valid blocks are determined by implementation specific chain-rules and consensus algorithms.

Addresses

Addresses are succint representations of ECC Public Keys, and for all intents and purposes can be thought of as a slightly more succinct digital fingerprint than a SHA3_256 hash. The deriveAddress function maps a hash of a EC point to a unique, irreversible identity that uniquely defines a participant in the network and any participant can verify integrity of it's coherence to a public key.

For the Nanocoin project, we are simply copying Bitcoin's address implementation, plus a few extra hashes for good measure. Addresses are usually base58 encoded for brevity, and to distinguish them from SHA3_256 hashes which are encoded with base16.

-- | address(x,y) = addrHash(string(x) <> string(y))
deriveAddress :: Key.PublicKey -> Address
deriveAddress pub = Address (b58 addr)
  where
    (x, y) = Key.extractPoint pub
    addr   = BA.convert $ deriveHash pstr
    pstr   = i2osp x <> i2osp y

-- | addrHash(n) = sha256(sha256(ripemd160(sha256(n))))
deriveHash :: ByteString -> ByteString
deriveHash = Hash.sha256Raw'
           . Hash.sha256Raw'
           . Hash.ripemd160Raw'
           . Hash.sha256Raw'

Ledger

The Ledger is the data structure that defines the replicated state to which modifications can be made via transactions, and more specifically by accepting blocks of transactions that are generated by miners, using a simple consensus algorithm.

The Ledger maps addresses to balances, where:

type Balance = Int

-- | Datatype storing the holdings of addresses
newtype Ledger = Ledger
  { unLedger :: Map Address Balance 
  } deriving (Eq, Show, Generic, Monoid)

Transactions

Transactions denote atomic modifications to the replicated state machines' state, and are broadcast by nodes when user wish to issue a transaction on the network. Transactions are defined in Nanocoin as:

data Transaction = Transaction
  { header    :: TransactionHeader
  , signature :: ByteString
  }

Transaction Headers must be signed with the private key corresponding to the senderKey field of Transfer transaction. The transaction's signature is verified using the ECDSA.verify function in cryptonite.

Transaction Headers:

The TransactionHeader type denotes the type of transacation issued, and defines the type of stateful modification that must be applied to the ledger, given the transaction is valid. For Nanocoin, the only transaction needed is Transfer, which transfers an amount of Nanocoin from one Address to another.

data TransactionHeader =
  Transfer
    { senderKey :: Key.PublicKey
    , recipient :: Address
    , amount    :: Int
    }

This is the whole point of Nanocoin! To transact! The beauty is in the simplicity of this model.

Validation (Transaction)

For a node to validate a transaction:

The transaction signature must be verified against the transaction issuers private key using EDCSA verify.
The transaction header must be applied to the world state, and is valid if the ledger state modification is successful

The validation can be broken down into several parts:

-- | Verifies the transaction signature by looking up the transaction origin by address,
-- and verifying the signature of the transaction header with the resulting public key.
verifyTransactionSig :: Transaction -> Ledger -> Either InvalidTx ()

-- | If the sender has Nanocoin >= the the amount being transferred, the transfer of 
-- funds is applied to the ledger state.
applyTransfer :: TransactionHeader -> Ledger -> Either InvalidTx Ledger

Sometimes a transaction is only valid with respect to a ledger state resulting from a transaction that was issued previous in the block, and therefore ledger state must be accumulated when applying transaction to the ledger state.

Blocks

Blocks are the representation of a list of transactions that have been validated, which represent an atomic modification to the the ledger state. The word Blockchain comes from the fact that block headers reference previous blocks that have been mined by nodes in the network. This chain starts from a genesis block, which represents an initial ledger state from which each block in the chain builds upon. Blockchains are sequences of blocks representing lists of transactions that should be applied, atomically and sequentially to the ledger state, such that the final state after every block is replicated across all nodes in the network.

Blocks are represented in Nanocoin with the following data structures:

type Index      = Int
type Timestamp  = Integer
type Blockchain = [Block]

data BlockHeader = BlockHeader
  { origin       :: Key.PublicKey -- ^ Public Key of Block miner
  , previousHash :: ByteString    -- ^ Previous block hash
  , merkleRoot   :: ByteString    -- ^ Merkle root of tranactions
  , nonce        :: Int           -- ^ Nonce for Proof-of-Work
  } deriving (Eq, Show, Generic, S.Serialize)

data Block = Block
  { index        :: Index         -- ^ Block height
  , header       :: BlockHeader   -- ^ Block header
  , transactions :: [Transaction] -- ^ List of Transactions
  , signature    :: ByteString    -- ^ Block ECDSA signature
  }

Previous block hashes, contained in the previousHash field, are used to identify previous blocks in the chain because of the way in which a change in even one bit of the data of the previous block will result in an incredibly different hash output. This creates a tamper resistant history due to the fact that before a node will accept a new block, it checks to see if hashing the previous block in it's local storage results in the same hash of the new block it has received.

In Nanocoin, the blockchain is stored in the nodeChain :: MVar [Block] in which the most recent block is stored as the head of the list. Chains with no blocks are inherently invalid, because a block chain must start with an initial genesis block. The genesis block must be initialized with a ECC key pair such that nodes with the same gensis block will agree on the same chain.

The genesis block is defined in Nanocoin as:

genesisBlock :: Key.KeyPair -> IO Block
genesisBlock (pubKey, privKey) = do
    signature' <- liftIO $
      Key.sign privKey (S.encode genesisBlockHeader)
    return Block
      { index     = 0
      , header    = genesisBlockHeader
      , signature = S.encode signature'
      }
  where
    genesisBlockHeader = BlockHeader
      { origin       = pubKey
      , previousHash = "0"
      , transactions = []
      , nonce        = 0
      }

Validation (Block)

Validation of a new block consists of the following:

The block signature must be verified against the block origin address's public key
The block index equal to the previous blocks index + 1
The computed hash of the local previous block must match the previousHash field in the new block.
The computed hash of the current block's header must satisfy the difficulty predicate in the PoW Algorithm
The block must have at least 1 transaction
Each transaction in the block header must be valid

The code that implements this is quite straight forward:

data InvalidBlock
  = InvalidBlockSignature Text
  | InvalidBlockIndex
  | InvalidBlockHash
  | InvalidBlockNumTxs
  | InvalidBlockTx T.InvalidTx
  | InvalidPrevBlockHash
  | InvalidFirstBlock
  | InvalidOriginAddress Address
  | InvalidBlockTxs [T.InvalidTx]

-- | Validate a block before accepting a block as new block in chain
validateBlock
  :: Ledger
  -> Block
  -> Block
  -> Either InvalidBlock ()
validateBlock ledger prevBlock block
  | index block /= index prevBlock + 1                 = Left InvalidBlockIndex
  | hashBlock prevBlock /= previousHash (header block) = Left InvalidPrevBlockHash
  | not (checkProofOfWork block)                       = Left InvalidBlockHash
  | null (transactions $ header block)                 = Left InvalidBlockNumTxs
  | otherwise = do
  
      -- Verify signature of block
      verifyBlockSignature ledger block
      
      -- Validate all transactions w/ respect to world state
      first InvalidBlockTx $ do
        let txs = transactions $ header block
        T.validateTransactions ledger txs

Networking

Nanocoin is uses a simple P2P messaging protocol that uses UDP Multicast chatter. This means that every message that is sent by a node on the network is broadcast simultaneously to every node on the network, i.e. every Nanocoin connected to the same router as the node sending the message. This greatly increases network traffic and is not suitable for a real world implementation of a cryptocurrency. Future work on this project will transition from the simple Multicast UDP chatter protocol to a distributed p2p network implemented using cloud-haskell.

NodeState

data NodeState = NodeState
  { nodeConfig   :: Peer                   -- ^ P2P info (rpc port, p2p port)
  , nodeChain    :: MVar Block.Blockchain  -- ^ In-memory Blockchain
  , nodeKeys     :: KeyPair                -- ^ Node key pair
  , nodeSender   :: MsgSender              -- ^ Function to broadcast a P2P message
  , nodeReceiver :: MsgReceiver            -- ^ The source of network messages
  , nodeLedger   :: MVar Ledger.Ledger     -- ^ In-memory ledger state
  , nodeMemPool  :: MVar MemPool.MemPool   -- ^ Mempool to collect transactions
  }

At the core of each node is the node state. This datatype carries data relevant to the operation of the node. The three MVars are the in-memory representation of the blockchain, the ledger, and the transaction pool. The rest of the values are unchanging node configuration data.

MemPool

A transaction mem-pool is simply a data store for transaction broadcast to the network. Each node keeps a list of the transaction that have been broadcast, and updates this list each time a new TransactionMsg arrives from the MsgReceiver; There are three times when a node's mempool is modified:

When a node receives a TransactionMsg with a valid transaction, it adds it to the mempool
Whenever a new block is accepted and applied to the ledger state, a node purges it's mempool of the transaction that were included in the new block.
When a new block is being mined, invalid transactions in the mempool are removed from the mempool.

Nanocoin defines a Mempool as a newtype wrapper around a list.

newtype MemPool = MemPool
  { unMemPool :: [Transaction]
  } deriving (Show, Eq, Generic, Monoid, ToJSON)

addTransaction :: Transaction -> MemPool -> MemPool
addTransaction tx (MemPool pool) = MemPool (pool ++ [tx])

removeTransactions :: [Transaction] -> MemPool -> MemPool
removeTransactions txs (MemPool pool) = MemPool $ pool \\ txs

The node's MemPool is stored in an MVar such that it persists the entire duration of a node's execution, and can be potentially modified by multiple processes safely.

Messaging Protocol

The messaging protocol is defined as simple as possible, with the unfortunate result of greatly increasing the number of messages sent over the network:

data Msg 
  = QueryBlockMsg Int
  | BlockMsg Block
  | TransactionMsg Transaction

Each node in the network is running a process that is listening on the multicast port 8001 and waits to receive messages, dispatching them to a function handleMessage that has the following implementation:

handleMessage :: NodeState -> Msg -> IO ()
handleMessage nodeState msg = do
  case msg of
    ...

QueryBlockMsg index
1. If NodeState - nodeChain has a block with the given index, broadcast the block to the network
2. Else log the error and do nothing
BlockMsg block
1. If genesis block is instantiated, attempt to apply the block to the ledger state.
2. Else log the error and do nothing
TransactionMsg transaction
1. If the transaction signature is valid, add it to the node's mempool
2. Else log the error and do nothing

Consensus Algorithm

Nanocoin uses a simple proof of work (PoW) approach to chain concensus:

To mine a block on the chain, a node must compute a nonce such that that resulting hash of the block being mined begins with a number of 0's equal to round(ln(n)) where n is the length of the current chain (i.e. the index of the current block being mined); This predicate is known as the block difficulty. For this PoW implementation the average nonce computed is 16^n, so when the length of the chain surpasses 12 (round(ln(n)) == 4) it begins to take several seconds to mine each block. As n surpasses 23, mining a block could take well over 10 minutes.

Note: A nonce is an arbitrary positive number repeatedly incremented and added to the block header to produce a hash that has the correct number of leading zeros, denoted by block difficulty.

proofOfWork
  :: Int         -- ^ Difficulty measured by block index
  -> BlockHeader -- ^ Header to hash with nonce parameter
  -> BlockHeader
proofOfWork idx blockHeader = blockHeader { nonce = calcNonce 0 }
  where
    difficulty = calcDifficulty idx
    prefix = toS $ replicate difficulty '0'

    calcNonce n
      | prefix' == prefix = n
      | otherwise = calcNonce $ n + 1
      where
        headerHash = hashBlockHeader (blockHeader { nonce = n })
        prefix' = BS.take difficulty headerHash
        
calcDifficulty :: Int -> Int
calcDifficulty = round . logBase (2 :: Float) . fromIntegral

Validation (Consensus)

Blocks are validated with respect to Nanocoin's PoW consensus algorithm in a very simple way:

checkProofOfWork :: Block -> Bool
checkProofOfWork block =
    BS.isPrefixOf prefix $ hashBlock block
  where
    difficulty = calcDifficulty $ index block
    prefix = toS $ replicate difficulty '0'

This is perhaps the one of the most notable outcomes of simple Proof of Work; Generating a new valid block is computationally difficult, but validating the generated block is computationally trivial.

License

Released under Apache 2.0.

JeremyLWright / nanocoin