| 6-Jan-2018 | Like this? Dislike this? Let me know |
Blockchain ... it's almost too much to take on in one rant.
First and foremost, there is no single precise definition of "blockchain" like there is with, for example, the derivative of x² + 1. The term blockchain is now used broadly to cover a soup of approaches involving immutability, transaction management, distribution of data, payload, and consensus. Here is a sample from both ends of the spectrum:
| Concept | Bitcoin | Hyperledger (incl. many variations within) |
|---|---|---|
| Participants | Completely anonymous, known only by public key | All participants well-known and identity is vetted |
| Transactions | Accumulate in an uncommitted block (effectively a bucket). Miners attempt to find the right data (a "nonce") to add to the block that will produce a properly constructed hash or fingerprint of the block, after which the block can be committed to the chain and the result broadcast to the mining network | No mining, no nonces, and essentially no blocks. Each transaction (e.g. modification of a loan agreement, updating a current assessed value, etc.) yields a new version, which is hashed and committed to the chain |
| Consensus | Statistically driven, relying on large number of participants. Side chains can emerge but eventually, participants add more and more blocks to one particular chain, leading to longest chain wins model. Statistics suggest that after 6 blocks have been committed to a chain, the transactions within are nearly (but not 100%) guaranteed to be correct and without double-spending. | Workflow entitlements driven, relying on specific actions by specifically named participants. No consensus required although the workflow might demand two or more participants to do something before state change can take place. But this is not the same thing as law-of-large-numbers statistical consensus. |
| Distribution Model | Fully distributed data and processing running on any infrastructure from the cloud to a PC on a desktop. Many nodes in the network, each with a copy of the blockchain. Nodes broadcast changes and listen for others and each applies the same algorithms to achieve global consensus. | (Most extreme variation) Single copy of a single workflow running on infrastructure in the cloud hosted by a major company. No other nodes, no other copies. APIs typically exist for participants to "listen" for new activity on the workflow, upon which they can manually "copy down" the latest versions into their own technology (which may have nothing to do with blockchain) for local processing and querying. Note these local copies are NOT part of any consensus / data integrity model. |
| Payload | Completely objective and context-free, the value and bookkeeping data about the bitcoin. Any party examining the payload can understand it | The digital asset is an arbitrary payload such as a loan that may have a great deal of subjective and context-sensitive data. Consistent relevance/importance and interpretation of all the data to every party involved in the workflow is highly questionable, e.g. the building inspector does not care about the LTV and toggle rate parameters of the loan -- and by extension does not want to in any way be responsible for assuring their integrity |
So... which one is correct? Both. Wikipedia summarizes blockchain thusly and I believe it is not only a fair description, but one that could be applied to both scenarios above:
A blockchain, originally block chain, is a continuously growing list of records, called blocks, which are linked and secured using cryptography. Each block typically contains a hash pointer as a link to a previous block, a timestamp and transaction data. By design, blockchains are inherently resistant to modification of the data. The Harvard Business Review describes it as "an open, distributed ledger that can record transactions between two parties efficiently and in a verifiable and permanent way." For use as a distributed ledger, a blockchain is typically managed by a peer-to-peer network collectively adhering to a protocol for validating new blocks. Once recorded, the data in any given block cannot be altered retroactively without the alteration of all subsequent blocks, which requires collusion of the network majority.

The landscape is filled with terms like "distributed ledger" and "smart contract" and different definitions have been applied to each one depending on the particular product at hand. In other words, there are many different (and useful and interesting) products and solutions performing very different workloads at different scale and performance -- and each trying to tag the solution with as many blockchain terms as possible.
Instead of adding to the mess by proclaiming another top-down definition of the blockchain as it revolutionizes yet another business use case, let's instead start fresh from the bottom up: a chain of transactions.
fingerprint1 = hash(version 1)
fingerprint2 = hash(version 2)
merged = concatenate(fingerprint1, fingerprint2)
chain_fingerprint_from_1_to_2 = hash(merged)
fingerprint2 = hash(version 2)
fingerprint3 = hash(version 3)
merged = concatenate(fingerprint2, fingerprint3)
chain_fingerprint_from_2_to_3 = hash(merged)
fingerprint3 = hash(version 3)
fingerprint4 = hash(version 4)
merged = concatenate(fingerprint3, fingerprint4)
chain_fingerprint_from_3_to_4 = hash(merged)
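The pseudocode above can be sketched in Python -- a toy illustration only, assuming SHA-256 as the hash (any strong hash would do):

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Hex digest of one version's raw data."""
    return hashlib.sha256(data).hexdigest()

def chain_fingerprint(fp_a: str, fp_b: str) -> str:
    """Hash of two concatenated fingerprints, linking adjacent versions."""
    return hashlib.sha256((fp_a + fp_b).encode()).hexdigest()

versions = [b"version 1", b"version 2", b"version 3", b"version 4"]
fps = [fingerprint(v) for v in versions]

# "Walk" the list pairwise; anyone can recompute these links and
# compare them to whatever was originally stored.
links = [chain_fingerprint(fps[i], fps[i + 1]) for i in range(len(fps) - 1)]

# Change a single byte of any version (or reorder the list) and the
# recomputed link no longer matches the stored one.
tampered = fingerprint(b"version 2 (altered)")
assert chain_fingerprint(fps[0], tampered) != links[0]
```

No keys of any kind are involved; the verification is open to anyone holding the versions and the stored fingerprints.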
As a result, given a list of versions of a transaction, it is possible for anyone to "walk" the list and recalculate all the fingerprints and ensure that the recalculated data matches whatever was originally stored. Not a single byte of any of the versions can change nor can the order of the list. No secret keys are required; in fact, no keys are required at all and the process is patently transparent. It almost does not matter if a thing in fact has a "version number" as part of its data payload. It is the creation order and fingerprint chaining that is the ultimate guarantor of integrity and transaction activity over time.
Most popular blockchain implementations, however, assume (rightly so) that individual version integrity and lineage cannot be separated. Thus, instead of tracking both the individual version fingerprint and the chain_fingerprint, each individual version fingerprint includes the fingerprint from the prior version as well:
merged = concatenate(null, version 1); // 1st is special; no prior fingerprint!
fingerprint 1 = hash(merged)
merged = concatenate(fingerprint 1, version 2);
fingerprint 2 = hash(merged)
merged = concatenate(fingerprint 2, version 3);
fingerprint 3 = hash(merged)
merged = concatenate(fingerprint 3, version 4);
fingerprint 4 = hash(merged)
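In code, the rolling scheme looks like this (again a sketch, assuming SHA-256): each fingerprint covers the prior fingerprint plus the new version, so integrity and lineage become inseparable.

```python
import hashlib

def chained_fingerprints(versions):
    """Return one fingerprint per version, where each fingerprint
    hashes the prior fingerprint concatenated with the new version."""
    fps = []
    prior = b""  # 1st is special; no prior fingerprint!
    for v in versions:
        fp = hashlib.sha256(prior + v).hexdigest()
        fps.append(fp)
        prior = fp.encode()
    return fps

versions = [b"version 1", b"version 2", b"version 3", b"version 4"]
fps = chained_fingerprints(versions)

# Altering version 2 changes fingerprint 2 AND every fingerprint after it,
# which is exactly why retroactive edits cannot hide.
altered = chained_fingerprints(
    [b"version 1", b"version X", b"version 3", b"version 4"])
assert fps[0] == altered[0]
assert all(a != b for a, b in zip(fps[1:], altered[1:]))
```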
As valuable and important as chain immutability is, there are several important points that should be made here:
Note: Even in a single shared central ledger design like the IBM Cloud Hyperledger, it is very likely that participants will have to make an out-of-process copy of the ledger in order to integrate the data with other systems. Long story short, there is no practical way you are going to issue a SELECT statement to join the blockchain persistor to your local database.
Signing transactions is a vital part of blockchain security/integrity, but it also introduces risk: unlike hashing, which is completely identity-independent, signing requires a private key, and something in the blockchain process must exist to deliver unsigned data to you so you can securely sign it and pass the result back. You must be very careful to physically guard your private keys. It is much, much easier to steal a private key than to computationally attack encrypted material. This challenge has been present for more than 20 years.
But...
In the emerging world of smart contracts, this could have devastating consequences as contracts signed by you (but not really you) automatically transfer ownership of your car to an unintended third party, which quickly sells the car for bitcoins, remaining anonymous and leaving you to deal with the new owner who can present cryptographically secure proof that he owns the asset. Because people are fallible -- much more so than strong cryptography -- legal counsel will clearly continue to be a needed profession.
The basic idea in mining a new block is to get the fingerprint to look like a special target sequence with a certain number of leading zeros. For example, instead of the fingerprint looking like this:
8e12fd1980258264f694cf2fa788388af9172c1ce9fc994aea3f6067e50414d5
the goal is to find one that looks like this:
0000000000000000057fcc708cf0130d95e27c5819203e9f967ac56e4df598ee
fingerprint = null;
while (fingerprint does not contain required number of leading zeros) {
    nonce = 4 bytes of random material (32 bits, or ~4 billion possibilities);
    fingerprint = hash(block of transaction data + nonce);
}
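The loop above can be run for real in a few lines of Python -- a toy sketch only, with a difficulty of 2 leading zero hex digits so it finishes quickly (real Bitcoin difficulty is astronomically higher):

```python
import hashlib
import os

def mine(block_data: bytes, difficulty: int = 2):
    """Search for a nonce whose SHA-256 fingerprint starts with
    `difficulty` leading zero hex digits."""
    target = "0" * difficulty
    while True:
        nonce = os.urandom(4)  # 4 bytes of random material, ~4 billion possibilities
        fp = hashlib.sha256(block_data + nonce).hexdigest()
        if fp.startswith(target):
            return nonce, fp

nonce, fp = mine(b"block of transaction data")
# fp now begins with the required run of zeros; anyone can verify the
# work instantly by re-hashing block_data + nonce.
```

Note the asymmetry that makes proof-of-work useful: finding the nonce is expensive, but verifying it is a single hash.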
In the mid 1990s, we stored smallish perl programs in a database as a BLOB. These programs exploited the compactness and "quickness" of perl to perform if/then/else logic and array and hashmap manipulation without getting buried in the rigid and unterse syntax of C++. Every night, a C++ program linked with the perl interpreter would iteratively fetch these programs based on various criteria, determine the data needs, make market and other information available to it, let it run its perl logic (which could also make use of the parent C++ program's high performance functions and, indeed, the distributed computing environment), and then save results back to the database.

Sound familiar? It is also important to note that even today most smart contract implementations have some sort of a runtime context around them. In other words, the contract software as a unit of release just "sits there"; something has to run it and bring it to life. In the example above, the parent C++ program was the execution engine that took care of this. Today, smart contracts require a similar engine that is live and sitting on top of the blockchain. The code for the smart contract is part of the data payload managed on the blockchain and enjoys the same benefits of immutability as regular "simple" data like fields of numbers and text.
Note that you actually don't need a blockchain to make a smart contract run, but there are 2 important features that the blockchain brings to the table:
To be fair, smart contracts have a little more work to do than our 1995 version:
In the Bitcoin system, consensus is achieved through proof-of-work by a statistically significant number of participants performing extremely objective and clear (but time/cost expensive) operations for which there is specific incentivization. Perfect.

But the concept of consensus gets murkier when the problem is not statistically based and involves data much more complex than a bitcoin value. For example, consider a real estate processing blockchain involving 6 parties: the buyer, the seller, the broker, the buyer's bank, the housing inspector, and an escrow bank. There is no bitcoin-style consensus here. There are not 1000s of miners performing the same task in parallel, each trying to win the next block. Instead, there is only one of each type of participant, each with a different set of responsibilities and incentivizations. This gives rise to the following:
No participant will provide input to consensus or voting regarding authenticity and/or accuracy of data or process for which they are not:

- Incentivized (almost always through monetary compensation)
- Protected by legal precedent based on an in-economy set of risk mitigation procedures

This does not defeat the usefulness of the blockchain, of course, but developers of solutions must be careful when using the term "consensus." Consensus is only appropriate when 2 or more participants work in parallel and a mathematical model is employed to determine if conditions are sufficient for workflow to move forward. It is worth noting that consensus does not have to be a PhD-complex algorithm or one that demands large numbers of participants -- in fact, many consensus models in longer-cycle business-transaction workflows look very much like standard workflow approval, e.g. if a simple majority of participants at stage n say all is well, proceed to stage n+1. Or even simpler (and very common): when all participants say all is well, move to stage n+1. As such, in most business workflows, it will be necessary to clearly define the fields for which a participant has "vouching/review" responsibility. This is the next step beyond basic read/write entitlements.
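A stage-gating rule like "all participants say all is well" or "simple majority at stage n" is easy to make concrete. The sketch below is hypothetical (the participant names and field sets are illustrative, echoing the real estate example), showing vouching responsibility per participant plus an all-vs-majority advancement rule:

```python
# Hypothetical responsibility map: each participant only vouches for
# the fields within its own remit (the inspector ignores LTV, etc.).
RESPONSIBILITY = {
    "buyer_bank": {"ltv", "toggle_rate"},
    "inspector": {"structural_ok"},
    "escrow_bank": {"funds_held"},
}

def can_advance(approvals: dict, required: set, rule: str = "all") -> bool:
    """approvals maps participant -> True/False for the fields it reviewed.
    rule 'all' = every required participant approves;
    rule 'majority' = simple majority approves."""
    votes = [approvals.get(p, False) for p in required]
    if rule == "all":
        return all(votes)
    return sum(votes) > len(required) / 2  # simple-majority rule

required = set(RESPONSIBILITY)
assert can_advance(
    {"buyer_bank": True, "inspector": True, "escrow_bank": True}, required)
assert not can_advance(
    {"buyer_bank": True, "inspector": False, "escrow_bank": True}, required, "all")
assert can_advance(
    {"buyer_bank": True, "inspector": False, "escrow_bank": True}, required, "majority")
```

The point is that this is ordinary workflow approval, not law-of-large-numbers consensus -- and that is fine, as long as the solution is honest about which one it is doing.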
An exciting opportunity exists to hybridize single-actor workflow together with consensus via crowdsourced incentivized participation. Relatively simple but somewhat more subjective steps in a workflow could be tackled by dozens or more participants, making their responses (mean and standard deviation) more statistically relevant.