The reliability and integrity of a blockchain rest on fraudulent data or transactions, such as a double spend, never being accepted or recorded. Hashing is a cornerstone of the technology as a whole and a key component in maintaining this reliability.
Hashing is the process of taking an input of any length and turning it into a fixed-length output through a cryptographic hash function (Bitcoin uses SHA-256, for example, which always produces a 256-bit output). The input can be a short piece of information such as a message, or a huge collection of varying pieces of information such as a block of transactions or even all of the information contained on the internet.
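As a minimal sketch of the fixed-length property, the snippet below uses SHA-256 from Python's standard hashlib module. The helper name sha256_hex and the two sample inputs are illustrative assumptions, not part of any blockchain's code.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


short_input = b"hello"
long_input = b"x" * 1_000_000  # roughly one megabyte of data

# Both digests are 256 bits (64 hex characters), whatever the input size.
print(len(sha256_hex(short_input)), sha256_hex(short_input))
print(len(sha256_hex(long_input)), sha256_hex(long_input))
```

Whatever the size of the input, the printed length is 64, so the hash reveals nothing about how much data went in.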
Securing Data with Hashing
Hashing drastically increases the security of the data. A hash is not encryption: there is no key that turns it back into the original. Anyone trying to work out the data by looking at the hash cannot even determine the length of the original input, because the output is always the same size. A cryptographic hash function needs several crucial qualities to be considered useful. These include:
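Infeasible to find two differing inputs that produce the same hash value:
Because the output has a fixed length, such collisions must exist in theory, but finding one must be computationally out of reach. If it were not, it would be impossible to keep track of the authenticity of inputs.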
The same message will always produce the same hash value:
Without this, a hash could not be recomputed later and compared with the original, so it could not be used to verify anything.
Quick to produce a hash for any given message:
The system would not be efficient or provide value otherwise.
Infeasible to determine the input from the hash value:
This one-way quality is one of the foremost aspects of hashing and securing data.
Even the slightest change to an input completely alters the hash:
This is also a matter of security. If a slight change only made a slight difference, it would be considerably easier to work out what the input was. With a well-designed hash function, changing even a single bit of the input changes roughly half of the bits of the output.
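Two of the qualities listed above, that the same message always gives the same hash and that a slight change alters the hash completely, can be observed directly. The sketch below assumes SHA-256 and uses made-up messages; differing_bits is a hypothetical helper written for this illustration.

```python
import hashlib


def sha256_hex(message: str) -> str:
    """Hash a text message with SHA-256 and return the hex digest."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


original = sha256_hex("Send 10 coins to Alice")
repeated = sha256_hex("Send 10 coins to Alice")
altered = sha256_hex("Send 11 coins to Alice")

print(original == repeated)  # same message, same hash
print(original == altered)   # one character changed, different hash
print(differing_bits(original, altered), "of 256 bits differ")
```

The last line typically reports a figure close to 128, that is, about half of the output bits.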
Hashing secures data by providing certainty that it hasn’t been tampered with before being seen by the intended recipient. For example, if you downloaded a file containing sensitive information, you could run it through a hashing algorithm and compare the resulting hash to the one published by whoever sent you the data, provided that published hash reached you through a channel you trust. If the hashes don’t match, you can be certain that the file was altered or corrupted before you received it.
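The file check described above could look something like the following. This is a sketch under stated assumptions: the file name, its contents and the function names file_sha256 and matches_published_hash are invented for the example, and a temporary file stands in for a real download.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare a file's hash with the hex digest published by the sender."""
    return file_sha256(path) == published_hex.strip().lower()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "report.txt")  # hypothetical downloaded file
    with open(path, "wb") as handle:
        handle.write(b"quarterly figures")
    published = file_sha256(path)  # what the sender would publish
    intact = matches_published_hash(path, published)
    with open(path, "ab") as handle:
        handle.write(b"!")  # simulate a change made in transit
    tampered = matches_published_hash(path, published)

print(intact, tampered)
```

Appending a single byte is enough for the second check to fail.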
In blockchain, hashes are used to represent the current state of the world, or to be more precise, the state of a blockchain. As such, the input represents everything that has happened on a blockchain, so every single transaction up to that point, combined with the new data that is being added. What this means is that the output is based on, and therefore shaped by, all previous transactions that have occurred on a blockchain.
As mentioned, the slightest change to any part of the input results in a huge change to the output; in this lies the security of blockchain technology. Changing any record that has previously been written to a blockchain would change the hash of its block and of every block after it, so the stored hashes would no longer match. Such a change is practically impossible to hide given the transparent nature of blockchain, as it would need to be made in plain sight of the whole network.
The first block of a blockchain, known as the genesis block, contains transactions that, when combined and validated, produce a unique hash. This hash and all the new transactions being processed are then used as input to create a brand new hash, which is used in the next block in the chain. This means that each block links back to its previous block through its hash, forming a chain back to the genesis block, hence the name blockchain. In this way, transactions can be added securely as long as the nodes on the network are in consensus on what the hash should be.
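The chaining described above can be sketched in a few lines. The block layout, the JSON encoding, the all-zero placeholder for the genesis block's predecessor and the sample transactions are all assumptions made for illustration; real blockchains use their own binary formats.

```python
import hashlib
import json


def block_hash(previous_hash: str, transactions: list) -> str:
    """Hash the previous block's hash together with the new transactions."""
    payload = json.dumps(
        {"previous_hash": previous_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches: list) -> list:
    """Turn a list of transaction batches into a list of linked blocks."""
    chain = []
    previous_hash = "0" * 64  # placeholder: the genesis block has no predecessor
    for transactions in batches:
        current_hash = block_hash(previous_hash, transactions)
        chain.append(
            {
                "previous_hash": previous_hash,
                "transactions": transactions,
                "hash": current_hash,
            }
        )
        previous_hash = current_hash
    return chain


chain = build_chain([["genesis"], ["alice->bob:5"], ["bob->carol:2"]])
for block in chain:
    print(block["previous_hash"][:8], "->", block["hash"][:8])
```

Each printed line shows a block's hash becoming the previous_hash of the block that follows it.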
An Explanation of Data Structures
Data structures are specialized ways of storing data. The two that matter most here are pointers and linked lists. A pointer is a variable that stores the address of another variable and so points to its location. A linked list is a sequence of blocks (nodes) connected to one another through pointers: each node holds a pointer to the next node, the last node has no pointer, and the pointer to the first node lies outside the list itself. At its simplest, a blockchain is a linked list of recorded transactions in which each block points back to the one before it through a hash pointer, with the genesis block, which has no predecessor, pointing to nothing.
Hash pointers are where blockchain sets itself apart in terms of certainty, as they contain not only the address of the previous block but also the hash of that block’s data. As described earlier, this is the foundation of the secure nature of blockchain. For example, if an attacker wanted to change the data in the ninth block of a chain, they would also have to alter every following block, because each block’s hash depends on the hash of the block before it. In essence, this makes it practically impossible to alter any data that is recorded on a blockchain without being detected.
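The ninth-block example can be played out in code. The sketch below is a toy model, not a real blockchain: the record strings, the "|" separator and the helper names digest, make_chain and first_broken_link are assumptions chosen for the illustration.

```python
import hashlib


def digest(previous_hash: str, data: str) -> str:
    """Hash a block's contents: its hash pointer plus its data."""
    return hashlib.sha256((previous_hash + "|" + data).encode("utf-8")).hexdigest()


def make_chain(records: list) -> list:
    """Build blocks that each store the hash of the block before them."""
    blocks, previous_hash = [], "0" * 64
    for data in records:
        blocks.append({"previous_hash": previous_hash, "data": data})
        previous_hash = digest(previous_hash, data)
    return blocks


def first_broken_link(blocks: list):
    """Return the index of the first block whose hash pointer no longer
    matches its predecessor, or None if every link is intact."""
    for index in range(1, len(blocks)):
        parent = blocks[index - 1]
        expected = digest(parent["previous_hash"], parent["data"])
        if blocks[index]["previous_hash"] != expected:
            return index
    return None


blocks = make_chain([f"record {n}" for n in range(1, 13)])
before = first_broken_link(blocks)

blocks[8]["data"] = "forged record"  # tamper with the ninth block (index 8)
after_tamper = first_broken_link(blocks)

# Repairing the tenth block's pointer only moves the break one block along.
blocks[9]["previous_hash"] = digest(blocks[8]["previous_hash"], blocks[8]["data"])
after_repair = first_broken_link(blocks)

print(before, after_tamper, after_repair)
```

Tampering with the ninth block breaks the link held by the tenth, and patching the tenth breaks the eleventh, which is why an attacker would have to rewrite every block that follows.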
Hashing is one of the core fundamentals of the immutable and defining potential of blockchain technology. It preserves the authenticity of the data that is recorded and viewed and, as such, the integrity of a blockchain as a whole. It is one of the more technical aspects of the technology; however, understanding it is a solid step towards understanding how blockchain functions and the potential and value that it has.
A TXID is a transaction ID: a string of numbers and letters, produced by hashing transaction data (such as the amount being sent, the receiving address, the timestamp, etc.), that can be used to identify a transaction and confirm that it has happened.
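As a rough illustration of how an identifier like this can be derived, the sketch below hashes a made-up transaction twice with SHA-256, which is the approach Bitcoin takes with its serialized transactions. The function toy_txid, the field names and the JSON encoding are assumptions for this example only; real chains hash their own binary transaction format.

```python
import hashlib
import json


def toy_txid(transaction: dict) -> str:
    """Hash a transaction's fields into a 64-character hex identifier."""
    # Assumption: canonical JSON stands in for a real chain's binary encoding.
    serialized = json.dumps(transaction, sort_keys=True, separators=(",", ":"))
    first_pass = hashlib.sha256(serialized.encode("utf-8")).digest()
    return hashlib.sha256(first_pass).hexdigest()


# Hypothetical transaction data.
payment = {"amount": 5, "to": "hypothetical-address-1", "timestamp": 1700000000}
changed = dict(payment, amount=6)

print(toy_txid(payment))
print(toy_txid(changed) != toy_txid(payment))  # any changed field gives a new ID
```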
What are Merkle Trees?
A Merkle tree, otherwise called a hash tree, is a data structure of hashes used to record data onto a blockchain in a secure and efficient manner. The concept was patented by Ralph Merkle in 1979.
The system works by running a block of transactions through a hashing algorithm to generate a single hash that can be used to verify the validity of that data against the original transactions. An entire block of transactions is not run through a hash function at once. Rather, each transaction is hashed individually, the resulting hashes are paired and hashed together, and the process repeats on each new row of hashes. Eventually, this creates one hash for the entire block.
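The pairing process can be sketched as follows. This is a simplified illustration, not any particular chain's implementation: it uses a single SHA-256 pass, and when a row has an odd number of hashes it duplicates the last one (the convention Bitcoin uses, adopted here as an assumption). The transaction contents are placeholders.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list) -> bytes:
    """Hash each transaction, then pair and re-hash until one hash remains."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [sha256(tx) for tx in transactions]  # the leaves
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash on odd rows
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


transactions = [f"tx{n}".encode("utf-8") for n in range(1, 9)]
root = merkle_root(transactions)
print(root.hex())
```

Changing any one of the eight transactions produces a different root, since the change propagates up through every hash above it.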
When visualized, the structure resembles a tree, albeit a simplified one, as each block will normally contain hundreds, if not thousands, of transactions. Hashes on the bottom row are known as ‘leaves’, while middle hashes are referred to as ‘branches’, with the hash at the top being the ‘root’.
Merkle trees are especially useful as they allow anyone to confirm the validity of an individual transaction without having to download a whole blockchain. For instance, in a block of eight transactions numbered (1) to (8), as long as you have the root hash (12345678), you can confirm transaction (8) using only the hashes (7), (56) and (1234). Hashing (8) together with (7) gives (78), hashing (56) with (78) gives (5678), and hashing (1234) with (5678) should give (12345678). If the result matches the root hash recorded on the blockchain, transaction (8) is included in the block and has not been altered.
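The check on transaction (8) can be written out as a short sketch. It assumes a block of exactly eight placeholder transactions and a single SHA-256 pass per step; build_levels, merkle_proof and verify are hypothetical helper names, and the proof format (sibling hash plus which side it sits on) is one possible choice among several.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(transactions: list) -> list:
    """Return every row of the tree, leaves first.
    Assumes the number of transactions is a power of two."""
    levels = [[sha256(tx) for tx in transactions]]
    while len(levels[-1]) > 1:
        row = levels[-1]
        levels.append([sha256(row[i] + row[i + 1]) for i in range(0, len(row), 2)])
    return levels


def merkle_proof(levels: list, index: int) -> list:
    """Collect the sibling hash at each row, noting whether it sits on the left."""
    proof = []
    for row in levels[:-1]:
        sibling = index ^ 1
        proof.append((row[sibling], sibling < index))
        index //= 2
    return proof


def verify(transaction: bytes, proof: list, root: bytes) -> bool:
    """Recompute the path from a transaction up to the root."""
    running = sha256(transaction)
    for sibling_hash, sibling_is_left in proof:
        if sibling_is_left:
            running = sha256(sibling_hash + running)
        else:
            running = sha256(running + sibling_hash)
    return running == root


transactions = [f"transaction {n}".encode("utf-8") for n in range(1, 9)]
levels = build_levels(transactions)
root = levels[-1][0]

# Proof for transaction (8): the hashes (7), (56) and (1234).
proof = merkle_proof(levels, 7)
print(len(proof), verify(transactions[7], proof, root))
print(verify(b"forged transaction", proof, root))
```

Only three hashes are needed to confirm one transaction out of eight, and the number of hashes grows with the height of the tree rather than with the number of transactions.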
The Merkle root is normally contained in a block header along with:
- Hash of the previous block
- The block version number
- The current difficulty target
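To show why it matters that the Merkle root sits in the header, the sketch below hashes a toy header built from the fields listed above. The byte layout, the sample values and the function header_hash are assumptions made for illustration; Bitcoin's real header, for instance, also carries a timestamp and a nonce. Double SHA-256 is used here as Bitcoin does.

```python
import hashlib
import struct


def header_hash(version: int, previous_hash: bytes, merkle_root: bytes,
                difficulty_bits: int) -> str:
    """Hash a toy block header made of the four listed fields."""
    # Assumption: a simple little-endian layout, not a real chain's format.
    header = (
        struct.pack("<I", version)
        + previous_hash
        + merkle_root
        + struct.pack("<I", difficulty_bits)
    )
    return hashlib.sha256(hashlib.sha256(header).digest()).hexdigest()


# Hypothetical field values.
previous_hash = hashlib.sha256(b"previous block").digest()
merkle_root = hashlib.sha256(b"transactions in this block").digest()
forged_root = hashlib.sha256(b"transactions with one altered").digest()

honest = header_hash(1, previous_hash, merkle_root, 0x1D00FFFF)
forged = header_hash(1, previous_hash, forged_root, 0x1D00FFFF)
print(honest)
print(honest != forged)  # a different Merkle root gives a different block hash
```

Because the block's hash covers the Merkle root, altering any transaction changes the root, which changes the block hash, which in turn breaks the hash pointer held by the next block.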
Merkle trees and hashes are a key component in allowing blockchain technology to function whilst providing security, integrity and irrefutability and, alongside consensus protocols, are arguably the most important aspects of blockchain technology.