Is Checksum or CRC better for checking data written to USB flash drives?
This post is to give the general user an idea of what verification method is better for writing data to a flash drive. There are reference links at the bottom of this post which dive much deeper into the two methods of verification if this simple overview is not enough.
The short answer is that a Cyclic Redundancy Check, or CRC, is the better method for checking data written to a USB flash drive.
Many believe a checksum is the best method to verify data written to a flash drive (the most popular is MD5). I believe it is favored because it is easier to understand how the verification works, and it is also easier to implement. However, there are flaws in checksum verification, so it is not well suited to verifying data written to a flash drive.
What is the difference between Checksum and CRC verification? The checksum method uses addition in its math calculations to check whether all data was written correctly. CRC uses long division in its math calculations to check whether all data was written correctly. It is worth noting I am talking about binary long division, not the school-yard long division you so fondly remember.
Checksum methods add up the values in a packet of data and include that total with the data when it is sent over communication lines. The receiver then looks at the packet, reads the checksum value and performs the same calculation to make sure everything adds up. If the calculation on the receiver’s end matches the value passed in the packet, all is good. The problem is that somewhere between the sender and receiver the bits of data can be changed, corrupted or swapped yet still return a correct checksum value after calculation on the receiving end.
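To make that weakness concrete, here is a minimal sketch of a simple additive checksum. The function name `additive_checksum` and the sample data are made up for illustration, and the 8-bit sum is only one of several ways a checksum can be built.

```python
def additive_checksum(data: bytes) -> int:
    """Add up every byte value and keep only the low 8 bits (illustrative only)."""
    return sum(data) % 256


original = b"RED APPLE"

# Two bytes trade places in transit: same bytes, different order.
swapped = b"RED APPEL"

# One byte goes up by 1 and its neighbor goes down by 1: the total is unchanged.
offset = bytes([original[0] + 1, original[1] - 1]) + original[2:]

for received in (swapped, offset):
    data_changed = received != original
    checksum_matches = additive_checksum(received) == additive_checksum(original)
    print(received, "changed:", data_changed, "checksum still matches:", checksum_matches)
```

Both corrupted packets add up to the same total as the original, so a receiver relying only on this sum would accept them.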
In very simple terms, suppose you have a bin of oranges and red apples going from Los Angeles to New York. All the apples and oranges were counted first and written down on a manifest, then poured into the bin and shipped off. Along the way some of the red apples were replaced with green apples. When the bin arrived in New York, the total number of apples and oranges remained the same after the final count; however, the receiver never knew it was supposed to be only red apples.
Cyclic Redundancy Checking is about as straightforward as addition, but it uses long division instead. The advantage of this approach is that every bit in the packet, and its position, affects the result, rather than the verification depending on one lump sum. As with traditional long division, binary division works through the dividend one digit at a time using the divisor.
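As a quick illustration of why position matters, the sketch below compares a simple byte sum with the CRC-32 function from Python's standard `zlib` module on two packets that contain the same bytes in a different order. The sample data is invented for the example, and CRC-32 is just one common CRC variant.

```python
import zlib

original = b"RED APPLE"
swapped = b"RED APPEL"  # same bytes, last two in a different order

# A plain sum cannot tell the two packets apart.
print("sums match:", sum(original) % 256 == sum(swapped) % 256)

# CRC-32 depends on where each bit sits, so the two values differ.
print("CRCs match:", zlib.crc32(original) == zlib.crc32(swapped))
```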
Since the CRC method is a bit more complicated to explain, let me first touch on the apples and oranges example (before you fall asleep or click off the page). Let’s say the same bin of apples and oranges is going from Los Angeles to New York, but this time the apples and oranges are placed on trays, similar to an egg carton. The trays are stacked nice and neat, and the manifest lists how many apples, the color of the apples, how many oranges and how many layers of trays hold all the fruit. When the bin of apples and oranges is received in New York, the receiver can easily check the number of layers in the bin and verify the fruit against what is described in the manifest.
Cyclic Redundancy Check calculations are long division calculations on a packet of information, with just a slight bit of tweaking. To start the calculation, a divisor number is set or “given.” Next, you append zeros to the end of the packet, one fewer than the number of digits in the divisor. So for example, if the divisor number is 5 digits long, you would add four zeros at the end of the packet. The binary long division then starts. Once the binary long division is done, there will be a remainder number. Here is where the check comes in. If the remainder is swapped in for the four zeros mentioned above (from when the calculation first started), then when the long division is run again there is no remainder (the remainder is zero). Using this method means all bits in a packet are examined during the binary long division process after the data packet is received.
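The steps above can be sketched in a few lines of code. This is a teaching sketch that works on strings of "0" and "1" rather than a production CRC; the function names, the 10-bit message and the 5-digit divisor `10011` are all chosen for illustration.

```python
def mod2_remainder(bits: str, divisor: str) -> str:
    """Binary long division where each subtraction step is an XOR; returns the remainder."""
    work = list(bits)
    n = len(divisor)
    for i in range(len(bits) - n + 1):
        if work[i] == "1":  # the divisor "goes into" this position
            for j in range(n):
                work[i + j] = "0" if work[i + j] == divisor[j] else "1"
    return "".join(work[-(n - 1):])


def crc_encode(message: str, divisor: str) -> str:
    """Append zeros, divide, then swap the remainder in for the zeros."""
    padded = message + "0" * (len(divisor) - 1)
    return message + mod2_remainder(padded, divisor)


def crc_check(packet: str, divisor: str) -> bool:
    """Run the division again on the received packet; a zero remainder means it checks out."""
    return "1" not in mod2_remainder(packet, divisor)


divisor = "10011"       # 5 digits long, so four zeros are appended
message = "1101011011"

packet = crc_encode(message, divisor)
print(packet)                      # 11010110111110
print(crc_check(packet, divisor))  # True

# Flip one bit to imitate noise on the line.
damaged = packet[:3] + ("1" if packet[3] == "0" else "0") + packet[4:]
print(crc_check(damaged, divisor))  # False
```

The remainder here is `1110`, which replaces the four appended zeros. Dividing the finished packet by the same divisor leaves nothing over, while the damaged packet leaves a non-zero remainder.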
Probably the easiest way to understand this would be with a visual aid. Study the two charts below, then it will make sense.
Verification methods are typically employed for data transmission because of “noise” on the transmission lines, not because a hacker or some other outside party is trying to tweak the data. Noise on a transmission line can have many causes, such as poor design of the electrical part (say, the printed circuit board of a flash drive), improper grounding (most flash drives use a two-layer PCB, but the specification for USB is a four-layer PCB as a minimum) or poor quality materials used in the device. By using dependable verification methods, the receiver can more reliably determine whether all the data was sent and received correctly.
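A rough way to picture noise is a single bit flipping somewhere in the data. The sketch below flips every bit of a sample payload, one at a time, and counts how many of those flips CRC-32 fails to notice. The helper `flip_bit` and the payload are hypothetical, and real line noise can of course damage more than one bit at once.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


payload = b"data written to a flash drive"
expected = zlib.crc32(payload)

# Try a one-bit error at every position and record any that slip through.
missed = [
    i for i in range(len(payload) * 8)
    if zlib.crc32(flip_bit(payload, i)) == expected
]
print("single-bit errors not detected:", len(missed))  # 0
```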
Source: Thank you Nexcopy Inc. engineers for taking a technical topic and presenting in a non-technical way.
Binary Long Division
CRC definitions via Wikipedia
More technical definition + code for CRC in C and C++