The Hamming distance is a measure for comparing two strings of equal length. It is denoted by d(a,b), where a and b are the two strings, and is defined as the number of positions at which their corresponding symbols differ.
For example, the Hamming distance between:
“karolin” and “kathrin” is 3 (the differing symbols are r-t, o-h, and l-r),
1011100 and 1001000 is 2 (the differing bits are 1-0 and 1-0), and
31738 and 32337 is 3 (the differing digits are 1-2, 7-3, and 8-7).
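The definition translates almost directly into code. The following is a minimal Python sketch, assuming the inputs are given as strings (or any other equal-length sequences); the function name hamming_distance is illustrative and not taken from a particular library.

```python
def hamming_distance(a, b):
    """Return the number of positions at which equal-length sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for sequences of equal length")
    # Pair up the symbols position by position and count the mismatches.
    return sum(x != y for x, y in zip(a, b))


# The three examples from the text, with each value written as a string of symbols.
print(hamming_distance("karolin", "kathrin"))  # 3
print(hamming_distance("1011100", "1001000"))  # 2
print(hamming_distance("31738", "32337"))      # 3
```

The length check matters: without it, zip would silently ignore the extra symbols of the longer input and return a misleading count.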
The larger the Hamming distance, the more positions at which the two strings disagree, and therefore the more dissimilar they are.
This metric is named after the American mathematician Richard Hamming, who introduced it in 1950 in his foundational paper on Hamming codes. It is used in several disciplines, including information theory, coding theory, and cryptography.
The Hamming distance has proven useful in many problems whose solution involves comparing two strings.
For example, in coding theory, it is used in error-detecting and error-correcting codes. In telecommunications, it is used to count the number of changed bits in fixed-length binary words and then compare that count with a tolerated error level. And in genetics, it is used as a measure of genetic difference.
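For fixed-length binary words, the count of changed bits can be computed with a bitwise XOR followed by a count of the 1 bits. The Python sketch below illustrates this, together with the related coding-theory idea of a code's minimum distance. It assumes words are represented as Python integers; the names bit_errors, within_tolerance, max_errors, and minimum_distance are hypothetical, and the tolerance check is only an illustration of comparing the count with a tolerated error, not a specific protocol.

```python
def bit_errors(sent: int, received: int) -> int:
    """Count the bit positions in which two fixed-length binary words differ."""
    # XOR sets a 1 exactly where the two words disagree; counting those 1s
    # gives the Hamming distance between the words.
    return bin(sent ^ received).count("1")


def within_tolerance(sent: int, received: int, max_errors: int) -> bool:
    """Hypothetical check: accept the word if at most max_errors bits changed."""
    return bit_errors(sent, received) <= max_errors


def minimum_distance(codewords):
    """Smallest Hamming distance between any two distinct codewords of a code."""
    words = list(codewords)
    return min(
        bit_errors(words[i], words[j])
        for i in range(len(words))
        for j in range(i + 1, len(words))
    )


sent, received = 0b1011100, 0b1001000
print(bit_errors(sent, received))                      # 2
print(within_tolerance(sent, received, max_errors=1))  # False

# The 3-bit repetition code {000, 111} has minimum distance d = 3, so it can
# detect up to d - 1 = 2 bit errors and correct up to (d - 1) // 2 = 1.
d = minimum_distance([0b000, 0b111])
print(d, d - 1, (d - 1) // 2)  # 3 2 1
```

The last lines reflect a standard coding-theory result: a code whose minimum Hamming distance is d can detect up to d − 1 bit errors and correct up to ⌊(d − 1)/2⌋ of them.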
In machine learning, some supervised and unsupervised learning algorithms use it to measure the similarity or dissimilarity between data points.
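One common supervised example is nearest-neighbour classification over binary (or categorical) feature vectors. The Python sketch below assumes a k-nearest-neighbours classifier with majority voting; the function names, the toy training data, and its labels are all hypothetical and serve only to show where the Hamming distance plugs in.

```python
from collections import Counter


def hamming_distance(a, b):
    """Number of positions at which two equal-length feature vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Predict a label for query by majority vote among its k nearest training rows.

    train is a list of (feature_vector, label) pairs.
    """
    # Sort the training examples by Hamming distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda row: hamming_distance(row[0], query))[:k]
    # Majority vote over the labels of those nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: binary feature vectors with made-up labels.
train = [
    ((1, 1, 0, 0, 1), "A"),
    ((1, 1, 0, 1, 1), "A"),
    ((0, 0, 1, 1, 0), "B"),
    ((0, 1, 1, 1, 0), "B"),
]
print(knn_predict(train, (1, 0, 0, 0, 1), k=3))  # A
```

Here the query differs from the two "A" rows in 1 and 2 positions and from the "B" rows in 4 and 5, so two of its three nearest neighbours vote "A". When votes tie, Counter.most_common returns the label encountered first, which is one simple tie-breaking choice among several.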
Hamming distances are the right tool to solve many problems that involve comparing different strings. As such, they are employed in several of the calculations performed by LogicPlum’s platform.
However, because all these operations are performed automatically, users don’t need to be experts in coding theory; they only need to understand the results presented by the resulting model.
For those interested in the original paper by Richard Hamming:
© 2020 LogicPlum, Inc. All rights reserved.