Measure and Compare Entropy
Topic Started: Dec 2 2015, 05:25 PM
Karl-Uwe Frank
Uploaded a tiny routine to measure and compare the entropy of values from a random source, given once as a text string and once as a hex string.
Code:
Text String to check:  a78b60e5d2d61a4b90e91f169f3f75a2
----------------------------------------------------------
Entropy = 3,819548827786958

Hex String to check:  0xa78b60e5d2d61a4b90e91f169f3f75a2
----------------------------------------------------------
Entropy = 4,000000000000000
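For reference, the calculation is plain per-symbol Shannon entropy; a quick sketch in Python reproduces the numbers above (just an illustration, the actual tool is written in PureBasic):
Code:
from collections import Counter
from math import log2

def shannon_entropy(symbols):
    # H = -sum(p * log2(p)) over the relative frequency p of each distinct symbol
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in counts.values())

s = "a78b60e5d2d61a4b90e91f169f3f75a2"
print(shannon_entropy(s))                 # ~3.8195 (counted per text character)
print(shannon_entropy(bytes.fromhex(s)))  # 4.0 (counted per byte; all 16 bytes distinct)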
If the hex value gets longer, the difference in entropy between the text string and the hex value increases noticeably.
Code:
Text String to check:  0b23fe039a61b72fd0cb4547b209eb9b3b98cbe74527a8f86d20e14601d953486438810fed8722c06bc517fcc6f52640c432c089c43cc77c990c38d99f4b83ce
----------------------------------------------------------
Entropy = 3,898951907252214

Hex String to check:  0x0b23fe039a61b72fd0cb4547b209eb9b3b98cbe74527a8f86d20e14601d953486438810fed8722c06bc517fcc6f52640c432c089c43cc77c990c38d99f4b83ce
----------------------------------------------------------
Entropy = 5,812500000000000
The source code for the entropy check can be downloaded here: http://www.freecx.co.uk/utils/

In order to compile it you need the PureBasic Demo for Windows, Linux or Mac OS X, which you can download for free at
http://www.purebasic.com/download.php

Cheers,
Karl-Uwe


P.S.: Just figured out that the map has to be reset before the next test run. Uploaded the fixed source code.
 
Karl-Uwe Frank
Uploaded an additional version which compiles into an EXE that shows a popup window, for those users not familiar with the command line on Windows.

http://www.freecx.co.uk/utils/entropycheck_window.pb

Cheers,
Karl-Uwe
 
Karl-Uwe Frank
Regarding PureBasic: it is truly cross-platform (currently only missing ARM support), it generates tiny executables that come close to the execution speed of C/C++, and it offers a nice selection of built-in crypto functions such as AES, SHA-1, SHA-2 and SHA-3:

http://www.purebasic.com/documentation/cipher/index.html

Cheers,
Karl-Uwe
 
Karl-Uwe Frank
Uploaded the source code of another utility, which visualises the entropy of a given keyword against that of the keyword's hash.

http://www.freecx.co.uk/utils/entropy_check_passwd.pb

Cheers,
Karl-Uwe
 
mok-kong shen
Karl-Uwe Frank
Dec 2 2015, 05:25 PM
If the hex value gets longer, the difference in entropy between the text string and the hex value increases noticeably.
I don't yet understand. Are the text string and the hex value related in some particular way? If yes, how? (If not, their entropy contents would certainly be totally independent of each other.)
 
Karl-Uwe Frank
mok-kong shen
 
Karl-Uwe Frank
Dec 2 2015, 05:25 PM
If the hex value gets longer, the difference in entropy between the text string and the hex value increases noticeably.
I don't yet understand. Are the text string and the hex value related in some particular way? If yes, how? (If not, their entropy contents would certainly be totally independent of each other.)
The text and the hex string are related in the characters used but differ in their notation, as the hex notation carries a leading "0x". It's just a fun project, basically nothing more than a tool that calculates and visualises how much more entropy is available if we use a given string in hex format instead of as a text string.
 
mok-kong shen
Some further questions: One has a certain given alphabet and a given (commonly relatively short) string in a field of application. Should one assume that the frequency distribution of that one string is representative of that field? If yes, how certain can one be of that assumption in practice? If no, how would you compute the entropy value? Now, changing the notation from text to hex would essentially change the frequency distribution; consider e.g. the extreme case of changing to binary representation, in which case one has only 2 frequency values. I am afraid (though not at all sure) that there might be some problems even on the conceptual level.
 
Karl-Uwe Frank
mok-kong shen
 
... e.g. the extreme case of changing to binary representation, in which case one has only 2 frequency values.
Here you answered the question yourself.

The main difference between giving a keyword in text or in hex notation is that with hex notation you automatically expand the range of possible symbols from the small alphabet 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F to the full byte range 0x00..0xFF. Because the entropy is calculated over the complete range of the given alphabet, the larger that alphabet is, the greater the entropy.
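As a worked bound from Shannon's formula: for N equiprobable symbols the entropy is H = log2(N), so hex text is capped at log2(16) = 4 bits per character while raw bytes are capped at log2(256) = 8 bits per byte. (A sample of n symbols can also never measure more than log2(n), which is why the 16-byte example in the first post reads exactly 4,0 and the 64-byte one stays below log2(64) = 6.)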

If you extend the value #RNDvalByte = 64 up to #RNDvalByte = 160 in the available example http://www.freecx.co.uk/utils/entropycheck_window.pb and run a test, it perhaps becomes more obvious, because the entropy of the hex notation increases to nearly double that of the text notation. Just try it yourself and look at the results.
 
mok-kong shen
I suppose you misunderstood me. If a binary sequence is interpreted as a hex sequence, one has a frequency distribution of 16 different values, while in the binary case there are only two frequency values. My question is whether it could happen that in the hex case, due to the more detailed consideration, the calculated entropy would differ from the binary case, where there is sort of a lumping together to deal only with 0 and 1. Could you also answer the first question of my last post?
 
Karl-Uwe Frank
mok-kong shen
 
If a binary sequence is interpreted as a hex sequence, one has a frequency distribution of 16 different values, while in the binary case there are only two frequency values.
A binary sequence interpreted as a hex sequence has 256 different values, not 16.
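To make the effect of the grouping visible, here is a quick sketch in Python (just an illustration of the calculation, not my PureBasic tool) that measures the same random data as bits, as hex digits and as bytes:
Code:
import os
from collections import Counter
from math import log2

def H(symbols):
    # per-symbol Shannon entropy in bits
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in counts.values())

data = os.urandom(4096)                   # 4096 random bytes
bits = "".join(f"{b:08b}" for b in data)  # the same data as a 0/1 string
print(H(bits))        # ~1 bit per symbol  (alphabet of 2)
print(H(data.hex()))  # ~4 bits per symbol (alphabet of 16 hex digits)
print(H(data))        # ~8 bits per symbol (alphabet of up to 256 byte values)
Per symbol, the measured entropy simply tracks log2 of the alphabet size used for the grouping.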
 
mok-kong shen
Karl-Uwe Frank
Feb 22 2016, 04:03 PM
A binary sequence interpreted as a hex sequence has 256 different values, not 16.
See https://en.wikipedia.org/wiki/Hexadecimal
 
Karl-Uwe Frank
Basically, all information in a computer is stored as bits having the value 0 or 1.

In practice, all information handled or stored is treated as bytes (eight bits each).

A byte ranges from 0 to 255 in decimal, 0x00 to 0xff in hexadecimal, or 00000000 to 11111111 in binary notation.

Almost every program handles its input and output as bytes.

Because there are at most 256 different byte values available to handle or store information, it doesn't matter how the information is organised: it's all a row of bytes.

If we check the entropy of a file or of any other information, we can calculate it over the whole range of 256 byte values.

Whether we check a JPG, a movie file, a text file or whatever, it always consists of at most 256 different byte values.

If you interpret the hex value of a row of bytes as text characters, you lose context and entropy, because you limit the maximum available character range to only 16 values instead of 256!

The hex text string

2b f3 22 a7 ...

has the hex byte notation

0x32 0x62 0x66 0x33 0x32 0x32 0x61 0x37 ...

which means that you limit the maximum available character range to

0x30 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 0x39 0x61 0x62 0x63 0x64 0x65 0x66

instead of using the full range from 0x00 to 0xff:

2b = 43
f3 = 243
22 = 34
a7 = 167
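You can verify this quickly in Python (illustration only):
Code:
hexstr = "2bf322a7"
print([hex(ord(c)) for c in hexstr])  # text view: ['0x32', '0x62', '0x66', '0x33', '0x32', '0x32', '0x61', '0x37']
print(list(bytes.fromhex(hexstr)))    # byte view: [43, 243, 34, 167]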

mok-kong shen
One could certainly take an arbitrary number of bits as a group/unit for one's (particular) study of frequency distributions. The byte of 8 bits is a unit only because current hardware uses it as the unit of addressing. In the past, for example, the computers of CDC had 6-bit bytes.
 
Karl-Uwe Frank
mok-kong shen
 
One could certainly take arbitrary number of bits as a group/unit for one's (particular) study of frequency distributions.
Yes, of course, anyone could build up a personal set of characters based on any desired number of bits. But in cryptography, and especially when it comes to high-quality randomness, entropy is measured in bytes of 8 bits each. Even a file of true randomness gathered from a radioactive decay source will consist of a row of 8-bit bytes in the range 0x00..0xFF. And this holds even for a 64-bit or a 128-bit CPU; all in all it's always a bunch of 8-bit bytes. And yes, in the past there were computers handling different byte sizes, as per your example. But for the last few decades it has been 8 bits per byte, from the Arduino to the newest mobile processor. So currently there is little reason to calculate entropy on anything other than the byte "alphabet".

According to Wikipedia:
"The idea is that the more different letters there are, and the more equal their proportional abundances in the string of interest, the more difficult it is to correctly predict which letter will be the next one in the string. The Shannon entropy quantifies the uncertainty (entropy or degree of surprise) associated with this prediction."

Clearly, having only 0 and 1 leaves a 50% chance of predicting the next outcome.

Which in simple words means: a small alphabet with only 16 or 26 different characters cannot hold as much entropy per character as a 256-character "alphabet".
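The per-character ceilings make this concrete; in Python (plain math, illustration only):
Code:
from math import log2
for n in (2, 16, 26, 256):
    print(n, log2(n))  # maximum bits per character: 1.0, 4.0, ~4.70, 8.0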
 
mok-kong shen
Karl-Uwe Frank
Feb 24 2016, 12:50 AM
in cryptography, and especially when it comes to high-quality randomness, entropy is measured in bytes of 8 bits each.
Doesn't that sound like an assertion of the genre found e.g. in politics? Note that Shannon's formula is general in the sense that the cardinality of the set of symbols under consideration is arbitrary (i.e. 2 or more).