Measure and Compare Entropy
Topic Started: Dec 2 2015, 05:25 PM (1,232 Views)
Karl-Uwe Frank
Uploaded a tiny routine that measures and compares the entropy of values from a random source, given either as a text string or as a hex string.
Code:

Text String to check:  a78b60e5d2d61a4b90e91f169f3f75a2
----------------------------------------------------------
Entropy = 3,819548827786958

Hex String to check:  0xa78b60e5d2d61a4b90e91f169f3f75a2
----------------------------------------------------------
Entropy = 4,000000000000000
If the hex value gets longer, the difference in entropy between the text string and the hex value increases noticeably.
Code:

Text String to check:  0b23fe039a61b72fd0cb4547b209eb9b3b98cbe74527a8f86d20e14601d953486438810fed8722c06bc517fcc6f52640c432c089c43cc77c990c38d99f4b83ce
----------------------------------------------------------
Entropy = 3,898951907252214

Hex String to check:  0x0b23fe039a61b72fd0cb4547b209eb9b3b98cbe74527a8f86d20e14601d953486438810fed8722c06bc517fcc6f52640c432c089c43cc77c990c38d99f4b83ce
----------------------------------------------------------
Entropy = 5,812500000000000
The source code for the entropy check can be downloaded from here: http://www.freecx.co.uk/utils/

In order to compile it you need the PureBasic Demo for Windows, Linux or Mac OS X, which you can download for free at
http://www.purebasic.com/download.php
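
For anyone who would rather follow along without PureBasic, here is a minimal Python sketch of the same kind of Shannon calculation (my own illustration, not the uploaded source; the helper name shannon_entropy is made up). It should reproduce the 3,82 and 4,00 values above, apart from the decimal separator:

Code:
import math
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy in bits per symbol of a character or byte sequence."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

s = "a78b60e5d2d61a4b90e91f169f3f75a2"
print(shannon_entropy(s))                 # as text: each hex digit is one symbol
print(shannon_entropy(bytes.fromhex(s)))  # as hex value: each byte is one symbol -> 4.0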

Cheers,
Karl-Uwe


P.S.: Just noticed that the map has to be reset before the next test run. Uploaded the fixed source code.
Edited by Karl-Uwe Frank, Dec 2 2015, 07:37 PM.
 
Replies:
Karl-Uwe Frank
mok-kong shen
 
But a character in any text file occupies a byte, which is two hex digits.
Yes, a text character usually occupies one byte - but that is one hex value, not two! It has meanwhile occurred to me that this is a basic misunderstanding of yours.

=> A byte is always one hex value in the range 0x00..0xFF. <=

If, of course, you interpret a hex value as a text string, then every hex value occupies two bytes, simply because every hex value written out as text consists of two characters in the range 0..F.
And it is a grave error to think that it must therefore have more entropy. I have already explained in several posts why not, for example in Post #36.
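
The distinction is easy to check in Python (a small sketch of my own; bytes.fromhex turns each pair of hex digits into a single byte):

Code:
s = "a7"                      # one hex value, written out as text
print(len(s))                 # 2 -> as a text string it occupies two characters
print(len(bytes.fromhex(s)))  # 1 -> as a value it is the single byte 0xA7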

Quote:
 
You could anyway try and see the result of processing pieces of not-too-short normal texts.
Just to do you a favour, I have tested the following text passage from Wikipedia for its entropy, first as a pure text file and then the encrypted version of this file (which is therefore pure binary).

(Text passage from Wikipedia, collapsed in the original post)


As expected, the result for the text reads

Entropy = 4.335508 bits per byte.

while that of the encrypted text reads

Entropy = 7.926975 bits per byte.
 
mok-kong shen
Certainly I have no real idea at all, but each byte has its leading bit 0, which is a fairly strong regularity, and hence I had intuitively expected a lower entropy value per byte to come out.
 
Karl-Uwe Frank
A text file with natural language will always have a lower entropy - I suppose always lower than 5 - because natural language uses a limited character set, while a properly encrypted file covers the full byte range and will therefore very quickly approach the maximum possible value of 8.

To demonstrate this, I have tested a text file which you can download here: http://norvig.com/big.txt (6.5 MB)

The result for the plain text file is
big.txt
Entropy = 4.511148 bits per byte.

while the encrypted version of this text file gives
big.txt.zx8
Entropy = 7.999970 bits per byte.
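
For reference, a short Python sketch along the same lines (an illustration only; the file_entropy helper is hypothetical, the file names are the ones above) reproduces this kind of measurement:

Code:
import math
from collections import Counter

def file_entropy(path):
    """Shannon entropy in bits per byte of a file's contents."""
    data = open(path, "rb").read()
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

print(file_entropy("big.txt"))      # natural language: roughly 4.5 bits per byte
print(file_entropy("big.txt.zx8"))  # encrypted: close to the maximum of 8 bits per byte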

mok-kong shen
 
Certainly I have no real idea at all, but each byte has its leading bit 0, which is a fairly strong regularity, and hence I had intuitively expected a lower entropy value per byte to come out.
This will of course occur if we treat hex values as a text string, because in that case we limit the range of available characters to just 16!

 
Karl-Uwe Frank
For those who would like to play around a bit with entropy calculation online, here is a nice website:

http://planetcalc.com/2476/

If you would like to compare the results my example program generates, just paste this test string into the form field (but remove any newlines first):

d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad3406
09f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5b
ae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b670
80a80d1ec69821bcb6a8839396f9652b6ff72a70

Uncheck "Ignore case", check "Ignore space", and then click "Calculate".

You may also paste the same value in as pure hex, in order to verify that this gives a higher entropy.
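
To cross-check the website offline, a Python sketch like this (my illustration; H is just Shannon's formula over symbol frequencies) computes the entropy of the test string once as text and once as raw bytes:

Code:
import math
from collections import Counter

def H(data):
    # Shannon entropy in bits per symbol, from character or byte frequencies
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

hex_str = ("d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad3406"
           "09f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5b"
           "ae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b670"
           "80a80d1ec69821bcb6a8839396f9652b6ff72a70")
print(H(hex_str))                 # as text: at most log2(16) = 4 bits per character
print(H(bytes.fromhex(hex_str)))  # as raw bytes: noticeably higher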
Edited by Karl-Uwe Frank, Feb 29 2016, 03:26 AM.
 
mok-kong shen
I tried that site with the two natural-language words "variable" and "property" and got only 2.75 and 2.5. Is that ok? (cf. the value 4.51... you got from a text.)
Edited by mok-kong shen, Feb 29 2016, 02:02 PM.
 
Karl-Uwe Frank
mok-kong shen
 
I tried that site with two natural language words "variable" and "property" and got 2.75 and 2.5 only. Is that ok?
Yes, that's what I get as well:

pypy checkentropy.py "variable"

Text string to check = variable
----------------------------------------------------------
Entropy = 2.75

pypy checkentropy.py "property"

Text string to check = property
----------------------------------------------------------
Entropy = 2.5

mok-kong shen
 
(cf. the value 4.51... you got from a text.)
You have tested fairly short strings. The longer the test string, the higher the result tends to be, because a longer text is more likely to include every character of a given set; see the sketch below.
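
A quick Python sketch (my illustration, with the same Shannon helper as before) shows the effect of string length:

Code:
import math
from collections import Counter

def H(s):
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

print(H("variable"))  # 2.75, matching the website
print(H("property"))  # 2.5
print(H("the quick brown fox jumps over the lazy dog"))  # higher: many more distinct characters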
 
mok-kong shen
As is apparent from my earlier comments, I have no practical experience with outputs from software implementing Shannon's formula. It seems to me that short sequences always give small entropy values, which are apparently of no use for judging randomness, and that in order to get useful results there should be non-zero frequencies across the whole range of a byte, which would translate to a minimum sequence length on the order of 1K or perhaps more, I surmise. Further, assuming the sequence has sufficient length, I would like to know how one is to judge in practice whether a sequence is sufficiently random, i.e. is an entropy value like 7.5 good enough, or would one need something like 7.995? Could you please say something from your experience?

I think there is something fundamentally unsatisfying about the entropy measure. You earlier got the value 4.51... from a natural language text. If one sorts that text, i.e. puts all the a's in front of all the b's etc., that value would not change, since the formula counts only the frequencies of the characters and does not take their positional relations into account. However, the sorted text is evidently more "regular", or less "random", than the original. One could perhaps take two characters (bytes) as the unit and compute an analogous measure from the formula, but evidently the problem could at best be weakened by this, not fundamentally removed.
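
This order-invariance is easy to confirm with a small Python sketch (an illustration only; H applies Shannon's formula to character frequencies):

Code:
import math
from collections import Counter

def H(s):
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

text = "measure and compare entropy"
print(H(text))                   # entropy of the original text
print(H("".join(sorted(text))))  # identical value for the sorted text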
Edited by mok-kong shen, Mar 2 2016, 08:42 PM.
 
Karl-Uwe Frank
mok-kong shen
 
As is apparent from my earlier comments, I have no practical experience with outputs from software implementing Shannon's formula. It seems to me that short sequences always give small entropy values, which are apparently of no use for judging randomness, and that in order to get useful results there should be non-zero frequencies across the whole range of a byte, which would translate to a minimum sequence length on the order of 1K or perhaps more, I surmise. Further, assuming the sequence has sufficient length, I would like to know how one is to judge in practice whether a sequence is sufficiently random, i.e. is an entropy value like 7.5 good enough, or would one need something like 7.995? Could you please say something from your experience?
In my experience any entropy value greater than 5 is fine, because natural language will not reach 5, as far as I understand and have tested.

For example, the above quoted text of yours gives

File size in bytes: 763
Shannon entropy (min bits per byte-character): 4,336011982


While the 512-bit SHA-2 hash of that text passage
ad6141c5824b7d9f669dc54706c9e149bac18db0a462453d07d08b39d363dc9c5fa2191b2d2174bcb6baef7b2421d939bc6604f502f30621768c39f3fd2e29c5
gives

File size in bytes: 64
Shannon entropy (min bits per byte-character): 5,620864648



mok-kong shen
 
If one sorts that text, i.e. puts all the a's in front of all the b's etc., that value would not change, since the formula counts only the frequencies of the characters and does not take their positional relations into account.
That's simply how this entropy is calculated. And as I have said so often, natural language has no place in cryptography, other than perhaps as plaintext.

As also mentioned often, one should only use the hash of a keyword, never the keyword itself, for initialising a cipher.
But the keyword should still have sufficient entropy of its own.

Which brings us to another point.

A good keyword that a user can type on a regular keyboard should have an entropy greater than 3.5, like this one for example:

Enter Passphrase: §4sdJMA_LAis/&67zzds§!=:l(&%hashA.

File size in bytes: 35
Shannon entropy (min bits per byte-character): 4,422000517




while a simpler one like the one below has a somewhat lower value:

Enter Passphrase: 0h8zPT9pug1TFiHt

File size in bytes: 16
Shannon entropy (min bits per byte-character): 3,875


But just swapping some characters for special characters lifts the entropy nicely:

Enter Passphrase: 0h8#PT9%ug1T@iHtaz

File size in bytes: 18
Shannon entropy (min bits per byte-character): 4,05881389



But as you have already noticed, all that counts is the diversity of the characters used in relation to the length of the string. This becomes very clear if you compare the entropy of the larger text passage of 763 bytes against the complex keyword of only 35 bytes, which offers a slightly higher entropy than the text.

Therefore, in my opinion, an entropy value greater than 3.5 for the typed-in keyword is evidently the lowest acceptable bound. Anything below that should be rejected by the encryption software with an alert to the user. The hash later used to seed the encryption routine should then not fall below 5.
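
A minimal sketch of such a gate in Python (my illustration; the 3.5 threshold is the bound proposed above, and SHA-512 is one possible choice of hash):

Code:
import hashlib
import math
from collections import Counter

def H(data):
    # Shannon entropy in bits per symbol, from character or byte frequencies
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

passphrase = input("Enter Passphrase: ")
if H(passphrase) <= 3.5:
    raise SystemExit("Passphrase rejected: entropy too low, use more distinct characters.")

# Seed the cipher from the hash of the keyword, never from the keyword itself.
seed = hashlib.sha512(passphrase.encode()).digest()
print(H(seed))  # the 64-byte digest should come out well above 5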
Edited by Karl-Uwe Frank, Mar 3 2016, 02:49 AM.
 
mok-kong shen
For statistical tests there are confidence intervals, so one knows what the obtained values signify. By comparison, the computed entropy values seem fuzzy to me.
 