| Measure and Compare Entropy | |
|---|---|
| Topic Started: Dec 2 2015, 05:25 PM (1,232 Views) | |
| Karl-Uwe Frank | Dec 2 2015, 05:25 PM Post #1 |
Uploaded a tiny routine that measures and compares the entropy of given values from a random source, interpreted once as a text string and once as a hex value. The longer the hex value gets, the more noticeably the resulting entropy of the text-string interpretation and the hex-value interpretation diverge. The source code for the entropy check can be downloaded from here http://www.freecx.co.uk/utils/ In order to compile it you need the PureBasic Demo for Windows, Linux or Mac OS X, which you can download for free at http://www.purebasic.com/download.php Cheers, Karl-Uwe P.S.: Just noticed that the map has to be reset before the next test run. Uploaded the fixed source code. Edited by Karl-Uwe Frank, Dec 2 2015, 07:37 PM.
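The linked download is PureBasic source; purely as an illustration of what such a check computes, the per-symbol Shannon entropy can be sketched in a few lines of Python (the function name `shannon_entropy` is mine, not from the linked code):

```python
from collections import Counter
from math import log2

def shannon_entropy(data):
    """Per-symbol Shannon entropy (bits) of a str or bytes sequence."""
    n = len(data)
    return -sum(c / n * log2(c / n) for c in Counter(data).values())

print(shannon_entropy("variable"))  # 2.75 bits per character
```

The formula only looks at symbol frequencies, so a `Counter` over the input is all the state it needs.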
| Replies: | |
|---|---|
| Karl-Uwe Frank | Feb 28 2016, 08:09 PM Post #46 |
Yes, a text character mostly occupies one byte - but that is one hex value, not two! It has meanwhile occurred to me that this is a basic misunderstanding of yours. => A byte is always one hex value in the range 0x00..0xFF. <= If, of course, you interpret a hex value as a text string, then every byte occupies two bytes, simply because every hex value written as text consists of two characters in the range 0..F. And it is a grave error to think that it must therefore have more entropy, which I have already explained in several posts, Post #36 for example. Just to do you a favour I have tested the following text passage from Wikipedia for its entropy, first as a pure text file and then as the encrypted version of that file (pure binary, therefore). Click to view: Text passage from Wikipedia. Logically, the result for the text reads Entropy = 4.335508 bits per byte. while that of the encrypted text reads Entropy = 7.926975 bits per byte.
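The byte-versus-hex-text point can be made concrete with a short sketch (a hypothetical helper, not the program from this thread): writing all 256 byte values out as hex text doubles the length and halves the maximum per-symbol entropy, because only 16 distinct symbols remain.

```python
from collections import Counter
from math import log2

def shannon_entropy(data):
    n = len(data)
    return -sum(c / n * log2(c / n) for c in Counter(data).values())

raw = bytes(range(256))   # every byte value 0x00..0xFF exactly once
hex_text = raw.hex()      # the same data written out as hex characters

print(len(raw), len(hex_text))    # 256 512 - two hex characters per byte
print(shannon_entropy(raw))       # 8.0 - full 256-symbol byte range
print(shannon_entropy(hex_text))  # 4.0 - only 16 distinct symbols left
```

The same data scores 8 bits per byte as raw bytes but only 4 bits per character as hex text, which is the asymmetry discussed in this post.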
| mok-kong shen | Feb 28 2016, 08:23 PM Post #47 |
Certainly I have no real idea at all, but in ASCII text each byte has its leading bit 0, which is a fairly strong regularity, and hence I had intuitively expected a lower entropy value per byte to come out.
| Karl-Uwe Frank | Feb 28 2016, 08:47 PM Post #48 |
A text file in natural language will always have a lower entropy, I suppose always below 5, because natural language uses a limited character set, while any properly encrypted file covers the full byte range and will therefore very quickly approach the maximum possible value of 8. To prove this I have tested a text file which you can download here http://norvig.com/big.txt (6.5 MB) The result for the text file is big.txt Entropy = 4.511148 bits per byte. the encrypted version of this text file, however, gives big.txt.zx8 Entropy = 7.999970 bits per byte. A low value of course also occurs whenever we treat hex values as a text string, because in that case we limit the range of available characters to just 16!
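The 16-character limit mentioned here is exactly the ceiling log2(16) = 4 bits per character: per-symbol entropy can never exceed log2 of the alphabet size actually in use. A minimal sketch (helper function mine):

```python
from collections import Counter
from math import log2

def shannon_entropy(data):
    n = len(data)
    return -sum(c / n * log2(c / n) for c in Counter(data).values())

hex_str = "d131dd02c5e6eec4693d9a0698aff95c"

# Hard ceiling: entropy never exceeds log2 of the alphabet size in use.
print(log2(16))                  # 4.0 - ceiling for hex treated as text
print(shannon_entropy(hex_str))  # stays at or below 4.0
```

The same bound explains why natural-language text, drawn mostly from a few dozen characters, stays well below 8 bits per byte.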
| Karl-Uwe Frank | Feb 29 2016, 03:25 AM Post #49 |
For those who like to play around a bit with entropy calculation online, here is a nice website http://planetcalc.com/2476/ If you would like to compare the results my example programs generate, just paste this test string into the form field (but remove any newlines first) d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad3406 09f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5b ae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b670 80a80d1ec69821bcb6a8839396f9652b6ff72a70 Uncheck "Ignore case", check "Ignore space" and then click "Calculate". You may also paste the same value in as pure hex, in order to verify that this gives a higher entropy. Edited by Karl-Uwe Frank, Feb 29 2016, 03:26 AM.
| mok-kong shen | Feb 29 2016, 01:46 PM Post #50 |
I tried that site with the two natural-language words "variable" and "property" and got only 2.75 and 2.5. Is that ok? (cf. the value 4.51... you got from a text.)
Edited by mok-kong shen, Feb 29 2016, 02:02 PM.
| Karl-Uwe Frank | Mar 1 2016, 12:21 AM Post #51 |
Yes, that's what I get as well pypy checkentropy.py "variable" Text string to check = variable ---------------------------------------------------------- Entropy = 2.75 pypy checkentropy.py "property" Text string to check = property ---------------------------------------------------------- Entropy = 2.5 You have tested fairly short strings. The longer the test string, the higher the result tends to be, because a longer text is more likely to contain all characters of the given set.
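The length effect is easy to see with the same formula (hypothetical helper, not `checkentropy.py` itself): an 8-character string can never score above log2(8) = 3 bits per character, however random its characters look.

```python
from collections import Counter
from math import log2

def shannon_entropy(data):
    n = len(data)
    return -sum(c / n * log2(c / n) for c in Counter(data).values())

print(shannon_entropy("variable"))  # 2.75
print(shannon_entropy("property"))  # 2.5

# All 8 characters distinct, yet the value cannot pass log2(8) = 3:
print(shannon_entropy("aZ3#qR7!"))  # 3.0
```

So a small value for a short word says little about randomness; the length of the sample bounds the score before the content does.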
| mok-kong shen | Mar 1 2016, 09:02 PM Post #52 |
As is apparent from my earlier comments, I have no practical experience with outputs from software implementing Shannon's formula. It seems to me that short sequences always give small entropy values, which are apparently of no use for judging randomness, and that in order to get useful results there should be non-zero frequencies over the whole range of a byte, which would translate to a minimum sequence length on the order of 1K or perhaps more, I surmise. Further, assuming the sequence has a sufficient length, I would like to know how one is to judge in practice whether a sequence is sufficiently random, i.e. is an entropy value like 7.5 good enough, or would one need something like 7.995? Could you please say something from your experience? I also think there is something fundamentally not nice about the entropy measure. You earlier got the value 4.51... from a natural-language text. If one sorts that text, i.e. puts all a's in front of all b's etc., that value wouldn't change, since the formula counts only the frequencies of the characters and doesn't take the positional relations among them into account. However, the sorted text is evidently more "regular", or less "random", than the original. One could perhaps take two characters (bytes) as the unit and similarly compute a measure from the formula. But it is evident that the problem could at best be weakened, not fundamentally removed. Edited by mok-kong shen, Mar 2 2016, 08:42 PM.
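Both observations in this post can be checked with a small sketch (helper names mine): sorting leaves the single-character entropy unchanged, while counting overlapping two-character units does detect the difference in ordering.

```python
from collections import Counter
from math import log2

def shannon_entropy(symbols):
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in Counter(symbols).values())

def bigram_entropy(s):
    # Overlapping two-character units as the counting symbols.
    return shannon_entropy([s[i:i + 2] for i in range(len(s) - 1)])

text = "the quick brown fox"
a = shannon_entropy(text)
b = shannon_entropy("".join(sorted(text)))

# Frequencies are identical, so single-character entropy cannot tell
# the original from the sorted text:
print(abs(a - b) < 1e-9)  # True

# Bigram counting does notice the ordering: "abab" repeats its bigrams,
# "aabb" has three distinct ones, although both have the same letters.
print(bigram_entropy("abab") < bigram_entropy("aabb"))  # True
```

As the post says, the bigram variant only weakens the problem: sorting a text also rearranges most bigrams, but longer-range structure still goes unmeasured.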
| Karl-Uwe Frank | Mar 3 2016, 02:43 AM Post #53 |
In my experience any entropy value greater than 5 is fine, because natural language will not reach 5, as far as I understand and have tested. For example, the above quoted text of yours gives File size in bytes: 763 Shannon entropy (min bits per byte-character): 4.336011982 While a 512-bit SHA-2 hash of that text passage ad6141c5824b7d9f669dc54706c9e149bac18db0a462453d07d08b39d363dc9c5fa2191b2d2174bcb6baef7b2421d939bc6604f502f30621768c39f3fd2e29c5 gives File size in bytes: 64 Shannon entropy (min bits per byte-character): 5.620864648 That's how entropy is calculated. And as said so often, natural language has no place in cryptography other than as plaintext, perhaps. Also often mentioned: one should only use the hash of a keyword, not the keyword itself, for initialising a cipher. But the keyword itself should still have sufficient entropy. Which brings us to another point. A good keyword that a user can type on a regular keyboard should have an entropy greater than 3.5, like this one for example Enter Passphrase: §4sdJMA_LAis/&67zzds§!=:l(&%hashA. File size in bytes: 35 Shannon entropy (min bits per byte-character): 4.422000517 while a simpler one like the one below has a somewhat lower value. Enter Passphrase: 0h8zPT9pug1TFiHt File size in bytes: 16 Shannon entropy (min bits per byte-character): 3.875 But just swapping a few characters for special characters and adding some lifts the entropy nicely Enter Passphrase: 0h8#PT9%ug1T@iHtaz File size in bytes: 18 Shannon entropy (min bits per byte-character): 4.05881389 But as you already noticed, all that counts is the diversity of the characters used in relation to the length of the string. This becomes clearly visible if you compare the entropy of the larger text passage, with its 763 bytes, against the complex keyword with only 35, which offers a slightly higher entropy than the text.
Therefore, in my opinion, an entropy value greater than 3.5 for the typed-in keyword is evidently the lowest acceptable border. Anything below that should be rejected by the encryption software with an alert to the user. Later on, the hash used to seed the encryption routine should not be lower than 5. Edited by Karl-Uwe Frank, Mar 3 2016, 02:49 AM.
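The rejection rule proposed above can be sketched as follows (the function and constant names are mine; 3.5 bits per character is the border suggested in the post, not a standard threshold):

```python
from collections import Counter
from math import log2

MIN_PASSPHRASE_ENTROPY = 3.5  # border suggested in the post above

def shannon_entropy(data):
    n = len(data)
    return -sum(c / n * log2(c / n) for c in Counter(data).values())

def check_passphrase(passphrase):
    """Reject a passphrase whose per-character entropy is below the border."""
    h = shannon_entropy(passphrase)
    if h < MIN_PASSPHRASE_ENTROPY:
        raise ValueError(f"entropy {h:.3f} is below {MIN_PASSPHRASE_ENTROPY}")
    return h

print(check_passphrase("0h8zPT9pug1TFiHt"))    # 3.875 - accepted
print(check_passphrase("0h8#PT9%ug1T@iHtaz"))  # ~4.0588 - accepted
```

Note that this only measures character diversity, not guessability: a dictionary word with many distinct letters would pass, so a real password checker would combine it with other rules.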
| mok-kong shen | Mar 3 2016, 08:38 AM Post #54 |
For statistical tests there are confidence intervals, so one knows what the values obtained signify. In comparison, the computed entropy values are fuzzy, in my view.