Measure and Compare Entropy
Topic Started: Dec 2 2015, 05:25 PM (1,230 Views)
Karl-Uwe Frank
Uploaded a tiny routine to measure and compare the entropy of values from a random source, given either as a text string or as a hex string.
Code:
 
Text String to check:  a78b60e5d2d61a4b90e91f169f3f75a2
----------------------------------------------------------
Entropy = 3,819548827786958

Hex String to check:  0xa78b60e5d2d61a4b90e91f169f3f75a2
----------------------------------------------------------
Entropy = 4,000000000000000
As the hex value gets longer, the difference in the resulting entropy between a text string and a hex value increases noticeably.
Code:
 
Text String to check:  0b23fe039a61b72fd0cb4547b209eb9b3b98cbe74527a8f86d20e14601d953486438810fed8722c06bc517fcc6f52640c432c089c43cc77c990c38d99f4b83ce
----------------------------------------------------------
Entropy = 3,898951907252214

Hex String to check:  0x0b23fe039a61b72fd0cb4547b209eb9b3b98cbe74527a8f86d20e14601d953486438810fed8722c06bc517fcc6f52640c432c089c43cc77c990c38d99f4b83ce
----------------------------------------------------------
Entropy = 5,812500000000000
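A minimal Python sketch of such a measurement (my own illustration, assuming the tool computes Shannon entropy in bits per symbol with log base 2; the function name is mine, not from the PureBasic source):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per symbol (log base 2)."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

s = "a78b60e5d2d61a4b90e91f169f3f75a2"
# As text, only the 16 hex digits can occur, so the entropy stays below 4:
print(shannon_entropy(s.encode("ascii")))   # 3.8195488...
# As raw bytes, all 256 values are possible; here all 16 bytes differ:
print(shannon_entropy(bytes.fromhex(s)))    # 4.0
```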
The source code for the entropy check can be downloaded from here: http://www.freecx.co.uk/utils/

In order to compile it you need the PureBasic Demo for Windows, Linux or Mac OS X, which you can download for free at
http://www.purebasic.com/download.php

Cheers,
Karl-Uwe


P.S.: Just figured that the map has to be reset before the next test run. Uploaded the fixed source code.
Edited by Karl-Uwe Frank, Dec 2 2015, 07:37 PM.
 
Replies:
Karl-Uwe Frank
Karl-Uwe Frank
 
In cryptography, and especially when it comes to high-quality randomness, entropy is measured in bytes consisting of 8 bits each.
mok-kong shen
 
Doesn't that sound like an assertion of the genre found e.g. in politics?
No, it is a simple fact.
 
jdege
Karl-Uwe Frank
Feb 23 2016, 09:02 PM
Basically, all information in a computer is stored as bits having the values 0 or 1.

In fact, all information handled or stored is considered as bytes (eight bits).
From Wikipedia: Byte

Quote:
 
The size of the byte has historically been hardware dependent and no definitive standards existed that mandated the size.

I've programmed on a machine that had six-bit bytes (and 18-bit integers and 60-bit floating point.)

The fundamental unit of information is the binary digit - the bit. Everything else is simply convention, a matter of what choices the developers found most convenient.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
 
Karl-Uwe Frank
jdege
 
The fundamental unit of information is the binary digit - the bit. Everything else is simply convention, a matter of what choices the developers found most convenient.
I totally agree with you. And since the 1960s a byte has been considered to hold 8 bits. Therefore measuring the entropy of a stream (file, variable, text string or whatever) of 8-bit bytes is in common use. That's why the maximum entropy is 8 bits per byte if we use log base 2.
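That ceiling falls straight out of the formula. A quick sanity check (my own sketch):

```python
import math

# A byte drawn uniformly from all 256 possible values attains the
# maximum entropy of log2(256) = 8 bits per byte; any bias lowers it.
h_max = -sum(p * math.log2(p) for p in [1 / 256] * 256)
print(h_max)  # 8.0
```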
 
mok-kong shen
@Karl-Uwe Frank: Would employing 6 bits as the unit vs. 8 bits as the unit deliver different values from Shannon's formula? That was essentially what I asked in an earlier post. If the answer is yes, a further question is what that actually means.

BTW, from which literature did you get "since the 1960s a byte is considered holding 8 bit"?
Edited by mok-kong shen, Feb 25 2016, 08:52 PM.
 
Karl-Uwe Frank
mok-kong shen
 
Would employing 6 bits as the unit vs. 8 bits as the unit deliver different values from Shannon's formula? That was essentially what I asked in an earlier post. If the answer is yes, a further question is what that actually means.
To my understanding it is logical that a 6-bit byte cannot hold as much entropy as an 8-bit byte, the same way a 16-character set can't hold as much entropy as a 256-character set, as I already explained.

mok-kong shen
 
BTW, from which literature did you get "since the 1960s a byte is considered holding 8 bit"?
Just followed the link jdege posted.
Edited by Karl-Uwe Frank, Feb 25 2016, 11:37 PM.
 
mok-kong shen
@Karl-Uwe Frank: I have again looked at the Wiki page which jdege referred to but have so far failed to see any sentence in support of "since the 1960s a byte is considered holding 8 bit". Could you please cite a sentence there so that I could find the relevant passage? In fact, I personally interpret jdege's post to mean that a byte is not necessarily a term equivalent to 8 bits, since he mentioned that he earlier worked with machines having 6-bit bytes.

As to the entropy issue: if one at any position of a given bit sequence takes 6 bits vs. 8 bits, certainly the latter contains more entropy because it has two additional bits. But one is investigating a given bit sequence as a whole, and what matters is the "average" entropy per bit, or the total entropy of the given sequence. So that argument of yours seems to be invalid.
Edited by mok-kong shen, Feb 26 2016, 08:38 AM.
 
Karl-Uwe Frank
mok-kong shen
 
Could you please cite a sentence there so that I could find (the bunch of) the relevant sentences?
Just for your convenience, as it seems you have some difficulties locating certain contextual information:

During the early 1960s, while also active in ASCII standardization, IBM simultaneously introduced in its product line of System/360 the eight-bit Extended Binary Coded Decimal Interchange Code (EBCDIC), an expansion of their six-bit binary-coded decimal (BCDIC) representation used in earlier card punches.[9] The prominence of the System/360 led to the ubiquitous adoption of the eight-bit storage size, while in detail the EBCDIC and ASCII encoding schemes are different.

In the early 1960s, AT&T introduced digital telephony first on long-distance trunk lines. These used the eight-bit µ-law encoding. This large investment promised to reduce transmission costs for eight-bit data.


mok-kong shen
 
As to the entropy issue, if one at any position of a given bit sequence takes 6 bits vs. 8 bits, certainly the latter contains more entropy because it has two additional bits.
Correct.

mok-kong shen
 
But one is investigating a given bit sequence as a whole and it is the "average" entropy per bit, or the total entropy of the given sequence. So that argument of yours seems to be invalid.
What now? In your previous sentence you admit that an 8-bit byte sequence holds more entropy than a 6-bit byte sequence, but in the second sentence you deny that fact?
 
mok-kong shen
The phrase "The prominence of the System/360 led to the ubiquitous adoption of the eight-bit storage size" doesn't justify "since the 1960s a byte is considered holding 8 bit", for the first phrase doesn't tell at all the point in time of the (obviously slowly achieved) ubiquitous adoption. And I personally know that even in the 1980s there were big computers with 6-bit bytes.

I wrote "if one at any position of a given bit sequence takes 6 bits vs. 8 bits", which means: if the "position" is e.g. the start of the sequence, one takes the first 6 bits vs. the first 8 bits of the sequence. Is that clear? So please kindly re-answer my earlier question.
Edited by mok-kong shen, Feb 26 2016, 10:25 AM.
 
Karl-Uwe Frank
mok-kong shen
 
The phrase "The prominence of the System/360 led to the ubiquitous adoption of the eight-bit storage size" doesn't justify "since the 1960s a byte is considered holding 8 bit", for the first phrase doesn't tell at all the time timepoint of (obviously slowly achieved) ubiquitous adoption.
The 8-bit byte was first used by IBM in the 1960s...

https://en.wikipedia.org/wiki/Six-bit_character_code
"Six-bit BCD was used by IBM on early computers such as the IBM 704 in 1954.[1]:p.35 This encoding was replaced by the 8-bit EBCDIC code when System/360 standardized on 8-bit bytes. "

https://en.wikipedia.org/wiki/EBCDIC
"EBCDIC was devised in 1963 and 1964 by IBM and was announced with the release of the IBM System/360 line of mainframe computers. It is an eight-bit character encoding..."

...and widely adopted since the 1980s.

https://en.wikipedia.org/wiki/Eight-bit
"The IBM System/360 introduced byte-addressable memory with 8-bit bytes,..."
"The first widely adopted 8-bit microprocessor was the Intel 8080, being used in many hobbyist computers of the late 1970s and early 1980s, ..."
"Many 8-bit CPUs or microcontrollers are the basis of today's ubiquitous embedded systems."


mok-kong shen
 
And I personally know that even in the 1980's there were big computers with 6-bit bytes.
Yes, that is nothing new, but as I already said, 8-bit bytes have been in common use for decades now, and entropy is measured based on 8-bit bytes in files, hex values, etc.


mok-kong shen
 
...is e.g. the start of the sequence, one takes the first 6 bits vs. first 8 bits of the sequence. Is that clear?
For me it is obvious that a 6-bit byte range cannot hold as much entropy as an 8-bit byte range, simply because the range of available characters in a 6-bit byte is smaller. That implies that the probability of repeating values is lower in the 8-bit byte range, which is better for randomness. From my point of view it is like comparing a 20-character alphabet against a 92-character set, which obviously means that the 92-character set can hold more entropy than a 20-character set.

Let me give you an illustrative example why.

Consider the following situation where we have a moderately secure password like
key$12/WORD_34%

At http://rumkin.com/tools/password/passchk.php I get the following result:

Length: 15
Strength: Strong - This password is typically good enough to safely guard sensitive information like financial records.
Entropy: 65.3 bits
Charset Size: 92 characters

How would you use such a password with a 6bit byte range?
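The rumkin figure can be put next to the naive theoretical ceiling (a sketch; the checker reports less than length × log2(charset) because it discounts predictable patterns in the password):

```python
import math

# Naive upper bound: a length-L password drawn uniformly at random from
# an N-symbol charset carries at most L * log2(N) bits of entropy.
length, charset_size = 15, 92
ceiling = length * math.log2(charset_size)
print(round(ceiling, 1))  # 97.9 bits; rumkin's pattern-aware estimate is 65.3
```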
 
mok-kong shen
That a larger range would be better for the purpose seems plausible, but could you also try (with sufficiently long test sequences!) e.g. 10 and 12 bits as the unit?
Edited by mok-kong shen, Feb 26 2016, 05:10 PM.
 
Karl-Uwe Frank
mok-kong shen
 
...but could you also try (with sufficiently long test sequences!) e.g. 10 and 12 bits as the unit?
What do you expect? Obviously a larger bits-per-byte range will hold more entropy under the previously considered circumstances.
 
 
mok-kong shen
A larger range contains more entropy, but what is interesting is the average entropy per bit, that is, the entropy of the larger range divided by the number of bits in the range. It is IMHO not intuitively clear at all how that computed value varies with the range used to calculate it. One has to run tests on empirically available long sequences to examine that effect.

I recalled just now that Shannon studied the entropy of English texts and found a value of about 1 bit per character.
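The per-bit normalisation being discussed can be sketched like this (my own illustration: a perfectly uniform byte sample gives 8 bits of entropy per byte, i.e. an average of exactly 1 bit per bit):

```python
import math
from collections import Counter

def entropy_per_symbol(data: bytes) -> float:
    """Shannon entropy in bits per 8-bit symbol (log base 2)."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

sample = bytes(range(256)) * 4        # illustrative, perfectly uniform sample
per_byte = entropy_per_symbol(sample)
per_bit = per_byte / 8                # average entropy per bit of the sequence
print(per_byte, per_bit)  # 8.0 1.0
```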
Edited by mok-kong shen, Feb 26 2016, 05:43 PM.
 
Karl-Uwe Frank
Let me give you another example using an 8-character keyword from different character sets.

String = oEYnncCf
hex value = 6F 45 59 3C 3E 63 43 66
Entropy = 2.7500


String = ø€¥≤≥çÇƒ
hex value = C3 B8 E2 82 AC C2 A5 E2 89 A4 E2 89 A5 C3 A7 C3 87 C6 92
Entropy = 3.5368


String = 슉쎭穻槂귃櫃韂걾
hex value = EC 8A 89 EC 8E AD E7 A9 BB E6 A7 82 EA B7 83 E6 AB 83 E9 9F 82 EA B1 BE
Entropy = 4.1682


As you can see, while still bound to 8 bits per byte, the entropy rises with a different character set simply because a larger character set consumes more bytes per character.

Which brings us back to the beginning: when a given hex value is used as a text string instead of as hex byte values, the entropy is lower, because using the hex value as a text string limits the range of bytes available for calculating the entropy to just 16 instead of 256 possible byte values.
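That cap can be made explicit (a sketch; a string drawn from the 16 hex digits can never exceed log2(16) = 4 bits per character, however long it gets):

```python
import math
from collections import Counter

def entropy_per_char(s: str) -> float:
    """Shannon entropy in bits per character (log base 2)."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

hex_text = "0b23fe039a61b72fd0cb4547b209eb9b" * 4   # a hex value kept as text
h = entropy_per_char(hex_text)
print(h)  # below 4.0, the 16-symbol ceiling
```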


(Attached a screen shot because it might be possible that your browser is unable to display the example passwords correctly)
Attached to this post:
Attachments: entropy_testset.jpg (27.87 KB)
 
Karl-Uwe Frank
Just uploaded the Python script to measure entropy

http://www.freecx.co.uk/utils/checkentropy.py

and a quick test with one of the hex values from the previous password examples shows the difference between using a given hex value as a text string vs. using its hex-value representation over the full byte range (0..255):

pypy checkentropy.py EC8A89EC8EADE7A9BBE6A782EAB783E6AB83E99F82EAB1BE

Text string to check = EC8A89EC8EADE7A9BBE6A782EAB783E6AB83E99F82EAB1BE
----------------------------------------------------------
Entropy = 3.31856849313


pypy checkentropy.py 0xEC8A89EC8EADE7A9BBE6A782EAB783E6AB83E99F82EAB1BE

Hex String to check = EC:8A:89:EC:8E:AD:E7:A9:BB:E6:A7:82:EA:B7:83:E6:AB:83:E9:9F:82:EA:B1:BE
----------------------------------------------------------
Entropy = 4.16829583405

 