|
pride.crypto.metrics; python tools for statistical analysis of crypto
|
|
Topic Started: Apr 21 2016, 12:21 AM (456 Views)
|
|
E. Rose
|
Apr 21 2016, 12:21 AM
Post #1
|
- Posts:
- 32
- Group:
- Members
- Member
- #4,398
- Joined:
- April 20, 2016
|
I develop a python module called pride.crypto.metrics. It contains various tests for various types of cryptographic primitives, such as hash functions and block ciphers. I have found the tools invaluable when researching my own algorithms. I am putting it here in the hope that someone else may find it useful as well.
Tests include:
- avalanche/diffusion test
  - Measures how much a change in input influences output
- randomness test (ent)
  - Standard measurement of random bytes using ent (entropy, chi square, serial correlation, etc)
- bias test
  - Determines uniformity of the supplied random bytes; measures the evenness of the output distribution in terms of which symbols appear and where
- period test
  - Cycle detection test. Chains output as input to the hash and measures how regularly truncated output cycles
- collision test
  - Truncated collision detection test. Processes permutations of inputs and notes how often collisions happen using truncated outputs
- compression test
  - Measures how long it takes a hash function to compress large input
- performance test
  - Measures how long it takes to generate 1 MB of random data
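To make the first item in the list concrete, here is a minimal sketch of an avalanche measurement. It is not the code from metrics.py; the names `hamming_distance` and `avalanche_ratios` and the parameters `input_size` and `trials` are assumptions made for illustration, and it is written for Python 3.

```python
import hashlib
import os


def hamming_distance(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche_ratios(hash_function, input_size=16, trials=64):
    """Flip one input bit per trial and return the (min, average, max)
    fraction of output bits that changed. Hypothetical helper, not the
    module's own test."""
    ratios = []
    for trial in range(trials):
        message = bytearray(os.urandom(input_size))
        original = hash_function(bytes(message))
        bit = trial % (input_size * 8)
        message[bit // 8] ^= 1 << (bit % 8)  # flip exactly one input bit
        changed = hash_function(bytes(message))
        ratios.append(hamming_distance(original, changed) / (len(original) * 8))
    return min(ratios), sum(ratios) / len(ratios), max(ratios)


def sha256_digest(data):
    """Byte string in, byte string out."""
    return hashlib.sha256(data).digest()


low, average, high = avalanche_ratios(sha256_digest)
print(low, average, high)
```

For a well-behaved hash the average ratio should sit close to 0.5, meaning about half of the output bits flip when a single input bit changes.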
The code is available on github here: https://github.com/erose1337/pride/blob/master/crypto/metrics.py and is being attached as a zip to this post.
You should not have to install the complete pride package just to use pride.crypto.metrics; the script should function standalone, save for the demo test of AES/random data. Testing of random data does require ent; the ent executable should be located in the same directory the script is run from.
Let me know if there are any problems, if you need help using it, or if you find it helpful. It could be documented a little better, but it's pretty straightforward to use.
- Attached to this post:
metrics.py (9.48 KB)
Edited by E. Rose, Apr 21 2016, 12:23 AM.
|
|
|
| |
|
Karl-Uwe Frank
|
Apr 23 2016, 08:34 PM
Post #2
|
- Posts:
- 639
- Group:
- Members
- Member
- #3,502
- Joined:
- July 11, 2011
|
running metrics.py I get the following error messages
- Code:
-
Testing diffusion/avalanche...
Minimum Hamming distance and ratio:  0.36328125
Average Hamming distance and ratio:  0.49609375
Maximum Hamming distance and ratio:  0.6484375
Testing randomness of 1048576 bytes...
Traceback (most recent call last):
  File "app_main.py", line 51, in run_toplevel
  File "metrics.py", line 232, in <module>
    test_sha_metrics()
  File "metrics.py", line 225, in test_sha_metrics
    test_hash_function(lambda data: sha256(data).digest())
  File "metrics.py", line 146, in test_hash_function
    test_randomness(_hash_prng(hash_function, 1024 * 1024))
  File "metrics.py", line 69, in test_randomness
    with open("./random_data/Test_Data_{}.bin".format(size), 'wb') as _file:
IOError: [Errno 2] No such file or directory: './random_data/Test_Data_1048576.bin'
Perhaps you should add a check if the random data file is present at the start of the program.
|
cHNiMUBACG0HAAAAAAAAAAAAAABIZVbDdKVM0w1kM9vxQHw+bkLxsY/Z0czY0uv8/Ks6WULxJVua zjvpoYvtEwDVhP7RGTCBVlzZ+VBWPHg5rqmKWvtzsuVmMSDxAIS6Db6YhtzT+RStzoG9ForBcG8k G97Q3Jml/aBun8Kyf+XOBHpl5gNW4YqhiM0=
|
| |
|
E. Rose
|
Apr 23 2016, 11:13 PM
Post #3
|
|
The random data is passed as an argument to the function and written to a file (notice the mode "wb"), which is supposed to create the file if it does not exist. I'm willing to bet it's complaining because the /random_data/ folder doesn't exist. Let me fix it and push the change to github. Thanks for being a bug hunter for me!
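A minimal sketch of the kind of fix described here, assuming the only problem is the missing folder. The helper name `write_test_data` is hypothetical and is not taken from metrics.py. Opening a file in "wb" mode creates the file but not its parent directory, so the directory is created first. Returning the path means any later step, such as invoking ent, can reuse exactly the same location instead of rebuilding it from a second literal.

```python
import os
import tempfile


def write_test_data(random_bytes, directory):
    """Write random_bytes to <directory>/Test_Data_<size>.bin and return the path.

    The directory is created when it does not exist yet, because
    open(..., "wb") only creates the file itself.
    """
    if not os.path.isdir(directory):
        os.makedirs(directory)
    path = os.path.join(directory, "Test_Data_{}.bin".format(len(random_bytes)))
    with open(path, "wb") as _file:
        _file.write(random_bytes)
    return path


# Demonstration in a throwaway location; "random_data" does not exist beforehand.
base_directory = tempfile.mkdtemp()
path = write_test_data(os.urandom(1024), os.path.join(base_directory, "random_data"))
print(path)
```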
|
|
|
| |
|
E. Rose
|
Apr 23 2016, 11:40 PM
Post #4
|
|
Fixed! (hopefully). Here's the link again, same as the one up top: https://github.com/erose1337/pride/blob/master/crypto/metrics.py
|
|
|
| |
|
Karl-Uwe Frank
|
Apr 24 2016, 10:29 AM
Post #5
|
|
Works like a charm now.
As I'm always interested in thoroughly testing different uncommon crypto and hash algorithms for quality, I'm curious how that might be possible using your test suite.
How would I need to change the source code in order to have some specific algorithms tested which are not part of Python? Can you give me a hint?
|
|
| |
|
Karl-Uwe Frank
|
Apr 24 2016, 03:38 PM
Post #6
|
|
Forgot to mention that, in order to make it work like a charm, I needed to fix line 74 accordingly:
< os.system(os.path.join(current_directory, "ent.exe") + " ./random_data/Test_Data_{}.bin".format(size))
> os.system(os.path.join(current_directory, "ent.exe") + " ./Test_Data_{}.bin".format(size))
|
|
| |
|
E. Rose
|
Apr 24 2016, 08:29 PM
Post #7
|
|
- Quote:
-
Forgot to mention that, in order to make it work like a charm, I needed to fix line 74 accordingly
And that's what I get for using a literal in two places and forgetting to maintain the invariant. Fixed it properly and pushed to github again. Thanks for your patience!
I'll try to improve the documentation so it's more obvious how to use the tools. Here's a copy from the information I'm adding to the file:
- Code:
-
# The functions test_hash_function, test_block_cipher, and test_stream_cipher
# can be used to test the metrics of the respective primitives. Each test
# function measures a variety of statistics, which are configurable via
# keyword arguments.
#
# Using each function should be straightforward. To test a hash function, supply
# it as the first argument to test_hash_function; The hash function in question
# should function canonically; It should accept as input a string of bytes, and
# return as output a (fixed length) string of bytes. Any function that fits this
# interface can be tested. For example, below is a "random oracle" hash function
# that simply spits out 32 actually random bytes and memoizes the result when
# fed input.
import os
import metrics

memo = {}

def random_hash_function(input_data):
    try:
        return memo[input_data]
    except KeyError:
        result = memo[input_data] = os.urandom(32)
        return result

metrics.test_hash_function(random_hash_function)
I am going to make the interface for test_block_cipher and test_stream_cipher more friendly to others. Right now they take an object with an encrypt method, but I think I will change it to where it simply accepts an encrypt method. This should make it more straightforward I think.
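As a rough illustration of the difference between the two interfaces, the sketch below passes a bound `encrypt` method where a plain callable is expected. `ToyXorCipher` and `run_encrypt_check` are hypothetical stand-ins invented for this example; the real signatures of test_block_cipher and test_stream_cipher are not shown in this thread, and the toy cipher is not secure. Written for Python 3.

```python
class ToyXorCipher(object):
    """Hypothetical stand-in cipher. NOT secure, illustration only."""

    def __init__(self, key):
        self.key = bytearray(key)

    def encrypt(self, data):
        key = self.key
        return bytes(bytearray(byte ^ key[index % len(key)]
                               for index, byte in enumerate(bytearray(data))))


def run_encrypt_check(encrypt, block=b"\x00" * 16):
    """Hypothetical harness that only needs a callable, not a cipher object."""
    return encrypt(block)


cipher = ToyXorCipher(b"k" * 16)

# A bound method already is "simply an encrypt method", so an object-based
# cipher can be handed to a function-based harness without any wrapper.
output_from_method = run_encrypt_check(cipher.encrypt)
print(output_from_method)
```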
|
|
|
| |
|
Karl-Uwe Frank
|
Apr 26 2016, 02:30 PM
Post #8
|
|
Assume I have a 128bit hash value as a byte array. How would I pass that to metrics.test_hash_function( )?
All I get is
- Code:
-
  # Final Hash Output
  final_hash = bytearray(hSize)
  for i in range(hSize): final_hash[i] = PRGA()

  metrics.test_hash_function( memo[final_hash] )
TypeError: 'bytearray' objects are unhashable
or as string objects
- Code:
-
  # Final Hash Output
  final_hash = ""
  for i in range(hSize): final_hash += chr(PRGA())

  metrics.test_hash_function( memo[final_hash] )
KeyError: '\xeb\xec\xa5\xf8E\xec\x95|\xcfU\xa4\xf59\x8d\t='
- Code:
-
  # Final Hash Output
  final_hash = ""
  for i in range(hSize): final_hash += '%02x' % PRGA()

  metrics.test_hash_function( memo[final_hash] )
KeyError: 'ebeca5f845ec957ccf55a4f5398d093d'
|
|
| |
|
E. Rose
|
Apr 26 2016, 04:08 PM
Post #9
|
|
- Karl-Uwe Frank
- Apr 26 2016, 02:30 PM
Assume I have a 128bit hash value as a byte array. How would I pass that to metrics.test_hash_function( )?
All I get is
- Code:
-
  # Final Hash Output
  final_hash = bytearray(hSize)
  for i in range(hSize): final_hash[i] = PRGA()

  metrics.test_hash_function( memo[final_hash] )
TypeError: 'bytearray' objects are unhashable
or as a string objects
- Code:
-
  # Final Hash Output
  final_hash = ""
  for i in range(hSize): final_hash += chr(PRGA())

  metrics.test_hash_function( memo[final_hash] )
KeyError: '\xeb\xec\xa5\xf8E\xec\x95|\xcfU\xa4\xf59\x8d\t='
- Code:
-
  # Final Hash Output
  final_hash = ""
  for i in range(hSize): final_hash += '%02x' % PRGA()

  metrics.test_hash_function( memo[final_hash] )
KeyError: 'ebeca5f845ec957ccf55a4f5398d093d'
test_hash_function accepts as its first argument a hash function itself, not the output of a hash function. test_hash_function uses the supplied function to generate outputs as needed, according to the test at hand.
The hash function passed to test_hash_function should function canonically, meaning it accepts as input a string of bytes and returns as output a string of bytes.
- Code:
-
import os
import metrics

memo = {}

def random_hash_function(input_data):
    try:
        return memo[input_data]
    except KeyError:
        result = memo[input_data] = os.urandom(32)
        return result

metrics.test_hash_function(random_hash_function)
Does this example help clarify? This example does work, I have tested it.
Side note: I tend to use bytearrays internally in my primitives because it avoids allocations and is faster. However, I feel it is more intuitive for the function interface to accept/return byte strings.
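A small sketch of how a primitive that works on a bytearray internally could still expose the byte-string interface described above. `toy_bytearray_hash` is a hypothetical, insecure placeholder invented for this example, not anything from the thread or from metrics.py; the point is only the thin `canonical_hash` wrapper. Written for Python 3.

```python
def toy_bytearray_hash(data, output_size=16):
    """Hypothetical, insecure toy hash that mutates a bytearray in place."""
    state = bytearray(output_size)
    for index, byte in enumerate(bytearray(data)):
        position = index % output_size
        state[position] = (state[position] * 31 + byte + index) & 0xFF
    return state  # a bytearray: mutable, and unusable as a dict key


def canonical_hash(data):
    """Byte string in, byte string out: the shape a test harness can call."""
    return bytes(toy_bytearray_hash(data))


digest = canonical_hash(b"some input")
print(len(digest), type(digest).__name__)
```

The wrapper, not its output, is what would be handed to the test function, so the harness can call it with whatever inputs each test needs.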
|
|
|
| |
|
Karl-Uwe Frank
|
Apr 26 2016, 06:07 PM
Post #10
|
|
Okay, let's assume this primitive test setup:
- Code:
-
#!/usr/bin/env python
#

import sys, os
from hashlib import md5
import metrics

memo = {}

# -----------------------------------------
#
def md5_hash():
  return md5("test").digest()

# -----------------------------------------
#
def random_hash_function(input_data):
  try:
    return memo[input_data]
  except KeyError:
    result = memo[input_data] = md5_hash()
  return result

# -----------------------------------------
#
def main():
  metrics.test_hash_function( random_hash_function )

  sys.stdout.flush()

if __name__ == "__main__":
  main()
  sys.exit()
Will this below be the expected and correct result?
- Code:
-
Testing diffusion/avalanche...
Minimum Hamming distance and ratio:  0.0
Average Hamming distance and ratio:  0.0
Maximum Hamming distance and ratio:  0.0
Testing randomness of 524288 bytes...
Data generated; Running ent...
Entropy = 4.000000 bits per byte.

Optimum compression would reduce the size
of this 524288 byte file by 50 percent.

Chi square distribution for 524288 samples is 7864320.00, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bytes is 126.8125 (127.5 = random).
Monte Carlo value for Pi is 3.499982834 (error 11.41 percent).
Serial correlation coefficient is -0.170770 (totally uncorrelated = 0.0).
Testing period with output truncated to 2 byte:
Minimum/Average/Maximum cycle lengths:  (1, 1.0, 1)
Testing for byte bias...
Byte bias:  16 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Symbols out of 256 that appeared anywhere:  16
Testing for collisions with output of 3 bytes...
Collision after:  1 ; Output size:  3
Time testing compression function...
0.00615339279175
Testing time to generate 1024 * 1024 bytes...
33.778537035
|
|
| |
|
E. Rose
|
Apr 26 2016, 07:56 PM
Post #11
|
|
Yes. Though I will note that in your test setup input_data is not passed to the md5 hash function. The memo then assigns the hash of the constant string "test" to all input_data entries, which is why the test outputs such unfortunate looking results.
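The ent figures in the previous post can be reproduced by hand, which shows the effect of the constant output. The setup above always returns the MD5 digest of "test", so the byte stream is the same 16 bytes repeated. Those 16 bytes happen to be all distinct, giving log2(16) = 4 bits of entropy per byte and a mean equal to the average of the digest bytes. The sketch below checks this in Python 3; `constant_hash` and `entropy_bits_per_byte` are names made up for the illustration, and building the stream by simple concatenation is an assumption about how the test data is produced.

```python
import hashlib
import math
from collections import Counter


def constant_hash(input_data):
    """Mirrors the setup above: the input is ignored entirely."""
    return hashlib.md5(b"test").digest()


def entropy_bits_per_byte(data):
    """Shannon entropy of the byte distribution, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum((count / total) * math.log2(count / total)
                for count in counts.values())


# Concatenate the outputs for 256 different inputs; every output is identical.
stream = b"".join(constant_hash(bytes([value])) for value in range(256))

distinct_symbols = len(set(stream))
entropy = entropy_bits_per_byte(stream)
mean = sum(stream) / len(stream)
print(distinct_symbols, entropy, mean)
```

These values line up with the "Symbols out of 256 that appeared anywhere: 16", "Entropy = 4.000000 bits per byte" and "Arithmetic mean value of data bytes is 126.8125" lines reported earlier.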
|
|
|
| |
|
Karl-Uwe Frank
|
Apr 27 2016, 10:15 AM
Post #12
|
|
- E. Rose
-
Though I will notice in your test setup that input_data is not passed to the md5 hash function. The memo then assigns the hash of the null string to all input_data entries, ...
Well, and how would you propose to change this in the example listing?
|
|
| |
|
E. Rose
|
Apr 27 2016, 04:29 PM
Post #13
|
|
We simply need to ensure that the hash input actually makes it to the hash function. Here is an example of how to test the md5 hash function:
- Code:
-
def test_md5_metrics():
    from hashlib import md5
    test_hash_function(lambda data: md5(data).digest())
As you can see, test_hash_function accepts a function. The hash functions in Python's hashlib do not fit the canonical interface of Hash(input) => output; Python's hash objects require a call to the "digest" method in order to produce their output. To accommodate this, we wrap the md5 function in a lambda which will automatically call the digest method for us, resulting in a hash function that fits the required interface.
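The same wrapping idea can be written once and reused for any hashlib constructor. This is a sketch in Python 3; the helper name `as_canonical_hash` is an assumption made for illustration and is not part of metrics.py or hashlib.

```python
import hashlib


def as_canonical_hash(constructor):
    """Turn a hashlib-style constructor into a bytes -> bytes function."""
    def hash_function(data):
        return constructor(data).digest()
    return hash_function


md5_function = as_canonical_hash(hashlib.md5)
sha256_function = as_canonical_hash(hashlib.sha256)

# Each wrapped function now takes a byte string and returns a byte string.
print(len(md5_function(b"test")), len(sha256_function(b"test")))
```

Either wrapped function fits the Hash(input) => output shape described above and could be supplied wherever the lambda is used.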
Note that in the previous example I gave a few days ago, the test harness was to test a "random oracle" hash function. This was the reason for including a memo in that test harness. An example using md5/sha would have probably been a better, simpler example, and would not have required such things.
When the above md5 test is run, it outputs the following (minus the output from ent):
- Code:
-
Testing diffusion/avalanche...
Minimum Hamming distance and ratio:  0.3203125
Average Hamming distance and ratio:  0.5
Maximum Hamming distance and ratio:  0.6875
Testing randomness of 524288 bytes...
Data generated; Running ent...
Testing period with output truncated to 2 byte:
Minimum/Average/Maximum cycle lengths:  (2, 40.37, 683)
Testing for byte bias...
Byte bias:  16 [256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256]
Symbols out of 256 that appeared anywhere:  256
Testing for collisions with output of 3 bytes...
Collision after:  7381 ; Output size:  3
Time testing compression function...
7.16011771885e-05
Testing time to generate 1024 * 1024 bytes...
0.220407800606
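For reading the "Collision after" line, it can help to compare it with the birthday bound: for an ideal function truncated to n bits, the expected number of inputs hashed before the first collision is roughly sqrt(pi/2 * 2^n), which is about 5134 for 3 bytes. The Python 3 sketch below computes that estimate and runs a simple truncated collision search. Both function names are hypothetical, and feeding a counter as input is an assumption; the way metrics.py generates its inputs may differ.

```python
import hashlib
import math


def first_truncated_collision(hash_function, output_size):
    """Hash the inputs b"0", b"1", b"2", ... and return how many inputs were
    hashed when a truncated output repeated for the first time."""
    seen = set()
    counter = 0
    while True:
        digest = hash_function(str(counter).encode())[:output_size]
        if digest in seen:
            return counter + 1
        seen.add(digest)
        counter += 1


def expected_inputs_until_collision(output_size):
    """Birthday-bound estimate for an ideal function truncated to output_size bytes."""
    return math.sqrt(math.pi / 2 * 2 ** (8 * output_size))


def md5_digest(data):
    return hashlib.md5(data).digest()


# Two bytes keeps the search quick; three bytes works the same way.
observed = first_truncated_collision(md5_digest, 2)
print(observed, expected_inputs_until_collision(2), expected_inputs_until_collision(3))
```

A single run can land well above or below the estimate, so one observed value on its own says little; values that are consistently far below it would be the warning sign.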
Edited by E. Rose, Apr 27 2016, 04:29 PM.
|
|
|
| |
|
Karl-Uwe Frank
|
Apr 27 2016, 10:32 PM
Post #14
|
|
Thanks a lot! That's much closer to what I'd expected, because when I tested my own function
- Code:
-
import sys, os
import metrics

memo = {}
...

def random_hash_function(input_data):
  try:
    return memo[input_data]
  except KeyError:
    result = memo[input_data] = my_new_hash_function()
  return result

# -----------------------------------------
#
def main():
  metrics.test_hash_function( random_hash_function )

  sys.stdout.flush()

if __name__ == "__main__":
  main()
  sys.exit()
I got this result
- Code:
-
Testing diffusion/avalanche...
Minimum Hamming distance and ratio:  0.3203125
Average Hamming distance and ratio:  0.5
Maximum Hamming distance and ratio:  0.6796875
Testing randomness of 524288 bytes...
Data generated; Running ent...
Entropy = 7.999643 bits per byte.

Optimum compression would reduce the size
of this 524288 byte file by 0 percent.

Chi square distribution for 524288 samples is 259.17, and randomly
would exceed this value 41.56 percent of the times.

Arithmetic mean value of data bytes is 127.4592 (127.5 = random).
Monte Carlo value for Pi is 3.152813541 (error 0.36 percent).
Serial correlation coefficient is 0.000213 (totally uncorrelated = 0.0).
Testing period with output truncated to 2 byte:
Minimum/Average/Maximum cycle lengths:  (1, 38.23, 388)
Testing for byte bias...
Byte bias:  16 [256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256]
Symbols out of 256 that appeared anywhere:  256
Testing for collisions with output of 3 bytes...
Collision after:  10753 ; Output size:  3
Time testing compression function...
0.00349779129028
But as I always like to verify any test result generated by my own algorithms against the output of some well-known algorithms, those of MD5 confused me a bit.
|
|
| |
|
E. Rose
|
Apr 27 2016, 11:49 PM
Post #15
|
|
I should note: the random_hash_function with the memo is not part of a normal test setup. That was present in the first example because it formed the body of a "random oracle" hash function, which was then tested. Basically, I would modify your test harness to read:
- Code:
-
import sys, os
import metrics

# -----------------------------------------
#
def main():
  metrics.test_hash_function( my_new_hash_function )

  sys.stdout.flush()

if __name__ == "__main__":
  main()
  sys.exit()
If my_new_hash_function accepts bytes for input and returns bytes as output, then this is all you need to do.
Edited by E. Rose, Apr 27 2016, 11:51 PM.
|
|
|
| |
|