pride.crypto.metrics; python tools for statistical analysis of crypto
Topic Started: Apr 21 2016, 12:21 AM (458 Views)
E. Rose
Member
[ *  * ]
I develop a Python module called pride.crypto.metrics. It contains tests for various types of cryptographic primitives, such as hash functions and block ciphers. I have found the tools invaluable when researching my own algorithms. I am putting it here in the hope that someone else may find it useful as well.

Tests include:

  • avalanche/diffusion test
    - Measures how much a change in input influences output
  • randomness test (ent)
    - Standard measurement of random bytes using ent (entropy, chi square, serial correlation, etc)
  • bias test
    - Determines uniformity of the supplied random bytes; measures the evenness of the output distribution in terms of which symbols appear and where.
  • period test
    - Cycle detection test. Chains output as input to the hash and measures how regularly truncated output cycles.
  • collision test
    - Truncated collision detection test. Crypts permutations of inputs and notes how often collisions happen using truncated outputs
  • compression test
    - Measures how long it takes a hash function to compress large input
  • performance test
    - Measures how long it takes to generate 1 MB of random data
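
To make the avalanche/diffusion item above concrete, here is a minimal sketch of how such a measurement can work. It is not the code from metrics.py: the names hamming_distance and avalanche_ratios are hypothetical, and the sketch is written for Python 3. It flips each input bit once and records what fraction of the output bits change; values near 0.5 are what a well-mixing hash should produce.

```python
import hashlib
import os


def hamming_distance(a, b):
    """Count the differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche_ratios(hash_function, message):
    """Flip every input bit once; return (min, average, max) fraction of
    output bits that changed relative to the unmodified message."""
    baseline = hash_function(message)
    output_bits = len(baseline) * 8
    ratios = []
    for bit in range(len(message) * 8):
        flipped = bytearray(message)
        flipped[bit // 8] ^= 1 << (bit % 8)  # toggle exactly one input bit
        distance = hamming_distance(baseline, hash_function(bytes(flipped)))
        ratios.append(distance / output_bits)
    return min(ratios), sum(ratios) / len(ratios), max(ratios)


def sha256_digest(data):
    """Bytes-in, bytes-out wrapper around hashlib's SHA-256."""
    return hashlib.sha256(data).digest()


low, average, high = avalanche_ratios(sha256_digest, os.urandom(16))
print("min/avg/max ratio:", low, average, high)
```

A function that ignores its input would report 0.0 for all three values.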

The code is available on github here: https://github.com/erose1337/pride/blob/master/crypto/metrics.py and is being attached as a zip to this post.

You should not have to install the complete pride package just to use pride.crypto.metrics; the script should function as a standalone, save for the demo test of AES/random data. Testing of random data does require ent; ent should be located in the same directory the script is run from.

Let me know if there are any problems, if you need help using it, or if you find it helpful. It could be documented a little better, but it's pretty straightforward to use.
Attached to this post:
Attachments: metrics.py (9.48 KB)
Edited by E. Rose, Apr 21 2016, 12:23 AM.
 
Karl-Uwe Frank
NSA worthy
[ *  *  *  *  *  * ]
Running metrics.py, I get the following error message:
Code:
 
Testing diffusion/avalanche...
Minimum Hamming distance and ratio:  0.36328125
Average Hamming distance and ratio:  0.49609375
Maximum Hamming distance and ratio:  0.6484375
Testing randomness of 1048576 bytes...
Traceback (most recent call last):
  File "app_main.py", line 51, in run_toplevel
  File "metrics.py", line 232, in <module>
    test_sha_metrics()
  File "metrics.py", line 225, in test_sha_metrics
    test_hash_function(lambda data: sha256(data).digest())
  File "metrics.py", line 146, in test_hash_function
    test_randomness(_hash_prng(hash_function, 1024 * 1024))
  File "metrics.py", line 69, in test_randomness
    with open("./random_data/Test_Data_{}.bin".format(size), 'wb') as _file:
IOError: [Errno 2] No such file or directory: './random_data/Test_Data_1048576.bin'
Perhaps you should add a check at the start of the program for whether the random data file is present.
cHNiMUBACG0HAAAAAAAAAAAAAABIZVbDdKVM0w1kM9vxQHw+bkLxsY/Z0czY0uv8/Ks6WULxJVua
zjvpoYvtEwDVhP7RGTCBVlzZ+VBWPHg5rqmKWvtzsuVmMSDxAIS6Db6YhtzT+RStzoG9ForBcG8k
G97Q3Jml/aBun8Kyf+XOBHpl5gNW4YqhiM0=
 
E. Rose
Member
[ *  * ]
The random data is passed as an argument to the function and written to a file (notice the mode "wb"), which is supposed to create the file if it does not exist. I'm willing to bet it's complaining because the /random_data/ folder doesn't exist. Let me fix it and push the change to github. Thanks for being a bug hunter for me!
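
For anyone hitting the same error, here is a minimal sketch of the kind of fix described above. It is an assumption about the approach, not the change that was pushed to github, and write_test_data is a hypothetical name. Opening a file in "wb" mode creates the file but not its parent directory, so the directory is created first. Written for Python 3.

```python
import os
import tempfile


def write_test_data(data, directory):
    """Write data to <directory>/Test_Data_<size>.bin, creating the
    directory first because open(..., "wb") will not create it."""
    if not os.path.isdir(directory):
        os.makedirs(directory)
    path = os.path.join(directory, "Test_Data_{}.bin".format(len(data)))
    with open(path, "wb") as _file:
        _file.write(data)
    return path


# Demonstrate inside a scratch directory so nothing is left in the cwd.
base = tempfile.mkdtemp()
path = write_test_data(os.urandom(1024), os.path.join(base, "random_data"))
print(path)
```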
 
E. Rose
Member
[ *  * ]
Fixed! (hopefully). Here's the link again, same as the one up top: https://github.com/erose1337/pride/blob/master/crypto/metrics.py
 
Karl-Uwe Frank
NSA worthy
[ *  *  *  *  *  * ]
Works like a charm now.

As I'm always interested in thoroughly testing different uncommon crypto and hash algorithms for quality, I'm curious how that might be possible using your test suite.

How would I need to change the source code in order to have some specific algorithms tested which are not part of Python?
Can you give me a hint?
 
Karl-Uwe Frank
NSA worthy
[ *  *  *  *  *  * ]
Forgot to mention that, in order to make it work like a charm, I needed to fix line 74 accordingly:

< os.system(os.path.join(current_directory, "ent.exe") + " ./random_data/Test_Data_{}.bin".format(size))

> os.system(os.path.join(current_directory, "ent.exe") + " ./Test_Data_{}.bin".format(size))
 
E. Rose
Member
[ *  * ]
Quote:
 
Forgot to mention that, in order to make it work like a charm, I needed to fix line 74 accordingly


And that's what I get for using a literal in two places and forgetting to maintain the invariant. Fixed it properly and pushed to github again. Thanks for your patience!
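
One way to avoid repeating the literal is sketched below. This is only an illustration of the idea, not the actual fix in metrics.py; DATA_DIRECTORY, data_path and ent_command are hypothetical names. The path is built in exactly one place, and both the file write and the ent invocation derive from it.

```python
import os

# Hypothetical constant: the one and only place the directory is named.
DATA_DIRECTORY = "random_data"


def data_path(size, directory=DATA_DIRECTORY):
    """Return the path of the test data file for the given size."""
    return os.path.join(directory, "Test_Data_{}.bin".format(size))


def ent_command(size, ent_executable="ent.exe"):
    """Build the argument list for running ent on the same file
    that data_path() names, so the two can never drift apart."""
    return [ent_executable, data_path(size)]


print(data_path(1048576))
print(ent_command(1048576))
```

The list returned by ent_command could then be handed to subprocess.call instead of concatenating a string for os.system.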

I'll try to improve the documentation so it's more obvious how to use the tools. Here's a copy of the information I'm adding to the file:

Code:
 

# The functions test_hash_function, test_block_cipher, and test_stream_cipher
# can be used to test the metrics of the respective primitives. Each test
# function measures a variety of statistics, which are configurable via
# keyword arguments.
#
# Using each function should be straightforward. To test a hash function, supply
# it as the first argument to test_hash_function; The hash function in question
# should function canonically; It should accept as input a string of bytes, and
# return as output a (fixed length) string of bytes. Any function that fits this
# interface can be tested. For example, below is a "random oracle" hash function
# that simply spits out 32 actually random bytes and memoizes the result when
# fed input.

import os
import metrics

memo = {}

def random_hash_function(input_data):
    try:
        return memo[input_data]
    except KeyError:
        result = memo[input_data] = os.urandom(32)
        return result

metrics.test_hash_function(random_hash_function)


I am going to make the interface for test_block_cipher and test_stream_cipher more friendly to others. Right now they take an object with an encrypt method, but I think I will change them so that they simply accept an encrypt function. That should make them more straightforward to use.
 
Karl-Uwe Frank
NSA worthy
[ *  *  *  *  *  * ]
Assume I have a 128-bit hash value as a byte array. How would I pass that to metrics.test_hash_function( )?

All I get is
Code:
 
  # Final Hash Output
  final_hash = bytearray(hSize)
  for i in range(hSize): final_hash[i] = PRGA()

    metrics.test_hash_function( memo[final_hash] )
TypeError: 'bytearray' objects are unhashable
or as a string object
Code:
 
  # Final Hash Output
  final_hash = ""
  for i in range(hSize): final_hash += chr(PRGA())

    metrics.test_hash_function( memo[final_hash] )
KeyError: '\xeb\xec\xa5\xf8E\xec\x95|\xcfU\xa4\xf59\x8d\t='
Code:
 
  # Final Hash Output
  final_hash = ""
  for i in range(hSize): final_hash += '%02x' % PRGA()

  metrics.test_hash_function( memo[final_hash] )
KeyError: 'ebeca5f845ec957ccf55a4f5398d093d'

 
E. Rose
Member
[ *  * ]
Karl-Uwe Frank
Apr 26 2016, 02:30 PM
Assume I have a 128-bit hash value as a byte array. How would I pass that to metrics.test_hash_function( )?

test_hash_function accepts as its first argument a hash function itself, not the output of a hash function. test_hash_function uses the supplied function to generate outputs as needed, according to the test at hand.

The hash function passed to test_hash_function should function canonically, meaning it accepts as input a string of bytes and returns as output a string of bytes.

Code:
 

import os
import metrics

memo = {}

def random_hash_function(input_data):
    try:
        return memo[input_data]
    except KeyError:
        result = memo[input_data] = os.urandom(32)
        return result

metrics.test_hash_function(random_hash_function)


Does this example help clarify? This example does work, I have tested it.

Side note: I tend to use bytearrays internally in my primitives because it avoids allocations and is faster. However, I feel it is more intuitive for the function interface to accept/return byte strings.
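
If a primitive works on bytearrays internally, a thin adapter can present the byte-string interface described above. The sketch below is only an illustration: as_bytes_function and toy_xor_fold are hypothetical names, toy_xor_fold is a deliberately trivial stand-in and not a secure hash, and the code is written for Python 3. Converting the result to an immutable byte string also sidesteps the "'bytearray' objects are unhashable" error when outputs are used as dictionary keys.

```python
def as_bytes_function(primitive):
    """Adapt a bytearray-in/bytearray-out primitive to the
    bytes-in/bytes-out interface a test harness expects."""
    def wrapper(input_data):
        return bytes(primitive(bytearray(input_data)))
    return wrapper


def toy_xor_fold(state):
    """Hypothetical stand-in primitive (NOT secure): XOR-fold the
    input into a 16-byte bytearray."""
    output = bytearray(16)
    for index, byte in enumerate(state):
        output[index % 16] ^= byte
    return output


hash_function = as_bytes_function(toy_xor_fold)
digest = hash_function(b"example input")
print(type(digest).__name__, len(digest))
```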
 
Karl-Uwe Frank
NSA worthy
[ *  *  *  *  *  * ]
Okay, let's assume this primitive test setup
Code:
 
#!/usr/bin/env python
#

import sys, os
from hashlib import md5
import metrics

memo = {}

# -----------------------------------------
#
def md5_hash():
  return md5("test").digest()

# -----------------------------------------
#
def random_hash_function(input_data):
  try:
    return memo[input_data]
  except KeyError:
    result = memo[input_data] = md5_hash()
  return result

# -----------------------------------------
#
def main():
  metrics.test_hash_function( random_hash_function )

  sys.stdout.flush()

if __name__ == "__main__":
  main()
  sys.exit()
Is the output below the expected and correct result?

Code:
 
Testing diffusion/avalanche...
Minimum Hamming distance and ratio:  0.0
Average Hamming distance and ratio:  0.0
Maximum Hamming distance and ratio:  0.0
Testing randomness of 524288 bytes...
Data generated; Running ent...
Entropy = 4.000000 bits per byte.

Optimum compression would reduce the size
of this 524288 byte file by 50 percent.

Chi square distribution for 524288 samples is 7864320.00, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bytes is 126.8125 (127.5 = random).
Monte Carlo value for Pi is 3.499982834 (error 11.41 percent).
Serial correlation coefficient is -0.170770 (totally uncorrelated = 0.0).
Testing period with output truncated to 2 byte:
Minimum/Average/Maximum cycle lengths:  (1, 1.0, 1)
Testing for byte bias...
Byte bias:  16 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Symbols out of 256 that appeared anywhere:  16
Testing for collisions with output of 3 bytes...
Collision after:  1 ; Output size:  3
Time testing compression function...
0.00615339279175
Testing time to generate 1024 * 1024 bytes...
33.778537035
 
E. Rose
Member
[ *  * ]
Yes. Though I will note that in your test setup input_data is never passed to the md5 hash function: md5_hash always hashes the fixed string "test". The memo therefore stores the same digest for every input_data entry, which is why the test outputs such unfortunate-looking results.
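
The ent figures in that output can be reproduced by hand. In the posted setup, md5_hash() always hashes the literal "test", so the generated stream is one 16-byte digest repeated over and over. The sketch below (Python 3, with shannon_entropy as a hypothetical helper rather than anything from metrics.py or ent) shows that 16 distinct byte values appearing equally often give 4 bits of entropy per byte, matching the "Entropy = 4.000000" and "Symbols out of 256 that appeared anywhere: 16" lines.

```python
import hashlib
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte."""
    counts = Counter(bytearray(data))
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


digest = hashlib.md5(b"test").digest()   # the constant every input maps to
stream = digest * 1024                   # what a constant "hash" produces

distinct_symbols = len(set(bytearray(digest)))
entropy = shannon_entropy(stream)
print("distinct byte values:", distinct_symbols)
print("entropy (bits per byte):", entropy)
```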
 
Karl-Uwe Frank
NSA worthy
[ *  *  *  *  *  * ]
E. Rose
 
Though I will note that in your test setup input_data is never passed to the md5 hash function. The memo therefore stores the same digest for every input_data entry, ...
Well, and how would you propose changing this in the example listing?
 
E. Rose
Member
[ *  * ]
We simply need to ensure that the hash input actually makes it to the hash function. Here is an example of how to test the md5 hash function:

Code:
 
def test_md5_metrics():
    from hashlib import md5
    test_hash_function(lambda data: md5(data).digest())


As you can see, test_hash_function accepts a function. The hash functions in Python's hashlib do not fit the canonical interface of Hash(input) => output; Python's hash objects require a call to the "digest" method in order to produce their output. To accommodate this, we wrap the md5 function in a lambda which will automatically call the digest method for us, resulting in a hash function that fits the required interface.
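
The same wrapping can be generalised so that any algorithm hashlib knows by name (for example "md5") gets the bytes-in, bytes-out interface. This is a sketch rather than part of metrics.py; make_hash_function is a hypothetical name, and the code is written for Python 3 using the standard hashlib.new constructor.

```python
import hashlib


def make_hash_function(algorithm_name):
    """Return a bytes-in, bytes-out function for a hashlib algorithm name."""
    def hash_function(data):
        return hashlib.new(algorithm_name, data).digest()
    return hash_function


sha256_function = make_hash_function("sha256")
sha512_function = make_hash_function("sha512")
print(len(sha256_function(b"abc")), len(sha512_function(b"abc")))
```

Either returned function could then be passed directly as the first argument to test_hash_function.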

Note that in the previous example I gave a few days ago, the test harness was to test a "random oracle" hash function. This was the reason for including a memo in that test harness. An example using md5/sha would have probably been a better, simpler example, and would not have required such things.

When the above md5 test is run, it outputs the following (minus the output from ent):

Code:
 
Testing diffusion/avalanche...
Minimum Hamming distance and ratio:  0.3203125
Average Hamming distance and ratio:  0.5
Maximum Hamming distance and ratio:  0.6875
Testing randomness of 524288 bytes...
Data generated; Running ent...
Testing period with output truncated to 2 byte:
Minimum/Average/Maximum cycle lengths:  (2, 40.37, 683)
Testing for byte bias...
Byte bias:  16 [256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256]
Symbols out of 256 that appeared anywhere:  256
Testing for collisions with output of 3 bytes...
Collision after:  7381 ; Output size:  3
Time testing compression function...
7.16011771885e-05
Testing time to generate 1024 * 1024 bytes...
0.220407800606
Edited by E. Rose, Apr 27 2016, 04:29 PM.
 
Karl-Uwe Frank
NSA worthy
[ *  *  *  *  *  * ]
Thanks a lot! That's much closer to what I'd expected, because when I tested my own function
Code:
 

import sys, os
import metrics

memo = {}
...

def random_hash_function(input_data):
  try:
    return memo[input_data]
  except KeyError:
    result = memo[input_data] = my_new_hash_function()
  return result

# -----------------------------------------
#
def main():
  metrics.test_hash_function( random_hash_function )

  sys.stdout.flush()


if __name__ == "__main__":
  main()
  sys.exit()



I got this result
Code:
 
Testing diffusion/avalanche...
Minimum Hamming distance and ratio:  0.3203125
Average Hamming distance and ratio:  0.5
Maximum Hamming distance and ratio:  0.6796875
Testing randomness of 524288 bytes...
Data generated; Running ent...
Entropy = 7.999643 bits per byte.

Optimum compression would reduce the size
of this 524288 byte file by 0 percent.

Chi square distribution for 524288 samples is 259.17, and randomly
would exceed this value 41.56 percent of the times.

Arithmetic mean value of data bytes is 127.4592 (127.5 = random).
Monte Carlo value for Pi is 3.152813541 (error 0.36 percent).
Serial correlation coefficient is 0.000213 (totally uncorrelated = 0.0).
Testing period with output truncated to 2 byte:
Minimum/Average/Maximum cycle lengths:  (1, 38.23, 388)
Testing for byte bias...
Byte bias:  16 [256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256]
Symbols out of 256 that appeared anywhere:  256
Testing for collisions with output of 3 bytes...
Collision after:  10753 ; Output size:  3
Time testing compression function...
0.00349779129028
But as I always like to verify any test result generated by my own algorithms against the output of some well-known algorithms, the MD5 results confused me a bit.
 
E. Rose
Member
[ *  * ]
I should note: the random_hash_function with the memo is not part of a normal test setup. That was present in the first example because it formed the body of a "random oracle" hash function, which was then tested. Basically, I would modify your test harness to read:

Code:
 


import sys, os
import metrics

# -----------------------------------------
#
def main():
  metrics.test_hash_function( my_new_hash_function )

  sys.stdout.flush()


if __name__ == "__main__":
  main()
  sys.exit()


If my_new_hash_function accepts bytes for input and returns bytes as output, then this is all you need to do.
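
Before handing a new function to the test suite, a quick sanity check of that interface can save some confusion. The helper below is a hypothetical sketch, not part of metrics.py, written for Python 3. It confirms that the function returns byte strings, gives the same output for the same input, and produces a fixed output length.

```python
import hashlib


def check_hash_interface(hash_function, samples=(b"", b"a", b"abc" * 100)):
    """Verify bytes output, determinism and fixed output length.
    Returns the output length in bytes, or raises on a violation."""
    lengths = set()
    for sample in samples:
        output = hash_function(sample)
        if not isinstance(output, bytes):
            raise TypeError("expected bytes, got " + type(output).__name__)
        if hash_function(sample) != output:
            raise ValueError("same input produced different outputs")
        lengths.add(len(output))
    if len(lengths) != 1:
        raise ValueError("output length is not fixed")
    return lengths.pop()


def sha256_digest(data):
    """Bytes-in, bytes-out wrapper around hashlib's SHA-256."""
    return hashlib.sha256(data).digest()


print("output length:", check_hash_interface(sha256_digest))
```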
Edited by E. Rose, Apr 27 2016, 11:51 PM.
 