Viewing Single Post From: Cracking A Vig With Ic
jdege
OK. You have some text, and you want to see if it has a frequency distribution that is similar to that of ordinary English. First, you need a frequency distribution for ordinary English. You can find that all over the web: http://en.wikipedia.org/wiki/Letter_frequencies. It doesn't matter whether these are expressed as percentages, or per thousand, or whatever, because you're going to be dividing by their sum. It's only the ratios that matter.
Code:
 
a   8.167%
b   1.492%
c   2.782%
d   4.253%
e  12.702%
f   2.228%
g   2.015%
h   6.094%
i   6.966%
j   0.153%
k   0.772%
l   4.025%
m   2.406%
n   6.749%
o   7.507%
p   1.929%
q   0.095%
r   5.987%
s   6.327%
t   9.056%
u   2.758%
v   0.978%
w   2.360%
x   0.150%
y   1.974%
z   0.074%
The next thing you need is to add up all the numbers above to get a total. In this case, because our numbers are percentages, we should end up very close to 100, which we do. These numbers sum to 99.999.
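If you'd rather let a program do the adding, here is a minimal Python sketch. The names PERCENTS, english and english_total are just ones I made up for illustration; the numbers are the ones from the table above.

```python
import string

# English letter frequencies in percent, a through z, copied from the table above.
PERCENTS = [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]

# Pair each letter with its frequency (english is an illustrative name).
english = dict(zip(string.ascii_lowercase, PERCENTS))

# The total we will later divide by.
english_total = sum(english.values())
print(f"{english_total:.3f}")  # prints 99.999
```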

Next, we need the text we're testing.
Code:
 
WEREK NIGHT SOFTH EROUN DTABL EWEDA NCEWH ENEER WEREA BLEWE
DOROU TINES ANDCH ORUSS CENES WEREK NIGHT SOFTH EROUN DTABL
EWITH FOOTW ORKIM PECCA BLEOU RSHOW SAREF ORMID ABLEW EDINE
WELLH EREIN CAMEL OTBUT MANYT IMESW EEATH AMAND JAMAN DSPAM
ALOTW EREGI VENRH YMEST HATAR EQUIT EUNSI NGABL EINWA RWERE
TOUGH ANDAB LEWER EOPER AMADI NCAME LOTQU ITEIN DEFAT IGABL
EWESI NGFRO MTHED IAPHR AGMAL OTBET WEENO URQUE STSWE SEQUI
NVEST SANDI MPERS ONATE CLARK GABLE ITSAB USYLI FEINC AMELO
TIHAV ETOPU SHTHE PRAMA LOTMO NTYPY THON
For this, we also need a frequency count - a count of how many times each letter appears in the text.
Code:
 
A       40
B       12
C        9
D       14
E       66
F        7
G        9
H       19
I       25
J        1
K        4
L       19
M       17
N       29
O       27
P        8
Q        4
R       26
S       22
T       34
U       15
V        3
W       19
X        0
Y        5
Z        0
And again we need a total - a sum of all the numbers. This time, it's 434.
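Counting by hand gets old fast, so here is a rough Python sketch of the same count. TEXT, counts and text_total are names I picked for the example; it assumes anything that isn't a letter A-Z (the spaces and line breaks) is only there for grouping and can be ignored.

```python
from collections import Counter
import string

# The test text from above, still in its five-letter groups.
TEXT = """
WEREK NIGHT SOFTH EROUN DTABL EWEDA NCEWH ENEER WEREA BLEWE
DOROU TINES ANDCH ORUSS CENES WEREK NIGHT SOFTH EROUN DTABL
EWITH FOOTW ORKIM PECCA BLEOU RSHOW SAREF ORMID ABLEW EDINE
WELLH EREIN CAMEL OTBUT MANYT IMESW EEATH AMAND JAMAN DSPAM
ALOTW EREGI VENRH YMEST HATAR EQUIT EUNSI NGABL EINWA RWERE
TOUGH ANDAB LEWER EOPER AMADI NCAME LOTQU ITEIN DEFAT IGABL
EWESI NGFRO MTHED IAPHR AGMAL OTBET WEENO URQUE STSWE SEQUI
NVEST SANDI MPERS ONATE CLARK GABLE ITSAB USYLI FEINC AMELO
TIHAV ETOPU SHTHE PRAMA LOTMO NTYPY THON
"""

# Keep only the letters A-Z; everything else is just layout.
letters = [ch for ch in TEXT.upper() if ch in string.ascii_uppercase]
counts = Counter(letters)

# Print every letter, including the ones that never appear (Counter gives 0).
for letter in string.ascii_uppercase:
    print(f"{letter} {counts[letter]:8d}")

text_total = sum(counts.values())
print(text_total)  # prints 434
```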

Now we need to multiply the value for A in the standard frequency count by the value for A in our test frequency count, and do the same for B, C, etc.
Code:
 
A: 40 *  8.167 = 326.680
B: 12 *  1.492 =  17.904
C:  9 *  2.782 =  25.038
D: 14 *  4.253 =  59.542
E: 66 * 12.702 = 838.332
F:  7 *  2.228 =  15.596
G:  9 *  2.015 =  18.135
H: 19 *  6.094 = 115.786
I: 25 *  6.966 = 174.150
J:  1 *  0.153 =   0.153
K:  4 *  0.772 =   3.088
L: 19 *  4.025 =  76.475
M: 17 *  2.406 =  40.902
N: 29 *  6.749 = 195.721
O: 27 *  7.507 = 202.689
P:  8 *  1.929 =  15.432
Q:  4 *  0.095 =   0.380
R: 26 *  5.987 = 155.662
S: 22 *  6.327 = 139.194
T: 34 *  9.056 = 307.904
U: 15 *  2.758 =  41.370
V:  3 *  0.978 =   2.934
W: 19 *  2.360 =  44.840
X:  0 *  0.150 =   0.000
Y:  5 *  1.974 =   9.870
Z:  0 *  0.074 =   0.000
And when we're done, we need to add up all those products:
Code:
 
   326.680
+   17.904
+   25.038
+   59.542
+  838.332
+   15.596
+   18.135
+  115.786
+  174.150
+    0.153
+    3.088
+   76.475
+   40.902
+  195.721
+  202.689
+   15.432
+    0.380
+  155.662
+  139.194
+  307.904
+   41.370
+    2.934
+   44.840
+    0.000
+    9.870
+    0.000
= 2827.777
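The multiply-then-add steps above can be sketched in a few lines of Python. This is only an illustration: FREQS and COUNTS are hypothetical names holding the two tables from this post, typed in by hand in A to Z order.

```python
import string

# Standard English frequencies (percent), A through Z, from the first table.
FREQS = [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]

# Letter counts for the test text, A through Z, from the second table.
COUNTS = [
    40, 12, 9, 14, 66, 7, 9, 19, 25, 1, 4, 19, 17,
    29, 27, 8, 4, 26, 22, 34, 15, 3, 19, 0, 5, 0,
]

# One product per letter: count in the text times standard frequency.
products = [count * freq for count, freq in zip(COUNTS, FREQS)]

for letter, count, freq, product in zip(string.ascii_uppercase, COUNTS, FREQS, products):
    print(f"{letter}: {count:2d} * {freq:6.3f} = {product:7.3f}")

# Add up all the products.
sum_products = sum(products)
print(f"= {sum_products:.3f}")
```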
The final step is to divide this sum - the sum of the products - by the product of the two sums you obtained for the two distributions separately.
Code:
 
chi = 2827.777 / ( 99.999 * 434 ) = 0.0652
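Put together, the whole calculation fits in one small function. What follows is a sketch, not anything standard: chi is simply what I'm calling it here, and it assumes both lists are in the same A to Z order.

```python
def chi(dist_a, dist_b):
    """Sum of the per-letter products divided by the product of the two totals."""
    sum_products = sum(a * b for a, b in zip(dist_a, dist_b))
    return sum_products / (sum(dist_a) * sum(dist_b))

# Standard English frequencies (percent), A through Z.
FREQS = [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]

# Letter counts for the test text, A through Z.
COUNTS = [
    40, 12, 9, 14, 66, 7, 9, 19, 25, 1, 4, 19, 17,
    29, 27, 8, 4, 26, 22, 34, 15, 3, 19, 0, 5, 0,
]

result = chi(COUNTS, FREQS)
print(f"{result:.4f}")  # prints 0.0652

# Per-thousand instead of percent: the scale cancels out in the division.
per_thousand = [f * 10 for f in FREQS]
rescaled = chi(COUNTS, per_thousand)
```

The last two lines are there to show why the units of the standard table don't matter: multiplying every frequency by ten multiplies the top and the bottom of the fraction by ten, so rescaled comes out the same as result (up to floating-point rounding).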
The next question is: what does this tell us? Remember - the IC of random text is 0.0385 and the IC of normal English is 0.0653. The chi test returns values in the same range. A chi of 0.0652 tells us that the letter distribution of this sample text is almost exactly the same as that of our sample of ordinary English.
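To see where the low end of that range comes from, you can feed the same calculation a perfectly flat distribution, one where every letter turns up equally often. A small sketch (uniform and chi_uniform are names I made up for this):

```python
# Standard English frequencies (percent), A through Z.
FREQS = [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]

# A perfectly flat "text": every letter appears the same number of times.
uniform = [1] * 26

# Same recipe as before: sum of products over product of sums.
sum_products = sum(u * f for u, f in zip(uniform, FREQS))
chi_uniform = sum_products / (sum(uniform) * sum(FREQS))
print(f"{chi_uniform:.4f}")  # prints 0.0385
```

With flat counts the sum of the products is just the sum of the standard frequencies, so the fraction collapses to 1/26, which is the 0.0385 figure for random text. Text whose letters are thoroughly scrambled relative to English should land near that end, and text with an English-like distribution near the other.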
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.