Viewing Single Post From: Cracking A Vig With Ic
jdege
I was talking about using the Index of Coincidence to break Vigs, and someone asked for an example.

Let me start by saying that in my opinion, the most important thing computers can do for us, when we're playing around with hobby crypto, is to create ciphertexts for us, without our having to know which keys were used.

So we'll start with a ciphertext, encrypted as a plain Vig, with a keyword length of somewhere between 3 and 10.
Code:
 
  AOPLP QYEQK OMSJL URNKF NLYDN CVYCQ HSCSS IRZNL AJOWZ ZIQYC RMYBR JZLQB
  UPGIB ZRLXQ KJUEM ZEXIH JTMKI CLUYN KUTLP NCWZB JXAFN LIAUS QOYEC RBGOP
  UMCBE QOOSH PRMIO UHQAF NLHDL GUJSU NEIJF MZWLI QKRFY LLCYE XNPFO CAFSI
  MBZKQ QIVOI PZAPJ MXZGD TVRQY VCOHT HLMFS RXFEA SYIRZ GDTVR QYVQU JOLYW
  XWJAO POYAG SIOAQ LYLBS ONFDM PXHXV TENHA TZSGR PIVLN URJZI QYRQY VZLPD
  LRQYW VMFSV CBUFJ AUPVG WARJZ IUUEC YEXNJ ZNLXN EGQJZ NRQYX ZTUTV SBURY
  ZIUUE BLMNK BZKDJ FPROU TVSCW IVYFO YCJNM IMZQA LXNAJ XLUUE HYXIU UFHIR
  HKXXF PPRFI VFOTP VLNNL ZTGAY EXNXZ TUTLP NZSMK JFSYB NWAUS QCCA
Let's start with some frequency counts:
Code:
 
 L  U  Y  Z  N  Q  I  R  J  X  S  P  O  F  A  V  C  M  E  T  B  H  K  G  W  D
28 27 26 25 25 23 23 22 20 19 19 19 19 19 19 17 17 16 15 14 13 12 11 10  9  7
For a total of 474 letters.
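
The counting itself is easy to hand to the computer. Here is a minimal Python sketch, assuming we only care about the letters A-Z and ignore spacing. The name letter_counts is made up for this illustration, and the sample is just the first line of the ciphertext above:

```python
from collections import Counter

def letter_counts(text):
    """Count the letters A-Z, ignoring case, spaces, and punctuation."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    return Counter(letters)

# First line of the ciphertext, as a small demonstration.
sample = "AOPLP QYEQK OMSJL URNKF NLYDN CVYCQ HSCSS IRZNL AJOWZ ZIQYC RMYBR JZLQB"
counts = letter_counts(sample)

# Print letters from most to least frequent, then the total.
for letter, n in counts.most_common():
    print(letter, n)
print("total:", sum(counts.values()))
```

Feeding it the whole ciphertext instead of one line should give the full table.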

Every letter appears, and even the least frequent letters appear with some regularity. This is a far flatter distribution than we'd expect from a monoalphabetic substitution cipher, and the Index of Coincidence should reflect that.

The kappa for this ciphertext is calculated from the frequency counts, as follows:
Code:
 
 kappa = ( 28 * 27 + 27 * 26 + 26 * 25 + 25 * 24 + 25 * 24 + 23 * 22 +
           23 * 22 + 22 * 21 + 20 * 19 + 19 * 18 + 19 * 18 + 19 * 18 +
           19 * 18 + 19 * 18 + 19 * 18 + 17 * 16 + 17 * 16 + 16 * 15 +
           15 * 14 + 14 * 13 + 13 * 12 + 12 * 11 + 11 * 10 + 10 *  9 +
            9 *  8 +  7 *  6 ) / ( 474 * 473 )

    = 8,992 / 224,202
    = 0.04011
We scale this by the kappa of random text (1/26, or 0.03846), to obtain the Index of Coincidence:
Code:
 
 IC = 0.04011 / 0.03846
    = 1.043
Now remember what we expect for an IC. Random text should give us an IC of 1.0. Ordinary English text should give us an IC of 1.7. What we have is an IC of about 1.04, which is very near to random text, and exactly what we would expect of a Vig using anything other than a very short keyword.
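
For anyone who wants to check the arithmetic by machine, here is a small Python sketch of the same calculation, working straight from the frequency table. The function names are my own choice for illustration, and I'm assuming the usual definition of kappa as the chance that two distinct letters picked from the text are the same:

```python
# Frequency counts from the table above, most frequent first.
COUNTS = [28, 27, 26, 25, 25, 23, 23, 22, 20, 19, 19, 19, 19,
          19, 19, 17, 17, 16, 15, 14, 13, 12, 11, 10, 9, 7]

def kappa(counts):
    """Chance that two distinct letters drawn from the text are the same."""
    n = sum(counts)
    return sum(f * (f - 1) for f in counts) / (n * (n - 1))

def index_of_coincidence(counts, alphabet_size=26):
    """Kappa scaled by the kappa of uniformly random text, 1/alphabet_size."""
    return kappa(counts) * alphabet_size

print("letters:", sum(COUNTS))
print("kappa:  ", round(kappa(COUNTS), 5))
print("IC:     ", round(index_of_coincidence(COUNTS), 3))
```

Multiplying by 26 is the same as dividing by 0.03846, give or take rounding.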

So, next we have to find the length of the keyword.

First, trying a keyword of length two, we calculate the IC separately for the even-indexed and for the odd-indexed letters, that is, the letters whose indexes are 0 and 1, mod 2.
Code:
 
IC[0 mod 2] = 1.167
IC[1 mod 2] = 1.120
The average:
Code:
 
  IC[n=2] = 1.143
And we do the same for a keylength of 3:
Code:
 
  IC[0 mod 3] = 1.551
  IC[1 mod 3] = 1.374
  IC[2 mod 3] = 1.432
  IC[n=3] = 1.452
And continue:
Code:
 
  IC[n=2] = 1.143
  IC[n=3] = 1.452
  IC[n=4] = 1.270
  IC[n=5] = 1.196
  IC[n=6] = 1.545
  IC[n=7] = 1.304
  IC[n=8] = 1.388
  IC[n=9] = 2.147
  IC[n=10] = 1.305
  IC[n=11] = 1.274
  IC[n=12] = 1.821
  IC[n=13] = 1.389
  IC[n=14] = 1.418
  IC[n=15] = 1.706
  IC[n=16] = 1.358
  IC[n=17] = 1.272
  IC[n=18] = 2.312
  IC[n=19] = 1.360
  IC[n=20] = 1.465
Our IC peaks where n=9 and again where n=18, which is what we'd expect for a keyword of length 9.
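
The key-length search can be sketched in Python along these lines. This is only a sketch under assumptions: the function names are invented for illustration, and it uses the f*(f-1) form of kappa on each slice, so the exact figures it prints need not match the ones quoted above, which came from a program that isn't shown. The peak should land in the same place, though:

```python
from collections import Counter

CIPHERTEXT = """
AOPLP QYEQK OMSJL URNKF NLYDN CVYCQ HSCSS IRZNL AJOWZ ZIQYC RMYBR JZLQB
UPGIB ZRLXQ KJUEM ZEXIH JTMKI CLUYN KUTLP NCWZB JXAFN LIAUS QOYEC RBGOP
UMCBE QOOSH PRMIO UHQAF NLHDL GUJSU NEIJF MZWLI QKRFY LLCYE XNPFO CAFSI
MBZKQ QIVOI PZAPJ MXZGD TVRQY VCOHT HLMFS RXFEA SYIRZ GDTVR QYVQU JOLYW
XWJAO POYAG SIOAQ LYLBS ONFDM PXHXV TENHA TZSGR PIVLN URJZI QYRQY VZLPD
LRQYW VMFSV CBUFJ AUPVG WARJZ IUUEC YEXNJ ZNLXN EGQJZ NRQYX ZTUTV SBURY
ZIUUE BLMNK BZKDJ FPROU TVSCW IVYFO YCJNM IMZQA LXNAJ XLUUE HYXIU UFHIR
HKXXF PPRFI VFOTP VLNNL ZTGAY EXNXZ TUTLP NZSMK JFSYB NWAUS QCCA
"""

def index_of_coincidence(letters):
    """IC of a run of letters: kappa over distinct pairs, scaled by 26."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    pairs = sum(f * (f - 1) for f in counts.values())
    return 26 * pairs / (n * (n - 1))

def average_slice_ic(letters, keylen):
    """Mean IC of the keylen slices: letters i, i+keylen, i+2*keylen, ..."""
    slices = [letters[i::keylen] for i in range(keylen)]
    return sum(index_of_coincidence(s) for s in slices) / keylen

letters = [c for c in CIPHERTEXT if c.isalpha()]
for n in range(2, 11):
    print(f"IC[n={n}] = {average_slice_ic(letters, n):.3f}")

# The candidate with the highest average IC is the best key-length guess.
best = max(range(2, 11), key=lambda n: average_slice_ic(letters, n))
print("best guess:", best)
```

I've limited the search to lengths 2 through 10 here, since multiples of the true length score about as well as the length itself and would only muddy the pick.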

So let's assume a keyword of length 9, and look at the slices. Slice 0 has this frequency count:
Code:
 
A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
3  1  0  0  0  0  3  0  2  1  5  2  3  3  5  0  1  4  0  5  6  0  0  3  2  4
And a total of 53 letters.

Doing a chi test against a file of English letter frequencies, we get 0.04179.

Next, we try shifting. Our chi tests for slice 0 at shifts of 0 through 25:
Code:
 
0  0.04179
1  0.03688
2  0.03963
3  0.03395
4  0.03166
5  0.03732
6  0.06339
7  0.04039
8  0.03832
9  0.03728
10  0.04022
11  0.03069
12  0.04337
13  0.04017
14  0.02968
15  0.03483
16  0.03733
17  0.03807
18  0.03472
19  0.04533
20  0.03865
21  0.04179
22  0.03650
23  0.03672
24  0.03425
25  0.03708
We have a clear peak at a shift of 6, or a keyword value of 'G'. ('A' == 0).
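
Here is a rough Python sketch of the shifting step for slice 0. Two assumptions to flag: I'm taking the chi test to be the sum, over all letters, of the slice's relative frequency times the English relative frequency of the letter it would decrypt to, and the ENGLISH table below is a commonly quoted set of approximate percentages, not the frequency file used for the numbers above. So the scores will come out a little different, but the peak should be in the same place:

```python
# Approximate English letter frequencies in percent, A..Z.
# (An assumed table; the frequency file used in the post is not shown.)
ENGLISH = [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
           0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
           6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074]

# Frequency counts for slice 0, A..Z, copied from the table above.
SLICE0 = [3, 1, 0, 0, 0, 0, 3, 0, 2, 1, 5, 2, 3,
          3, 5, 0, 1, 4, 0, 5, 6, 0, 0, 3, 2, 4]

def chi(counts, shift):
    """Correlate the slice, decrypted by `shift`, with English frequencies."""
    total = sum(counts)
    return sum(counts[c] / total * ENGLISH[(c - shift) % 26] / 100
               for c in range(26))

scores = [chi(SLICE0, s) for s in range(26)]
for s, score in enumerate(scores):
    print(f"{s:2d}  {score:.5f}")

# The shift with the highest score is the likely key letter ('A' == 0).
best = max(range(26), key=lambda s: scores[s])
print("peak at", best, chr(ord("A") + best))
```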

Do the same for slice 1:
Code:
 
0  0.03832
1  0.06781
2  0.03819
3  0.03517
4  0.03061
5  0.04191
6  0.03524
7  0.04081
8  0.03222
9  0.03090
10  0.03172
11  0.03630
12  0.04425
13  0.03795
14  0.04354
15  0.03519
16  0.04742
17  0.04018
18  0.03890
19  0.02796
20  0.03985
21  0.04079
22  0.03877
23  0.04293
24  0.03418
25  0.02887
Here, we have a peak at 1, or 'B'. Do the same for slices 2-8:
Code:
 
slice 0:   6["G"]  0.06339
slice 1:   1["B"]  0.06781
slice 2:  12["M"]  0.06365
slice 3:   7["H"]  0.07324
slice 4:  24["Y"]  0.06431
slice 5:   9["J"]  0.06622
slice 6:  20["U"]  0.07353
slice 7:   4["E"]  0.06646
slice 8:  21["V"]  0.06481
Or a keyword of "GBMHYJUEV". (Which is what I should expect, having used random letters for a keyword, instead of picking some nice, readable word out of a dictionary.)

Decrypted with that keyword our message is:
Code:
 
UNDER HEAVE NALLC ANSEE BEAUT YASBE AUTYO NLYBE CAUSE THERE ISUGL INESS
ALLCA NKNOW GOODA SGOOD ONLYB ECAUS ETHER EISEV ILTHE REFOR EHAVI NGAND
NOTHA VINGA RISET OGETH ERDIF FICUL TANDE ASYCO MPLEM ENTEA CHOTH ERLON
GANDS HORTC ONTRA STEAC HOTHE RHIGH ANDLO WREST UPONE ACHOT HERVO ICEAN
DSOUN DHARM ONIZE EACHO THERF RONTA NDBAC KFOLL OWONE ANOTH ERTHE REFOR
ETHES AGEGO ESABO UTDOI NGNOT HINGT EACHI NGNOT ALKIN GTHET ENTHO USAND
THING SRISE ANDFA LLWIT HOUTC EASEC REATI NGYET NOTWO RKING YETNO TTAKI
NGCRE DITWO RKISD ONETH ENFOR GOTTE NTHER EFORE ITLAS TSFOR EVER
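
The decryption itself is just subtracting the key letter from each ciphertext letter, mod 26. A minimal Python sketch, with a function name of my own choosing, and assuming spaces are only grouping and don't advance the key:

```python
def vigenere_decrypt(ciphertext, keyword):
    """Subtract the repeating key from each letter, mod 26 ('A' == 0)."""
    key = [ord(k) - ord("A") for k in keyword.upper()]
    out = []
    i = 0  # index into the key; advances only on letters
    for ch in ciphertext.upper():
        if "A" <= ch <= "Z":
            shift = key[i % len(key)]
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)  # keep the group spacing as-is
    return "".join(out)

# First four groups of the ciphertext, with the recovered keyword.
print(vigenere_decrypt("AOPLP QYEQK OMSJL URNKF", "GBMHYJUEV"))
# prints: UNDER HEAVE NALLC ANSEE
```
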
If the text is long enough, this almost always works. As it gets shorter, normal variation makes the differences between the chi tests less distinct, so shorter messages are more challenging. I should note that this text is three times as long as the recommended length for Vigs in the ACA.

Even in this one, slice 4 nearly had an incorrect peak. The correct shift of 24 had a chi of 0.06431 for a "Y", but a shift of 11 had a chi of 0.06069, which would have been an "L". If one or two slices have incorrect shifts, though, the text usually comes through, if you format the decrypted text one line per keyword length:
Code:
 
  0 1 2 3 4 5 6 7 8
  U N D E E H E A V
  E N A L Y C A N S
  E E B E N U T Y A
  S B E A H T Y O N
  L Y B E P A U S E
  T H E R R I S U G
  L I N E F S A L L
  C A N K A O W G O
  O D A S T O O D O
  N L Y B R C A U S
From this, it's obvious that you're on the right track, and that it's slice 4 that is incorrect.
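
Laying the text out this way is simple to automate. A short Python sketch, where rows_by_keylen is a name made up for this illustration:

```python
def rows_by_keylen(text, keylen):
    """Lay the letters out one key period per row, under a slice header."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    header = " ".join(str(i) for i in range(keylen))
    rows = [" ".join(letters[i:i + keylen])
            for i in range(0, len(letters), keylen)]
    return [header] + rows

# Start of a trial decryption in which slice 4 has the wrong shift.
for line in rows_by_keylen("UNDEE HEAVE NALYC ANS", 9):
    print(line)
```

Each column is one slice, so a bad shift shows up as a single column of nonsense running down through otherwise readable rows.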
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.