Welcome Guest [Log In] [Register]
Welcome to Crypto. We hope you enjoy your visit.


You're currently viewing our forum as a guest. This means you are limited to certain areas of the board and there are some features you can't use. If you join our community, you'll be able to access member-only sections, and use many member-only features such as customizing your profile, sending personal messages, and voting in polls. Registration is simple, fast, and completely free.


Join our community!


If you're already a member please log in to your account to access all of our features:

Username:   Password:
Add Reply
Cracking A Vig With Ic
Topic Started: May 7 2008, 03:33 AM (1,091 Views)
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
I was talking about using the Index of Coincidence to break Vigs, and someone asked for an example.

Let me start by saying that in my opinion, the most important thing computers can do for us, when we're playing around with hobby crypto, is to create cyphertexts for us, without our having to know which keys were used.

So we'll start with a ciphertext, encrypted as a plain Vig, with a keyword length of somewhere between 3 and 10.
Code:
 
  AOPLP QYEQK OMSJL URNKF NLYDN CVYCQ HSCSS IRZNL AJOWZ ZIQYC RMYBR JZLQB
  UPGIB ZRLXQ KJUEM ZEXIH JTMKI CLUYN KUTLP NCWZB JXAFN LIAUS QOYEC RBGOP
  UMCBE QOOSH PRMIO UHQAF NLHDL GUJSU NEIJF MZWLI QKRFY LLCYE XNPFO CAFSI
  MBZKQ QIVOI PZAPJ MXZGD TVRQY VCOHT HLMFS RXFEA SYIRZ GDTVR QYVQU JOLYW
  XWJAO POYAG SIOAQ LYLBS ONFDM PXHXV TENHA TZSGR PIVLN URJZI QYRQY VZLPD
  LRQYW VMFSV CBUFJ AUPVG WARJZ IUUEC YEXNJ ZNLXN EGQJZ NRQYX ZTUTV SBURY
  ZIUUE BLMNK BZKDJ FPROU TVSCW IVYFO YCJNM IMZQA LXNAJ XLUUE HYXIU UFHIR
  HKXXF PPRFI VFOTP VLNNL ZTGAY EXNXZ TUTLP NZSMK JFSYB NWAUS QCCA
Let's start with some frequency counts:
Code:
 
 L  U  Y  Z  N  Q  I  R  J  X  S  P  O  F  A  V  C  M  E  T  B  H  K  G  W  D
28 27 26 25 25 23 23 22 20 19 19 19 19 19 19 17 17 16 15 14 13 12 11 10  9  7
For a total of 473 letters.

Every letter appears, and even the least frequent letters appear with some regularity. This is far flatter a distribution than we'd expect from a mono-alphabet substitution cipher, and the Index of Coincidence should reflect that.

The kappa for this ciphertext is calculated from the frequency counts, as follows:
Code:
 
 kappa = ( 28 * 27 + 27 * 26 + 26 * 25 + 25 * 24 + 25 * 24 + 23 * 22 +
           23 * 22 + 22 * 21 + 20 * 19 + 19 * 18 + 19 * 18 + 19 * 18 +
           19 * 18 + 19 * 18 + 19 * 19 + 17 * 16 + 17 * 16 + 16 * 15 +
           15 * 14 + 14 * 13 + 13 * 12 + 12 * 11 + 11 * 10 + 10 *  9 +
            9 *  8 +  7 *  6 ) / ( 473 * 472 )

    = 9,440 / 224,202
    = 0.04210
We scale this by the kappa of random text, to obtain the Index of Coincidence:
Code:
 
 IC = 0.04210 / 0.03846
    = 1.095
Now remember what we expect for an IC. Random text should give us an IC of 1.0. Ordinary English text should give us an IC of 1.7. What we have is an IC of 1.1, which is very near to random text, and exactly what we would expect of Vig using anything other than a very short keyword.

So, next we have to find the length of the keyword.

First, trying a keyword of length two, we calculate the IC for the even and for the odd letters. The letters with indexes of 0 and 1, mod 2.
Code:
 
IC[0 mod 2] = 1.167
IC[1 mod 2] = 1.120
The average:
Code:
 
  IC[n=2] = 1.143
And we do the same for a keylength of 3:
Code:
 
  IC[0 mod 3] = 1.551
  IC[1 mod 3] = 1.374
  IC[2 mod 3] = 1.432
  IC[n=3] = 1.452
And continue:
Code:
 
  IC[n=2] = 1.143
  IC[n=3] = 1.452
  IC[n=4] = 1.270
  IC[n=5] = 1.196
  IC[n=6] = 1.545
  IC[n=7] = 1.304
  IC[n=8] = 1.388
  IC[n=9] = 2.147
  IC[n=10] = 1.305
  IC[n=11] = 1.274
  IC[n=12] = 1.821
  IC[n=13] = 1.389
  IC[n=14] = 1.418
  IC[n=15] = 1.706
  IC[n=16] = 1.358
  IC[n=17] = 1.272
  IC[n=18] = 2.312
  IC[n=19] = 1.360
  IC[n=20] = 1.465
Our IC peaks where n=9 and again where n=18. Which is what we'd expect for a keyword of length 9.

So lets assume a keyword of length 9, and look at the slices. Slice 0 has this frequency count:
Code:
 
A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
3  1  0  0  0  0  3  0  2  1  5  2  3  3  5  0  1  4  0  5  6  0  0  3  2  4
And a total of 53 letters.

Doing a chi-test against a file of english letter frequencies, we get 0.04179.

Next is to try shifting. Our chi tests for shifting slice 0 from 0-25:
Code:
 
0  0.04179
1  0.03688
2  0.03963
3  0.03395
4  0.03166
5  0.03732
6  0.06339
7  0.04039
8  0.03832
9  0.03728
10  0.04022
11  0.03069
12  0.04337
13  0.04017
14  0.02968
15  0.03483
16  0.03733
17  0.03807
18  0.03472
19  0.04533
20  0.03865
21  0.04179
22  0.03650
23  0.03672
24  0.03425
25  0.03708
We have a clear peak at a shift of 6, or a keyword value of 'G'. ('A' == 0).

Do the same for slice 2:
Code:
 
0  0.03832
1  0.06781
2  0.03819
3  0.03517
4  0.03061
5  0.04191
6  0.03524
7  0.04081
8  0.03222
9  0.03090
10  0.03172
11  0.03630
12  0.04425
13  0.03795
14  0.04354
15  0.03519
16  0.04742
17  0.04018
18  0.03890
19  0.02796
20  0.03985
21  0.04079
22  0.03877
23  0.04293
24  0.03418
25  0.02887
Here, we have a peak at 1, or 'B'. Do the same for slices 3-8:
Code:
 
slice 0:   6["G"]  0.06339
slice 1:   1["B"]  0.06781
slice 2:  12["M"]  0.06365
slice 3:   7["H"]  0.07324
slice 4:  11["Y"]  0.06431
slice 5:   9["J"]  0.06622
slice 6:   20["U"]  0.07353
slice 7:    4["E"]  0.06646
slice 8:   21["V"]  0.06481
Or a keyword of "GBMHYJUEV". (Which is what I should expect, having used random letters for a keyword, instead of picking some nice, readable word out of a dictionary.)

Decrypted with that keyword our message is:
Code:
 
UNDER HEAVE NALLC ANSEE BEAUT YASBE AUTYO NLYBE CAUSE THERE ISUGL INESS
ALLCA NKNOW GOODA SGOOD ONLYB ECAUS ETHER EISEV ILTHE REFOR EHAVI NGAND
NOTHA VINGA RISET OGETH ERDIF FICUL TANDE ASYCO MPLEM ENTEA CHOTH ERLON
GANDS HORTC ONTRA STEAC HOTHE RHIGH ANDLO WREST UPONE ACHOT HERVO ICEAN
DSOUN DHARM ONIZE EACHO THERF RONTA NDBAC KFOLL OWONE ANOTH ERTHE REFOR
ETHES AGEGO ESABO UTDOI NGNOT HINGT EACHI NGNOT ALKIN GTHET ENTHO USAND
THING SRISE ANDFA LLWIT HOUTC EASEC REATI NGYET NOTWO RKING YETNO TTAKI
NGCRE DITWO RKISD ONETH ENFOR GOTTE NTHER EFORE ITLAS TSFOR EVER
If the text is long enough, this almost always works. As it gets shorter, the differences between the chi tests becomes less distinct. I should note, this text is three times as long as the recommended length for Vigs in the ACA. Shorter messages are more challenging, because normal variation can make the statistics less clear, Even in this one, slice 4 nearly had an incorrect peak. The correct shift had a chi of 0.06431 for a "Y", but a shift of 11 had a chi of 0.06069, which would have been an "L". If one or two slices have incorrect shifts, though, the text usually comes through, if you format the decrypted text one line per keyword length:
Code:
 
  0 1 2 3 4 5 6 7 8
  U N D E E H E A V
  E N A L Y C A N S
  E E B E N U T Y A
  S B E A H T Y O N
  L Y B E P A U S E
  T H E R R I S U G
  L I N E F S A L L
  C A N K A O W G O
  O D A S T O O D O
  N L Y B R C A U S
From this, it's obvious that you're on the right track, and that its slice four that is incorrect.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Donald
Elite member
[ *  *  *  *  * ]
This is incredibly detailed good stuff, thanks! I added it to the tutorials.
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
I'd been intending to take a shot at cracking polyalphabetic substitutions - Vig, Beaufort, Variant, Porta, Gronsfeld, Quagmires I-IV - for some time. Now that I'm done wasting my time on cryptograms.org, I've started putting some scripts together.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Revelation
Member Avatar
Administrator
[ *  *  *  *  * ]
I've posted a polyalphabetic substitution cipher a while ago. I was surprised about how fast it was cracked!

Thanks for the tutorial btw :)
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN
RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN
RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN
RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN
RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Revelation
May 8 2008, 09:21 AM
I've posted a polyalphabetic substitution cipher a while ago. I was surprised about how fast it was cracked!
Truth is, once you know how to find the keylength, Vig is easier to crack than simple substitution. Underneath, it's just a couple of Caesar shifts. It's Pats that drive me nuts.
Revelation
May 8 2008, 09:21 AM
Thanks for the tutorial btw :)
I have one more in mind, on what I've learned about attacking Pats when you don't have a crib.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Paarth Dave
Advanced Member
[ *  *  * ]
Quote:
 
So, next we have to find the length of the keyword.

First, trying a keyword of length two, we calculate the IC for the even and for the odd letters. The letters with indexes of 0 and 1, mod 2.


How do you do that?

And how do you do a chi test?

Cryptography Vanquished....
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Paarth Dave
May 9 2008, 01:09 AM
How do you do that?
Have you read: http://z13.invisionfree.com/Crypto/index.php?showtopic=456?

You split the ciphertext into n slices, where n is your provisional keylength.

So if you're testing to see if the keylength is 3, you split the ciphertext into three slices, where slice 0 has letters 0,3,6,9,12,...slice 1 has letters 1,4,7,10,..., and slice 2 has letters 2,5,8,11,...

The for each slice we do a frequency count, and then calculate the Incidence of Coincidence of that slice, and then we average the IC's for the n slices.
Paarth Dave
May 9 2008, 01:09 AM
And how do you do a chi test?
You need two frequency counts, one of slice you're testing, and one of a piece of standard English text that you're using as a reference.

Suppose you have F1: {A=3, B=1, C=1, D=1, E=12, F=3,...} and F2: {A=8, B=1, C=3, D=0, E=21, F=8,...}. Where the sum of the counts for F1 is N1: 105, and the sum of the counts for F2 is N2: 306. You multiple the count for A in F1 by the count for A in F2, and add to it the product of the count for B in F1 and B in F2, etc. Then divide by the product of the sums.
Code:
 
(F1[A]*F2[A] + F1[B]*F2[B] + F1[C]*F2[C] + F1[D]*F2[D] + F1[E]*F2[E] + F1[F]*F2[F] + ... ) / (N1 * N2)

(3*8 + 1*1 + 1*3 + 1*0 + 12*21 + 3*8 + ...) / (105*306)
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Donald
Elite member
[ *  *  *  *  * ]
"Jdege"
 
It's Pats that drive me nuts

Pats are hard, but rot13's playfair was the hardest nut I ever cracked. And the scary thing is, that was a very easy playfair with a simple and obviously keyed matrix and a nice long easy to place crib. Whoo!
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Donald
May 9 2008, 03:02 PM
Pats are hard,

And the problem is that the solutions of so many of the polyalphabets involve converting to a pat.

The Quags and the Gronsfeld, are, in effect, mixed-alphabet Vigs. So the approach I described here should work for all of them. It would even work for the Porta. though the shifting would be a bit different.

But what this gives you is a Pat - which I can pretty much always solve given a couple of hundred letters to work with, but for shorter messages I get stuck as much as half the time. I can cheat the ACA K1-K3, with a dictionary attack. But the unkeyed and the K4 I just bang my head against as often as not.


When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Paarth Dave
Advanced Member
[ *  *  * ]
Quote:
 
You split the ciphertext into n slices, where n is your provisional keylength.

So if you're testing to see if the keylength is 3, you split the ciphertext into three slices, where slice 0 has letters 0,3,6,9,12,...slice 1 has letters 1,4,7,10,..., and slice 2 has letters 2,5,8,11,...

The for each slice we do a frequency count, and then calculate the Incidence of Coincidence of that slice, and then we average the IC's for the n slices.


So for calculating the IOC of the n slices, do we also have to do the kappa test?

Cryptography Vanquished....
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Paarth Dave
May 9 2008, 11:16 PM
So for calculating the IOC of the n slices, do we also have to do the kappa test?

The IC is the kappa test, divided by a scaling factor. And yes, we need to calculate it for each slice, then calculate the average for all of the slices, then do it again for the next prospective key length.

It's tedious to do it by hand, but it's what computers were invented for. (And I'm not being figurative - it's solving exactly this kind of problem for which the first computer was invented.)
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Paarth Dave
Advanced Member
[ *  *  * ]
Can you please explain to me the chi test in simple terms? I have read your topics based on it...but it went above my head...

Cryptography Vanquished....
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Paarth Dave
May 10 2008, 12:06 AM
Can you please explain to me the chi test in simple terms? I have read your topics based on it...but it went above my head...

You use the chi test to measure how similar two frequency distributions are.

When you draw a graph a frequency distribution, you get a curve with peaks and valleys. If two messages are encoded with the same alphabet, they will have similar curves. If they are encoded with alphabets that are shifted relative to one another, they will have curves that won't match up - the peaks and valleys of one will be shifted relative to the other.

The Vig uses the standard alphabet - A-Z, shifted by an amount equal to the numeric value of the key letter. If the key letter is 'B', the cipher text consists of the plaintext shifted by one character, if it's 'D', it's shifted by three.

So, after we've figured out how many alphabets there are - how many letters there are in the keyword - the next step is to figure out by how much each of the alphabets has been shifted.

And we do that by trying out the 26 different shift amounts, and comparing the frequency distribution of each against the frequency distribution of plain text, to see which lines up the best.

This is the frequency distribution of normal English text:
Code:
 
        #                                          
        #                                          
        #                             #            
        #                             #            
#       #                   #         #            
#       #       #         # #         #            
#       #     # #         # #     # # #            
#     # #     # #     #   # #     # # #            
#   # # #     # #     #   # #     # # # #          
#   # # # # # # #     # # # # #   # # # #   #   #  
# # # # # # # # #   # # # # # #   # # # # # #   #
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Now suppose you were trying to crack a vig, and you'd already figured out that it had a five-letter keyword. Starting at the first letter, you'd take every fifth letter, and do a frequency distribution, and you might get something like this:
Code:
 
                  #                                  
                  #       #                          
                # #       # #                          
      #         # #       # # #           #       #    
      #         # #     # # # #           #       #    
      #         # #     # # # #       #   #       #    
      #     #   # #     # # # #       #   #     # #    
  # # #     # # # #     # # # #   # # #   # # # # #    
# # # #   # # # # # #   # # # # # # # #   # # # # #    
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Now this doesn't line up nicely with the graph above. But it does have the same number of peaks and valleys. So when we shift it over a bit, we get a graph that does line up nicely:
Code:
 
                            #                        
                            #       #                
                          # #       # #                
#       #       #         # #       # # #              
#       #       #         # #     # # # #              
#       #       #         # #     # # # #       #      
#     # #       #     #   # #     # # # #       #      
# # # # #   # # #     # # # #     # # # #   # # #      
# # # # # # # # #   # # # # # #   # # # # # # # #      
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
The question is how do you recognize that you things lined up properly? It's not a matter of finding the perfect fit, because you'll never get a perfect fit. It's a matter of recognizing which is the best fit, of the 26 possibilities you have.

You can do it by eye, but we'd really rather have a process we can use that's a little more reliable, and it'd be really nice if we had something we could program into a computer.

That's what the chi test is - a measure of how closely two frequency distributions match.

And the calculation is pretty simple:
Code:
 
        1st     2nd
Letter   Text    Text      Product

A         12        4      12 * 4 = 48
B          6        2       6 * 2 = 12
C          3        1       3 * 1 =  3

  ...

Z          1        0       1 *  0 = 0
______________________________________
SUM      257      101              741
257 is the sum of the frequency counts of the first text - the total number of latters in the first text. 101 is the sum of the frequency counts of the second text.

741 is the sum of the products of the frequncy counts for each letter.

And chi is:
Code:
 
 chi = 714 / (257 * 101)
The sum of the products divided by the product of the sums

Your result will range from about 0 - meaning that the frequency distributiions are inverses of each other (one peaks where the other vallteys, and vice versa), to about 0.0385 - meaning that there is no correlation whatsoever between the two distributions, to about 0.070 - meaning that the two distributions line up almost perfectly.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Paarth Dave
Advanced Member
[ *  *  * ]
Can you explain it to me with an example?

Cryptography Vanquished....
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Paarth Dave
May 10 2008, 01:48 AM
Can you explain it to me with an example?

That was an example.

What are you having problems with?

How to do the arithmetic?

How to calculate the frequencies?

How you shift an alphabet?

When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
1 user reading this topic (1 Guest and 0 Anonymous)
Go to Next Page
« Previous Topic · General · Next Topic »
Add Reply