Welcome Guest [Log In] [Register]
Welcome to Crypto. We hope you enjoy your visit.


You're currently viewing our forum as a guest. This means you are limited to certain areas of the board and there are some features you can't use. If you join our community, you'll be able to access member-only sections, and use many member-only features such as customizing your profile, sending personal messages, and voting in polls. Registration is simple, fast, and completely free.


Join our community!


If you're already a member please log in to your account to access all of our features:

Username:   Password:
Add Reply
Hill-levine Matrix Cipher
Topic Started: Jan 29 2007, 05:54 AM (335 Views)
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
I'd mentioned that I'd managed to get a hold of some back-issues of Cryptologia. I've since found out that all of the back-issues are available electronically:

http://www.informaworld.com/smpp/title~db=...~tab=issueslist

In any case, I've been looking at the Cryptanalysts' Corner column. The first column was published in the January issue, 1978. The first problem was:

QHDIW QQQEI WFRLI YLUIO WQUVC NQDHV SNTQV YRLEP RVMND ERMOA GTNFQ QGWBS TJXCR IWQUH PBQME XMTXH WFXJS ACOZA SPKGS PAOYV NSJQK JXHZU PACAA I

This is encrypted using the Hill-Levine system [SINKOV, "Elementary Cryptanalysis", 1968, p. 115 et seq.]

The alphabet is the integers mod-26, A=1, Z=26 (well, Z=0, but that's never the way they say it).

The key is a matrix of four integers, mod-26, and the system enciphers two plaintext characters into two ciphertext characters:

CT1 = PT1*A + PT2*B mod 26
CT2 = PT1*C + PT2*D mod 26

The author had encouraged his readers to take up their pencils, charge up their programmable calculators, and "even use a computer if you're lucky enough to have one". I'll freely admit I'd not been able to crack this using any computer I'd have been able to own in 1978.

The author offers two cribs: SUBMARINE and OBSERVING.

I have no access to the boks that the author cited as having breaks for this system. On my own, I could think of two:

1:If you can correctly guess two digraphs, you have four equations in four unknowns, which is solvable. Or would be solvable, if we weren't doing math mod-26. The basic complication is division. if you have 4D = 8, D could be 2, or D could be 15. If you have 4D = 7, D is undefined. There is no integer, mod-26, that can be multiplied times 4 to get 7.

But these simply complicate things. The difficulty is correctly guessing two digraphs. I'd thought that I'd be able to match the digraphs with the highest frequency. What I discovered is that the ciphertext is simply too short for digraph frequencies to mean much.


2:Brute force. The key is an invertable 2x2 matrix of integers mod-26. Four integers mod-26 gives 456,976 combinations. And a lot of them result in matrices that cannot be inverted. There are 157,248 invertable matrices. On my machine, running an admittedly inefficient program (a shell script generating the keys, calling a C++ program once for each possible key) took roughly an hour. The Heathkit H-8 I owned in 1978 had an 8-bit 8080 CPU running at 2MHz, and 4k RAM. It'd not have been much help.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
rot13
Elite member
[ *  *  *  *  * ]
This one is particularly tricky because one of the cribs appears twice, once beginning in an even-numbered column, once in an odd-numbered column. The other crib will yield the right value.

I'll put the rest in white to avoid spoiling it for anyone who wants to try cracking it.


You don't have to go through 26^4 combinations, only 676, which should be easily crackable with 1978 technology. Look again at the two formulas:

CT1 = PT1*A + PT2*B mod 26
CT2 = PT1*C + PT2*D mod 26

Notice that CT1 does not depend on C or D and that CT2 does not depend on A or B. The decryption formula works the same way, so when decrypting, PT1 doesn't depend on the inverses of C and D, nor does PT2 depend on the inverses of A and B.

This means that you really only need to try 26*26 values. Your outer loop is the 26 values for A and C, your inner loop is the 26 values for B and D. For now, just think about the outer loop as X and the inner loop as Y. Inside the loop, you compute:
CT1 = PT1*X + PT2*Y mod 26

Now when X is equal to the correct A and Y is equal to the correct B, your decrypted text contains every other plaintext letter starting at the beginning. If you're looking for a crib word, you have to look at every other letter in the crib word.

When X is equal to the correct value for C and Y is equal to the correct value for D, your decrypted text contains every other plaintext letter starting at the second letter. Again, when looking for cribs, you have to look at every other letter. Since you don't know whether the crib word begins on an even or odd boundary, you have to try both. Unless you have the case where the crib word occurs multiple times on multiple boundaries, there should be only one valid combination.


For a longer text, you can solve this without a crib. You just look for the two combinations that yield frequencies that are closest to English. One of them will yield the odd letters, the other yields the even letters. Just print it out each way and one should be correct. When I try that technique on this text, I get one half correct, but the other is wrong.
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
rot13
Jan 29 2007, 08:54 PM
This one is particularly tricky because one of the cribs appears twice, once beginning in an even-numbered column, once in an odd-numbered column. The other crib will yield the right value.

I'll put the rest in white to avoid spoiling it for anyone who wants to try cracking it.

In WHITE:

I've generally found that I have better success matching di or trigraph frequencies than individual letters, but if you're solving for only every other letter, that'd not work...

Your approach is 26x26x2, instead of 26x26x26x26 - 1352 trials instead of 456976. Just a bit of a difference :)

Did you recognize this as a possible approach from analyzing the problem? Or did you find it in Sinkov?
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
rot13
Elite member
[ *  *  *  *  * ]
I came up with that approach when attacking a Hill cipher last year. The cipher in question used a 3x3 matrix instead of 2x2. I realized that it really wasn't much different from a Vigenere cipher with a key length of 3, just that the keyspace at each position was much larger. I don't have Sinkov's book, unfortunately.


I might quibble as to whether it is 676 or 1352. My particular attack involves only one pass through the 676 xy combinations, and only one decrypt of the text each time. The thing that is done twice is the scan for the halves of the key value.


In general, I use frequency matching when solving periodic ciphers like the Vigenere family, where I am knocking down one piece at a time. While I may occasionally use digraph and trigraph matching, I often resort to a more compute-intensive scoring mechanism that looks for words by the time I get to the point where multi-letter frequencies would start to help.

For example, the program to crack the recent transposition challenge uses the word matching and just brute forces the possible starting position. Not exactly quick 1978 technology, but these days it finds it immediately.
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
rot13
Jan 30 2007, 12:36 AM
In general, I use frequency matching when solving periodic ciphers like the Vigenere family, where I am knocking down one piece at a time. While I may occasionally use digraph and trigraph matching, I often resort to a more compute-intensive scoring mechanism that looks for words by the time I get to the point where multi-letter frequencies would start to help.

For example, the program to crack the recent transposition challenge uses the word matching and just brute forces the possible starting position. Not exactly quick 1978 technology, but these days it finds it immediately.

I've only used word matching on Aristocrats, where the word boundaries are given.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
rot13
Elite member
[ *  *  *  *  * ]
jdege
Jan 30 2007, 01:21 AM
rot13
Jan 30 2007, 12:36 AM
In general, I use frequency matching when solving periodic ciphers like the Vigenere family, where I am knocking down one piece at a time. While I may occasionally use digraph and trigraph matching, I often resort to a more compute-intensive scoring mechanism that looks for words by the time I get to the point where multi-letter frequencies would start to help.

For example, the program to crack the recent transposition challenge uses the word matching and just brute forces the possible starting position. Not exactly quick 1978 technology, but these days it finds it immediately.

I've only used word matching on Aristocrats, where the word boundaries are given.

My word matcher is a little odd. I make use of a dawg (directed acyclic word graph) that is normally used by Scrabble programs for word validation. I proceed down the text looking for sequences that represent a word and in each position where I find a word, I start from that next position and recursively search for words again. I will also skip positions and try searching for words. Overall, it is fairly compute intensive, but I find that in the overall scheme of things, it helps the programs zoom in on a correct solution much faster than using n-graphs. The handy thing about my matcher is that it doesn't need word breaks, in fact it ignores them.
Offline Profile Quote Post Goto Top
 
rot13
Elite member
[ *  *  *  *  * ]
Okay, I know you're gonna say I'm sick, but I wanted to see how this would have run on a vintage computer, so I used an Altair Z80 simulator running CP/M 2 and Microsoft Basic. I set the clock speed to 4 MHz. It took about 10 minutes for the program to run, but it did come up with the correct answer, although I would estimate that it could have taken almost half an hour in the worst case.

Here is the program to solve it. My basic is a bit rusty, but then this basic was more like the good old days anyway - for if/then you either do a goto or execute a single statement. This led to some ugly spaghetti and I could probably tune it up a bit, but it should at least be tolerable. I just downloaded a C compiler for the Altair, I'll try that when I get a chance.

Code:
 

10 CT$="QHDIWQQQEIWFRLIYLUIOWQUVCNQDHVSNTQVYRLEPRVMNDERMOAGTNFQQGWBSTJXCRIWQUHPB
QMEXMTXHWFXJSACOZASPKGSPAOYVNSJQKJXHZUPACAAI"
20 MATCH1$="SBAIE"
30 MATCH2$="UMRN"
40 DIM CT(200), PT(200), MATCH1(5), MATCH2(5)
50 CTLEN = LEN(CT$)
60 REM convert string to int mod 26
70 FOR I=1 TO CTLEN
80 CT(I) = (1+ASC(MID$(CT$,I,1))-65) MOD 26
90 NEXT I
100 REM convert match1$ to int mod 26
110 M1LEN = LEN(MATCH1$)
120 FOR I = 1 TO M1LEN
130 MATCH1(I) = ASC(MID$(MATCH1$,I,1))-65
140 NEXT I
150 REM convert match2$ to int mod 26
160 M2LEN = LEN(MATCH2$)
170 FOR I = 1 TO M2LEN
180 MATCH2(I) = ASC(MID$(MATCH2$,I,1))-65
190 NEXT I
200 FOUND1=0
210 F1POS=-1
220 A=0
230 B=0
240 FOUND2=0
250 F2POS=-1
260 C=0
270 D=0
280 FOR X = 0 TO 25
290 FOR Y = 0 TO 25
300 REM decrypt using x and y
310 PTLEN = 0
320 FOR I = 1 TO CTLEN-1 STEP 2
330 PT(PTLEN)=(26+X*CT(I)+Y*CT(I+1)-1) MOD 26
340 PTLEN = PTLEN + 1
350 NEXT I
360 REM if a&b haven't been found, see if it is present here
370 IF FOUND1 GOTO 550
380 FOR I = 1 TO PTLEN-M1LEN
390 FOUND = 1
400 FOR J = 1 TO M1LEN
410 REM if the pt at this pos matches the match1 at this pos keep going
420 IF PT(I+J-1) = MATCH1(J) GOTO 460
430 REM if we get here, the chars didn't match, go try the next pos in the pt
440 FOUND = 0
450 GOTO 540
460 NEXT J
470 REM at this point, every char in match1 matched the pt, so we have A&B
480 FOUND1=1
490 F1POS=I
500 A=X
510 B=Y
520 REM if we have already found C&D, we are done
530 IF FOUND2=1 GOTO 780
540 NEXT I
550 REM if C&D haven't been found, see if it is present here
560 IF FOUND2 = 1 GOTO 740
570 FOR I = 1 TO PTLEN - M2LEN
580 FOUND = 1
590 FOR J = 1 TO M2LEN
600 REM if the pt at this pos matches the match1 at this pos keep going
610 IF PT(I+J-1) = MATCH2(J) GOTO 650
620 REM if we get here, the chars didn't match, go try the next pos in the pt
630 FOUND = 0
640 GOTO 730
650 NEXT J
660 REM at this point, every char in match2 matched the pt, so we have C&D
670 FOUND2 = 1
680 F2POS=I
690 C=X
700 D=Y
710 REM if we have already found A&B, we are done
720 IF FOUND1 = 1 GOTO 780
730 NEXT I
740 NEXT Y
750 NEXT X
760 PRINT "Unable to locate matches"
770 END
780 REM we are done, we should have found A&B and C&D
790 PTSTR$=""
800 FOR I=1 TO CTLEN-1 STEP 2
810 PT1=(26+A*CT(I)+B*CT(I+1)-1) MOD 26
820 PT2=(26+C*CT(I)+D*CT(I+1)-1) MOD 26
830 REM if f1 and f2 are in the same char pair, print pt1 first, then pt2
840 REM otherwise, pt2 should come first
850 IF F2POS = F1POS GOTO 880
860 PTSTR$ = PTSTR$+CHR$(65+PT2)+CHR$(65+PT1)
870 GOTO 890
880 PTSTR$ = PTSTR$+CHR$(65+PT1)+CHR$(65+PT2)
890 NEXT I
900 PRINT PTSTR$

Offline Profile Quote Post Goto Top
 
Donald
Elite member
[ *  *  *  *  * ]
That is cool. I have an old TRS-80 color computer packed away, I should pull it out some day and see if I can still get it to work. :)
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
rot13
Jan 30 2007, 03:10 AM
330 PT(PTLEN)=(26+X*CT(I)+Y*CT(I+1)-1) MOD 26

I don't understand why the -1 in line 330. Yes, Basic uses 1-based arrays, instead of C's 0-based arrays, but modular arithmetic I'd think would be the same.

In any case, a simple C conversion, running on my 1200MHz Cempron, finished in 35 milliseconds.

Code:
 
#include <stdio.h>
#include <string.h>

int main()
{
char *ct =
 "QHDIWQQQEIWFRLIYLUIOWQUVCNQDHVSNTQVYRLEPRVMNDERMOAGTNFQQGWB"
 "STJXCRIWQUHPBQMEXMTXHWFXJSACOZASPKGSPAOYVNSJQKJXHZUPACAAI";
char *match1 = "SBAIE";
char *match2 = "UMRN";
char pt[200];

int ctLen = strlen(ct);

bool found1 = false;
int a = 0;
int b = 0;

bool found2 = false;
int c = 0;
int d = 0;

for (int x=0; x<26; x++)
{
 for (int y=0; y<26; y++)
 {
  // decrypt using x and y
  int ptLen = 0;
 
  for (int i=0; i<ctLen-2; i+=2)
  {
   int ct1 = (ct[i] - 'A' + 1) % 26;
   int ct2 = (ct[i+1] - 'A' + 1) % 26;
   int pt1 = (x*ct1 + y*ct2) % 26;
   
   if (pt1 == 0)
    pt1 = 26;
   
   pt[ptLen++] = pt1 + 'A' - 1;
  }
  pt[ptLen] = '\0';
 
  // if if a&b haven't been found, see if it is present here
  if (!found1)
  {
   char *p = strstr(pt, match1);
   if (p)
   {
    found1 = true;
    a = x;
    b = y;
   }
  }
  else if (!found2)
  {
   char *p = strstr(pt, match2);
   if (p)
   {
    found2 = true;
    c = x;
    d = y;
   }
  }
 }
}

if (found1 && found2)
{
 int ptLen = 0;
 for (int i=0; i<ctLen-2; i+=2)
 {
  int ct1 = (ct[i] - 'A' + 1) % 26;
  int ct2 = (ct[i+1] - 'A' + 1) % 26;

  int pt1 = (a*ct1 + b*ct2) % 26;
  if (pt1 == 0)
   pt1 = 26;
  pt[ptLen++]  = pt1 + 'A' - 1;

  int pt2 = (c*ct1 + d*ct2) % 26;
  if (pt2 == 0)
   pt2 = 26;
  pt[ptLen++]  = pt2 + 'A' - 1;
 }

 pt[ptLen++]  = '\0';

 printf("%d:%d:%d:%d %s\n", a, b, c, d, pt);
}
else
 printf("No solution\n");

return 0;
}

When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
rot13
Elite member
[ *  *  *  *  * ]
jdege
Jan 30 2007, 01:54 PM
rot13
Jan 30 2007, 03:10 AM
330 PT(PTLEN)=(26+X*CT(I)+Y*CT(I+1)-1) MOD 26

I don't understand why the -1 in line 330. Yes, Basic uses 1-based arrays, instead of C's 0-based arrays, but modular arithmetic I'd think would be the same.

It isn't actually the 1-based arrays that is the problem here, it is that A-Z are numbered 1-26(0) instead of 0-25. When I convert the match texts to 0-25, I just subtract 65. It would have been slightly faster, I guess, to convert the match texts so that A=1 instead of 0, then I wouldn't have to subtract 1 each time when decrypting.

I originally wrote my program in C, but when I wanted to run something that might have been present in a book on cryptanalysis with microcomputers from the late 70's, I tried to go for something from that era. I think I may still try converting to this old-style C compiler (bdsc) just to see how much faster it is than the basic version.

It was kinda fun doing the basic. It brought back a lot of memories. It has been a long time since I used a line-numbering editor like that.
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
rot13
Jan 30 2007, 02:03 PM
jdege
Jan 30 2007, 01:54 PM
rot13
Jan 30 2007, 03:10 AM
330 PT(PTLEN)=(26+X*CT(I)+Y*CT(I+1)-1) MOD 26

I don't understand why the -1 in line 330. Yes, Basic uses 1-based arrays, instead of C's 0-based arrays, but modular arithmetic I'd think would be the same.

It isn't actually the 1-based arrays that is the problem here, it is that A-Z are numbered 1-26(0) instead of 0-25. When I convert the match texts to 0-25, I just subtract 65. It would have been slightly faster, I guess, to convert the match texts so that A=1 instead of 0, then I wouldn't have to subtract 1 each time when decrypting.

I originally wrote my program in C, but when I wanted to run something that might have been present in a book on cryptanalysis with microcomputers from the late 70's, I tried to go for something from that era. I think I may still try converting to this old-style C compiler (bdsc) just to see how much faster it is than the basic version.

It was kinda fun doing the basic. It brought back a lot of memories. It has been a long time since I used a line-numbering editor like that.

Gotcha. I'd figured that since you'd already converted the arrays in advance, you'd not have needed to convert each character. But I didn't look closely enough at what you'd done in that first conversion.

In the C++, I kept the arrays in ASCII, so that I could use strstr() to find the matches. Though to tell you the truth, if I'd been writing this from scratch, I'd have copied ct2 into pt, and done a regex match for "S.B.A.I.E|U.M.R.N".

It's been a long time since I've coded BASIC. I have an old Atari 400 lying around. I wonder if I could get it working?
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
rot13
Elite member
[ *  *  *  *  * ]
jdege
Jan 30 2007, 03:34 PM
It's been a long time since I've coded BASIC.  I have an old Atari 400 lying around.  I wonder if I could get it working?

Emulation is another option. I see there are a number of Atari emulators, you might give that a shot. You can emulate at the original speed but you don't have to take up any additional physical space.

Donald, there are some Coco emulators out there, too, including one in Java.
Offline Profile Quote Post Goto Top
 
1 user reading this topic (1 Guest and 0 Anonymous)
« Previous Topic · Challenges · Next Topic »
Add Reply