Welcome Guest [Log In] [Register]
Welcome to Crypto. We hope you enjoy your visit.


You're currently viewing our forum as a guest. This means you are limited to certain areas of the board and there are some features you can't use. If you join our community, you'll be able to access member-only sections, and use many member-only features such as customizing your profile, sending personal messages, and voting in polls. Registration is simple, fast, and completely free.


Join our community!


If you're already a member please log in to your account to access all of our features:

Username:   Password:
Add Reply
Brute Forcing The Adfgvx Cipher
Topic Started: Apr 19 2008, 08:06 AM (2,746 Views)
Revelation
Member Avatar
Administrator
[ *  *  *  *  * ]
Good news! After doing the kappa test I get only 6 possible keys and the program runs faster because it doesn't have to print so many things.

Note: I use 1.60 and 1.80 as boundaries for valid I.C.s.

Code:
 

// By Ben Ruyl

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define MAXKEYLENGTH 30
#define MAXCRIB 30
#define MAXCIPHER 2000
#define MAXFILENAME 128

char crib[MAXCRIB];
int cribpat[MAXCRIB];
char ct[MAXCIPHER];

int begin = 1; // standard cipher length region, if not specified
int end = 12;

int results = 0; // count the matches

void createpattern(char* v, int* pat)
{
   int i = 0, j = 0, highpat = 0;

   for (i = 0; i < strlen(v); i++)
   {
       int found = 0;
       for (j = 0; j < i; j++)
           if (v[i] == v[j])
           {
               pat[i] = pat[j];
               found = 1;
           }

       if (!found)
       {
           pat[i] = highpat;
           highpat++;
       }
   }
}

void createpatternbigram(char* v, int* pat)
{
   int i = 0, j = 0, highpat = 0;

   assert(strlen(v) % 2 == 0);

   for (i = 0; i < strlen(v); i += 2)
   {
       int found = 0;
       for (j = 0; j < i; j += 2)
           if (v[i] == v[j] && v[i + 1] == v[j + 1])
           {
               pat[i / 2] = pat[j / 2];
               found = 1;
           }

       if (!found)
       {
           pat[i / 2] = highpat;
           highpat++;
       }
   }

}

void makepatofpat(int* v, int len, int* pat)
{
   int i = 0, j = 0, highpat = 0;

   for (i = 0; i < len; i++)
   {
       int found = 0;
       for (j = 0; j < i; j++)
           if (v[i] == v[j])
           {
               pat[i] = pat[j];
               found = 1;
           }

       if (!found)
       {
           pat[i] = highpat;
           highpat++;
       }
   }
}

float dokappatest(const int *pat, const int size)
{
// FIXME: we say that there are 6 * 6 = 36 possibilities max. Sometimes a 10 * 10 square is used
int freq[36] = {0};
int i;
float kappa = 0.0f;

// first calculate the frequencies of the pattern
// WARNING: this function assumes no numbers are entered, so their freq is zero
for (i = 0; i < size; i++)
{
 assert(pat[i] < 36);
 freq[pat[i]]++;
}

for (i = 0; i < 36; i++)
 kappa += (freq[i] * freq[i] - freq[i]);

kappa = kappa / (size * size - size) * 26; // just 26, ignore the numbers

return kappa;
}


void checkperm(const int *p, const int size)
{
   if (p != 0)
   {
       const int len = strlen(ct);
       int i = 0, j = 0, u = 0, v = 0;
       int pos = 0;
       int minrow = len / size;
       int rest = len % size;
       int rowsatperm = minrow;

       char table[MAXCIPHER]; // FIXME: terrible waste of space
       int ctpat[MAXCIPHER];
       int revp[MAXCRIB];

       memset(table, 0, MAXCIPHER);

       for (j = 0; j < size; j++)
           revp[p[j]] = j; // reverse it, index -> perm, perm -> index

       // untranspose the cipher
       for (i = 0; i < size; i++)
       {
           rowsatperm = minrow;

           if (revp[i] < rest)
               rowsatperm++;

           for (j = 0; j < rowsatperm; j++)
           {
               table[revp[i] + j * size] = ct[pos];
               pos++;
           }
       }

       createpatternbigram(table, ctpat); // create the pattern of the bigrams. we will also use this pattern for freq. analysis

       // sweep the crib through the pattern
       for (u = 0; u < strlen(table) / 2 - strlen(crib); u++) // FIXME: remove all those strlens
       {
           int matches = 1;
           for (v = 0; v < strlen(crib); v++)
           {
               // now make a pattern of this part of the pattern of the ciphertext, because 12 13 12 14 = 0 1 0 2!
               int patofpat[MAXCRIB];
               makepatofpat(ctpat + u, strlen(crib), patofpat);

               if (patofpat[v] != cribpat[v])
               {
                   matches = 0;
                   break;
               }

           }

           if (matches) // if it matches, print the permutation
           {
   float ic = dokappatest(ctpat, strlen(table) / 2);

   if (ic > 1.60f && ic < 1.80f) // sometimes you get a kappa that's too high
   {
    results++;
    printf("Possible key found at perm:");

    for (v = 0; v < size; v++)
     printf("%d", p[v]);

    printf(", with I.C. %f\n", ic);
   }
           }
       }

   }
}

// University of Exeter algorithm
// a fast algorithm, we don't care about sorting it like a dictionary
void permute(int *v, const int start, const int n)
{
   if (start == n - 1)
   {
       checkperm(v, n);
   }
   else
   {
       int i = 0;
       for (i = start; i < n; i++)
       {
           int tmp = v[i];

           v[i] = v[start];
           v[start] = tmp;
           permute(v, start + 1, n);
           v[start] = v[i];
           v[i] = tmp;
       }
   }
}


int main(int argc, char *argv[])
{
   int beginloc = -2, endloc = -2, cribloc = -2, fileloc = -2;
   long linenum = 0;
   int theoption, except = 0, number = 0, found =0;

   int i = 0;
   int v[MAXKEYLENGTH];

   // parse the arguments
   for (i = 0; i < argc; i++)
   {
       if (argv[i][0] == '-')
       {
           if (i == beginloc || i == endloc || i == cribloc || i == fileloc)
           {
               printf("No parameter entered");
               return -1;
           }

           switch (argv[i][1])
           {
           case 'b':
               beginloc = i + 1;
               break;

           case 'e':
               endloc = i + 1;
               break;

           case 'c':
               cribloc = i + 1;
               break;

           case 'f':
               fileloc = i + 1;
               break;
           }


       }
       else
       {
           if (i == beginloc)
           {
               int res = atoi(argv[i]);
               if (res >= 2)
               {
                   begin = res;
                   beginloc = -1;

               }
               else
               {
                   printf("Wrong starting value entered");
                   return -1;
               }
           }
           else
               if (i == endloc)
               {
                   int res = atoi(argv[i]);
                   if (res >= 2) // also checks for error, yielding 0
                   {
                       end = res;
                       endloc = -1;

                   }
                   else
                   {
                       printf("Wrong ending value entered");
                       return -1;
                   }
               }
               else
                   if (i == cribloc)
                   {
                       if (strlen(argv[i]) <= MAXCRIB)
                       {
                           strcpy(crib, argv[i]);
                           cribloc = -1;

                       }
                       else
                       {
                           printf("The crib is too long");
                           return -1;
                       }
                   }
                   else
                       if (i == fileloc)
                       {
                           if (strlen(argv[i]) <= MAXFILENAME)
                           {

                               FILE *file = fopen( argv[i], "r" );


                               if ( file == 0 )
                               {
                                   printf( "Could not open file\n" );
                                   return -1;
                               }


                               fgets(ct, MAXCIPHER, file); // read the cipher, FIXME: no support for spaces, no support for multiline

                               if (ct[strlen(ct) - 1] == '\n')
                                   ct[strlen(ct) - 1] = 0;

                               if (ct[strlen(ct) - 1] == '\r') // support for Windows
                                   ct[strlen(ct) - 1] = 0;

                               fclose(file);

                               fileloc = -1;
                           }
                           else
                           {
                               printf("The filename is too long");
                               return -1;
                           }
                       }
                       else // read the ciphertext, leave blank if you've specified a file
                       {
                           if (strlen(argv[i]) <= MAXCIPHER)
                           {
                               strcpy(ct, argv[i]);

                           }
                           else
                           {
                               printf("The ciphertext is too long");
                               return -1;
                           }
                       }
       }

   }

   assert(strlen(ct) % 2 == 0);

   // create a pattern from the crib
   createpattern(crib, cribpat);

   // print some stuff
   printf("Begin: %i\n", begin);
   printf("End: %i\n", end);
   printf("Crib: %s\n", crib);

   printf("Crib pattern: ");
   for (i = 0; i < strlen(crib); i++)
       printf("%d", cribpat[i]);
   printf("\n");


   printf("Cipher: %s\n", ct);

   // create the list that will be permutated. no need to create a new sorted one in the keylength loop, because we don't care about order
   for (i = 0; i < MAXKEYLENGTH; i++)
       v[i] = i;

   for (i = begin; i < end + 1; i++)
   {
       printf("\nStarting with keylength %d.\n", i);
       permute(v, 0, i);
   }


   printf("\nSearch done. %d matches found. Press any key to exit.\n", results);

   getchar();

   return 0;

}
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN
RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN
RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN
RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN
RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
Offline Profile Quote Post Goto Top
 
Revelation
Member Avatar
Administrator
[ *  *  *  *  * ]
Okay, 1.60 and 1.80 is a bit rough. I now have 1.50 and 1.90.

With this ciphertext:
VDVDDXDXVDDDXGGDDFXDVGVGDDFFDVXXXVDADAGDXXDDGDDXDXDAGVXDVAFDDVVGXDDGDADAFVXVAVGXXGDDDGDGVADADDVGDGXDXGDGAXAFDFDGXGAAVDXDXADXVDDFDDFXXDDDDDGGVVXDAXVGVGXGFVXGAVAGXDVFDXGXXGDXXGGDVDVDAFVDXGXGVAGDVDDXVAGDVXDADVGAGDXVGGAXFDDADXVGXFXAGDFAGGVAFADGDXDAGVDDDVFFXGDFDDVDXXXVFVXDDDGGDDVGGDDDDXGVVFXDFDXGVGXXDVGDXADDDXGGVDDDVX

I get only one results, and it's correct! :D
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN
RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN
RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN
RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN
RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
I'm not sure why you're setting a top limit on the IC I've been using as a test text a very short message with a lot of repeated letters - "attack at dawn". As configured, your program does not find the correct solution. When I remove the upper bound, it does, and reports the correct solution as having an IC of 3.5.

OTOH, with the upper limit removed, for larger key sizes it's finding a lot of false positives.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Revelation
Member Avatar
Administrator
[ *  *  *  *  * ]
Quote:
 
OTOH, with the upper limit removed, for larger key sizes it's finding a lot of false positives.


Yes, that's why I initially made an upper limit.

I'm now working on a simple monosub guesser, it will be done very soon.
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN
RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN
RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN
RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN
RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
Offline Profile Quote Post Goto Top
 
Revelation
Member Avatar
Administrator
[ *  *  *  *  * ]
Okay, I've added a simple substitution solver. The results are quite disappointing. Most texts I've tried are far from the original. Maybe I have made a mistake in my algorithm.

Code:
 

// By Ben Ruyl

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define MAXKEYLENGTH 30
#define MAXCRIB 30
#define MAXCIPHER 8000
#define MAXFILENAME 128

char crib[MAXCRIB];
int cribpat[MAXCRIB];
char ct[MAXCIPHER];

const char engfreq[] = "etaoinshrdlucmfgypwbvkxjqz";
int freq[36];

int begin = 1; // standard cipher length region, if not specified
int end = 12;

int results = 0; // count the matches

void createpattern(char* v, int* pat)
{
   int i = 0, j = 0, highpat = 0;

   for (i = 0; i < strlen(v); i++)
   {
       int found = 0;
       for (j = 0; j < i; j++)
           if (v[i] == v[j])
           {
               pat[i] = pat[j];
               found = 1;
           }

       if (!found)
       {
           pat[i] = highpat;
           highpat++;
       }
   }
}

void createpatternbigram(char* v, int* pat)
{
   int i = 0, j = 0, highpat = 0;

   assert(strlen(v) % 2 == 0);

   for (i = 0; i < strlen(v); i += 2)
   {
       int found = 0;
       for (j = 0; j < i; j += 2)
           if (v[i] == v[j] && v[i + 1] == v[j + 1])
           {
               pat[i / 2] = pat[j / 2];
               found = 1;
           }

       if (!found)
       {
           pat[i / 2] = highpat;
           highpat++;
       }
   }

}

void makepatofpat(int* v, int len, int* pat)
{
   int i = 0, j = 0, highpat = 0;

   for (i = 0; i < len; i++)
   {
       int found = 0;
       for (j = 0; j < i; j++)
           if (v[i] == v[j])
           {
               pat[i] = pat[j];
               found = 1;
           }

       if (!found)
       {
           pat[i] = highpat;
           highpat++;
       }
   }
}

void calcfreq(const int *pat, const int size)
{
int i;

memset(freq, 0, 36 * sizeof(int));

// calculate the frequencies of the pattern
// WARNING: this function assumes no numbers are entered, so their freq is zero
for (i = 0; i < size; i++)
{
 assert(pat[i] < 36);
 freq[pat[i]]++;
}
}

float dokappatest(const int size)
{
// FIXME: we say that there are 6 * 6 = 36 possibilities max. Sometimes a 10 * 10 square is used
int i;
float kappa = 0.0f;

for (i = 0; i < 36; i++)
 kappa += (freq[i] * freq[i] - freq[i]);

kappa = kappa / (size * size - size) * 26; // just 26, ignore the numbers

return kappa; // optimisation: we could leave the * 26 out
}

// sort from high to low
int compfreq(const void *elem1, const void *elem2)
{
return freq[*(int*)elem2] - freq[*(int*)elem1];
}

void guessmonosub(const int *p, const int size, char* guess)
{
int psorted[36];
int i, j;

memset(guess, 0, MAXCIPHER);

// copy the indices into a temp array
for (i = 0; i < 36; i++)
 psorted[i] = i;

// sort the indices, by checking the frequencies of the number
// then we can just lay the eng freq on top of the array
qsort(psorted, 36, sizeof(int), compfreq);

// now construct the guess
for (i = 0; i < 26; i++)
 for(j = 0; j < size; j++)
  if (p[j] == psorted[i])
   guess[j] = engfreq[i];

printf("Guess: %s\n", guess);
}


void checkperm(const int *p, const int size)
{
   if (p != 0)
   {
       const int len = strlen(ct);
       int i = 0, j = 0, u = 0, v = 0;
       int pos = 0;
       int minrow = len / size;
       int rest = len % size;
       int rowsatperm = minrow;

       char table[MAXCIPHER]; // FIXME: terrible waste of space
       int ctpat[MAXCIPHER];
       int revp[MAXCRIB];

       memset(table, 0, MAXCIPHER);

       for (j = 0; j < size; j++)
           revp[p[j]] = j; // reverse it, index -> perm, perm -> index

       // untranspose the cipher
       for (i = 0; i < size; i++)
       {
           rowsatperm = minrow;

           if (revp[i] < rest)
               rowsatperm++;

           for (j = 0; j < rowsatperm; j++)
           {
               table[revp[i] + j * size] = ct[pos];
               pos++;
           }
       }

       createpatternbigram(table, ctpat); // create the pattern of the bigrams. we will also use this pattern for freq. analysis

       // sweep the crib through the pattern
       for (u = 0; u < strlen(table) / 2 - strlen(crib); u++) // FIXME: remove all those strlens
       {
           int matches = 1;
           for (v = 0; v < strlen(crib); v++)
           {
               // now make a pattern of this part of the pattern of the ciphertext, because 12 13 12 14 = 0 1 0 2!
               int patofpat[MAXCRIB];
               makepatofpat(ctpat + u, strlen(crib), patofpat);

               if (patofpat[v] != cribpat[v])
               {
                   matches = 0;
                   break;
               }

           }

           if (matches) // if it matches, print the permutation
  {
   char guess[MAXCIPHER]; // wasting memory
   float ic = 0.0f;
   int textlen = strlen(table) / 2;

   calcfreq(ctpat, textlen); // calculate the letter frequency, used for kappa test and mono sub decryption

   ic = dokappatest(textlen);

   if (ic > 1.50f && ic < 1.90f) // sometimes you get a kappa that's too high
   {
    results++;
    printf("Possible key found at perm:");

    for (v = 0; v < size; v++)
     printf("%d", p[v]);

    printf(", with I.C. %f\n", ic);
   
    // now guess the monosub
    guessmonosub(ctpat, textlen, guess);
    //printf("Guess: %s", guess);
   }
 
   break; // only count patterns once, maybe count them in the future
           }
       }

   }
}

// University of Exeter algorithm
// a fast algorithm, we don't care about sorting it like a dictionary
void permute(int *v, const int start, const int n)
{
   if (start == n - 1)
   {
       checkperm(v, n);
   }
   else
   {
       int i = 0;
       for (i = start; i < n; i++)
       {
           int tmp = v[i];

           v[i] = v[start];
           v[start] = tmp;
           permute(v, start + 1, n);
           v[start] = v[i];
           v[i] = tmp;
       }
   }
}

//strip spaces
void stripspaces(char *text)
{
int i = 0, newpos = 0;
int oldlen = strlen(text);

for (i = 0; i < oldlen; i++)
 if (text[i] != ' ')
 {
  text[newpos] = text[i];
  newpos++;
 }

for (i = newpos; i < oldlen; i++)
 text[i] = 0;
}


int main(int argc, char *argv[])
{
   int beginloc = -2, endloc = -2, cribloc = -2, fileloc = -2;
   long linenum = 0;
   int theoption, except = 0, number = 0, found = 0;

   int i = 0, newpos = 0;
   int v[MAXKEYLENGTH];

   // parse the arguments
   for (i = 0; i < argc; i++)
   {
       if (argv[i][0] == '-')
       {
           if (i == beginloc || i == endloc || i == cribloc || i == fileloc)
           {
               printf("No parameter entered");
               return -1;
           }

           switch (argv[i][1])
           {
           case 'b':
               beginloc = i + 1;
               break;

           case 'e':
               endloc = i + 1;
               break;

           case 'c':
               cribloc = i + 1;
               break;

           case 'f':
               fileloc = i + 1;
               break;
           }


       }
       else
       {
           if (i == beginloc)
           {
               int res = atoi(argv[i]);
               if (res >= 2)
               {
                   begin = res;
                   beginloc = -1;

               }
               else
               {
                   printf("Wrong starting value entered");
                   return -1;
               }
           }
           else
               if (i == endloc)
               {
                   int res = atoi(argv[i]);
                   if (res >= 2) // also checks for error, yielding 0
                   {
                       end = res;
                       endloc = -1;

                   }
                   else
                   {
                       printf("Wrong ending value entered");
                       return -1;
                   }
               }
               else
                   if (i == cribloc)
                   {
                       if (strlen(argv[i]) <= MAXCRIB)
                       {
                           strcpy(crib, argv[i]);
                           cribloc = -1;

                       }
                       else
                       {
                           printf("The crib is too long");
                           return -1;
                       }
                   }
                   else
                       if (i == fileloc)
                       {
                           if (strlen(argv[i]) <= MAXFILENAME)
                           {
                               FILE *file = fopen( argv[i], "r" );

                               if ( file == 0 )
                               {
                                   printf( "Could not open file\n" );
                                   return -1;
                               }


                               fgets(ct, MAXCIPHER, file); // read the cipher, FIXME: no support for spaces, no support for multiline

                               if (ct[strlen(ct) - 1] == '\n')
                                   ct[strlen(ct) - 1] = 0;

                               if (ct[strlen(ct) - 1] == '\r') // support for Windows
                                   ct[strlen(ct) - 1] = 0;

                               fclose(file);

                               fileloc = -1;
                           }
                           else
                           {
                               printf("The filename is too long");
                               return -1;
                           }
                       }
                       else // read the ciphertext, leave blank if you've specified a file
                       {
                           if (strlen(argv[i]) <= MAXCIPHER)
                           {
                               strcpy(ct, argv[i]);

                           }
                           else
                           {
                               printf("The ciphertext is too long");
                               return -1;
                           }
                       }
       }

   }

stripspaces(ct);
   assert(strlen(ct) % 2 == 0);

   // create a pattern from the crib
   createpattern(crib, cribpat);

   // print some stuff
   printf("Begin: %i\n", begin);
   printf("End: %i\n", end);
   printf("Crib: %s\n", crib);

   printf("Crib pattern: ");
   for (i = 0; i < strlen(crib); i++)
       printf("%d", cribpat[i]);
   printf("\n");


   printf("Cipher: %s\n", ct);

   // create the list that will be permutated. no need to create a new sorted one in the keylength loop, because we don't care about order
   for (i = 0; i < MAXKEYLENGTH; i++)
       v[i] = i;

   for (i = begin; i < end + 1; i++)
   {
       printf("\nStarting with keylength %d.\n", i);
       permute(v, 0, i);
   }


   printf("\nSearch done. %d matches found. Press any key to exit.\n", results);

   getchar();

   return 0;

}


edit: small update: support for spaces in the file
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN
RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN
RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN
RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN
RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
I'm wondering what else we could use to score proposed transpositions. Digram counts? Trigram counts?
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Donald
Elite member
[ *  *  *  *  * ]
Quote:
 
I'm wondering what else we could use to score proposed transpositions. Digram counts? Trigram counts?

I was GOING to suggest that trigram frequencies were probably too flat to be of any use, but this table seems to imply they might be very interesting:
http://home.ccil.org/~cowan/trigrams
That spike at THE should be VERY detectable in ordinary text. Even better than the digram spike at TH.
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Donald
May 15 2008, 12:18 PM
I was GOING to suggest that trigram frequencies were probably too flat to be of any use, but this table seems to imply they might be very interesting:
http://home.ccil.org/~cowan/trigrams
That spike at THE should be VERY detectable in ordinary text.  Even better than the digram spike at TH.

The problem with "THE" is that it's preponderance is only in ordinary text, and there are many common texts whose use of "THE" is far from ordinary.

Have you looked at the frequency tables in FM 34-40-2? They are derived from telegraph traffic, instead of from novels, and the top digraph and trigraph are "EN" and "ENT". And neither "spikes" - both are only slightly more frequent than the second-place n-gram.

http://www.umich.edu/~umich/fm-34-40-2/appa.pdf
http://www.umich.edu/~umich/fm-34-40-2/appb.pdf
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Revelation
Member Avatar
Administrator
[ *  *  *  *  * ]
Quote:
 
I'm wondering what else we could use to score proposed transpositions.

So the question is: what else is typical for an English monosub except single letter frequencies?

Is there a kappa test for bigrams?
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN
RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN
RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN
RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN
RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
Offline Profile Quote Post Goto Top
 
Donald
Elite member
[ *  *  *  *  * ]
"Jdege"
 
The problem with "THE" is that it's preponderance is only in ordinary text, and there are many common texts whose use of "THE" is far from ordinary

VERY good point. Which, as Revelation says, brings us back to our original problem. What tools ARE useful if you aren't certain of the nature of the plain text?

I'm still wondering if the trigram analysis might be of any use. Even without any obvious spikes, it seems like trigram frequencies should have much more of a slope to them when you have discovered the actual plain text, as opposed to the more random pattern of an incorrect solution.
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
The only thing I can suggest is to try it on much larger texts, where the statistics are more likely to be clear.
Offline Profile Quote Post Goto Top
 
Revelation
Member Avatar
Administrator
[ *  *  *  *  * ]
Fortunately, the cracker doesn't give many false positives, so solving the monosubs can be done by hand.

The thing my code doesn't do yet, is placing the crib in the text. It checks for the pattern of the crib, but not for the crib itself. It may contribute to getting a better result.
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN
RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN
RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN
RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN
RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
Offline Profile Quote Post Goto Top
 
rot13
Elite member
[ *  *  *  *  * ]
I did write some code to crack ADFGVX back when someone had posted on in a challenge. My attack was to figure out the transposition first by brute force - I just tried all the transpositions up to a particular key length. Then I computed the IC of the resulting pairs. I spit out the resulting un-transposed text, which I could then attack with my shotgun-hillclimbing substitution solver. I'm pretty sure that's what I did, at least. I have the "untramp" program somewhere, I wrote it in D at the time. I can dig it up if anyone is interested.

[Update]
Upon further review, what I was able to come up with was the correct number of columns, but there was still some transposition to deal with on top of substitution. It still seems pretty reasonable that you could do the transposition + substitution with shotgun-hillclimbing.
Edited by rot13, May 30 2008, 06:18 PM.
Offline Profile Quote Post Goto Top
 
rot13
Elite member
[ *  *  *  *  * ]
I am a bit rusty at this, I haven't done any cryptogram stuff in over a year, I even let my ACA subscription lapse. I've just had other priorities. The problem with my original solution was that while I was able to guess the key length by IC, I still had to deal with a substitution and transposition. It occurred to me that computing a digraphic ic as a tiebreaker might help, but it didn't in the case of Loki's challenge, since the key length was 8. When I changed it to a trigraphic IC, however, it was able to resolve the last part of the transposition as well, leaving me with a simple substitution.

Let me explain a little more how this works. When you first encode the ciphertext with the polybius square, you still just have a simple substitution cipher. It is the transposition that does the fractionating. If you brute force the transpositions, you can test a transposition by looking at the resulting IC. According to Wikipedia, though, the transposition keys were typically two dozen characters long, which is not practical for my solution.

So when I try each possible transposition, I just compute the IC of each resulting pair. When the transposition is somewhat correct (i.e. when the correct pairs line up, although the pairs may be transformed) the IC should be around the IC for english. There will be several transpositions that yield the best IC, since the pairs themselves may still be transposed. The trigraphic IC helps straighten that out. I have attached the D program I wrote a while back, with modifications to do trigraphic IC and also to spit out the letters both as pairs, and also converted to a monoalphabetic substitution. It's not pretty, but it works. I took Loki's ADFGVX challenge, ran it through this program, and then took the mono result and ran it through my shotgun-hillclimbing program and I had the answer within a few minutes.
Attached to this post:
Attachments: adfgvx_untramp3.d (5.38 KB)
Offline Profile Quote Post Goto Top
 
Revelation
Member Avatar
Administrator
[ *  *  *  *  * ]
Thanks rot13 :)

I'll have a look at it.
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN
RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN
RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN
RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN
RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
Offline Profile Quote Post Goto Top
 
1 user reading this topic (1 Guest and 0 Anonymous)
Go to Next Page
« Previous Topic · General · Next Topic »
Add Reply