|
rot13
|
May 25 2006, 12:30 PM
Post #1
|
- Posts:
- 306
- Group:
- Trusted Member
- Member
- #14
- Joined:
- September 29, 2005
|
At first I was gonna leave this in the post where Loki was asking about the straddling checkerboard, then I decided to make it a separate post.
- loki
- May 25 2006, 10:29 AM
I would really like to see how you break the straddling checkerboards, #6, 7
I haven't tried #7 because I haven't felt up to attacking a double transposition, but #6 is actually quite easy to break. You just need to determine what the two digits are used for 2-digit numbers. After that, you can just convert the text to a regular substitution and solve it.
Here is a D program that uses Index of Coincidence to determine the best two first digits:
- Code:
-
import std.stdio; import std.stream; import std.math;
char[] best_result; double best_ic = 0; int best_depth;
char[] load_numbers(char[] filename) { char[] chars_in = cast(char[])std.file.read(filename); char[] chars_out;
foreach (ch; chars_in) { if ((ch >= '0') && (ch <= '9')) { chars_out ~= (ch-'0'); } } return chars_out; }
double compute_ic(char[] letters) { int counts[26] = 0; int all_counts;
foreach (ch; letters) { counts[ch-'A']++; all_counts++; }
int sum = 0; for (int i=0; i < 26; i++) { sum += counts[i] * (counts[i]-1); }
return (cast(double) sum) / (cast(double) (all_counts*(all_counts-1))); }
int main(char[][] args) { char[] letters = load_numbers(args[1]); double ic, margin, best_ic, best_margin; char[] best_result;
best_ic = 0; best_margin=99999.9;
// Try possible first digits from 0 to 8 for (char dig1=0; dig1 <= 8; dig1++) {
// Try possible second digits from dig1 to 9 NextDig: for (char dig2=dig1+1; dig2 <= 9; dig2++) { char assigned[3][10] = 0;
char next_let = 'A'; char[] result;
for (int i=0; i < letters.length; i++) { char ch = letters[i]; if (ch == dig1) { // If this was the last digit in the text // then dig1 can't be a first digit if (i == letters.length -1) continue NextDig;
// Get the second digit ch = letters[++i];
// If this digit pair hasn't been assigned // a letter yet, assign it if (!assigned[1][ch]) { // If all the letters have been assigned already // this particular pair of first digits can't // be right if (next_let > 'Z') continue NextDig;
assigned[1][ch] = next_let++; } result ~= assigned[1][ch]; } else if (ch == dig2) { // If this was the last digit in the text // then dig2 can't be a first digit if (i == letters.length -1) continue NextDig;
// Get the second digit ch = letters[++i];
// If this digit pair hasn't been assigned // a letter yet, assign it if (!assigned[2][ch]) { // If all the letters have been assigned already // this particular pair of first digits can't // be right if (next_let > 'Z') continue NextDig; assigned[2][ch] = next_let++; } result ~= assigned[2][ch]; } else { // If this single digit hasn't been assigned // a letter yet, assign it if (!assigned[0][ch]) { // If all the letters have been assigned already // this particular pair of first digits can't // be right if (next_let > 'Z') continue NextDig; assigned[0][ch] = next_let++; } result ~= assigned[0][ch]; } }
// Compute the IC for the resulting text ic = compute_ic(result); writefln("%d, %d, ic = %f\n", dig1, dig2, ic);
// Look for the IC that comes closest to the norm for english margin = abs(0.066 - ic); if (margin < best_margin) { best_margin = margin; best_ic = ic; best_result = result; } } } writefln(best_result);
return 0; }
When I ran it, this is the text I got (minus the printouts about the IC for each pair):
- Code:
-
AABCD AEFGH AIFJA FJHAB CDAKG LAAAF AKGLA EMBAB CDAEF NDACD FHAOK AFAEF GHAIF JAPCO ADJHD FQONM JRABO AHDSB NOLMB ARFQD ACMIS DGKAF ACDFQ LASGF TUAAD SVFTM JRABC DAKGL ASFMH AIOVW MJRGL LOXAP COACF QDAPM SCDHA BOAND QDJRD ADQDJ APMBC AHDFB CABCD ATNMV WAOKA FBMJL AMJSD VBASD DAPCF BALOX ACFQD AHOJD ABOAL OXNSD GKABO AFHHA MJSXG BABOM JYXNL AABCD AEFGH AIFJA NDTGM DHAMA VFJAD FSMGL AIFWD ATDFV DAPMB CILSD GKAED VFXSD AMAWJ OPABC DNDAP FSAJO AMJBD JBMOJ ABOAC XNBUA AEXBA LOXFJ AMGGK FQOND HAFJH AVOJB DITBM EGDAM JSDVB APCOA HDGMR CBSAM JASXV WMJRC XIFJA EGOOH AMAPM SCABC FBAMA VOXGH ACFQD AWMGG DHALO XADQD JAMKA MACFH MJVXN NDHAF ACDFQ MDNAT DJFGB LUNDQ DJRDA PMGGA CXNBA BCDAF QDJRD NA
Just for the purpose of posting, I broke the above into 5 character blocks. My program spits it out as one long string, which you'll see in a moment was actually a good thing. When I do a frequency count on it, I get this:
- Code:
-
577 Total Letters A:0.1906 D:0.1040 F:0.0711 B:0.0624 M:0.0624 J:0.0572 C:0.0520 O:0.0485 G:0.0433 H:0.0381 S:0.0312 N:0.0295 L:0.0277 X:0.0260 Q:0.0225 V:0.0208 P:0.0191 K:0.0173 E:0.0156 I:0.0156 R:0.0156 T:0.0121 W:0.0104 U:0.0052 Y:0.0017 Z:0.0000
Now the funky thing here is that A occurs way more frequently than the E does in English. In fact, it occurs with roughly the same frequence as a blank in English. I also notice that the relative frequency of D and F seem like they could represent E and T if A actually is a blank. So I changed all the A's to blanks and ended up with this text:
- Code:
-
BCD EFGH IFJ FJH BCD KGL F KGL EMB BCD EFND CDFH OK F EFGH IFJ PCO DJHDFQONMJR BO HDSBNOLMB RFQD CMISDGK F CDFQL SGFTU DSVFTMJR BCD KGL SFMH IOVWMJRGLLOX PCO CFQD PMSCDH BO NDQDJRD DQDJ PMBC HDFBC BCD TNMVW OK FBMJL MJSDVB SDD PCFB LOX CFQD HOJD BO LOXNSDGK BO FHH MJSXGB BOMJYXNL BCD EFGH IFJ NDTGMDH M VFJ DFSMGL IFWD TDFVD PMBCILSDGK EDVFXSD M WJOP BCDND PFS JO MJBDJBMOJ BO CXNBU EXB LOXFJ MGGKFQONDH FJH VOJBDITBMEGD MJSDVB PCO HDGMRCBS MJ SXVWMJRCXIFJ EGOOH M PMSC BCFB M VOXGH CFQD WMGGDH LOX DQDJ MK M CFHMJVXNNDH F CDFQMDN TDJFGBLUNDQDJRD PMGG CXNB BCD FQDJRDN
The frequency count for this was:
- Code:
-
467 Total Letters D:0.1285 F:0.0878 B:0.0771 M:0.0771 J:0.0707 C:0.0642 O:0.0600 G:0.0535 H:0.0471 S:0.0385 N:0.0364 L:0.0343 X:0.0321 Q:0.0278 V:0.0257 P:0.0236 K:0.0214 E:0.0193 I:0.0193 R:0.0193 T:0.0150 W:0.0128 U:0.0064 Y:0.0021 A:0.0000 Z:0.0000
That looks much more in line with English. Although statistically F looks like it should be T, it seems like BCD might be THE. When I look at the counts of how often various letters follow each other, I see that C comes before D 12 times, yet C is not as high a frequency letter. Also, B, the third most frequent letter, comes before C 12 times. There are no other letters that have anywhere near that much contact with C, so I am pretty sure BCD is THE. The rest of it is pretty easy, so I don't think I need to go into it.
|