|
Another technique for identifying vowels
|
|
Topic Started: Jun 5 2008, 09:58 PM (219 Views)
|
|
jdege
|
Jun 5 2008, 09:58 PM
Post #1
|
- Posts:
- 263
- Group:
- Admin
- Member
- #246
- Joined:
- December 7, 2006
|
It's been quiet for a while, so I thought I'd post code for another technique for identifying vowels.
This showed up in a discussion in the ACA group on MSN. The guy who offered it (CryptoRat) had been using it for decades, and had no idea who had originated it.
I took his idea, wrote some code, and tried it out. It did astoundingly well.
Testing on 1000 runs of ""fortune -l" from bsdgames, CryptoRat's approach (as implemented my somewhat debuggified code), correctly identified all four high-frequency vowels 593 times. It identified three or more of them 880 times, it identified two or more of them 938 times.
I'd call this a very promising technique.
More - of the 287 instances in which this identified three of the four high-frequency vowels, 117 times the fourth letter found was 'U'. So you could say that it found four vowels 710 times. (Next most common fourth letters were 'H' and 'T', 40 and 39 times.)
- Code:
-
//[space]FILE:[space]ccluster.cpp #if[space]0 From[space]CryptoRat,[space]on[space]groups.msn.com/Cryptogram[space](ACA[space]forum)
From:[space]MSN[space]NicknameCryptoRat[space][space]in[space]response[space]to[space]Message[space]1[space][space][space][space]Sent:[space]5/17/2008[space]11:46[space]AM I[space]find[space]the[space]most[space]reliable[space]system[space]is[space]a[space]statistical[space]test[space]best[space]done[space]by[space]computers,[space]but[space]it[space]can[space]be[space]done[space]by[space]hand.[space]You[space]have[space]to[space]take[space]every[space]group[space]of[space]4[space]letters[space]that[space]appear[space]in[space]the[space]con[space](but[space]if[space]you[space]start[space]with[space]the[space]4[space]most[space]frequent[space]and[space]work[space]your[space]way[space]down[space]you[space]can[space]usually[space]identify[space]the[space]best[space]group[space]pretty[space]early[space]on[space]and[space]get[space]at[space]least[space]two[space]or[space]three[space]vowels[space]right[space]very[space]quickly).
For[space]each[space]group[space]count[space]the[space]number[space]of[space]spaces[space]from[space]each[space]of[space]those[space]letters[space]to[space]the[space]next[space]one[space]that[space]appears[space]in[space]the[space]con,[space]then[space]square[space]that[space]number[space]and[space]add[space]it[space]to[space]the[space]running[space]total.[space]The[space]group[space]of[space]4[space]with[space]the[space]lowest[space]total[space]is[space]usually[space]all[space]vowels[space]or[space]contains[space]three[space]vowels[space]and[space]maybe[space]an[space]H.
Example: Now[space]is[space]the[space]time[space]for[space]all[space]good[space]men[space]to[space]come[space]to[space]the[space]aid[space]of[space]their[space]country.
If[space]you[space]are[space]testing[space]the[space]group[space]OAEI: start[space]from[space]the[space]beginning[space]and[space]count[space]the[space]distance[space]between[space]the[space]O[space]in[space]NOW[space]to[space]the[space]I[space]in[space]IS[space](2[space]squared[space]=[space]4)[space]then[space]from[space]that[space]I[space]to[space]the[space]E[space]in[space]THE[space](4[space]*[space]4[space]=[space]16[space]added[space]to[space]4[space]=[space]20),[space]then[space]from[space]that[space]E[space]to[space]the[space]I[space]in[space]TIME,[space]etc.[space]In[space]this[space]example[space]the[space]group[space]AEIO[space]scores[space]lower[space]than[space]any[space]other[space]combination.[space]The[space]test[space]isn't[space]100%[space]accurate,[space]so[space]I[space]keep[space]at[space]least[space]the[space]5[space]best[space](i.e.[space]lowest)[space]scoring[space]groups[space]to[space]explore.[space]In[space]my[space]computer[space]program[space]I[space]modified[space]the[space]test[space]to[space]prevent[space]certain[space]anomalies.[space]For[space]example,[space]if[space]the[space]group[space]produces[space]three[space]in[space]a[space]row,[space]I[space]add[space]a[space]large[space]number,[space]since[space]three[space]vowels[space]in[space]a[space]row[space]are[space]unusual,[space]even[space]though[space]the[space]score[space]would[space]be[space]1*1[space]+[space]1*1[space]=[space]2.
#endif
#if[space]0
---------------------------------- From:[space]CryptoRat Message[space]5[space]in[space]Discussion
I[space]can't[space]take[space]credit[space]for[space]the[space]original[space]idea.[space][space]I[space]read[space]it[space]somewhere, although[space]I[space]can't[space]remember[space]where.[space][space]I[space]recently[space]looked[space]at[space]my[space]code[space]in[space]my interactive[space]solver[space](written[space]in[space]QuickBasic)[space]and[space]saw[space]that[space]I[space]had[space]comments referring[space]to[space]the[space]same[space]routine[space]in[space]Applesoft[space]Basic,[space]so[space]that[space]means[space]I[space]coded it[space]back[space]before[space]I[space]switched[space]from[space]my[space]Apple[space]][e[space]to[space]my[space]DOS[space]machine,[space]whenever that[space]was.[space][space]Early[space]or[space]mid-80s,[space]I[space]think.[space][space]A[space]good[space]guess[space]might[space]be[space]Caxton Foster's[space]Cryptanalysis[space]for[space]Microcomputers,[space]or[space]it[space]could[space]have[space]been[space]in[space]the ACA[space]Computer[space]Supplement.[space][space]I[space]did[space]tweak[space]the[space]code[space]to[space]account[space]for[space]3-vowel clusters[space]and[space]maybe[space]something[space]else[space]minor[space]like[space]that.[space][space]It's[space]a[space]good[space]algorithm.
#endif
#include[space]<stdio.h> #include[space]<stdlib.h> #include[space]<ctype.h> #include[space]<time.h>
#include[space]<string> #include[space]<iostream> #include[space]<fstream>
#include[space]<iterator> #include[space]<vector> #include[space]<map>
using[space]namespace[space]std;
#include[space]"textbuffer.hpp" #include[space]"xforms.hpp"
long[space]consonantClusterCount(const[space]TextBuffer[space]&in);
int[space]main(int[space]argc,[space]char[space]*argv[]) { [space][space][space][space]TextBuffer[space]inText; [space][space][space][space]if[space](argc[space]<=[space]1) [space][space][space][space][space][space][space][space]inText.load(cin); [space][space][space][space]else [space][space][space][space]{ [space][space][space][space][space][space][space][space]for[space](int[space]i[space]=[space]1;[space]i<argc;[space]i++) [space][space][space][space][space][space][space][space][space][space][space][space]inText.load(argv[i]); [space][space][space][space]}
[space][space][space][space]consonantClusterCount(toUpperAlpha(inText));
[space][space][space][space]return[space]EXIT_SUCCESS; }
long[space]consonantClusterCount(const[space]TextBuffer[space]&in) { [space][space][space][space]map<char,bool>[space]have; [space][space][space][space] [space][space][space][space]for[space](char[space]*ch[space]=[space]"ABCDEFGHIJKLMNOPQRSTUVWXYZ";[space]*ch;[space]ch++) [space][space][space][space][space][space][space][space]have[*ch][space]=[space]false;
[space][space][space][space]for[space](vector<char>::const_iterator[space]iter[space]=[space]in.begin();[space]iter[space]!=[space]in.end();[space]++iter) [space][space][space][space][space][space][space][space]have[*iter][space]=[space]true;
[space][space][space][space]char[space]saved[6]; [space][space][space][space]long[space]maxTotal[space]=[space]-1;
[space][space][space][space]for[space](int[space]v1='A';[space]v1<='Z';[space]v1++) [space][space][space][space]{ [space][space][space][space][space][space][space][space]if[space](!have[v1]) [space][space][space][space][space][space][space][space][space][space][space][space]continue; [space][space][space][space][space][space][space][space]for[space](int[space]v2=v1+1;[space]v2<='Z';[space]v2++) [space][space][space][space][space][space][space][space]{ [space][space][space][space][space][space][space][space][space][space][space][space]if[space](!have[v2]) [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]continue; [space][space][space][space][space][space][space][space][space][space][space][space]for[space](int[space]v3=v2+1;[space]v3<='Z';[space]v3++) [space][space][space][space][space][space][space][space][space][space][space][space]{ [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space](!have[v3]) [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]continue; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]for[space](int[space]v4=v3+1;[space]v4<='Z';[space]v4++) [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]{ [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space](!have[v4]) [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]continue; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]int[space]last=0; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]int[space]i=0; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]long[space]total[space]=[space]-1;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]bool[space]isVowel1[space]=[space]false; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]bool[space]isVowel2[space]=[space]false; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]for[space](vector<char>::const_iterator[space]iter[space]=[space]in.begin(); [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]iter[space]!=[space]in.end();[space]++iter) [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]{ [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]i[space]+=[space]1; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space] [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]const[space]char[space]ch[space]=[space]*iter;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space](ch[space]==[space]v1[space]||[space]ch[space]==[space]v2[space]||[space]ch[space]==[space]v3[space]||[space]ch[space]==[space]v4) [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]{ [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space](total[space]==[space]-1) [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]total[space]=[space]0; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]long[space]distance[space]=[space]i[space]-[space]last; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space] [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space](isVowel1[space]&&[space]isVowel2) [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]distance[space]=[space]100; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]total[space]+=[space]distance*distance; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]last[space]=[space]i; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space] [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]isVowel2[space]=[space]isVowel1; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]isVowel1[space]=[space]true; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]} [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]else [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]{ [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]isVowel2[space]=[space]isVowel1; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]isVowel1[space]=[space]false; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]} [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]} [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space] [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space]((maxTotal[space]==[space]-1[space]||[space]total[space]<[space]maxTotal)[space]&&[space]total[space]!=[space]-1) [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]{ [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]saved[0][space]=[space]v1; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]saved[1][space]=[space]v2; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]saved[2][space]=[space]v3; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]saved[3][space]=[space]v4; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]saved[4][space]=[space]'\0'; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space] [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]maxTotal[space]=[space]total; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]cout[space]<<[space]saved[space]<<[space]":[space]"[space]<<[space]maxTotal[space]<<[space]endl; [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]} [space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]} [space][space][space][space][space][space][space][space][space][space][space][space]} [space][space][space][space][space][space][space][space]} [space][space][space][space]} }
- Code:
-
//[space]FILE:[space]xforms.hpp #ifndef[space]XFORMS_HPP_ #define[space]XFORMS_HPP_
#include[space]"textbuffer.hpp"
TextBuffer[space]toUpperAlpha(const[space]TextBuffer[space]&in);
TextBuffer[space]formatGroups(const[space]TextBuffer[space]&in);
#endif[space]//[space]#ifndef[space]XFORMS_HPP_
- Code:
-
//[space]FILE:[space]textbuffer.hpp #ifndef[space]TEXTBUFFER_H__ #define[space]TEXTBUFFER_H__
class[space]TextBuffer[space]:[space]public[space]vector<char> { public:
bool[space]load(const[space]string[space]&filename); bool[space]load(istream[space]&in);
bool[space]dump(const[space]string[space]&filename); bool[space]dump(ostream[space]&out);
};
#endif[space]//[space]#ifndef[space]TEXTBUFFER_H__
- Code:
-
//[space]FILE:[space]xforms.cpp #include[space]<stdio.h> #include[space]<stdlib.h> #include[space]<ctype.h> #include[space]<time.h>
#include[space]<string> #include[space]<iostream> #include[space]<fstream>
#include[space]<iterator> #include[space]<vector>
#include[space]<getopt.h>
using[space]namespace[space]std;
#include[space]"textbuffer.hpp" #include[space]"xforms.hpp"
template[space]<typename[space]InIt,[space]typename[space]OutIt> void[space]copy_groups([space]InIt[space]first,[space]InIt[space]last,[space]OutIt[space]out,[space] unsigned[space]major,[space]unsigned[space]minor[space]) { unsigned[space]i[space]=[space]0; while[space](first[space]!=[space]last) { *out++[space]=[space]*first++; if[space](++i[space]%[space]major[space]==[space]0) *out++[space]=[space]'\n'; else[space]if[space](i[space]%[space]minor[space]==[space]0) *out++[space]=[space]'[space]'; }
if[space]([space]i[space]%[space]major[space]!=[space]0[space]) *out++[space]=[space]'\n'; }
TextBuffer[space]toUpperAlpha(const[space]TextBuffer[space]&in) { TextBuffer[space]out;
transform(in.begin(),[space]in.end(), back_inserter(out),[space]::toupper);
out.erase( remove_if(out.begin(),[space]out.end(), not1(ptr_fun(::isalpha))), out.end());
return[space]out; }
TextBuffer[space]formatGroups(const[space]TextBuffer[space]&in) { TextBuffer[space]out;
copy_groups(in.begin(),[space]in.end(),[space] back_inserter(out),[space]60,[space]5);
return[space]out; }
- Code:
-
//[space]FILE:[space]textbuffer.cpp #include[space]<stdio.h> #include[space]<stdlib.h> #include[space]<ctype.h> #include[space]<time.h>
#include[space]<string> #include[space]<iostream> #include[space]<fstream>
#include[space]<iterator> #include[space]<vector>
#include[space]<getopt.h>
using[space]namespace[space]std;
#include[space]"textbuffer.hpp"
bool[space]TextBuffer::load(const[space]string[space]&filename) { ifstream[space]in(filename.c_str());
bool[space]rc[space]=[space]load(in);
return[space]rc; }
bool[space]TextBuffer::load(istream[space]&in) { in[space]>>[space]noskipws; copy(istream_iterator<char>(in),[space]istream_iterator<char>(), back_inserter(*this));
return[space]true; }
bool[space]TextBuffer::dump(const[space]string[space]&filename) { ofstream[space]out(filename.c_str());
bool[space]rc[space]=[space]dump(out);
return[space]rc; }
bool[space]TextBuffer::dump(ostream[space]&out) { copy(begin(),[space]end(),[space]ostream_iterator<char>(out));
return[space]true; }
(And the source code tag in this new forum software doesn't seem to preserve leading whitespace....)
A typical run on unencrypted text (so we should find AEIO):
- Code:
-
ABCD:[space]5194 ABCE:[space]3128 ABDS:[space]2976 ABEF:[space]2884 ABEU:[space]2558 ABOS:[space]2473 ACOS:[space]2469 AEFO:[space]2251 AEIO:[space]1703
|
|
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
|
| |
|
Revelation
|
Jun 6 2008, 09:23 AM
Post #2
|
- Posts:
- 386
- Group:
- Admin
- Member
- #1
- Joined:
- August 23, 2005
|
Wow, I am actually surprised that this works :O I'd expect the T or other highly frequent consonants to interfere.
A quick thought: could the inverse of this test be used to find a correct permutation in a transposition cipher?
|
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
|
| |
|
jdege
|
Jun 6 2008, 12:14 PM
Post #3
|
- Posts:
- 263
- Group:
- Admin
- Member
- #246
- Joined:
- December 7, 2006
|
- Quote:
-
Wow, I am actually surprised that this works :O I'd expect the T or other highly frequent consonants to interfere.
The high-frequency consonants won't partition the text into short consonant clusters.
- Quote:
-
A quick thought: could the inverse of this test be used to find a correct permutation in a transposition cipher?
If you have the correct permutation, you should have short consonant clusters, so I suppose it would work (if you were working with a plain transposition, where you could identify vowels).
OTOH, you'd also have high-frequency digrams and trigrams, which is what people usually test for. Would counting consonant clusters work? Probably. Would it work better than counting trigrams? I'd have to test it.
|
|
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
|
| |
|
jdege
|
Jun 7 2008, 09:14 PM
Post #4
|
- Posts:
- 263
- Group:
- Admin
- Member
- #246
- Joined:
- December 7, 2006
|
Thinking about this and where else it might be applied.
1. A cipher in which word separations are included, but enciphered. The most common code group is usually the word separator, but it's almost certain to have the lowest sum of the squares of the distance between them.
2. I've seen grid codes in which a pair of letters represented letters, common digrams or trigrams, numbers, or syllables, or a pair of letters represented a word, where two specific pairs of letters meant that the succeeding letters were words, or the succeeding letters were letters. In other words, where there were two "shift"codes, that changed the value of all succeeding code groups. These shifts should be distinguishable by their strict alternation. But if there were several pairs that were strictly alternating, would it be likely that the proper one would have the lowest sum of squares?
|
|
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
|
| |
|
Paarth Dave
|
Aug 15 2008, 02:09 PM
Post #5
|
- Posts:
- 70
- Group:
- Members
- Member
- #2,712
- Joined:
- March 26, 2008
|
Are there any tricks for identifying vowels with concern to paper-and-pencil cryptographers??
|
Cryptography Vanquished....
|
| |
|
jdege
|
Aug 15 2008, 02:19 PM
Post #6
|
- Posts:
- 263
- Group:
- Admin
- Member
- #246
- Joined:
- December 7, 2006
|
- Paarth Dave
- Aug 15 2008, 02:09 PM
Are there any tricks for identifying vowels with concern to paper-and-pencil cryptographers?? That's what my post on Contact Analysis was about.
|
|
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
|
| |
| 1 user reading this topic (1 Guest and 0 Anonymous)
|