Welcome Guest [Log In] [Register]
Welcome to Crypto. We hope you enjoy your visit.


You're currently viewing our forum as a guest. This means you are limited to certain areas of the board and there are some features you can't use. If you join our community, you'll be able to access member-only sections, and use many member-only features such as customizing your profile, sending personal messages, and voting in polls. Registration is simple, fast, and completely free.


Join our community!


If you're already a member please log in to your account to access all of our features:

Username:   Password:
Add Reply
Another technique for identifying vowels
Topic Started: Jun 5 2008, 09:58 PM (218 Views)
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
It's been quiet for a while, so I thought I'd post code for another technique for identifying vowels.

This showed up in a discussion in the ACA group on MSN. The guy who offered it (CryptoRat) had been using it for decades, and had no idea who had originated it.

I took his idea, wrote some code, and tried it out. It did astoundingly well.

Testing on 1000 runs of ""fortune -l" from bsdgames, CryptoRat's approach (as implemented my somewhat debuggified code), correctly identified all four high-frequency vowels 593 times. It identified three or more of them 880 times, it identified two or more of them 938 times.

I'd call this a very promising technique.

More - of the 287 instances in which this identified three of the four high-frequency vowels, 117 times the fourth letter found was 'U'. So you could say that it found four vowels 710 times. (Next most common fourth letters were 'H' and 'T', 40 and 39 times.)

Code:
 

//[space]FILE:[space]ccluster.cpp
#if[space]0
From[space]CryptoRat,[space]on[space]groups.msn.com/Cryptogram[space](ACA[space]forum)


From:[space]MSN[space]NicknameCryptoRat[space][space]in[space]response[space]to[space]Message[space]1[space][space][space][space]Sent:[space]5/17/2008[space]11:46[space]AM
I[space]find[space]the[space]most[space]reliable[space]system[space]is[space]a[space]statistical[space]test[space]best[space]done[space]by[space]computers,[space]but[space]it[space]can[space]be[space]done[space]by[space]hand.[space]You[space]have[space]to[space]take[space]every[space]group[space]of[space]4[space]letters[space]that[space]appear[space]in[space]the[space]con[space](but[space]if[space]you[space]start[space]with[space]the[space]4[space]most[space]frequent[space]and[space]work[space]your[space]way[space]down[space]you[space]can[space]usually[space]identify[space]the[space]best[space]group[space]pretty[space]early[space]on[space]and[space]get[space]at[space]least[space]two[space]or[space]three[space]vowels[space]right[space]very[space]quickly).

For[space]each[space]group[space]count[space]the[space]number[space]of[space]spaces[space]from[space]each[space]of[space]those[space]letters[space]to[space]the[space]next[space]one[space]that[space]appears[space]in[space]the[space]con,[space]then[space]square[space]that[space]number[space]and[space]add[space]it[space]to[space]the[space]running[space]total.[space]The[space]group[space]of[space]4[space]with[space]the[space]lowest[space]total[space]is[space]usually[space]all[space]vowels[space]or[space]contains[space]three[space]vowels[space]and[space]maybe[space]an[space]H.

Example:
Now[space]is[space]the[space]time[space]for[space]all[space]good[space]men[space]to[space]come[space]to[space]the[space]aid[space]of[space]their[space]country.

If[space]you[space]are[space]testing[space]the[space]group[space]OAEI:
start[space]from[space]the[space]beginning[space]and[space]count[space]the[space]distance[space]between[space]the[space]O[space]in[space]NOW[space]to[space]the[space]I[space]in[space]IS[space](2[space]squared[space]=[space]4)[space]then[space]from[space]that[space]I[space]to[space]the[space]E[space]in[space]THE[space](4[space]*[space]4[space]=[space]16[space]added[space]to[space]4[space]=[space]20),[space]then[space]from[space]that[space]E[space]to[space]the[space]I[space]in[space]TIME,[space]etc.[space]In[space]this[space]example[space]the[space]group[space]AEIO[space]scores[space]lower[space]than[space]any[space]other[space]combination.[space]The[space]test[space]isn't[space]100%[space]accurate,[space]so[space]I[space]keep[space]at[space]least[space]the[space]5[space]best[space](i.e.[space]lowest)[space]scoring[space]groups[space]to[space]explore.[space]In[space]my[space]computer[space]program[space]I[space]modified[space]the[space]test[space]to[space]prevent[space]certain[space]anomalies.[space]For[space]example,[space]if[space]the[space]group[space]produces[space]three[space]in[space]a[space]row,[space]I[space]add[space]a[space]large[space]number,[space]since[space]three[space]vowels[space]in[space]a[space]row[space]are[space]unusual,[space]even[space]though[space]the[space]score[space]would[space]be[space]1*1[space]+[space]1*1[space]=[space]2.

#endif


#if[space]0

----------------------------------
From:[space]CryptoRat
Message[space]5[space]in[space]Discussion

I[space]can't[space]take[space]credit[space]for[space]the[space]original[space]idea.[space][space]I[space]read[space]it[space]somewhere,
although[space]I[space]can't[space]remember[space]where.[space][space]I[space]recently[space]looked[space]at[space]my[space]code[space]in[space]my
interactive[space]solver[space](written[space]in[space]QuickBasic)[space]and[space]saw[space]that[space]I[space]had[space]comments
referring[space]to[space]the[space]same[space]routine[space]in[space]Applesoft[space]Basic,[space]so[space]that[space]means[space]I[space]coded
it[space]back[space]before[space]I[space]switched[space]from[space]my[space]Apple[space]][e[space]to[space]my[space]DOS[space]machine,[space]whenever
that[space]was.[space][space]Early[space]or[space]mid-80s,[space]I[space]think.[space][space]A[space]good[space]guess[space]might[space]be[space]Caxton
Foster's[space]Cryptanalysis[space]for[space]Microcomputers,[space]or[space]it[space]could[space]have[space]been[space]in[space]the
ACA[space]Computer[space]Supplement.[space][space]I[space]did[space]tweak[space]the[space]code[space]to[space]account[space]for[space]3-vowel
clusters[space]and[space]maybe[space]something[space]else[space]minor[space]like[space]that.[space][space]It's[space]a[space]good[space]algorithm.

#endif


#include[space]<stdio.h>
#include[space]<stdlib.h>
#include[space]<ctype.h>
#include[space]<time.h>

#include[space]<string>
#include[space]<iostream>
#include[space]<fstream>

#include[space]<iterator>
#include[space]<vector>
#include[space]<map>

using[space]namespace[space]std;

#include[space]"textbuffer.hpp"
#include[space]"xforms.hpp"

long[space]consonantClusterCount(const[space]TextBuffer[space]&in);

int[space]main(int[space]argc,[space]char[space]*argv[])
{
[space][space][space][space]TextBuffer[space]inText;
[space][space][space][space]if[space](argc[space]<=[space]1)
[space][space][space][space][space][space][space][space]inText.load(cin);
[space][space][space][space]else
[space][space][space][space]{
[space][space][space][space][space][space][space][space]for[space](int[space]i[space]=[space]1;[space]i<argc;[space]i++)
[space][space][space][space][space][space][space][space][space][space][space][space]inText.load(argv[i]);
[space][space][space][space]}

[space][space][space][space]consonantClusterCount(toUpperAlpha(inText));

[space][space][space][space]return[space]EXIT_SUCCESS;
}


long[space]consonantClusterCount(const[space]TextBuffer[space]&in)
{
[space][space][space][space]map<char,bool>[space]have;
[space][space][space][space]
[space][space][space][space]for[space](char[space]*ch[space]=[space]"ABCDEFGHIJKLMNOPQRSTUVWXYZ";[space]*ch;[space]ch++)
[space][space][space][space][space][space][space][space]have[*ch][space]=[space]false;

[space][space][space][space]for[space](vector<char>::const_iterator[space]iter[space]=[space]in.begin();[space]iter[space]!=[space]in.end();[space]++iter)
[space][space][space][space][space][space][space][space]have[*iter][space]=[space]true;

[space][space][space][space]char[space]saved[6];
[space][space][space][space]long[space]maxTotal[space]=[space]-1;

[space][space][space][space]for[space](int[space]v1='A';[space]v1<='Z';[space]v1++)
[space][space][space][space]{
[space][space][space][space][space][space][space][space]if[space](!have[v1])
[space][space][space][space][space][space][space][space][space][space][space][space]continue;
[space][space][space][space][space][space][space][space]for[space](int[space]v2=v1+1;[space]v2<='Z';[space]v2++)
[space][space][space][space][space][space][space][space]{
[space][space][space][space][space][space][space][space][space][space][space][space]if[space](!have[v2])
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]continue;
[space][space][space][space][space][space][space][space][space][space][space][space]for[space](int[space]v3=v2+1;[space]v3<='Z';[space]v3++)
[space][space][space][space][space][space][space][space][space][space][space][space]{
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space](!have[v3])
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]continue;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]for[space](int[space]v4=v3+1;[space]v4<='Z';[space]v4++)
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]{
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space](!have[v4])
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]continue;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]int[space]last=0;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]int[space]i=0;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]long[space]total[space]=[space]-1;

[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]bool[space]isVowel1[space]=[space]false;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]bool[space]isVowel2[space]=[space]false;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]for[space](vector<char>::const_iterator[space]iter[space]=[space]in.begin();
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]iter[space]!=[space]in.end();[space]++iter)
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]{
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]i[space]+=[space]1;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]const[space]char[space]ch[space]=[space]*iter;

[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space](ch[space]==[space]v1[space]||[space]ch[space]==[space]v2[space]||[space]ch[space]==[space]v3[space]||[space]ch[space]==[space]v4)
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]{
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space](total[space]==[space]-1)
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]total[space]=[space]0;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]long[space]distance[space]=[space]i[space]-[space]last;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space](isVowel1[space]&&[space]isVowel2)
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]distance[space]=[space]100;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]total[space]+=[space]distance*distance;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]last[space]=[space]i;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]isVowel2[space]=[space]isVowel1;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]isVowel1[space]=[space]true;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]}
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]else
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]{
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]isVowel2[space]=[space]isVowel1;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]isVowel1[space]=[space]false;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]}
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]}
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]if[space]((maxTotal[space]==[space]-1[space]||[space]total[space]<[space]maxTotal)[space]&&[space]total[space]!=[space]-1)
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]{
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]saved[0][space]=[space]v1;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]saved[1][space]=[space]v2;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]saved[2][space]=[space]v3;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]saved[3][space]=[space]v4;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]saved[4][space]=[space]'\0';
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]maxTotal[space]=[space]total;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]cout[space]<<[space]saved[space]<<[space]":[space]"[space]<<[space]maxTotal[space]<<[space]endl;
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]}
[space][space][space][space][space][space][space][space][space][space][space][space][space][space][space][space]}
[space][space][space][space][space][space][space][space][space][space][space][space]}
[space][space][space][space][space][space][space][space]}
[space][space][space][space]}
}


Code:
 

//[space]FILE:[space]xforms.hpp
#ifndef[space]XFORMS_HPP_
#define[space]XFORMS_HPP_

#include[space]"textbuffer.hpp"

TextBuffer[space]toUpperAlpha(const[space]TextBuffer[space]&in);

TextBuffer[space]formatGroups(const[space]TextBuffer[space]&in);

#endif[space]//[space]#ifndef[space]XFORMS_HPP_


Code:
 

//[space]FILE:[space]textbuffer.hpp
#ifndef[space]TEXTBUFFER_H__
#define[space]TEXTBUFFER_H__

class[space]TextBuffer[space]:[space]public[space]vector<char>
{
public:

bool[space]load(const[space]string[space]&filename);
bool[space]load(istream[space]&in);

bool[space]dump(const[space]string[space]&filename);
bool[space]dump(ostream[space]&out);

};


#endif[space]//[space]#ifndef[space]TEXTBUFFER_H__


Code:
 

//[space]FILE:[space]xforms.cpp
#include[space]<stdio.h>
#include[space]<stdlib.h>
#include[space]<ctype.h>
#include[space]<time.h>

#include[space]<string>
#include[space]<iostream>
#include[space]<fstream>

#include[space]<iterator>
#include[space]<vector>

#include[space]<getopt.h>

using[space]namespace[space]std;

#include[space]"textbuffer.hpp"
#include[space]"xforms.hpp"

template[space]<typename[space]InIt,[space]typename[space]OutIt>
void[space]copy_groups([space]InIt[space]first,[space]InIt[space]last,[space]OutIt[space]out,[space]
unsigned[space]major,[space]unsigned[space]minor[space])
{
unsigned[space]i[space]=[space]0;
while[space](first[space]!=[space]last)
{
*out++[space]=[space]*first++;
if[space](++i[space]%[space]major[space]==[space]0)
*out++[space]=[space]'\n';
else[space]if[space](i[space]%[space]minor[space]==[space]0)
*out++[space]=[space]'[space]';
}

if[space]([space]i[space]%[space]major[space]!=[space]0[space])
*out++[space]=[space]'\n';
}

TextBuffer[space]toUpperAlpha(const[space]TextBuffer[space]&in)
{
TextBuffer[space]out;

transform(in.begin(),[space]in.end(),
back_inserter(out),[space]::toupper);

out.erase(
remove_if(out.begin(),[space]out.end(),
not1(ptr_fun(::isalpha))),
out.end());

return[space]out;
}

TextBuffer[space]formatGroups(const[space]TextBuffer[space]&in)
{
TextBuffer[space]out;

copy_groups(in.begin(),[space]in.end(),[space]
back_inserter(out),[space]60,[space]5);

return[space]out;
}


Code:
 

//[space]FILE:[space]textbuffer.cpp
#include[space]<stdio.h>
#include[space]<stdlib.h>
#include[space]<ctype.h>
#include[space]<time.h>

#include[space]<string>
#include[space]<iostream>
#include[space]<fstream>

#include[space]<iterator>
#include[space]<vector>

#include[space]<getopt.h>

using[space]namespace[space]std;

#include[space]"textbuffer.hpp"


bool[space]TextBuffer::load(const[space]string[space]&filename)
{
ifstream[space]in(filename.c_str());

bool[space]rc[space]=[space]load(in);

return[space]rc;
}

bool[space]TextBuffer::load(istream[space]&in)
{
in[space]>>[space]noskipws;
copy(istream_iterator<char>(in),[space]istream_iterator<char>(),
back_inserter(*this));

return[space]true;
}

bool[space]TextBuffer::dump(const[space]string[space]&filename)
{
ofstream[space]out(filename.c_str());

bool[space]rc[space]=[space]dump(out);

return[space]rc;
}

bool[space]TextBuffer::dump(ostream[space]&out)
{
copy(begin(),[space]end(),[space]ostream_iterator<char>(out));

return[space]true;
}


(And the source code tag in this new forum software doesn't seem to preserve leading whitespace....)

A typical run on unencrypted text (so we should find AEIO):

Code:
 

ABCD:[space]5194
ABCE:[space]3128
ABDS:[space]2976
ABEF:[space]2884
ABEU:[space]2558
ABOS:[space]2473
ACOS:[space]2469
AEFO:[space]2251
AEIO:[space]1703
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Revelation
Member Avatar
Administrator
[ *  *  *  *  * ]
Wow, I am actually surprised that this works :O I'd expect the T or other highly frequent consonants to interfere.

A quick thought: could the inverse of this test be used to find a correct permutation in a transposition cipher?
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN
RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN
RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN
RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN
RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Quote:
 
Wow, I am actually surprised that this works :O I'd expect the T or other highly frequent consonants to interfere.

The high-frequency consonants won't partition the text into short consonant clusters.

Quote:
 
A quick thought: could the inverse of this test be used to find a correct permutation in a transposition cipher?

If you have the correct permutation, you should have short consonant clusters, so I suppose it would work (if you were working with a plain transposition, where you could identify vowels).

OTOH, you'd also have high-frequency digrams and trigrams, which is what people usually test for. Would counting consonant clusters work? Probably. Would it work better than counting trigrams? I'd have to test it.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Thinking about this and where else it might be applied.

1. A cipher in which word separations are included, but enciphered. The most common code group is usually the word separator, but it's almost certain to have the lowest sum of the squares of the distance between them.

2. I've seen grid codes in which a pair of letters represented letters, common digrams or trigrams, numbers, or syllables, or a pair of letters represented a word, where two specific pairs of letters meant that the succeeding letters were words, or the succeeding letters were letters. In other words, where there were two "shift"codes, that changed the value of all succeeding code groups. These shifts should be distinguishable by their strict alternation. But if there were several pairs that were strictly alternating, would it be likely that the proper one would have the lowest sum of squares?
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Paarth Dave
Advanced Member
[ *  *  * ]
Are there any tricks for identifying vowels with concern to paper-and-pencil cryptographers??

Cryptography Vanquished....
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Paarth Dave
Aug 15 2008, 02:09 PM
Are there any tricks for identifying vowels with concern to paper-and-pencil cryptographers??
That's what my post on Contact Analysis was about.

When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
1 user reading this topic (1 Guest and 0 Anonymous)
« Previous Topic · General · Next Topic »
Add Reply