| Viewing Single Post From: Contact Analysis | |
|---|---|
| jdege | May 19 2008, 01:35 PM |
|
Elite member
|
I'd noticed that the Wikipedia page on cryptanalysis was very weak on attacks on classical ciphers. It mentioned, IIRC, frequency analysis, Kasiski, and IC, but nothing on pattern words, anagramming, or contact tables. I tried to write something, patterned on the existing page on frequency analysis, and it quickly became too long. That's what was on the scratch page you Googled. I took the first part and posted it as contact analysis, figuring to put a second page up on cluster analysis later.

That's when I hit Wikipedia's citation rules. Writing up something like this is a lot more fun than tracking down citations and trying to figure out which book I had learned these various ideas from, so I pretty much gave up on it. With the recent surge in activity here, I decided to grab what I had intended for Wikipedia and rewrite it for here.

When I said I was trying to figure out how to deal with Pats (simple substitution without word separation), this is what I meant. If a Pat is long enough for the statistics to be clear, it's pretty easy to deal with. If you have a good crib, as you do for the first half-dozen Pats in every issue of The Cryptogram, it's pretty easy to deal with. But short Pats can be a problem. For them, these techniques sometimes work.

The basic approach, trying to distinguish vowels from consonants, is pretty much the essential first step. The consonant line is one method for trying to do so. I've seen others, and I often try them. I've not found any that always works.

Another form of contact chart: What you look for here are high-frequency letters that make frequent contact with low-frequency letters. These are usually vowels.

Another I've seen was to look at the 8 highest frequency letters and the 18 highest frequency digrams. In this case: What you're looking for is which of the high-frequency letters appears in the greatest number of distinct digrams. This is usually e.
Of the remaining 7 high-frequency letters, the three that are found least often in contact with e are most often a, i, and o. But again, these don't always work.

What works best for me is to lay out all of the info (frequency counts, digrams, trigrams, repeated letters, ABA patterns, contact tables, consonant and vowel lines, etc.) and to try to make guesses. Keeping my mind focused on what I'm looking for next is essential. I can read over something like "_anne_e__e_e__e__hatthe" dozens of times without picking up that it means "canneverrememberwhatthe". It was only when I was looking at it with an eye to seeing where an r might fit that I recognized the solution.

Ditto for keeping track of vowel separation. If you have your vowels right, your consonant clusters will be small, with few long groups of consonants. When you have some, but not all, of your vowels, you can be assured that the remaining vowels will be among the set of letters that appear in all of your over-long consonant clusters.
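
Two of the checks above are mechanical enough to script: the 8-letter / 18-digram heuristic and the consonant-cluster check. What follows is only a rough Python sketch. The function names, the `min_len` cutoff for a "long" cluster, and the choice to count contact with the e candidate over all digrams (not just the top 18) are assumptions made for illustration, not something taken from a reference.

```python
from collections import Counter


def letters_only(ciphertext):
    """Upper-case the text and drop everything that is not a letter."""
    return "".join(c for c in ciphertext.upper() if c.isalpha())


def digram_vowel_guess(ciphertext, n_letters=8, n_digrams=18):
    """Return (e_guess, aio_guess) using the high-frequency digram heuristic."""
    text = letters_only(ciphertext)
    top_letters = [l for l, _ in Counter(text).most_common(n_letters)]
    digrams = Counter(a + b for a, b in zip(text, text[1:]))
    top_digrams = [d for d, _ in digrams.most_common(n_digrams)]
    # In how many distinct top digrams does each high-frequency letter appear?
    spread = {l: sum(l in d for d in top_digrams) for l in top_letters}
    e_guess = max(top_letters, key=spread.get)
    # Assumption: "contact with e" is counted over all digrams, either order.
    contact = {l: digrams[l + e_guess] + digrams[e_guess + l]
               for l in top_letters if l != e_guess}
    aio_guess = sorted(contact, key=contact.get)[:3]
    return e_guess, aio_guess


def consonant_runs(ciphertext, vowels):
    """Split the text into runs of letters not currently assumed to be vowels."""
    vowel_set = set(vowels.upper())
    runs, current = [], ""
    for c in letters_only(ciphertext):
        if c in vowel_set:
            if current:
                runs.append(current)
            current = ""
        else:
            current += c
    if current:
        runs.append(current)
    return runs


def missing_vowel_candidates(ciphertext, vowels, min_len=4):
    """Letters common to every run of at least min_len assumed consonants."""
    long_runs = [set(r) for r in consonant_runs(ciphertext, vowels)
                 if len(r) >= min_len]
    if not long_runs:
        return set()  # no suspiciously long clusters: the vowel set may be complete
    return set.intersection(*long_runs)
```

As a quick sanity check on plaintext, `consonant_runs("canneverrememberwhat", "e")` leaves the long runs CANN and RWHAT, and the only letter they share is A, the vowel still missing. If there are several missing vowels, the intersection can come back empty, so treat it as a hint and look at the individual runs as well.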
The idea of using contact behavior to partition the ciphertext into distinct sets isn't limited to simple substitution. Not long ago I was in a conversation in an SF forum with someone over how strong his cipher system was. What he'd proposed was, in effect, a nomenclator: a small list of code words plus a cipher for spelling out words that weren't in the list. He was adamant that no one would be able to tell which of his code groups represented words and which represented letters.

That is, of course, nonsense. The groups representing letters would be used to spell out words, which means they would appear in clusters. Ditto for the groups representing words. Examining the frequency with which each code group appeared in contact with each other group should quickly allow you to partition the code groups into the letters and the words. The letters can then be attacked as simple substitution. The words can then be guessed from context. |
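
The partitioning idea for a nomenclator can be sketched in Python as well. This is one possible approach, not a description of how anyone actually did it. It assumes the message is already split into a list of code groups, that you can guess a couple of groups that stand for letters (`seed_letters`), and that a group belongs with the letters when at least half of its contacts are with groups already labelled as letters. The function name, the seeding, and the half-or-more rule are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def partition_groups(message, seed_letters, rounds=10):
    """Split code groups into (letters, words) by who they sit next to.

    message      -- list of code groups in the order they appear
    seed_letters -- groups assumed (hypothetically) to stand for letters
    """
    # Contact table: how often each group appears next to each other group.
    contacts = defaultdict(Counter)
    for a, b in zip(message, message[1:]):
        contacts[a][b] += 1
        contacts[b][a] += 1

    letters = set(seed_letters)
    for _ in range(rounds):
        updated = set(seed_letters)  # seeds are always kept as letters
        for group, neighbours in contacts.items():
            total = sum(neighbours.values())
            with_letters = sum(n for g, n in neighbours.items() if g in letters)
            # Assumed rule: at least half of the contacts are with letter groups.
            if 2 * with_letters >= total:
                updated.add(group)
        if updated == letters:
            break  # labels have stopped changing
        letters = updated

    words = set(contacts) - letters
    return letters, words


# Toy message: W1..W4 are code words, a..d are groups that spell out words.
message = ["W1", "W2", "W3", "a", "b", "c", "d", "W4",
           "W1", "W2", "c", "a", "b", "d", "W3", "W4",
           "W1", "b", "d", "a", "c", "W2", "W3", "W4"]
letters, words = partition_groups(message, seed_letters={"a", "b"})
```

On this toy message the letter set grows from the two seeds to a, b, c, d and the four W groups are left as words. Real traffic would be noisier, and the threshold and seeds would need adjusting by eye.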
| When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl. | |