Welcome Guest [Log In] [Register]
Welcome to Crypto. We hope you enjoy your visit.


You're currently viewing our forum as a guest. This means you are limited to certain areas of the board and there are some features you can't use. If you join our community, you'll be able to access member-only sections, and use many member-only features such as customizing your profile, sending personal messages, and voting in polls. Registration is simple, fast, and completely free.


Join our community!


If you're already a member please log in to your account to access all of our features:

Username:   Password:
Add Reply
General Strategies For Cryptanalysis
Topic Started: Aug 24 2006, 09:51 PM (558 Views)
rot13
Elite member
[ *  *  *  *  * ]
I know this is pretty common stuff for most of you guys, but for people who are new to cryptanalysis, I thought it might be good to just outline the basic philosophy and some of the techniques used. Mostly this applies to classical ciphers, I don't do much with digital ciphers.

My one-sentence summary of my cryptanalysis strategy would be: Find weaknesses and exploit them. Plain english text has a number of characteristics, and usually some of these are still visible after encryption. The ones that usually come up are:

Word Patterns
If word breaks are preserved, you can make reasonable guesses for single-letter words, or word with apostrophes. In a monoalphabetic cipher, you can look for common patterns (e.g. XYZZAXX) might be SUCCESS.

If you are working with ASCII text, don't forget that the left-most bit is always 0 for 7-bit ASCII.

Letter frequency
Whenever you have situations where a plaintext letter always encrypts to the same ciphertext letter (as in a monoalphabetic substitution, or when looking at one particular offset in a polyalphabetic substitution) you can look at the frequency with which each letter occurs. If you are just dealing with letters, you usually look for the letters ETAONIRSH. Sometimes you learn something by what letters are missing since Q, J, X and Z (the high-point Scrabble letters) are pretty rare. If you are working with a more complete character set, you should remember that space occurs about 20% of the time in English text, and that with the exception of 9, 10 and 13 (tab, linefeed, carriage return) ASCII text values should be in the range 32-127.

In more complex cases, you may look at the frequency of pairs of letters.

When you write a program to exploit letter frequency, one of the most common things to do is to assign weights to letters based on frequency (0.12 for e, 0.097 for t, etc.) and then just sum the weights of each letter. You look for the result with the highest weight. Sometimes it is better to sum the logs of the frequencies instead of the frequencies themselves. In rare cases, you might want to use a correlation coefficient instead of summing letter frequencies. A correlation coefficient is a statistical measure that tells you how similar or dissimilar two sets of numbers are.

Sliding Alphabet
Caesar ciphers and many Vigenere-class ciphers simply slide the A-Z alphabet. This is a huge help, because you can just slide it back until it looks right (which you can usually determine by letter frequency)

Index of Coincidence
The actual formula is described elsewhere, but to put it simply, IoC is a good indicator that you are working with text that repeats about as often as English text (or whatever language you think you're working with). IoC is often used in distinguishing monoalphabetic from polyalphabetic ciphers, or in figuring out the key length of a polyalphabetic cipher. It has also been used to attack the Enigma cipher, and I used it to attack ADFGVX.

Anything Out of the Ordinary
If you consider that an ideal cipher should look just like random data, anything that doesn't seem random may be a clue. For example, if the cipher is all numbers, do certain numbers only occur in certain places? Do the numbers seem to go up as you go along? If you are looking at binary text, do certain bit positions have a tendancy to be 0 more than 1 or vice-versa?


The idea here is that you build yourself a little toolkit for attacking ciphers. You don't always know going in what you're going to need, and the more things you have handy, the better.

When you attack a cipher, assuming you know the algorithm or at least some characteristics of it, you look at it from the standpoint of what characteristics does it NOT hide. Take Vigenere for example, since it has a reputation for being tough. The first big hole is that you can use IoC to determine key length. Once you know the key length, you can look at each key position as a monoalphabetic substition. If the cipher still has a sliding alphabet, you just slide the alphabet at each key position and look for the one whose letter frequencies most closely match English. If it is an offshoot of Vigenere that doesn't have a sliding alphabet, you can probably still explit letter frequencies in various ways.

In the few digital ciphers we have looked at, one of the first things I have gone after is that leftmost 0 bit in ASCII text. Look at the algorithm and see how that 0 passes through it. Can you figure out anything about the key based on that 0? Sometimes it means trying various key values looking for ones that yield certain values in certain places (really, that's what you are doing when you slide the alphabet in a Vigenere, too - you are just trying different pieces of the key).

When you attack a cipher using a program, the thing your program needs most is a way to tell when it might have the right answer. Quite often, the technique of summing letter frequencies works just great, it certainly does for Vigenere. For example, if the text started with "THEREWAS", the summation would be 0.97+0.035+0.12+0.066+0.12+0.014+0.082+0.07 (just using a chart of letter frequencies I googled for). If the text was ZQQX, the summation would just be 0.0006+0.0009+0.0009+0.0022. When attacking Vigenere, of course, you aren't going to be looking at sequences of words, because you attack each key position separately. So you might be adding up the frequencies of TOEENKSWTEARAA and comparing them to the frequencies of UPFFOLTXUFBSBB.

In more complex cases, you might have to look for common letter pairs, triplets, quads, etc. I also have a scoring routine that looks for words. It isn't very fast, but it is very effective.

I'm sure that other folks have good stuff to add to this as well. I know Donald, insecure and Revelation have done a good bit of cracking as well, maybe you guys have different approaches?
Offline Profile Quote Post Goto Top
 
Donald
Elite member
[ *  *  *  *  * ]
Thanks rot13! I've added it to the Decryption Tutorials sticky.
Offline Profile Quote Post Goto Top
 
Revelation
Member Avatar
Administrator
[ *  *  *  *  * ]
Great job! Maybe I'll add something later :)
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN
RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN
RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN
RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN
RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
Offline Profile Quote Post Goto Top
 
1 user reading this topic (1 Guest and 0 Anonymous)
« Previous Topic · General · Next Topic »
Add Reply