Viewing Single Post From: Cracking A Vig With Ic
jdege
Paarth Dave
May 10 2008, 12:06 AM
Can you please explain the chi test to me in simple terms? I have read your topics on it, but they went over my head.

You use the chi test to measure how similar two frequency distributions are.

When you graph a frequency distribution, you get a curve with peaks and valleys. If two messages are encoded with the same alphabet, they will have similar curves. If they are encoded with alphabets that are shifted relative to one another, their curves won't match up: the peaks and valleys of one will be shifted relative to the other.
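To make "shifted relative to one another" concrete, here is a minimal Python sketch. The helper names `letter_counts` and `caesar_shift` are illustrative, not from this thread, and the sketch assumes only the letters A-Z matter. It counts the letters in a text, then shifts every letter by the same amount; the new counts are the old counts rotated by that amount, which is the peaks-and-valleys shift described above.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def letter_counts(text):
    """Return a list of 26 counts, index 0 = 'A'; non-letters are ignored."""
    counts = Counter(ch for ch in text.upper() if ch in ALPHABET)
    return [counts[letter] for letter in ALPHABET]


def caesar_shift(text, shift):
    """Shift every letter forward by `shift` places, wrapping Z around to A."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26]
        for ch in text.upper()
        if ch in ALPHABET
    )


plain = "ATTACK AT DAWN"
shifted = caesar_shift(plain, 3)
original = letter_counts(plain)
moved = letter_counts(shifted)
# Every peak has moved three places to the right, wrapping around at Z.
print(moved == original[-3:] + original[:-3])  # True
```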

The Vig uses the standard alphabet, A-Z, shifted by an amount equal to the numeric value of the key letter, counting A as 0. If the key letter is 'B', the ciphertext is the plaintext shifted by one character; if it's 'D', it's shifted by three.
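A short Python sketch of that rule, assuming an uppercase A-Z alphabet with A=0, B=1, ..., Z=25, and a keyword that is repeated along the message. `vig_encrypt` is a hypothetical name, and the sketch simply drops anything that isn't a letter.

```python
import string

ALPHABET = string.ascii_uppercase


def vig_encrypt(plaintext, key):
    """Encrypt A-Z letters; each key letter's position (A=0) is its shift."""
    letters = [ch for ch in plaintext.upper() if ch in ALPHABET]
    shifts = [ALPHABET.index(k) for k in key.upper()]
    out = []
    for i, ch in enumerate(letters):
        shift = shifts[i % len(shifts)]  # repeat the keyword along the message
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
    return "".join(out)


print(vig_encrypt("HELLO", "B"))   # IFMMP: every letter shifted by one
print(vig_encrypt("HELLO", "D"))   # KHOOR: every letter shifted by three
print(vig_encrypt("HELLO", "BD"))  # IHMOP: alternating shifts of one and three
```

With a multi-letter key, each key letter defines its own shifted alphabet, which is why the number of alphabets equals the keyword length.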

So, after we've figured out how many alphabets there are - how many letters there are in the keyword - the next step is to figure out by how much each of the alphabets has been shifted.
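Once the keyword length is known, each alphabet is just every n-th letter of the ciphertext, starting from a different offset, and every letter in a given column was shifted by the same key letter. A minimal sketch, assuming the ciphertext has already been reduced to A-Z; `split_columns` is an illustrative name.

```python
def split_columns(ciphertext, key_length):
    """Return key_length strings; column i holds letters i, i+n, i+2n, ..."""
    return [ciphertext[i::key_length] for i in range(key_length)]


columns = split_columns("ABCDEFGHIJKLM", 5)
print(columns)  # ['AFK', 'BGL', 'CHM', 'DI', 'EJ']
```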

We do that by trying all 26 possible shift amounts, comparing the frequency distribution each one produces against the frequency distribution of normal English text, and seeing which lines up best.
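Trying the 26 shift amounts can be done on the counts alone: undoing a shift of s turns ciphertext letter (i + s) back into plaintext letter i, so the candidate distribution is the column's count list rotated s places to the left. A sketch under that assumption; `candidate_distributions` is a hypothetical helper, and each candidate is a list of 26 counts with index 0 = 'A'.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def letter_counts(text):
    """26 counts, index 0 = 'A'; non-letters are ignored."""
    counts = Counter(ch for ch in text.upper() if ch in ALPHABET)
    return [counts[letter] for letter in ALPHABET]


def candidate_distributions(column):
    """Map each shift 0..25 to the counts we'd see after undoing that shift."""
    counts = letter_counts(column)
    # Plaintext count i = ciphertext count (i + s) % 26: a left rotation by s.
    return {s: counts[s:] + counts[:s] for s in range(26)}


candidates = candidate_distributions("KHOOR")  # "HELLO" shifted by three
print(candidates[3][ALPHABET.index("L")])  # 2: the two L's are back in place
```

Each of these 26 candidates is then compared against the English distribution, and the best match gives that column's shift.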

This is the frequency distribution of normal English text:
Code:
 
        #                                          
        #                                          
        #                             #            
        #                             #            
#       #                   #         #            
#       #       #         # #         #            
#       #     # #         # #     # # #            
#     # #     # #     #   # #     # # #            
#   # # #     # #     #   # #     # # # #          
#   # # # # # # #     # # # # #   # # # #   #   #  
# # # # # # # # #   # # # # # #   # # # # # #   #
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Now suppose you were trying to crack a Vig, and you'd already figured out that it had a five-letter keyword. Starting at the first letter, you'd take every fifth letter and tally their frequencies, and you might get something like this:
Code:
 
                  #                                  
                  #       #                          
                # #       # #                          
      #         # #       # # #           #       #    
      #         # #     # # # #           #       #    
      #         # #     # # # #       #   #       #    
      #     #   # #     # # # #       #   #     # #    
  # # #     # # # #     # # # #   # # #   # # # # #    
# # # #   # # # # # #   # # # # # # # #   # # # # #    
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
This doesn't line up nicely with the English graph above, but it does have the same number of peaks and valleys. So when we shift it over a bit, we get a graph that does line up nicely:
Code:
 
                            #                        
                            #       #                
                          # #       # #                
#       #       #         # #       # # #              
#       #       #         # #     # # # #              
#       #       #         # #     # # # #       #      
#     # #       #     #   # #     # # # #       #      
# # # # #   # # #     # # # #     # # # #   # # #      
# # # # # # # # #   # # # # # #   # # # # # # # #      
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
The question is: how do you recognize that you've lined things up properly? It's not a matter of finding a perfect fit, because you'll never get one. It's a matter of recognizing which of the 26 possibilities is the best fit.

You can do it by eye, but we'd rather have a more reliable process, ideally one we could program into a computer.

That's what the chi test is - a measure of how closely two frequency distributions match.

And the calculation is pretty simple:
Code:
 
         1st      2nd
Letter   Text     Text    Product

A          12        4    12 * 4 = 48
B           6        2     6 * 2 = 12
C           3        1     3 * 1 =  3

  ...

Z           1        0     1 * 0 =  0
_____________________________________
SUM       257      101           741
257 is the sum of the frequency counts of the first text, that is, the total number of letters in the first text. 101 is the sum of the frequency counts of the second text.

741 is the sum of the products of the frequency counts for each letter.

And chi is:
Code:
 
 chi = 741 / (257 * 101) = 741 / 25957 ≈ 0.0285
That is, the sum of the products divided by the product of the sums.
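The same calculation as a short Python sketch: multiply the two counts letter by letter, add up the products, and divide by the product of the two totals. `chi` is an illustrative function name. Note that this measure (sometimes called the mutual or cross index of coincidence) is not the same thing as the chi-squared goodness-of-fit statistic.

```python
def chi(counts1, counts2):
    """Sum of the products of the counts, divided by the product of the sums."""
    if len(counts1) != len(counts2):
        raise ValueError("both distributions need one count per letter")
    total1, total2 = sum(counts1), sum(counts2)
    if total1 == 0 or total2 == 0:
        raise ValueError("both texts need at least one letter")
    products = sum(a * b for a, b in zip(counts1, counts2))
    return products / (total1 * total2)


# Two tiny "texts" over a two-letter alphabet: AAB -> [2, 1], ABB -> [1, 2].
print(round(chi([2, 1], [1, 2]), 4))  # 0.4444, i.e. (2*1 + 1*2) / (3*3)

# The totals from the table: sum of products 741, text lengths 257 and 101.
print(round(741 / (257 * 101), 4))  # 0.0285
```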

Your result will range from about 0, meaning that the frequency distributions are inverses of each other (one peaks where the other has valleys, and vice versa), through about 0.0385 (that is, 1/26), meaning that there is no correlation whatsoever between the two distributions, to about 0.070, meaning that the two distributions line up almost perfectly.
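Putting the pieces together, here is a sketch that picks, for each column, the shift whose undone distribution scores the highest chi against English. The reference values in `ENGLISH` are approximate, commonly published letter percentages and are an assumption, not data from this thread; any large sample of plain English text would serve the same purpose. `best_shift` and `recover_key` are hypothetical names, and the sketch assumes the keyword length is already known.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

# Approximate English letter frequencies in percent, A..Z.
# Assumed reference values (commonly published figures), not from the original post.
ENGLISH = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
           6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074]


def letter_counts(text):
    """26 counts, index 0 = 'A'; non-letters are ignored."""
    counts = Counter(ch for ch in text.upper() if ch in ALPHABET)
    return [counts[letter] for letter in ALPHABET]


def chi(counts1, counts2):
    """Sum of the products divided by the product of the sums."""
    products = sum(a * b for a, b in zip(counts1, counts2))
    return products / (sum(counts1) * sum(counts2))


def best_shift(column):
    """Return the shift 0-25 whose undone distribution scores highest against English."""
    counts = letter_counts(column)
    if sum(counts) == 0:
        raise ValueError("column has no letters")
    # Undoing a shift of s rotates the counts s places to the left.
    return max(range(26), key=lambda s: chi(counts[s:] + counts[:s], ENGLISH))


def recover_key(ciphertext, key_length):
    """Guess one key letter per column, with key letter A meaning shift 0."""
    letters = "".join(ch for ch in ciphertext.upper() if ch in ALPHABET)
    columns = [letters[i::key_length] for i in range(key_length)]
    return "".join(ALPHABET[best_shift(col)] for col in columns)
```

For the five-alphabet example in this post, `recover_key(ciphertext, 5)` would return one guessed key letter per column. Short columns give noisy counts, so a wrong letter or two is possible; comparing the top few shifts by eye is still worthwhile.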
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.