| Cryptostat; My work in progress | |
|---|---|
| Topic Started: Sep 8 2005, 11:18 PM (412 Views) | |
| PulsarSL | Sep 8 2005, 11:18 PM Post #1 |
|
Super member
|
Since I'm not very good at making algorithms, I've decided to try something a bit different with crypto software. I'm working on a program that will accept input (an encoded message) and analyze the frequency of the letters in it to try to predict which letters are which (obviously, it can't be perfect), and also provide a best guess at the message. I realize this will only work with simple encryptions and not with any of the harder stuff, but it's both a cryptography and a C++ learning experience for me. I'm pretty busy, so work is pretty slow. PulsarSL P.S. - We need more members... this site isn't listed in Google, is it? |
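
For anyone following along, here is a minimal sketch of the idea described above, in Python rather than C++: count the letters, rank them, and pair each rank with a letter from a table of typical English frequencies. The table `ENGLISH_ORDER` and the function names are assumptions made for illustration; published frequency orderings differ slightly, and a guess made this way is only a starting point on short messages.

```python
from collections import Counter

# Assumed ordering of English letters from most to least frequent.
# Published tables differ slightly, so treat this as an illustrative choice.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def letter_frequencies(ciphertext):
    """Count A-Z occurrences, ignoring case and anything that is not a letter."""
    return Counter(c for c in ciphertext.upper() if "A" <= c <= "Z")


def guess_mapping(ciphertext):
    """Pair the n-th most common ciphertext letter with the n-th letter of ENGLISH_ORDER."""
    counts = letter_frequencies(ciphertext)
    # Descending count, ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return dict(zip(ranked, ENGLISH_ORDER))


def best_guess(ciphertext):
    """Apply the guessed mapping; characters that are not letters pass through unchanged."""
    mapping = guess_mapping(ciphertext)
    return "".join(mapping.get(c, c) for c in ciphertext.upper())
```

Calling `best_guess` on a simple substitution ciphertext returns a first-draft plaintext that usually needs hand correction, which matches the "it can't be perfect" caveat.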
|
|
| Revelation | Sep 9 2005, 01:09 PM Post #2 |
|
Administrator
|
Oh we do need more members! I don't know if it is listed. This is my code for the frequency check:
|
|
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN | |
|
|
| codebreaker11235 | Sep 9 2005, 04:14 PM Post #3 |
|
Just registered
|
If you would like, here is my Java code, which performs basically the same task. It also calculates the observed phi, the expected monographic phi, and the expected random phi. I plan to add a few more statistical tests to it to help in analysis. Code:
Edit Revelation: Code put between code tags |
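
The three phi values mentioned in this post can be sketched as follows. This is not the Java code from the post; it is a small Python illustration assuming the usual definitions: observed phi is the sum of f(f-1) over the letter counts, and each expected phi is a constant times N(N-1), where N is the number of letters. The constant 0.0667 for English plaintext is a commonly quoted figure and is an assumption here, as are the names `phi_statistics`, `KAPPA_ENGLISH` and `KAPPA_RANDOM`.

```python
from collections import Counter

KAPPA_RANDOM = 1 / 26     # chance that two random letters match (26-letter alphabet)
KAPPA_ENGLISH = 0.0667    # commonly quoted figure for English plaintext (assumption)


def phi_statistics(text):
    """Return (observed phi, expected monographic phi, expected random phi)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    counts = Counter(letters)
    # Observed phi: the sum of f * (f - 1) over every letter count f.
    observed = sum(f * (f - 1) for f in counts.values())
    pairs = n * (n - 1)
    return observed, KAPPA_ENGLISH * pairs, KAPPA_RANDOM * pairs
```

An observed value close to the monographic expectation suggests a monoalphabetic cipher, while a value close to the random expectation suggests something flatter, such as a polyalphabetic cipher.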
|
|
| insecure | Sep 10 2005, 01:05 AM Post #4 |
|
Elite member
|
Unless I am mistaken, all the programs shown to date deal only with alphabetical characters. Unfortunately, many encryption mechanisms do not restrict themselves to alphabetics. The following program (which is written in standard C) will display frequency stats for a file named in its command line, irrespective of the kind of data stored in that file. I apologise that it's a bit long (around 100 lines), but the alternative is to write "clever" C, which doesn't really help people to read it easily - and it is a complete program, not just a fragment:
|
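
The C listing from this post is not reproduced here. As a rough stand-in, this short Python sketch shows the same approach of counting every byte value in a file, not just letters. The output layout follows the description given later in the thread (decimal code point, then either a space plus the character or `!!`, then the count and percentage, most frequent first). The function names, the exact column widths, and the choice of 33 to 126 as the printable range are assumptions.

```python
from collections import Counter


def byte_frequencies(data):
    """Return (code point, count, percentage) rows, most frequent first."""
    counts = Counter(data)      # iterating over bytes yields ints in 0..255
    total = len(data)
    # Descending count, ties broken by code point so the order is deterministic.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(code, n, 100.0 * n / total) for code, n in rows]


def format_row(code, count, percent):
    """Show a space plus the character if it is printable, otherwise '!!'."""
    shown = " " + chr(code) if 33 <= code <= 126 else "!!"
    return f"{code:3d} ({shown}) {count:8d} {percent:6.2f}%"


def print_file_stats(path):
    """Print one row per distinct byte value found in the file at `path`."""
    with open(path, "rb") as fh:    # binary mode, so no text decoding happens
        for row in byte_frequencies(fh.read()):
            print(format_row(*row))
```

For example, `print_file_stats("ciphertext.bin")` (a hypothetical file name) lists every byte value that occurs in the file, including those below 33 and above 127.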
|
|
| Revelation | Sep 10 2005, 08:46 AM Post #5 |
|
Administrator
|
Mine takes every ASCII character starting at 33. See here for why. I like that so many people here are programmers (and in different languages!).
|
|
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN | |
|
|
| insecure | Sep 10 2005, 10:20 AM Post #6 |
|
Elite member
|
Yes, I understand why you start at 33 (basically it's because printing characters with lower codes is a PITA). That's fine for some purposes, but some ciphers do map some plaintext characters to ciphertext character values below 33, so it's as well to have a way to analyse such ciphertexts. |
|
|
| PulsarSL | Sep 10 2005, 06:14 PM Post #7 |
|
Super member
|
But how are those displayed? |
|
|
| PulsarSL | Sep 10 2005, 06:15 PM Post #8 |
|
Super member
|
That's how I'm planning on writing mine, but I have to get around to it
|
|
|
| PulsarSL | Sep 10 2005, 06:17 PM Post #9 |
|
Super member
|
As for listing with Google, you can do it here: http://www.google.com/addurl/?continue=/addurl |
|
|
| insecure | Sep 11 2005, 01:09 AM Post #10 |
|
Elite member
|
PulsarSL asks how to print non-printable characters. Of course, the answer is that you can't. But what you can do is print their code points. My program produces output (which I've wrapped in code tags for you) like this:
Here, the code point is given in decimal. This is followed by two characters in parentheses. These are EITHER a space and the printing character corresponding to the code point, OR the sequence !!, which indicates that no printing character is available. The next figures describe the frequency (first the absolute frequency, and then expressed as a percentage). The characters are displayed in descending order of frequency, so the most significant are at the top. By far the most common way of displaying non-printable characters, however, is to replace them with a single dot. Look at any hex dump, and you'll see this immediately. For example, here's how I got a hex dump of the object file corresponding to my C program (it's piped through head to keep the output down to a few lines):
(Ignore the first column, which is just a file offset, in hex.) As you can see, there are quite a few non-printable characters here, most of which are simply ASCII 0 (although there are a few others too). The hex dumper displays their hexadecimal value easily (always in two digits, since my system - like many desktop systems - uses 8-bit bytes). Then it gives the printable value if possible, but replaces it with a dot if necessary. Personally, I prefer ciphertexts to be expressed in hexadecimal, sixteen bytes per line, with no file offset column and no print translation, e.g. as follows:
This provides a nice consistent way to express ciphertexts, irrespective of algorithm, and makes it easy to write parsers to read them. Also, you can see every single byte, irrespective of its value. If everybody here produced ciphertexts in this form, each of us would only need to write (or steal!) one ciphertext-parsing routine, which we could use for any challenge or exercise posted in this forum. That would leave us free to concentrate on the challenge or exercise itself. |
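
The "one ciphertext-parsing routine" idea could look something like the following Python sketch. It writes bytes as two-digit hex, sixteen per line, and reads that layout back. The function names, the uppercase digits, and the single space between bytes are assumptions; the post only fixes the sixteen-bytes-per-line layout with no offset column and no print translation.

```python
def to_hex_lines(data, per_line=16):
    """Render bytes as two-digit uppercase hex, `per_line` bytes per line."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append(" ".join(f"{b:02X}" for b in chunk))
    return "\n".join(lines)


def from_hex_lines(text):
    """Parse the layout above back into bytes; any whitespace separates byte values."""
    return bytes(int(token, 16) for token in text.split())
```

Because the parser splits on any whitespace, it accepts a ciphertext pasted from a post regardless of how the lines were wrapped, as long as each byte is still written as its own hex token.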
|
|
| Donald | Sep 11 2005, 02:47 AM Post #11 |
|
Elite member
|
The problem, as I see it, is that posting your challenge in hex will alienate some of the pen-and-paper cryptographers who aren't into programming. OK, not a lot; it's just one more level of encoding, but it might scare off a few who are new to the entire thing. Give them a string of letters and they might be willing to try; give them a hex table and they are going to assume right off that it's too complicated for them. BUT, we can have the best of both worlds. It's pretty simple to write a program that will convert 'A' to 'Z' (or any ASCII text) into hex. So unless your challenge has unusual characters in it, just post it with the ordinary character stream and we programmers (and others using such tools) can convert to hex as needed. Donald |
|
|
| insecure | Sep 11 2005, 04:04 AM Post #12 |
|
Elite member
|
It's a fair point. It seems to me that we can get the best of both worlds by deciding, when posting a ciphertext, how it can best be presented to maximise the chance that someone will be motivated to have a crack at it (if you will pardon the pun). If the domain of the algorithm is text-based, then it makes sense to present the ciphertext in letters. If it is not, then a hexadecimal presentation works better. For example, let's say I decide that my monoalphabetic substitution cipher is uncrackable (yeah, right!), or perhaps I am simply providing an exercise in monoalphabetic cracking. I could reasonably suggest that we are concerned only with letters of the alphabet, and display a ciphertext like this: FGV OL ZIT ZODT YGK QSS UGGR DTF ZG EGDT ZG ZIT QOR GY ZITOK HQKZN. (Here, I have retained the original word spacing, to make it a bit easier to crack. Sorry about ZIT, by the way - just luck. You shouldn't have too many problems decoding this - it's just an ordinary cryptogram, using simple letter substitution - but it's supposed to be an illustration, not a challenge!) Nobody is going to be put off from cracking this. But if we look at the challenges some people have posted here already, we will see that they contain bizarre characters such as that little y with the two dots over it (which I think is the Windows rendition of code point 255). Now, this might come as a shock to some people, but not all operating systems interpret and display non-ASCII characters in the same way. ASCII is a 7-bit code, so any code point > 127 is not, strictly speaking, an ASCII character, and so is not covered by the ASCII standard, and ASCII-based systems are free to render it in any way they like, or not at all (some systems just display a little box there instead, to mean "er, huh?"). Actually, not all computers even use ASCII! But I think it's fair to assume that people reading this forum will be on ASCII-based machines. 
When a character is not in the ASCII range, though, all bets are off. It is in these circumstances that a hexadecimal display makes sense. When you are designing your algorithm, it's a good idea to decide in advance whether the domain of the ciphertext "alphabet" will be strictly printable-ASCII. If so, then by all means let's display it as such. But if there are some characters that look "weird", that's a hint that we may need to present it in a way that is utterly unambiguous. To make this a bit more concrete, consider this code to encipher a plaintext using a one-time pad:
The input domain (for both the plaintext and the key) is A-Z, and the output domain is also A-Z. So it makes lots of sense to display the ciphertext in - ha! - "English". But the following version does not assume or require restrictions on either its input domain or its output domain:
This code could produce any old junk as its output, and to display it as ASCII would scare most pencil-and-paper junkies away very quickly, but a hex output would make a pencil-and-paper analysis much more feasible (provided the cryptanalyst had a firm grasp of the principle of XOR, of course!).
So I propose a three-pronged approach:
1) If the output domain of the cipher is pure printable ASCII, display the ciphertext as ASCII: FGV OL ZIT ZODT YGK QSS UGGR DTF ZG EGDT ZG ZIT QOR GY ZITOK HQKZN.
2) If the output domain is ASCII, but you want to make it a bit harder, remove spaces (to eliminate word-length information), and then display the ciphertext in five-letter groups: FGVOL ZITZO DTYGK QSSUG GRDTF ZGEGD TZGZI TQORG YZITO KHQKZ N.
3) If the output domain is not restricted to pure printable ASCII, display the ciphertext in hexadecimal:
I think you and I are broadly in agreement here. Perhaps we should do a poll or something, to find out what most people think about this suggestion? |
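
To make the two output domains discussed in this post concrete, here is a short Python sketch of both kinds of one-time pad, plus the five-letter grouping from option 2. These are illustrative stand-ins, not the listings from the post. All function names are assumptions, and `otp_letters` assumes its inputs are already uppercase A-Z.

```python
def otp_letters(plaintext, key):
    """One-time pad over A-Z: add key letters to plaintext letters modulo 26."""
    if len(key) < len(plaintext):
        raise ValueError("the pad must be at least as long as the message")
    return "".join(
        chr((ord(p) - 65 + ord(k) - 65) % 26 + 65)   # 65 is ord("A")
        for p, k in zip(plaintext, key)
    )


def otp_xor(plaintext, key):
    """One-time pad over arbitrary bytes: XOR each message byte with a key byte."""
    if len(key) < len(plaintext):
        raise ValueError("the pad must be at least as long as the message")
    return bytes(p ^ k for p, k in zip(plaintext, key))


def five_letter_groups(ciphertext):
    """Drop everything except letters, then split the result into groups of five."""
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    return " ".join("".join(letters[i:i + 5]) for i in range(0, len(letters), 5))
```

The first function can only ever produce A-Z, so its output is safe to post as plain letters. The second can produce any byte value, including unprintable ones, which is the case where a hex presentation is the unambiguous choice.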
|
|
| Donald | Sep 11 2005, 04:18 AM Post #13 |
|
Elite member
|
Yes, I think we are. If your cipher has unusual characters in it, putting it up in hex seems a good idea. Donald |
|
|
| PulsarSL | Sep 11 2005, 05:37 AM Post #14 |
|
Super member
|
Sounds good to me. |
|
|
| PulsarSL | Oct 3 2005, 07:45 PM Post #15 |
|
Super member
|
I am reviving this thread from the dead to tell you that I'm nearly done with my simple frequency counter. More to come. |
|
|