| Welcome to Crypto. We hope you enjoy your visit. You're currently viewing our forum as a guest. This means you are limited to certain areas of the board and there are some features you can't use. If you join our community, you'll be able to access member-only sections, and use many member-only features such as customizing your profile, sending personal messages, and voting in polls. Registration is simple, fast, and completely free. Join our community! If you're already a member please log in to your account to access all of our features: |
| A Base 26 Huffman Code; Just For Fun | |
|---|---|
| Tweet Topic Started: Apr 4 2014, 06:51 AM (570 Views) | |
| fiziwig | Apr 4 2014, 06:51 AM Post #1 |
|
Elite member
![]() ![]() ![]() ![]() ![]() ![]() ![]()
|
Huffman codes don't have to be binary. They can be worked out with any number base. Here an interesting base 26 Huffman code I worked out just for fun. I used "Moby Dick" (the Gutenberg ebook edition) as the source for n-graph frequencies and wrote a C++ Huffman code generator that can use any number base for the generated Huffman code. Use the longest plaintext string you can find in the tables, since this will result in the shortest ciphertext. So if the plaintext word is "sing" you have the choice of using (s->R) + (ing->E) giving "RE", or (si->SD) + (n->D) + (g->ZW) giving "SDDZW". Either one gives a valid encipherment, and either one can be correctly deciphered, but RE is the shorter ciphertext. The tables are based on trigraph, digraph, and single letter frequencies. Plaintext is lower case and ciphertext Huffman code is upper case.
On Edit: It occurred to me after I did this that I probably would have gotten better compression using base 62 with {a..z; A..Z; 0..9} as the symbol set. Maybe I'll re-run the program tomorrow and do it that way. Edited by fiziwig, Apr 4 2014, 07:02 AM.
|
![]() |
|
| novice | Apr 4 2014, 07:27 AM Post #2 |
|
Super member
![]() ![]() ![]() ![]() ![]() ![]()
|
I find this a very nice demonstration of Huffman encoding. It is thought-provoking, practical, clear, to the point and lacking in verbosity. It is not complicated by the idiotic use of a PRNG (if you have to include random numbers in a cipher then in my view the cipher concept is weak and worthless). It has some similarity to an expanded Syllabury cipher. One could imagine a 26x26 grid with 676 cells, each populated with a letter, digraph or trigraph in alphabetical order. One could also include numbers and punctuation marks. The row and column markers would, in the base case, run A to Z in order. In other cases the markers would be mixed alphabets that would serve as the two keys for the code. A good and enjoyable contribution Thank you.Edited by novice, Apr 4 2014, 08:26 AM.
|
![]() |
|
| fiziwig | Apr 4 2014, 10:00 PM Post #3 |
|
Elite member
![]() ![]() ![]() ![]() ![]() ![]() ![]()
|
It would be fun to have a 26x27 syllabary table where every ciphertext syllable was pronounceable so that the enciphered message could be spoken. Like a "code talker" language. (The table needs to be 26 x 27 = 702 in case the last letter has no pair.) Lets see... Initials: b bl br bw ch d dr f fl fr g gl gr gw h j k kl kr kw l m n p pr pl r s sh sk sl sm sn sp spr st str sw t th tr tw v w y z Terminals: a au ay e ee i o oi u an en een in on un 47 initials (including no initial) times 15 terminals = 705 Syllables: a an au ay ba ban bau bay be bee been ben bi bin bla blan blau blay ble blee bleen blen bli blin blo bloi blon blu blun bo boi bon bra bran brau bray bre bree breen bren bri brin bro broi bron bru brun bu bun bwa bwan bwau bway bwe bwee bween bwen bwi bwin bwo bwoi bwon bwu bwun cha chan chau chay che chee cheen chen chi chin cho choi chon chu chun da dan dau day de dee deen den di din do doi don dra dran drau dray dre dree dreen dren dri drin dro droi dron dru drun du dun e ee een en fa fan fau fay fe fee feen fen fi fin fla flan flau flay fle flee fleen flen fli flin flo floi flon flu flun fo foi fon fra fran frau fray fre free freen fren fri frin fro froi fron fru frun fu fun ga gan gau gay ge gee geen gen gi gin gla glan glau glay gle glee gleen glen gli glin glo gloi glon glu glun go goi gon gra gran grau gray gre gree green gren gri grin gro groi gron gru grun gu gun gwa gwan gwau gway gwe gwee gween gwen gwi gwin gwo gwoi gwon gwu gwun ha han hau hay he hee heen hen hi hin ho hoi hon hu hun i in ja jan jau jay je jee jeen jen ji jin jo joi jon ju jun ka kan kau kay ke kee keen ken ki kin kla klan klau klay kle klee kleen klen kli klin klo kloi klon klu klun ko koi kon kra kran krau kray kre kree kreen kren kri krin kro kroi kron kru krun ku kun kwa kwan kwau kway kwe kwee kween kwen kwi kwin kwo kwoi kwon kwu kwun la lan lau lay le lee leen len li lin lo loi lon lu lun ma man mau may me mee meen men mi min mo moi mon mu mun na nan nau nay ne nee neen nen ni nin no noi non nu nun o oi on pa pan pau pay pe pee peen pen pi pin pla plan plau play ple plee pleen plen pli plin plo ploi plon plu plun po poi pon pra pran prau pray pre pree preen pren pri prin pro proi pron pru prun pu pun ra ran rau ray re ree reen ren ri rin ro roi ron ru run sa san sau say se see seen sen sha shan shau shay she shee sheen shen shi shin sho shoi shon shu shun si sin ska skan skau skay ske skee skeen sken ski skin sko skoi skon sku skun sla slan slau slay sle slee sleen slen sli slin slo sloi slon slu slun sma sman smau smay sme smee smeen smen smi smin smo smoi smon smu smun sna snan snau snay sne snee sneen snen sni snin sno snoi snon snu snun so soi son spa span spau spay spe spee speen spen spi spin spo spoi spon spra spran sprau spray spre spree spreen spren spri sprin spro sproi spron spru sprun spu spun sta stan stau stay ste stee steen sten sti stin sto stoi ston stra stran strau stray stre stree streen stren stri strin stro stroi stron stru strun stu stun su sun swa swan swau sway swe swee sween swen swi swin swo swoi swon swu swun ta tan tau tay te tee teen ten tha than thau thay the thee theen then thi thin tho thoi thon thu thun ti tin to toi ton tra tran trau tray tre tree treen tren tri trin tro troi tron tru trun tu tun twa twan twau tway twe twee tween twen twi twin two twoi twon twu twun u un va van vau vay ve vee veen ven vi vin vo voi von vu vun wa wan wau way we wee ween wen wi win wo woi won wu wun ya yan yau yay ye yee yeen yen yi yin yo yoi yon yu yun z za zan zau zay ze zee zeen zen zin zo zoi zon zu zun Then you could scatter those syllables around in the table in some agreed-upon order and you might end up with:
Completely impractical, and too easy to crack, but fun anyway. |
![]() |
|
| mok-kong shen | Apr 5 2014, 03:12 PM Post #4 |
|
NSA worthy
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]()
|
Maybe I misunderstood your posts in this thread, but anyway IMHO the following could be useful in practice: In 7-bit ASCII there are 95 printable characters (or 94 if space is excluded, for discussion purposes we take 95). that is, if the plaintext alphabet, e.g. lower-case, has 26 characters, there are 69 characters left for coding the more frequent trigrams and bigrams (if desired also some frequent phrases and sentences in special applications). One maps the trigrams and bigrams in a given plaintext as far as possible, leaving the rest as it is. Note that this mapping could, but need not, be a secret one, since the (main) purpose is compression. Subsequently one does an encryption with whichever encryption scheme one prefers. (What one has to do to get optimal performance is to determine which specific bigrams and trigrams are to be considered in the mapping.) I like to learn critiques, if any, to the above scheme. Edited by mok-kong shen, Apr 5 2014, 03:13 PM.
|
![]() |
|
| 1 user reading this topic (1 Guest and 0 Anonymous) | |
| « Previous Topic · General · Next Topic » |





![]](http://z2.ifrm.com/static/1/pip_r.png)



Thank you.
7:27 PM Jul 11