Welcome Guest [Log In] [Register]
Welcome to Crypto. We hope you enjoy your visit.


You're currently viewing our forum as a guest. This means you are limited to certain areas of the board and there are some features you can't use. If you join our community, you'll be able to access member-only sections, and use many member-only features such as customizing your profile, sending personal messages, and voting in polls. Registration is simple, fast, and completely free.


Join our community!


If you're already a member please log in to your account to access all of our features:

Username:   Password:
Add Reply
Has anyone tried parallel processing?
Topic Started: Jul 25 2009, 12:57 AM (168 Views)
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Has anyone tried to use parallel processing in their crypto programs? Until today, I never had a multi-cpu machine, so I never had. Today I swapped out my old 1.2GHz AMD Sempron for a 3.0GHz AMD Phenom II Quad-core.

You may remember my mentioning my cryptarithm brute-forcer. I decided to use it as a benchmark, to see how much I would gain. The original program took 190 seconds to crack the first nine cryptarithms from the 2009-JA Cm. I knew my new chip would be faster. But I also knew my new chip would be quad-core. So I rewrote the app to split the search range into four sections, running each in a different thread. That ran, on my old machine, in 230 seconds. Slower, due to the increased overhead.

Today, I'm running on my new machine. The original program ran in 65 seconds - almost three times faster. The threaded program ran in 20 seconds. More than three times as fast as the unthreaded version, and nearly ten times as fast as the original on the old hardware.

Clearly, there is significant benefit from using parallel algorithms.

Seems to me that most of the heuristic algorithms could easily be parallelized. Hill-climbers, simulated annealing, Osric's chunk, most of these would benefit significantly from running multi-threaded.

Anyone tried it?
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
Revelation
Member Avatar
Administrator
[ *  *  *  *  * ]
I've never tried it, but it should run very fast. I think the goal is to try to minimize shared memory access, because mutex objects slow things down.
RRRREJMEEEEEPVKLWENFNVJKEEEEEAOLKAFKLXCFZAASDJXZTTTTTTTLSIOWJXMOKLAFJNNKFNXN
RAGRBAQEMHIGDJVDSEOXVIYCELFHWLELJFIENXLRATALSJFSLCYTKLASJDKMHGOVOKAJDNMNUITN
RRRRLJVEEEEECLYVYHNVPFTAEEEEEMWLMEIRNGLARWJAKJDFLWNTIERJMIPQWOTZEOCXKNUBNXCN
RJIRPOWEANFUSNCZVDVZNMSFEKLOEPZLDKDJWSAAAAAAAOERHJCTNCKFRIMVKSOFOMKMANREWNBN
RZUDRGXEEEEENFQIDVLQNCKNEEEEEDGLLLLLLAWIOSNCDARLODMTOEJXMILDFJROTKJSDNLVCZNN
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Seems to me that in a hillclimber there isn't much that needs to be shared. In my brute-forcer, the threads are entirely independent, except for reporting success.

In my original version, I used a recursive method for walking the permutations, as I had seen you do in your transposition cracker. I didn't see an easy way to partition that into separate threads, so I went looking. I found a method for mapping the transpositions of a set of size N into the integers from one to N-factorial. So the outermost permute() call in my cryptarithm solver was replaced by a for loop from 0 to 10-factorial-1, decoding each int into a permutation then testing that.

The single for loop was easy to partition into four subsets, for four separate threads.

Most computers sold, these days, are multi-core. It seems a waste not to take advantage of the extra cores.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
osric
Advanced Member
[ *  *  * ]
Quote:
 
Clearly, there is significant benefit from using parallel algorithms.


Interesting stuff.

I would very much like to hear how you program to take advantage of a multicore processor. Could you tell us something about this, preferably in language that non-professional programmers like me will be able to follow. Could you also tell us a bit about how to choose the strategy to follow -- for example would you run the application in parallel through each of the cores, partitioning the job so that in a 4-core chip, the perms to be examined are divided into 4 parts. Or alternatively, would different parts of the app be carried out with different cores?


Other questions in my mind: do you need a particular programming language or will Java or C++ do the job? What does a 'multi-core' program look like for a simple application? Are there tutorials available on the Web? Do you recommend a particular book to help the beginner?




Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Quote:
 
I would very much like to hear how you program to take advantage of a multicore processor. Could you tell us something about this, preferably in language that non-professional programmers like me will be able to follow.

Parallel programming made easy?

I'm reminded of a test question I saw once on what should happen if you asked whether an unsigned int was less than a signed int, in ANSI C. The C standard mandated some very non-intuitive behavior. Answer A was the way about 70% of the existing compilers did it, answer B was the way about 30% of existing compilers did it, answer C was the way the ANSI committee had said it should be done (which differed from pretty much every compiler in existence did it), and answer D was "if you don't write code like this, it doesn't matter".

The correct answer was, of course, D.

There are inherent complexities in working with multi-threaded programming. But if you're a working programmer trying to solve a problem, you can adopt some simple patterns that work, and avoid them.

Quote:
 
Could you also tell us a bit about how to choose the strategy to follow -- for example would you run the application in parallel through each of the cores, partitioning the job so that in a 4-core chip, the perms to be examined are divided into 4 parts. Or alternatively, would different parts of the app be carried out with different cores?

Generally, you don't schedule the different threads. You simply let the operating system schedule them, along with everything else that is going on in the machine. On a multi-core machine, a multi-core-aware OS will schedule threads across the available processors.

What you do need to do is to find a way to partition the problem into separate pieces. For certain problem sets, this is easy, for others, it's not so clear.

Think about a hill-climber. It runs in a loop, picking a random key and then testing it and its neighbors. If you opened four command-line windows, and ran four copies of it, on a single-core machine they'd be sharing CPU time, so you'd be no better off than if you ran one. But if you were on a quad-core machine, they'd be scheduled across the four separate processors, and you'd be testing four times as many keys at once. Of course, they'd not be aware of each others top scores, so one might be reporting keys that another would have rejected, but that's just a matter of some extra output.

You could do the same thing with a dictionary attack - split the dictionary into four parts, run four copies of your dictionary attack program, and run through the entire dictionary in a fourth the time.

What I did was find an algorithm that would allow me to partition the permutations of a set into four parts. My original cryptarithm cracker used the University of Exeter algorithm for generating permutations. This is a recursive algorithm that isn't easy to partition into subsets.

Looking around, I found an algorithm on Wikipedia for encoding and decodinjg a permutation into an integer. I could then, instead of making recursive calls to the Exeter permute() function, simply loop through the integers between 0 and n!-1, convert each into a permutation, and then test that. It was easy, then, to convert the program to run four loops, from 0 to x-1, x to 2x-1, 2x to 3x-1, and 3x to 4x-1, where x = n!/4. And then to run each loop in a separate thread.

Quote:
 
Other questions in my mind: do you need a particular programming language or will Java or C++ do the job? What does a 'multi-core' program look like for a simple application? Are there tutorials available on the Web? Do you recommend a particular book to help the beginner?

What you need to do threads is a way to run threads and a way to arbitrate access to shared memory. The difference between running a single-threaded hillclimber four times and a four-threaded hillclimber is that in the latter the threads have access to each others memory, so they can test against the best scored key that any of them have found, and they can have a single output stream. But that means that you have to make sure that they don't screw each other up by changing variables.

C and C++ don't have threading functionality in their language standard, but any reasonable C or C++ compiler will have an API that provides access to the underlying operating system's threading functionality. I was using g++ on Linux, so I used the Posix threads library, which is likely to exist on any platform that is capable of multi-threading. Other languages have threading capabilities defined as part of their standard libraries. C#, Java, and Python all have threading defined as a part of the language.

From the top level, a thread is either a function that you pass to a thread-create function, or a class derived from a Thread class that has a special run() method that is executed when you tell the thread to start. Within the function, coding is done normally. Each thread has its own program counter and its own stack, which means that its flow of execution is independent from any other threads, and so are its local variables. Global variables, OTOH, and variables that are accessed via references from outside the thread, are very different. There is only one copy of them that is shared with the other threads, and this leads to the possibility of race conditions.

Consider a multi-threaded hillclimber. It keeps, in a variable that is shared across threads, the best score found so far, and the key that found it. Suppose one thread finds a key with a better score. It writes both the score and the key to these shared variables. Suppose then, that while the first thread is writing , a second thread tries to read the score and the key. It might read the new high score, and then the old value of the best key. Or even worse, it might read the first half of the new best key and the second half of the old best key. The second thread will then continue on with incorrect and inconsistent data, and do who knows what. (And the really aggravating thing is that situations like this - called "race conditions" happen only infrequently, unpredictably, and are very difficult to recreate, which makes debugging them a royal pain.)

The solution is to create some sort of lock, which will keep other threads from accessing the shared variables. Any thread that wants to read or write the variables would obtain the lock. If it gets it, it makes the changes it wants and then frees the lock. If it doesn't, it waits until the lock is free and then it gets the lock, and makes its changes.

Any threading API will include these locking mechanisms. They are normally written so that the thread that is asking for a lock is put to sleep until the lock is available, so you don't need to waste code and CPU time looping and testing to see if the lock is free yet.

Now here's where it gets complicated. If you have a lot of threads, that share a great many variables, and you use just one lock to control access to all of them, you have simple and foolproof code, but that doesn't run very efficiently. If nearly everything a thread needs to do requires that one lock, then every thread is going to spend most of its time waiting for that single lock, and there will be very little parallelism going on.

The answer to that is to create finer-grained locks. Create a separate lock for each variable. A thread is only going to lock the variables that it needs to access, and other threads that don't need those variables can run unhindered. But now a thread needs multiple locks in order to work, and other threads need multiple locks in order to work, and now we can have situations where one thread gets lock A and tries to get lock B, while another thread already has lock B and is trying to get lock A. Each has one lock, and is waiting for the other to be released. This is called a "deadlock", and it's the other nasty possibility that multi-threaded programming allows.

There has a great deal of theoretical work, on this, going all the way back to Dijsktra and his Dining Philospher's Problem. Finding the right balance between locking on too coarse a grain, or on too fine a grain, and to prevent both race conditions and deadlocks while providing a large degree of concurrency is anything but a simple problem.

The great thing, though, is that in this problem area it's not really a problem. Heuristic searches involve multiple calls to routines that are nearly independent of each other. My cryptarithm brute-forcer has no data that is shared between threads - each simply prints any success. A dictionary attacker would likely be the same. Any of the hillclimbing variants would only share the best score and keys between threads, which could be locked and freed without reducing concurrency to any noticeable degree.

So, in this area, multi-threading looks easy, and with multi-core processors nearly everywhere, I was wondering if anyone had done any of it.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
I'm going to throw out one more thing - just because a language has a threading facility does not mean that it's actually using independent operating-system threads. It may well be that the language's threading implementation is simulating threads internally. The Java language specification explicitly allows this, since Java may well be implemented on platforms that do not have support for threads at the OS level.

Truth is most of what programmers use threads for doesn't require separately-scheduled operating-systems threads. The most common use for threads is to allow a user interface to remain responsive while long calculations are going on. This doesn't require operating system threads.

Unfortunately, without separately scheduled operating-system threads, we get no performance gain from running multiple threads.

I bring this up because I was trying to write up a simple example, and discovered that at least in the version 2.4 implementation I'm running on my Centos 5.2 Linxu box, Python's threads are not independently scheduled, and hence aren't of any use to us. (That is, splitting a task into four Python threads doesn't speed it up, because all four threads run in the single Python interpreter process, and that is single-threaded from the operating system's point of view, and runs only on a single CPU.)


====

A later addition. Python 2.6.2 has a multiprocessing module, that provides constructs for running code in multiple parallel instances of the Python interpreter, which would scale across multiple CPUs. Python does not, and apparently will not, scale well with multiple threads - the lead author claims that running multiple processes is a better approach. Unfortunately, the version of Linux I'm running includes Python 2.4, so I can't provide any examples of how to work with it.

Edited by jdege, Aug 1 2009, 10:34 PM.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
osric
Advanced Member
[ *  *  * ]
Python is too slow to interest me when it comes to applications requiring speed, but I would like to see a Java program that uses more than one thread or core.

Can you show us such a Java program that does something simple, as an example? A picture is worth a thousand words. Let one thread print out 'A' every 10 million iterations and the other 'B' or something simple like that. I want to see how the program is structured and what the commands are.

Also can you respond to my query on recommended Web tutorials or a suitable book for the non-pro programmer?
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
Quote:
 
Python is too slow to interest me when it comes to applications requiring speed, but I would like to see a Java program that uses more than one thread or core.

It does seem a bit silly to try to get a 4x speed up by using mult-threading in Python when you could get a 10x speedup from using Java or a 100x speedup by using C.

Code:
 
Can[space]you[space]show[space]us[space]such[space]a[space]Java[space]program[space]that[space]does[space]something[space]simple,[space]as[space]an[space]example?[space]A[space]picture[space]is[space]worth[space]a[space]thousand[space]words.[space]Let[space]one[space]thread[space]print[space]out[space]'A'[space]every[space]10[space]million[space]iterations[space]and[space]the[space]other[space]'B'[space]or[space]something[space]simple[space]like[space]that.[space]I[space]want[space]to[space]see[space]how[space]the[space]program[space]is[space]structured[space]and[space]what[space]the[space]commands[space]are.

It's been quite a while since I did much Java, and I don't have a current Java implementation installed on my machine.

Quote:
 
Also can you respond to my query on recommended Web tutorials or a suitable book for the non-pro programmer?

Not really. I took a semester-long class on operating system internals, years ago, that spent quite a bit of time on how people were handling concurrency issues, deadlock detection and prevention, race conditions, and the like. I've never tried to teach it to anybody, and I have no idea as to what books or online tutorials might be worthwhile.

You might start with Sun's tutorial on concurrency:

http://java.sun.com/docs/books/tutorial/essential/concurrency/
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
osric
Advanced Member
[ *  *  * ]
Quote:
 

You might start with Sun's tutorial on concurrency:

http://java.sun.com/docs/books/tutorial/essential/concurrency/


That's a useful suggestion -- thank you. I note, for anyone else who is interested, there's a recommended reading list at:

http://java.sun.com/docs/books/tutorial/essential/concurrency/further.html

A few comments on the potential usefulness of multicore programming to solving cryptograms.

First of all Brute force algorithms are relatively very slow and so are to be avoided. There's little point in seeking speed by multithreading a brute force program when speed can be increased with a much faster algorithm. For example cryptarithms are solved by hillclimbing in a fraction of the time taken by brute force, as I have described in a May 24th 2009 posting at

http://s13.zetaboards.com/Crypto/topic/123878/3/

When keys are long brute forcing is out of the question. So it's a good idea to get away from Brute force and into Hillclimbing, a Genetic algorithm or, best of all, Simulated Annealing or one of its derivatives.

Would multi-threading speed up such stochastic algorithms? Yes, from what I have read, I think so. With these algorithms you are waiting for a randomly generated key to turn up that can be transformed into the correct key. The more randomly generated keys you can process in a given time the quicker the solution will appear, so one can envisage the benefit of four parallel hillclimbers running in a 4-core machine. However stochastic algorithms generally solve single-key ciphers so quickly that such an increase in speed might be of little practical usefulness.

For multi-key ciphers the situation is different, and here I am thinking of Two Square, Four Square, Digrafid and TriSquare. With two, or three, keys to recover there is exponentially more work to be done and it is here that I would guess the multi-thread program would be of practical benefit.

Many thanks to jdege for raising this topic. If anyone else has experience in this field it would be very interesting to hear from them.

Quote:
 
It does seem a bit silly to try to get a 4x speed up by using mult-threading in Python when you could get a 10x speedup from using Java or a 100x speedup by using C.


I would put the C++/Python speedup at closer to 20. On my Mac pro, C++ and Java programs run at the same speed. That was also my experience on my PC running Windows XP. A colleague got slightly faster speeds from Java than C++ on his Apple mini.

Edited by osric, Aug 2 2009, 08:54 AM.
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
The only two places I've ever used brute force is on Caesar shifts and cryptarithms. There's no question that even a dumb hill-climber would be faster. But when the key space is small, it's not always necessary.

As for the relative speed of different languages, that depends a great deal on what you are doing, and of the performance of the particular compiler or interpreter you are using. One language might be ten times as fast as the other at numeric processing but only five times as fast at string processing.

I wrote a simple timing comparison, a few weeks ago - calculating fibonacci numbers using the same recursive descent algorithm in awk, python, java, and c. As a test, it mostly measures how long it takes to perform function calls.

Calculating the 32nd fibonacci number in awk took 2.269 seconds, in python took 1.987 seconds, in java took 0.097 seconds, and in c took 0.012 seconds.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
osric
Advanced Member
[ *  *  * ]
Quote:
 
The only two places I've ever used brute force is on Caesar shifts and cryptarithms. There's no question that even a dumb hill-climber would be faster. But when the key space is small, it's not always necessary.



Most people who use a computer to solve cryptograms try and be a bit smart about it and to learn from others -- that precludes using other peoples' programs and leads to a bit more adventure than Brute force. We shouldn't, in my view, be encouraging people to go down this route. Thus the reason for my comments.

Quote:
 

Calculating the 32nd fibonacci number in awk took 2.269 seconds, in python took 1.987 seconds, in java took 0.097 seconds, and in c took 0.012 seconds.


I thought this discussion group was about Cryptology :lmao:

But anyway I have just run a program to calculate the 32nd Fibonacci and the time taken is exactly the same in C++ as in Java -- 1.4 seconds for 10 million cycles, each of which calculates up to the 32nd Fibonacci, or 0.14 * 10^-6 secs per calculation of 32 Fibonaccis. So I stand by my contention that C++ and Java run at the same speed. My timings are a tiny fraction of what you quote -- don't tell me you have included the compile time!





Edited by osric, Aug 2 2009, 01:51 PM.
Offline Profile Quote Post Goto Top
 
jdege
Member Avatar
Elite member
[ *  *  *  *  * ]
As for calculating fibonacci numbers, I intentionally used an inefficient algorithm - my goal wasn't to calculate them efficiently, but to measure how each language performed in handling deeply-nested recursion.

Code:
 
def[space]fib(n):
[space]if[space](n[space]<=[space]1):
[space][space]return[space]n

[space]return[space]fib(n-1)[space]+[space]fib(n-2)


As I said, this measures the expense of function calls, not much else. There are far more efficient implementations.

Code:
 
def[space]fib(n):
[space][space][space][space]a,[space]b[space]=[space]0,[space]1
[space][space][space][space]for[space]i[space]in[space]range(n):
[space][space][space][space][space][space][space][space]a,[space]b[space]=[space]b,[space]a[space]+[space]b
[space][space][space][space]return[space]a


Running this algorithm one million times took ..665 seconds in Python. .098 seconds in Java, and .039 seconds in C.

Clearly you have a more highly-optimized JVM than I do. And in mine, at least, function calls are much more expensive, compared to C, than is looping. That doesn't surprise me a great deal.

And as for brute force, I think that understanding how to generate a complete set of permutations is a worthwhile exercise for anyone interested in cryptography.
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Offline Profile Quote Post Goto Top
 
1 user reading this topic (1 Guest and 0 Anonymous)
« Previous Topic · General · Next Topic »
Add Reply