Threads vs Processes
5 answers - 1560 bytes -

Wed, 26 Jul 2006 10:54:48 -0700, Carl J. Van Arsdall wrote:
> Alright, based on a discussion on this mailing list, I've started to
> wonder, why use threads vs processes.

The debate should not be about "threads vs processes", it should be
about "threads vs events". Dr. John Ousterhout (creator of Tcl,
Professor of Comp Sci at UC Berkeley, etc), started a famous debate
about this 10 years ago with the following simple presentation.
That sentiment has largely been ignored and thread usage dominates but,
if you have been programming for as long as I have, and have used both
thread based architectures AND event/reactor/callback based
architectures, then that simple presentation above should ring very
true. Problem is, young people merely equate newer == better.
On large systems and over time, thread based architectures often tend
towards chaos. I have seen a few thread based systems where the
programmers became so frustrated with subtle timing issues etc. that they
eventually overlaid so many mutexes that the implementation became
single threaded in practice anyhow(!), and very inefficient.
BTW, I am fairly new to python but I have seen that the python Twisted
framework is a good example of the event/reactor design alternative to
threads. See
.
Douglas Schmidt is a famous designer and author (ACE, CORBA TAO, etc)
who has written much about reactor design patterns, see
"Pattern-Oriented Software Architecture, Vol 2", Wiley 2000, amongst
many other references of his.
No.1 | 3008 bytes
mark wrote:
> The debate should not be about "threads vs processes", it should be
> about "threads vs events".

We are so lucky as to have both debates.

> Dr. John Ousterhout (creator of Tcl,
> Professor of Comp Sci at UC Berkeley, etc), started a famous debate
> about this 10 years ago with the following simple presentation.

The Ousterhout school finds multiple lines of execution
unmanageable, while the Tanenbaum school finds asynchronous I/O
unmanageable.

What's so hard about single-line-of-control (SLC) event-driven
programming? You can't call anything that might block. You have to
initiate the operation, store all the state you'll need in order
to pick up where you left off, then return all the way back to the
event dispatcher.
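
To make that constraint concrete, here is a minimal sketch (not from the thread) of what "initiate, store state, return to the dispatcher" can look like. Everything in it is made up for illustration: `Dispatcher` is a toy event loop, `fetch_async` stands in for a non-blocking lookup, and `SumTwoLookups` is what a one-line blocking `fetch("a") + fetch("b")` turns into.

```python
from collections import deque


class Dispatcher:
    """Toy single-threaded event loop: runs queued callbacks one at a time."""

    def __init__(self):
        self._queue = deque()

    def call_soon(self, callback, *args):
        self._queue.append((callback, args))

    def run(self):
        while self._queue:
            callback, args = self._queue.popleft()
            callback(*args)


def fetch_async(dispatcher, key, on_done):
    """Hypothetical non-blocking lookup: the result arrives on a later loop turn."""
    fake_store = {"a": 1, "b": 2}
    dispatcher.call_soon(on_done, fake_store[key])


class SumTwoLookups:
    """Callback-style version of `return fetch("a") + fetch("b")`."""

    def __init__(self, dispatcher, on_result):
        self.dispatcher = dispatcher
        self.on_result = on_result
        self.first = None  # state that must survive between events

    def start(self):
        # Initiate the first operation, then return to the dispatcher.
        fetch_async(self.dispatcher, "a", self._got_first)

    def _got_first(self, value):
        self.first = value  # remember where we left off
        fetch_async(self.dispatcher, "b", self._got_second)

    def _got_second(self, value):
        self.on_result(self.first + value)


results = []
loop = Dispatcher()
SumTwoLookups(loop, results.append).start()
loop.run()
print(results)  # [3]
```

Note how the single expression is split into three methods, one per point where the code would otherwise have blocked.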
> That sentiment has largely been ignored and thread usage dominates but,
> if you have been programming for as long as I have, and have used both
> thread based architectures AND event/reactor/callback based
> architectures, then that simple presentation above should ring very
> true. Problem is, young people merely equate newer == better.

Newer? They're both old as the trees. That can't be why the whiz
kids like them. Threads and processes rule because of their success.

> On large systems and over time, thread based architectures often tend
> towards chaos.

While large SLC event-driven systems surely tend to chaos. Why?
Because they *must* be structured around where blocking operations
can happen, and that is not the structure anyone would choose for
clarity, maintainability and general chaos avoidance.
Even the simplest of modular structures, the procedure, gets
broken. Whether you can encapsulate a sequence of operations in a
procedure depends upon whether it might need to do an operation
that could block.
Going farther, consider writing a class supporting overriding of
some method. Easy; we Pythoneers do it all the time; that's what
inheritance is all about. Now what if the subclass's version
of the method needs to look up external data, and thus might
block? How does a method override arrange for the call chain to
return all the way back to the event loop, and then pick up
again with the same call chain when the I/O comes in?
> I have seen a few thread based systems where the
> programmers become so frustrated with subtle timing issues etc, and they
> eventually overlay so many mutexes etc, that the implementation becomes
> single threaded in practice anyhow(!), and very inefficient.

While we simply do not see systems as complex as modern DBMSs
written in the SLC event-driven style.

> BTW, I am fairly new to python but I have seen that the python Twisted
> framework is a good example of the event/reactor design alternative to
> threads. See
> .

And consequently, to use Twisted you rewrite all your code as
those 'deferred' things.
No.2 | 4941 bytes
It seems that both ways are here to stay. If one were so much inferior
and problem-prone, we wouldn't be talking about it now; it would have been
forgotten on the same shelf with a stack of punch cards.
The rule of thumb is 'the right tool for the right job.'
The threading model is very useful for long CPU-bound processing, as it can
potentially take advantage of multiple CPUs/cores (alas not in Python
now because of the GIL). Events will not work as well here. But note,
if there is not much sharing of resources between threads, processes
could be used! It turns out that there are very few cases where threads
are simply indispensable.
The event model is usually well suited for I/O, or for cases where a large
number of shared resources would require lots of synchronization
if threads were used.
DBMSs are not a typical example of large systems, so saying 'see, DBMSs use
threads -- therefore threads are better' doesn't make a good argument.
DBMSs are highly optimized, and only a few of them actually manage to
successfully take advantage of multiple execution units. One could
as well cite a hundred other projects and say 'see, it uses an event
model -- therefore event models are better' and so on. Again, "right
tool for the right job". A good programmer should know both.
> And consequently, to use Twisted you rewrite all your code as
> those 'deferred' things.

Then, try re-writing Twisted using threads in the same number of lines
having the same or better performance. I bet you'll end up having a
whole bunch of 'locks', 'waits' and 'notify's instead of a bunch of
"those 'deferred' things." Debugging all those threads should be a
project in and of itself.
-Nick
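
For readers who have not written the 'locks', 'waits' and 'notify's mentioned above, here is a minimal sketch using the standard `threading.Condition`. It is only an illustration, not code from Twisted or from anyone in this thread; `BoundedBuffer` and `run_demo` are hypothetical names.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Fixed-capacity queue shared between threads."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()  # a lock plus wait/notify

    def put(self, item):
        with self._cond:  # lock
            while len(self._items) >= self._capacity:
                self._cond.wait()  # releases the lock while sleeping
            self._items.append(item)
            self._cond.notify_all()  # wake any waiting consumer

    def get(self):
        with self._cond:
            while not self._items:
                self._cond.wait()
            item = self._items.popleft()
            self._cond.notify_all()  # wake any waiting producer
            return item


def run_demo(n=100):
    """One producer and one consumer passing n integers through the buffer."""
    buf = BoundedBuffer(capacity=4)
    received = []

    def producer():
        for i in range(n):
            buf.put(i)

    def consumer():
        for _ in range(n):
            received.append(buf.get())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(run_demo() == list(range(100)))  # True
```

Even in this small case, every access to the shared deque has to go through the condition, and forgetting the `while` around `wait()` is the kind of subtle timing bug the thread is arguing about.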
bryanjugglercryptographer (AT) yahoo (DOT) com wrote:
[snip: full quote of post No.1 above]
No.3 | 232 bytes
[mark]
.
At my work, we started writing a web app using the twisted framework,
but it was somehow too twisted for the developers, so actually they
chose to do threading rather than using twisted's async methods.
No.4 | 546 bytes
Thu, 27 Jul 2006 20:53:54 -0700, Nick Vatamaniuc wrote:
> Debugging all those threads should be a project in and of itself.
Ahh, debugging - I forgot to bring that one up in my argument! Thanks
Nick ;)
Certainly I agree of course that there are many applications which suit
a threaded design. I just think there is a general over-emphasis on
using threads and see it applied very often where an event based
approach would be cleaner and more efficient. Thanks for your comments
Bryan and Nick, an interesting debate.
No.5 | 995 bytes
mark wrote:
> Wed, 26 Jul 2006 10:54:48 -0700, Carl J. Van Arsdall wrote:
>> Alright, based on a discussion on this mailing list, I've started to
>> wonder, why use threads vs processes.
> The debate should not be about "threads vs processes", it should be
> about "threads vs events".

Events serve a separate problem space.

Use event-driven state machine models for efficient multiplexing and
fast network I/O (e.g. writing an efficient static HTTP server).

Use multi-execution models for efficient multiprocessing. No matter
how scalable your event-driven app is, it's not going to take advantage
of multi-CPU systems, or modern multi-core processors.

Event-driven state machines can be harder to program and maintain than
multi-process solutions, but they are usually easier than
multi-threaded solutions.
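
As a rough illustration of the multiplexing case, the sketch below serves several connections from one thread using the standard `selectors` module. It is a toy under stated assumptions, not an HTTP server: `socket.socketpair()` stands in for real client connections, and `handle_readable` and `run_demo` are made-up names.

```python
import selectors
import socket


def handle_readable(conn, sel):
    """Called by the loop when `conn` has data waiting."""
    data = conn.recv(1024)
    if data:
        conn.sendall(data.upper())  # tiny reply, fits in the socket buffer
    else:  # peer closed the connection
        sel.unregister(conn)
        conn.close()


def run_demo(n_clients=3):
    sel = selectors.DefaultSelector()
    clients = []
    for _ in range(n_clients):
        client, server_side = socket.socketpair()
        server_side.setblocking(False)
        # Store the handler as the key's data so the loop can dispatch to it.
        sel.register(server_side, selectors.EVENT_READ, handle_readable)
        client.settimeout(2)
        clients.append(client)

    for i, client in enumerate(clients):
        client.sendall(f"hello {i}".encode())

    # The event loop: one thread waits on all sockets at once.
    handled = 0
    while handled < n_clients:
        events = sel.select(timeout=2)
        if not events:
            break  # nothing became ready; do not spin forever in a demo
        for key, _mask in events:
            key.data(key.fileobj, sel)
            handled += 1

    replies = [client.recv(1024).decode() for client in clients]

    for client in clients:
        client.close()
    for key in list(sel.get_map().values()):
        sel.unregister(key.fileobj)
        key.fileobj.close()
    sel.close()
    return replies


print(run_demo())  # ['HELLO 0', 'HELLO 1', 'HELLO 2']
```

All three connections are handled by a single thread, which is also why this design on its own cannot use a second CPU.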
If your problem is one where event-driven state machines are
a good solution, Python generators can be a _huge_ help.
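
A small sketch of why generators help, assuming a Python version with `generator.send()` (2.5 or later). The generator below is a state machine that splits incoming chunks into lines; its state lives in ordinary local variables instead of in a hand-written state object. `line_reader` is a hypothetical name and the chunks are made-up sample data.

```python
def line_reader(on_line):
    """Feed chunks in with send(); on_line is called for each complete line."""
    buffer = ""
    while True:
        chunk = yield  # suspend here until the event loop has more data
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            on_line(line)


lines = []
reader = line_reader(lines.append)
next(reader)  # prime the generator: run it up to the first yield

# In a real program these sends would come from the event loop's read handler.
for chunk in ["GET / HT", "TP/1.0\nHost: exa", "mple\n\n"]:
    reader.send(chunk)

print(lines)  # ['GET / HTTP/1.0', 'Host: example', '']
```

The generator picks up exactly where it left off on every `send()`, so the "store all the state and return to the dispatcher" bookkeeping is done by the language rather than by hand.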