pouët.net

Threads (CPU not BBS)!

category: code [glöplog]
What's the simplest way to achieve a good performance with threads?

  • Option 1: for (n_things) { CreateThread( DoThing(i) ) }
  • Option 2: for (n_processors) { CreateThread( DoThingThatDoesThings(i) ) }

Creating a whole bunch of threads, one for each thing you want done (particles, for instance), and expecting the OS to be awesome; or creating a thread that does 1/n_processors of the things you want done (half of the particles for each thread, one thread per processor).
Don't think of particles as real particles.
added on the 2010-03-10 02:23:58 by xernobyl
Or any other solution... OpenMP doesn't count.
added on the 2010-03-10 02:25:23 by xernobyl
There's an overhead associated with using threads: if you're using more threads in software than are available in hardware, the OS needs to swap them in and out of the CPU, which takes time (context switching). So as a rule of thumb, if you have 4 cores, split the work up into 4 threads, giving 1/4 of the job to each thread; this keeps the CPU busy doing the actual work rather than faffing around switching back and forth.

Generally, you'd want to create a few separate "worker" threads that you can offload tasks onto. These would be created at the start of your program, destroyed at the end, and reused for the different tasks encountered during execution - be it particles, physics or whatever. OpenMP is good for retrofitting old single-threaded code to gain some benefit on multicore chips, but if you're starting a program from scratch you may want to come up with some sort of task queue/manager (though having said this, I've never actually used OpenMP, only read about it).

But then it does depend entirely on what you're coding and how complicated you want to make your life. If all else fails, google.


added on the 2010-03-10 08:01:42 by xwize
In practice I've never used more than four threads. More than, say, three and it introduces more possibilities for race-condition bugs, which I never like. If you want to use lots of threads in your demo I'd perhaps suggest sticking to your main thread for drawing and then another for sound.
added on the 2010-03-10 08:58:32 by sigflup
Also, depending on your OS there will probably be a delay between threads. On BSD, for instance, pthreads are by default in user space, which means you don't see what one thread does to memory until a thread/kernel context switch, which is usually about 50µs. Syncing off multiple threads can be tricky - stick to syncing off of one.
added on the 2010-03-10 09:04:57 by sigflup
I'd say, the choice depends a lot on what you're doing.

I recently benchmarked a build of my site (which is a makefile/php setup that generates static pages I upload - silly, yes, but beside the point), and with my i7 with 4 cores and 8 hardware threads, the best results came from about 21 concurrent builds.

Up to about 6 builds at a time the speedup was linear, and slowed down drastically afterwards. Launching as many builds concurrently as possible was much slower than the 21 concurrent builds setup.

Results can be checked from my site (http://iki.fi/sol), scroll down a bit until you see a graph.
added on the 2010-03-10 09:53:47 by sol_hsa
Anyway, my point was: you won't know what the best result is without testing.
added on the 2010-03-10 09:56:31 by sol_hsa
if DoThing is a lengthy process, i'd say the first one. your machine is bound to do context switching anyway due to its multitasking environment :)
like others said:

- creating more threads than the number of CPUs you have on your machine will not make the process faster.

one exception: if the threads are not using the CPU at 100% but also do other stuff that makes them wait (like I/O, network, ...) then having more threads than the number of physical CPUs *may* have a benefit.

- having lots of threads (e.g. 1500 threads) consumes a lot of CPU just to manage them (context switching), and also a lot of memory (because each thread has its own stack). so not only will creating more threads than needed give no benefit, it will actually make things worse.

- creating/destroying threads is expensive (at least on win9x). if you need to distribute some jobs (received asynchronously) to threads, you'd better use a thread pool (google for it) instead of creating/destroying a specific thread each time you receive a job.
added on the 2010-03-10 17:00:18 by Tigrou
Are you thinking what I'm thinking? If it's thread-per-ray raytracing then you and I are very similar. I imagine that would crush your machine.
added on the 2010-03-10 19:25:26 by sigflup
http://msdn.microsoft.com/en-us/library/ms682661(VS.85).aspx should I bother learning the difference between threads and fibers? :D
added on the 2010-03-10 20:08:33 by xernobyl
yes you should -- they can be quite useful.
added on the 2010-03-10 20:10:14 by superplek
just use a task pool. it's not much code and very easy to use.

also, wtf:
Quote:
More than, say, three and it introduces more possibilities for race-condition bugs, which I never like.

now this is just BS. the likelihood of race conditions has absolutely nothing to do with the number of threads whatsoever, it only depends on how the threads communicate.

if all of your shared data is immutable (after initialization), there are no race conditions, no matter how many threads are running. if you have tons of shared data with modifications all over the place, two threads are enough to make your life hell.

the code doesn't matter. it's all about your data structures. well-designed parallel apps use a small number of well-defined communication channels between threads. badly designed parallel apps just don't work.

Quote:
On BSD, for instance, pthreads are by default in user space, which means you don't see what one thread does to memory until a thread/kernel context switch, which is usually about 50µs. Syncing off multiple threads can be tricky - stick to syncing off of one.

and more BS. how the OS chooses to implement threads doesn't make the slightest difference with regards to memory ordering. it all depends on the HW no matter what. and if you do any significant amount of synchronization between two threads, there's no point splitting the task into two threads in the first place - they'll just wait for each other to finish anyway.
added on the 2010-03-10 20:26:45 by ryg
what ryg said. immutable is a keyword for multithreading. try to initialize what you can up front before spawning into the wild ;)
there are for sure cases where more threads than cores make sense; one is a thread responsible e.g. for network communication. since this is a demo(coding) BBS in the first place (hm, all these random ... threads seem even more strange to me now *G*), just cut your tasks into pieces and think about executing them in parallel where possible. keep the communication channels between them as short as possible: don't acquire a critical section and then iterate a list of dozens of items and copy just a few. do the copy work first, and inside the section just hand over the resulting data.
added on the 2010-03-10 21:11:13 by Danzig
the whole reason i stopped reacting to threads like these is that even if i go in-depth, ryg will find a way to top it :)

*slaps ryg*
added on the 2010-03-10 21:46:49 by superplek
That's why I wait for him to post and say he is right ;) *slaps /self*
added on the 2010-03-10 22:22:47 by Danzig
plek, i have to express my sincere concern about that most reprehensible lazy attitude you seem to be having! tut-tut-tut...
added on the 2010-03-10 23:04:10 by ryg
And the winner is... the hypnoryg! *slap* *slap* °o° All glory to the hypnoryg!
added on the 2010-03-10 23:10:59 by xTr1m
Let n_things = n_processors ... problem solved.
in a demo, you will usually want results every frame, so you can have a hard sync point every frame. just make sure that during that frame, you never yield to the operating system. code your own critical section. i would yield to the OS once and only once per frame, to save power and be friendly to other processes. that will cost a millisecond or so, but it's worth it.

so if you have 4 hardware threads (cores), you should create

* one main thread,
* three worker threads to occupy the remaining cores
* one thread for sound and more for network or streaming if you need that.

the main thread should start the worker threads as soon as possible to work on a list of parallel jobs. then the main thread can do all the work that is not parallel or thread safe. after that, the main thread will join the worker threads to do the parallel jobs. when all parallel jobs are done you synchronize everything and start again.
added on the 2010-03-11 13:58:54 by chaos
plek: I, for one, enjoyed your in-depth responses in this thread. Carry on, good soldier!
added on the 2010-03-11 14:37:28 by sagacity
oh just stfu already sag :)
added on the 2010-03-11 16:57:39 by superplek
chaos: you mean the sound and networking on yet another separate thread, not the ones for the other cores?
added on the 2010-03-11 17:20:26 by xernobyl
Quote:
now this is just BS. the likelihood of race conditions has absolutely nothing to do with the number of threads whatsoever, it only depends on how the threads communicate.


The way I have been using threads, it has been a problem every now and then - maybe your code is bug-free every time, mate. With a single thread, encountering a race condition is way more unlikely, so it's worth a mention.

Quote:

and more BS. how the OS chooses to implement threads doesn't make the slightest difference with regards to memory ordering. it all depends on the HW no matter what. and if you do any significant amount of synchronization between two threads, there's no point splitting the task into two threads in the first place - they'll just wait for each other to finish anyway.


I'm not talking about memory ordering, mate. Just that on BSD, with programs linked against GNU pthreads, as I have actually observed, one thread isn't going to see what the other thread did to memory until the thread/kernel context switch, unless you have a mutex or a semaphore or some other synchronization means. If you link against rthreads you get different behavior - the software implementation does matter.
added on the 2010-03-11 18:09:30 by sigflup
xernobyl: sound is best kept in its own thread. sometimes your engine might stutter, like when switching from one scene to another. having music in another thread is the safest way to ensure that you never have dropouts.

sigflup: thread synchronization and race conditions are a serious problem. i try to avoid multithreading as much as possible because it is so simple to fuck up in a way you can't possibly debug.

the whole beauty of a task pool engine, or job scheduler, is that once you split things up into independent jobs, you don't have to think about synchronization anymore. all you need is a general-purpose, well-debugged engine, which will be around 500 lines of really tricky code.

The key insight is that multithreading is never something you do because it's good software engineering; you never want to abstract tasks away into threads because that's a good design pattern. it's the worst thing you can do if you want to write simple, robust and efficient code. you do it because you want to put those extra cores to work, a factor of 4 or 8 in performance, at the cost of simplicity, robustness, and efficiency.
added on the 2010-03-12 01:42:30 by chaos
