gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies. A new forum can be found here.

DS development > Parallel Processing, how do i do it?

#155081 - TheMagnitude - Thu Apr 24, 2008 8:12 pm

I have looked around the forums and searched for parallel processing, and I keep coming across phrases such as SQRT_BUSY and so on, but I don't fully understand them.

I am computing the Mandelbrot set and was wondering how I could use parallel processing to speed up my calculations?

#155082 - Lino - Thu Apr 24, 2008 8:13 pm

You can use the swp opcode for interlocked operations.

#155084 - silent_code - Thu Apr 24, 2008 8:20 pm

<jokingly>
wifi input data to a pc, let it process it, display the received data, repeat if necessary. ;^)
</jokingly>

if you're taking part in the contest: good luck!

#155094 - TheMagnitude - Thu Apr 24, 2008 11:42 pm

Lino wrote:
You can use the swp opcode for interlocked operations.


I searched for "swp" and "swp opcode" on Google, on gbadev.org, and in the devkitPro directory, and found nothing helpful.

How am I supposed to find out about these things? How do other people come to know them?

#155096 - Dwedit - Fri Apr 25, 2008 12:02 am

To do parallel processing:
* First you need to write code for both the ARM7 and the ARM9. If you are using an ARM9-only project, look at the ARM7+ARM9 combined example for how to set up a project that way.
* Then you probably need an IPC FIFO library so that the ARM9 and the ARM7 can send messages to each other. Doing things this way is nice because one processor can interrupt the other. Dekutree64 made one, and it's available in this thread.
* Remember that main memory (4MB) is shared between both processors, but the ARM9 also caches that memory. If you need to be sure that you are reading what is actually in memory and not just cached values, either flush or invalidate the cache, or use a different memory area. There are also mirrors of main memory which are not cached.
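
As a rough illustration of the message-passing idea above, here is a minimal host-side model in C of a small word queue. It is not Dekutree64's library and not the hardware registers: the 16-word depth, the `WordFifo` type, and the command/argument packing in `msg_pack` are all assumptions made up for this sketch. It is also single-threaded; a real two-CPU version would rely on the hardware FIFO or on proper synchronization.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_WORDS 16u  /* assumed queue depth for this model */

/* Hypothetical model of a one-way queue of 32-bit messages. */
typedef struct {
    uint32_t words[FIFO_WORDS];
    unsigned head;   /* index of the next word to read */
    unsigned count;  /* number of words currently queued */
} WordFifo;

/* Sender side: returns false if the queue is full. */
static bool fifo_send(WordFifo *f, uint32_t word) {
    if (f->count == FIFO_WORDS) return false;
    f->words[(f->head + f->count) % FIFO_WORDS] = word;
    f->count++;
    return true;
}

/* Receiver side: returns false if there is nothing to read. */
static bool fifo_recv(WordFifo *f, uint32_t *out) {
    if (f->count == 0) return false;
    *out = f->words[f->head];
    f->head = (f->head + 1) % FIFO_WORDS;
    f->count--;
    return true;
}

/* Hypothetical message layout: 8-bit command, 24-bit argument. */
static uint32_t msg_pack(uint8_t cmd, uint32_t arg) {
    return ((uint32_t)cmd << 24) | (arg & 0x00FFFFFFu);
}
static uint8_t msg_cmd(uint32_t m) { return (uint8_t)(m >> 24); }
static uint32_t msg_arg(uint32_t m) { return m & 0x00FFFFFFu; }
```

In this model one side would call `fifo_send(&f, msg_pack(cmd, arg))` and the other would poll `fifo_recv` (or be woken by an interrupt on real hardware) and dispatch on `msg_cmd`.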
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

#155119 - Lino - Fri Apr 25, 2008 4:53 am

You can search for lock xchg, which is very similar, I hope :D I'm not sure if it works well with 2 CPUs.

In the ARM PDF there is a simple example with semaphores where it is used.
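
For readers unfamiliar with the idea, the swap-based lock can be sketched roughly as follows. This is not the example from the ARM document; it uses C11 `atomic_exchange` as a portable stand-in for what swp does (write a value to memory and get the old value back in one indivisible step). The `SwapLock` type and function names are hypothetical, and whether this behaves correctly across both DS CPUs is not established in this thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 means free, 1 means taken. */
typedef struct {
    atomic_int held;
} SwapLock;

/* Try to take the lock. Atomically stores 1 and looks at the value
 * that was there before; only the caller that saw 0 got the lock. */
static bool swaplock_try_acquire(SwapLock *l) {
    return atomic_exchange(&l->held, 1) == 0;
}

/* Release the lock by storing 0 again. */
static void swaplock_release(SwapLock *l) {
    atomic_store(&l->held, 0);
}
```

A caller that fails `swaplock_try_acquire` would typically retry in a loop or go and do other work first.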

OMG 60000 processes.


Last edited by Lino on Fri Apr 25, 2008 7:33 am; edited 3 times in total

#155123 - nanou - Fri Apr 25, 2008 6:50 am

What do you hope to accomplish with parallel processing?

I'm currently working on a project that has a highly parallelized representation. There are over 6000 individual "processes", which I time share. They're all handled serially: each one pauses after a small segment of work so the usual nonsense can be done, and then picks up where it left off. If I'd tried to implement a proper parallel environment for it all to work in, the overhead would absolutely kill me.
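
One way such time sharing could look, as a minimal sketch in C: each task keeps its own resume point and does a bounded amount of work per call. This is not nanou's actual code; the `Task` layout, the `budget` parameter, and the dummy workload (summing integers) are assumptions chosen only to show the pattern.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resumable task: remembers where it stopped. */
typedef struct {
    int next_item;    /* first item not yet processed */
    int total_items;  /* amount of work this task has in total */
    long result;      /* dummy output: sum of processed item indices */
} Task;

/* Do at most `budget` items of work, then return to the caller.
 * Returns true once the task has finished everything. */
static bool task_step(Task *t, int budget) {
    while (budget-- > 0 && t->next_item < t->total_items) {
        t->result += t->next_item;
        t->next_item++;
    }
    return t->next_item >= t->total_items;
}

/* One round-robin pass over all tasks.
 * Returns how many tasks still have work left. */
static size_t run_pass(Task *tasks, size_t n, int budget) {
    size_t pending = 0;
    for (size_t i = 0; i < n; i++) {
        if (!task_step(&tasks[i], budget)) pending++;
    }
    return pending;
}
```

The main loop would call `run_pass` once per frame (or between other chores) until it returns 0, which avoids any context-switching machinery.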
_________________
- nanou

#155125 - Dwedit - Fri Apr 25, 2008 7:20 am

Read the first post: the task to parallelize is Mandelbrot set generation. Using the ARM7 in addition to the ARM9 gives you an extra processor clocked at about half the speed of the main processor (roughly 33 MHz versus 67 MHz).

Maybe you could take a look at Fractint's source code for some ideas on how to optimize the code.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

#155127 - nanou - Fri Apr 25, 2008 8:38 am

Dwedit wrote:
Read the first post: the task to parallelize is Mandelbrot set generation.


The question actually has little or nothing to do with the application itself.

--

Anyway, I guess it's pointless to bring up the subtler points here. You might find that, unless you think it through very carefully, you waste more time with a generalized two-processor method than if you just stick with one processor.

If you really want and need to use the ARM7 for additional processing, don't ask "how do I do parallel processing?"; ask "how much work can I offload onto the ARM7, and how do I do that without prohibitive overhead?" Just rephrasing the question like that should give you an avenue to your answer. But if your existing design isn't optimal, you'll save more time by fixing that first.
_________________
- nanou

#155139 - silent_code - Fri Apr 25, 2008 12:59 pm

you could use (if applicable - i haven't "done" mb, yet) a tile based renderer.
each processor then takes an unprocessed tile (or cell) and computes it. when it's done, it checks if there are others available and processes the next. you'd need interprocessor communication, but i wouldn't use multiprocessing (or multithreading) - it's (as TheMagnitude correctly asked for) parallel processing.
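
The "take the next unprocessed tile" step described above could be sketched like this in C. It is only a host-side illustration: the 8x6 grid (32x32-pixel tiles on a 256x192 screen), the `TileQueue` type, and the use of C11 `atomic_fetch_add` as the shared counter are assumptions, and on the DS the counter would have to live in memory both CPUs see consistently.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TILES_X 8                       /* assumed: 256 / 32 */
#define TILES_Y 6                       /* assumed: 192 / 32 */
#define TILE_COUNT (TILES_X * TILES_Y)

/* Hypothetical shared work queue: just the index of the next free tile. */
typedef struct {
    atomic_int next_tile;
} TileQueue;

/* Claim the next unprocessed tile. Each index is handed out exactly
 * once, so two workers never compute the same tile.
 * Returns false when no tiles are left. */
static bool claim_tile(TileQueue *q, int *tx, int *ty) {
    int index = atomic_fetch_add(&q->next_tile, 1);
    if (index >= TILE_COUNT) return false;
    *tx = index % TILES_X;
    *ty = index / TILES_X;
    return true;
}
```

Each processor would loop on `claim_tile`, render the tile it was given, and stop when the function returns false; a faster processor simply ends up claiming more tiles.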