gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

DS development > FIFO troubles of some sort

#106751 - HyperHacker - Mon Oct 23, 2006 5:15 am

I've been trying to use the hardware FIFO, but I ran into an odd problem. I can send however much I want to ARM7, but only the first thing sent to ARM9 is ever received.

Relevant code:
ARM7:
Code:
//Graphics Test ARM7 Code
#include "main.h"

/*
Entry Point
CPU: ARM7
Inputs:
   -argc: Number of arguments
   -argv: Pointer to arguments
Returns: Program return code
*/
int main(int argc, char** argv)
{
   //Blank out IPC
   memzero((u8*)IPC,sizeof(IPC));


   //Init interrupts
   REG_IME = 0; //Disable interrupts while changing them
   IRQ_HANDLER = Interrupt; //Set handler callback
   REG_IE = IRQ_VBLANK | IRQ_TIMER3 | IRQ_IPC_SYNC | IRQ_FIFO_NOT_EMPTY;
   REG_IF = ~0;
   DISP_SR = DISP_VBLANK_IRQ;
   REG_IME = 1; //Enable interrupts


   //Init IPC
   REG_IPC_SYNC = IPC_SYNC_IRQ_ENABLE; //Enable IRQs from ARM9
   REG_IPC_FIFO_CR = IPC_FIFO_SEND_IRQ | IPC_FIFO_RECV_IRQ | IPC_FIFO_ENABLE; //Enable FIFO and interrupt on send FIFO empty/receive FIFO not empty
   REG_IPC_FIFO_TX = 420;

   //Init reload ability
   //LOADNDS->PATH = 0;


   //Init timers
   TIMER3_DATA = 65535 - 34318; //should be one interrupt every ~1ms
   TIMER3_CR = TIMER_ENABLE | TIMER_IRQ_REQ;

   swiWaitForVBlank();
   swiWaitForVBlank();
   swiWaitForVBlank();
   REG_IPC_FIFO_TX = 1234;

   while(true)
   {
      IPC->keys = REG_KEYXY;
      //if((~REG_KEYXY) & 1) REG_IPC_FIFO_TX = 69;
      swiWaitForVBlank();
   }
   return 0;
}


/*
Interrupt handler
CPU: ARM7
*/
void Interrupt()
{
   if(REG_IF & IRQ_VBLANK)
   {
      //if (LOADNDS->PATH != 0) LOADNDS->ARM7FUNC(LOADNDS->PATH); //Reload when signalled

      VBLANK_INTR_WAIT_FLAGS |= IRQ_VBLANK; //Signal that vblank interrupt has been processed
      REG_IF |= IRQ_VBLANK;
   }
   else if(REG_IF & IRQ_TIMER3)
   {
      IPC->tickcount++;
      VBLANK_INTR_WAIT_FLAGS |= IRQ_TIMER3;
      REG_IF |= IRQ_TIMER3;
   }
   else if(REG_IF & IRQ_FIFO_NOT_EMPTY)
   {
      IPC->temperature = REG_IPC_FIFO_RX; //testing
      IPC->a7_fifo_count++;
      VBLANK_INTR_WAIT_FLAGS |= IRQ_FIFO_NOT_EMPTY;
      REG_IF |= IRQ_FIFO_NOT_EMPTY;
   }
   else
      REG_IF = REG_IF; //Trigger a write
}


ARM9:
Code:
//Graphics Test ARM9 Code
#include "main.h"

/*
Entry Point
CPU: ARM9
Inputs:
   -argc: Number of arguments
   -argv: Pointer to arguments
Returns: Program return code
*/
int main(int argc, char** argv)
{
   powerON(POWER_ALL_2D); //Turn stuff on (required for some flash cards)

   //Init video
   videoSetMode(MODE_5_2D | DISPLAY_BG3_ACTIVE);
   videoSetModeSub(MODE_5_2D | DISPLAY_BG3_ACTIVE);
   vramSetMainBanks(VRAM_A_MAIN_BG_0x6000000, VRAM_B_MAIN_BG_0x6000000, VRAM_C_SUB_BG_0x6200000, VRAM_D_LCD);

   BG3_CR = BG_BMP16_256x256;
   BG3_XDX = 1 << 8;
   BG3_XDY = 0;
   BG3_YDX = 0;
   BG3_YDY = 1 << 8;
   BG3_CX = 0;
   BG3_CY = 0;

   SUB_BG3_CR = BG_BMP16_256x256;
   SUB_BG3_XDX = 1 << 8;
   SUB_BG3_XDY = 0;
   SUB_BG3_YDX = 0;
   SUB_BG3_YDY = 1 << 8;
   SUB_BG3_CX = 0;
   SUB_BG3_CY = 0;

   //Init interrupts
   REG_IME = 0; //Disable interrupts while changing them
   IRQ_HANDLER = Interrupt; //Set handler callback
   REG_IE = IRQ_VBLANK | IRQ_HBLANK | IRQ_IPC_SYNC | IRQ_FIFO_NOT_EMPTY;
   REG_IF = ~0;
   DISP_SR = DISP_VBLANK_IRQ | DISP_HBLANK_IRQ;
   REG_IME = 1; //Enable interrupts


   //Init IPC
   REG_IPC_SYNC = IPC_SYNC_IRQ_ENABLE; //Enable IRQs from ARM7
   REG_IPC_FIFO_CR = IPC_FIFO_SEND_IRQ | IPC_FIFO_RECV_IRQ | IPC_FIFO_ENABLE; //Enable FIFO and interrupt on send FIFO empty/receive FIFO not empty
   REG_IPC_FIFO_TX = 1337;


   //Let things get set up
   swiWaitForVBlank();
   swiWaitForVBlank();
   swiWaitForVBlank();
   sysSetBusOwners(true, true, true); //All to ARM9
   REG_IPC_FIFO_TX = 5555;


   MainScreenBuf = CreateGraphicBuffer(SCREEN_WIDTH, SCREEN_HEIGHT);
   SubScreenBuf = CreateGraphicBuffer(SCREEN_WIDTH, SCREEN_HEIGHT);
   ClearGraphicBuffer(MainScreenBuf, RGB15(0, 0, 15) | 0x8000);
   ClearGraphicBuffer(SubScreenBuf, RGB15(0, 0, 0) | 0x8000);

   //FillRect(MainScreenBuf, 5, 100, 25, 25, RGB15(31, 31, 0));

   while(true)
   {
      swiWaitForVBlank();
      FillRect(MainScreenBuf, 5, 5, 100, 30, RGB15(0, 0, 15));
      Print(MainScreenBuf, 0, 5, 5, "Msec=%d\nA7 FIFO=%d, %d\nA9 FIFO=%d, %d\nKey=%08X", IPC->tickcount, IPC->a7_fifo_count, IPC->temperature, IPC->a9_fifo_count, IPC->buttons_held, IPC->keys);
   }
   return 0;
}


/*
Interrupt handler
CPU: ARM9
*/
void Interrupt()
{
   /*if(REG_IF & IRQ_HBLANK) //HBlank interrupt
   {
      VBLANK_INTR_WAIT_FLAGS |= IRQ_HBLANK;
      REG_IF |= IRQ_HBLANK;
   }
   else */if(REG_IF & IRQ_VBLANK) //VBlank interrupt
   {
      FastCopy32(MainScreenBuf->Pixels, BG_GFX, (SCREEN_WIDTH * SCREEN_HEIGHT) * 2);
      FastCopy32(SubScreenBuf->Pixels, BG_GFX_SUB, (SCREEN_WIDTH * SCREEN_HEIGHT) * 2);
      //dmaCopy(MainScreenBuf->Pixels, BG_GFX, (SCREEN_WIDTH * SCREEN_HEIGHT) * 2);
      VBLANK_INTR_WAIT_FLAGS |= IRQ_VBLANK; //Signal that vblank interrupt has been processed
      REG_IF |= IRQ_VBLANK; //Signal that vblank interrupt processing is done. We need to trigger a write even though this shouldn't change the value.
   }
   else if(REG_IF & IRQ_FIFO_NOT_EMPTY)
   {
      IPC->buttons_held = REG_IPC_FIFO_RX; //testing
      IPC->a9_fifo_count++;

      VBLANK_INTR_WAIT_FLAGS |= IRQ_FIFO_NOT_EMPTY;
      REG_IF |= IRQ_FIFO_NOT_EMPTY;
   }
   else
      REG_IF = REG_IF; //Trigger a write
}

_________________
I'm a PSP hacker now, but I still <3 DS.

#106788 - masscat - Mon Oct 23, 2006 1:13 pm

As discussed elsewhere, you only get a FIFO-not-empty interrupt on the transition from empty to not empty, not for every write to the FIFO. So in the interrupt handler you should drain the FIFO, i.e. keep reading and processing entries until it is empty. That might be your problem.
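A host-side sketch of that advice in plain C. The struct below is a stand-in for the receive side of the IPC FIFO (16 words deep, as on the DS) and is not a libnds API; on hardware the empty test would be `REG_IPC_FIFO_CR & IPC_FIFO_RECV_EMPTY` and each pop a read of `REG_IPC_FIFO_RX`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-in for the receive half of the IPC FIFO (16 x 32-bit).
   Not a libnds API: on hardware, fifo_empty() corresponds to
   (REG_IPC_FIFO_CR & IPC_FIFO_RECV_EMPTY) and fifo_pop() to a read of
   REG_IPC_FIFO_RX. */
#define FIFO_DEPTH 16

typedef struct {
    uint32_t slot[FIFO_DEPTH];
    int head;   /* index of the oldest word */
    int count;  /* number of queued words */
} SimFifo;

static bool fifo_push(SimFifo *f, uint32_t v) {
    if (f->count == FIFO_DEPTH)
        return false;                 /* full: real hardware flags an error */
    f->slot[(f->head + f->count) % FIFO_DEPTH] = v;
    f->count++;
    return true;
}

static bool fifo_empty(const SimFifo *f) { return f->count == 0; }

static uint32_t fifo_pop(SimFifo *f) {
    uint32_t v = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return v;
}

/* Body of a FIFO-not-empty handler: the IRQ only marks the empty -> not
   empty transition, so one IRQ may stand for several words. Keep reading
   until the FIFO reports empty (or the caller's buffer is full). */
static int drain_fifo(SimFifo *f, uint32_t *out, int max) {
    int n = 0;
    while (!fifo_empty(f) && n < max)
        out[n++] = fifo_pop(f);
    return n;
}
```

If the handler read only one word per IRQ, the second and third words in a burst like 1337, 5555, 42 would sit in the FIFO with no further interrupt to announce them.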

#106853 - HyperHacker - Tue Oct 24, 2006 5:10 am

Alright, I changed it to loop and keep reading REG_IPC_FIFO_RX while(!(REG_IPC_FIFO_CR & IPC_FIFO_RECV_EMPTY)), but it didn't help.

[edit] It does work now that I put that loop in the general interrupt handler rather than the IRQ_FIFO_NOT_EMPTY handler, so it gets run during VBlank too.
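One plausible reason moving the loop helped (an inference, not something confirmed in this thread): on the DS, IF bits are acknowledged by writing 1 to them, so `REG_IF |= IRQ_VBLANK` reads every pending bit and writes them all back, acknowledging the FIFO interrupt along with VBlank. With the `else if` chain in the original handlers, a FIFO IRQ that is pending at the same time as VBlank never reaches its own branch. The model below is host-side C; the bit positions are my assumption of the libnds values (bit 0 for VBlank, bit 18 for receive-FIFO-not-empty).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match libnds' IRQ_VBLANK and IRQ_FIFO_NOT_EMPTY. */
#define IRQ_VBLANK         (1u << 0)
#define IRQ_FIFO_NOT_EMPTY (1u << 18)

/* Model of REG_IF: a read returns the pending bits; writing a 1 to a bit
   acknowledges (clears) it, writing a 0 leaves it untouched. */
static uint32_t if_pending;

static uint32_t if_read(void)     { return if_pending; }
static void if_write(uint32_t v)  { if_pending &= ~v; }

/* What "REG_IF |= bit" does: read-modify-write, so every pending bit is
   written back as 1 and therefore acknowledged. */
static void ack_with_or(uint32_t bit) { if_write(if_read() | bit); }

/* What "REG_IF = bit" does: acknowledges only the named bit. */
static void ack_exact(uint32_t bit)   { if_write(bit); }
```

With both interrupts pending, `ack_with_or(IRQ_VBLANK)` leaves nothing pending, while `ack_exact(IRQ_VBLANK)` leaves the FIFO bit set so the handler is entered again for it.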

#106859 - DekuTree64 - Tue Oct 24, 2006 6:54 am

Yeah, just read repeatedly until it's empty. There's a bit in the FIFO control register (IPC_FIFO_RECV_EMPTY) that gets set when it is, so you can check that to see if you should read more.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

#106860 - HyperHacker - Tue Oct 24, 2006 6:56 am

Makes sense. And if I want to send say 16 bytes, I just write to REG_IPC_FIFO_TX as if it were an array and read it back until empty on the other CPU?
Also since FIFO is a mere 64 bytes, is there a way to ensure a global variable or malloc()'d array is placed in shared RAM so that I can just pass pointers instead of trying to stream data?

#106861 - DekuTree64 - Tue Oct 24, 2006 7:05 am

Not quite like an array, since you write to the same address every time, but yeah, that is about it.
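A minimal sketch of sending 16 bytes that way, assuming both CPUs agree to pack bytes little-endian inside each word (the helper names are hypothetical): the 16 bytes become four 32-bit FIFO entries, and every entry is written to the same register address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FIFO entries are 32 bits, so 16 bytes travel as 4 words.
   Bytes are packed explicitly little-endian so both sides agree. */
static void pack_words(const uint8_t *bytes, size_t nwords, uint32_t *words) {
    for (size_t i = 0; i < nwords; i++)
        words[i] = (uint32_t)bytes[4 * i]
                 | (uint32_t)bytes[4 * i + 1] << 8
                 | (uint32_t)bytes[4 * i + 2] << 16
                 | (uint32_t)bytes[4 * i + 3] << 24;
}

static void unpack_words(const uint32_t *words, size_t nwords, uint8_t *bytes) {
    for (size_t i = 0; i < nwords; i++)
        for (int b = 0; b < 4; b++)
            bytes[4 * i + b] = (uint8_t)(words[i] >> (8 * b));
}

/* Every word goes to the same address; the hardware queues them.
   On the DS, tx would be &REG_IPC_FIFO_TX. */
static void send_words(volatile uint32_t *tx, const uint32_t *words, size_t nwords) {
    for (size_t i = 0; i < nwords; i++)
        *tx = words[i];   /* not tx[i]: the address never changes */
}
```

The receiver does the mirror image: drain four words from `REG_IPC_FIFO_RX` and call `unpack_words` on them.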

Malloc will always go in main RAM. Not sure about global variables, but they probably do too. Careful of cache problems when using shared memory though. You can either flush the cache after changing things, or work with the memory as uncached in the first place (add 0x400000 to the normal main RAM address to get to the uncached mirror).
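The uncached mirror is plain address arithmetic. This sketch uses a hypothetical helper name and the main RAM ranges quoted later in this thread (0x02000000 to 0x023FFFFF cached, 0x02400000 to 0x027FFFFF uncached), and only remaps addresses that fall in the cached window:

```c
#include <assert.h>
#include <stdint.h>

/* ARM9 main RAM through the cache, and the offset of its uncached mirror. */
#define MAIN_RAM_BASE   0x02000000u
#define MAIN_RAM_SIZE   0x00400000u   /* 4 MB */
#define UNCACHED_OFFSET 0x00400000u

/* Hypothetical helper: map a cached main RAM address to its uncached
   mirror; anything else (already uncached, ARM7 RAM, I/O) is returned as is. */
static uint32_t to_uncached(uint32_t addr) {
    if (addr >= MAIN_RAM_BASE && addr < MAIN_RAM_BASE + MAIN_RAM_SIZE)
        return addr + UNCACHED_OFFSET;
    return addr;
}
```

On the ARM9 you would apply it as `(volatile MyShared *)to_uncached((uint32_t)&shared)`, where `MyShared` and `shared` stand for your own type and variable; the cast through `uint32_t` is fine there because pointers are 32 bits, but not on a 64-bit host.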

#106863 - HyperHacker - Tue Oct 24, 2006 7:08 am

Alright, that's about what I thought. Cache has given me some trouble before. You have to use uncached memory both for reading and writing, correct?

#106878 - Lick - Tue Oct 24, 2006 11:10 am

You should read my entry on large transfers with the FIFO.

What you should do is:
Code:
// Before sending
while(!(REG_IPC_FIFO_CR & IPC_FIFO_SEND_EMPTY));

That waits until the other side has read everything in the send queue. If you keep writing while all 16 slots are full, you get a send error. After waiting, you can send 16 more words without waiting again. You only have to wait because you're sending fast enough to fill all 16 queue slots.
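A host-side sketch of that batching pattern. Everything here is simulated: `SimTx` models the 16-slot send queue and its error flag, and `other_side_drain` stands in for the other CPU reading the queue, which on hardware is what the `IPC_FIFO_SEND_EMPTY` spin loop waits for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 16

typedef struct {
    uint32_t slot[FIFO_DEPTH];
    size_t count;   /* words currently queued */
    bool error;     /* models the FIFO error flag */
} SimTx;

/* Writing to a full send FIFO is an error; model that. */
static void tx_write(SimTx *t, uint32_t v) {
    if (t->count == FIFO_DEPTH) { t->error = true; return; }
    t->slot[t->count++] = v;
}

/* Stand-in for the other CPU emptying the queue into its own buffer. */
static void other_side_drain(SimTx *t, uint32_t *dst, size_t *received) {
    for (size_t i = 0; i < t->count; i++)
        dst[(*received)++] = t->slot[i];
    t->count = 0;
}

/* Send n words, never more than 16 between waits. On hardware the wait is
   while(!(REG_IPC_FIFO_CR & IPC_FIFO_SEND_EMPTY)); */
static void send_bulk(SimTx *t, const uint32_t *src, size_t n,
                      uint32_t *dst, size_t *received) {
    for (size_t i = 0; i < n; i++) {
        if (i % FIFO_DEPTH == 0)
            other_side_drain(t, dst, received);   /* "wait for send empty" */
        tx_write(t, src[i]);
    }
    other_side_drain(t, dst, received);
}
```

Writing 17 words with no wait at all leaves 16 queued and sets the error flag, which is the send error described above.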
_________________
http://licklick.wordpress.com

#106907 - HyperHacker - Tue Oct 24, 2006 5:49 pm

Yeah, I could do that, but it's not really what I'd like to do. It has a few drawbacks:
-Unnecessary use of CPU power copying data in and out of FIFO instead of just reading it out of memory.
-Since I plan to use FIFO as the primary method of communicating between CPUs (ARM9 requesting a change in power status, ARM7 notifying that a sound has finished, etc), I would have to either set up some silly system in which each word is one byte indicating whether it's data or another message and 3 bytes of data, or just be unable to send any messages during the transfer. Again, this seems like an unnecessary communications block when I can just pass a pointer and be done with it.

I can see some uses for transferring data through FIFO, but for the most part I think just passing a pointer would do the job. Even if the data isn't going to be kept in memory much longer, at least this way one CPU can just copy it to some other place without the overhead of the other copying it into FIFO first.
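The "one byte of tag, three bytes of data" scheme described above takes only a few lines. In this sketch the tag values and helper names are made up. The pointer helpers rely on every main RAM address (0x02000000 to 0x027FFFFF, cached or uncached) being less than 0x800000 bytes past 0x02000000, so its offset fits in the 24-bit payload next to a tag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical message tags; none of these come from libnds. */
enum { MSG_DATA = 0x01, MSG_POWER = 0x02, MSG_SOUND_DONE = 0x03, MSG_PTR = 0x04 };

#define MAIN_RAM_BASE 0x02000000u

/* Top 8 bits: tag. Low 24 bits: payload. */
static uint32_t msg_encode(uint8_t tag, uint32_t payload) {
    return ((uint32_t)tag << 24) | (payload & 0x00FFFFFFu);
}
static uint8_t  msg_tag(uint32_t word)     { return (uint8_t)(word >> 24); }
static uint32_t msg_payload(uint32_t word) { return word & 0x00FFFFFFu; }

/* Pass a main RAM pointer as an offset from the start of main RAM. */
static uint32_t msg_encode_ptr(uint32_t addr) {
    return msg_encode(MSG_PTR, addr - MAIN_RAM_BASE);   /* offset < 0x800000 */
}
static uint32_t msg_decode_ptr(uint32_t word) {
    return MAIN_RAM_BASE + msg_payload(word);
}
```

This lets short control messages and pointers to larger shared buffers share the same FIFO without a separate transfer mode.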

#106916 - Lick - Tue Oct 24, 2006 7:25 pm

If you check out the LoveLite source, I included ndsx_firmware.h, which reads and (can) write to the firmware. It also contains an example of how to use the FIFO for messaging AND transferring. Simply disable the IRQs when doing a mass transfer and manually pump the bytes/halfwords/words. Check it out.

#106922 - HyperHacker - Tue Oct 24, 2006 8:27 pm

Right, but I don't want to use it for transfers. I just want to use it to pass pointers to the data itself.

#106924 - Lick - Tue Oct 24, 2006 8:30 pm

Oh, like that. Yeah, that's possible too. =D

#106973 - masscat - Wed Oct 25, 2006 11:06 am

On the ARM9, using the default linking, both your global data and malloc'd memory will be in main memory (0x02000000 to 0x023FFFFF cached or 0x02400000 to 0x027FFFFF non-cached) and therefore accessible by the ARM7. You can force data/instructions to be placed into the DTCM/ITCM, which would not be accessible by the ARM7.
On the ARM7, using the default linking, both your global data and malloc'd memory will be in the ARM7 private RAM (0x03800000 to 0x0380FFFF). This is not accessible by the ARM9.

If you do not want to worry with the ARM9 cache, use the uncached main memory mirror for both reading and writing the shared data.
Depending on what you are doing with the data, you may get much better performance using cached memory and flushing/invalidating the cache. To do this easily make all your shared data structures align with 32byte boundaries (32bytes is the size of the cache line). Before sending a structure address from the ARM9 to the ARM7, you need to ensure that any data in the ARM9 cache are written back to memory by calling DC_FlushRange( addr_of_struct, sizeof( struct)). When you receive a structure address from the ARM7, you must invalidate the corresponding cache entries so that the new data are read from memory and not the cache. This is done with DC_InvalidateRange( addr_of_struct, sizeof( struct)).

EDIT: you can get gcc to do the alignment for you using the aligned attribute.
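A minimal sketch of using the aligned attribute for a shared structure (the type and field names are hypothetical). The compile-time checks confirm the structure both starts on a 32-byte cache line and fills whole lines, so flushing or invalidating it cannot touch a neighbouring object.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 32

/* 28 bytes of fields; aligned(32) also pads sizeof up to 32. */
typedef struct __attribute__((aligned(CACHE_LINE))) {
    uint32_t command;
    uint32_t length;
    uint8_t  data[20];
} SharedMsg;

_Static_assert(sizeof(SharedMsg) % CACHE_LINE == 0, "must fill whole cache lines");
_Static_assert(alignof(SharedMsg) == CACHE_LINE, "must start on a cache line");

/* On the ARM9: fill shared_msg, call DC_FlushRange(&shared_msg, sizeof shared_msg),
   then send its address. When the ARM7 hands it back, call
   DC_InvalidateRange(&shared_msg, sizeof shared_msg) before reading. */
SharedMsg shared_msg;
```

The attribute covers static and stack objects of this type; it does not make `malloc()` return 32-byte-aligned memory, so a heap buffer would need aligning separately.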


Last edited by masscat on Wed Oct 25, 2006 1:53 pm; edited 1 time in total

#106977 - Lick - Wed Oct 25, 2006 12:19 pm

Great explanation. Thanks for sharing your awesome knowledge!

#107139 - wintermute - Fri Oct 27, 2006 1:05 am

HyperHacker wrote:
Yeah, I could do that, but it's not really what I'd like to do. It has a few drawbacks:
-Unnecessary use of CPU power copying data in and out of FIFO instead of just reading it out of memory.


Actually, depending on what you're doing with the data, transferring it through the FIFO can be much faster than using shared memory.

Quote:

-Since I plan to use FIFO as the primary method of communicating between CPUs (ARM9 requesting a change in power status, ARM7 notifying that a sound has finished, etc), I would have to either set up some silly system in which each word is one byte indicating whether it's data or another message and 3 bytes of data, or just be unable to send any messages during the transfer. Again, this seems like an unnecessary communications block when I can just pass a pointer and be done with it.


The gain in speed may be worth the added complexity.

Bear in mind that the ARM7 can only play audio data from the main 4meg - at least my experiments seemed to indicate this was the case. I didn't check the switchable RAM but I'm considering moving the default ARM7 code into this area to give a little more space.
_________________
devkitPro - professional toolchains at amateur prices
devkitPro IRC support
Personal Blog

#107140 - wintermute - Fri Oct 27, 2006 1:10 am

masscat wrote:

To do this easily make all your shared data structures align with 32byte boundaries (32bytes is the size of the cache line).


Data alignment isn't necessary; the flush and invalidate functions align the addresses they use. With unaligned data you may flush or invalidate some data that doesn't need it, but that's going to be a maximum of an extra 63 bytes and not likely to cause major issues with anything.

#107174 - masscat - Fri Oct 27, 2006 11:24 am

wintermute wrote:
masscat wrote:

To do this easily make all your shared data structures align with 32byte boundaries (32bytes is the size of the cache line).


Data alignment isn't necessary; the flush and invalidate functions align the addresses they use. With unaligned data you may flush or invalidate some data that doesn't need it, but that's going to be a maximum of an extra 63 bytes and not likely to cause major issues with anything.

The structure must be aligned and be the sole occupant of the cache lines it covers.
For example, suppose you have two structures (A and B) of 16 bytes each, next to each other in memory and covered by a single cache line.
ARM9 is updating A ready to send to ARM7 and ARM7 is updating B ready to send to ARM9.
Case 1:
ARM7 completes and sends the address of B to ARM9. ARM9 invalidates (no write back) the cache line covering B and all its writes to A are lost.
Case 2:
ARM9 completes updating A and calls DC_FlushRange writing back the entire cache line killing any changes that ARM7 has made to B.

The memory covered by structure A could be any data used by the ARM9, which may get written back to memory at any time (through normal cache usage). Situations like this lead to code that runs happily most of the time but breaks at random, and bugs like these can be very hard to find.
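The hazard above can be checked with plain address arithmetic. This sketch (hypothetical helper names, 32-byte lines as stated earlier in the thread) rounds each object's range out to whole cache lines, the way the flush and invalidate functions are described as doing, and reports whether two objects end up sharing a line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 32u

/* Start of the first line, and end of the last line, that an operation on
   [addr, addr + size) touches once rounded out to whole cache lines. */
static uint32_t line_start(uint32_t addr) {
    return addr & ~(CACHE_LINE - 1);
}
static uint32_t line_end(uint32_t addr, uint32_t size) {
    return (addr + size + CACHE_LINE - 1) & ~(CACHE_LINE - 1);
}

/* True if the two objects touch a common cache line, i.e. flushing A can
   overwrite B in memory, or invalidating B can discard pending writes to A. */
static bool share_cache_line(uint32_t a, uint32_t a_size,
                             uint32_t b, uint32_t b_size) {
    return line_start(a) < line_end(b, b_size) &&
           line_start(b) < line_end(a, a_size);
}
```

Two adjacent 16-byte structures at 0x02000000 and 0x02000010 share a line; two 32-byte, 32-byte-aligned structures at 0x02000000 and 0x02000020 do not.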

#107216 - wintermute - Fri Oct 27, 2006 7:05 pm

Very good point, well presented and I should know better than to post about cache problems straight off the top of my head.

#107269 - gladius - Sat Oct 28, 2006 3:45 am

wintermute wrote:
Bear in mind that the ARM7 can only play audio data from the main 4meg - at least my experiments seemed to indicate this was the case. I didn't check the switchable RAM but I'm considering moving the default ARM7 code into this area to give a little more space.
It looks like the arm7 can play audio data from anywhere the arm7 can access. I have had it play from the main 4meg, shared iwram, the local 64k, and VRAM as well (which is what I currently use).

#107271 - wintermute - Sat Oct 28, 2006 3:55 am

really?

Would you mind sharing some source?

I was attempting to port an OPL emulator for dsdoom and was greeted with silence when I put the buffer in arm7 iwram. Moving the buffer out to main RAM gave me audio but it's a bit on the gritty side. I'm still having trouble synchronising the ring buffer for some reason.

#107376 - gladius - Sun Oct 29, 2006 3:33 am

That is how the pocketspc arm7 version plays its audio. The latest source is at http://pocketspc.pocketheaven.com/pocketspc-src-v10.zip. I just did a readelf to be sure it was located in the arm7 64k, and my play buffer is at 0x0380f5a8.