#41767 - rize - Sun May 01, 2005 7:56 pm
I'm pretty sure I read that the ARM9 has a 16K instruction cache and a 16K data cache. What about the ARM7? I also thought I remembered reading that there is an L2 cache shared by the two, but maybe that's crazy talk. It's got 4MB of main memory, right?
I checked the tech docs at bottledlight and didn't see the info in any obvious place.
Also, does anyone know anything about the prefetching hardware? Does it have any? Can the software (via the compiler optimization or via explicit compiler hints) suggest a prefetch?
How would it handle fetching a statically sized 1KB array (say, a lookup table or a fixed-size buffer)?
#41774 - DekuTree64 - Sun May 01, 2005 9:01 pm
You can prefetch things into the cache with the pld instruction (same as ldr, but doesn't actually load, and always takes 1 cycle), but that only loads one cache line, which is 32 bytes. According to http://www.arm.com/pdfs/DDI0155A_946ES.pdf section 3.4, you can lock down portions of the cache to keep data there, which could be used to prefetch 32 bytes at a time and lock them. It seems to have some strange requirements though, like running the program from non-cacheable memory. The protection unit controls which regions are cacheable.
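To put the one-line-per-pld point in concrete terms, here is a rough sketch of prefetching a whole table from C. It assumes a GCC or Clang toolchain, whose __builtin_prefetch builtin can emit pld on targets that have it; prefetch_range and sum_table are made-up names for illustration, and whether the hint does anything useful on a given core is something to measure rather than take from this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u  /* line size quoted in the post above */

/* Issue one prefetch hint per cache line overlapping [base, base + size).
   __builtin_prefetch is only a hint; the compiler may drop it on targets
   without a prefetch instruction. */
static void prefetch_range(const void *base, size_t size)
{
    uintptr_t addr = (uintptr_t)base & ~(uintptr_t)(CACHE_LINE_BYTES - 1u);
    uintptr_t end = (uintptr_t)base + size;

    for (; addr < end; addr += CACHE_LINE_BYTES)
        __builtin_prefetch((const void *)addr);
}

/* Hypothetical consumer: prefetch a lookup table, then sum its bytes. */
static uint32_t sum_table(const uint8_t *table, size_t size)
{
    uint32_t sum = 0;

    assert(table != NULL);
    prefetch_range(table, size);
    for (size_t i = 0; i < size; i++)
        sum += table[i];
    return sum;
}
```

For a line-aligned 1KB table that works out to 32 hints, one per 32-byte line.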
Anyway, it would be easier to just have a 'cache' area in DTCM to copy larger chunks of temporary data into.
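A minimal sketch of that scratch-area idea, under some loud assumptions: the buffer here is an ordinary static array so the code builds anywhere, and on real hardware you would still have to place it in DTCM yourself through whatever section attribute or linker script entry your toolchain provides. The names scratch, SCRATCH_BYTES and scratch_load are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_BYTES 1024u  /* hypothetical size of the scratch area */

/* Assumption: on the real target this array is mapped into DTCM by a
   toolchain-specific section attribute or linker script. Here it is a
   plain static array. */
static uint8_t scratch[SCRATCH_BYTES];

/* Copy a chunk of temporary data into the scratch area and return a
   pointer to the copy. The caller then works on the copy. */
static uint8_t *scratch_load(const void *src, size_t size)
{
    assert(src != NULL);
    assert(size <= SCRATCH_BYTES);
    memcpy(scratch, src, size);
    return scratch;
}
```

The copy costs time up front, so this only pays off when the data is read or written many times before it is thrown away.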
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#41801 - rize - Mon May 02, 2005 12:23 am
Thanks. Sounds complex (which isn't unexpected).
I'll bookmark this and try it if I'm having performance problems with this array.
I'd still like to know the size of all the caches and whether or not there is an L2 if anyone knows.
#41830 - DekuTree64 - Mon May 02, 2005 7:44 am
You can ask the ARM9 how big its caches and TCM are. Check the CP15 section of that doc.
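Here is a rough sketch of what asking the ARM9 could look like. The mrc instruction reads the CP15 cache type register; the decoding follows my reading of the generic ARMv5 cache type layout (data cache field in bits 23:12, instruction cache field in bits 11:0), so check it against the CP15 section of the doc before trusting the numbers. BUILD_FOR_ARM9 is a hypothetical macro you would define yourself when compiling for the target, and the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

#ifdef BUILD_FOR_ARM9  /* hypothetical macro: define it only for the ARM9 build */
/* Read the CP15 cache type register. Needs a privileged mode. */
static uint32_t read_cache_type(void)
{
    uint32_t value;
    __asm__ volatile ("mrc p15, 0, %0, c0, c0, 1" : "=r"(value));
    return value;
}
#endif

/* Split the register into its two 12-bit cache description fields. */
static uint32_t dcache_field(uint32_t reg) { return (reg >> 12) & 0xFFFu; }
static uint32_t icache_field(uint32_t reg) { return reg & 0xFFFu; }

/* Cache size in bytes: size bits [9:6], multiplier bit M at [2]. */
static uint32_t cache_size_bytes(uint32_t field)
{
    uint32_t size = (field >> 6) & 0xFu;
    uint32_t m = (field >> 2) & 1u;

    assert(field <= 0xFFFu);
    return (m ? 768u : 512u) << size;
}

/* Cache line length in bytes: length bits [1:0]. */
static uint32_t cache_line_bytes(uint32_t field)
{
    return 8u << (field & 3u);
}
```

As a made-up input (not a value read from hardware), a field of 0x142 decodes to a 16384-byte cache with 32-byte lines. On the target you would pass dcache_field(read_cache_type()) and icache_field(read_cache_type()) instead.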
Do be careful when using the cache, though: it doesn't mix well with direct hardware access. For example, if you fiddle with some data and then DMA it somewhere, the new values may still be sitting only in the cache. DMA only sees the 'real' memory, so it will copy the data from BEFORE you fiddled with it. To fix that, you need to flush the data out of the cache to memory before doing a DMA on it.
And of course, enabling cache on VRAM or IO registers is pretty much asking for crazy problems with things not showing up :)
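For the DMA caveat above, a minimal sketch of writing a buffer back to memory before starting the transfer. The two mcr instructions are the standard ARMv5 CP15 operations for cleaning one data cache line by address and draining the write buffer. Everything else is assumption: BUILD_FOR_ARM9 is a hypothetical macro for the target build, the function names are made up, and lines_cleaned exists only so the sketch can be exercised on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u  /* line size quoted earlier in the thread */

static unsigned lines_cleaned;  /* bookkeeping so the loop can be checked off-target */

/* Write one data cache line back to memory (no-op when built off-target). */
static void clean_dcache_line(uintptr_t addr)
{
#ifdef BUILD_FOR_ARM9  /* hypothetical macro: define it only for the ARM9 build */
    __asm__ volatile ("mcr p15, 0, %0, c7, c10, 1" : : "r"(addr) : "memory");
#else
    (void)addr;
#endif
    lines_cleaned++;
}

/* Wait for pending writes to reach memory (no-op when built off-target). */
static void drain_write_buffer(void)
{
#ifdef BUILD_FOR_ARM9
    __asm__ volatile ("mcr p15, 0, %0, c7, c10, 4" : : "r"(0) : "memory");
#endif
}

/* Clean every cache line overlapping [base, base + size). Call this after
   modifying the data and before starting a DMA that reads it. */
static void clean_dcache_range(const void *base, size_t size)
{
    uintptr_t addr = (uintptr_t)base & ~(uintptr_t)(CACHE_LINE_BYTES - 1u);
    uintptr_t end = (uintptr_t)base + size;

    assert(base != NULL);
    for (; addr < end; addr += CACHE_LINE_BYTES)
        clean_dcache_line(addr);
    drain_write_buffer();
}
```

The reverse direction has the same problem: after a DMA writes into a cached buffer, the stale cached copy has to be invalidated before the CPU reads the new data.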
#41883 - mike260 - Mon May 02, 2005 7:38 pm
Has anyone tried putting the stack into DTCM?
#41898 - rize - Mon May 02, 2005 10:07 pm
Well, I at least wanted to be aware of the cache sizes even if I didn't try to do anything intentionally with them.