gbadev.org forum archive

Hey
I have a very processor-intensive function. I thought that putting the code in ITCM and the data in DTCM would speed it up some. When I later changed back to main memory, just for testing, I saw that the speed was exactly the same! Why doesn't code execute faster in ITCM, and why aren't loads and stores faster in DTCM?

thanks

(the function is a loop with about 150 instructions in it)

You'll find that the ARM9 will cache regularly used instructions and data into icache and dcache respectively. Both of these use the same physical memory as ITCM and DTCM, so apart from the slight cache overhead, you'll get similar performance from either explicitly placing the code/data there or letting the CPU do it.
_________________
http://chishm.drunkencoders.com
http://dldi.drunkencoders.com

Is this done at runtime or at compile time? In other words, do I ever have to put small routines in ITCM myself?

Just in case there was some confusion: Regardless of whether TCM and the cache use the same physical hardware (and I don't know if they do), they do not occupy the same logical addresses. In fact, the cache is not mapped to any memory addresses.

There are two basic reasons to put something in TCM:

1. It is accessed repeatedly during a loop, but it is too large to fit in the cache, so automatic caching has little or no benefit. You don't necessarily have to put the entire loop in TCM to eliminate access to slow RAM, though. You just have to move enough to TCM so that the rest can fit in the cache.

2. It is accessed often, but sporadically, so that its place in the cache is likely to be overwritten by the time it is accessed again. Even worse, every time this sporadic action is performed, it overwrites older cached data without any payoff -- it's a lose-lose.

I tend to look at TCM as a way to free up cache space by keeping common things from filling it up.
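To make reason 1 concrete, here is a rough sketch of a toy instruction-cache model. It assumes the DS ARM9 icache geometry (8 KB, 4-way set associative, 32-byte lines) and, to keep the model deterministic, round-robin replacement; the real replacement policy is configurable, so treat this as an illustration, not a measurement. The names icache_model, icache_fetch and count_line_misses are made up for this sketch, and the loop is modelled as straight-line code fetched once per cache line on every pass.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed icache geometry: 8 KB, 4 ways, 32-byte lines. */
enum {
    LINE_BYTES  = 32,
    WAYS        = 4,
    CACHE_BYTES = 8 * 1024,
    SETS        = CACHE_BYTES / (LINE_BYTES * WAYS)   /* 64 sets */
};

typedef struct {
    int32_t tag[SETS][WAYS];      /* -1 marks an empty way */
    uint8_t next_victim[SETS];    /* round-robin pointer (an assumption) */
} icache_model;

static void icache_reset(icache_model *c)
{
    memset(c->tag, 0xFF, sizeof c->tag);              /* every tag = -1 */
    memset(c->next_victim, 0, sizeof c->next_victim);
}

/* Looks up one address; returns 1 on a miss (and fills the line), 0 on a hit. */
static int icache_fetch(icache_model *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;
    uint32_t set  = line % SETS;
    int32_t  tag  = (int32_t)(line / SETS);

    for (int w = 0; w < WAYS; w++) {
        if (c->tag[set][w] == tag)
            return 0;
    }
    c->tag[set][c->next_victim[set]] = tag;
    c->next_victim[set] = (uint8_t)((c->next_victim[set] + 1) % WAYS);
    return 1;
}

/* Runs a straight-line loop body of loop_bytes bytes `iterations` times,
   touching each cache line once per pass, and counts line misses. */
static unsigned long count_line_misses(uint32_t loop_bytes, unsigned iterations)
{
    icache_model c;
    unsigned long misses = 0;

    assert(iterations > 0);
    icache_reset(&c);
    for (unsigned i = 0; i < iterations; i++) {
        for (uint32_t addr = 0; addr < loop_bytes; addr += LINE_BYTES)
            misses += (unsigned long)icache_fetch(&c, addr);
    }
    return misses;
}
```

Under these assumptions, count_line_misses(600, 100) reports only the 19 cold misses of the first pass. That is the original poster's case: 150 ARM instructions are about 600 bytes, so the loop already lives in the icache and ITCM has nothing left to win. A hypothetical 12 KB loop body (count_line_misses(12 * 1024, 100)) misses on every line of every pass, while moving 4 KB of it to ITCM leaves 8 KB that fits, and count_line_misses(8 * 1024, 100) drops back to the 256 cold misses.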

Edit: To answer your question more directly, the cache is handled by the CPU at runtime. http://www.arm.com/pdfs/DDI0155A_946ES.pdf

Hmm, I don't know where I read that the ITCM uses the same hardware as the icache. Regardless, for small inner loops that are called infrequently but iterate a few hundred times per call, it is better to let the cache take care of them. Use ITCM for code that is executed frequently, especially in small chunks. You won't normally be able to fit an entire game loop in ITCM. Also, place data in DTCM, not ITCM: accessing data stored in ITCM takes a performance hit.

Also, be careful about DTCM usage if your stack is there.

Some of the best things to put in ITCM are matrix operations. They're kind of long (resulting in a lot of memory reads), they're often used by multiple subsystems (so use is sporadic), they aren't usually written with loops (so caching doesn't pay off during one call), and they can eat a lot of cycles (so efficiency matters).
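As a sketch of what that placement can look like in source: libnds provides the ITCM_CODE and DTCM_DATA section macros via <nds.h>, and the rest of this example is hypothetical. The fix12 type, mat4 struct, g_camera variable and mat4_mul function are invented here, 20.12 fixed point is assumed, and the empty fallback macros exist only so the file also builds with a desktop compiler. The multiply uses loops for brevity, unlike the unrolled routines described above.

```c
#include <assert.h>
#include <stdint.h>

#ifdef ARM9
#include <nds.h>          /* libnds: defines ITCM_CODE and DTCM_DATA */
#endif
#ifndef ITCM_CODE
#define ITCM_CODE         /* fallback for non-DS builds: no special section */
#endif
#ifndef DTCM_DATA
#define DTCM_DATA         /* fallback for non-DS builds: no special section */
#endif

/* Hypothetical 20.12 fixed-point type for this sketch. */
typedef int32_t fix12;
#define FIX12_ONE ((fix12)1 << 12)

typedef struct { fix12 m[4][4]; } mat4;

/* Frequently read data goes to DTCM. Keep such objects small if the
   stack also lives in DTCM, since both share the same space. */
DTCM_DATA mat4 g_camera = {{
    { FIX12_ONE, 0, 0, 0 },
    { 0, FIX12_ONE, 0, 0 },
    { 0, 0, FIX12_ONE, 0 },
    { 0, 0, 0, FIX12_ONE }
}};

/* out = a * b. Placed in ITCM so sporadic calls from several subsystems
   neither wait on main RAM nor evict other code from the icache.
   out must not alias a or b. */
ITCM_CODE void mat4_mul(mat4 *out, const mat4 *a, const mat4 *b)
{
    assert(out != a && out != b);
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            int64_t acc = 0;   /* 64-bit accumulator avoids overflow */
            for (int k = 0; k < 4; k++)
                acc += (int64_t)a->m[r][k] * b->m[k][c];
            /* Assumes arithmetic right shift for negative values (true for GCC). */
            out->m[r][c] = (fix12)(acc >> 12);
        }
    }
}
```

A caller would use it like any other function, for example mat4_mul(&result, &g_camera, &model); the placement is decided entirely by the attributes at link time, so nothing changes at the call site.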

DS development > ITCM & DTCM

#101252 - ProblemBaby - Mon Sep 04, 2006 1:23 am

#101254 - chishm - Mon Sep 04, 2006 1:32 am

#101256 - ProblemBaby - Mon Sep 04, 2006 1:45 am

#101275 - sajiimori - Mon Sep 04, 2006 3:20 am

#101287 - chishm - Mon Sep 04, 2006 6:55 am

#101289 - sajiimori - Mon Sep 04, 2006 7:18 am