#38044 - Steve++ - Mon Mar 21, 2005 1:25 am
I'm implementing a tilemap engine with dynamic tile loading. I've figured most of it out (using a LUT and reverse LUT). I'm just wondering what the best tile replacement policy would be - one that would typically result in the least number of tile loads. I was thinking of using the LRU (least recently used) policy because it seems to be used in operating systems (for paging), but it might not be the best policy in this instance. I'm not really a level designer, so I don't know the answer to such questions as,
"If a tile occurs somewhere, is it likely that tile will also occur nearby?"
What other policies are there anyway? The only one I can think of is,
"Just pick the first tile that's not visible in this frame and evict it."
#38054 - sajiimori - Mon Mar 21, 2005 3:53 am
LRU is a good step up from the most trivial strategies. If the performance is still not acceptable, look into IBM's ARC strategy. It supposedly has minimal overhead and gives significant benefits over LRU, but I haven't had the need to apply it just yet.
http://www.almaden.ibm.com/StorageSystems/autonomic_storage/ARC/index.shtml
#38058 - Steve++ - Mon Mar 21, 2005 5:48 am
Thanks. That may be just what I'm looking for.
I was writing a very generic class template that implements LRU. So generic that it could be used in an OS or GBA game (or any other caching situation). This is achieved by implementing only the caching policy, leaving the actual page-to-cache loading to the user. It should be interesting doing this with ARC.
#38059 - sajiimori - Mon Mar 21, 2005 6:22 am
Cool, if you end up having both implementations and you're up for it, how about posting some statistics on hits/misses in your tile engine when you're done?
#38063 - Steve++ - Mon Mar 21, 2005 9:43 am
Hehe... I'm not one for gathering statistics, but I must try one day. In the meantime, I came up with a paging policy interface and an implementation based on LRU. Here it is:
Code: |
template<typename INDEX_TYPE>
class PagingPolicy
{
public:
    virtual void notifyCacheAccess(INDEX_TYPE index) = 0;
    virtual void setPageIndex(INDEX_TYPE cacheIndex, INDEX_TYPE pageIndex) = 0;
    virtual INDEX_TYPE getEvicteeIndex() = 0;
    virtual INDEX_TYPE getPageIndex(INDEX_TYPE cacheIndex) = 0;
};

template<int CACHE_SIZE, typename INDEX_TYPE, INDEX_TYPE UNMAPPED_VALUE>
class LRUpolicy : public PagingPolicy<INDEX_TYPE>
{
    // Representation of a page mapped into the cache
    struct CachePage
    {
        INDEX_TYPE pageIndex; // page lookup
        INDEX_TYPE prev;      // next stalest cache page
        INDEX_TYPE next;      // next freshest cache page
    } __attribute__((packed));

    CachePage cache[CACHE_SIZE];   // representation of the cache
    INDEX_TYPE stalestCacheIndex;  // the least recently accessed cache page
    INDEX_TYPE freshestCacheIndex; // the most recently accessed cache page

public:
    LRUpolicy()
    {
        for (INDEX_TYPE i = 0; i < CACHE_SIZE; ++i)
        {
            cache[i].pageIndex = UNMAPPED_VALUE;
            cache[i].prev = i - 1;
            cache[i].next = i + 1;
        }
        stalestCacheIndex = 0;
        freshestCacheIndex = CACHE_SIZE - 1;
    }

    // The most recently accessed cache page moves
    // to the top of the list (becomes freshest).
    void notifyCacheAccess(INDEX_TYPE index)
    {
        if (index != freshestCacheIndex)
        {
            if (index != stalestCacheIndex)
            {
                cache[cache[index].prev].next = cache[index].next;
                cache[cache[index].next].prev = cache[index].prev;
            }
            else
            {
                stalestCacheIndex = cache[index].next;
            }
            cache[index].prev = freshestCacheIndex;
            cache[freshestCacheIndex].next = index;
            freshestCacheIndex = index;
        }
    }

    // Map a page to the cache
    void setPageIndex(INDEX_TYPE cacheIndex, INDEX_TYPE pageIndex)
    {
        cache[cacheIndex].pageIndex = pageIndex;
    }

    // Get the cache page to have its page evicted
    INDEX_TYPE getEvicteeIndex()
    {
        return stalestCacheIndex;
    }

    // Return the index of the page mapped to the specified cache index
    INDEX_TYPE getPageIndex(INDEX_TYPE cacheIndex)
    {
        return cache[cacheIndex].pageIndex;
    }
};
|
See, I told you I would write something very generic. I even opted for a linked list structure that uses indices instead of pointers, because it saves memory in some cases. On the GBA, the index type can be u16 or s16, which takes half the space of a pointer.
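The saving is easy to check (the packed attribute is GCC-specific; the pointer size depends on the target, 4 bytes on the GBA's ARM7):

```cpp
#include <cassert>
#include <cstdint>

// A doubly linked node using 16-bit indices...
struct IndexNode
{
    std::int16_t pageIndex;
    std::int16_t prev;
    std::int16_t next;
} __attribute__((packed)); // 6 bytes

// ...versus the same node using pointers (12 bytes on a
// 32-bit target, once alignment padding is included).
struct PointerNode
{
    std::int16_t pageIndex;
    PointerNode* prev;
    PointerNode* next;
};
```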
GBA example:
Code: |
LRUpolicy<0x0400,s16,-1> lru; |
Of course, this class template only implements half of the solution. You'll notice there's no reverse LUT and no code for even swapping pages in/out. That is to be implemented elsewhere. This code just portably implements the policy. Perhaps I can write an abstract class template that will implement what it can portably (such as reverse LUT) and provide empty methods for the nonportable stuff.
Watch this thread for an ARC submission.
EDIT: Just removed a template parameter from LRUpolicy. It was not used.
#38076 - Steve++ - Mon Mar 21, 2005 6:51 pm
I normally don't reply to myself, but here goes: I've written an adapter template that can take any policy and implement demand paging.
Code: |
template<typename INDEX_TYPE, INDEX_TYPE UNMAPPED_VALUE>
struct PageFault
{
    INDEX_TYPE pageIndex;
    INDEX_TYPE cacheIndex;

    PageFault()
    {
    }

    PageFault(INDEX_TYPE pageIndex, INDEX_TYPE cacheIndex) : pageIndex(pageIndex), cacheIndex(cacheIndex)
    {
    }

    bool isFault()
    {
        return pageIndex != UNMAPPED_VALUE;
    }
};

// DemandPager is implemented as an adapter to a paging policy (much like adapters and containers in STL)
template<int PAGES, int CACHE_SIZE, typename INDEX_TYPE, INDEX_TYPE UNMAPPED_VALUE, typename PAGING_POLICY>
class DemandPager
{
    PAGING_POLICY policy;
    INDEX_TYPE pages2cache[PAGES]; // Reverse LUT - pages to cache pages

public:
    DemandPager()
    {
        // All pages are initially unmapped (causing page faults).
        // Use int for the counter: PAGES may exceed INDEX_TYPE's range.
        for (int i = 0; i < PAGES; ++i)
        {
            pages2cache[i] = UNMAPPED_VALUE;
        }
    }

    // Look up the cache index for a page. If it is uncached, a page fault is
    // generated and the page is cached. The page fault information is returned
    // via pageFault. This method returns the cache index.
    INDEX_TYPE lookupPage(INDEX_TYPE pageIndex, PageFault<INDEX_TYPE,UNMAPPED_VALUE>* pageFault)
    {
        INDEX_TYPE cacheIndex = pages2cache[pageIndex];
        if (cacheIndex == UNMAPPED_VALUE)
        {
            cacheIndex = policy.getEvicteeIndex();
            // Unmap the page being evicted so later lookups fault on it
            INDEX_TYPE evictedPage = policy.getPageIndex(cacheIndex);
            if (evictedPage != UNMAPPED_VALUE)
            {
                pages2cache[evictedPage] = UNMAPPED_VALUE;
            }
            pages2cache[pageIndex] = cacheIndex;
            policy.setPageIndex(cacheIndex, pageIndex);
            *pageFault = PageFault<INDEX_TYPE,UNMAPPED_VALUE>(pageIndex, cacheIndex);
        }
        else
        {
            *pageFault = PageFault<INDEX_TYPE,UNMAPPED_VALUE>(UNMAPPED_VALUE, UNMAPPED_VALUE);
        }
        policy.notifyCacheAccess(cacheIndex);
        return cacheIndex;
    }
};
// Helper macro to make life easier
#define DEMAND_PAGER(p,c,i,u,policy) DemandPager<p,c,i,u,policy<c,i,u> >
|
A couple of instantiation examples (both are identical):
Code: |
DemandPager<0x10000,1024,short,-1,LRUpolicy<1024,short,-1> > pager; // The regular way
DEMAND_PAGER(0x10000,1024,short,-1,LRUpolicy) pager2; // The easy way
|
This is pretty good if I may say so myself. From a user point of view, all that needs to be done is
1. Create a DemandPager object with the specified template parameters
2. Write the actual page loading code (in my case, loading a tile or sprite into VRAM)
3. Call lookupPage on each page to get the cache index, loading a page whenever pageFault indicates a miss.
Easy!
#38079 - sajiimori - Mon Mar 21, 2005 7:51 pm
Well done. As for statistics, it could be something as simple as scrolling along the same path twice, once with LRU and once with ARC, and counting how many tile loads happened.
Obviously the nature of the map, the path, and the cache settings will affect the results, so mixing it up a bit would make things clearer.
#38085 - ampz - Mon Mar 21, 2005 10:02 pm
Actually, I've read that "random" is not a very bad cache replacement policy. It would probably work for tiles as well.
You don't want too complex a tile replacement policy either, because it could easily eat up more cycles than the few it takes to replace a tile with DMA (or a load/store-multiple asm instruction).
#38142 - Steve++ - Tue Mar 22, 2005 11:33 pm
True...
LRU does require some list rearrangement on every tile access, but if optimised, it would probably take only about as long as loading a small number of tiles (maybe around 5?).
The thing we have to think of in GBA games is that we don't necessarily want completely nondeterministic CPU usage. So I may implement something that takes much less time per access, so that CPU usage fluctuates within a smaller window. This would be especially useful for tilemaps that run close to the worst-case scenario most of the time. I think something like FIFO would be useful here.
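A FIFO policy drops neatly into the same interface. A sketch (round-robin eviction, with notifyCacheAccess as a no-op so every access costs the same; the interface is copied from the earlier post):

```cpp
#include <cassert>

// The policy interface from above.
template<typename INDEX_TYPE>
class PagingPolicy
{
public:
    virtual void notifyCacheAccess(INDEX_TYPE index) = 0;
    virtual void setPageIndex(INDEX_TYPE cacheIndex, INDEX_TYPE pageIndex) = 0;
    virtual INDEX_TYPE getEvicteeIndex() = 0;
    virtual INDEX_TYPE getPageIndex(INDEX_TYPE cacheIndex) = 0;
};

// FIFO: evict cache slots in round-robin order. Accesses never
// reorder anything, so the per-access cost is constant.
template<int CACHE_SIZE, typename INDEX_TYPE, INDEX_TYPE UNMAPPED_VALUE>
class FIFOpolicy : public PagingPolicy<INDEX_TYPE>
{
    INDEX_TYPE pages[CACHE_SIZE];
    INDEX_TYPE nextEvictee;

public:
    FIFOpolicy() : nextEvictee(0)
    {
        for (INDEX_TYPE i = 0; i < CACHE_SIZE; ++i)
            pages[i] = UNMAPPED_VALUE;
    }

    void notifyCacheAccess(INDEX_TYPE) { } // hits cost nothing

    void setPageIndex(INDEX_TYPE cacheIndex, INDEX_TYPE pageIndex)
    {
        pages[cacheIndex] = pageIndex;
        if (cacheIndex == nextEvictee)      // advance the eviction ring
            nextEvictee = (nextEvictee + 1) % CACHE_SIZE;
    }

    INDEX_TYPE getEvicteeIndex() { return nextEvictee; }
    INDEX_TYPE getPageIndex(INDEX_TYPE cacheIndex) { return pages[cacheIndex]; }
};
```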
Also, if I can find the fastest tile loading code possible, it may be possible to implement something I gave up on a while ago - four layers with a worst-case-scenario of every visible tile being unique. This would require some clever scheduling. I want to use krawall also, which will probably need large buffers due to the demand placed on the system by the tile engine.
In the four layer system, I would partition the VRAM into four pieces - one for each layer. Each partition has enough space for that layer's tilemap and seven rows of unique tiles. But of course, we need to display up to 22 rows of tiles, so I need a number that divides 22 evenly and is less than 7; of 22's factors (1, 2, 11, 22), that leaves only 2. So every two rows, another two rows of tiles will need to be loaded. I could either use FIFO or just evict and reload everything - both are compatible with my adapter template. Nice, eh?
The only trick is scheduling tile loading. This will need some sort of time-based queue. Shouldn't be too hard to calculate times - this depends only on the remainder of the map position's y coordinate when divided by 8. Simple. I just hope after coding it up that there is enough CPU time (and some to spare) to do all this. I wonder how many cycles it takes to load 2816 (32x22x4) tiles...
#38143 - Steve++ - Tue Mar 22, 2005 11:45 pm
Hmm, here's an interesting thought - a generic temporal queue that executes objects placed on it at the specified time. Interesting... I could run the whole game through it, not just tile loading. Since it would be a queue structure, I would need more than one of them, with a priority order between them. I think these queues would be a lot safer than interrupts for doing the same thing, especially when operations take a long time.
The temporal parameters would be:
earliest time to start
latest time to start
estimated time to complete
They would be used like this: the queue executor waits until the time is greater than or equal to the earliest start time. If, on the first check, the time is already greater than the latest start time, the executor skips that item. If the item is on a low-priority queue, the executor uses the estimated time to complete to decide whether that item would prevent a higher-priority item from executing.
I think I could just get rid of estimated time to complete. It would probably add too much overhead.
I can just imagine the case where something interrupts a tile load and, for a split second, two rows of tiles are completely garbled. That would look kind of funny, but it would be a sign that the queue system is working to ensure that earlier items don't delay later items too much.
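A minimal sketch of such a queue (a hypothetical design, assuming items are pushed in order of earliest start time; priorities and the estimated-time-to-complete field are left out):

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

// Each item has an execution window; the executor runs it once the
// clock reaches earliestStart, and drops it if the window was missed.
struct TimedItem
{
    int earliestStart;
    int latestStart;
    std::function<void()> action;
};

class TemporalQueue
{
    std::deque<TimedItem> items; // assumed sorted by earliestStart

public:
    void push(TimedItem item) { items.push_back(item); }

    // Run every item whose window has opened; skip items whose
    // latest start time has already passed.
    void update(int now)
    {
        while (!items.empty() && items.front().earliestStart <= now)
        {
            TimedItem item = items.front();
            items.pop_front();
            if (now <= item.latestStart) // still inside the window
                item.action();
            // else: skipped, so it can't delay later items
        }
    }
};
```

Calling update(now) once per frame with the current scanline or timer value would drive the whole thing.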
#38197 - tepples - Wed Mar 23, 2005 10:56 pm
Steve++ wrote: |
Hmm, here's an interesting thought - a generic temporal queue that executes objects placed on it at the specified time. Interesting.... I could run the whole game through it, not just tile loading. |
In fact, this is how PlayStation 2 programs and other real-time programs work: as a dataflow diagram with real-time constraints. Good luck on your miniature real-time OS :)
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#38212 - sajiimori - Thu Mar 24, 2005 1:09 am
Is there something unusual about the PS2 that makes that scheme particularly appealing?
#38213 - tepples - Thu Mar 24, 2005 1:15 am
The PS2 has multiple independent processors, and interpreting your program as a dataflow diagram makes it a lot easier to synchronize them. This will go double for the Cell processors in the PS3. Think of the GBA's PPU as another processor that executes the "program" in the registers on the "data" in VRAM.
#38216 - Steve++ - Thu Mar 24, 2005 2:05 am
Tepples, I modified the code to allow for the possibility of different types for cache indices and page indices. I also changed some template parameter naming and ordering. This code supersedes the above code.
Code: |
template<typename PAGE_INDEX_TYPE, typename CACHE_PAGE_INDEX_TYPE>
class PagingPolicy
{
public:
    virtual void notifyCacheAccess(CACHE_PAGE_INDEX_TYPE index) = 0;
    virtual void setPageIndex(CACHE_PAGE_INDEX_TYPE cacheIndex, PAGE_INDEX_TYPE pageIndex) = 0;
    virtual CACHE_PAGE_INDEX_TYPE getEvicteeIndex() = 0;
    virtual PAGE_INDEX_TYPE getPageIndex(CACHE_PAGE_INDEX_TYPE cacheIndex) = 0;
};

template< typename PAGE_INDEX_TYPE,
          PAGE_INDEX_TYPE UNCACHED,
          typename CACHE_PAGE_INDEX_TYPE,
          CACHE_PAGE_INDEX_TYPE CACHE_SIZE>
class LRUpolicy : public PagingPolicy<PAGE_INDEX_TYPE, CACHE_PAGE_INDEX_TYPE>
{
    // Representation of a page mapped into the cache
    struct CachePage
    {
        PAGE_INDEX_TYPE pageIndex;  // page lookup
        CACHE_PAGE_INDEX_TYPE prev; // next stalest cache page
        CACHE_PAGE_INDEX_TYPE next; // next freshest cache page
    } __attribute__((packed));

    CachePage cache[CACHE_SIZE];              // representation of the cache (and LUT)
    CACHE_PAGE_INDEX_TYPE stalestCacheIndex;  // the least recently accessed cache page
    CACHE_PAGE_INDEX_TYPE freshestCacheIndex; // the most recently accessed cache page

public:
    LRUpolicy()
    {
        for (CACHE_PAGE_INDEX_TYPE i = 0; i < CACHE_SIZE; ++i)
        {
            cache[i].pageIndex = UNCACHED;
            cache[i].prev = i - 1;
            cache[i].next = i + 1;
        }
        stalestCacheIndex = 0;
        freshestCacheIndex = CACHE_SIZE - 1;
    }

    // The most recently accessed cache page moves
    // to the top of the list (becomes freshest).
    void notifyCacheAccess(CACHE_PAGE_INDEX_TYPE index)
    {
        if (index != freshestCacheIndex)
        {
            if (index != stalestCacheIndex)
            {
                cache[cache[index].prev].next = cache[index].next;
                cache[cache[index].next].prev = cache[index].prev;
            }
            else
            {
                stalestCacheIndex = cache[index].next;
            }
            cache[index].prev = freshestCacheIndex;
            cache[freshestCacheIndex].next = index;
            freshestCacheIndex = index;
        }
    }

    // Map a page to the cache
    void setPageIndex(CACHE_PAGE_INDEX_TYPE cacheIndex, PAGE_INDEX_TYPE pageIndex)
    {
        cache[cacheIndex].pageIndex = pageIndex;
    }

    // Get the cache page to have its page evicted
    CACHE_PAGE_INDEX_TYPE getEvicteeIndex()
    {
        return stalestCacheIndex;
    }

    // Return the index of the page mapped to the specified cache index
    PAGE_INDEX_TYPE getPageIndex(CACHE_PAGE_INDEX_TYPE cacheIndex)
    {
        return cache[cacheIndex].pageIndex;
    }
};

template<typename PAGE_INDEX_TYPE, PAGE_INDEX_TYPE UNCACHED, typename CACHE_PAGE_INDEX_TYPE>
struct PageFault
{
    PAGE_INDEX_TYPE pageIndex;
    CACHE_PAGE_INDEX_TYPE cacheIndex;

    PageFault() : pageIndex(UNCACHED)
    {
    }

    PageFault(PAGE_INDEX_TYPE pageIndex, CACHE_PAGE_INDEX_TYPE cacheIndex) : pageIndex(pageIndex), cacheIndex(cacheIndex)
    {
    }

    bool isFault()
    {
        return pageIndex != UNCACHED;
    }
};

// DemandPager is implemented as an adapter to a paging policy (much like adapters and containers in STL)
template< typename PAGE_INDEX_TYPE,
          PAGE_INDEX_TYPE PAGES,
          PAGE_INDEX_TYPE UNCACHED,
          typename CACHE_PAGE_INDEX_TYPE,
          CACHE_PAGE_INDEX_TYPE CACHE_SIZE,
          typename PAGING_POLICY>
class DemandPager
{
    PAGING_POLICY policy;
    CACHE_PAGE_INDEX_TYPE pages2cache[PAGES]; // Reverse LUT - pages to cache pages

public:
    DemandPager()
    {
        // All pages are initially unmapped (causing page faults)
        for (PAGE_INDEX_TYPE i = 0; i < PAGES; ++i)
        {
            pages2cache[i] = UNCACHED;
        }
    }
    // Look up the cache index for a page. If it is uncached, a page fault is
    // generated and the page is cached. The page fault information is returned
    // via pageFault. This method returns the cache index.
    CACHE_PAGE_INDEX_TYPE lookupPage(PAGE_INDEX_TYPE pageIndex, PageFault<PAGE_INDEX_TYPE,UNCACHED,CACHE_PAGE_INDEX_TYPE>* pageFault)
    {
        CACHE_PAGE_INDEX_TYPE cacheIndex = pages2cache[pageIndex];
        if (cacheIndex == UNCACHED)
        {
            cacheIndex = policy.getEvicteeIndex();
            // Unmap the page being evicted so later lookups fault on it
            PAGE_INDEX_TYPE evictedPage = policy.getPageIndex(cacheIndex);
            if (evictedPage != UNCACHED)
            {
                pages2cache[evictedPage] = UNCACHED;
            }
            pages2cache[pageIndex] = cacheIndex;
            policy.setPageIndex(cacheIndex, pageIndex);
            *pageFault = PageFault<PAGE_INDEX_TYPE,UNCACHED,CACHE_PAGE_INDEX_TYPE>(pageIndex, cacheIndex);
        }
        else
        {
            *pageFault = PageFault<PAGE_INDEX_TYPE,UNCACHED,CACHE_PAGE_INDEX_TYPE>();
        }
        policy.notifyCacheAccess(cacheIndex);
        return cacheIndex;
    }
};
// Helper macro to make life easier
#define DEMAND_PAGER(ptype,pages,u,ctype,cache,policy) DemandPager<ptype,pages,u,ctype,cache,policy<ptype,u,ctype,cache> >
|
Instantiation examples (both equivalent):
Code: |
DemandPager<int,0x20000,-1,short,0x400,LRUpolicy<int,-1,short,0x400> > pager; // The regular way
DEMAND_PAGER(int,0x20000,-1,short,0x400,LRUpolicy) pager2; // The easy way
|