gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

DS development > Writing a performance analyser

#112721 - simonjhall - Tue Dec 19, 2006 8:51 pm

I'm reaching the stage with my Quake port where the performance is starting to become my main focus. Yeah, you're gonna automatically say that it's in the rendering, but that's not actually true :-)
As I can't whip out a profiler to see what's going on (I seem to remember gprof moaning at me when linking), I'm thinking about rolling my own. An alternate method of properly finding bottlenecks in the code would be to add profiling support to an emulator (not too hard...).

The most obvious way to do this would be to add a custom epilogue and prologue to each function, but I can't find an obvious way of telling gcc that I'd like to do this.

So...has anyone ever set up their own prologue with gcc? I can see options for other architectures, just not ARM.

----
Oh another way would be to replace all bl-s/bx-s with a dash of code but I'd like to do it properly for once :-D
_________________
Big thanks to everyone who donated for Quake2

#112722 - spectre1989 - Tue Dec 19, 2006 8:52 pm

Not quite sure how to solve your problem, but just wanted to say it sounds like an awesome idea :)
_________________
Pro-test: Standing up for science, standing up for what's right

#112729 - nornagon - Tue Dec 19, 2006 11:03 pm

The US$1750 version of no$gba has a profiler :)

#112737 - simonjhall - Tue Dec 19, 2006 11:28 pm

Quote:
The US$1750 version of no$gba has a profiler :)
Sweet! Although I'll be damned if I'll buy it - I'd much rather write my own emulator!

But anyway, I couldn't seem to find an obvious way into the gcc prolog so I decided to have a proper look at the insides of gprof. Yeah, linker errors (fixed with stub functions), then power-on errors, then what looked like corrupt instructions...
But I'm making progress! After stepping through a simple function in the debugger I think I can see what's going on - gcc inserts the address of a piece of memory directly after the branch to the performance-timing function. Time to rewrite my (stub) performance-timer in (my rusty) assembler :-)

This looks easier than I expected...
_________________
Big thanks to everyone who donated for Quake2

#112740 - tepples - Tue Dec 19, 2006 11:50 pm

Profiling on a GBA or DS is a matter of reading VCOUNT at various points in the game loop.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#112744 - simonjhall - Tue Dec 19, 2006 11:58 pm

Yeah, that's the easy option! (ie similar to what I'm already doing...)
I really want something which prints out a big list of all the functions called and the amount of milliseconds spent in them, how many times called etc.

What does VCOUNT actually do (ie when is it what value if you get my gist)? What happens if it's value is read and then read again after exactly one vertical blank - will VCOUNT be the same both times it's read?
_________________
Big thanks to everyone who donated for Quake2

#112748 - tepples - Wed Dec 20, 2006 12:03 am

simonjhall wrote:
What does VCOUNT actually do (ie when is it what value if you get my gist)?

The value of VCOUNT is the Y position of the scanline that the 2D engine is (GBA) or 2D engines are (DS) rendering.

Quote:
What happens if it's value is read and then read again after exactly one vertical blank - will VCOUNT be the same both times it's read?

Yes.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#112749 - Dark Knight ez - Wed Dec 20, 2006 12:04 am

Quote:
VCOUNT - Vertical Counter (Read only)
Indicates the currently drawn scanline, values in range from 160-227 indicate 'hidden' scanlines within VBlank area.


edit: Tepples beat me to it. :)
_________________
AmplituDS website

#112755 - simonjhall - Wed Dec 20, 2006 12:28 am

Quote:
VCOUNT - Vertical Counter (Read only)
Indicates the currently drawn scanline, values in range from 160-227 indicate 'hidden' scanlines within VBlank area.
So what would the value be when it's at scanline 191? Scanlines 0 to 191 are visible, right? I always assumed the screen was 256x192...
_________________
Big thanks to everyone who donated for Quake2

#112757 - sasq - Wed Dec 20, 2006 12:40 am

Another way is to us a timer interrupt and store the location of the PC and use that to look up which function was interrupted and count how many times each function was interrupted.
Not as exact but requires no changes to the code.

#112767 - tepples - Wed Dec 20, 2006 1:34 am

simonjhall wrote:
Quote:
VCOUNT - Vertical Counter (Read only)
Indicates the currently drawn scanline, values in range from 160-227 indicate 'hidden' scanlines within VBlank area.
So what would the value be when it's at scanline 191?

191. On the DS, this scanline is visible; on the GBA, it isn't.

Quote:
Scanlines 0 to 191 are visible, right? I always assumed the screen was 256x192...

GBA screen is 160 lines plus 68 blank lines. DS screen is 192 lines plus 71 blank lines.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#112773 - simonjhall - Wed Dec 20, 2006 2:09 am

Reading the docs on nocash, it seems that the non-visble scanlines don't trigger an interrupt - can anyone confirm this? It'll be a bit of a crap time base if the the gap between 0, 1, 2, 3, 4....191 is consistent then there's suddenly a long gap from 192 to 0 again!
_________________
Big thanks to everyone who donated for Quake2

#112779 - tepples - Wed Dec 20, 2006 3:25 am

simonjhall wrote:
Reading the docs on nocash, it seems that the non-visble scanlines don't trigger an interrupt - can anyone confirm this? It'll be a bit of a crap time base if the the gap between 0, 1, 2, 3, 4....191 is consistent then there's suddenly a long gap from 192 to 0 again!

At least on the GBA, hblank DMA is paused during vblank, but the vcount is not paused (..., 159, 160, 161, ..., 226, 227, 0, ...), and neither are hblank interrupts.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#112827 - simonjhall - Wed Dec 20, 2006 4:53 pm

Yar! Thar be profiling!
I've got it working nicely now - I run my program, it stops after a while, dumps the results to a file and I can check out the results on the PC. Here's an example of what I get:
Code:
Loading symbols from squake ... OK
Seen 3118 symbols, stored 1891 function offsets
        address      calls seconds function
        0x2050a7c,      1, 0.0000, quake_main
        0x2050a18,   2441, 0.9714, Sys_Printf
        0x20153f8,      1, 0.3109, Host_Init
        0x200734c,    121, 0.0211, Cmd_AddCommand
        0x200bb64,    121, 0.0097, Cvar_VariableString
        0x200ba00,    332, 0.0266, Cvar_FindVar
        0x2007e88,      1, 0.0006, Cbuf_Init
        0x2007438,      1, 0.0001, Cmd_Init
        0x204d574,      1, 0.0018, V_Init
        0x200ba90,    131, 0.0372, Cvar_RegisterVariable
        0x20072c0,    131, 0.0113, Cmd_Exists
        0x204d460,      3, 0.0409, BuildGammaTable
        0x2006dec,      1, 0.0004, Chase_Init
        0x2013f68,      1, 0.0000, Host_InitVCR
        0x200b108,     36, 0.1166, Con_Printf

At the moment I can only see how much time has been spent between the beginning and exit of a function, and that includes all the time spent in child functions. It's a bit crap but it'll do for now... Oh and the executable size is pretty epic now - including all this profiling code has increased its size by like 20%. Not cool.
Apparently there's support for per-line profiling in gcc - that'd be really useful but I doubt I'd be able to get both the executable and data in memory at the same time :-(

EDIT: the method mentioned by sasq would be probably best if I wanted high-resolution timing, plus the executable wouldn't be any larger. Dunno how fast I can set a timer though.
_________________
Big thanks to everyone who donated for Quake2

#112863 - simonjhall - Wed Dec 20, 2006 10:19 pm

I've done yet more work on the profiler and I think it's pretty much finished now. You can now get the self time, call traces, etc for every function run by your program - and it seems pretty accurate too! Quickest project ever :-D

And I'm thinking of releasing both this and Quake 1 at the same time, and probably re-releasing the debugger too (with ARM7 goodness). Stay tuned!

I'd still like per-line (ie, pc) profiling, but I don't think that's gonna happen. I'd also like to do hit counting on the floating-point emulation functions inserted by the compiler, but I'd need to be able to recompile libgcc to do that. Any chance of getting profile versions of this with the next devkitpro?
_________________
Big thanks to everyone who donated for Quake2

#112874 - chishm - Thu Dec 21, 2006 12:03 am

simonjhall wrote:
And I'm thinking of releasing both this and Quake 1 at the same time, and probably re-releasing the debugger too (with ARM7 goodness). Stay tuned!

How's the lawyer situation going with that?
_________________
http://chishm.drunkencoders.com
http://dldi.drunkencoders.com

#113058 - simonjhall - Fri Dec 22, 2006 4:26 pm

chishm wrote:
How's the lawyer situation going with that?

Err... Yeah...
_________________
Big thanks to everyone who donated for Quake2

#113996 - Puyo - Tue Jan 02, 2007 2:59 pm

Hello simonjhall! Any improvments on this one?
I was wondering about how this works? Is it some kind of library? Will it be compatible with DevkitPro`s gcc? Interested in trying it out.

#113998 - simonjhall - Tue Jan 02, 2007 3:39 pm

Nah, I sorta finished it! There's also so little code needed I didn't even bother putting it in a library!

It doesn't use gcc's -pg option (or whatever the profile option is) as I couldn't find any docs as to the format of the gprof data that should be emitted. Also it only inserts a stub at the beginning of the function, and not at the end. I couldn't figure out how to do the things I wanted with just that (I needed an exit call) so I found another way...

gcc has an function instrumentation option which inserts a call to a function of your choice on entry and exit of every function (that is built with the option). That's cool, but you need to store for every function a few bits of data like time spent, number of hits etc. You could keep this in some kind of array, but you'd have to search for that entry every time the function is hit, or use some kind of hash table. So what I really wanted was an easily findable word of memory for every function. I could then put in that word a pointer to where the real stats are stored (my array). gcc also provides an option to push the name of the function directly before each function so I basically trash that and put my pointer in there :-)

When a function is exited the amount of time spent in it is computed by taking the current time minus the entry time minus the time spent in child functions. The time is measured in hblanks. It's surprisingly accurate and slows down Quake by just a few percent. At some predetermined point (eg after 100 frames) I then dump the results. On the PC I have a perl script which parses the output and makes it all nice and readable.

It's been a complete godsend as there are some functions I never realised were expensive!
Have a read of this, as this is where I derived the core function hook info from:
http://www.logix.cz/michal/devel/CygProfiler/
It only does half of the job though (does a fprintf on entry and exit, that's it) so I had to fill out the rest.

Oh and my version doesn't work in thumb at the moment but by the looks of it it's gcc's fault!
_________________
Big thanks to everyone who donated for Quake2

#114039 - HyperHacker - Wed Jan 03, 2007 3:02 am

simonjhall wrote:
gcc also provides an option to push the name of the function directly before each function

o_o and how do I activate that?
_________________
I'm a PSP hacker now, but I still <3 DS.

#114128 - simonjhall - Thu Jan 04, 2007 10:07 am

Sorry about the delay. I think it's something like -mpoke-function-name.
I'll paste the code up here when I get back from work, if you like.

It'd be nice to not have to use this option, as it increases the file size a bit more than I'd like... And I still have to fix the thumb support. Once I've done that I'll be able to see which functions perform best in which mode.
_________________
Big thanks to everyone who donated for Quake2

#114486 - Epicenter - Mon Jan 08, 2007 9:06 am

This might be too simplistic for your needs but have you considered just having the program dump a 'log' file to your SD card with a report of performance? I use a profiling system in a game I wrote for PC (will probably be porting to Dreamcast) that, during each frame, times how long the major elements of generating that frame took and writes it out to a file. I don't capture every frame as I'd be writing 60 lines of text a second to the log. But I do just grab sort of a cross section of them to get an idea of the where the major slowdown is happening.

EDIT: looks like you already solved your problem. way to not read the thread. :D Go me.

#114494 - simonjhall - Mon Jan 08, 2007 1:30 pm

Haha! What I wanted was a general-purpose profiler which found out exactly which functions were slow... I already knew which parts of the game were slow :-)

I still never got round to posting the code for this thing here...
_________________
Big thanks to everyone who donated for Quake2

#115636 - Puyo - Thu Jan 18, 2007 1:26 am

Hi! Back again. Done everything as you suggested. And it`s probably working allright. Could this be true (for test.c from cygprofiler):
Code:
On enter:

             strtVC   ---   level
+ notmain      153    0    1
+ function3    159    0    2
+ function2    161    0    3
+ function1    162    0    4
On exit:

             VCspent      VCOUNT
- function1     2     2    164
- function2     2     2    165
- function3     5     5    166
- notmain       10    10    168

Code:
int function1 (void)
{ return 0; }

int function2 (void)
{ return function1 () + 1; }

int function3 (void)
{ return function2 () + 1; }

int notmain (void)
{ return function3 (); }

And a question - do you count HBlanks too? I mean if function takes more than one HBlank to exit, you would never know that with VCount only, right?
EDIT: Posted "mine" test.c.

#115671 - simonjhall - Thu Jan 18, 2007 8:43 am

Ah, it looks like you're recording the total amount of time spent in function X and all sub-functions and then printing this time out for each function. (it looks like that anyway - stop me if I'm wrong!)

What you really want to record is the 'self time' of a function - the amount of time spent in the function but not including any of the functions called from that function. Imagine if you have a recursive function which calls itself a hundred times then calls something expensive - the total time recorded by the recursive function will be much higher than the run time of the program! So in order to figure out the time for a function you do (exit_time - entry_time) - time_spent_in_all_subfunctions (which may be recursive)

Even if I've read your results wrong, kudos on the profiling!
Oh and I do use the horizontal blanking interrupt. I find this combined with the prolog and epilog code increase the demo runtime (on Q) by about 10%. The results do look very accurate though, so hblanks are the way forward :-)

EDIT: there's a case which I spent ages getting the bugs out of. Dig this:
Code:
function x
{
    y();
    z();
}

function y()
{
   whatever
}

function z()
{
   whatever
}
I forgot that a function can call more than one function! It prevented me from calculating the self time properly.
_________________
Big thanks to everyone who donated for Quake2

#116140 - theli - Mon Jan 22, 2007 11:20 am

o_O erhmm.. can someone release the code you used for profiling?
simonjhall? Puyo?
please? :)

#116142 - simonjhall - Mon Jan 22, 2007 12:17 pm

I keep meaning to but I've been dead busy recently!
I'll try and stick it up here later when I get back from work.
_________________
Big thanks to everyone who donated for Quake2

#116168 - Puyo - Mon Jan 22, 2007 8:42 pm

Quote:
Ah, it looks like you're recording the total amount of time spent in function X and all sub-functions and then printing this time out for each function. (it looks like that anyway - stop me if I'm wrong!)

You knew it, I was doing exactly that =). Except that I thought I was handling all sub-functions. Guess not. Oh well... And printing for every function originally was producing all those extra VCounts. And your test case acts like it should. Thanks for help.

theli:
Sure, you can download example here. It`s probably have a few bugs left & warnings to remove, but I never bothered about `em. Only file that have profiling enabled in example is test.c (see makefile). By default it outputs to file "profile.txt" in root dir. You can remove DUMP define and change OUTPUT define to stdout, to print results on screen.

#116171 - simonjhall - Mon Jan 22, 2007 9:54 pm

Should I post mine too, so we can compare? I'm kinda wary of putting files in my webspace in case my ISP gets arsey over quotas etc... (again)
I guess they are only a few kilobytes in size though...
_________________
Big thanks to everyone who donated for Quake2

#116181 - Puyo - Tue Jan 23, 2007 12:12 am

Ofcourse. I`m interested, wanna check for if results differ. If you wan`t you can upload it to free hosting, but i don`t think that few KB will get you in trouble with ISP.

#116184 - simonjhall - Tue Jan 23, 2007 12:30 am

Ok, I just uploaded it.
http://myweb.tiscali.co.uk/simonhall/cyg-profile.c
http://myweb.tiscali.co.uk/simonhall/cyg-profile.h
http://myweb.tiscali.co.uk/simonhall/resolve.pl

You compile your (ARM-only) regular code with -finstrument-functions -mpoke-function-name. Don't compile the files which handle disk access or anything related to the profiling functions, otherwise you'll get stuck in a loop.

Initialise the profiler with cygprofile_begin().
Start profiling with cygprofile_enable().
Stop profiling with cygprofile_disable().
Write the results to disk with cygprofile_end().
Parse the results (on the PC) with
Code:
./resolve.pl squake /cygdrive/k/CYGLOG.TXT > /cygdrive/k/sorted.csv

...or something like that :-)

These files are tailored for Quake btw, so they won't compile outa the box. I normally run my profiling for 700 frames (nearly the length of the Necropolis demo) then write everything to disk - that's why there's some divide-by-700s in the Perl script!

Oh and thanks go out to the people who wrote the original cyg-profile stuff (before I rewrote all the juicy bits)
_________________
Big thanks to everyone who donated for Quake2

#116364 - Puyo - Wed Jan 24, 2007 8:39 pm

So I did found a lot of bugs in my profiler. Basically because of my misunderstanding of DS hardware & lack of knowledge in C.
1st: few posts ago I was asking about HBlank interrupts, but actually I was thinking about using VBlank, simply because it happens more rarely & you could add 262 HBlanks for 1 VBlank + REG_VCOUNT to measure current time. WRONG. Actually VBlank happens at 192 line & that changes a lot. I could probably use it too, but some checking needs to be done than.
2nd: saving start time of levels, not functions (solves recursivness).

[ random C questions ]
And shouldn`t 'fwrite("name", "w")' erase all data from file?
What this means:
warning: array subscript has type 'char'
for
Code:
// entry is a array of structures & funcName is array of chars
entry[i].funcName[30] = 0; // Null-terminate string

#116369 - simonjhall - Wed Jan 24, 2007 10:05 pm

For the horizontal blank I too was interested in doing the vblank/hblank split thing in order to count the time (eg using VCOUNT), but in the end I just set up an interrupt. When the interrupt occurs it just increases some variable by one.
Yeah, calling a one-line function 15000 times per second isn't too efficient, but I don't really care :-)

And the fwrite thing - shouldn't that be fopen instead? But yeah, it should clear the file and start writing from byte zero. When you save it won't include any of the existing stuff. I think... Or is that "w+"?

And the subscript warning thing - I think it's talking about either 30 or i (prob i). Can't see why it'd care though...
_________________
Big thanks to everyone who donated for Quake2

#139075 - DragonMinded - Sat Sep 01, 2007 4:12 pm

Was this ever going to be properly hosted?
_________________
Enter the mind of the dragon.

http://dragonminded.blogspot.com

Seriously guys, how hard is it to simply TRY something yourself?

#139127 - simonjhall - Sun Sep 02, 2007 1:42 pm

Yeah, I'm not with tiscali any more so I'm not surprised my web space doesn't work!
Here's it hosted on drunkencoders (thanks again to dovoto):
http://quake.drunkencoders.com/profiler/resolve (perl script)
http://quake.drunkencoders.com/profiler/cyg-profile.c
http://quake.drunkencoders.com/profiler/cyg-profile.h

Instructions per the post a few above this. It's still Quakified, but shouldn't be hard to convert to whatever you want to run it with.
NEW STUFF: it now works in THUMB mode, so you're no longer forced to use ARM mode to get it to work.
_________________
Big thanks to everyone who donated for Quake2