#53316 - Nacho - Wed Sep 07, 2005 1:45 pm
Hello there! I'm coding a simple 3D engine in plain C, using VHam as the compiler toolchain. So far I've managed to rasterize an affine-textured triangle that rotates about its own Y-axis in mode 3, but it's very, very slow! That's why I decided to come here, so that you can suggest some optimizations I should look into to speed this thing up a little.
Some information I think you might want to know about the code:
* Everything is done in fixed point arithmetic using a 16.16 format.
* The only place where a division is performed is inside the fixed-point division function:
Code: |
inline FIXP1616
FIXP1616_Div(FIXP1616 n1,FIXP1616 n2)
{
    return ((n1<<8)/(n2>>8));
} /* End FIXP1616_Div() */
|
* The modulus operator is not used at all.
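A side note on the division function above: `(n1<<8)/(n2>>8)` throws away the low 8 bits of the divisor, so any positive divisor below 256 (about 0.0039 in 16.16) becomes a division by zero. A minimal sketch of a wider variant follows, assuming a 64-bit intermediate is acceptable. The name `FIXP1616_DivWide` and the "return 0 on a zero divisor" policy are hypothetical choices for illustration, not part of the original code.

```c
#include <stdint.h>

typedef int32_t FIXP1616;   /* 16.16 fixed point, as described in the post */

/* Hypothetical alternative to FIXP1616_Div: widen the dividend to 64 bits
   so that neither operand loses bits before the divide.
   Returns 0 for a zero divisor (an arbitrary choice made for this sketch). */
static inline FIXP1616 FIXP1616_DivWide(FIXP1616 n1, FIXP1616 n2)
{
    if (n2 == 0)
        return 0;
    /* Multiply instead of shifting so negative values stay well defined. */
    return (FIXP1616)(((int64_t)n1 * 65536) / n2);
}
```

A 64-bit divide is done in software on the GBA's CPU, so this is only reasonable for per-triangle setup work, not for an inner loop.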
All input and suggestions will be highly appreciated. Also, if you need to see certain parts of the code, just ask and I'll post them here.
Thanks in advance for your help!
--Nacho
#53321 - ribrdb2 - Wed Sep 07, 2005 2:47 pm
maybe you should profile it and figure out what needs to be optimized. That's how I usually optimize my 3d engines.
#53341 - Miked0801 - Wed Sep 07, 2005 6:36 pm
The division occurs how often? Usually once per scanline for texturing.
1. Quick fix - use the built-in GBA Div function instead of /. This makes the divide 2x to 3x faster.
2. Consider grabbing an optimized ARM divider instead. Jeff F. has one on his site. That makes it go another 50% faster.
3. Make a HUGE 1/X lookup table and multiply by that instead of dividing. This makes the division a non-factor.
!!!0. Profile your code so you know where the slowdowns are occurring.
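Option 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the table size, the names `recip_table`, `recip_init` and `fix_div_int`, and the restriction to small integer divisors are all hypothetical, and the result is slightly less precise than a true divide because each reciprocal is truncated.

```c
#include <stdint.h>

#define RECIP_TABLE_SIZE 1024   /* hypothetical size; covers divisors 1..1023 */

static int32_t recip_table[RECIP_TABLE_SIZE];   /* 16.16 values of 1/x */

/* Fill the table once at start-up (it could also be precomputed into ROM). */
static void recip_init(void)
{
    recip_table[0] = 0;   /* unused slot: dividing by zero is the caller's bug */
    for (int i = 1; i < RECIP_TABLE_SIZE; i++)
        recip_table[i] = 65536 / i;
}

/* Divide a non-negative 16.16 value by an integer divisor in
   1..RECIP_TABLE_SIZE-1 using one table read and one multiply. */
static int32_t fix_div_int(int32_t n, int divisor)
{
    return (int32_t)(((int64_t)n * recip_table[divisor]) >> 16);
}
```

For example, after `recip_init()`, `fix_div_int(10 << 16, 4)` gives exactly 2.5 in 16.16, while dividing by 3 comes out a few units in the last place below the true quotient.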
#53378 - Royale00 - Thu Sep 08, 2005 1:39 am
I wouldn't recommend mode 3 for any kind of 3D rendering, as you only get one framebuffer in hardware. Depending on how far you want your engine to progress in features, I would suggest mode 4, which provides two framebuffers and is color-indexed. As far as coding goes, use a lookup table for divisions as Mike said, and if you use mode 4 you can get away with writing multiple pixels at once because of the bus width to VRAM. You may also want to unroll loops and code some of them in assembler, but again, it's all about what you want to do with your engine.
#53387 - Nacho - Thu Sep 08, 2005 2:29 am
Thanks for answering my post!
@ ribrdb2: Thanks for your suggestion. I've never used any kind of profiler for GBA, so which one do you suggest?
@ Miked0801: Thanks for your advice on alternatives to the division operator. I've searched for Jeff's optimized ARM divider but the link seems to be broken ( www-test.cirrus.com/en/pubs/ presentation/ep9312pres-epf-1.pdf ). Can you post it here?
@ Royale00: Thanks for your advice. I made a mistake in my original post by saying that I was using mode 3 when actually I'm using mode 4. You said that in mode 4 multiple pixels can be written at once because of the bus width to VRAM. Does the memset function take advantage of that or should I roll out my own implementation?
Thanks again for your replies!
--Nacho
#53388 - crossraleigh - Thu Sep 08, 2005 2:39 am
http://www.peter-teichmann.de/adiv1e.html
_________________
My world is black and white, but if I blink fast enough, I see it in grayscale.
#53485 - Miked0801 - Thu Sep 08, 2005 6:35 pm
Quote: |
Does the memset function take advantage of that or should I roll out my own implementation?
|
Please tell me you are not calling memset to write pixels. Please tell me you are instead doing this yourself in either assembly or hand-optimized C code. If you are using memset, this will be bottleneck #1 to fix. Check the archives as we've posted highly optimized assembler pixel/line drawers in the past.
#53520 - Quirky - Thu Sep 08, 2005 11:19 pm
Nacho wrote: |
I've never used any kind of profiler for GBA, so which one do you suggest?
|
gprof works with VBA. I used it on my own flat shaded poly 3D engine, here is more or less my main.c:
Code: |
#ifdef ENABLE_PROFILING
#include "vba.h"
#endif
int main(void)
{
#ifdef ENABLE_PROFILING
    monstartup(0x0800045c, 0x080155a8);
#endif
    initialise_gba();
    while (1) {
        // intro sequence
        start();
        // everything else..
        main_loop();
#ifdef ENABLE_PROFILING
        moncleanup();
#endif
    }
    return 0;
}
|
Then I add -pg -DENABLE_PROFILING to my CFLAGS when I want to profile the code. The code addresses for profiling are project dependent; take them from a map/objdump file. Run make clean all and you're ready to go. Play the game normally on VBA, and make sure it passes through moncleanup() when you want to end profiling. For my game, I exit the main_loop() function when it is game over. Get killed == end profiling. Simple.
You need to link in libvba from http://vba.ngemu.com/downloads.shtml
The output is gmon.out, which contains the profiling data. Run gprof mygame.elf and it produces a big load of info on calls, time taken and so on. From here you can see what takes the longest. In my case a nasty C scanline routine was a bottleneck, so I recoded it in ARM assembly with an unrolled str loop and placed it in IWRAM.
That (unfortunately) had the effect of moving the routine out of the memory range profiled, but the noticeable speed up made the decision justified. Even if you don't fancy coding in assembler, the profiling will help to see which routines are called most often and then you can decide to either ditch some of them if not needed or improve them if they are vital. HTH.
#53544 - Nacho - Fri Sep 09, 2005 2:17 am
Thanks for the replies!
@ crossraleigh: Thanks for the link! I have already downloaded a set of optimized functions from here: http://www.gbadev.org/index.php?ID=109 but thanks anyway for the link!
@ Miked0801: As a matter of fact, yes, I'm using memset in the flat shaded rendering functions. I understand that it could be optimized by using ASM, but how can I optimize it in C? In any case, the function that renders a textured triangle works on a pixel-by-pixel basis (because of the interpolation), so I should look for that function's bottlenecks elsewhere. Nonetheless, I'm interested in optimizing the memset call so I can speed up the flat shaded renderer. Thanks for your suggestion; I have been checking the archives and found the SWI library I mentioned above. I'll check during the weekend for the pixel and line drawers. Thanks again for your help!
@ Quirky: Thanks for telling me about gprof and how to use it. I'll try it over the weekend and let you know the results of the profiling tests. Do you have a binary file with a demo of your flat shaded 3D engine?
Thanks again for all your help!
--Nacho
#53555 - DekuTree64 - Fri Sep 09, 2005 4:19 am
Generally you'll want to do straight pointer arithmetic instead of function calls in your inner loops. Something like:
Code: |
u16 *lineBuffer = screen + yTop*SCREEN_WIDTH;
u16 *lineBufferEnd = screen + yBottom*SCREEN_WIDTH;
while(lineBuffer < lineBufferEnd)
{
    u16 *dest = lineBuffer + FIXED_TO_INT(xStart);
    u16 *end = lineBuffer + FIXED_TO_INT(xEnd);
    while (dest < end)
        *dest++ = color;
    xStart += xStartInc;
    xEnd += xEndInc;
    lineBuffer += SCREEN_WIDTH;
} |
Oh, and you only need to divide once per triangle for affine texturing, not every scanline. The u and v gradients will (theoretically) always come out to be the same on each scanline.
Another nice thing about that is then you don't even need to interpolate the u/v values down the right side of the triangle, because they're not needed. You only need the left edge starting values, and the constant gradients to add after plotting each pixel.
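The per-triangle gradient setup described above can be sketched like this. It is a minimal illustration, assuming integer screen and texel coordinates and 16.16 outputs; the `Vertex` layout and the name `affine_gradients` are hypothetical. The two divides share the same denominator, so they could be replaced by a single reciprocal and two multiplies.

```c
#include <stdint.h>

typedef struct { int x, y, u, v; } Vertex;   /* hypothetical vertex layout */

/* Computes the constant horizontal texture gradients du/dx and dv/dx
   (16.16 fixed point) for an affine-mapped triangle.
   Returns 0 if the triangle is degenerate (zero area), 1 otherwise. */
static int affine_gradients(const Vertex *a, const Vertex *b, const Vertex *c,
                            int32_t *dudx, int32_t *dvdx)
{
    int dx1 = b->x - a->x, dy1 = b->y - a->y;
    int dx2 = c->x - a->x, dy2 = c->y - a->y;
    int denom = dx1 * dy2 - dx2 * dy1;   /* twice the signed area */

    if (denom == 0)
        return 0;

    /* Solve u = u0 + dudx*(x - x0) + dudy*(y - y0) for dudx (same for v). */
    int64_t unum = (int64_t)(b->u - a->u) * dy2 - (int64_t)(c->u - a->u) * dy1;
    int64_t vnum = (int64_t)(b->v - a->v) * dy2 - (int64_t)(c->v - a->v) * dy1;

    *dudx = (int32_t)(unum * 65536 / denom);
    *dvdx = (int32_t)(vnum * 65536 / denom);
    return 1;
}
```

With these two values computed once, the scanline loop only needs the left-edge u and v for each line and then adds `dudx` and `dvdx` after every pixel.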
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#53682 - Nacho - Sat Sep 10, 2005 4:23 pm
Quirky wrote: |
The code addresses for profiling are project dependent, take them from a map/objdump file. |
How do I create the map/objdump file? I've been searching the forum but couldn't find the answer (there were many questions regarding the creation of a map file for the Nintendo DS, though). Can I determine those addresses with the memory viewer included in Visual Boy Advance?
Thanks in advance!
--Nacho
#53694 - tepples - Sat Sep 10, 2005 6:36 pm
The line of your batch file or makefile that calls objcopy will generally have the form 'arm-elf-objcopy -O binary game.elf game.gba'. Find the name of the elf file and then run
arm-elf-nm -n game.elf
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#53708 - Nacho - Sat Sep 10, 2005 9:28 pm
Tepples, thanks for replying. I'm a little bit lost with all this stuff, so I'm going to bother you again with my questions. I'm using VHAM as a compiler toolchain, so the elf file is automatically created whenever I compile the project. Unfortunately, arm-elf-nm is not included in VHAM's installation, but I managed to get it from DevKitAdvance. I ran the command arm-elf-nm -n Raster3D.elf and it displayed a very long list of memory addresses and names but the problem is that I can only see the last part because Win32's console allows me to scroll only a few lines above. How can I solve this? Is it possible to compile the project so that I can see the memory layout in Visual Boy Advance?
Thanks again for your help!
--Nacho
#53726 - tepples - Sun Sep 11, 2005 7:31 am
Nacho wrote: |
I ran the command arm-elf-nm -n Raster3D.elf and it displayed a very long list of memory addresses and names but the problem is that I can only see the last part because Win32's console allows me to scroll only a few lines above. How can I solve this? |
Read this page, especially "Redirecting output".
#53741 - Quirky - Sun Sep 11, 2005 11:09 am
I use -Wl,-Map,game.map at link time to create the file game.map. It doesn't actually matter that much, just so long as you get your code that you want to profile in the range. You can probably make a decent guess from the size of your gba file:
ls -l myfile.gba | awk '{printf "0x%x\n",0x08000000+$5}'
#53752 - Nacho - Sun Sep 11, 2005 3:35 pm
tepples, thanks for the link. I redirected the output to a file named dump.out and found out that the lower and upper memory values are 0x08000218 and 0x0800A780. Then I ran gprof Raster3D.elf from the command prompt and got the following error message:
Code: |
BFD: Dwarf Error: Invalid or unhandled FORM value: 14.
BFD: Dwarf Error: Invalid or unhandled FORM value: 14.
3 [main] gprof 3012 handle_exceptions: Exception: STATUS_ACCESS_VIOLATION
677 [main] gprof 3012 open_stackdumpfile: Dumping stack trace to gprof.exe.stackdump
|
Maybe I haven't written the right addresses. Here's what I did to find them, so that you can point out anything I have done wrong:
* I looked for main's address in dump.out:
* I scrolled down the file till I found the last function from the last compiled file (which is vba.s):
Code: |
0800a774 T vbalog
0800a780 T ham_EmptyInt
|
Are the addresses ok? I also get the following message from VBA whenever I try to run Raster3D.gba:
Unsupported BIOS function fe called from 0800a776. A BIOS file is needed in order to get the correct behaviour
After that, I click the OK button and the emulation starts. Thanks again for all your help!!
--Nacho
EDIT: I think it's important to say that all the functions are in EWRAM.
#53767 - Quirky - Sun Sep 11, 2005 6:54 pm
Nacho wrote: |
tepples, thanks for the link. I redirected the output to a file named dump.out and found out that the lower and upper memory values are 0x08000218 and 0x0800A780.
EDIT: I think it's important to say that all the functions are in EWRAM. |
You need to specify addresses in EWRAM for your profile. 0x02000000 to 0x02040000 ought to do it!
As for the errors: are you using devkitARM with gcc 4.0?
#53788 - Nacho - Sun Sep 11, 2005 11:31 pm
Quirky wrote: |
You need to specify addresses in EWRAM for your profile. 0x02000000 to 0x02040000 ought to do it! |
Sorry, I made a mistake in my previous post. I've been reading the CowBite specs and found out that 0x08000000 is the beginning of the GAME PAK ROM area. I (wrongly) assumed that, by default, functions were automatically placed in EWRAM by gcc. Nonetheless, now I'm completely lost! Which parameters should I pass to monstartup()? The GAME PAK ROM area? Or should I pass EWRAM?
Quirky wrote: |
As for the errors: are you using devkitARM with gcc 4.0? |
I'm using VHAM, but only as a compiler toolchain. I downloaded the latest release of devkitARM because I needed arm-elf-nm.exe.
Quirky, thanks again for your help!
--Nacho
#53795 - tepples - Mon Sep 12, 2005 1:33 am
Excuse me if I haven't understood the whole thread, but if you're trying to make a cart-based game, why exactly are you running code other than a savegame driver from EWRAM instead of ROM? Are you trying to make a multiboot version? Or are you trying to save battery, or allow for pak-swapping?
#53800 - Nacho - Mon Sep 12, 2005 2:08 am
tepples wrote: |
Excuse me if I haven't understood the whole thread, but if you're trying to make a cart-based game, why exactly are you running code other than a savegame driver from EWRAM instead of ROM? Are you trying to make a multiboot version? Or are you trying to save battery, or allow for pak-swapping? |
No, no, I'm not trying to run the code from EWRAM. I had the wrong idea that code that wasn't placed in IWRAM went straight into EWRAM and that EWRAM began at 0x08000000. That is, I thought that the program's code could only be placed in IWRAM or EWRAM. After reading CowBite's specs I found out that I was wrong. Now, as I stated in my last post, I would like to know which address range I should pass to monstartup() so that I can start profiling my code.
Thanks for all your help!
--Nacho
#53808 - tepples - Mon Sep 12, 2005 3:27 am
To catch all code running in ROM use 0x08000000-0x09FFFFFF.
#53809 - Nacho - Mon Sep 12, 2005 3:34 am
tepples wrote: |
To catch all code running in ROM use 0x08000000-0x09FFFFFF. |
I've tried it, but I get the same error when executing gprof. Do you mind if I send you the elf file?
--Nacho
#53810 - tepples - Mon Sep 12, 2005 3:39 am
I don't know anything about gprof, so please take your gprof questions somewhere else. When I want to profile something that runs in a tight loop, I use internal timers such as VCOUNT.
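A rough sketch of the timer approach, for readers who have not tried it: read the scanline counter before and after the routine and take the difference. On hardware the counter is the REG_VCOUNT register at 0x04000006, which counts 0..227 every frame; the helper name `scanlines_elapsed` is hypothetical, and the measurement is only meaningful if the routine takes less than one frame.

```c
#include <stdint.h>

#define LINES_PER_FRAME 228   /* 160 visible + 68 vblank scanlines on the GBA */

/* On hardware the two readings would come from
   (*(volatile uint16_t *)0x04000006), i.e. REG_VCOUNT. */

/* Scanlines elapsed between two VCOUNT readings, allowing for one wrap
   from 227 back to 0. Avoids the modulus operator on purpose. */
static unsigned scanlines_elapsed(unsigned start, unsigned end)
{
    int delta = (int)end - (int)start;
    if (delta < 0)
        delta += LINES_PER_FRAME;
    return (unsigned)delta;
}
```

Typical use would be to latch VCOUNT into `start`, call the rasterizer, latch it again into `end`, and print or plot `scanlines_elapsed(start, end)`; for finer resolution the hardware timers could be used the same way.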
#53811 - Nacho - Mon Sep 12, 2005 3:52 am
tepples wrote: |
I don't know anything about gprof, so please take your gprof questions somewhere else. When I want to profile something that runs in a tight loop, I use internal timers such as VCOUNT. |
Never mind, thanks anyway for all your help!
--Nacho
#53892 - Nacho - Tue Sep 13, 2005 2:56 am
DekuTree64: Seems I've been paying too much attention to gprof and I forgot to answer your post. Sorry for that.
DekuTree64 wrote: |
Generally you'll want to do straight pointer arithmetic instead of function calls in your inner loops. |
Yes, I'm doing pointer arithmetic in the affine texture mapper, but is a loop faster than a call to memset()?
DekuTree64 wrote: |
Oh, and you only need to divide once per triangle for affine texturing, not every scanline. The u and v gradients will (theoretically) always come out to be the same on each scanline. |
Yes, I'm only performing divisions at the very beginning of the function to find the gradients. The only operation performed inside the loop is addition.
DekuTree64 wrote: |
Another nice thing about that is then you don't even need to interpolate the u/v values down the right side of the triangle, because they're not needed. You only need the left edge starting values, and the constant gradients to add after plotting each pixel. |
That's a cool optimization! I'll rewrite the renderer using that optimization whenever I get the time!
Thanks again for your input! Hah, not only do you help me out at gamedev.net, but also here =P
--Nacho
#53997 - kusma - Wed Sep 14, 2005 1:01 pm
a couple of easy yet effective optimizations/cheats here... not sure how you do things, but my routines got a nice speedup from these.
- texture-map in 2x1, but rasterize 1x1 ;)
- interpolate u and v in one single register, in the format UUuuVVvv
that way you can reorder them to actual texture-offsets by doing (uv & 0xFF00) | (uv >> 24). this compiles to two instructions and only one addition. there are some minor issues with u overflowing into v on wrapping (and the opposite), but as this is only one fractional bit, this is not noticeable (hey, it's one 256th of a texel).
- unroll your filling-loop heavily. a Duff's device is nice to speed things up while still having dynamic fill-length. something like this:
Code: |
#define ITERATION /* the code to fill one pixel here*/
register unsigned c = counter >> 4;
switch (counter&15)
{
    do {
        ITERATION;
    case 15: ITERATION;
    case 14: ITERATION;
    case 13: ITERATION;
    case 12: ITERATION;
    case 11: ITERATION;
    case 10: ITERATION;
    case 9: ITERATION;
    case 8: ITERATION;
    case 7: ITERATION;
    case 6: ITERATION;
    case 5: ITERATION;
    case 4: ITERATION;
    case 3: ITERATION;
    case 2: ITERATION;
    case 1: ITERATION;
    case 0:;
    } while (c--);
}
#undef ITERATION
|
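The packed u/v idea above can be sketched as follows. This is an illustration only, assuming 8.8 fixed-point coordinates and a 256x256 texture with one byte per texel; the function names are hypothetical, and `dest` is assumed to be ordinary RAM, since byte writes to VRAM do not behave like this.

```c
#include <stdint.h>

/* Pack an 8.8 u and an 8.8 v into one word laid out as UUuuVVvv. */
static uint32_t pack_uv(uint32_t u88, uint32_t v88)
{
    return ((u88 & 0xFFFF) << 16) | (v88 & 0xFFFF);
}

/* Texel offset into a 256x256 texture: integer V selects the row,
   integer U selects the column. */
static uint32_t uv_offset(uint32_t uv)
{
    return (uv & 0xFF00) | (uv >> 24);
}

/* Copy `count` texels along a span. `duv` is the packed per-pixel
   gradient, so a single add steps u and v together. */
static void texture_span(uint8_t *dest, const uint8_t *texture,
                         uint32_t uv, uint32_t duv, int count)
{
    while (count-- > 0) {
        *dest++ = texture[uv_offset(uv)];
        uv += duv;
    }
}
```

As noted in the post, a carry out of the v fraction chain can leak into the low bit of the u fraction, which costs at most 1/256 of a texel.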
#54073 - yamaneko - Thu Sep 15, 2005 10:04 am
kusma, with this texture routine, how many polygons, filling the screen in mode 4, can you display at 30 fps?
_________________
山猫
#54076 - kusma - Thu Sep 15, 2005 10:47 am
that i'm unsure of. it depends on a lot of factors, like polygon size and width. i can benchmark my filler, but i don't have time for that now ;)
#54564 - Nacho - Tue Sep 20, 2005 1:18 pm
kusma wrote: |
- texture-map in 2x1, but rasterize 1x1 ;) |
You mean I should plot every texel twice so that the inner loop performs fewer calculations?
kusma wrote: |
- unroll your filling-loop heavily. a Duff's device is nice to speed things up while still having dynamic fill-length. |
Great! Thanks! I didn't even know about the existence of this technique. I'll look into it, thanks!
#54661 - kusma - Wed Sep 21, 2005 11:11 am
Nacho wrote: |
You mean I should plot every texel twice so that the inner loop performs fewer calculations?
|
well, when you do an 8bit write to vram, it ends up as a 16bit write with both high and low byte set to the 8bit value you wrote. abusing this behaviour is a nice way to draw two pixels at the same time in mode 4 ;)
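For contrast with the 2x1 trick, here is a minimal sketch of what a proper single-pixel write in mode 4 has to do because of that behaviour: read the 16-bit word, replace one byte, and write the word back. The name `m4_plot` and the framebuffer pointer parameter are hypothetical; the layout (240 pixels per line, even x in the low byte) is the usual little-endian mode 4 arrangement.

```c
#include <stdint.h>

#define M4_WIDTH 240   /* mode 4 is 240x160 with one byte per pixel */

/* Plot one 8-bit pixel using a 16-bit read-modify-write, since byte
   writes to VRAM would set both pixels of the halfword.
   `fb` points at the start of the framebuffer, viewed as halfwords. */
static void m4_plot(volatile uint16_t *fb, int x, int y, uint8_t color)
{
    volatile uint16_t *p = &fb[(y * M4_WIDTH + x) >> 1];

    if (x & 1)
        *p = (uint16_t)((*p & 0x00FF) | (color << 8));   /* odd x: high byte */
    else
        *p = (uint16_t)((*p & 0xFF00) | color);          /* even x: low byte */
}
```

The read-modify-write is exactly the cost the 2x1 trick avoids, which is why it is attractive for texture mapping at half horizontal resolution.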
#54769 - Nacho - Thu Sep 22, 2005 1:34 pm
kusma wrote: |
well, when you do an 8bit write to vram, it ends up as a 16bit write with both high and low byte set to the 8bit value you wrote. abusing this behaviour is a nice way to draw two pixels at the same time in mode 4 ;) |
Abusive, yet effective! :) Thanks for the trick!
I've changed my filler's inner loop (mode 4) so that it makes use of Duff's device. Here's the new version:
Code: |
for(iLoopY=iY1;iLoopY<=iY3;iLoopY++,lpDestBuffer=lpAux) {
    iCurrX = iXS = FIXP1616_WHOLE_PART(fixpXS);
    iXE = FIXP1616_WHOLE_PART(fixpXE);
    lpAux = lpDestBuffer;
    lpDestBuffer += (iXS>>1);
    iVar = iXE - iXS + 1;
    n = (iVar + 7) >> 3;
    if( (IS_ODD(iCurrX) && IS_ODD(iVar)) ||
        (IS_EVEN(iCurrX) && IS_EVEN(iVar)) ) {
        switch((iVar) & 0x07) {
        case 0: do{ PUT_EVEN_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 7:     PUT_ODD_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 6:     PUT_EVEN_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 5:     PUT_ODD_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 4:     PUT_EVEN_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 3:     PUT_ODD_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 2:     PUT_EVEN_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 1:     PUT_ODD_PIXEL_M4(lpDestBuffer,u8ColorEntry);
                } while(--n>0);
        } /* End switch */
    } /* End if */
    else {
        switch((iVar) & 0x07) {
        case 0: do{ PUT_ODD_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 7:     PUT_EVEN_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 6:     PUT_ODD_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 5:     PUT_EVEN_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 4:     PUT_ODD_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 3:     PUT_EVEN_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 2:     PUT_ODD_PIXEL_M4(lpDestBuffer,u8ColorEntry);
        case 1:     PUT_EVEN_PIXEL_M4(lpDestBuffer,u8ColorEntry);
                } while(--n>0);
        } /* End switch */
    } /* End else */
    fixpXS += fixpDXLeft;
    fixpXE += fixpDXRight;
    lpAux += u8ScreenWidthDiv2;
} /* End for iLoopY */
|
Code: |
#define PUT_EVEN_PIXEL_M4(lpDestBuffer,u8ColorEntry) \
*lpDestBuffer |= u8ColorEntry;
#define PUT_ODD_PIXEL_M4(lpDestBuffer,u8ColorEntry) \
*(lpDestBuffer++) |= (u8ColorEntry << 8);
|
As you can see from the previous snippet, I divided the code into two cases, depending on whether the first pixel to be plotted is even or odd. Is it possible to merge both cases into one? I'm asking because I spent a few hours thinking about other approaches but I couldn't find a way to do it. Thanks!
--Nacho
#54797 - Miked0801 - Thu Sep 22, 2005 6:09 pm
Here:
fillLine()
{
    if(firstPixelAddr & 0x01)
    {
        PUT_ODD_PIXEL       // leading odd pixel
        numtofill--;
    }
    while(numtofill >= 2)
    {
        write both pixels   // one 16-bit write
        numtofill -= 2;
    }
    if(numtofill)
    {
        writeLast Pixel;    // trailing even pixel
    }
}
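The outline above could look roughly like this in C. It is a minimal sketch under stated assumptions: the name `m4_fill_span16` and its parameters are hypothetical, `line` is assumed to point at the first halfword of a mode 4 scanline, and the partial pixels overwrite the byte they cover rather than OR-ing into it.

```c
#include <stdint.h>

/* Fill pixels x0..x1 (inclusive) on one mode 4 scanline with 16-bit writes.
   Handles an odd first pixel and a leftover last pixel separately, so the
   middle of the span needs no even/odd case distinction. */
static void m4_fill_span16(volatile uint16_t *line, int x0, int x1,
                           uint8_t color)
{
    uint16_t pair = (uint16_t)(color | (color << 8));
    volatile uint16_t *p = line + (x0 >> 1);
    int count = x1 - x0 + 1;

    if (count <= 0)
        return;

    if (x0 & 1) {            /* odd start: patch only the high byte */
        *p = (uint16_t)((*p & 0x00FF) | (color << 8));
        p++;
        count--;
    }
    while (count >= 2) {     /* aligned middle: two pixels per write */
        *p++ = pair;
        count -= 2;
    }
    if (count) {             /* one pixel left: patch only the low byte */
        *p = (uint16_t)((*p & 0xFF00) | color);
    }
}
```

The middle loop is the part worth unrolling (for instance with Duff's device), since after the head fix-up every write is a full halfword.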
#54853 - Nacho - Fri Sep 23, 2005 2:19 am
Miked0801, thanks for your post, but I'm not sure how your pseudocode solves the problem I mentioned. I had to write two different cases since the Duff's device cares not only about whether the first pixel is even or odd but also about the remaining number of pixels. Do you mind clarifying your pseudocode a bit? Thanks again!
--Nacho
#54911 - kusma - Fri Sep 23, 2005 3:50 pm
Nacho wrote: |
Code: |
#define PUT_EVEN_PIXEL_M4(lpDestBuffer,u8ColorEntry) \
*lpDestBuffer |= u8ColorEntry;
#define PUT_ODD_PIXEL_M4(lpDestBuffer,u8ColorEntry) \
*(lpDestBuffer++) |= (u8ColorEntry << 8);
|
|
this looks like flatshading to me. in that case, i'd rather do 32bit writes directly.
#54918 - Nacho - Fri Sep 23, 2005 5:20 pm
It is flatshading, indeed! But if I do 32-bit writes, how do I cleanly handle the special cases where the fill length is less than 4 pixels?
--Nacho
Edit: I've checked my previous posts and found out that I didn't say I was referring to the flat shader. Sorry :(
#54920 - Miked0801 - Fri Sep 23, 2005 5:38 pm
if(VramAddr & 0x03)
{
// Write individual pixels
}
else if(remaining run length is >= 4)
{
// Write pixels in groups of 4.
}
else
{
// Write last pixels
}
For smaller fills (say less than about 8 pixels across) filling in groups of 2 will be more efficient. We left ours at 2 pixels as we had lots of smaller polys as opposed to a few large ones.
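One way to handle the short and unaligned cases with 32-bit writes is to mask the first and last words instead of looping over individual pixels. Note that this is a swapped-in variant, not the pixel-by-pixel head and tail described above; the name `m4_fill_span32` is hypothetical, and `line` is assumed to be the 4-byte-aligned start of a mode 4 scanline with pixel k of each word in bits 8k..8k+7.

```c
#include <stdint.h>

/* Fill pixels x0..x1 (inclusive) on one mode 4 scanline with 32-bit writes.
   Partial first/last words are merged with the existing contents via masks,
   so spans shorter than 4 pixels need no special loop. */
static void m4_fill_span32(volatile uint32_t *line, int x0, int x1,
                           uint8_t color)
{
    if (x1 < x0)
        return;

    uint32_t quad = color * 0x01010101u;          /* color in all four bytes */
    volatile uint32_t *p = line + (x0 >> 2);
    volatile uint32_t *last = line + (x1 >> 2);
    uint32_t head_mask = 0xFFFFFFFFu << ((x0 & 3) * 8);        /* pixels >= x0 */
    uint32_t tail_mask = 0xFFFFFFFFu >> ((3 - (x1 & 3)) * 8);  /* pixels <= x1 */

    if (p == last) {                  /* whole span lies inside one word */
        uint32_t m = head_mask & tail_mask;
        *p = (*p & ~m) | (quad & m);
        return;
    }
    *p = (*p & ~head_mask) | (quad & head_mask);  /* partial first word */
    p++;
    while (p < last)                  /* full words in the middle */
        *p++ = quad;
    *p = (*p & ~tail_mask) | (quad & tail_mask);  /* partial last word */
}
```

Whether this beats the 16-bit version depends on span length, in line with the remark above that groups of 2 can win for narrow polygons; profiling both on typical scenes would settle it.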
#55311 - Nacho - Tue Sep 27, 2005 6:04 pm
Ok, I'll try it during the weekend (this is my mid-term exams week) and let you know about the results.
Thanks again for your input!
--Nacho
#56392 - Nacho - Sat Oct 08, 2005 12:32 pm
I've modified my filler to take advantage of 32-bit writes. I'm posting the inner loop here in case somebody needs it someday:
Code: |
for(iLoopY=iY1;iLoopY<=iY3;iLoopY++,lpusDestBuffer=lpusAuxBuffer,
    lpuiDestBuffer=lpuiAuxBuffer) {
    iXS = FIXP1616_WHOLE_PART(fixpXS);
    iXE = FIXP1616_WHOLE_PART(fixpXE);
    uiNPixelsToWrite = iXE - iXS + 1;
    if(MOD_4(iXS)) {
        lpusDestBuffer += (iXS>>1);
        do {
            PUT_PIXEL_M4(lpusDestBuffer,u8ColorEntry,iXS);
            if(IS_ODD(iXS)) lpusDestBuffer++;
            iXS++; uiNPixelsToWrite--;
        } while(uiNPixelsToWrite && MOD_4(iXS));
    } /* End if */
    if(uiNPixelsToWrite >= 4) {
        lpuiDestBuffer += (iXS>>2);
        do {
            *lpuiDestBuffer++ = uiColorEntry;
            iXS += 4; uiNPixelsToWrite -= 4;
        } while(uiNPixelsToWrite >= 4);
    } /* End if */
    if(uiNPixelsToWrite) {
        lpusDestBuffer = lpusAuxBuffer + (iXS>>1);
        do {
            PUT_PIXEL_M4(lpusDestBuffer,u8ColorEntry,iXS);
            if(IS_ODD(iXS)) lpusDestBuffer++;
            iXS++; uiNPixelsToWrite--;
        } while(uiNPixelsToWrite);
    } /* End if */
    fixpXS += fixpDXLeft;
    fixpXE += fixpDXRight;
    lpusAuxBuffer += uiScreenWidthDiv2;
    lpuiAuxBuffer += uiScreenWidthDiv4;
} /* End for iLoopY */
|
The macros are defined as:
Code: |
#define PUT_PIXEL_M4(lpusDestBuffer,u8ColorEntry,x) \
*lpusDestBuffer |= IS_EVEN(x) ? (u8ColorEntry) : ((u8ColorEntry) << 8);
#define MOD_4(a) ((a) & 0x3)
|
Thanks again for all your help, I'm learning a lot here!
--Nacho
#56669 - Miked0801 - Mon Oct 10, 2005 3:47 pm
What exactly is this code trying to do? There's no comments and it's playing with so many different variable names that I am a bit confused...
#56744 - Nacho - Tue Oct 11, 2005 12:58 am
Okay, I'll clarify a bit. The code above is the inner loop of my flat triangle filler function, and it rasterizes flat-bottom triangles.
* iY1 and iY3 specify the lines where the rasterization begins and ends, respectively.
* lpusDestBuffer and lpuiDestBuffer are used for 16-bit and 32-bit writes, respectively. lpusAuxBuffer and lpuiAuxBuffer store the beginning of the line currently being rasterized (they're incremented at the bottom of the loop's body).
* iXS and iXE specify the columns where the rasterization begins and ends, respectively. The data is stored in fixed point format in fixpXS and fixpXE, and their integral parts are retrieved by FIXP1616_WHOLE_PART().
Code: |
if(MOD_4(iXS)) {
    lpusDestBuffer += (iXS>>1);
    do {
        PUT_PIXEL_M4(lpusDestBuffer,u8ColorEntry,iXS);
        if(IS_ODD(iXS)) lpusDestBuffer++;
        iXS++; uiNPixelsToWrite--;
    } while(uiNPixelsToWrite && MOD_4(iXS));
} /* End if */
|
The previous snippet deals with those lines that don?t start in a "multiple of 4" pixel. The 16 bit writes pointer is used.
Code: |
if(uiNPixelsToWrite >= 4) {
    lpuiDestBuffer += (iXS>>2);
    do {
        *lpuiDestBuffer++ = uiColorEntry;
        iXS += 4; uiNPixelsToWrite -= 4;
    } while(uiNPixelsToWrite >= 4);
} /* End if */
|
The previous piece of code writes 4 bytes at a time as long as at least 4 pixels remain to be written.
Code: |
if(uiNPixelsToWrite) {
    lpusDestBuffer = lpusAuxBuffer + (iXS>>1);
    do {
        PUT_PIXEL_M4(lpusDestBuffer,u8ColorEntry,iXS);
        if(IS_ODD(iXS)) lpusDestBuffer++;
        iXS++; uiNPixelsToWrite--;
    } while(uiNPixelsToWrite);
} /* End if */
|
This snippet deals with the remaining pixels (3 pixels at most).
Code: |
    fixpXS += fixpDXLeft;
    fixpXE += fixpDXRight;
    lpusAuxBuffer += uiScreenWidthDiv2;
    lpuiAuxBuffer += uiScreenWidthDiv4;
} /* End for iLoopY */
|
In this last snippet the start and finish points are incremented with the horizontal gradient and the screen pointers are moved to the following scanline.
Hope this has clarified things a bit!
--Nacho