gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies.

DS development > Various 3D engine questions (was: 3D engine lagging?)

#164682 - Echo49 - Thu Nov 13, 2008 9:27 pm

Code:
   queue<object*> buffer;
   FillBuffer(buffer);
   
   list<object*> drawing;
   
   u32 startTime = clock.Time();
   
   while(1)
   {
      glPushMatrix();
      
      /* magic starts here */
      while (buffer.size() > 0 && clock.Time() - startTime >= buffer.front()->GetTime())
      {
         buffer.front()->SetDraw(true);
         drawing.push_back(buffer.front());
         buffer.pop();
      }
      
      for (list<object*>::iterator it = drawing.begin(); it != drawing.end(); ++it)
         (*it)->Draw();
      
      while (drawing.size() > 0 && drawing.front()->GetDraw() == false)
         drawing.pop_front();
      
      /* magic finishes here */
      
      glPopMatrix(1);
      
      iprintf("\x1b[22;0H%i/%i      ", drawing.size(), buffer.size());
      iprintf("\x1b[23;0HTIME:%i",clock.Time());
      iprintf("\x1b[23;20HVCOUNT:%i  ", REG_VCOUNT);
      
      swiWaitForVBlank();   
      glFlush(0);
      
      //increment gameclock
      clock.Step();
   }


This is the main loop of my code. When the program is idling, REG_VCOUNT reads 197 (192 if I take out the iprintfs). However, if REG_VCOUNT exceeds about 205 (about 5-6 objects being drawn), the game starts lagging, as if it's taking 2 frames instead of 1 to do the drawing.

The drawing is only a single quad per object (4 vertices), so I can't be exceeding any sort of vertex limit.

In the code, the member accessed by SetDraw() and GetDraw() is automatically set to false after it has finished drawing, about 3000ms per object.

Does anyone know why this is happening, and how I can fix it?


Last edited by Echo49 on Fri Nov 14, 2008 12:15 am; edited 1 time in total

#164683 - elhobbs - Thu Nov 13, 2008 10:06 pm

There is no need to use
Code:
swiWaitForVBlank();
If your code takes more than one frame to put new objects into the graphics pipeline, it will spin waiting for the next VBlank.

#164684 - DekuTree64 - Thu Nov 13, 2008 10:19 pm

It's still a good idea to wait for VBlank, since you'll most likely be doing some 2D things on the other screen, and that generally needs more careful timing to avoid messing with VRAM for the next frame before the previous one has even finished drawing.

The problem is that the flush should come before the wait. After you send a flush command, it doesn't actually execute until the start of VBlank. So if you send it right after the start of VBlank, it will wait all the way until the next one.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
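
A rough way to see why the order matters, assuming (as described above) that a flush only takes effect at the start of VBlank, and assuming the usual DS timing of 263 scanlines per frame with VBlank starting at line 192. linesUntilSwap is a hypothetical helper for illustration, not a libnds function; it only counts how long a flush issued at a given REG_VCOUNT value would sit waiting.

```cpp
#include <cassert>

// DS LCD timing: 192 visible scanlines + 71 VBlank lines = 263 per frame.
const int kVBlankStart = 192;
const int kTotalLines  = 263;

// Hypothetical helper: number of scanlines a flush issued on line `vcount`
// waits until the next VBlank start, which is when it can take effect.
// Assumption: a flush issued on line 192 itself has already missed that VBlank.
int linesUntilSwap(int vcount)
{
    assert(vcount >= 0 && vcount < kTotalLines);
    int delta = kVBlankStart - vcount;
    if (delta <= 0)
        delta += kTotalLines;   // VBlank already started: wait for the next one
    return delta;
}
```

Under these assumptions, a flush sent just before the wait (say on line 190) is picked up 2 lines later, while one sent right after swiWaitForVBlank() returns (say on line 193) waits 262 lines, which is nearly a whole frame.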

#164685 - Echo49 - Thu Nov 13, 2008 10:23 pm

Thanks, that solved it.

I'm curious though, if glFlush() waits for VBlank, doesn't that make swiWaitForVBlank() redundant?

#164687 - DekuTree64 - Thu Nov 13, 2008 10:37 pm

It doesn't make the CPU wait for VBlank, it just sits in the 3D FIFO until VBlank, while the CPU moves on with whatever it was doing. But generally you want the CPU to wait for VBlank too.

The stalling thing elhobbs mentioned comes up because the flush command sitting in the FIFO prevents any other commands from being processed until it finishes. So if you send a bunch more commands like for rendering a model, the FIFO will fill up. And when the CPU tries to write to the FIFO while it's full, the write stalls until space opens up for it.

If you're only doing 3D, a stall would have basically the same effect as a wait for VBlank, but it's generally not a good idea since I don't think the CPU can process interrupts like HBlank and timers while stalled, but it can during swiWaitForVBlank.
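
The stall described above can be sketched with a toy model. This is not the real hardware interface: CommandFifo, Cmd and the tiny capacity used in the example are assumptions for illustration (the real FIFO is much larger). It only shows how a pending flush blocks everything queued behind it until VBlank, and how writes start failing once the queue is full.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

enum class Cmd { Draw, Flush };

// Toy model of a command FIFO; not the real hardware interface.
class CommandFifo
{
public:
    explicit CommandFifo(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
    }

    // Returns false where the real CPU write would stall (FIFO full).
    bool tryWrite(Cmd c)
    {
        if (queue_.size() >= capacity_)
            return false;
        queue_.push_back(c);
        return true;
    }

    // Consume queued commands. A Flush at the head blocks everything
    // behind it until VBlank; at most one Flush executes per VBlank.
    void process(bool inVBlank)
    {
        while (!queue_.empty())
        {
            if (queue_.front() == Cmd::Flush)
            {
                if (!inVBlank)
                    return;
                inVBlank = false;
            }
            queue_.pop_front();
        }
    }

    std::size_t pending() const { return queue_.size(); }

private:
    std::size_t capacity_;
    std::deque<Cmd> queue_;
};
```

With a capacity of 4, a flush followed by three draw commands fills the queue; the next write fails (the point where the CPU would stall) until process(true) models the VBlank and drains it.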

#164688 - Echo49 - Thu Nov 13, 2008 11:30 pm

Thanks for the explanation.

I have another question. What is the difference between each of the texture coordinates transformation modes?

ie.
TEXGEN_OFF
TEXGEN_TEXCOORD
TEXGEN_NORMAL
TEXGEN_POSITION

One other thing: although No$gba emulates it fine, on my DS some or all of my textures (I can't tell, because some are pretty small) have the last column of pixels drawn in the first column instead.

The texture is 128x128, so I use the coordinates (0,0), (128,0), (0,128), (128,128) when mapping the texture to my quad. I have not set any wrapping or repeating settings.

Any ideas on this?

#164689 - naleksiev - Fri Nov 14, 2008 12:17 am

Echo49 wrote:
The texture is 128x128, so I use the coordinates (0,0), (128,0), (0,128), (128,128) when mapping the texture to my quad. I have not set any wrapping or repeating settings.

Any ideas on this?


First, you'd be better off explicitly setting wrap (repeat) to none instead of leaving it unset. Also, your texture coordinates should run from 0 to 127, not to 128.

#164692 - Echo49 - Fri Nov 14, 2008 1:59 am

The problem still exists even when I disable wrapping. More of the texture is cut off if I reduce the texture coordinates to 127.

This is what happens when I compile my code:
[Image not available in this archive]

At first I thought I might be loading the texture wrong, but if it displays properly in No$gba then there shouldn't be a problem?

In any case, this is the code I'm currently using. I can't see any problems with it.
Code:
glGenTextures(1, textures);
int palette = gluTexLoadPal((u8*)palette_bin, 16, GL_RGB16);
glBindTexture(0, textures[0]);
glTexImage2D(0, 0, GL_RGB16, TEXTURE_SIZE_128, TEXTURE_SIZE_128, 0,
    (GL_TEXTURE_COLOR0_TRANSPARENT | TEXGEN_TEXCOORD) & ~(GL_TEXTURE_WRAP_S | GL_TEXTURE_WRAP_T),
    (u8*)texture_bin);

// ...

glBindTexture(0, textures[0]);
glColorTable(GL_RGB16, palette);

glBegin(GL_QUADS);

GFX_TEX_COORD = TEXTURE_PACK(0,0);
glVertex3v16(x,y,0);
GFX_TEX_COORD = TEXTURE_PACK(inttot16(128),0);
glVertex3v16(x+width,y,0);
GFX_TEX_COORD = TEXTURE_PACK(inttot16(128),inttot16(128));
glVertex3v16(x+width,y+height,0);
GFX_TEX_COORD = TEXTURE_PACK(0,inttot16(128));
glVertex3v16(x,y+height,0);

glEnd();


To make it look "right" on the DS, I have to use (2,2) to (130,130) instead.
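
For reference, a sketch of what the coordinate macros in the code above appear to do. It assumes that inttot16 converts to 12.4 fixed point (a left shift by 4) and that TEXTURE_PACK puts S in the low halfword and T in the high halfword; check videoGL.h before relying on either. toT16 and packTexCoord are hypothetical stand-ins, not libnds names.

```cpp
#include <cassert>
#include <cstdint>

// 12.4 fixed point: the low 4 bits are the fraction
// (assumed to match libnds' t16 / inttot16).
std::int16_t toT16(int texels)
{
    assert(texels >= -2048 && texels <= 2047);   // must fit in 16 bits after scaling
    return static_cast<std::int16_t>(texels * 16);
}

// S in the low halfword, T in the high halfword
// (assumed to match TEXTURE_PACK).
std::uint32_t packTexCoord(std::int16_t s, std::int16_t t)
{
    return static_cast<std::uint32_t>(static_cast<std::uint16_t>(s)) |
           (static_cast<std::uint32_t>(static_cast<std::uint16_t>(t)) << 16);
}
```

Under these assumptions a coordinate of 128 texels is stored as 0x0800, so one unit is 1/16 of a texel, and the (2,2) to (130,130) workaround corresponds to an offset of 32 units in each direction.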


Last edited by Echo49 on Fri Nov 14, 2008 3:20 am; edited 1 time in total

#164693 - DiscoStew - Fri Nov 14, 2008 3:10 am

I've not gotten myself into textures much, but does the direction for setting the texture coordinates follow the same rule like with vertices? Looking at the textured quad example provided in devkitARM, and comparing it to yours, it looks like the two are going in opposite directions, where one goes clockwise, and the other counter-clockwise. What I mean is this....

Yours is like this......
Code:

GFX_TEX_COORD = TEXTURE_PACK(0, 0);
...
GFX_TEX_COORD = TEXTURE_PACK(inttot16(128), 0);
...
GFX_TEX_COORD = TEXTURE_PACK(inttot16(128), inttot16(128));
...
GFX_TEX_COORD = TEXTURE_PACK(0, inttot16(128));


but the devkitARM example is this...
Code:

GFX_TEX_COORD = TEXTURE_PACK(0, inttot16(128));
...
GFX_TEX_COORD = TEXTURE_PACK(inttot16(128), inttot16(128));
...
GFX_TEX_COORD = TEXTURE_PACK(inttot16(128), 0);
...
GFX_TEX_COORD = TEXTURE_PACK(0, 0);


See what I mean? Try reversing the direction, and see if anything changes. If this reversed direction was purposefully set like this for orientating the texture and is the cause of this problem, there are the Flip flags you could use (though that "would" require texture wrapping according to the specs). To make a note, I think you need to keep it at 128, and not go one down, as I'm pretty sure that in order to use the entire texture, you have to encompass that entire area.

Also, while it doesn't change anything, you don't need to forcefully "not" set the texture wrapping like you have it there, since the actual entry starts with nothing (or known as 0).
_________________
DS - It's all about DiscoStew

#164694 - Echo49 - Fri Nov 14, 2008 3:29 am

Mapping the texture either way doesn't change whether it starts at the wrong column or not.

It appears to affect all the textures I'm using. I could add in constants to shift it but that's so ugly...

#164695 - elhobbs - Fri Nov 14, 2008 3:40 am

Echo49 wrote:
Thanks for the explanation.

I have another question. What is the difference between each of the texture coordinates transformation modes?

ie.
TEXGEN_OFF
TEXGEN_TEXCOORD
TEXGEN_NORMAL
TEXGEN_POSITION

One other thing: although No$gba emulates it fine, on my DS some or all of my textures (I can't tell, because some are pretty small) have the last column of pixels drawn in the first column instead.

The texture is 128x128, so I use the coordinates (0,0), (128,0), (0,128), (128,128) when mapping the texture to my quad. I have not set any wrapping or repeating settings.

Any ideas on this?
here are the comments from the videoGL.h file
Code:
   TEXGEN_OFF      = (0<<30), /*!< use unmodified texcoord */
   TEXGEN_TEXCOORD = (1<<30), /*!< multiply texcoords by the texture-matrix */
   TEXGEN_NORMAL   = (2<<30), /*!< set texcoords equal to normal * texture-matrix, used for spherical reflection mapping */
   TEXGEN_POSITION = (3<<30)  /*!< set texcoords equal to vertex * texture-matrix */

the projection matrix can have a large impact on texture coordinates - particularly if you are not looking directly along an axis. While trying to implement a crosshair feature I had problems using the traditional approach of aligning a quad in front of the view position. the texture would get all squiggly depending on the view direction. my point is - the texture coordinates are not perfect on the ds.

#164696 - Echo49 - Fri Nov 14, 2008 10:11 am

I'm supposed to manually fudge the coordinates until it looks right? The thing that really bites me is that it works perfect on an emulator, but is wrong on the real hardware. You'd think it'd be the other way around :/

#164698 - elhobbs - Fri Nov 14, 2008 2:45 pm

I think there are two factors that may be affecting the alignment. The first is the size of the object compared to the size of the texture - does the image need to be scaled a lot to fit on the object? The second is that the texture coordinates for a texture width of 128 are 0 to 127 - 128 should wrap to 0 if wrapping is turned on.

I do not know about no$gba but desmume converts the fixed point texture coords to floating point before passing them off to opengl. so, I am not really surprised that an emulator would have better precision.

#164703 - DiscoStew - Fri Nov 14, 2008 7:21 pm

I've been trying to mimic your code with the devkitARM examples, but I've seen no problems, both in emulator and real hardware. Is there anything else you are doing with the hardware memory registers that you aren't showing us in the code snippet you gave? How about you condense your project (back it up first) to just showing a quad with the texture on it, and letting us have at it to find the problem? It's even possible that you might find the problem while doing this too. I just can't see how to find the problem without having the overall perspective of what you are doing.

#164705 - Echo49 - Fri Nov 14, 2008 9:07 pm

I will try to post a stripped down version of my code asap.

Would the way I init'd the 3D world have anything to do with it? It was mainly copy-pasted except for GL_BLEND, gluPerspective() and gluLookAt(). Basically any number xyy corresponds to x.yyf, from left to right, top to bottom. The reason I did this is because I'm porting a game from PC with that resolution, and I didn't want to recalculate all the sprite positions. The fovy value in gluPerspective "zooms" the field to the right position.

Code:
   //init 3d video
   videoSetMode(MODE_0_3D);
   vramSetBankA(VRAM_A_TEXTURE);
   vramSetBankE(VRAM_E_TEX_PALETTE);
   
   glInit();
   glEnable(GL_BLEND | GL_TEXTURE_2D | GL_ANTIALIAS);
   
   // setup the rear plane
   glClearColor(20,20,31,31);
   glClearPolyID(63);
   glClearDepth(0x7FFF);
   
   glViewport(0,0,255,191);
   
   glMatrixMode(GL_PROJECTION);
   glLoadIdentity();
   gluPerspective(101.2, 4.0/3.0, 0.1, 100); //fovy, aspect(width/height), zNear, zFar
   
   // camera is flipped around a bit - x increases left to right, y increases top to bottom (0,0) to (640,480)
   gluLookAt(   3.2, 2.4, -2.0,      //camera position
               3.2, 2.4, 0.0,      //look at
               0.0, -1.0, 0.0);   //up

   glMatrixMode(GL_MODELVIEW);

#164707 - Echo49 - Sat Nov 15, 2008 1:23 am

Also, I have another question:

I was looking for some code which would allow me to rotate sprites around the local z axis. I found this post but in the end I settled for this:
Code:
glPushMatrix();
glTranslatef(x,y,0);
glRotatef(angle,0,0,1);
glTranslatef(-x,-y,0);

Draw();

glPopMatrix(0);

Aside from the use of the floating point version of the functions, are there any other downsides to doing it this way rather than the way pointed out in the other topic?
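
As a sanity check on the translate/rotate/translate sequence above, here is the same transform written out in plain 2D math. Vec2 and rotateAbout are hypothetical names for illustration; the sketch assumes the usual OpenGL convention that the last matrix call issued is the first one applied to a vertex, and a counter-clockwise rotation for positive angles in a standard right-handed view.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };

// Rotate point p around `pivot` by angleDeg degrees.
Vec2 rotateAbout(Vec2 p, Vec2 pivot, double angleDeg)
{
    assert(std::isfinite(angleDeg));
    const double kPi = 3.14159265358979323846;
    const double a = angleDeg * kPi / 180.0;
    const double c = std::cos(a);
    const double s = std::sin(a);

    // glTranslatef(-x, -y, 0): issued last, so applied to the vertex first
    const double dx = p.x - pivot.x;
    const double dy = p.y - pivot.y;

    // glRotatef(angle, 0, 0, 1), then glTranslatef(x, y, 0) moves it back
    return Vec2{ pivot.x + dx * c - dy * s,
                 pivot.y + dx * s + dy * c };
}
```

The pivot itself maps to itself for any angle, which is the property that makes the sprite spin in place rather than orbit the origin.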

#164708 - DiscoStew - Sat Nov 15, 2008 2:18 am

Echo49 wrote:
Also, I have another question:

I was looking for some code which would allow me to rotate sprites around the local z axis. I found this post but in the end I settled for this:
Code:
glPushMatrix();
glTranslatef(x,y,0);
glRotatef(angle,0,0,1);
glTranslatef(-x,-y,0);

Draw();

glPopMatrix(0);

Aside from the use of the floating point version of the functions, are there any other downsides to doing it this way rather than the way pointed out in the other topic?


At this point, if anyone uses more than 1 axis at a time, it's pretty much better to go with Quaternions imo, but in your case with only 1 axis, you don't have to even use glRotate. Instead, why not try the single rotation functions, or for you specifically, glRotateZi. It's based on integers, but the LUT it uses is limited to 512 entries. Since the code is accessible from the "videoGL.h" file, you can see how it works, and make your own with more precision if needed.
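
A minimal sketch of the lookup-table idea, in case you want to build your own table with more precision. Everything here is an assumption for illustration: the names buildSinLut, lutSin and lutCos are hypothetical, the table is taken to cover one full turn, and the values use 12 fractional bits (4096 == 1.0). The actual libnds table layout may differ, so check videoGL.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

const int kLutSize  = 512;       // entries per full turn (assumed)
const int kFixedOne = 1 << 12;   // 12 fractional bits: 4096 == 1.0 (assumed)

// Build a sine table covering one full turn in `entries` steps.
std::vector<std::int32_t> buildSinLut(int entries)
{
    assert(entries > 0);
    const double kTwoPi = 6.28318530717958647692;
    std::vector<std::int32_t> lut(entries);
    for (int i = 0; i < entries; ++i)
        lut[i] = static_cast<std::int32_t>(
            std::lround(std::sin(kTwoPi * i / entries) * kFixedOne));
    return lut;
}

// Wrap the angle index into the table, so negative and large angles work.
std::int32_t lutSin(const std::vector<std::int32_t>& lut, int angle)
{
    assert(!lut.empty());
    const int n = static_cast<int>(lut.size());
    return lut[((angle % n) + n) % n];
}

// Cosine is sine a quarter turn ahead.
std::int32_t lutCos(const std::vector<std::int32_t>& lut, int angle)
{
    return lutSin(lut, angle + static_cast<int>(lut.size()) / 4);
}
```

Passing a larger entry count to buildSinLut gives finer angle steps at the cost of memory, which is the trade-off hinted at above.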

#164713 - silent_code - Sat Nov 15, 2008 3:34 pm

Do you initialize the texture matrix to identity?
_________________
July 5th 08: "Volumetric Shadow Demo" 1.6.0 (final) source released
June 5th 08: "Zombie NDS" WIP released!
It's all on my page, just click WWW below.

#164717 - Echo49 - Sat Nov 15, 2008 8:33 pm

Code:
   // setup the rear plane
   glClearColor(20,20,31,31);
   glClearPolyID(63);
   glClearDepth(0x7FFF);
   
   glViewport(0,0,255,191);
   
   glMatrixMode(GL_PROJECTION);
   glLoadIdentity();
   gluPerspective(101.2, 4.0/3.0, 0.1, 100); //fovy, aspect(width/height), zNear, zFar
   
   // camera is flipped around a bit - x increases left to right, y increases top to bottom (0,0) to (640,480)
   gluLookAt(   3.2, 2.4, -2.0,      //camera position
               3.2, 2.4, 0.0,      //look at
               0.0, -1.0, 0.0);   //up

   glMatrixMode(GL_MODELVIEW);

I do it once when I initialize but never again.

#164718 - Maxxie - Sat Nov 15, 2008 8:43 pm

I only see the projection matrix set there, not the texture matrix

Matrix Mode Texture: http://nocash.emubase.de/gbatek.htm#ds3dmatrixloadmultiply
Texture Matrix usage: http://nocash.emubase.de/gbatek.htm#ds3dtexturecoordinates
_________________
Trying to bring more detail into understanding the wireless hardware

#164722 - elhobbs - Sun Nov 16, 2008 1:42 am

I believe the texture matrix is only used for textures that are defined with TEXGEN_TEXCOORD. also, glInit sets it to identity. so, unless something in your code changes it, it should be fine.

#164746 - Echo49 - Mon Nov 17, 2008 5:41 am

Do I do
Code:
   glMatrixMode(GL_TEXTURE);
   glLoadIdentity();

to initialize it, or do I do something tricky with registers and the like?

#164750 - elhobbs - Mon Nov 17, 2008 9:25 am

Echo49 wrote:
Do I do
Code:
   glMatrixMode(GL_TEXTURE);
   glLoadIdentity();

to initialize it, or do I do something tricky with registers and the like?
yes, that is how it is done.

however, if you do not change it anywhere this is already handled for you when you called glInit() (I looked in the libnds source code to confirm) - which you need to do before using any of the gl* functions in libnds.

#164782 - Echo49 - Thu Nov 20, 2008 10:18 pm

OK, I think I've found the problem.

After a bit of messing around, it seems that the DS dislikes anything that isn't aligned to 4 bytes.

The following code all output different images (ie. shifted by n bytes) on No$gba, but all output the same thing on my DS! (ie. as if they were all the first one, pointer-2)
Code:
glTexImage2D(0, 0, GL_RGB4, TEXTURE_SIZE_32, TEXTURE_SIZE_32, 0, defaultparam, (u8*)(texture_bin-2));
glTexImage2D(0, 0, GL_RGB4, TEXTURE_SIZE_32, TEXTURE_SIZE_32, 0, defaultparam, (u8*)(texture_bin-1));
glTexImage2D(0, 0, GL_RGB4, TEXTURE_SIZE_32, TEXTURE_SIZE_32, 0, defaultparam, (u8*)texture_bin);
glTexImage2D(0, 0, GL_RGB4, TEXTURE_SIZE_32, TEXTURE_SIZE_32, 0, defaultparam, (u8*)(texture_bin+1));

Does anyone know anything about this? It seems to be independent of the order I generate the textures in, and also independent of the order I #include the textures in.

#164783 - elhobbs - Thu Nov 20, 2008 11:20 pm

yes, it does need to be aligned. libnds uses swiCopy to move the data to VRAM.
Code:
      swiCopy((uint32*)texture, addr , size / 4 | COPY_MODE_WORD);
the ds does not like unaligned ints.
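
One way to catch this early is to check the pointer before handing it to the texture loader. The helper below is a hypothetical sketch (isAligned is not part of libnds); it only tests whether an address is a multiple of a power-of-two alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
bool isAligned(const void* p, std::size_t alignment)
{
    // alignment must be a non-zero power of two
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

For example, checking isAligned(texture_bin, 4) before the glTexImage2D call would flag the unaligned cases shown in the previous post instead of silently loading shifted data.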

#164784 - Echo49 - Thu Nov 20, 2008 11:27 pm

How can I make it aligned? At the moment I'm just #including them one after the other, and the number of bytes in all my textures are exact multiples of 4. I can't see how they could become unaligned :/

edit:
OK, so right now I'm doing this ugly hack to make it work properly. Since this is all done only once at the start of the game, efficiency is not really an issue but if there are better ways to do it I'd really like to know.
Code:
void LoadGLTexture(GL_TEXTURE_TYPE_ENUM type, int sizeX, int sizeY, const u8* texture)
{
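   // byte count for a 16-bit texture, assuming sizeX/sizeY are TEXTURE_SIZE_* values:
   // (8 << sizeX) * (8 << sizeY) texels * 2 bytes
   u32 size = 64 * (2 << (sizeX + sizeY));

   //align it to 4 bytes: a u32 array is word-aligned, and size is a multiple of 4
   u32* _temp = new u32[size / 4];
   u8* temp = (u8*)_temp;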
   
   for (u32 i=0; i<size; ++i)
      temp[i] = texture[i];
   
   glTexImage2D(0, 0, type, sizeX, sizeY, 0, GL_TEXTURE_COLOR0_TRANSPARENT, temp);
   
   delete[] _temp;
}

#164786 - DiscoStew - Fri Nov 21, 2008 8:48 am

I think this is what you are looking for...

Code:
#define ALIGN(m)   __attribute__((aligned (m)))


This is under the "jtypes.h" file in the nds section of devkitARM, so it should already be included in your project. You place it in front on the same line as where the data you want aligned is defined, and when your project is compiled, it will align that data by whatever byte boundary you specify, which in this case would be per "4"-bytes.

#164787 - Echo49 - Fri Nov 21, 2008 10:12 am

Since my files are external files which are #included, do I add that into the makefile which generates the .h files?

#164788 - Cearn - Fri Nov 21, 2008 10:21 am

Echo49 wrote:
How can I make it aligned? At the moment I'm just #including them one after the other, and the number of bytes in all my textures are exact multiples of 4. I can't see how they could become unaligned :/

It's not the size that makes them unaligned, it's the datatype (possibly combined with using #include to add them to the project). If the data are byte-arrays, they will have byte-alignment. If you #include the array and the includer has loose bytes in the global scope that are word-aligned, the array could end up right behind the byte and not be aligned to 32-bits. For details on alignment and how it may affect you, see tonc:data-align.

Depending on exactly how the arrays are generated, byte-arrays (and halfword arrays) for graphics data may be alignment-unsafe even if you do not #include them. For C-arrays, you can use ALIGN(4) on the definition (not just the declaration) to force word-alignment. Tools that export to assembly tend to properly align their data by themselves, so these are preferred over C-arrays. If this has happened for bin2o-generated data, I'm not sure there's anything you can do :\ .

#164789 - Echo49 - Fri Nov 21, 2008 10:48 am

Cearn wrote:
For C-arrays, you can use ALIGN(4) on the definition (not just the declaration) to force word-alignment.
Do you mean definition AND declaration, or just the definition? (just want to make sure)

#164798 - Cearn - Fri Nov 21, 2008 3:35 pm

Well, I thought you'd have to do it at the definition since that's what the linker actually looks at, but now that I've tested it, it seems that adding it to just the declaration works as well.

Code:

// Example 1: no alignment
extern u8 foo[16];

u8 a;         // 00ED
u8 foo[16];   // 00DD ( crap :( )
u8 b;         // 00DC

// Example 2: with alignment
extern u8 foo[16] ALIGN(4);

u8 a;         // 00F0
u8 foo[16];   // 00E0 (yay :D )
u8 b;         // 00DC


Note, however, that this only works if the declaration of alignment is inside the same file as the definition, so effectively it really has to be added to the definition after all.
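
For completeness, a small sketch of the attribute on a definition that has an initializer, which is closer to what a generated texture header would contain. The array name and contents are hypothetical, and the macro is re-declared here only so the snippet stands alone (it uses GCC/Clang attribute syntax).

```cpp
#include <cassert>
#include <cstdint>

// Same macro as quoted earlier in the thread (GCC/Clang attribute syntax).
#define ALIGN(m) __attribute__((aligned (m)))

// A lone byte in front, so nothing but the attribute guarantees
// word alignment for the array that follows.
unsigned char g_looseByte = 1;

// Hypothetical texture data; the attribute sits on the definition itself.
unsigned char g_textureData[8] ALIGN(4) = { 0, 1, 2, 3, 4, 5, 6, 7 };

// Returns the data, checking the alignment the loader relies on.
const unsigned char* textureData()
{
    assert(reinterpret_cast<std::uintptr_t>(g_textureData) % 4 == 0);
    return g_textureData;
}
```

If the header is generated by a makefile rule, the attribute would have to end up on the line that defines the array, as in g_textureData here, for it to have any effect.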

#164811 - Echo49 - Sat Nov 22, 2008 10:09 am

Mmm I can't get it to work this way. I guess I'll just stick with the current way I do it. One other thing though - can I replace the for loop with memcpy or something?

Code:
   u32* _temp = new u32[size / 4];   // size is in bytes, so size / 4 words is enough
   u8* temp = (u8*)_temp;
   
   for (u32 i=0; i<size; ++i)
      temp[i] = texture[i];

   //...
   delete[] _temp;

#164818 - DiscoStew - Sat Nov 22, 2008 8:50 pm

Memcpy should work better than looping like you have, but what exactly is not working with ALIGN(4)? Does nothing change when you use it like Cearn has shown?
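
A minimal sketch of the memcpy variant, assuming the goal is only to get the bytes into a word-aligned buffer before uploading. copyToWordAligned is a hypothetical helper; it uses a std::vector of 32-bit words so the storage is word-aligned and is freed automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy `bytes` bytes from a possibly unaligned source into storage
// that is aligned for 32-bit access.
std::vector<std::uint32_t> copyToWordAligned(const std::uint8_t* src, std::size_t bytes)
{
    assert(src != nullptr || bytes == 0);
    std::vector<std::uint32_t> words((bytes + 3) / 4);   // round up to whole words
    if (bytes != 0)
        std::memcpy(words.data(), src, bytes);           // memcpy accepts any alignment
    return words;
}
```

The result's data() pointer, cast to u8*, could then be passed to glTexImage2D in place of the temp buffer, with no explicit delete[] needed.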

#164823 - Echo49 - Sun Nov 23, 2008 4:25 am

The textures are .bin files included as a .h, so the directions for how to generate the .h files are in the makefile, right? I don't know much about makefiles, but I found the bit that outputs the contents of the .h file and tacked ALIGN(4) to the end of it like Cearn showed me above but it had no effect.