#159547 - silent_code - Wed Jul 02, 2008 2:19 pm
Hell-ow!
In the new demo I'm currently working on, which is based on the VSD (using devkitARM R23b and libnds 20071023) I released earlier, I get 30 FPS when enabling lighting...
That seems odd, because I'm only using one light, which is only set once at the start of the frame. Material properties are only set four times (at max.) throughout the frame and I'm not hitting any hard limits - neither primitive / vertex list RAM (around 1400 triangles and 4200 vertices), nor rendering buffer lines (46 free in last frame, which is the max., as two of the 48 lines available are always getting filled, it seems)...
It happens regardless of whether the default specular table is being used (bit 15 in the specular / emissive color word of the material command), etc.
Disabling lighting, the frame rate pops back to 60.
Nothing fancy is involved, nothing else gets disabled, just the lighting.
Does anyone have a clue why that happens? The hardware is supposed to display scenes at 60 Hertz with all features enabled and a fully filled pipeline, that means maxing out the throughput with as much geometry as supported by the hardware.
I guess I'll have to investigate that a little further, right?
_________________
July 5th 08: "Volumetric Shadow Demo" 1.6.0 (final) source released
June 5th 08: "Zombie NDS" WIP released!
It's all on my page, just click WWW below.
Last edited by silent_code on Wed Jul 02, 2008 10:04 pm; edited 3 times in total
#159549 - elhobbs - Wed Jul 02, 2008 2:58 pm
silent_code wrote: |
Does anyone have a clue why that happens? The hardware is supposed to display scenes at 60 Hertz with all features enabled and a fully filled pipeline, that means maxing out the throughput with as much geometry as supported by the hardware.
I guess I'll have to investigate that a little further, right? |
I am not sure that statement is accurate. I think I remember reading something along the lines that it is possible to overload the geometry engine if there are too many objects on a scanline, even if the total number of polygons and vertices is below the limit for the whole scene.
EDIT: I want to say I saw this in some GDC pdfs that were floating around the forum
#159550 - silent_code - Wed Jul 02, 2008 3:13 pm
Thank you. :^)
Yes, I know, that happens and I am pretty sure that case is indicated by the number of min. free rendering lines in the previous frame (the number is available through a memory location), which is at 46 (best value possible) all the time. :^/
But again, the problem is, the frame drop only happens when I enable lighting (I am still testing which command causes that), nothing else is changed. The exact same scene, only without lighting, renders at 60 Hz. (?) 8^|
Maybe I am overflowing the graphics command buffer? I'm using "immediate mode", no display lists.
EDIT: It is also independent of how much is seen on screen! It even happens when both list RAMs are empty (no geometry on screen!) That's so confusing! X^(
EDIT: I am starting to believe I am actually really sending too many commands, which causes a stall, because I am sending a rather big scene (2880 triangles, 1744 * 3 vertices with position, UV and normal - color is set via material) brute force to the hardware. In that case, I guess, it doesn't matter how much geometry is visible... Can anyone confirm that?
Last edited by silent_code on Wed Jul 02, 2008 6:28 pm; edited 2 times in total
#159556 - sverx - Wed Jul 02, 2008 5:12 pm
silent_code wrote: |
[...] I am sending a rather big scene (2880 triangles, 1744 * 3 vertices with position, UV and normal - color is set via material) ... |
I've read here on wikipedia about a 2048 triangles per frame hard limit. Maybe it's not, but I had to tell you :)
#159558 - silent_code - Wed Jul 02, 2008 5:40 pm
sverx wrote: |
I've read here on wikipedia about a 2048 triangles per frame hard limit. Maybe it's not, but I had to tell you :) |
Thanks, but as I have already posted, I am not hitting the hard limit for visible (that's what you mean) triangles and / or vertices.
Again, this only happens when lighting is enabled - and only then! The very same scene without lighting will run at 60 Hz! |^C
Then again, to remove any confusion: It even happens without anything on screen and with the primitive / vertex list RAMs empty! 8^|
Last edited by silent_code on Wed Jul 02, 2008 6:25 pm; edited 1 time in total
#159560 - elhobbs - Wed Jul 02, 2008 5:56 pm
silent_code wrote: |
sverx wrote: | I've read here on wikipedia about a 2048 triangles per frame hard limit. Maybe it's not, but I had to tell you :) |
Thanks, but as I have already posted, I am not hitting the hard limit for *visible* (that's what you mean) triangles and / or vertices.
Again, this only happens when lighting is enabled - and ONLY then! The VERY SAME SCENE without lighting will run at 60 Hertz!
And again, to remove any confusion: IT EVEN HAPPENS WITH NOTHING ON SCREEN and the primitive / vertex list RAMs EMPTY!!!! |
at the risk of producing more rage/frustration induced replies ;) do you have any code that we can look at?
#159563 - silent_code - Wed Jul 02, 2008 6:15 pm
<lol> What rage? I'm totally easy. :^) The caps were just there to make sure nobody overlooks that. :^D
Well, it's simply the VSD from my page, but with minor corrections, that are not related to the problem.
I'm using an extended dataset, which includes a bigger scene (a *lot* more than just a cube) and an updated version of the character model from the original demo (now it has 504 triangles and around 432 vertices - although most vertices are redundant!)
I'll try to reproduce what I get in the other demo.
I'll be back.
#159565 - ritz - Wed Jul 02, 2008 6:18 pm
I took a quick peek at your 1.5.1 source and the following is just a big guesstimation:
- your poly format has light0 always enabled no matter what
- when your boolean light flag is true, you start calling the glNormal(NORMAL_PACK()), otherwise you simply don't call glNormal
So, my guess is that you might be running slightly above 60fps when the light flag is false, and your glNormal calls are dropping you below 60fps (giving you the 30) when the flag is true.
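This guess relies on the buffer swap being locked to VBlank: a frame that takes even slightly longer than one refresh period has to wait for the next VBlank, so the rate falls straight to 30 instead of to something like 55. A minimal sketch of that arithmetic, assuming a nominal 60 Hz refresh and 263 scanlines per frame (192 visible plus 71 blanking). `effective_fps` and the scanline-based cost are made up for illustration and are not part of libnds:

```c
#include <stdint.h>

#define LINES_PER_FRAME 263u  /* 192 visible + 71 VBlank scanlines */
#define REFRESH_HZ      60u   /* nominal rate used in this thread */

/* work_lines: time spent building and sending one frame, in scanlines.
 * The buffer swap only happens at VBlank, so the cost is rounded up
 * to a whole number of refresh periods. */
static uint32_t effective_fps(uint32_t work_lines)
{
    uint32_t periods = (work_lines + LINES_PER_FRAME - 1u) / LINES_PER_FRAME;
    if (periods == 0u)
        periods = 1u;  /* even an empty frame occupies one period */
    return REFRESH_HZ / periods;
}
```

With these assumptions, a frame costing 263 scanlines still runs at 60, while one costing 264 already runs at 30.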
#159567 - silent_code - Wed Jul 02, 2008 6:32 pm
Yeah, that's what I am suspecting, too, but why is that?
I'm not certain what the thing is called, but is it possible that the Graphics FIFO overflows and stalls execution for an additional frame when too many commands are pushed? That would explain why it is independent of what is actually displayed on screen.
What that basically means is that the culling and clipping stage can't process the amount of data fast enough and the FIFO gets full at some point, stalling until the FIFO is empty again (thus dropping a frame.)
Then, what is it called (in general and in libnds) and how much space is there? Where can I check how full it is?
#159569 - Maxxie - Wed Jul 02, 2008 6:50 pm
If this is the reason for the drop, you can test the length by pushing increasing amounts of NOPs into the FIFO until you notice the frame drop.
#159572 - DekuTree64 - Wed Jul 02, 2008 6:53 pm
Does it run at 60 with lighting enabled on a simpler scene?
In most games I've worked on, just sending enough commands to get near the poly/vertex limits takes most of a 60Hz frame. That does have processing of animation matrices mixed in, but also display lists rather than immediate mode for the polygons themselves.
The GPU is quite fast though. RAM is a much bigger bottleneck, so most likely the GPU is spending most of its time waiting around for the CPU to give it commands, and the CPU is waiting around for RAM to give it data, and then spending some more time to decide how to give the data to the GPU.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
Last edited by DekuTree64 on Wed Jul 02, 2008 6:57 pm; edited 1 time in total
#159574 - silent_code - Wed Jul 02, 2008 6:56 pm
@DT64: Yes, it does...
(I'll post more in a few minutes!)
A few seconds later: Removing the draw call of one of the characters fixed it. 8^|
What I need to know are some guesstimated numbers (or a link to a gbatek reference - I've been browsing it for hours and couldn't find anything) for the amount of command data / frame, if possible.
I'll be doing some scene management anyway, so I will still be able to draw more than one character on screen, but it just seemed too odd that the framerate would drop when sending the normals along with positions and UVs. ... :^/
@ Maxxie: I guess not sending the additional character did the job, too, but thanks for the suggestion! :^)
@DT64: Yes, I know the processors are quite fast and culling and clipping that amount of geometry should not be the problem. We can even safely assume that not even those operations have to be performed, because the frame drop also occurs with no geometry inside the view frustum. :^/
So from what you wrote, I guess it has to be the memory... ?
EDIT: Is that what I am looking for?
gbatek wrote: |
FIFO / PIPE Number of Entries
The FIFO has 256 entries, additionally, there is a PIPE with four entries (giving a total of 260 entries). If the FIFO is empty, and if the PIPE isn't full, then data is moved directly into the PIPE, otherwise it is moved into the FIFO. If the PIPE runs half empty (less than 3 entries) then 2 entries are moved from the FIFO to the PIPE. The state of the FIFO can be obtained in GXSTAT.Bit16-26, observe that there may be still data in the PIPE, even if the FIFO is empty. Check the busy flag in GXSTAT.Bit27 to see if the PIPE or FIFO contains data (or if a command is still executing).
Each PIPE/FIFO entry consists of 40bits of data (8bit command code, plus 32bit parameter value). Commands without parameters occupy 1 entry, and Commands with N parameters occupy N entries. |
And especially:
gbatek wrote: |
If the FIFO is full, then a wait is generated until data is removed from the FIFO, ie. the STR opcode gets freezed, during the wait, the bus cannot be used even by DMA, interrupts, or by the NDS7 CPU. |
So the following should solve the problem, right?
gbatek wrote: |
GXFIFO Access via DMA
Larger pre-calculated data blocks can be sent directly to the FIFO. This is usually done via DMA (use DMA in Geometry Command Mode, 32bit units, Dest=4000400h/fixed, Length=NumWords, Repeat=0). The timings are handled automatically, ie. the system (should) doesn't freeze when the FIFO is full (see below Overkill note though). DMA starts when the FIFO becomes less than half full, the DMA does then write 112 words to the GXFIFO register (or less, if the remaining DMA transfer length gets zero). |
But I'm still not 100% sure that's it. :^/
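The entry rule quoted above (one entry for a command without parameters, N entries for N parameters) gives a rough feel for how much traffic the normals add. A back-of-the-envelope sketch, assuming every vertex is sent as TEXCOORD (1 parameter) plus VTX_16 (2 parameters), with one NORMAL (1 parameter) per vertex when lighting is on. The helper names are hypothetical, and matrix, material and begin/end commands are ignored:

```c
#include <stdint.h>

/* FIFO/PIPE entries taken by one command with n_params parameters. */
static uint32_t fifo_entries(uint32_t n_params)
{
    return n_params == 0u ? 1u : n_params;
}

/* Entries per vertex: TEXCOORD (1 param) + VTX_16 (2 params),
 * plus NORMAL (1 param) when lighting is enabled. */
static uint32_t entries_per_vertex(int with_normal)
{
    uint32_t n = fifo_entries(1u) + fifo_entries(2u);
    if (with_normal)
        n += fifo_entries(1u);
    return n;
}

/* Total entries for a scene sent as separate triangles. */
static uint32_t entries_per_frame(uint32_t triangles, int with_normal)
{
    return triangles * 3u * entries_per_vertex(with_normal);
}
```

For the 1744 * 3 vertices mentioned earlier, this comes to 15696 entries per frame without normals and 20928 with them, i.e. a third more data pushed through a FIFO that holds 256 entries.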
#159579 - sajiimori - Wed Jul 02, 2008 7:46 pm
It sounds like you're not finishing the upload during a VDraw, presumably because the CPU is blocking while waiting for your vertex commands to finish (apparently because the hardware is taking longer to process them, due to the lighting calculations).
But you can totally run a high-poly scene with all lights turned on, at 60fps -- I just tried it not long ago, so I think you're running into a bottleneck somewhere. Edit: But then again, my background model was vertex lit, and only the characters were using dynamic lights...
I never use the separate gfx registers (just above the FIFO in address space). Are they a lot slower than the FIFO? If so, you could try using the FIFO instead. You don't have to build a display list -- you just have to upload the data in a slightly different way.
#159585 - silent_code - Wed Jul 02, 2008 8:06 pm
Thanks. :^)
Well, currently, I'm just using videoGL's immediate mode commands (glVertex etc).
But, does invisible geometry get lit? I think the lighting calculations are not even touching any of the invisible stuff (which isn't in the list RAMs anyway.)
I have tried to read the number of FIFO entries and it's always 0...
I'll poke the Graphics Status a little bit more, but I didn't get any valuable information, yet.
This becomes interesting. :^)
#159588 - zeruda - Wed Jul 02, 2008 8:16 pm
Well, all I can say is I have just tested and I pushed around 1900-2000 textured polygons with 1 hardware light with antialiasing and fogging switched on as well as collision engine and stuff and it ran 60 fps quite easily, probably around 25% cpu time to spare. And from what I remember using 4 lights caused no problems at all either.
glVertex is very slow. No chance of hitting 60fps if you have a complex scene and are using that, you have to use display lists. I think I only managed around 400 or 500 polygons on screen before the framerate dropped with that. The only other thing I can think of is you are doing shadows and stuff right? I haven't gotten around to that yet but there might be something there that causes an issue when combined with lighting.
#159590 - sajiimori - Wed Jul 02, 2008 8:27 pm
I would not be surprised if the hardware did the lighting calculations for off-screen polys. Those polys don't end up in the final buffers, but perhaps the calculations are done before clipping.
#159591 - silent_code - Wed Jul 02, 2008 8:40 pm
Nope, actually, it doesn't matter if I use shadowing or not, but for testing purposes I have disabled it... the only thing that affects it is sending normals (immediate mode, via glNormal) or not (when lighting is "disabled") - all other effects, even capturing, don't have ANY effect on the behaviour. I disabled shadowing to reduce the geometry load.
Also, reducing the geometry and sending normals for it will also work.
That means the Geometry Command FIFO stalls the bus.
I haven't done any special testing, so I don't have any real numbers, but I sure get nearly 1000 textured, shaded, antialiased, fogged, shadowed (including the shadow geometry in the "1000") and all bells and whistles triangles before the frame drop... so I guess it's still reasonable. I was just shocked that I didn't hit the limit and still it would drop a frame.
Well, DMA'ed display lists are the way to go, if I want to be able to max out geometry usage, I guess.
Thanks for all the advice so far. I'm still open to suggestions. :^D
Btw: I've tried to read the Graphics Status register (in the main loop) and here's what I've found:
Code: |
uint32 gfx_status = GFX_STATUS;
(gfx_status & BIT(0)); // = 0 -> BoxTest / PositionTest / VectorTest Busy
(gfx_status & BIT(1)) >> 1; // = 0 -> BoxTest Result
(gfx_status & (31 << 8)) >> 8; // bits 8..12 = 0 -> Position & Vector Matrix Stack Level
(gfx_status & BIT(13)) >> 13; // = 0 -> Projection Matrix Stack Level
(gfx_status & BIT(14)) >> 14; // = 0 -> Matrix Stack Busy
(gfx_status & BIT(15)) >> 15; // = 0 -> Matrix Stack Overflow / Underflow Error
(gfx_status & (511 << 16)) >> 16; // bits 16..24 = 0 -> Number of 40bit-entries in Command FIFO (0..256)
(gfx_status & BIT(24)) >> 24; // = 0 -> yes, bit 24 again, it means: Command FIFO Full
(gfx_status & BIT(25)) >> 25; // = 1 -> Command FIFO Less Than Half Full
(gfx_status & BIT(26)) >> 26; // = 1 -> Command FIFO Empty
(gfx_status & BIT(27)) >> 27; // = 0 -> Geometry Engine Busy
(gfx_status & (3 << 30)) >> 30; // bits 30..31 = 0 -> Command FIFO IRQ mode
|
What's wrong with this? ... EDIT: It needs shifting. EDIT: Fixed :^)
The relevant parts have been named.
I am doing this just before flushing. Doing it after flushing just switches the Geometry Engine Busy bit from 0 to 1.
Should I try something else?
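For reference, the FIFO-related fields can be pulled out of a raw status word in one go. This is only a sketch that follows the gbatek bit layout; `GxStatFifo` and `decode_gxstat` are hypothetical names, and on hardware the argument would be the value read from GFX_STATUS:

```c
#include <stdint.h>

/* Decoded view of the FIFO-related GXSTAT fields (layout per gbatek). */
typedef struct {
    uint32_t fifo_entries;    /* bits 16..24: 0..256 */
    uint32_t fifo_full;       /* bit 24 */
    uint32_t fifo_below_half; /* bit 25 */
    uint32_t fifo_empty;      /* bit 26 */
    uint32_t engine_busy;     /* bit 27 */
} GxStatFifo;

static GxStatFifo decode_gxstat(uint32_t raw)
{
    GxStatFifo s;
    s.fifo_entries    = (raw >> 16) & 0x1FFu;
    s.fifo_full       = (raw >> 24) & 1u;
    s.fifo_below_half = (raw >> 25) & 1u;
    s.fifo_empty      = (raw >> 26) & 1u;
    s.engine_busy     = (raw >> 27) & 1u;
    return s;
}
```

The values listed above correspond to a raw word of 0x06000000: no entries, less than half full, empty, engine idle.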
EDIT: I have tried printing DISP3DCNT / GFX_CONTROL. At any stage it reads 0x099B;
If I understand that correctly, it means that neither "Color Buffer RDLINES Underflow" (render buffer line underflow - the lowest I get is 35 and in avg. it's 44 now, so that's not a problem), nor "Polygon / Vertex RAM Overflow" occurred.
Shouldn't the latter one be set when a frame was dropped?
EDIT: Nope, because I don't fill the RAM to that point... silly me. ;^)
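The two flags discussed here sit in bits 12 and 13 of DISP3DCNT according to gbatek. A small sketch of the check, with hypothetical helper names, taking the 16-bit register value as an argument:

```c
#include <stdint.h>

#define DISP3DCNT_RDLINES_UNDERFLOW (1u << 12) /* color buffer underflow */
#define DISP3DCNT_RAM_OVERFLOW      (1u << 13) /* polygon/vertex RAM overflow */

/* Returns 1 if the rendering engine ran out of buffered lines. */
static int rdlines_underflow(uint16_t disp3dcnt)
{
    return (disp3dcnt & DISP3DCNT_RDLINES_UNDERFLOW) != 0;
}

/* Returns 1 if the polygon or vertex list RAM overflowed. */
static int list_ram_overflow(uint16_t disp3dcnt)
{
    return (disp3dcnt & DISP3DCNT_RAM_OVERFLOW) != 0;
}
```

For the 0x099B reading both helpers return 0, which matches the conclusion that neither error occurred.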
How can a frame drop be recognized, other than counting FPS?
Last edited by silent_code on Wed Jul 02, 2008 10:10 pm; edited 4 times in total
#159599 - silent_code - Wed Jul 02, 2008 9:59 pm
sajiimori wrote: |
I would not be surprised if the hardware did the lighting calculations for off-screen polys. Those polys don't end up in the final buffers, but perhaps the calculations are done before clipping. |
Aren't lighting calculations done in view or screen space (on most platforms)?
And doing lighting before clipping is highly unlikely. Just look at the warping caused in clipped primitives... that includes warped UV and lighting, which is due to the new "clip" vertices.
I even think that clipping happens after culling, so most primitives (offscreen or not) would get discarded at an early stage, with minimum computations.
I could be wrong, though.
#159602 - DekuTree64 - Wed Jul 02, 2008 10:56 pm
sajiimori wrote: |
I never use the separate gfx registers (just above the FIFO in address space). Are they a lot slower than the FIFO? |
Faster, actually. For the FIFO, you have to feed it a command number plus parameters. With the registers the command gets poked in automatically so the CPU has less work to do. Of course display lists are faster when you can't code specific command sequences.
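The "command number plus parameters" format packs up to four 8-bit command codes into one word, followed by their parameters in order. A rough sketch of one lit, textured vertex in that form: the command codes are the gbatek values, while `pack_cmds` and `emit_vertex` are hypothetical helpers (libnds's videoGL.h has a FIFO_COMMAND_PACK macro for the same packing):

```c
#include <stddef.h>
#include <stdint.h>

/* Geometry command codes (gbatek). */
enum { CMD_NOP = 0x00, CMD_NORMAL = 0x21, CMD_TEXCOORD = 0x22, CMD_VTX_16 = 0x23 };

/* Pack four command codes into one GXFIFO word, first command lowest. */
static uint32_t pack_cmds(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3)
{
    return (uint32_t)c0 | ((uint32_t)c1 << 8) |
           ((uint32_t)c2 << 16) | ((uint32_t)c3 << 24);
}

/* Write one vertex (normal, UV, position) into out[]; returns words written.
 * normal and uv are assumed to be already packed parameter words. */
static size_t emit_vertex(uint32_t *out, uint32_t normal, uint32_t uv,
                          int16_t x, int16_t y, int16_t z)
{
    size_t n = 0;
    out[n++] = pack_cmds(CMD_NORMAL, CMD_TEXCOORD, CMD_VTX_16, CMD_NOP);
    out[n++] = normal;                                        /* NORMAL param   */
    out[n++] = uv;                                            /* TEXCOORD param */
    out[n++] = (uint32_t)(uint16_t)x | ((uint32_t)(uint16_t)y << 16); /* X, Y */
    out[n++] = (uint32_t)(uint16_t)z;                         /* Z              */
    return n;
}
```

That is five words per vertex for the CPU to assemble, against four plain stores when writing to the separate registers, which fits the point about the registers being less work in immediate mode.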
silent_code wrote: |
How can a frame drop be recognized, other than counting FPS? |
You could check if the VCount register is >= 192 just before sending the swap buffer command.
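A minimal sketch of that check, taken literally. `past_vdraw` is a hypothetical helper; on hardware the argument would be the VCOUNT value (REG_VCOUNT in libnds) sampled right before the swap command:

```c
#include <stdint.h>

#define VISIBLE_LINES 192u  /* scanlines 0..191 are drawn, 192..262 are VBlank */

/* vcount: value of the VCOUNT register sampled right before the swap command.
 * Returns 1 if the display period has already ended. */
static int past_vdraw(uint16_t vcount)
{
    return vcount >= VISIBLE_LINES;
}
```

This assumes the frame is submitted during the visible lines. A loop that starts submitting right after waiting for VBlank would see values >= 192 on fast frames too, so it would probably need a different test, such as comparing a VBlank counter before and after.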
silent_code wrote: |
And doing lighting before clipping is highly unlikely. |
Judging by the fact that lighting and vertex colors overwrite each other, my guess is that the lighting calculations are done by the normal command itself, rather than storing the normal and waiting until you send a vertex.
#159629 - nce - Thu Jul 03, 2008 4:45 am
Indeed, the only problem here is most likely the immediate mode.
I ran my tests a long time ago in immediate mode, using the worst primitive type of all: the triangle list!
Result: the maximum number of triangles I could push in one frame was ~1500, and that was with 1 color, 1 UV and 1 position for each vertex (3 per triangle).
Since then I first implemented triangle strips and afterwards converted everything to call lists... there's no other choice, I think.
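The saving from strips is easy to put in numbers. A tiny sketch, assuming the ideal case of one unbroken strip (real models usually split into several strips, so the gain is smaller); the function names are made up:

```c
#include <stdint.h>

/* Vertices sent for n separate triangles. */
static uint32_t list_vertices(uint32_t n) { return 3u * n; }

/* Vertices sent for n triangles in one strip (0 if n == 0). */
static uint32_t strip_vertices(uint32_t n) { return n ? n + 2u : 0u; }
```

For the ~1500 triangles above that is 4500 vertices as a list against 1502 as a single strip.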
(I'm still planning to use the FIFO IRQ to parallelize the rendering with my other code)
_________________
-jerome-
#159632 - a128 - Thu Jul 03, 2008 8:24 am
Also use the VTX_XY, VTX_XZ and VTX_YZ vertex commands... they need 1 cycle less than VTX_16.
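These commands take a single parameter word holding two 16-bit coordinates, and the third coordinate keeps its previous value. A sketch of how an exporter might pick between them, using the gbatek command codes; `pack_vtx_pair` and `choose_vtx_cmd` are hypothetical helpers, not libnds functions:

```c
#include <stdint.h>

/* Geometry command codes (gbatek). */
enum { GX_VTX_16 = 0x23, GX_VTX_XY = 0x25, GX_VTX_XZ = 0x26, GX_VTX_YZ = 0x27 };

/* Parameter word shared by VTX_XY, VTX_XZ and VTX_YZ:
 * first coordinate in bits 0..15, second in bits 16..31. */
static uint32_t pack_vtx_pair(int16_t first, int16_t second)
{
    return (uint32_t)(uint16_t)first | ((uint32_t)(uint16_t)second << 16);
}

/* Pick a two-coordinate command when one coordinate is unchanged
 * from the previous vertex; fall back to VTX_16 otherwise. */
static uint8_t choose_vtx_cmd(const int16_t prev[3], const int16_t cur[3])
{
    if (cur[2] == prev[2]) return GX_VTX_XY;
    if (cur[1] == prev[1]) return GX_VTX_XZ;
    if (cur[0] == prev[0]) return GX_VTX_YZ;
    return GX_VTX_16;
}
```

Besides the cycle saved, each of these occupies one FIFO entry instead of the two that VTX_16 needs.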
#159644 - silent_code - Thu Jul 03, 2008 2:38 pm
Thanks for the hints guys!
So, there's really nothing I can do about it, but go on with the features I want to implement, which includes using DL and a scene graph anyway. :^)
Plus, I want to get away from the awful structures used in the demo. I really had to force myself to use them, because I wanted it to be as simple as possible. :^)
So, let's see how things turn out. :^)
PS: I still have to experiment with reading the GFX_STATUS register, which should tell me how full the FIFO is etc. (Any suggestions on that?)